[mpich-discuss] Hanging behavior with derived types in a 'user-defined gatherv'

Balaji, Pavan balaji at anl.gov
Tue Apr 25 05:05:07 CDT 2017


I have been able to reproduce it with a much simpler program, which I have added to the ticket below.  Unfortunately, it does seem to be a bug in MPICH, though I'm surprised that we didn't encounter it earlier.  Perhaps this shows a hole in our test coverage.

  -- Pavan

> On Apr 25, 2017, at 3:57 AM, Sewall, Jason <jason.sewall at intel.com> wrote:
> 
>> From: Latham, Robert J. [mailto:robl at mcs.anl.gov]
>> Sent: Monday, April 24, 2017 3:35 PM
> 
>> I can confirm that with NOLOCAL, some processes are stuck in BARRIER
>> while one process is stuck in MPI_Waitsome
> 
> Dear Robert,
> 
> That barrier is likely a subordinate call inside the top-level Bcast, no? I don't have any Barriers in the code, as I recall. 
> 
> The Waitsome issue is indeed what I saw with my printf debugging, although it really only clarified the symptom.
> 
>> I hope someone more familiar with the collectives can suggest a fix.
>> For now, I've triaged this in issue #2609
>> 
>> 
>> https://github.com/pmodels/mpich/issues/2609
> 
> Let me know if there is something else I can do to help here. 
> 
> Cheers,
> Jason
> 
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
