[mpich-discuss] Hanging behavior with derived types in a 'user-defined gatherv'
Balaji, Pavan
balaji at anl.gov
Tue Apr 25 05:05:07 CDT 2017
I have been able to reproduce it with a much simpler program, which I have added to the ticket below. Unfortunately, it does seem to be a bug in MPICH, though I'm surprised that we didn't encounter it earlier. Perhaps this shows a hole in our test coverage.
-- Pavan
> On Apr 25, 2017, at 3:57 AM, Sewall, Jason <jason.sewall at intel.com> wrote:
>
>> From: Latham, Robert J. [mailto:robl at mcs.anl.gov]
>> Sent: Monday, April 24, 2017 3:35 PM
>
>> I can confirm that with NOLOCAL, some processes are stuck in BARRIER
>> while one process is stuck in MPI_Waitsome
>
> Dear Robert,
>
> That barrier is likely a subordinate call inside the top-level Bcast, no? I don't have any Barriers in the code, as I recall.
>
> The Waitsome issue is indeed what I saw with my printf debugging, although it really only clarified the symptom.
>
>> I hope someone more familiar with the collectives can suggest a fix.
>> For now, I've triaged this in issue #2609
>>
>>
>> https://github.com/pmodels/mpich/issues/2609
>
> Let me know if there is something else I can do to help here.
>
> Cheers,
> Jason
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss