[mpich-discuss] Overlapping non-blocking collectives leads to deadlock

Congiu, Giuseppe gcongiu at anl.gov
Mon Nov 18 09:38:57 CST 2019

Hello Mark,

I don’t think this is a bug in MPICH; it looks like a bug in your code. The MPI standard requires that collectives on a given communicator (non-blocking ones are no exception) be invoked in the same order on all processes. If T0's MPI_Ireduce is issued first on process 0 while T1's MPI_Ibcast is issued first on process 1, the calls are mismatched and the result is a deadlock.
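
To make the ordering rule concrete, here is a toy model in plain C. It is not MPI code and not how MPICH matches collectives internally; the names coll_kind and first_order_mismatch are made up for this sketch. It simply compares the sequence of collectives each process issues on one communicator and reports the first position where they differ.

```c
#include <stddef.h>

/* Kinds of collective a process may issue (toy model, not MPI types). */
typedef enum { COLL_REDUCE, COLL_BCAST } coll_kind;

/*
 * Compare the order in which two processes issue collectives on the
 * same communicator. Returns the index of the first position where the
 * two sequences differ, or -1 if they are identical. If one sequence is
 * a strict prefix of the other, the mismatch is at the shorter length.
 */
int first_order_mismatch(const coll_kind *p0, size_t n0,
                         const coll_kind *p1, size_t n1)
{
    size_t n = n0 < n1 ? n0 : n1;
    for (size_t i = 0; i < n; i++) {
        if (p0[i] != p1[i])
            return (int)i;   /* the i-th collectives do not match */
    }
    return n0 == n1 ? -1 : (int)n;
}
```

In this model, process 0 issuing {COLL_REDUCE, COLL_BCAST} while the two threads of process 1 happen to issue {COLL_BCAST, COLL_REDUCE} is a mismatch at index 0, which corresponds to the case where both MPI_Wait calls block. When both processes issue {COLL_REDUCE, COLL_BCAST}, the function returns -1, matching the runs where the program completes.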


> On Nov 18, 2019, at 8:27 AM, Mark Davis via discuss <discuss at mpich.org> wrote:
> I realized something else relevant: I mentioned above that this
> deadlock occurs sometimes but not all of the time; I think I've
> narrowed down when it happens. Here's the above example with thread
> IDs annotated in:
> PROCESS 0 (root for ireduce and ibcast):
> // T0 is always the thread that calls MPI functions
> T0: MPI_Ireduce(..., &req);
> T0: MPI_Wait(&req);  <-- blocking here
> ...
> T0: MPI_Ibcast(..., &req2);
> T0: MPI_Wait(&req2);
> PROCESS 1 (non-root for ireduce and ibcast):
> // T0 is the thread that calls the reduce
> T0: MPI_Ireduce(..., &req);
> T0: MPI_Wait(&req);
> ...
> // T1 is the thread that calls the bcast
> T1: MPI_Ibcast(..., &req2);
> T1: MPI_Wait(&req2); <-- blocking here
> Note that the non-root process has two different threads, T0 and T1,
> and T0 does the Ireduce and T1 does the bcast. I believe the T0 call
> to MPI_Ireduce is concurrent with the T1 call to MPI_Ibcast (both as
> non-roots).
> So, I believe the question is: is it legal in MPI to have two threads
> in a given MPI process call different non-blocking collectives (e.g.,
> reduce and bcast) concurrently with MPI_THREAD_MULTIPLE enabled?
> Thank you
> On Mon, Nov 18, 2019 at 10:05 AM Mark Davis <markdavisinboston at gmail.com> wrote:
>> Hello, I'm experimenting with non-blocking collectives using MPICH in
>> a multithreaded C++ program (with MPI_THREAD_MULTIPLE initialization).
>> I'm currently doing a non-blocking reduce followed by a non-blocking
>> broadcast (I realize I can just use an allreduce but for my
>> experiment, I need to serialize these operations). I was able to
>> produce this bug with only two MPI processes. I see in gdb that the
>> root process is stuck trying to execute the MPI_Ireduce in cases where
>> the non-root process does the MPI_Ireduce and gets to the MPI_Ibcast
>> quickly. That is, process 0 (root) isn't able to complete the
>> MPI_Ireduce wait while process 1 is stuck in the MPI_Ibcast wait.
>> PROCESS 0 (root for ireduce and ibcast):
>> MPI_Ireduce(..., &req);
>> MPI_Wait(&req);  <-- blocking here
>> ...
>> MPI_Ibcast(..., &req2);
>> MPI_Wait(&req2);
>> PROCESS 1 (non-root for ireduce and ibcast):
>> MPI_Ireduce(..., &req);
>> MPI_Wait(&req);
>> ...
>> MPI_Ibcast(..., &req2);
>> MPI_Wait(&req2); <-- blocking here
>> Much of the time, the program deadlocks as shown above; sometimes this
>> works fine, though, perhaps due to subtle timing differences.  I
>> mentioned above that this is a multithreaded program. I'm able to
>> produce the issue with two threads with two MPI procs. The other
>> threads are not calling MPI functions -- they are helping with other
>> computation. I've verified that I don't have any TSAN or ASAN errors
>> in this program. However, when I only have one thread per process, I
>> don't have this issue. I think there's a decent chance, though, that
>> this has to do with timing differences as opposed to changing anything
>> with the MPI calls. I have verified that only one thread per process
>> is calling the MPI routines in the multithreaded case.
>> When I change the MPI_Ireduce to a blocking MPI_Reduce and I keep the
>> MPI_Ibcast non-blocking, the program runs fine. Only when BOTH
>> MPI_Ireduce and MPI_Ibcast happen serially do I see this deadlock
>> (again, some of the time).
>> Unfortunately, this program is part of a very large system and it's
>> not straightforward to give a fully working example. So, I'm just
>> looking for any ideas anyone has for what sort of thing may be
>> happening, any information that may be helpful about how two
>> coincident non-blocking requests could interact with each other, etc.
>> Also, if anyone has tips on how to debug this sort of thing in gdb
>> that would be helpful. For example, are there ways to introspect the
>> MPI_Request object, etc.?
>> Thanks
