[mpich-discuss] Overlapping non-blocking collectives leads to deadlock

Mark Davis markdavisinboston at gmail.com
Mon Nov 18 09:05:28 CST 2019

Hello, I'm experimenting with non-blocking collectives using MPICH in
a multithreaded C++ program (with MPI_THREAD_MULTIPLE initialization).

I'm currently doing a non-blocking reduce followed by a non-blocking
broadcast (I realize I can just use an allreduce but for my
experiment, I need to serialize these operations). I was able to
produce this bug with only two MPI processes. I see in gdb that the
root process is stuck in the MPI_Wait on its MPI_Ireduce request in
cases where the non-root process posts the MPI_Ireduce and gets to
the MPI_Ibcast quickly. That is, process 0 (root) isn't able to
complete the MPI_Ireduce wait while process 1 is stuck in the
MPI_Ibcast wait.

PROCESS 0 (root for ireduce and ibcast):
MPI_Ireduce(..., &req);
MPI_Wait(&req, MPI_STATUS_IGNORE);  <-- blocking here
MPI_Ibcast(..., &req2);

PROCESS 1 (non-root for ireduce and ibcast):
MPI_Ireduce(..., &req);
MPI_Ibcast(..., &req2);
MPI_Wait(&req2, MPI_STATUS_IGNORE); <-- blocking here

Much of the time, the program deadlocks as shown above; sometimes this
works fine, though, perhaps due to subtle timing differences.  I
mentioned above that this is a multithreaded program. I'm able to
produce the issue with two threads with two MPI procs. The other
threads are not calling MPI functions -- they are helping with other
computation. I've verified that I don't have any TSAN or ASAN errors
in this program. However, when I only have one thread per process, I
don't have this issue. I think there's a decent chance, though, that
this has to do with timing differences as opposed to changing anything
with the MPI calls. I have verified that only one thread per process
is calling the MPI routines in the multithreaded case.

When I change the MPI_Ireduce to a blocking MPI_Reduce and I keep the
MPI_Ibcast non-blocking, the program runs fine. Only when BOTH the
reduce and the broadcast are non-blocking (MPI_Ireduce followed by
MPI_Ibcast) do I see this deadlock (again, some of the time).

Unfortunately, this program is part of a very large system and it's
not straightforward to give a fully working example. So, I'm just
looking for any ideas anyone has for what sort of thing may be
happening, any information that may be helpful about how two
coincident non-blocking requests could interact with each other, etc.

Also, if anyone has tips on how to debug this sort of thing in gdb
that would be helpful. For example, are there ways to introspect the
MPI_Request object, etc.?

