[mpich-discuss] Overlapping non-blocking collectives leads to deadlock

Mark Davis markdavisinboston at gmail.com
Mon Nov 18 09:27:12 CST 2019


I realized something else relevant: I mentioned in my previous message
(quoted below) that this deadlock occurs sometimes but not all of the
time, and I think I've narrowed down when it happens. Here's the same
example again with thread IDs annotated:


PROCESS 0 (root for ireduce and ibcast):
// T0 is always the thread that calls MPI functions
T0: MPI_Ireduce(..., &req)
T0: MPI_Wait(&req);  <-- blocking here
...
T0: MPI_Ibcast(..., &req2);
T0: MPI_Wait(&req2);

PROCESS 1 (non-root for ireduce and ibcast):
// T0 is the thread that issues the ireduce
T0: MPI_Ireduce(..., &req)
T0: MPI_Wait(&req);
...
// T1 is the thread that issues the ibcast
T1: MPI_Ibcast(..., &req2);
T1: MPI_Wait(&req2); <-- blocking here

Note that the non-root process uses two different threads: T0 issues
the Ireduce and T1 issues the bcast. I believe the T0 call to
MPI_Ireduce can run concurrently with the T1 call to MPI_Ibcast (both
as non-roots).

So, I believe the question is: is it legal in MPI to have two threads
in a given MPI process call different non-blocking collectives (e.g.,
reduce and bcast) concurrently with MPI_THREAD_MULTIPLE enabled?
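
For concreteness, here is a condensed, self-contained sketch of the
pattern I'm asking about (our real code is part of a much larger
system, so the buffer contents, counts, and the std::thread split
below are just illustrative placeholders):

// Sketch only: values, counts, and thread layout are placeholders.
#include <mpi.h>
#include <thread>
#include <cstdio>

int main(int argc, char **argv) {
    int provided = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        std::fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    int rank = 0;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int send = rank + 1;   // value contributed to the reduce
    int sum  = 0;          // reduce result (significant only at the root)
    int bval = 0;          // value carried by the broadcast

    if (rank == 0) {
        // PROCESS 0: a single thread (T0) issues both collectives in order.
        MPI_Request req, req2;
        MPI_Ireduce(&send, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);        // <-- observed blocking here
        bval = sum;
        MPI_Ibcast(&bval, 1, MPI_INT, 0, MPI_COMM_WORLD, &req2);
        MPI_Wait(&req2, MPI_STATUS_IGNORE);
    } else {
        // PROCESS 1: T0 handles the ireduce while a second thread (T1)
        // handles the ibcast, so the two non-blocking collectives can be
        // initiated concurrently and in no particular order.
        std::thread t1([&] {
            MPI_Request req2;
            MPI_Ibcast(&bval, 1, MPI_INT, 0, MPI_COMM_WORLD, &req2);
            MPI_Wait(&req2, MPI_STATUS_IGNORE);   // <-- observed blocking here
        });
        MPI_Request req;
        MPI_Ireduce(&send, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        t1.join();
    }

    MPI_Finalize();
    return 0;
}

Note that on the non-root rank nothing constrains the order in which
the two collectives are initiated relative to each other or relative
to rank 0's order, which is exactly the situation I'm unsure is legal.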

Thank you

On Mon, Nov 18, 2019 at 10:05 AM Mark Davis <markdavisinboston at gmail.com> wrote:
>
> Hello, I'm experimenting with non-blocking collectives using MPICH in
> a multithreaded C++ program (with MPI_THREAD_MULTIPLE initialization).
>
> I'm currently doing a non-blocking reduce followed by a non-blocking
> broadcast (I realize I can just use an allreduce but for my
> experiment, I need to serialize these operations). I was able to
> produce this bug with only two MPI processes. I see in gdb that the
> root process is stuck trying to execute the MPI_Ireduce in cases where
> the non-root process does the MPI_Ireduce and gets to the MPI_Ibcast
> quickly. That is, process 0 (root) isn't able to complete the
> MPI_Ireduce wait while process 1 is stuck in the MPI_Ibcast wait.
>
> PROCESS 0 (root for ireduce and ibcast):
> MPI_Ireduce(..., &req)
> MPI_Wait(&req);  <-- blocking here
> ...
> MPI_Ibcast(..., &req2);
> MPI_Wait(&req2);
>
> PROCESS 1 (non-root for ireduce and ibcast):
> MPI_Ireduce(..., &req)
> MPI_Wait(&req);
> ...
> MPI_Ibcast(..., &req2);
> MPI_Wait(&req2); <-- blocking here
>
> Much of the time, the program deadlocks as shown above; sometimes this
> works fine, though, perhaps due to subtle timing differences.  I
> mentioned above that this is a multithreaded program. I'm able to
> produce the issue with two threads and two MPI procs. The other
> threads are not calling MPI functions -- they are helping with other
> computation. I've verified that I don't have any TSAN or ASAN errors
> in this program. However, when I only have one thread per process, I
> don't have this issue. I think there's a decent chance, though, that
> this is due to timing differences rather than to anything about the
> MPI calls themselves. I have verified that only one thread per process
> is calling the MPI routines in the multithreaded case.
>
> When I change the MPI_Ireduce to a blocking MPI_Reduce and keep the
> MPI_Ibcast non-blocking, the program runs fine. I only see this
> deadlock (again, some of the time) when BOTH operations are
> non-blocking, i.e., MPI_Ireduce and MPI_Ibcast each followed by its
> own MPI_Wait.
>
> Unfortunately, this program is part of a very large system and it's
> not straightforward to give a fully working example. So, I'm just
> looking for any ideas about what sort of thing might be happening,
> any information about how two concurrent outstanding non-blocking
> collective requests could interact with each other, etc.
>
> Also, if anyone has tips on how to debug this sort of thing in gdb,
> that would be helpful. For example, are there ways to introspect the
> MPI_Request object, etc.?
>
> Thanks

