[mpich-discuss] Deadlock in MPI_Ibarrier over ch3:sock
Dave Goodell
goodell at mcs.anl.gov
Wed Jan 23 14:44:51 CST 2013
On Jan 23, 2013, at 2:13 PM CST, Jed Brown wrote:
> As we've discussed here several times before, PETSc's --download-mpich option uses ch3:sock by default because this issue
>
> http://trac.mpich.org/projects/mpich/ticket/79
>
> makes nemesis unusable when oversubscribed.
Your complaint has been duly noted (again).
> We're also checking for MPI_Ibarrier and using Torsten's "nonblocking consensus" algorithm when possible. Unfortunately, MPI_Ibarrier deadlocks when run over ch3:sock. (Both procs have posted the Ibarrier, but the requests never complete.) I can prepare a reduced test case if you'd like.
Hmm… it looks like we aren't poking NBC progress properly when a "test" or "iprobe" routine is called. MPIDI_CH3i_Progress_test is missing a call to MPIDU_Sched_progress: http://git.mpich.org/mpich.git/blob/refs/heads/master:/src/mpid/ch3/channels/sock/src/ch3_progress.c#l51
I'm surprised this isn't caught by the "coll/nonblock3" test, although we may just be making enough progress some other way to mask the problem, or that test may not exercise the "test" routines themselves thoroughly enough: http://git.mpich.org/mpich.git/blob/refs/heads/master:/test/mpi/coll/nonblocking3.c
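To make the suspected failure mode concrete, here is a toy model in plain C. It is only a sketch: every name in it (toy_request, toy_sched_progress, toy_progress_test, toy_poll_until_done) is hypothetical and none of it is MPICH source. It assumes that a nonblocking collective completes only when a schedule hook is driven from the progress engine, and shows that a polling loop never finishes if the "test" path skips that hook.

```c
#include <stdbool.h>

/* Toy model only: these names are hypothetical, not MPICH internals. */
typedef struct {
    int steps_left;   /* schedule steps still to run before completion */
} toy_request;

/* Stand-in for the NBC schedule hook: advances the schedule one step. */
static void toy_sched_progress(toy_request *req)
{
    if (req->steps_left > 0)
        req->steps_left--;
}

/* Stand-in for the nonblocking progress poke used by "test"/"iprobe".
 * poke_sched == true models the intended behavior; false models the
 * suspected bug, where the schedule hook is never called. */
static void toy_progress_test(toy_request *req, bool poke_sched)
{
    /* ... network polling would happen here in a real progress engine ... */
    if (poke_sched)
        toy_sched_progress(req);
}

/* Polls the way an application loops on a "test" call, giving up after
 * max_polls attempts. Returns true if the request completed. */
static bool toy_poll_until_done(toy_request *req, bool poke_sched,
                                int max_polls)
{
    for (int i = 0; i < max_polls; i++) {
        toy_progress_test(req, poke_sched);
        if (req->steps_left == 0)
            return true;
    }
    return false;
}
```

With poke_sched set to false the request's remaining step count never changes, which matches the reported symptom: both processes have posted the Ibarrier, but the requests never complete.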
A modest test case would be very helpful.
> Are you still supporting ch3:sock?
We are, although we have chosen not to support a small subset of the MPI-3 functionality in ch3:sock, such as shared memory windows. We still run sock in the nightly tests:
http://www.mpich.org/static/cron/tests/#hydra-ch3:sock
> Is there some test we can do to check whether ch3:sock is being used so that we can choose a safe algorithm?
A run-time test that doesn't require looking at which symbols are compiled into the app? I can't think of one off the top of my head, although I suppose we could add one in the future via either MPI_T or the new MPI_Get_library_version routine.
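A minimal sketch of what such a check could look like through MPI_Get_library_version. It rests on an assumption the message above does not confirm: that the library's version string names the channel, for example with the substring "ch3:sock". The helper names and the USE_MPI macro are hypothetical; only MPI_Get_library_version and MPI_MAX_LIBRARY_VERSION_STRING come from MPI-3.

```c
#include <string.h>

#ifdef USE_MPI   /* hypothetical macro: define it when building against MPI-3 */
#include <mpi.h>
#endif

/* Returns 1 if the version string mentions ch3:sock, 0 otherwise.
 * ASSUMPTION: the library reports its device/channel in this string. */
static int version_mentions_sock(const char *version)
{
    return version != NULL && strstr(version, "ch3:sock") != NULL;
}

#ifdef USE_MPI
/* Asks the MPI library for its version string and applies the check. */
static int running_over_sock(void)
{
    char version[MPI_MAX_LIBRARY_VERSION_STRING];
    int len = 0;

    MPI_Get_library_version(version, &len);
    return version_mentions_sock(version);
}
#endif
```

An application could call running_over_sock() once at startup and fall back to a blocking algorithm when it returns 1. If the string does not carry the channel name, the check always returns 0, so it should be treated as a hint rather than a guarantee.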
-Dave