[mpich-discuss] Deadlock when using MPICH 3.1.1 and per-object critical sections on BG/Q
Halim
halim.amer at gmail.com
Thu Jun 26 14:31:06 CDT 2014
Hi,
I have a specific issue that arises with MPICH (I use 3.1.1 built with
gcc) + MPI_THREAD_MULTIPLE + per-object critical sections on BG/Q.
A deadlock happens in the attached hybrid MPI+OpenMP example code with 2
processes and more than one thread per process.
Debugging shows that one process is stuck in MPI_Allreduce while the
other is blocked in MPI_Finalize.
A similar communication pattern happens in my application, but in this
case both processes are stuck in MPI_Allreduce.
Note that the problem disappears when either the MPI_Allreduce or the
request cancellation (cancel + wait + test_cancelled) is removed. Both
the Allreduce and the cancellation can be avoided in this test while
still producing a correct result, but in my application both operations
are necessary.
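In case the attachment gets scrubbed by the list, here is a rough
sketch of the pattern I am describing. It is only illustrative: the
thread count, the tag, and the use of a duplicated communicator for the
collective are arbitrary choices here, and the attached allred_cancel.c
may differ in those details.

/* Sketch: one thread cancels a receive while another thread
 * participates in an MPI_Allreduce, per process. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank, nprocs;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Separate communicator for the collective. */
    MPI_Comm allred_comm;
    MPI_Comm_dup(MPI_COMM_WORLD, &allred_comm);

    #pragma omp parallel num_threads(2)
    {
        int tid = omp_get_thread_num();
        if (tid == 0) {
            /* Thread 0: post a receive that will never be matched,
             * then cancel it (cancel + wait + test_cancelled). */
            int buf = 0, cancelled = 0;
            MPI_Request req;
            MPI_Status status;
            MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, 99,
                      MPI_COMM_WORLD, &req);
            MPI_Cancel(&req);
            MPI_Wait(&req, &status);
            MPI_Test_cancelled(&status, &cancelled);
            printf("rank %d: receive cancelled = %d\n", rank, cancelled);
        } else {
            /* Thread 1: collective on the duplicated communicator. */
            int in = rank, out = 0;
            MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, allred_comm);
        }
    }

    MPI_Comm_free(&allred_comm);
    MPI_Finalize();
    return 0;
}

I build this kind of test with mpicc -fopenmp and run it with 2
processes and at least 2 OpenMP threads per process.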
In addition, using a global critical section (the default) results in
correct execution.
My configure line is as follows:
./configure --prefix=/home/aamer/usr --host=powerpc64-bgq-linux
--with-device=pamid --with-file-system=gpfs:BGQ
--with-file-system=bg+bglockless --with-atomic-primitives
--enable-handle-allocation=tls --enable-refcount=lock-free
--disable-predefined-refcount --disable-error-checking --without-timing
--without-mpit-pvars --enable-fast=O3,ndebug --enable-thread-cs=per-object
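For the build that works, the only difference is the last option:
--enable-thread-cs=global, which, as far as I understand, is also what
you get when the flag is omitted.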
I would appreciate any advice on how to solve this issue.
Regards,
--Halim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: allred_cancel.c
Type: text/x-csrc
Size: 2109 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20140626/5ab0d8ac/attachment.bin>