[mpich-discuss] Deadlock when using MPICH 3.1.1 and per-object critical sections on BG/Q

Halim halim.amer at gmail.com
Thu Jun 26 14:31:06 CDT 2014


Hi,

I am running into an issue with MPICH (3.1.1, built with gcc) + 
MPI_THREAD_MULTIPLE + per-object critical sections on BG/Q.

A deadlock happens in the attached hybrid MPI+OpenMP example code with 2 
processes and more than one thread per process.

Debugging shows that one process is stuck in MPI_Allreduce while the 
other is blocked in MPI_Finalize.

A similar communication pattern happens in my application, but in this 
case both processes are stuck in MPI_Allreduce.

Note that the problem disappears if I remove the MPI_Allreduce or avoid 
request cancellation (MPI_Cancel + MPI_Wait + MPI_Test_cancelled). In 
this test, both the Allreduce and the cancellation can be avoided while 
still producing a correct result, but both operations are necessary in 
my application.
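
In case the attachment does not come through, here is a minimal sketch 
of the pattern I have in mind. It is a reconstruction, not the exact 
allred_cancel.c; the tags, buffer contents, and reduction are 
placeholders. Each thread posts a receive that is never matched, cancels 
it, and the processes then meet in MPI_Allreduce before MPI_Finalize.

/* Hypothetical sketch only -- not the exact attached allred_cancel.c. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    #pragma omp parallel
    {
        int buf = 0, cancelled = 0;
        MPI_Request req;
        MPI_Status status;

        /* Post a receive that will never be matched, then cancel it
         * (the cancel + wait + test_cancelled sequence mentioned above). */
        MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, omp_get_thread_num(),
                  MPI_COMM_WORLD, &req);
        MPI_Cancel(&req);
        MPI_Wait(&req, &status);
        MPI_Test_cancelled(&status, &cancelled);
    }

    /* Collective after the per-thread cancellations; this is where the
     * hang shows up with --enable-thread-cs=per-object. */
    int in = rank, out = 0;
    MPI_Allreduce(&in, &out, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);

    MPI_Finalize();
    return 0;
}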

In addition, using a global critical section (the default) results in a 
correct execution.

My configure line is as follows:

./configure --prefix=/home/aamer/usr --host=powerpc64-bgq-linux 
--with-device=pamid --with-file-system=gpfs:BGQ 
--with-file-system=bg+bglockless --with-atomic-primitives 
--enable-handle-allocation=tls --enable-refcount=lock-free 
--disable-predefined-refcount --disable-error-checking --without-timing 
--without-mpit-pvars --enable-fast=O3,ndebug --enable-thread-cs=per-object

I would appreciate any advice on how to solve this issue.

Regards,
--Halim
-------------- next part --------------
A non-text attachment was scrubbed...
Name: allred_cancel.c
Type: text/x-csrc
Size: 2109 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20140626/5ab0d8ac/attachment.bin>

