[mpich-discuss] Deadlock when using MPICH 3.1.1 and per-object critical sections on BG/Q

Michael Blocksome blocksom at us.ibm.com
Thu Jun 26 17:41:37 CDT 2014


Halim,

This problem sounds similar to another issue we are debugging related to 
cancel and multiple endpoints in per-object locking mode.  I'll try a few 
things and post status.

Thanks,

Michael Blocksome
Parallel Environment MPI Middleware Team Lead, TCEM
POWER, x86, and Blue Gene HPC Messaging
blocksom at us.ibm.com




From:   Halim <halim.amer at gmail.com>
To:     discuss at mpich.org
Date:   06/26/2014 02:31 PM
Subject:        [mpich-discuss] Deadlock when using MPICH 3.1.1 and 
per-object critical sections on BG/Q
Sent by:        discuss-bounces at mpich.org



Hi,

I have a specific issue that arises with MPICH (I use 3.1.1 built with 
gcc) + MPI_THREAD_MULTIPLE + per-object critical sections on BG/Q.

A deadlock happens in the attached hybrid MPI+OpenMP example code with 2 
processes and more than one thread per process.

Debugging shows that one process is stuck in MPI_Allreduce while the 
other is blocked in MPI_Finalize.

A similar communication pattern happens in my application, but in this 
case both processes are stuck in MPI_Allreduce.

Note that the problem disappears when either MPI_Allreduce or the 
request cancellation (cancel+wait+test_cancelled) is removed. In this 
test, both operations can be left out and the result is still correct. 
In my application, however, both are necessary.

In addition, the program runs correctly with the global critical section 
(the default).

My configure line is as follows:

./configure --prefix=/home/aamer/usr --host=powerpc64-bgq-linux 
--with-device=pamid --with-file-system=gpfs:BGQ 
--with-file-system=bg+bglockless --with-atomic-primitives 
--enable-handle-allocation=tls --enable-refcount=lock-free 
--disable-predefined-refcount --disable-error-checking --without-timing 
--without-mpit-pvars --enable-fast=O3,ndebug --enable-thread-cs=per-object
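
For comparison, the build that runs correctly would differ only in the 
critical-section granularity. The line below is a sketch that assumes 
every other option stays unchanged and that "global" is the value 
accepted by --enable-thread-cs for the default mode; omitting the flag 
entirely should select the same behavior.

```shell
# Assumed variant of the configure line above: only the last flag changes.
./configure --prefix=/home/aamer/usr --host=powerpc64-bgq-linux \
  --with-device=pamid --with-file-system=gpfs:BGQ \
  --with-file-system=bg+bglockless --with-atomic-primitives \
  --enable-handle-allocation=tls --enable-refcount=lock-free \
  --disable-predefined-refcount --disable-error-checking --without-timing \
  --without-mpit-pvars --enable-fast=O3,ndebug --enable-thread-cs=global
```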

I would appreciate any advice on how to solve this issue.

Regards,
--Halim
[attachment "allred_cancel.c" deleted by Michael Blocksome/Rochester/IBM] 