<font size=2 face="sans-serif">Halim,</font>
<br>
<br><font size=2 face="sans-serif">This problem sounds similar to another
issue we are debugging related to cancel and multiple endpoints in per-object
locking mode. I'll try a few things and post status.</font>
<br>
<br><font size=2 face="sans-serif">Thanks,</font>
<br><font size=2 face="sans-serif"><br>
Michael Blocksome<br>
Parallel Environment MPI Middleware Team Lead, TCEM<br>
POWER, x86, and Blue Gene HPC Messaging<br>
blocksom@us.ibm.com<br>
</font>
<br>
<br>
<br>
<br><font size=1 color=#5f5f5f face="sans-serif">From:
</font><font size=1 face="sans-serif">Halim <halim.amer@gmail.com></font>
<br><font size=1 color=#5f5f5f face="sans-serif">To:
</font><font size=1 face="sans-serif">discuss@mpich.org</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Date:
</font><font size=1 face="sans-serif">06/26/2014 02:31 PM</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Subject:
</font><font size=1 face="sans-serif">[mpich-discuss]
Deadlock when using MPICH 3.1.1 and per-object critical sections on BG/Q</font>
<br><font size=1 color=#5f5f5f face="sans-serif">Sent by:
</font><font size=1 face="sans-serif">discuss-bounces@mpich.org</font>
<br>
<hr noshade>
<br>
<br>
<br><tt><font size=2>Hi,<br>
<br>
I have a specific issue that arises with MPICH (I use 3.1.1 built with gcc) + MPI_THREAD_MULTIPLE + per-object critical sections on BG/Q.<br>
<br>
A deadlock happens in the attached hybrid MPI+OpenMP example code with 2 processes and more than one thread per process.<br>
<br>
Debugging shows that one process is stuck in MPI_Allreduce while the other is blocked in MPI_Finalize.<br>
<br>
A similar communication pattern occurs in my application, but there both processes are stuck in MPI_Allreduce.<br>
<br>
Note that the problem disappears when MPI_Allreduce is removed or when request cancellation (MPI_Cancel + MPI_Wait + MPI_Test_cancelled) is avoided. In this test, both the Allreduce and the cancellation can be avoided while still producing a correct result, but in my application both operations are necessary.<br>
<br>
In addition, with the default global critical section, the program executes correctly.<br>
<br>
My configure line is as follows:<br>
<br>
./configure --prefix=/home/aamer/usr --host=powerpc64-bgq-linux \<br>
--with-device=pamid --with-file-system=gpfs:BGQ \<br>
--with-file-system=bg+bglockless --with-atomic-primitives \<br>
--enable-handle-allocation=tls --enable-refcount=lock-free \<br>
-disable-predefined-refcount --disable-error-checking --without-timing \<br>
--without-mpit-pvars --enable-fast=O3,ndebug --enable-thread-cs=per-object<br>
<br>
I would appreciate any advice on how to solve this issue.<br>
<br>
Regards,<br>
--Halim<br>
[attachment "allred_cancel.c" deleted by Michael Blocksome/Rochester/IBM]<br>
_______________________________________________<br>
discuss mailing list discuss@mpich.org<br>
To manage subscription options or unsubscribe:<br>
</font></tt><a href=https://lists.mpich.org/mailman/listinfo/discuss><tt><font size=2>https://lists.mpich.org/mailman/listinfo/discuss</font></tt></a>
<br>