[mpich-discuss] Deadlock when using MPICH 3.1.1 and per-object critical sections on BG/Q
Michael Blocksome
blocksom at us.ibm.com
Thu Jun 26 17:41:37 CDT 2014
Halim,
This problem sounds similar to another issue we are debugging related to
cancel and multiple endpoints in per-object locking mode. I'll try a few
things and post status.
Thanks,
Michael Blocksome
Parallel Environment MPI Middleware Team Lead, TCEM
POWER, x86, and Blue Gene HPC Messaging
blocksom at us.ibm.com
From: Halim <halim.amer at gmail.com>
To: discuss at mpich.org
Date: 06/26/2014 02:31 PM
Subject: [mpich-discuss] Deadlock when using MPICH 3.1.1 and
per-object critical sections on BG/Q
Sent by: discuss-bounces at mpich.org
Hi,
I have a specific issue that arises with MPICH (I use 3.1.1 built with
gcc) + MPI_THREAD_MULTIPLE + per-object critical sections on BG/Q.
A deadlock happens in the attached hybrid MPI+OpenMP example code with 2
processes and more than one thread per process.
Debugging shows that one process is stuck in MPI_Allreduce while the
other is blocked in MPI_Finalize.
A similar communication pattern happens in my application, but in this
case both processes are stuck in MPI_Allreduce.
Note that the problem disappears when either MPI_Allreduce is removed or
request cancellation (cancel+wait+test_cancelled) is avoided. In this
test, both the Allreduce and the cancellation can be dropped without
affecting the correctness of the result, but in my application both
operations are necessary.
In addition, using a global critical section (the default) results in
correct execution.
My configure line is as follows:
./configure --prefix=/home/aamer/usr --host=powerpc64-bgq-linux \
    --with-device=pamid --with-file-system=gpfs:BGQ \
    --with-file-system=bg+bglockless --with-atomic-primitives \
    --enable-handle-allocation=tls --enable-refcount=lock-free \
    -disable-predefined-refcount --disable-error-checking --without-timing \
    --without-mpit-pvars --enable-fast=O3,ndebug --enable-thread-cs=per-object
I would appreciate any advice on how to solve this issue.
Regards,
--Halim
[attachment "allred_cancel.c" deleted by Michael Blocksome/Rochester/IBM]
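Since the attachment is not available here, the following is only a rough
sketch of the kind of pattern described above, and is not the original
allred_cancel.c. It assumes that each OpenMP thread posts a receive that is
never matched, cancels it with the cancel+wait+test_cancelled sequence, and
that the main thread then calls MPI_Allreduce. The per-thread tags, the buffer
layout, and calling the Allreduce from outside the parallel region are all
assumptions made for illustration.

```c
/* Hypothetical sketch only; NOT the deleted allred_cancel.c attachment. */
#include <mpi.h>
#include <omp.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided, rank;

    /* The report uses MPI_THREAD_MULTIPLE, so request it explicitly. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE is not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    int cancelled_local = 0;

    /* Assumption: every thread performs its own cancel+wait+test_cancelled. */
    #pragma omp parallel reduction(+:cancelled_local)
    {
        int tid = omp_get_thread_num();
        int buf = 0;
        int flag = 0;
        MPI_Request req;
        MPI_Status status;

        /* No process sends in this sketch, so the receive is never matched. */
        MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE, tid, MPI_COMM_WORLD, &req);

        MPI_Cancel(&req);                    /* cancel           */
        MPI_Wait(&req, &status);             /* wait             */
        MPI_Test_cancelled(&status, &flag);  /* test_cancelled   */

        cancelled_local += flag ? 1 : 0;
    }

    /* Assumption: the collective is issued by the main thread afterwards. */
    int cancelled_global = 0;
    MPI_Allreduce(&cancelled_local, &cancelled_global, 1, MPI_INT, MPI_SUM,
                  MPI_COMM_WORLD);

    if (rank == 0)
        printf("cancelled receives across all ranks: %d\n", cancelled_global);

    MPI_Finalize();
    return 0;
}
```

With a typical MPI compiler wrapper this would be built with OpenMP enabled
(for example mpicc -fopenmp with gcc) and run on 2 processes with more than
one thread per process, matching the setup in the report. Whether this exact
sketch triggers the deadlock has not been verified.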