[mpich-discuss] Deadlock when using MPICH 3.1.1 and per-object critical sections on BG/Q

Michael Blocksome blocksom at us.ibm.com
Fri Jul 18 13:54:57 CDT 2014


Halim,

Thanks for creating the ticket.  This likely will not be fixed in time for 
mpich 3.1.2, but I hope to get time to investigate this problem in the 
3.1.3 timeframe, whenever that is scheduled.

Michael Blocksome
Parallel Environment MPI Middleware Team Lead, TCEM
POWER, x86, and Blue Gene HPC Messaging
blocksom at us.ibm.com




From:   Halim <halim.amer at gmail.com>
To:     discuss at mpich.org
Date:   07/17/2014 02:54 PM
Subject:        Re: [mpich-discuss] Deadlock when using MPICH 3.1.1 and 
per-object critical sections on BG/Q
Sent by:        discuss-bounces at mpich.org



Hi Michael,

Thanks. I created a ticket (#2132) on trac.mpich.org to track the problem.

Regards,
--Halim

On 06/26/2014 17:41, Michael Blocksome wrote:
> Halim,
>
> This problem sounds similar to another issue we are debugging related to
> cancel and multiple endpoints in per-object locking mode.  I'll try a
> few things and post status.
>
> Thanks,
>
> Michael Blocksome
> Parallel Environment MPI Middleware Team Lead, TCEM
> POWER, x86, and Blue Gene HPC Messaging
> blocksom at us.ibm.com
>
>
>
>
> From:   Halim <halim.amer at gmail.com>
> To:     discuss at mpich.org,
> Date:   06/26/2014 02:31 PM
> Subject:        [mpich-discuss] Deadlock when using MPICH 3.1.1 and
> per-object critical sections on BG/Q
> Sent by:        discuss-bounces at mpich.org
>
>
>
> Hi,
>
> I have a specific issue that arises with MPICH (I use 3.1.1 built with
> gcc) + MPI_THREAD_MULTIPLE + per-object critical sections on BG/Q.
>
> A deadlock happens in the attached hybrid MPI+OpenMP example code with 2
> processes and more than one thread per process.
>
> Debugging shows that one process is stuck in MPI_Allreduce while the
> other is blocked in MPI_Finalize.
>
> A similar communication pattern happens in my application, but in this
> case both processes are stuck in MPI_Allreduce.
>
> Note that the problem disappears when removing MPI_Allreduce or
> avoiding the request cancellation (cancel+wait+test_cancelled). Both
> the Allreduce and the cancellation can be avoided in this test while
> still producing a correct result, but in my application both
> operations are necessary.
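>
> A minimal sketch of this pattern (not the actual allred_cancel.c
> attachment; the buffer names and tags below are only illustrative)
> looks roughly like the following:
>
>   #include <mpi.h>
>   #include <omp.h>
>
>   int main(int argc, char **argv)
>   {
>       int provided, rank;
>       MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>       MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>       #pragma omp parallel
>       {
>           int buf = 0, cancelled = 0;
>           MPI_Request req;
>           MPI_Status status;
>
>           /* Each thread posts a receive that is never matched, then
>            * cancels it: the cancel+wait+test_cancelled sequence
>            * described above. */
>           MPI_Irecv(&buf, 1, MPI_INT, MPI_ANY_SOURCE,
>                     omp_get_thread_num(), MPI_COMM_WORLD, &req);
>           MPI_Cancel(&req);
>           MPI_Wait(&req, &status);
>           MPI_Test_cancelled(&status, &cancelled);
>       }
>
>       /* The collective reported to hang under per-object critical
>        * sections. */
>       int local = rank, sum = 0;
>       MPI_Allreduce(&local, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
>
>       MPI_Finalize();
>       return 0;
>   }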
>
> In addition, using a global critical section (default) results in a
> correct execution.
>
> My configure line is as follows:
>
> ./configure --prefix=/home/aamer/usr --host=powerpc64-bgq-linux
> --with-device=pamid --with-file-system=gpfs:BGQ
> --with-file-system=bg+bglockless --with-atomic-primitives
> --enable-handle-allocation=tls --enable-refcount=lock-free
> --disable-predefined-refcount --disable-error-checking --without-timing
> --without-mpit-pvars --enable-fast=O3,ndebug
> --enable-thread-cs=per-object
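>
> For comparison, the default global critical section build that runs
> correctly presumably differs only in the last option:
>
>   --enable-thread-cs=global
>
> (or simply omits --enable-thread-cs altogether).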
>
> I would appreciate any advice on how to solve this issue.
>
> Regards,
> --Halim
> [attachment "allred_cancel.c" deleted by Michael Blocksome/Rochester/IBM]
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss