[mpich-devel] MPI_Bsend under MPIU_THREAD_GRANULARITY_PER_OBJECT

Dave Goodell goodell at mcs.anl.gov
Thu Apr 25 11:03:23 CDT 2013


The Bsend paths almost certainly have not been protected correctly.  Patches to fix the issue are most welcome.

-Dave

On Apr 25, 2013, at 10:49 AM CDT, Bob Cernohous <bobc at us.ibm.com> wrote:

> Let me start by saying that I have not been involved in the nitty-gritty of the per-object locking design. 
> 
> What protects the attached buffer, its data structures, and the request when multiple threads call MPI_Bsend()?  All I see in the code path is a (no-op) MPIU_THREAD_CS_ENTER(ALLFUNC,). 
> 
> I have a customer test in which the threads seem to be walking all over the request, around here: 
> 
>         /bgusr/bobc/bgq/comm/lib/dev/mpich2/src/mpid/pamid/include/../src/mpid_request.h:259 
> 0000000001088c0c MPIR_Request_complete 
>         /bgusr/bobc/bgq/comm/lib/dev/mpich2/src/mpi/pt2pt/mpir_request.c:87 
> 000000000106e874 MPIR_Test_impl 
>         /bgusr/bobc/bgq/comm/lib/dev/mpich2/src/mpi/pt2pt/test.c:62 
> 00000000010188f0 MPIR_Bsend_check_active 
>         /bgusr/bobc/bgq/comm/lib/dev/mpich2/src/mpi/pt2pt/bsendutil.c:455 
> 0000000001018dc0 MPIR_Bsend_isend 
>         /bgusr/bobc/bgq/comm/lib/dev/mpich2/src/mpi/pt2pt/bsendutil.c:226 
> 0000000001008734 PMPI_Bsend 
>         /bgusr/bobc/bgq/comm/lib/dev/mpich2/src/mpi/pt2pt/bsend.c:163 
> 00000000010009c0 00000012.long_branch_r2off.__libc_start_main+0 
>         :0 
> 000000000130cbc0 start_thread 
> 
> e.g. (printed with fprintf from MPIU_HANDLE_LOG_REFCOUNT_CHANGE) 
> 
> stderr[8]: set 0x15f8048 (0xac0000ff kind=REQUEST) refcount to 2 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to 1 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to 0 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to -1 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to -2 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to -3 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to -4 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to -5 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to -6 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to -7 
> stderr[8]: decr 0x15f8048 (0xac0000ff kind=REQUEST) refcount to -8 
> 
> 
> Bob Cernohous:  (T/L 553) 507-253-6093
> 
> BobC at us.ibm.com
> IBM Rochester, Building 030-2(C335), Department 61L
> 3605 Hwy 52 North, Rochester,  MN 55901-7829
> 
> > Chaos reigns within.
> > Reflect, repent, and reboot.
> > Order shall return.
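
For illustration only: one plausible reading of the refcount log above is that several threads each reclaim the same completed request from the attached buffer's active list. The sketch below is NOT MPICH code. Every name in it (demo_request, demo_segment, bsend_lock, demo_check_active, demo_run) is hypothetical, and a single pthread mutex around the buffer state is an assumed design, not the project's actual fix. It only shows that doing the check and the release under one lock makes the release happen exactly once.

```c
#include <assert.h>
#include <pthread.h>
#include <stddef.h>

/* Hypothetical stand-in for a request object; NOT the MPICH structure. */
typedef struct demo_request {
    int ref_count;   /* references still held */
    int completed;   /* nonzero once the transfer has finished */
} demo_request;

/* Hypothetical stand-in for one entry in the attached buffer's active list. */
typedef struct demo_segment {
    demo_request *req;   /* NULL once the segment has been reclaimed */
} demo_segment;

/* One lock guarding all buffer state (assumed design, not MPICH's). */
static pthread_mutex_t bsend_lock = PTHREAD_MUTEX_INITIALIZER;
static demo_segment active;
static int releases;   /* how many times the request was released */

/* Analogue of a check-active step: reclaim the segment if its request is done. */
static void demo_check_active(void)
{
    pthread_mutex_lock(&bsend_lock);
    if (active.req != NULL && active.req->completed) {
        active.req->ref_count--;   /* drop the buffer's reference exactly once */
        releases++;
        active.req = NULL;         /* later callers find nothing to reclaim */
    }
    pthread_mutex_unlock(&bsend_lock);
}

static void *demo_worker(void *arg)
{
    (void)arg;
    for (int i = 0; i < 1000; i++)
        demo_check_active();
    return NULL;
}

/* Runs nthreads workers against one completed request; returns its final ref count. */
int demo_run(int nthreads)
{
    pthread_t tids[64];
    demo_request req = { 1, 1 };   /* one reference held, already completed */

    assert(nthreads > 0 && nthreads <= 64);
    active.req = &req;
    releases = 0;

    for (int i = 0; i < nthreads; i++) {
        int rc = pthread_create(&tids[i], NULL, demo_worker, NULL);
        assert(rc == 0);
        (void)rc;
    }
    for (int i = 0; i < nthreads; i++)
        pthread_join(tids[i], NULL);

    assert(releases == 1);   /* the lock makes check-and-release atomic */
    return req.ref_count;
}
```

Calling demo_run(8) from a program linked with -pthread should leave the count at 0 rather than driving it negative. Without the lock, two threads could both see a non-NULL, completed request and both decrement it.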


