[mpich-devel] MPI_Bsend fails multithreaded

Bob Cernohous bobc at us.ibm.com
Thu Apr 25 23:20:17 CDT 2013


I think my problem with MPIU_THREAD_GRANULARITY_GLOBAL is a CS_YIELD. 

MPIU_THREAD_CS_ENTER(ALLFUNC,);

....

    /* debug instrumentation in the check_active loop; %p wants a void* */
    MPIR_Bsend_data_t *active = BsendBuffer.active, *next_active;

    while (active) {
        fprintf(stderr, "%2.2u:%u:active %p (0x%08x kind=%d) refcount %d\n",
                Kernel_ProcessorID(), __LINE__,
                (void *) active->request,
                active->request->handle,
                active->request->kind,
                MPIU_Object_get_ref(active->request));
...

There are one or more yields somewhere in test and/or progress; I haven't 
tracked down exactly where yet, and I'm out tomorrow.  I end up with three 
threads (26, 54, 48) working on the same active request.  Thread 26 frees 
it and moves on to the next active request, then thread 48 chokes on the 
freed request.
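
For what it's worth, here's a toy reduction of what I think is happening. 
All of the names below are mine, not MPICH's, and the "free" is just a 
flag so the race is visible without undefined behavior; build with 
gcc -pthread.  The point is that the yield drops the global critical 
section mid-traversal, so several threads pick up the same active entry 
and the loser touches a freed request:

/* bsend_race.c: hypothetical reduction of the failure mode; none of
 * these names are MPICH's.  Build: gcc -pthread bsend_race.c */
#include <pthread.h>
#include <sched.h>
#include <stdio.h>
#include <stdlib.h>

typedef struct node { struct node *next; int freed; } node_t;

static pthread_mutex_t cs = PTHREAD_MUTEX_INITIALIZER; /* the ALLFUNC CS */
static node_t *active_head;              /* stands in for BsendBuffer.active */

static void *check_active(void *arg)
{
    (void) arg;
    pthread_mutex_lock(&cs);             /* MPIU_THREAD_CS_ENTER(ALLFUNC,) */
    node_t *active = active_head;
    while (active) {
        node_t *next_active = active->next;

        /* test/progress yields here: the CS is dropped and reacquired,
         * so another thread can enter and grab the SAME 'active'. */
        pthread_mutex_unlock(&cs);
        sched_yield();
        pthread_mutex_lock(&cs);

        if (active->freed) {             /* thread 48's "badcase" */
            fprintf(stderr, "badcase: freed request %p\n", (void *) active);
            abort();
        }
        active->freed = 1;               /* thread 26 frees it...        */
        active_head = next_active;       /* ...and moves on to the next  */
        active = next_active;
    }
    pthread_mutex_unlock(&cs);
    return NULL;
}

int main(void)
{
    node_t b = { NULL, 0 }, a = { &b, 0 };  /* two fake active requests */
    active_head = &a;

    pthread_t t[3];
    for (int i = 0; i < 3; i++)
        pthread_create(&t[i], NULL, check_active, NULL);
    for (int i = 0; i < 3; i++)
        pthread_join(t[i], NULL);
    puts("lucky scheduling, no badcase this run");
    return 0;
}

In the real code the request is actually freed, which is why thread 48 
dies down in MPIR_Request_complete on a stale kind instead of tripping a 
flag.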

stderr[0]: threaded exit
stderr[0]: 26:441:active 0x15d1c78 (0xac000003 kind=1) refcount 2
stderr[0]: 26:decr 0x15d1aa8 (0xac000001 kind=REQUEST) refcount to 1
stderr[0]: 26:decr 0x15d1c78 (0xac000003 kind=REQUEST) refcount to 1
stderr[0]: yield

stderr[0]: 54:441:active 0x15d1c78 (0xac000003 kind=1) refcount 1
stderr[0]: 54:set 0x15d1d60 (0xac000004 kind=REQUEST) refcount to 1
stderr[0]: 54:set 0x15d1d60 (0xac000004 kind=REQUEST) refcount to 2
stderr[0]: 54:decr 0x15d1d60 (0xac000004 kind=REQUEST) refcount to 1
stderr[0]: yield

stderr[0]: 48:441:active 0x15d1c78 (0xac000003 kind=1) refcount 1
stderr[0]: 48:set 0x15d1e48 (0xac000005 kind=REQUEST) refcount to 1
stderr[0]: 48:set 0x15d1e48 (0xac000005 kind=REQUEST) refcount to 2
stderr[0]: 48:decr 0x15d1e48 (0xac000005 kind=REQUEST) refcount to 1
stderr[0]: yield

stderr[0]: 26:decr 0x15d1c78 (0xac000003 kind=REQUEST) refcount to 0
stderr[0]: 26:decr 0x1560f78 (0x44000000 kind=COMM) refcount to 3
stderr[0]: 26:free 0x15d1c78 (0xac000003 kind=0) refcount 0
stderr[0]: 26:356:prev 0x15d1c78, active 0x15d1aa8 (0xac000001 kind=1) refcount 1
stderr[0]: 26:441:active 0x15d1aa8 (0xac000001 kind=1) refcount 1

stderr[0]: yield
stderr[0]: 32:441:active 0x15d1aa8 (0xac000001 kind=1) refcount 1

stderr[0]: yield
stderr[0]: 48:badcase 0x15d1c78 (0xac000003 kind=0) refcount 0
stderr[0]: Abort(1) on node 0 (rank 0 in comm 1140850688): Fatal error in MPI_Bsend: Internal MPI error!, error stack:
stderr[0]: MPI_Bsend(181)..............: MPI_Bsend(buf=0x19c8a06d70, count=1024, MPI_CHAR, dest=1, tag=0, MPI_COMM_WORLD) failed
stderr[0]: MPIR_Bsend_isend(226).......:
stderr[0]: MPIR_Bsend_check_active(474):
stderr[0]: MPIR_Test_impl(65)..........:
stderr[0]: MPIR_Request_complete(239)..: INTERNAL ERROR: unexpected value in case statement (value=0)


I'm guessing the problem with MPIU_THREAD_GRANULARITY_PER_OBJECT is that 
there's no lock at all, so the threads are all over each other with no 
yield even needed.  It's simply not thread safe with the static 
BsendBuffer.
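
If that's right, the shape of a fix for the per-object case is to give 
the static buffer its own lock and hold it across the whole traversal, 
with no yield inside it.  Purely a sketch with made-up names, not a patch 
against the actual source:

/* Hypothetical per-object repair: the buffer owns a mutex and every
 * traversal of its active list holds it end to end.  Sketch only. */
#include <pthread.h>
#include <stddef.h>

typedef struct req_node { struct req_node *next; } req_node_t;

static struct {
    pthread_mutex_t lock;    /* protects 'active' and the nodes on it */
    req_node_t *active;
} bsend_buffer = { PTHREAD_MUTEX_INITIALIZER, NULL };

static void bsend_check_active(void)
{
    pthread_mutex_lock(&bsend_buffer.lock);
    for (req_node_t *active = bsend_buffer.active; active != NULL; ) {
        req_node_t *next_active = active->next;
        /* test/complete/free 'active' here: no other thread can see
         * it, and nothing yields while the list is being walked */
        active = next_active;
    }
    pthread_mutex_unlock(&bsend_buffer.lock);
}

int main(void)
{
    bsend_check_active();    /* empty list: lock, walk nothing, unlock */
    return 0;
}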


Bob Cernohous:  (T/L 553) 507-253-6093

BobC at us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester,  MN 55901-7829

> Chaos reigns within.
> Reflect, repent, and reboot.
> Order shall return.

