[mpich-devel] MPI_Bsend fails multithreaded
Bob Cernohous
bobc at us.ibm.com
Fri Apr 26 00:05:33 CDT 2013
If I put a new, big CS_ENTER(BSENDDATA) lock around the bsend processing
(one that doesn't yield), then MPIU_THREAD_GRANULARITY_PER_OBJECT works OK.
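
Roughly what that looks like, as a sketch (the real patch still has to plumb
a new BSENDDATA lock name through the thread macros, so take the names with
a grain of salt):

  /* Sketch: one coarse, non-yielding critical section around the whole
   * bsend active-list processing, so no other thread can touch
   * BsendBuffer.active mid-walk. */
  MPIU_THREAD_CS_ENTER(BSENDDATA,);

  MPIR_Bsend_data_t *active = BsendBuffer.active, *next_active;
  while (active) {
      next_active = active->next;
      /* ... test/complete the pending send, possibly freeing 'active' ... */
      active = next_active;
  }

  MPIU_THREAD_CS_EXIT(BSENDDATA,);
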
Bob Cernohous: (T/L 553) 507-253-6093
BobC at us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester, MN 55901-7829
> Chaos reigns within.
> Reflect, repent, and reboot.
> Order shall return.
devel-bounces at mpich.org wrote on 04/25/2013 11:20:17 PM:
> From: Bob Cernohous/Rochester/IBM at IBMUS
> To: mpich2-dev at mcs.anl.gov,
> Cc: Nysal K Jan <jnysal at in.ibm.com>, Haizhu Liu/Poughkeepsie/IBM at IBMUS,
> Sameh S Sharkawi/Austin/IBM at IBMUS, Su Huang/Poughkeepsie/IBM at IBMUS
> Date: 04/25/2013 11:20 PM
> Subject: Re: [mpich-devel] MPI_Bsend fails multithreaded
> Sent by: devel-bounces at mpich.org
>
> I think my problem with MPIU_THREAD_GRANULARITY_GLOBAL is a CS_YIELD.
>
> MPIU_THREAD_CS_ENTER(ALLFUNC,);
> ....
>
> MPIR_Bsend_data_t *active = BsendBuffer.active, *next_active;
> while (active) {
>     fprintf(stderr, "%2.2u:%u:active %p (0x%08x kind=%d) refcount %d\n",
>             Kernel_ProcessorID(), __LINE__,
>             (active->request),
>             (active->request)->handle,
>             active->request->kind,
>             MPIU_Object_get_ref((active->request)));
>     ...
>
> There are one or more yields somewhere... in test and/or progress. I
> haven't tracked them down and I'm out tomorrow. I end up with three
> threads (26, 54, 48) working on the same active request: 26 frees it
> and moves on to the next active; 48 then chokes on the freed request.
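>
> The shape of the hazard, as far as I can tell (just a sketch -- I still
> haven't confirmed which call actually does the yield):
>
>     while (active) {
>         next_active = active->next;
>         /* MPIR_Test_impl drives progress; somewhere in there the
>          * ALLFUNC lock gets yielded, so another thread enters the
>          * same loop and picks up the same 'active'. */
>         MPIR_Test_impl(...);
>         /* if complete, free 'active' -- thread 26 did this ... */
>         active = next_active;
>         /* ... and thread 48 later dereferences the freed request
>          * (the "badcase"/abort below). */
>     }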
>
> stderr[0]: threaded exit
> stderr[0]: 26:441:active 0x15d1c78 (0xac000003 kind=1) refcount 2
> stderr[0]: 26:decr 0x15d1aa8 (0xac000001 kind=REQUEST) refcount to 1
> stderr[0]: 26:decr 0x15d1c78 (0xac000003 kind=REQUEST) refcount to 1
> stderr[0]: yield
>
> stderr[0]: 54:441:active 0x15d1c78 (0xac000003 kind=1) refcount 1
> stderr[0]: 54:set 0x15d1d60 (0xac000004 kind=REQUEST) refcount to 1
> stderr[0]: 54:set 0x15d1d60 (0xac000004 kind=REQUEST) refcount to 2
> stderr[0]: 54:decr 0x15d1d60 (0xac000004 kind=REQUEST) refcount to 1
> stderr[0]: yield
>
> stderr[0]: 48:441:active 0x15d1c78 (0xac000003 kind=1) refcount 1
> stderr[0]: 48:set 0x15d1e48 (0xac000005 kind=REQUEST) refcount to 1
> stderr[0]: 48:set 0x15d1e48 (0xac000005 kind=REQUEST) refcount to 2
> stderr[0]: 48:decr 0x15d1e48 (0xac000005 kind=REQUEST) refcount to 1
> stderr[0]: yield
>
> stderr[0]: 26:decr 0x15d1c78 (0xac000003 kind=REQUEST) refcount to 0
> stderr[0]: 26:decr 0x1560f78 (0x44000000 kind=COMM) refcount to 3
> stderr[0]: 26:free 0x15d1c78 (0xac000003 kind=0) refcount 0
> stderr[0]: 26:356:prev 0x15d1c78, active 0x15d1aa8 (0xac000001 kind=1) refcount 1
> stderr[0]: 26:441:active 0x15d1aa8 (0xac000001 kind=1) refcount 1
>
> stderr[0]: yield
> stderr[0]: 32:441:active 0x15d1aa8 (0xac000001 kind=1) refcount 1
>
> stderr[0]: yield
> stderr[0]: 48:badcase 0x15d1c78 (0xac000003 kind=0) refcount 0
> stderr[0]: Abort(1) on node 0 (rank 0 in comm 1140850688): Fatal error in MPI_Bsend: Internal MPI error!, error stack:
> stderr[0]: MPI_Bsend(181)..............: MPI_Bsend(buf=0x19c8a06d70, count=1024, MPI_CHAR, dest=1, tag=0, MPI_COMM_WORLD) failed
> stderr[0]: MPIR_Bsend_isend(226).......:
> stderr[0]: MPIR_Bsend_check_active(474):
> stderr[0]: MPIR_Test_impl(65)..........:
> stderr[0]: MPIR_Request_complete(239)..: INTERNAL ERROR: unexpected value in case statement (value=0)
>
>
> I'm guessing the problem with MPIU_THREAD_GRANULARITY_PER_OBJECT is that
> there's no lock at all and the threads are all over each other... no yield
> even needed? It's just not thread safe with the static BsendBuffer.
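>
> In terms of what actually guards the bsend list, the way I read it (a
> sketch -- I could be wrong about the exact macro expansions):
>
>     /* GLOBAL granularity: the whole MPI_Bsend call runs under the one
>      * ALLFUNC lock, so the list walk is protected *except* across a
>      * yield in test/progress. */
>     MPIU_THREAD_CS_ENTER(ALLFUNC,);
>     /* ... walk BsendBuffer.active ... */
>     MPIU_THREAD_CS_EXIT(ALLFUNC,);
>
>     /* PER_OBJECT granularity: ALLFUNC is a no-op, and no per-object
>      * lock covers the file-static BsendBuffer, so two threads can walk
>      * and free the same 'active' entry with no exclusion at all. */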
>
>
> Bob Cernohous: (T/L 553) 507-253-6093
>
> BobC at us.ibm.com
> IBM Rochester, Building 030-2(C335), Department 61L
> 3605 Hwy 52 North, Rochester, MN 55901-7829
>
> > Chaos reigns within.
> > Reflect, repent, and reboot.
> > Order shall return.