[mpich-discuss] mpich hangs in MPI_Send() - multi threaded environment

Jeff Hammond jeff.science at gmail.com
Fri Jul 17 23:03:35 CDT 2015


A blocking Send to self will hang if the matching Irecv is not pre-posted.  Are you doing that?
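
Something like this is the safe pattern (a minimal illustrative sketch, not
your code; the variable names are made up):

#include <mpi.h>
#include <stdio.h>

/* Minimal sketch: a blocking send to your own rank can complete
 * because the matching receive is already posted as an Irecv. */
int main(int argc, char **argv)
{
    int rank, sendbuf, recvbuf;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    sendbuf = rank;
    MPI_Irecv(&recvbuf, 1, MPI_INT, rank, 0, MPI_COMM_WORLD, &req);  /* pre-post */
    MPI_Send(&sendbuf, 1, MPI_INT, rank, 0, MPI_COMM_WORLD);         /* now matches */
    MPI_Wait(&req, MPI_STATUS_IGNORE);

    printf("rank %d received %d from itself\n", rank, recvbuf);
    MPI_Finalize();
    return 0;
}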

Jeff

On Friday, July 17, 2015, Nenad Vukicevic <nenad at intrepid.com> wrote:

> I will, via separate mail.  But I see that I made a mistake in my
> description: the pthread does MPI_Send() to its own rank, not to another rank.
> I'll try reversing the order (send to the others first, then to yourself), as
> in the sketch below.
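>
> Roughly what I mean by reversing the order (a sketch only, not my actual
> barrier code; release_all and its arguments are made up for illustration):
>
> static void release_all(int my_rank, int nranks, int tag, MPI_Comm comm)
> {
>     int value = 1;
>     int r;
>
>     for (r = 0; r < nranks; r++) {
>         if (r == my_rank)
>             continue;                                /* everyone else first */
>         MPI_Send(&value, 1, MPI_INT, r, tag, comm);
>     }
>     /* self last; per your point this still needs the matching receive
>      * (e.g. the main thread's MPI_Recv) to already be posted, or it blocks */
>     MPI_Send(&value, 1, MPI_INT, my_rank, tag, comm);
> }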
>
> On 7/17/2015 4:26 PM, Balaji, Pavan wrote:
>
>> You should not need to pass any additional configuration options.  MPICH
>> is thread-safe by default.
>>
>> Can you send us a simple program that reproduces the error?
>>
>>    -- Pavan
>>
>>
>>
>>
>>
>> On 7/17/15, 6:16 PM, "Nenad Vukicevic" <nenad at intrepid.com> wrote:
>>
>>> I am having a problem where the system locks up inside the MPI_Send()
>>> routine.  In my test, each MPI rank has an additional pthread, and the
>>> system locks up when:
>>>
>>> - main thread does MPI_Recv from ANY rank
>>> - pthread does MPI_Send to another rank
>>>
>>> I verified with MPI_Init_thread() that I can run in an MPI_THREAD_MULTIPLE
>>> environment.
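>>>
>>> A simplified sketch of the test structure (not the actual barrier_test.c;
>>> names and the tag are illustrative):
>>>
>>> #include <mpi.h>
>>> #include <pthread.h>
>>>
>>> static void *sender(void *arg)
>>> {
>>>     int value = 1;
>>>     int dest = *(int *)arg;
>>>     /* helper pthread: blocking send to another rank */
>>>     MPI_Send(&value, 1, MPI_INT, dest, 0, MPI_COMM_WORLD);
>>>     return NULL;
>>> }
>>>
>>> int main(int argc, char **argv)
>>> {
>>>     int provided, rank, size, value, dest;
>>>     pthread_t tid;
>>>
>>>     MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>>>     if (provided < MPI_THREAD_MULTIPLE)
>>>         MPI_Abort(MPI_COMM_WORLD, 1);
>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>
>>>     dest = (rank + 1) % size;
>>>     pthread_create(&tid, NULL, sender, &dest);
>>>
>>>     /* main thread: blocking receive from any rank */
>>>     MPI_Recv(&value, 1, MPI_INT, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD,
>>>              MPI_STATUS_IGNORE);
>>>
>>>     pthread_join(tid, NULL);
>>>     MPI_Finalize();
>>>     return 0;
>>> }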
>>>
>>> The same thing happened with the MPICH shipped with Fedora 20 (3.0.4) and
>>> with one built from the 3.2b3 sources.  When building from source I provided
>>> the '--enable-threads=multiple' option.  I also tried to play with the
>>> '--enable-thread-cs' option, but got a build failure when 'per-object' was
>>> selected.
>>>
>>> Is this supposed to work?
>>>
>>> Thanks,
>>> Nenad
>>>
>>>
>>> I am attaching traces from GDB for the rank that locks up.
>>>
>>>
>>> (gdb) info thread
>>>    Id   Target Id         Frame
>>>    2    Thread 0x7ffff6a5a700 (LWP 29570) "barrier_test"
>>> 0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>> /usr/lib64/libpthread.so.0
>>> * 1    Thread 0x7ffff7a64b80 (LWP 29568) "barrier_test"
>>> 0x00007ffff7c58717 in MPIU_Thread_CS_yield_lockname_recursive_impl_ (
>>>      lockname=0x7ffff7cdc8b1 "global_mutex", mutex=<optimized out>,
>>> kind=MPIU_Nest_global_mutex) at
>>> ../src/src/include/mpiimplthreadpost.h:190
>>> (gdb) where
>>> #0  0x00007ffff7c58717 in MPIU_Thread_CS_yield_lockname_recursive_impl_
>>> (lockname=0x7ffff7cdc8b1 "global_mutex", mutex=<optimized out>,
>>>      kind=MPIU_Nest_global_mutex) at
>>> ../src/src/include/mpiimplthreadpost.h:190
>>> #1  0x00007ffff7c5db42 in MPIDI_CH3I_Progress
>>> (progress_state=progress_state at entry=0x7fffffffd2c0,
>>> is_blocking=is_blocking at entry=1)
>>>      at ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:507
>>> #2  0x00007ffff7b5e795 in PMPI_Recv (buf=0x7fffffffd61c, count=1,
>>> datatype=1275069445, source=-2, tag=299, comm=1140850688,
>>> status=0x7fffffffd620)
>>>      at ../src/src/mpi/pt2pt/recv.c:157
>>> #3  0x0000000000401732 in receive_int () at comm.c:52
>>> #4  0x0000000000400bf2 in main (argc=1, argv=0x7fffffffd758) at
>>> barrier_test.c:39
>>> (gdb) thread 2
>>> [Switching to thread 2 (Thread 0x7ffff6a5a700 (LWP 29570))]
>>> #0  0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>> /usr/lib64/libpthread.so.0
>>> (gdb) where
>>> #0  0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2 () from
>>> /usr/lib64/libpthread.so.0
>>> #1  0x00007ffff7c5d614 in MPIDI_CH3I_Progress_delay
>>> (completion_count=<optimized out>)
>>>      at ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:566
>>> #2  MPIDI_CH3I_Progress
>>> (progress_state=progress_state at entry=0x7ffff6a59710,
>>> is_blocking=is_blocking at entry=1)
>>>      at ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:347
>>> #3  0x00007ffff7b632ec in PMPI_Send (buf=0x7ffff6a5985c, count=1,
>>> datatype=1275069445, dest=0, tag=199, comm=1140850688)
>>>      at ../src/src/mpi/pt2pt/send.c:145
>>> #4  0x0000000000400e42 in barrier_thread_release (id=0) at barrier.c:115
>>> #5  0x0000000000401098 in barrier_helper (arg=0x0) at barrier.c:186
>>> #6  0x0000003ef0407ee5 in start_thread () from /usr/lib64/libpthread.so.0
>>> #7  0x0000003eef8f4d1d in clone () from /usr/lib64/libc.so.6


-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

