[mpich-discuss] mpich hangs in MPI_Send() - multi threaded environment

Balaji, Pavan balaji at anl.gov
Fri Jul 17 18:26:43 CDT 2015


You should not need to pass any additional configuration options.  MPICH is thread-safe by default.

Can you send us a simple program that reproduces the error?
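
Something along the lines of the sketch below would be enough: two ranks,
one helper pthread per rank, the main thread blocked in MPI_Recv on
MPI_ANY_SOURCE while the helper calls MPI_Send.  (This is only a guess at
the structure of your test; the tag value, the destination rank, and the
helper() function are placeholders, not taken from your code.)

#include <mpi.h>
#include <pthread.h>
#include <stdio.h>

#define TAG 99

static int rank, size;

/* Helper thread: send one int to the next rank. */
static void *helper(void *arg)
{
    int msg = rank;
    MPI_Send(&msg, 1, MPI_INT, (rank + 1) % size, TAG, MPI_COMM_WORLD);
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, msg;
    pthread_t tid;

    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "MPI_THREAD_MULTIPLE not available\n");
        MPI_Abort(MPI_COMM_WORLD, 1);
    }
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    pthread_create(&tid, NULL, helper, NULL);

    /* Main thread: receive from any rank, as in the hanging test. */
    MPI_Recv(&msg, 1, MPI_INT, MPI_ANY_SOURCE, TAG, MPI_COMM_WORLD,
             MPI_STATUS_IGNORE);

    pthread_join(tid, NULL);
    MPI_Finalize();
    return 0;
}

Built against the same MPICH and run with something like "mpiexec -n 2
./a.out", that should exercise the same Send/Recv combination that shows
up in your backtraces.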

  -- Pavan





On 7/17/15, 6:16 PM, "Nenad Vukicevic" <nenad at intrepid.com> wrote:

>I am having a problem where the system locks up inside the MPI_Send()
>routine.  In my test, each MPI rank has an additional pthread, and the
>system locks up when:
>
>- main thread does MPI_Recv from ANY rank
>- pthread does MPI_Send to another rank
>
>I verified with MPI_Init_thread() that I can run in an MPI_THREAD_MULTIPLE
>environment.
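>
>The check is essentially the following (just a sketch, not the exact
>code from my test):
>
>  int provided;
>  MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
>  if (provided < MPI_THREAD_MULTIPLE)
>      MPI_Abort(MPI_COMM_WORLD, 1);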
>
>The same thing happened with the MPICH on Fedora 20 (3.0.4) and with the
>one built from the 3.2b3 sources.  When building from sources I passed
>the '--enable-threads=multiple' option.  I also tried to play with the
>'--enable-thread-cs' option, but got a build failure when 'per-object'
>was selected.
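>
>For reference, the 3.2b3 build was configured roughly like this (the
>install prefix here is just an example):
>
>  ./configure --enable-threads=multiple --prefix=/opt/mpich-3.2b3
>  make && make install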
>
>Is this supposed to work?
>
>Thanks,
>Nenad
>
>
>I am attaching traces from GDB for the rank that locks up.
>
>
>(gdb) info thread
>   Id   Target Id         Frame
>   2    Thread 0x7ffff6a5a700 (LWP 29570) "barrier_test" 
>0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2 () from 
>/usr/lib64/libpthread.so.0
>* 1    Thread 0x7ffff7a64b80 (LWP 29568) "barrier_test" 
>0x00007ffff7c58717 in MPIU_Thread_CS_yield_lockname_recursive_impl_ (
>     lockname=0x7ffff7cdc8b1 "global_mutex", mutex=<optimized out>, 
>kind=MPIU_Nest_global_mutex) at ../src/src/include/mpiimplthreadpost.h:190
>(gdb) where
>#0  0x00007ffff7c58717 in MPIU_Thread_CS_yield_lockname_recursive_impl_ 
>(lockname=0x7ffff7cdc8b1 "global_mutex", mutex=<optimized out>,
>     kind=MPIU_Nest_global_mutex) at 
>../src/src/include/mpiimplthreadpost.h:190
>#1  0x00007ffff7c5db42 in MPIDI_CH3I_Progress 
>(progress_state=progress_state@entry=0x7fffffffd2c0, 
>is_blocking=is_blocking@entry=1)
>     at ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:507
>#2  0x00007ffff7b5e795 in PMPI_Recv (buf=0x7fffffffd61c, count=1, 
>datatype=1275069445, source=-2, tag=299, comm=1140850688, 
>status=0x7fffffffd620)
>     at ../src/src/mpi/pt2pt/recv.c:157
>#3  0x0000000000401732 in receive_int () at comm.c:52
>#4  0x0000000000400bf2 in main (argc=1, argv=0x7fffffffd758) at 
>barrier_test.c:39
>(gdb) thread 2
>[Switching to thread 2 (Thread 0x7ffff6a5a700 (LWP 29570))]
>#0  0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2 () from 
>/usr/lib64/libpthread.so.0
>(gdb) where
>#0  0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2 () from 
>/usr/lib64/libpthread.so.0
>#1  0x00007ffff7c5d614 in MPIDI_CH3I_Progress_delay 
>(completion_count=<optimized out>)
>     at ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:566
>#2  MPIDI_CH3I_Progress 
>(progress_state=progress_state@entry=0x7ffff6a59710, 
>is_blocking=is_blocking@entry=1)
>     at ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:347
>#3  0x00007ffff7b632ec in PMPI_Send (buf=0x7ffff6a5985c, count=1, 
>datatype=1275069445, dest=0, tag=199, comm=1140850688)
>     at ../src/src/mpi/pt2pt/send.c:145
>#4  0x0000000000400e42 in barrier_thread_release (id=0) at barrier.c:115
>#5  0x0000000000401098 in barrier_helper (arg=0x0) at barrier.c:186
>#6  0x0000003ef0407ee5 in start_thread () from /usr/lib64/libpthread.so.0
>#7  0x0000003eef8f4d1d in clone () from /usr/lib64/libc.so.6
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

