[mpich-discuss] mpich hangs in MPI_Send() - multi threaded environment

Nenad Vukicevic nenad at intrepid.com
Mon Jul 20 15:41:15 CDT 2015


Yes. The send did not have a matching receive for that tag, because the
main thread was waiting on something else.

Switching to Bsend should help, then?
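
What I have in mind for the Bsend route is roughly the following (just a
sketch; the buffer sizing, tag, and function names are made up, not the
actual barrier code):

#include <mpi.h>
#include <stdlib.h>

static void *bsend_buf;
static int   bsend_bufsize;

/* Called once after MPI_Init_thread: attach a buffer for MPI_Bsend. */
void attach_bsend_buffer(void)
{
    bsend_bufsize = MPI_BSEND_OVERHEAD + sizeof(int);
    bsend_buf = malloc(bsend_bufsize);
    MPI_Buffer_attach(bsend_buf, bsend_bufsize);
}

/* In the helper pthread: buffered send to our own rank.  MPI_Bsend
   copies the message into the attached buffer and returns right away,
   so it does not matter that the matching MPI_Recv in the main thread
   has not been posted yet.  Tag 199 is illustrative only. */
void release_self(int my_rank)
{
    int msg = 1;
    MPI_Bsend(&msg, 1, MPI_INT, my_rank, 199, MPI_COMM_WORLD);
}

/* Called once before MPI_Finalize: detach waits for any pending
   buffered messages to be delivered, then the buffer can be freed. */
void detach_bsend_buffer(void)
{
    MPI_Buffer_detach(&bsend_buf, &bsend_bufsize);
    free(bsend_buf);
}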

On 7/17/2015 9:03 PM, Jeff Hammond wrote:
> A blocking Send to self will hang if an Irecv is not pre-posted. Are you doing that?
>
> Jeff
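
For reference, the pattern Jeff describes would look roughly like this
(a sketch only; the tag and helper name are illustrative, not from
barrier_test.c):

#include <mpi.h>

/* Sketch: the receive for the self-message is pre-posted before the
   blocking MPI_Send to our own rank, and waited on afterwards.  With
   the request already outstanding, the send has something to match. */
void self_notify_with_preposted_recv(int my_rank)
{
    int msg_out = 1, msg_in = 0;
    MPI_Request req;

    /* post the receive first (tag 199 is illustrative) */
    MPI_Irecv(&msg_in, 1, MPI_INT, my_rank, 199, MPI_COMM_WORLD, &req);

    /* the blocking send to self now has a matching receive pending */
    MPI_Send(&msg_out, 1, MPI_INT, my_rank, 199, MPI_COMM_WORLD);

    MPI_Wait(&req, MPI_STATUS_IGNORE);
}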
>
> On Friday, July 17, 2015, Nenad Vukicevic <nenad at intrepid.com> wrote:
>
>     I will, via separate mail.  But I see that I made a mistake in my
>     description: the pthread does MPI_Send() to its own rank, not to
>     another rank.
>     I'll try reversing the order (send to the other ranks first, then
>     to yourself).
>
>     On 7/17/2015 4:26 PM, Balaji, Pavan wrote:
>
>         You should not need to pass any additional configuration
>         options.  MPICH is thread-safe by default.
>
>         Can you send us a simple program that reproduces the error?
>
>            -- Pavan
>
>
>
>
>
>         On 7/17/15, 6:16 PM, "Nenad Vukicevic" <nenad at intrepid.com> wrote:
>
>             I am having a problem where the system locks up inside the
>             MPI_Send() routine.  In my test, each MPI rank has an
>             additional pthread, and the system locks up when:
>
>             - the main thread does MPI_Recv from any rank (MPI_ANY_SOURCE)
>             - the pthread does MPI_Send to another rank
>
>             I verified with MPI_Init_thread() that I can run in an
>             MPI_THREAD_MULTIPLE environment.
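
(The check I use is along these lines -- a stripped-down sketch, not the
actual barrier_test.c:)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int provided;

    /* ask for full multi-threading and verify what the library granted */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    if (provided < MPI_THREAD_MULTIPLE) {
        fprintf(stderr, "got thread level %d, need MPI_THREAD_MULTIPLE\n",
                provided);
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* ... main thread: MPI_Recv(..., MPI_ANY_SOURCE, ...);
       helper pthread: MPI_Send(...); ... */

    MPI_Finalize();
    return 0;
}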
>
>             The same thing happened with the MPICH that ships with
>             Fedora 20 (3.0.4) and with one built from the 3.2b3
>             sources.  When building from sources I passed the
>             '--enable-threads=multiple' option.  I also tried to play
>             with the '--enable-thread-cs' option, but got a build
>             failure when 'per-object' was selected.
>
>             Is this supposed to work?
>
>             Thanks,
>             Nenad
>
>
>             I am attaching traces from GDB for the rank that locks up.
>
>
>             (gdb) info thread
>                Id   Target Id         Frame
>                2    Thread 0x7ffff6a5a700 (LWP 29570) "barrier_test"
>             0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2 () from
>             /usr/lib64/libpthread.so.0
>             * 1    Thread 0x7ffff7a64b80 (LWP 29568) "barrier_test"
>             0x00007ffff7c58717 in
>             MPIU_Thread_CS_yield_lockname_recursive_impl_ (
>                  lockname=0x7ffff7cdc8b1 "global_mutex",
>             mutex=<optimized out>,
>             kind=MPIU_Nest_global_mutex) at
>             ../src/src/include/mpiimplthreadpost.h:190
>             (gdb) where
>             #0  0x00007ffff7c58717 in
>             MPIU_Thread_CS_yield_lockname_recursive_impl_
>             (lockname=0x7ffff7cdc8b1 "global_mutex", mutex=<optimized
>             out>,
>                  kind=MPIU_Nest_global_mutex) at
>             ../src/src/include/mpiimplthreadpost.h:190
>             #1  0x00007ffff7c5db42 in MPIDI_CH3I_Progress
>             (progress_state=progress_state@entry=0x7fffffffd2c0,
>             is_blocking=is_blocking@entry=1)
>                  at
>             ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:507
>             #2  0x00007ffff7b5e795 in PMPI_Recv (buf=0x7fffffffd61c,
>             count=1,
>             datatype=1275069445, source=-2, tag=299, comm=1140850688,
>             status=0x7fffffffd620)
>                  at ../src/src/mpi/pt2pt/recv.c:157
>             #3  0x0000000000401732 in receive_int () at comm.c:52
>             #4  0x0000000000400bf2 in main (argc=1,
>             argv=0x7fffffffd758) at
>             barrier_test.c:39
>             (gdb) thread 2
>             [Switching to thread 2 (Thread 0x7ffff6a5a700 (LWP 29570))]
>             #0  0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2
>             () from
>             /usr/lib64/libpthread.so.0
>             (gdb) where
>             #0  0x0000003ef040bca0 in pthread_cond_wait@@GLIBC_2.3.2
>             () from
>             /usr/lib64/libpthread.so.0
>             #1  0x00007ffff7c5d614 in MPIDI_CH3I_Progress_delay
>             (completion_count=<optimized out>)
>                  at
>             ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:566
>             #2  MPIDI_CH3I_Progress
>             (progress_state=progress_state@entry=0x7ffff6a59710,
>             is_blocking=is_blocking@entry=1)
>                  at
>             ../src/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:347
>             #3  0x00007ffff7b632ec in PMPI_Send (buf=0x7ffff6a5985c,
>             count=1,
>             datatype=1275069445, dest=0, tag=199, comm=1140850688)
>                  at ../src/src/mpi/pt2pt/send.c:145
>             #4  0x0000000000400e42 in barrier_thread_release (id=0) at
>             barrier.c:115
>             #5  0x0000000000401098 in barrier_helper (arg=0x0) at
>             barrier.c:186
>             #6  0x0000003ef0407ee5 in start_thread () from
>             /usr/lib64/libpthread.so.0
>             #7  0x0000003eef8f4d1d in clone () from /usr/lib64/libc.so.6
>
>
>
> -- 
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

