[mpich-discuss] Spawned process hanging in MPI_Finalize
Mccall, Kurt E. (MSFC-EV41)
kurt.e.mccall at nasa.gov
Tue Mar 2 18:54:12 CST 2021
I have a parent process that creates a child via MPI_Comm_spawn(). When the child decides it has to exit, it hangs in MPI_Finalize(). The same happens if the child calls MPI_Comm_disconnect() before MPI_Finalize().
Here is the stack trace in the child:
(gdb) where
#0 0x00007fc6f2fedde0 in __poll_nocancel () from /usr/lib/gcc/x86_64-redhat-linux/4.8.5/../../../../lib64/libc.so.6
#1 0x00007fc6f4dc840e in MPID_nem_tcp_connpoll () at src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c:1819
#2 0x00007fc6f4db857e in MPID_nem_network_poll () at src/mpid/ch3/channels/nemesis/src/mpid_nem_network_poll.c:16
#3 0x00007fc6f4dafc43 in MPIDI_CH3I_Progress () at src/mpid/ch3/channels/nemesis/src/ch3_progress.c:1019
#4 0x00007fc6f4d5094d in MPIDI_CH3U_VC_WaitForClose () at src/mpid/ch3/src/ch3u_handle_connection.c:383
#5 0x00007fc6f4d94efa in MPID_Finalize () at src/mpid/ch3/src/mpid_finalize.c:110
#6 0x00007fc6f4c432ca in PMPI_Finalize () at src/mpi/init/finalize.c:260
#7 0x0000000000408a85 in needles::MpiWorker::finalize () at src/MpiWorker.cpp:470
Maybe I have a communication that hasn't completed, or the child is waiting for the parent to call MPI_Finalize. I believe that you (Ken, Hui) told me that it shouldn't do the latter.
Is there a way for the child to cleanly exit without hanging in MPI_Finalize? I tried calling MPI_Cancel() in the child on the only possible communication request that I knew of, but it didn't help.
It just occurred to me that I haven't tried calling MPI_Cancel on the requests in the parent...
Thanks,
Kurt