[mpich-discuss] MPI_Finalize hangs in dynamic connection in case of failed process
Roy, Hirak
Hirak_Roy at mentor.com
Thu Feb 19 05:15:16 CST 2015
Hi All,
I am using MPICH with the sock connection.
I also set up processes using the dynamic connection method (MPI_Comm_connect/MPI_Comm_accept). It's a master-slave architecture where the master accepts the connections from the slaves.
Now if one of the processes dies (or gets killed), I can still recover from this (without using a checkpoint/restore method).
For that particular process, I do not call MPI_Comm_disconnect in the master (it hangs and does not complete).
As a result, my MPI_Finalize in master hangs and does not complete.
Do you have a workaround to forcefully complete MPI_Finalize or MPI_Comm_disconnect?
I tried MPI_Comm_free on the failed connection. However, it does not solve the hang in finalize.
Thanks,
Hirak