[mpich-discuss] MPI_Finalize hangs in dynamic connection in case of failed process

Roy, Hirak Hirak_Roy at mentor.com
Thu Feb 19 05:15:16 CST 2015


Hi All,

I am using MPICH with the sock channel.
I set up the processes dynamically with MPI_Comm_connect/MPI_Comm_accept. It is a master-slave architecture in which the master accepts the connections from the slaves.

Now, if one of the processes dies (or gets killed), I can still recover from this without using a checkpoint/restore method.
For the failed process, I do not call MPI_Comm_disconnect in the master, because that call hangs and never completes.
As a result, MPI_Finalize in the master hangs and never completes.
Is there a workaround to force MPI_Finalize or MPI_Comm_disconnect to complete?
I tried calling MPI_Comm_free on the failed connection, but that does not resolve the hang in MPI_Finalize.

Thanks,
Hirak