[mpich-discuss] Fault tolerance of an MPI cluster after one node dies
YANG Fan
iddmbr at gmail.com
Wed Dec 10 08:31:52 CST 2014
Hi,
Is it possible for a distributed MPI job to keep running after one of its
nodes dies? I'm not sure whether MPICH provides this functionality.
It seems that MPI_Comm_create requires all processes in the superset
communicator to be alive. Using an errhandler with --disable-auto-cleanup
does not avoid the issue either, as one process cannot call MPI_Finalize().
Thanks in advance!
Best Regards,
Fan