[mpich-discuss] Fault tolerance of an MPI cluster after one node dies

YANG Fan iddmbr at gmail.com
Wed Dec 10 08:31:52 CST 2014


Hi,

Is it possible for an MPI job running on a distributed cluster to continue
working if one node dies? I'm not sure whether MPICH provides such functionality.

It seems that MPI_Comm_create requires all processes in the superset
communicator to be alive, and setting an errhandler together with
--disable-auto-cleanup does not avoid the issue either, since one process
(the dead one) cannot call MPI_Finalize().

Thanks in advance!

Best Regards,
Fan