[mpich-discuss] Does MPICH3 support fault tolerance and dynamically adding nodes in the cluster

Guo, Yanfei yguo at anl.gov
Sun Feb 28 12:48:47 CST 2016


Hi Tom,

You can use MPI_Comm_spawn to create new processes at run time. For fault tolerance, MPICH supports the User-Level Failure Mitigation (ULFM) extension to the MPI standard. In MPICH it is exposed as the functions MPIX_Comm_agree, MPIX_Comm_failure_ack, MPIX_Comm_failure_get_acked, MPIX_Comm_revoke, and MPIX_Comm_shrink.

Yanfei Guo
Postdoctoral Researcher
MCS Division, ANL

On 2/27/16, 9:20 PM, "杏花雨闲客" <1450306854 at qq.com> wrote:

>Hello,
>
>There is a distributed system based on MPICH that needs to add nodes dynamically to increase the computing power of the cluster, and it also needs to support fault tolerance. How can MPICH3 be used to achieve these goals? Can anyone give me some advice? Thanks!
>
>Tom
>2016-2-28
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss