[mpich-discuss] Mpich + Siesta erro

Pavan Balaji balaji at mcs.anl.gov
Fri Dec 6 19:02:09 CST 2013


Also, there have been several fixes for this in the recently released 3.1rc2.  I’d recommend trying that out instead of mpich-3.0.4.

Regards,

  — Pavan

On Dec 6, 2013, at 7:01 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:

> Hi Julio,
> 
> There are two steps needed for this:
> 
> 1. You need to tell your MPI application to return errors instead of aborting.
> 
> 2. Tell the process manager to not clean up your remaining processes when one of the processes dies.
> 
> Details on both of these steps are listed in the "Fault Tolerance" section of the MPICH README.  Please try it out and let us know how it goes.
> 
>  — Pavan
> 
> On Dec 6, 2013, at 6:54 PM, Julio Henrique <juliohenrique at msn.com> wrote:
> 
>> 
>> I am using mpich-3.0.4 on a cluster with 7 nodes, running the latest version of Siesta. My problem is that when one node goes down, Siesta and MPICH stop running and give an error.
>> How can I make Siesta and MPICH continue to run when a node goes down?
>> Thanks.
>> Julio.
>> 
>> 
>> 
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
> 
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji