[mpich-discuss] Mpich + Siesta erro
Pavan Balaji
balaji at mcs.anl.gov
Fri Dec 6 19:02:09 CST 2013
Also, there have been several fixes for this in the recently released 3.1rc2. I’d recommend trying that out instead of mpich-3.0.4.
Regards,
— Pavan
On Dec 6, 2013, at 7:01 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> Hi Julio,
>
> There are two steps needed for this:
>
> 1. You need to tell your MPI application to return errors instead of aborting.
>
> 2. Tell the process manager to not clean up your remaining processes when one of the processes dies.
>
> Details on both these steps are listed in the "Fault Tolerance" section of the MPICH README. Please try it out and let us know how it goes.
>
> — Pavan
>
> On Dec 6, 2013, at 6:54 PM, Julio Henrique <juliohenrique at msn.com> wrote:
>
>>
>> I am using mpich-3.0.4 on a cluster with 7 nodes, running the latest version of siesta. My problem is that when one node goes down, siesta and mpich stop running and give an error.
>> How can I keep siesta and mpich running when a node goes down?
>> Thanks.
>> Julio.
>>
>>
>>
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji