[mpich-discuss] Mpich + Siesta erro
Julio Henrique
juliohenrique at msn.com
Fri Dec 6 19:17:05 CST 2013
Ok Jeff. Thank you.
Julio.
CC: discuss at mpich.org
From: jeff.science at gmail.com
Date: Fri, 6 Dec 2013 19:09:36 -0600
To: discuss at mpich.org
Subject: Re: [mpich-discuss] Mpich + Siesta erro
I bet you (1) Siesta doesn't check MPI return codes and (2) Siesta has no way to handle node failure. I bet it can't even handle malloc returning NULL.
If a node fails more than once a month, the hardware is bad and you should buy new stuff.
Jeff
On Dec 6, 2013, at 7:04 PM, Julio Henrique <juliohenrique at msn.com> wrote:
Okay Pavan, I'll try that and report back with the result.
Thanks.
Julio.
> From: balaji at mcs.anl.gov
> Date: Fri, 6 Dec 2013 19:01:08 -0600
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] Mpich + Siesta erro
>
> Hi Julio,
>
> There are two steps needed for this:
>
> 1. You need to tell your MPI application to return errors instead of aborting.
>
> 2. Tell the process manager to not clean up your remaining processes when one of the processes dies.
>
> Details on both of these steps are listed in the "Fault Tolerance" section of the MPICH README. Please try it out and let us know how it goes.
>
> — Pavan
>
> On Dec 6, 2013, at 6:54 PM, Julio Henrique <juliohenrique at msn.com> wrote:
>
> >
> > I am using mpich-3.0.4 on a cluster with 7 nodes, running the latest version of Siesta. My problem is that when one node goes down, Siesta and MPICH stop running and give an error.
> > How can I keep Siesta and MPICH running when a node fails?
> > Thanks.
> > Julio.
> >
> >
> >
> > _______________________________________________
> > discuss mailing list discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>