[mpich-discuss] Status of checkpointing mechanisms with Slurm + MPICH + BLCR

Manuel Rodríguez Pascual manuel.rodriguez.pascual at gmail.com
Thu Oct 23 10:18:09 CDT 2014


Good afternoon all,

I am new to MPICH. I am trying to set up a cluster with MPICH that is
able to checkpoint parallel tasks.

My original idea was a software stack based on Slurm 14.03.8 + MPICH 3.1.3
+ BLCR 0.8.5. They are supposed to integrate well with each other, and the
configuration process has been quite smooth so far.

I have found, however, that checkpointing MPICH tasks does not work. At
first I thought it was my fault (a configuration issue or similar), since
the MPICH wiki states that BLCR integration is possible:
https://wiki.mpich.org/mpich/index.php/Checkpointing

However, while looking for a solution I found this thread on this same
mailing list:
http://lists.mpich.org/pipermail/discuss/2014-April/002498.html

which says: "BLCR checkpointing hasn't worked for a few versions now. It's
something we're working to fix in a future version".

My questions are, then:

- Is it possible right now to checkpoint MPICH with BLCR?

- If not, is there any working checkpoint mechanism that you can suggest?

- If not, are you aware of a previous MPICH version where BLCR does work?
Are there any drawbacks to using it while the new one is being fixed?
(Is the new one being fixed?)

Thanks for your attention. Best regards,



Manuel


-- 
Dr. Manuel Rodríguez-Pascual
skype: manuel.rodriguez.pascual
phone: (+34) 913466173 // (+34) 679925108

CIEMAT-Moncloa
Edificio 22, desp. 1.25
Avenida Complutense, 40
28040- MADRID
SPAIN