[mpich-discuss] blcr problem

Wesley Bland wbland at mcs.anl.gov
Mon Jul 1 13:15:18 CDT 2013


There has been a lot of turnover on the project recently, so the previous checkpoint/restart experts for MPICH are gone. From what I can tell, checkpoint/restart may currently be broken. This ticket (https://trac.mpich.org/projects/mpich/ticket/1144) is the only documentation on the subject that I can find, and I think the assumption is that the code has bit-rotted and will need to be updated before it works again.

Wesley

On Jul 1, 2013, at 10:19 AM, basma a.azeem <basmaabdelazeem at hotmail.com> wrote:

> I created a checkpoint file of the NPB integer sort benchmark after 60 seconds of running the application. The checkpoint file size is 121.6 MB (121,633,453 bytes).
> 
> I am using BLCR for checkpoint/restart.
> When I try to restart from that checkpoint file, nothing happens; it just hangs.
> What did I do wrong?
> This is the command I used:
> 
> 
> basma at basma-Satellite-A500:~$ mpiexec -ckpointlib blcr \
> > -ckpoint-prefix /home/basma/ckpts/app.ckpoint \
> > -ckpoint-num 0 -n 4
> 
