[mpich-discuss] blcr problem
Wesley Bland
wbland at mcs.anl.gov
Mon Jul 1 13:15:18 CDT 2013
There has been a lot of turnover on the project recently, so the previous experts on checkpoint/restart in MPICH are gone. From what I can tell, checkpoint/restart may currently be broken. This ticket (https://trac.mpich.org/projects/mpich/ticket/1144) is the only documentation on the subject that I can find, and I think the assumption is that everything has bit-rotted and will need to be updated before it works again.
Wesley
On Jul 1, 2013, at 10:19 AM, basma a.azeem <basmaabdelazeem at hotmail.com> wrote:
> I created a checkpoint file of the NPB integer sort benchmark after 60 seconds of running the application. The checkpoint file size is 121.6 MB (121,633,453 bytes).
>
> I have BLCR for checkpoint/restart.
> When I try to restart from the checkpoint file, nothing happens; it just hangs.
> What did I do wrong?
> This is the command I used:
>
>
> basma at basma-Satellite-A500:~$ mpiexec -ckpointlib blcr \
> > -ckpoint-prefix /home/basma/ckpts/app.ckpoint \
> > -ckpoint-num 0 -n 4