[mpich-discuss] ckpoint-num error

Wesley Bland wbland at mcs.anl.gov
Tue Jun 11 08:20:58 CDT 2013


Did you check if there's actually anything in the checkpoint files? If they're empty, that probably means that you're checkpointing too frequently.
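
One quick way to check is sketched below. The function name find_empty_ckpts is made up for illustration, and the directory you pass it is an assumption: use whatever location your -ckpoint-prefix actually wrote to. It only reports zero-byte files, so it says nothing about whether a non-empty checkpoint is valid.

```shell
# Hypothetical helper (not part of MPICH or BLCR): print every zero-byte
# regular file found under the given directory, searching recursively.
find_empty_ckpts() {
    dir="$1"
    # -type f : regular files only
    # -size 0 : POSIX test that matches files with no data in them
    find "$dir" -type f -size 0
}
```

For example, `find_empty_ckpts /home/john/ckpts` should print nothing if every checkpoint file has some content. Running `ls -l` on the same directory shows the actual sizes.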

On Jun 10, 2013, at 5:17 PM, john donald <johnd9886 at gmail.com> wrote:

> I raised the checkpoint interval to 20 seconds but got the same results.
> Sorry, I am new to checkpoint/restart.
> I am trying this initially on a single multicore PC.
> What should it look like if the restart succeeds? Should the application run in the same terminal in which I run the restart command?
> My test app has 5000 iterations. If a checkpoint is taken at iteration 300, for example, and I choose to restart from that checkpoint file, should it resume near iteration 300?
> 
> 
> 2013/6/6 Wesley Bland <wbland at mcs.anl.gov>
> Is there actually anything in those checkpoints? With a checkpoint happening every 4 seconds you may be overdoing it.
> 
> Wesley
> 
> On Jun 5, 2013, at 2:14 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> 
> > I don't know, but see if anything on this page helps:
> > http://wiki.mpich.org/mpich/index.php/Checkpointing
> >
> > On Jun 5, 2013, at 4:09 PM, john donald wrote:
> >
> >>
> >>
> >> ---------- Forwarded message ----------
> >> From: john donald <johnd9886 at gmail.com>
> >> Date: 2013/6/3
> >> Subject: ckpoint-num error
> >> To: mpich-discuss at mcs.anl.gov
> >>
> >>
> >> I ran mpiexec with checkpointing enabled, and it created two checkpoint files:
> >>
> >> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint  -ckpoint-interval 4  -n 4  /home/john/app/md
> >>
> >> context-num0-0-0
> >> context-num1-0-0
> >>
> >>
> >> I am trying to restart from a checkpoint:
> >> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 1
> >>
> >> but nothing happens; it just hangs.
> >> I also tried:
> >> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 0-0-0
> >> which also hangs.
> >>
> >> _______________________________________________
> >> discuss mailing list     discuss at mpich.org
> >> To manage subscription options or unsubscribe:
> >> https://lists.mpich.org/mailman/listinfo/discuss
> >
> 
