[mpich-discuss] Fwd: ckpoint-num error

john donald johnd9886 at gmail.com
Mon Jun 10 17:17:09 CDT 2013


I raised the checkpoint interval to 20 seconds but got the same result.
Sorry, I am new to checkpoint/restart.
I am trying this first on a single multicore PC.
What should it look like if the restart succeeds? Should the application run
in the same terminal in which I run the restart command?
My test app has 5000 iterations, and a checkpoint is taken at iteration 300,
for example. If I choose to restart from this checkpoint file, should it
resume near iteration 300?
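
For choosing which checkpoint to restart from, and for checking whether the
checkpoint files contain anything at all, a small shell sketch may help. It
assumes the files are named context-num<N>-<pgid>-<rank>, as the listing
quoted below suggests, and that they sit in the directory passed as
-ckpoint-prefix. The function names list_ckpt_nums and list_empty_ckpts are
made up for illustration and are not part of MPICH or BLCR.

```shell
#!/bin/sh
# Hypothetical helpers, not part of MPICH or BLCR.
# Assumption: checkpoint files are named context-num<N>-<pgid>-<rank>.

# Print the distinct checkpoint numbers <N> found in directory $1.
list_ckpt_nums() {
    dir=$1
    for f in "$dir"/context-num*; do
        [ -e "$f" ] || continue      # skip the unexpanded glob if nothing matches
        name=${f##*/}                # drop the directory part
        rest=${name#context-num}     # e.g. "1-0-0"
        echo "${rest%%-*}"           # keep only the leading number, e.g. "1"
    done | sort -n -u
}

# Print the checkpoint files in directory $1 that are empty (zero bytes).
list_empty_ckpts() {
    dir=$1
    for f in "$dir"/context-num*; do
        [ -e "$f" ] || continue
        [ -s "$f" ] || echo "$f"     # -s is true only for non-empty files
    done
}
```

With the two files from this thread, list_ckpt_nums /home/john/ckpts/app.ckpoint
would print 0 and 1, one per line. Under the naming assumption above, that
leading number is presumably the value that -ckpoint-num expects, rather than
a string such as 0-0-0.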


2013/6/6 Wesley Bland <wbland at mcs.anl.gov>

> Is there actually anything in those checkpoints? With a checkpoint
> happening every 4 seconds you may be overdoing it.
>
> Wesley
>
> On Jun 5, 2013, at 2:14 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
>
> > I don't know, but see if anything on this page helps:
> > http://wiki.mpich.org/mpich/index.php/Checkpointing
> >
> > On Jun 5, 2013, at 4:09 PM, john donald wrote:
> >
> >>
> >>
> >> ---------- Forwarded message ----------
> >> From: john donald <johnd9886 at gmail.com>
> >> Date: 2013/6/3
> >> Subject: ckpoint-num error
> >> To: mpich-discuss at mcs.anl.gov
> >>
> >>
> >> I used mpiexec with checkpointing and it created two checkpoint files:
> >>
> >> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -ckpoint-interval 4 -n 4 /home/john/app/md
> >>
> >> context-num0-0-0
> >> context-num1-0-0
> >>
> >>
> >> I am trying to restart from a checkpoint:
> >> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 1
> >>
> >> but nothing happens; it just hangs.
> >> I also tried:
> >> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 0-0-0
> >> which also hangs.
> >>
> >> _______________________________________________
> >> discuss mailing list     discuss at mpich.org
> >> To manage subscription options or unsubscribe:
> >> https://lists.mpich.org/mailman/listinfo/discuss