<div dir="rtl"><div dir="ltr">The file size is 121.6 MB after I raised the interval to 20 seconds.<br></div><div dir="ltr">My test application is an MPI/C integer sort program with 5000 iterations.<br></div>
<div dir="ltr">Sorry for the trivial question, but how can I tell whether the checkpoint file is empty or not?<br></div><div dir="ltr">How can I open it?<br></div><div dir="ltr"><br></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">
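A checkpoint image written through BLCR is a binary snapshot of process state, so it is not something to open in a text editor; checking its size is the practical way to see whether anything was written. The sketch below assumes the images sit directly under the `-ckpoint-prefix` path (here `/home/john/ckpts/app.ckpoint`) with names like `context-num0-0-0`, as in the listing later in this thread. The function name `ckpt_status` is hypothetical.

```shell
#!/bin/sh
# Report the size of each checkpoint image and flag empty ones.
# Assumption: images are named context-num* and live directly in the
# directory passed as $1 (e.g. the -ckpoint-prefix path used with mpiexec).
ckpt_status() {
    dir=$1
    found=0
    for f in "$dir"/context-num*; do
        [ -e "$f" ] || continue              # the glob matched nothing
        found=1
        size=$(wc -c < "$f" | tr -d ' ')     # size in bytes, portable across wc variants
        if [ -s "$f" ]; then                 # -s: file exists and is non-empty
            echo "$f: $size bytes"
        else
            echo "$f: EMPTY"
        fi
    done
    [ "$found" -eq 1 ] || echo "no checkpoint images found in $dir"
}
```

Running `ckpt_status /home/john/ckpts/app.ckpoint` lists each image with its byte count; a plain `ls -l` on the same directory gives the same information by hand.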
<div dir="ltr">2013/6/11 Wesley Bland <span dir="ltr"><<a href="mailto:wbland@mcs.anl.gov" target="_blank">wbland@mcs.anl.gov</a>></span></div><blockquote class="gmail_quote" style="margin:0 .8ex;border-left:1px #ccc solid;border-right:1px #ccc solid;padding-left:1ex;padding-right:1ex">
<div style="word-wrap:break-word">Did you check if there's actually anything in the checkpoint files? If they're empty, that probably means that you're checkpointing too frequently.<div><div class="h5"><div><br>
<div><div>On Jun 10, 2013, at 5:17 PM, john donald <<a href="mailto:johnd9886@gmail.com" target="_blank">johnd9886@gmail.com</a>> wrote:</div><br><blockquote type="cite"><div dir="rtl"><div dir="ltr">I raised it to 20 seconds but got the same results.<br>
</div><div dir="ltr">Sorry, I am new to checkpoint/restart.<br></div><div dir="ltr">I am trying this initially on a single multicore PC.<br></div><div dir="ltr">
What should it look like if the restart succeeds? Should it run in the same terminal in which I run the restart command?<br></div><div dir="ltr">My test app has 5000 iterations. If, for example, a checkpoint is taken at iteration 300 and I restart from that checkpoint file, should it resume near iteration 300?<br>
</div><div class="gmail_extra"><br><br><div class="gmail_quote"><div dir="ltr">2013/6/6 Wesley Bland <span dir="ltr"><<a href="mailto:wbland@mcs.anl.gov" target="_blank">wbland@mcs.anl.gov</a>></span></div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
Is there actually anything in those checkpoints? With a checkpoint every 4 seconds, you may be overdoing it.<br>
<span><font color="#888888"><br>
Wesley<br>
</font></span><div><div><br>
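To put the 4-second interval in perspective, a rough estimate multiplies the number of checkpoints in a run by the size of one image. This is a sketch with hypothetical inputs: the run length is not stated in the thread, `ckpt_estimate` is a made-up helper, and it assumes every image is kept and is about the 121.6 MB reported later for this application.

```shell
#!/bin/sh
# Rough estimate of checkpoint count and disk use for one run.
# Hypothetical inputs: $1 = run length (s), $2 = checkpoint interval (s),
# $3 = size of one image (MB). Assumes every image is kept on disk.
ckpt_estimate() {
    runtime=$1; interval=$2; image_mb=$3
    awk -v t="$runtime" -v i="$interval" -v s="$image_mb" 'BEGIN {
        n = int(t / i)                                   # checkpoints taken during the run
        printf "%d checkpoints, about %.1f MB\n", n, n * s
    }'
}
```

For a hypothetical 120-second run, `ckpt_estimate 120 4 121.6` prints `30 checkpoints, about 3648.0 MB`, while a 20-second interval gives `6 checkpoints, about 729.6 MB`.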
On Jun 5, 2013, at 2:14 PM, Rajeev Thakur <<a href="mailto:thakur@mcs.anl.gov" target="_blank">thakur@mcs.anl.gov</a>> wrote:<br>
<br>
> I don't know, but see if anything on this page helps:<br>
> <a href="http://wiki.mpich.org/mpich/index.php/Checkpointing" target="_blank">http://wiki.mpich.org/mpich/index.php/Checkpointing</a><br>
><br>
> On Jun 5, 2013, at 4:09 PM, john donald wrote:<br>
><br>
>><br>
>><br>
>> ---------- Forwarded message ----------<br>
>> From: john donald <<a href="mailto:johnd9886@gmail.com" target="_blank">johnd9886@gmail.com</a>><br>
>> Date: 2013/6/3<br>
>> Subject: ckpoint-num error<br>
>> To: <a href="mailto:mpich-discuss@mcs.anl.gov" target="_blank">mpich-discuss@mcs.anl.gov</a><br>
>><br>
>><br>
>> I ran mpiexec with checkpointing enabled, which created two checkpoint files:<br>
>><br>
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -ckpoint-interval 4 -n 4 /home/john/app/md<br>
>><br>
>> context-num0-0-0<br>
>> context-num1-0-0<br>
>><br>
>><br>
>> I am trying to restart with:<br>
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 1<br>
>><br>
>> Nothing happens; it just hangs.<br>
>> I also tried:<br>
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 0-0-0<br>
>> That also hangs.<br>
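The file names above suggest that the number right after `context-num` identifies the checkpoint. That reading is an assumption, though it matches the first restart attempt (`-ckpoint-num 1` for `context-num1-0-0`) rather than the second (`0-0-0`). Under that assumption, the hypothetical helper below prints the newest checkpoint number found under the prefix directory so it can be passed to `-ckpoint-num`. It only picks the argument; it does not explain or fix the hang.

```shell
#!/bin/sh
# Print the highest <N> among files named context-num<N>-<x>-<y>.
# Assumptions: the images live directly in the directory given as $1, and
# <N> is the value -ckpoint-num expects.
latest_ckpt_num() {
    dir=$1
    for f in "$dir"/context-num*; do
        [ -e "$f" ] || continue      # the glob matched nothing
        b=${f##*/}                   # strip the directory part
        b=${b#context-num}           # drop the fixed prefix
        echo "${b%%-*}"              # keep the digits before the first '-'
    done | sort -n | tail -n 1
}
# e.g. mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 \
#        -ckpoint-num "$(latest_ckpt_num /home/john/ckpts/app.ckpoint)"
```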
>><br>
>> _______________________________________________<br>
>> discuss mailing list <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
>> To manage subscription options or unsubscribe:<br>
>> <a href="https://lists.mpich.org/mailman/listinfo/discuss" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
</div></div></blockquote></div><br></div></div>
</blockquote></div><br></div></div></div></div></blockquote></div><br></div>