[mpich-discuss] Fwd: ckpoint-num error
Wesley Bland
wbland at mcs.anl.gov
Wed Jun 5 22:19:08 CDT 2013
Is there actually anything in those checkpoints? With a checkpoint happening every 4 seconds you may be overdoing it.
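One quick way to check is to look for zero-byte files under the checkpoint directory. This is a minimal sketch, assuming the checkpoints live under /home/john/ckpts (the directory from the -ckpoint-prefix in the original command). The name empty_ckpts is a hypothetical helper, not part of MPICH or BLCR:

```shell
# Hypothetical helper: print every zero-byte regular file under a directory.
# An empty checkpoint file would suggest the checkpoint never completed.
empty_ckpts() {
    # -type f   : regular files only
    # -size 0   : files whose size is zero
    find "$1" -type f -size 0
}
```

Running `empty_ckpts /home/john/ckpts` prints nothing if every file has content. `ls -l /home/john/ckpts` shows the actual sizes.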
Wesley
On Jun 5, 2013, at 2:14 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:
> I don't know, but see if anything on this page helps:
> http://wiki.mpich.org/mpich/index.php/Checkpointing
>
> On Jun 5, 2013, at 4:09 PM, john donald wrote:
>
>>
>>
>> ---------- Forwarded message ----------
>> From: john donald <johnd9886 at gmail.com>
>> Date: 2013/6/3
>> Subject: ckpoint-num error
>> To: mpich-discuss at mcs.anl.gov
>>
>>
>> I ran mpiexec with checkpointing enabled, and it created two checkpoint files:
>>
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -ckpoint-interval 4 -n 4 /home/john/app/md
>>
>> context-num0-0-0
>> context-num1-0-0
>>
>>
>> I am now trying to restart from a checkpoint:
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 1
>>
>> but nothing happens; it just hangs.
>> I also tried:
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 0-0-0
>> which also hangs.
>>
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>