[mpich-discuss] Fwd: ckpoint-num error

Wesley Bland wbland at mcs.anl.gov
Wed Jun 5 22:19:08 CDT 2013


Is there actually anything in those checkpoint files? With -ckpoint-interval 4, a checkpoint is taken every 4 seconds, so you may be overdoing it.
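
One quick way to check, as a rough sketch only: the shell function below prints the size of each file in a directory and flags the empty ones. The name check_ckpts is made up for illustration, and it assumes the checkpoint files sit directly in the directory part of the -ckpoint-prefix from the quoted command (/home/john/ckpts).

```shell
# Hypothetical helper: report the size of every regular file in a
# checkpoint directory and flag files that are empty.
check_ckpts() {
    dir=$1
    for f in "$dir"/*; do
        # Skip subdirectories and the literal pattern left by an unmatched glob.
        [ -f "$f" ] || continue
        # wc -c may pad its output with spaces on some systems; strip them.
        size=$(wc -c < "$f" | tr -d ' ')
        if [ "$size" -eq 0 ]; then
            echo "EMPTY $f"
        else
            echo "$size bytes $f"
        fi
    done
}
```

Calling it as check_ckpts /home/john/ckpts would list the files there. A file reported as EMPTY would suggest that the checkpoint was never actually written.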

Wesley

On Jun 5, 2013, at 2:14 PM, Rajeev Thakur <thakur at mcs.anl.gov> wrote:

> I don't know, but see if anything on this page helps:
> http://wiki.mpich.org/mpich/index.php/Checkpointing
> 
> On Jun 5, 2013, at 4:09 PM, john donald wrote:
> 
>> 
>> 
>> ---------- Forwarded message ----------
>> From: john donald <johnd9886 at gmail.com>
>> Date: 2013/6/3
>> Subject: ckpoint-num error
>> To: mpich-discuss at mcs.anl.gov
>> 
>> 
>> I ran mpiexec with checkpointing enabled, and it created two checkpoint files:
>> 
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint  -ckpoint-interval 4  -n 4  /home/john/app/md
>> 
>> context-num0-0-0
>> context-num1-0-0
>> 
>> 
>> I am trying to restart from a checkpoint:
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 1
>> 
>> but nothing happens; it just hangs.
>> I also tried:
>> mpiexec -ckpointlib blcr -ckpoint-prefix /home/john/ckpts/app.ckpoint -n 4 -ckpoint-num 0-0-0
>> which also hangs.
>> 
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
> 


