[mpich-discuss] checkpoint error
basma a.azeem
basmaabdelazeem at hotmail.com
Sun May 19 16:02:49 CDT 2013
Thank you for your help.
I am using mpich-3.0.3 with blcr-0.8.5 to checkpoint the Integer Sort (IS) application of NPB. It gives me the following error:
basma at basma-Satellite-A500:~$ mpiexec -ckpointlib blcr -ckpoint-prefix /home/buntinas/ckpts/app.ckpoint -ckpoint-interval 1 -n 4 /home/basma/NPB3.3/NPB3.3/NPB3.3-MPI/bin/is.A.4
NAS Parallel Benchmarks 3.3 -- IS Benchmark
Size: 8388608 (class A)
Iterations: 100
Number of processes: 4
iteration
1
2
3
4
5
6
7
8
9
10
[proxy:0:0 at basma-Satellite-A500] requesting checkpoint
[proxy:0:0 at basma-Satellite-A500] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:106): Failed to stat checkpoint prefix "/home/buntinas/ckpts/app.ckpoint": No such file or directory
[proxy:0:0 at basma-Satellite-A500] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:0 at basma-Satellite-A500] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at basma-Satellite-A500] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec at basma-Satellite-A500] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec at basma-Satellite-A500] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at basma-Satellite-A500] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec at basma-Satellite-A500] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
basma at basma-Satellite-A500:~$
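
Reading the log, the first failure is at ckpoint.c:106: the proxy could not stat the path given to -ckpoint-prefix ("/home/buntinas/ckpts/app.ckpoint": No such file or directory), and the later messages appear to follow from that. A minimal sketch of a check before rerunning, assuming the prefix is expected to be an existing, writable directory; the CKPT_DIR variable and the $HOME/ckpts location are hypothetical and not part of the original command:

```shell
# Hypothetical checkpoint location under the current user's home directory.
# (Assumption: any existing, writable directory would do as the prefix.)
CKPT_DIR="${CKPT_DIR:-$HOME/ckpts}"

# Create the directory if it is missing, then confirm it exists and is writable.
mkdir -p "$CKPT_DIR"
if [ -d "$CKPT_DIR" ] && [ -w "$CKPT_DIR" ]; then
    echo "checkpoint prefix ok: $CKPT_DIR"
else
    echo "checkpoint prefix unusable: $CKPT_DIR" >&2
fi

# Then rerun with the same flags as above, pointing the prefix at that directory:
# mpiexec -ckpointlib blcr -ckpoint-prefix "$CKPT_DIR" -ckpoint-interval 1 -n 4 /home/basma/NPB3.3/NPB3.3/NPB3.3-MPI/bin/is.A.4
```

If the check passes and the same stat error still appears, the problem is probably somewhere other than the path itself.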