[mpich-discuss] checkpoint error

basma a.azeem basmaabdelazeem at hotmail.com
Sun May 19 16:33:14 CDT 2013


I changed the prefix to /home/basma/ckpts/app.ckpoint, but I get the same error:

basma at basma-Satellite-A500:~$ mpiexec -ckpointlib blcr -ckpoint-prefix /home/basma/ckpts/app.ckpoint -ckpoint-interval 1  -n 4 /home/basma/NPB3.3/NPB3.3/NPB3.3-MPI/bin/is.A.4


 NAS Parallel Benchmarks 3.3 -- IS Benchmark

 Size:  8388608  (class A)
 Iterations:   100
 Number of processes:     4

   iteration
        1
        2
        3
        4
        5
        6
        7
        8
        9
        10
        11
[proxy:0:0 at basma-Satellite-A500] requesting checkpoint
[proxy:0:0 at basma-Satellite-A500] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:106): Failed to stat checkpoint prefix "/home/basma/ckpts/app.ckpoint": No such file or directory
[proxy:0:0 at basma-Satellite-A500] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
[proxy:0:0 at basma-Satellite-A500] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at basma-Satellite-A500] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec at basma-Satellite-A500] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec at basma-Satellite-A500] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at basma-Satellite-A500] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec at basma-Satellite-A500] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
basma at basma-Satellite-A500:~$ 
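
The stat failure above names the prefix path itself, which suggests Hydra looks for something that already exists at that path before it writes a checkpoint. A minimal sketch of a pre-flight check follows, assuming the prefix must be an existing, writable directory (the thread does not confirm this). The helper name ensure_ckpt_prefix and the CKPT_PREFIX variable are hypothetical and are not part of MPICH or BLCR.

```shell
# Hypothetical helper: create the checkpoint prefix and verify that it is
# a writable directory. Assumption: Hydra expects the prefix to exist as a
# directory before the first checkpoint is requested.
ensure_ckpt_prefix() {
    prefix="$1"
    # mkdir -p creates missing parent directories and succeeds if the
    # directory already exists.
    mkdir -p "$prefix" || return 1
    [ -d "$prefix" ] && [ -w "$prefix" ]
}

# Defaults to the layout used in the command above; override as needed.
CKPT_PREFIX="${CKPT_PREFIX:-$HOME/ckpts/app.ckpoint}"

if ensure_ckpt_prefix "$CKPT_PREFIX"; then
    echo "checkpoint prefix ready: $CKPT_PREFIX"
else
    echo "checkpoint prefix unusable: $CKPT_PREFIX" >&2
fi
```

If the check passes, the same mpiexec command can be rerun with -ckpoint-prefix "$CKPT_PREFIX".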


Date: Sun, 19 May 2013 16:26:32 -0500
From: correac2 at illinois.edu
To: discuss at mpich.org
Subject: Re: [mpich-discuss] checkpoint error



On May 19, 2013 4:02 PM, "basma a.azeem" <basmaabdelazeem at hotmail.com> wrote:

> Thank you for your help.
>
> I am using mpich-3.0.3 with blcr-0.8.5 to checkpoint the Integer Sort (IS) application of NPB, and it gives me the following error:
>
> basma at basma-Satellite-A500:~$ mpiexec -ckpointlib blcr -ckpoint-prefix /home/buntinas/ckpts/app.ckpoint -ckpoint-interval 1  -n 4 /home/basma/NPB3.3/NPB3.3/NPB3.3-MPI/bin/is.A.4

It sounds like that path should be something like /home/basma/ckpts/app.ckpoint instead.
>
>  NAS Parallel Benchmarks 3.3 -- IS Benchmark
>
>  Size:  8388608  (class A)
>  Iterations:   100
>  Number of processes:     4
>
>    iteration
>         1
>         2
>         3
>         4
>         5
>         6
>         7
>         8
>         9
>         10
> [proxy:0:0 at basma-Satellite-A500] requesting checkpoint
> [proxy:0:0 at basma-Satellite-A500] HYDT_ckpoint_checkpoint (./tools/ckpoint/ckpoint.c:106): Failed to stat checkpoint prefix "/home/buntinas/ckpts/app.ckpoint": No such file or directory
> [proxy:0:0 at basma-Satellite-A500] HYD_pmcd_pmip_control_cmd_cb (./pm/pmiserv/pmip_cb.c:905): checkpoint suspend failed
> [proxy:0:0 at basma-Satellite-A500] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [proxy:0:0 at basma-Satellite-A500] main (./pm/pmiserv/pmip.c:206): demux engine error waiting for event
> [mpiexec at basma-Satellite-A500] control_cb (./pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
> [mpiexec at basma-Satellite-A500] HYDT_dmxu_poll_wait_for_event (./tools/demux/demux_poll.c:77): callback returned error status
> [mpiexec at basma-Satellite-A500] HYD_pmci_wait_for_completion (./pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
> [mpiexec at basma-Satellite-A500] main (./ui/mpich/mpiexec.c:331): process manager error waiting for completion
> basma at basma-Satellite-A500:~$
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss


