[mpich-discuss] MPICH 2.1.5 checkpoint
Bland, Wesley B.
wbland at anl.gov
Thu Jul 17 09:43:54 CDT 2014
In that case, you'll probably need to find some way to make your application run longer. Unfortunately, MPICH doesn't support manually starting a checkpoint at this time.
On Jul 17, 2014, at 9:29 AM, Marcela Castro León <mcastrol at gmail.com> wrote:
Hi,
I just want to take a single checkpoint during the execution, in the middle.
We're trying to observe the I/O in order to reduce the checkpoint time by using a parallel file system.
Thanks.
2014-07-17 13:30 GMT+02:00 Bland, Wesley B. <wbland at anl.gov>:
Is there a reason you can't just take the checkpoints less frequently?
On Jul 17, 2014, at 4:41 AM, "Marcela Castro León" <mcastrol at gmail.com> wrote:
Hi,
I'm using MPICH2 1.5 compiled with BLCR checkpointing support.
I'm having problems with the checkpoint interval.
When I execute:
mpiexec -ckpointlib blcr -ckpoint-prefix /partnfs/mpichchk -ckpoint-interval 120 -f maquinas -n 16 ./bt.C.16
The execution is indeed interrupted for checkpointing after 120 seconds, but because the checkpoint takes more than 120 seconds, another checkpoint is triggered immediately instead of the application resuming.
The only way I can get a checkpoint is by setting the checkpoint interval to a point close to the end of the execution, but that is not useful.
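The symptom described above can be illustrated with a toy model. This is a hypothetical sketch, not Hydra's actual implementation: it assumes that a checkpoint request fires every `interval` seconds of wall-clock time even while a checkpoint is running, that a request arriving mid-checkpoint starts a new checkpoint as soon as the current one ends, and that the application makes no progress while it is being checkpointed. The function name `checkpoint_starts` and its parameters are illustrative only.

```python
def checkpoint_starts(work, interval, ckpt_cost, max_ckpts=100):
    """Return the wall-clock times (s) at which checkpoints start in a toy model.

    Hypothetical assumptions, not taken from Hydra's source:
    * a checkpoint request fires every `interval` s of wall-clock time,
      even while a checkpoint is already running;
    * requests that arrive during a checkpoint collapse into one pending
      request, which starts as soon as the current checkpoint ends;
    * the application does no work while it is being checkpointed.
    `work` is the application's run time without checkpoints; `max_ckpts`
    stops the loop when the application can never make progress.
    """
    t = 0.0          # wall-clock time
    done = 0.0       # seconds of application work completed so far
    next_req = interval
    starts = []
    while len(starts) < max_ckpts:
        # Run until the next pending request (0 s if one is already due)
        # or until the job finishes.
        run = min(max(next_req - t, 0.0), work - done)
        t += run
        done += run
        if done >= work:
            break
        starts.append(t)
        # This checkpoint serves every request issued up to now.
        while next_req <= t:
            next_req += interval
        t += ckpt_cost  # application paused for the whole checkpoint
    return starts


if __name__ == "__main__":
    print(checkpoint_starts(600, 120, 150)[:3])  # [120.0, 270.0, 420.0]
    print(checkpoint_starts(600, 400, 150))      # [400.0]
    print(checkpoint_starts(600, 300, 150))      # [300.0, 600.0]
```

With made-up numbers (600 s of work, 150 s checkpoints), a 120 s interval means checkpoints run back to back and the application never gets past its first 120 s of work. In this model, an interval of half the run time still produces a second checkpoint, because the pause stretches the wall-clock time. Only an interval close to the end yields exactly one.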
How can I solve this?
Also, is there a way to find out how long a checkpoint takes?
Thank you very much.
Marcela
mpiexec -info
HYDRA build details:
Version: 1.5
Release Date: Mon Oct 8 14:00:48 CDT 2012
CC: gcc
CXX: c++
F77: gfortran
F90: gfortran
Configure options: '--disable-option-checking' '--prefix=/soft/mpich2/mpich2-1.5-blcr8.3' '--with-hydra-ckpointlib=blcr' '--with-blcr=/soft/blcr' '--enable-checkpointing' '--cache-file=/dev/null' '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= -L/soft/blcr/lib64 -L/soft/blcr/lib' 'LIBS=-lrt -lcr -lpthread ' 'CPPFLAGS= -I/SRC/mpi/mpich2-1.5/src/mpl/include -I/SRC/mpi/mpich2-1.5/src/mpl/include -I/SRC/mpi/mpich2-1.5/src/openpa/src -I/SRC/mpi/mpich2-1.5/src/openpa/src -I/SRC/mpi/mpich2-1.5/src/mpi/romio/include -I/soft/blcr/include'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc
Resource management kernels available: user slurm ll lsf sge pbs
Checkpointing libraries available: blcr
Demux engines available: poll select
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss