[mpich-discuss] MPICH 2.1.5 checkpoint
Marcela Castro León
mcastrol at gmail.com
Thu Jul 17 04:41:28 CDT 2014
Hi,
I'm using MPICH2 1.5 built with BLCR checkpointing support, and
I'm having problems with the checkpoint interval.
When I execute:
mpiexec -ckpointlib blcr -ckpoint-prefix /partnfs/mpichchk
-ckpoint-interval 120 -f maquinas -n 16 ./bt.C.16
At second 120 the execution is interrupted for checkpointing, as expected.
However, because the checkpoint takes more than 120 seconds, another
checkpoint is triggered immediately instead of the application resuming.
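To make the arithmetic concrete, here is a minimal sketch of the behaviour described above. It assumes (this is an assumption drawn from what I observe, not from the MPICH documentation) that the interval timer keeps counting while a checkpoint is being written. The function names and the 150-second checkpoint duration are hypothetical, chosen only for illustration.

```python
def progress_fraction(interval_s: float, checkpoint_s: float) -> float:
    """Fraction of each checkpoint period left for the application to run.

    Assumption: the interval timer is not paused while the checkpoint
    is being written, so a checkpoint longer than the interval leaves
    no time for computation before the next one fires.
    """
    if interval_s <= 0:
        raise ValueError("interval_s must be positive")
    if checkpoint_s < 0:
        raise ValueError("checkpoint_s must not be negative")
    return max(0.0, interval_s - checkpoint_s) / interval_s


def interval_for(checkpoint_s: float, target_fraction: float) -> float:
    """Smallest interval that leaves target_fraction of each period for work."""
    if not 0.0 <= target_fraction < 1.0:
        raise ValueError("target_fraction must be in [0, 1)")
    return checkpoint_s / (1.0 - target_fraction)


if __name__ == "__main__":
    # Hypothetical numbers: a 150 s checkpoint with the 120 s interval used above.
    print(progress_fraction(120, 150))   # no time left for the application
    print(interval_for(150, 0.75))       # interval needed to compute 75% of the time
```

Under this model, any interval shorter than the checkpoint duration leaves the application no time to make progress, which matches what I see.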
The only way I manage to get a checkpoint is to set a checkpoint interval
that falls almost at the end of the execution, which is not useful.
How can I solve this?
Also, is there a way to find out how long a checkpoint takes?
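As a stopgap for the timing question, the following is a rough sketch of how the duration could be estimated from outside MPICH. It assumes the checkpoint files grow under the `-ckpoint-prefix` directory while a checkpoint is written, and that only one checkpoint happens during the sampling window. The helper names (`dir_size`, `poll`, `growth_window`) are hypothetical, and the result is only as precise as the polling period.

```python
import os
import time


def dir_size(path: str) -> int:
    """Total size in bytes of all regular files under path."""
    total = 0
    for root, _dirs, files in os.walk(path):
        for name in files:
            try:
                total += os.path.getsize(os.path.join(root, name))
            except OSError:
                pass  # file vanished between listing and stat
    return total


def poll(path: str, period_s: float, count: int):
    """Sample (timestamp, total_bytes) for path, count times, period_s apart."""
    samples = []
    for _ in range(count):
        samples.append((time.time(), dir_size(path)))
        time.sleep(period_s)
    return samples


def growth_window(samples):
    """Seconds between the last sample before growth and the last growing sample.

    Returns None if the directory never grew during the sampling.
    """
    growing = [i for i in range(1, len(samples))
               if samples[i][1] > samples[i - 1][1]]
    if not growing:
        return None
    start = samples[growing[0] - 1][0]
    end = samples[growing[-1]][0]
    return end - start
```

For example, `growth_window(poll("/partnfs/mpichchk", 5.0, 120))` would watch the prefix directory for ten minutes in 5-second steps and report how long the files kept growing.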
Thank you very much.
Marcela
mpiexec -info
HYDRA build details:
Version: 1.5
Release Date: Mon Oct 8 14:00:48 CDT 2012
CC: gcc
CXX: c++
F77: gfortran
F90: gfortran
Configure options: '--disable-option-checking'
'--prefix=/soft/mpich2/mpich2-1.5-blcr8.3' '--with-hydra-ckpointlib=blcr'
'--with-blcr=/soft/blcr' '--enable-checkpointing' '--cache-file=/dev/null'
'--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= -L/soft/blcr/lib64
-L/soft/blcr/lib' 'LIBS=-lrt -lcr -lpthread ' 'CPPFLAGS=
-I/SRC/mpi/mpich2-1.5/src/mpl/include -I/SRC/mpi/mpich2-1.5/src/mpl/include
-I/SRC/mpi/mpich2-1.5/src/openpa/src -I/SRC/mpi/mpich2-1.5/src/openpa/src
-I/SRC/mpi/mpich2-1.5/src/mpi/romio/include -I/soft/blcr/include'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc
Resource management kernels available: user slurm ll lsf sge pbs
Checkpointing libraries available: blcr
Demux engines available: poll select