[mpich-discuss] MPICH 2.1.5 checkpoint
Marcela Castro León
mcastrol at gmail.com
Thu Jul 17 09:29:27 CDT 2014
Hi,
I just want to take a single checkpoint in the middle of the execution.
We are trying to observe the I/O so that we can reduce the time by using a
parallel file system.
Thanks.
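
A minimal sketch of one way to get a single checkpoint partway through the run, under stated assumptions. The Hydra documentation describes requesting a checkpoint on demand by sending SIGUSR1 to mpiexec, so the job could be started without -ckpoint-interval and signalled once. The helper name run_with_single_checkpoint and the 120-second delay are hypothetical, and whether this particular build honours SIGUSR1 has not been verified here.

```python
import signal
import subprocess

# Command from the original post, with -ckpoint-interval removed so that
# no timer-driven checkpoints are taken (paths and host file as posted).
MPIEXEC_CMD = [
    "mpiexec", "-ckpointlib", "blcr",
    "-ckpoint-prefix", "/partnfs/mpichchk",
    "-f", "maquinas", "-n", "16", "./bt.C.16",
]


def run_with_single_checkpoint(cmd, delay_s, sig=signal.SIGUSR1):
    """Start cmd, wait delay_s seconds, then send sig once if it is still running.

    Returns (exit_code, signal_sent). The signal is assumed to be what
    triggers an on-demand checkpoint in mpiexec; that is not checked here.
    """
    proc = subprocess.Popen(cmd)
    try:
        # wait() returns early if the job finishes before the delay expires.
        proc.wait(timeout=delay_s)
        sent = False
    except subprocess.TimeoutExpired:
        proc.send_signal(sig)  # single checkpoint request
        sent = True
    # Block until the job (and any checkpoint it takes) has finished.
    return proc.wait(), sent


# Hypothetical usage, requesting one checkpoint 120 seconds into the run:
#   exit_code, sent = run_with_single_checkpoint(MPIEXEC_CMD, delay_s=120)
```

Since the signal is sent only once, a checkpoint that takes longer than the chosen delay cannot be followed immediately by another one, which is the behaviour reported with -ckpoint-interval 120.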
2014-07-17 13:30 GMT+02:00 Bland, Wesley B. <wbland at anl.gov>:
> Is there a reason you can't just take the checkpoints less frequently?
>
> On Jul 17, 2014, at 4:41 AM, "Marcela Castro León" <mcastrol at gmail.com>
> wrote:
>
> Hi,
> I'm using MPICH2 1.5 compiled with BLCR checkpointing support.
> I'm having problems with the checkpoint interval.
> When I execute:
> mpiexec -ckpointlib blcr -ckpoint-prefix /partnfs/mpichchk
> -ckpoint-interval 120 -f maquinas -n 16 ./bt.C.16
>
> At second 120 the execution is indeed interrupted for
> checkpointing, but because the checkpoint lasts more than 120 seconds, another
> checkpoint is triggered immediately instead of the application resuming.
> I only manage to get a checkpoint by setting the checkpoint interval to almost
> the end of the execution, but that is not useful.
>
> How can I solve this?
> Also, is there a way to find out how long the checkpoint takes?
>
> Thank you very much.
>
> Marcela
>
>
>
> mpiexec -info
> HYDRA build details:
> Version: 1.5
> Release Date: Mon Oct 8 14:00:48 CDT 2012
> CC: gcc
> CXX: c++
> F77: gfortran
> F90: gfortran
> Configure options: '--disable-option-checking'
> '--prefix=/soft/mpich2/mpich2-1.5-blcr8.3' '--with-hydra-ckpointlib=blcr'
> '--with-blcr=/soft/blcr' '--enable-checkpointing' '--cache-file=/dev/null'
> '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= -L/soft/blcr/lib64
> -L/soft/blcr/lib' 'LIBS=-lrt -lcr -lpthread ' 'CPPFLAGS=
> -I/SRC/mpi/mpich2-1.5/src/mpl/include -I/SRC/mpi/mpich2-1.5/src/mpl/include
> -I/SRC/mpi/mpich2-1.5/src/openpa/src -I/SRC/mpi/mpich2-1.5/src/openpa/src
> -I/SRC/mpi/mpich2-1.5/src/mpi/romio/include -I/soft/blcr/include'
> Process Manager: pmi
> Launchers available: ssh rsh fork slurm ll lsf sge
> manual persist
> Topology libraries available: hwloc
> Resource management kernels available: user slurm ll lsf sge pbs
> Checkpointing libraries available: blcr
> Demux engines available: poll select
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>