[mpich-discuss] MPICH 2.1.5 checkpoint

Marcela Castro León mcastrol at gmail.com
Thu Jul 17 09:29:27 CDT 2014


Hi

I just want to take a single checkpoint during the execution, around the
middle. We're trying to observe the checkpoint I/O in order to reduce its
time using a parallel file system.
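
One rough way to observe how long the checkpoint I/O takes is to sample the size of the checkpoint directory while the job runs and look for the periods in which it grows. The sketch below is only an illustration and not part of MPICH or BLCR. It assumes the checkpoint files are written directly into the directory given to -ckpoint-prefix and that they grow steadily while a checkpoint is in progress. The names dir_size, write_windows and watch are hypothetical helpers.

```python
import os
import time


def dir_size(path):
    """Total size in bytes of the regular files directly under `path`."""
    total = 0
    with os.scandir(path) as entries:
        for entry in entries:
            if entry.is_file(follow_symlinks=False):
                total += entry.stat(follow_symlinks=False).st_size
    return total


def write_windows(samples, quiet=2):
    """Return (start, end) times of bursts in which the size kept growing.

    `samples` is a list of (timestamp, total_bytes) pairs in time order.
    A burst is closed once the size has not grown for `quiet` consecutive
    samples (assumption: a pause that long means the checkpoint finished).
    """
    windows = []
    start = last_growth = None
    idle = 0
    for (t_prev, s_prev), (t_cur, s_cur) in zip(samples, samples[1:]):
        if s_cur > s_prev:
            if start is None:
                start = t_prev      # growth began somewhere after t_prev
            last_growth = t_cur
            idle = 0
        elif start is not None:
            idle += 1
            if idle >= quiet:
                windows.append((start, last_growth))
                start = None
    if start is not None:           # still growing when sampling stopped
        windows.append((start, last_growth))
    return windows


def watch(path, period=1.0, duration=600.0):
    """Sample the directory size every `period` seconds for `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append((time.time(), dir_size(path)))
        time.sleep(period)
    return samples
```

For example, running `for start, end in write_windows(watch("/partnfs/mpichchk")): print(end - start)` alongside the job would print an estimate, accurate to about one sampling period, of how long each write burst lasted. On NFS, the sizes seen from a node that is not writing may lag because of attribute caching, so the numbers are approximate.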

Thanks.


2014-07-17 13:30 GMT+02:00 Bland, Wesley B. <wbland at anl.gov>:

>  Is there a reason you can't just take the checkpoints less frequently?
>
> On Jul 17, 2014, at 4:41 AM, "Marcela Castro León" <mcastrol at gmail.com>
> wrote:
>
>   Hi,
> I'm using mpich 2.1.5 compiled with BLCR checkpointing support.
> I'm having problems with the checkpoint interval.
> When I execute:
> mpiexec -ckpointlib blcr -ckpoint-prefix /partnfs/mpichchk
> -ckpoint-interval 120 -f maquinas -n 16 ./bt.C.16
>
>  At second 120 the execution is indeed interrupted for checkpointing, but
> because the checkpoint lasts more than 120 seconds, another checkpoint is
> triggered immediately instead of the application resuming.
> I only manage to get a checkpoint by setting a checkpoint interval close
> to the end of the execution, but that is not useful.
>
>  How can I solve this?
> Also, is there a way to find out how long a checkpoint takes?
>
>  Thank you very much.
>
>  Marcela
>
>
>
>  mpiexec -info
> HYDRA build details:
>     Version:                                 1.5
>     Release Date:                            Mon Oct  8 14:00:48 CDT 2012
>     CC:                              gcc
>     CXX:                             c++
>     F77:                             gfortran
>     F90:                             gfortran
>     Configure options:                       '--disable-option-checking'
> '--prefix=/soft/mpich2/mpich2-1.5-blcr8.3' '--with-hydra-ckpointlib=blcr'
> '--with-blcr=/soft/blcr' '--enable-checkpointing' '--cache-file=/dev/null'
> '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= -L/soft/blcr/lib64
> -L/soft/blcr/lib' 'LIBS=-lrt -lcr -lpthread ' 'CPPFLAGS=
> -I/SRC/mpi/mpich2-1.5/src/mpl/include -I/SRC/mpi/mpich2-1.5/src/mpl/include
> -I/SRC/mpi/mpich2-1.5/src/openpa/src -I/SRC/mpi/mpich2-1.5/src/openpa/src
> -I/SRC/mpi/mpich2-1.5/src/mpi/romio/include -I/soft/blcr/include'
>     Process Manager:                         pmi
>     Launchers available:                     ssh rsh fork slurm ll lsf sge
> manual persist
>     Topology libraries available:            hwloc
>     Resource management kernels available:   user slurm ll lsf sge pbs
>     Checkpointing libraries available:       blcr
>     Demux engines available:                 poll select
>
>
>    _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>