[mpich-discuss] MPICH 2.1.5 checkpoint

Marcela Castro León mcastrol at gmail.com
Thu Jul 17 09:52:28 CDT 2014


But do you know why, after the first checkpoint, the execution is frozen
for a time similar to the interval and, after that, a new checkpoint is
triggered?
Thank you.
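
On the earlier question of how long a checkpoint takes, one rough option is
to look at the modification times of the files written by the checkpoint.
The sketch below assumes the checkpoint files end up under the
-ckpoint-prefix directory (/partnfs/mpichchk in the command quoted below) and
that the directory holds the files of a single checkpoint. The function name
ckpt_mtime_span is made up for illustration, and it relies on GNU find and
stat. The result is only a lower bound, since the oldest timestamp marks when
the first file was finished, not when writing began.

```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH or BLCR): print the number of
# seconds between the oldest and the newest file modification time found
# under a directory. Assumes GNU stat ("stat -c %Y" prints mtime in epoch
# seconds).
ckpt_mtime_span() {
    dir=$1
    # Collect the mtimes of all regular files, sorted oldest to newest.
    times=$(find "$dir" -type f -exec stat -c %Y {} + | sort -n)
    if [ -z "$times" ]; then
        echo "no files found under $dir" >&2
        return 1
    fi
    first=$(printf '%s\n' "$times" | head -n 1)
    last=$(printf '%s\n' "$times" | tail -n 1)
    echo $((last - first))
}

# Example call, using the prefix from the quoted command (assumed location):
#   ckpt_mtime_span /partnfs/mpichchk
```

If the printed value is close to or above the -ckpoint-interval value, that
would be consistent with a new checkpoint being triggered as soon as the
previous one ends.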


2014-07-17 16:43 GMT+02:00 Bland, Wesley B. <wbland at anl.gov>:

>  You’ll probably need to somehow get your application to run longer then.
> Unfortunately, MPICH doesn’t support manually starting a checkpoint at this
> time.
>
>
>  On Jul 17, 2014, at 9:29 AM, Marcela Castro León <mcastrol at gmail.com>
> wrote:
>
>  Hi
>
>  I just want to take a single checkpoint during the execution, in the
> middle.
> We are trying to observe the I/O in order to reduce the checkpoint time by
> using a parallel file system.
>
>  Thanks.
>
>
> 2014-07-17 13:30 GMT+02:00 Bland, Wesley B. <wbland at anl.gov>:
>
>>  Is there a reason you can't just take the checkpoints less frequently?
>>
>> On Jul 17, 2014, at 4:41 AM, "Marcela Castro León" <mcastrol at gmail.com>
>> wrote:
>>
>>   Hi,
>> I'm using MPICH2 1.5 compiled with BLCR checkpointing support.
>> I'm having problems with the checkpoint interval.
>> When I execute:
>> mpiexec -ckpointlib blcr -ckpoint-prefix /partnfs/mpichchk
>> -ckpoint-interval 120 -f maquinas -n 16 ./bt.C.16
>>
>>  At second 120, the execution is indeed interrupted for checkpointing,
>> but, as the checkpoint lasts more than 120 seconds, another checkpoint is
>> triggered immediately instead of the application resuming.
>> The only way I manage to get a single checkpoint is by setting a checkpoint
>> interval that falls almost at the end of the execution, which is not useful.
>>
>>  How can I solve this?
>> Also, is there a way to find out how long the checkpoint takes?
>>
>>  Thank you very much.
>>
>>  Marcela
>>
>>
>>
>>  mpiexec -info
>> HYDRA build details:
>>     Version:                                 1.5
>>     Release Date:                            Mon Oct  8 14:00:48 CDT 2012
>>     CC:                              gcc
>>     CXX:                             c++
>>     F77:                             gfortran
>>     F90:                             gfortran
>>     Configure options:                       '--disable-option-checking'
>> '--prefix=/soft/mpich2/mpich2-1.5-blcr8.3' '--with-hydra-ckpointlib=blcr'
>> '--with-blcr=/soft/blcr' '--enable-checkpointing' '--cache-file=/dev/null'
>> '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= -L/soft/blcr/lib64
>> -L/soft/blcr/lib' 'LIBS=-lrt -lcr -lpthread ' 'CPPFLAGS=
>> -I/SRC/mpi/mpich2-1.5/src/mpl/include -I/SRC/mpi/mpich2-1.5/src/mpl/include
>> -I/SRC/mpi/mpich2-1.5/src/openpa/src -I/SRC/mpi/mpich2-1.5/src/openpa/src
>> -I/SRC/mpi/mpich2-1.5/src/mpi/romio/include -I/soft/blcr/include'
>>     Process Manager:                         pmi
>>     Launchers available:                     ssh rsh fork slurm ll lsf
>> sge manual persist
>>     Topology libraries available:            hwloc
>>     Resource management kernels available:   user slurm ll lsf sge pbs
>>     Checkpointing libraries available:       blcr
>>     Demux engines available:                 poll select
>>
>>
>>     _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>