[mpich-discuss] MPICH 2.1.5 checkpoint
Marcela Castro León
mcastrol at gmail.com
Thu Jul 17 09:52:28 CDT 2014
But do you know why, after the first checkpoint, the execution freezes for
about as long as the interval and then a new checkpoint is triggered?
Thank you.
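The interval problem described in the quoted message below can be reasoned about with a toy timing model. This is a hypothetical sketch and is not taken from Hydra's source. It assumes three things: the checkpoint timer fires every `interval` seconds of wall-clock time from launch, it keeps running while a checkpoint is being written, and any ticks that fire during a checkpoint collapse into one pending checkpoint that starts as soon as the current one ends. `simulate` and its parameters are illustrative names, and the numbers used with it are made up.

```python
from typing import Optional


def simulate(work: float, interval: float, ckpt: float,
             max_ckpts: int = 1000) -> Optional[int]:
    """Count checkpoints taken before `work` seconds of computation finish.

    Hypothetical timer model (an assumption, not Hydra's implementation):
    ticks fire at interval, 2*interval, ... seconds of wall-clock time;
    ticks that fire while a checkpoint is running collapse into one
    pending checkpoint, started as soon as the current one ends.
    Returns None if `max_ckpts` is exceeded, i.e. no forward progress.
    """
    if interval <= 0 or ckpt < 0:
        raise ValueError("interval must be > 0 and ckpt >= 0")
    t = 0.0            # current wall-clock time
    done = 0.0         # seconds of application work completed
    next_tick = interval
    n = 0              # checkpoints taken so far
    while True:
        # Run the application until the next tick or until the work is done.
        if done + (next_tick - t) >= work:
            return n
        done += next_tick - t
        t = next_tick
        next_tick += interval
        # Take checkpoints back to back while ticks keep firing during them.
        while True:
            n += 1
            if n > max_ckpts:
                return None
            t += ckpt
            if next_tick > t:
                break  # no tick fired during this checkpoint: resume work
            while next_tick <= t:
                next_tick += interval  # coalesce the missed ticks
```

The following results hold under these assumptions only. `simulate(600, 120, 30)` returns 6. `simulate(600, 120, 150)` returns `None`, because each checkpoint outlasts the interval and the application never resumes. `simulate(600, 300, 30)` returns 2, while `simulate(600, 320, 30)` returns 1. In this model, getting a single mid-run checkpoint therefore needs an interval somewhat longer than half the run, because the checkpoint itself delays completion.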
2014-07-17 16:43 GMT+02:00 Bland, Wesley B. <wbland at anl.gov>:
> You’ll probably need to somehow get your application to run longer then.
> Unfortunately, MPICH doesn’t support manually starting a checkpoint at this
> time.
>
>
> On Jul 17, 2014, at 9:29 AM, Marcela Castro León <mcastrol at gmail.com>
> wrote:
>
> Hi
>
> I just want to take a single checkpoint in the middle of the execution.
> We're trying to observe the I/O in order to reduce the time by using a
> parallel file system.
>
> Thanks.
>
>
> 2014-07-17 13:30 GMT+02:00 Bland, Wesley B. <wbland at anl.gov>:
>
>> Is there a reason you can't just take the checkpoints less frequently?
>>
>> On Jul 17, 2014, at 4:41 AM, "Marcela Castro León" <mcastrol at gmail.com>
>> wrote:
>>
>> Hi,
>> I'm using MPICH2 1.5 built with BLCR checkpointing support.
>> I'm having problems with the checkpoint interval.
>> When I execute:
>> mpiexec -ckpointlib blcr -ckpoint-prefix /partnfs/mpichchk \
>>     -ckpoint-interval 120 -f maquinas -n 16 ./bt.C.16
>>
>> At second 120 the execution is indeed interrupted for checkpointing, but
>> because the checkpoint lasts more than 120 seconds, another checkpoint is
>> triggered immediately instead of the application resuming.
>> The only way I manage to get a checkpoint is by setting the checkpoint
>> interval to almost the end of the execution, which is not useful.
>>
>> How can I solve this?
>> Also, is there a way to find out how long a checkpoint takes?
>>
>> Thank you very much.
>>
>> Marcela
>>
>>
>>
>> mpiexec -info
>> HYDRA build details:
>> Version: 1.5
>> Release Date: Mon Oct 8 14:00:48 CDT 2012
>> CC: gcc
>> CXX: c++
>> F77: gfortran
>> F90: gfortran
>> Configure options: '--disable-option-checking'
>> '--prefix=/soft/mpich2/mpich2-1.5-blcr8.3' '--with-hydra-ckpointlib=blcr'
>> '--with-blcr=/soft/blcr' '--enable-checkpointing' '--cache-file=/dev/null'
>> '--srcdir=.' 'CC=gcc' 'CFLAGS= -O2' 'LDFLAGS= -L/soft/blcr/lib64
>> -L/soft/blcr/lib' 'LIBS=-lrt -lcr -lpthread ' 'CPPFLAGS=
>> -I/SRC/mpi/mpich2-1.5/src/mpl/include -I/SRC/mpi/mpich2-1.5/src/mpl/include
>> -I/SRC/mpi/mpich2-1.5/src/openpa/src -I/SRC/mpi/mpich2-1.5/src/openpa/src
>> -I/SRC/mpi/mpich2-1.5/src/mpi/romio/include -I/soft/blcr/include'
>> Process Manager: pmi
>> Launchers available: ssh rsh fork slurm ll lsf sge manual persist
>> Topology libraries available: hwloc
>> Resource management kernels available: user slurm ll lsf sge pbs
>> Checkpointing libraries available: blcr
>> Demux engines available: poll select
>>
>>
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss