[mpich-discuss] mpich2 - checkpointing error

Marcelo Paiva Ramos marcelo.paiva at cptec.inpe.br
Tue Apr 8 06:26:24 CDT 2014


Does it work with another version of MPICH2 and BLCR?

Best regards,
Marcelo.


On 07-04-2014 09:16, Wesley Bland wrote:
> Unfortunately, this is a known problem at the moment. BLCR 
> checkpointing hasn't worked for a few versions now. It's something 
> we're working to fix in a future version.
>
> Thanks,
> Wesley
>
> On Monday, April 7, 2014, Marcelo Paiva Ramos
> <marcelo.paiva at cptec.inpe.br> wrote:
>
>     Hi,
>     Can you help me to solve this problem?
>
>     cat /etc/issue
>     CentOS release 6.5 (Final)
>     Kernel \r on an \m
>
>     uname -a
>     Linux server 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux
>
>     *INSTALL: blcr-0.8.5*
>     tar xzvf blcr-0.8.5.tar.gz
>     cd blcr-0.8.5
>     mkdir builddir
>     cd builddir
>     ../configure --prefix=/opt/blcr
>     make
>     make install
>     /sbin/insmod /opt/blcr/lib/blcr/2.6.32-431.11.2.el6.x86_64/blcr_imports.ko
>     /sbin/insmod /opt/blcr/lib/blcr/2.6.32-431.11.2.el6.x86_64/blcr.ko
>     uname -r
>     2.6.32-431.11.2.el6.x86_64
>     lsmod | grep blcr
>     blcr                  115465  0
>     blcr_imports           10715  1 blcr
>     ldconfig -p | grep blcr
>     libcr_run.so.0 (libc6,x86-64) => /opt/blcr/lib/libcr_run.so.0
>     libcr_run.so (libc6,x86-64) => /opt/blcr/lib/libcr_run.so
>     libcr_omit.so.0 (libc6,x86-64) => /opt/blcr/lib/libcr_omit.so.0
>     libcr_omit.so (libc6,x86-64) => /opt/blcr/lib/libcr_omit.so
>     libcr.so.0 (libc6,x86-64) => /opt/blcr/lib/libcr.so.0
>     libcr.so (libc6,x86-64) => /opt/blcr/lib/libcr.so
>     chkconfig --list | grep blcr
>     blcr               0:off    1:off    2:on    3:on    4:on    5:on    6:off
>
>     *INSTALL: mpich-3.1*
>     tar xzvf mpich-3.1.tar.gz
>     cd mpich-3.1
>     ./configure --disable-fast CFLAGS=-O2 FFLAGS=-O2 CXXFLAGS=-O2 \
>         FCFLAGS=-O2 --prefix=/opt/mpich2/ CC=/opt/intel/bin/icc \
>         FC=/opt/intel/bin/ifort F77=/opt/intel/bin/ifort \
>         --enable-checkpointing --with-hydra-ckpointlib=blcr \
>         --with-blcr=/opt/blcr --with-blcr-include=/opt/blcr/include \
>         --with-blcr-lib=/opt/blcr/lib
>     make
>     make install
>
>     *.bashrc*
>     export PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/blcr/bin:/opt/mpich2/bin
>     export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/opt/intel/lib/intel64:/opt/blcr/lib:/opt/mpich2/lib
>
>     *mpiexec -info*
>     HYDRA build details:
>         Version:                                 3.1
>         Release Date:                            Thu Feb 20 11:41:13 CST 2014
>         CC:  /opt/intel/bin/icc  -O2
>         CXX:                             g++  -O2
>         F77: /opt/intel/bin/ifort -O2
>         F90: /opt/intel/bin/ifort -O2
>         Configure options: '--disable-option-checking'
>     '--prefix=/opt/mpich2' '--disable-fast' 'CFLAGS=-O2 -O0'
>     'FFLAGS=-O2 -O0' 'CXXFLAGS=-O2 ' 'FCFLAGS=-O2 '
>     'CC=/opt/intel/bin/icc' 'FC=/opt/intel/bin/ifort'
>     'F77=/opt/intel/bin/ifort' '--enable-checkpointing'
>     '--with-hydra-ckpointlib=blcr' '--with-blcr=/opt/blcr'
>     '--with-blcr-include=/opt/blcr/include'
>     '--with-blcr-lib=/opt/blcr/lib' '--cache-file=/dev/null'
>     '--srcdir=.' 'LDFLAGS= -L/opt/blcr/lib' 'LIBS=-lrt -lcr -lpthread
>     ' 'CPPFLAGS= -I/root/mpich-3.1/src/mpl/include
>     -I/root/mpich-3.1/src/mpl/include -I/root/mpich-3.1/src/openpa/src
>     -I/root/mpich-3.1/src/openpa/src
>     -I/root/mpich-3.1/src/mpi/romio/include -I/opt/blcr/include'
>         Process Manager:                         pmi
>         Launchers available:                     ssh rsh fork slurm ll lsf sge pbs manual persist
>         Topology libraries available:            hwloc
>         Resource management kernels available:   user slurm ll lsf sge pbs cobalt
>         Checkpointing libraries available:       blcr
>         Demux engines available:                 poll select
>
>
>     *ERROR*
>     mpiexec -n 1 -ckpointlib blcr -ckpoint-interval 20 \
>         -ckpoint-prefix /home/marcelo/TESTE/ ./teste
>     [proxy:0:0@server] requesting checkpoint
>     [proxy:0:0@server] checkpoint completed
>     [proxy:0:0@server] HYDT_ckpoint_blcr_checkpoint (tools/ckpoint/blcr/ckpoint_blcr.c:241): Checkpointing failed. Make sure BLCR kernel module is loaded. Unknown error 2356
>     [proxy:0:0@server] ckpoint_thread (tools/ckpoint/ckpoint.c:76): blcr checkpoint returned error
>     [proxy:0:0@server] requesting checkpoint
>     [proxy:0:0@server] checkpoint completed
>     [proxy:0:0@server] HYDT_ckpoint_blcr_checkpoint (tools/ckpoint/blcr/ckpoint_blcr.c:241): Checkpointing failed. Make sure BLCR kernel module is loaded. Unknown error 2356
>     [proxy:0:0@server] ckpoint_thread (tools/ckpoint/ckpoint.c:76): blcr checkpoint returned error
>
>     Best regards,
>     Marcelo.
>
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
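
The proxy error in the report says to make sure the BLCR kernel module is
loaded, while the lsmod output in the same report already lists both blcr
and blcr_imports. A minimal shell sketch of scripting that check follows.
The function name missing_blcr_modules is hypothetical (it is not part of
BLCR or MPICH), and it only inspects lsmod-style text on stdin; it says
nothing about whether a checkpoint will actually succeed.

```shell
# Hypothetical helper (not part of BLCR or MPICH): reads lsmod-style text
# on stdin and prints the names of the BLCR modules that are NOT listed.
missing_blcr_modules() {
    input=$(cat)
    missing=""
    for mod in blcr_imports blcr; do
        # lsmod prints the module name in the first column, so compare
        # that column exactly (a plain grep for "blcr" would also match
        # the "Used by" column of blcr_imports).
        if ! printf '%s\n' "$input" |
            awk -v m="$mod" '$1 == m { found = 1 } END { exit !found }'; then
            missing="$missing $mod"
        fi
    done
    # Strip the leading space; an empty line means both modules are listed.
    printf '%s\n' "${missing# }"
}

# Usage on a live system:
#   lsmod | missing_blcr_modules
```

An empty line of output means both modules appear in the listing, which is
what the lsmod output quoted in the report shows.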
