[mpich-discuss] mpich2 - checkpointing error

Marcelo Paiva Ramos marcelo.paiva at cptec.inpe.br
Mon Apr 7 06:26:37 CDT 2014


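Hi,
Can you help me solve this problem? Checkpointing with BLCR fails when I 
run a job under mpich-3.1. My setup and the error output are below.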

cat /etc/issue
CentOS release 6.5 (Final)
Kernel \r on an \m

uname -a
Linux server 2.6.32-431.11.2.el6.x86_64 #1 SMP Tue Mar 25 19:59:55 UTC 2014 x86_64 x86_64 x86_64 GNU/Linux

*INSTALL: blcr-0.8.5*
tar xzvf blcr-0.8.5.tar.gz
cd blcr-0.8.5
mkdir builddir
cd builddir
../configure --prefix=/opt/blcr
make
make install
/sbin/insmod /opt/blcr/lib/blcr/2.6.32-431.11.2.el6.x86_64/blcr_imports.ko
/sbin/insmod /opt/blcr/lib/blcr/2.6.32-431.11.2.el6.x86_64/blcr.ko
uname -r
2.6.32-431.11.2.el6.x86_64
lsmod | grep blcr
blcr                  115465  0
blcr_imports           10715  1 blcr
ldconfig -p | grep blcr
libcr_run.so.0 (libc6,x86-64) => /opt/blcr/lib/libcr_run.so.0
libcr_run.so (libc6,x86-64) => /opt/blcr/lib/libcr_run.so
libcr_omit.so.0 (libc6,x86-64) => /opt/blcr/lib/libcr_omit.so.0
libcr_omit.so (libc6,x86-64) => /opt/blcr/lib/libcr_omit.so
libcr.so.0 (libc6,x86-64) => /opt/blcr/lib/libcr.so.0
libcr.so (libc6,x86-64) => /opt/blcr/lib/libcr.so
chkconfig --list | grep blcr
blcr               0:off    1:off    2:on    3:on    4:on    5:on    6:off
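
The module check above can also be scripted. This is only a minimal sketch: check_blcr_modules is a hypothetical helper name (not part of BLCR), and it assumes lsmod-style text on stdin with the module name in the first column, as in the output above.

```shell
# check_blcr_modules: hypothetical helper, not shipped with BLCR.
# Reads lsmod-style text on stdin and returns 0 only if both the
# blcr and blcr_imports modules are listed in the first column.
check_blcr_modules() {
    awk '
        $1 == "blcr"         { core = 1 }
        $1 == "blcr_imports" { imports = 1 }
        END { exit !(core && imports) }
    '
}

# Usage on a live system:
#   lsmod | check_blcr_modules && echo "both BLCR modules are loaded"
```

Matching on the whole first field avoids counting the "blcr" that appears in the "used by" column of the blcr_imports line.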

*INSTALL: mpich-3.1*
tar xzvf mpich-3.1.tar.gz
cd mpich-3.1
./configure --disable-fast CFLAGS=-O2 FFLAGS=-O2 CXXFLAGS=-O2 \
    FCFLAGS=-O2 --prefix=/opt/mpich2/ CC=/opt/intel/bin/icc \
    FC=/opt/intel/bin/ifort F77=/opt/intel/bin/ifort --enable-checkpointing \
    --with-hydra-ckpointlib=blcr --with-blcr=/opt/blcr \
    --with-blcr-include=/opt/blcr/include --with-blcr-lib=/opt/blcr/lib
make
make install

*.bashrc*
export PATH=$PATH:/usr/local/sbin:/usr/local/bin:/usr/sbin:/usr/bin:/sbin:/bin:/opt/blcr/bin:/opt/mpich2/bin
export LD_LIBRARY_PATH=/usr/local/lib:/usr/local/lib64:/opt/intel/lib/intel64:/opt/blcr/lib:/opt/mpich2/lib

*mpiexec -info*
HYDRA build details:
     Version:                                 3.1
     Release Date:                            Thu Feb 20 11:41:13 CST 2014
     CC:                              /opt/intel/bin/icc  -O2
     CXX:                             g++  -O2
     F77:                             /opt/intel/bin/ifort -O2
     F90:                             /opt/intel/bin/ifort -O2
     Configure options: '--disable-option-checking' '--prefix=/opt/mpich2' '--disable-fast' 'CFLAGS=-O2 -O0' 'FFLAGS=-O2 -O0' 'CXXFLAGS=-O2 ' 'FCFLAGS=-O2 ' 'CC=/opt/intel/bin/icc' 'FC=/opt/intel/bin/ifort' 'F77=/opt/intel/bin/ifort' '--enable-checkpointing' '--with-hydra-ckpointlib=blcr' '--with-blcr=/opt/blcr' '--with-blcr-include=/opt/blcr/include' '--with-blcr-lib=/opt/blcr/lib' '--cache-file=/dev/null' '--srcdir=.' 'LDFLAGS= -L/opt/blcr/lib' 'LIBS=-lrt -lcr -lpthread ' 'CPPFLAGS= -I/root/mpich-3.1/src/mpl/include -I/root/mpich-3.1/src/mpl/include -I/root/mpich-3.1/src/openpa/src -I/root/mpich-3.1/src/openpa/src -I/root/mpich-3.1/src/mpi/romio/include -I/opt/blcr/include'
     Process Manager:                         pmi
     Launchers available:                     ssh rsh fork slurm ll lsf sge pbs manual persist
     Topology libraries available:            hwloc
     Resource management kernels available:   user slurm ll lsf sge pbs cobalt
     Checkpointing libraries available:       blcr
     Demux engines available:                 poll select


*ERROR*
mpiexec -n 1 -ckpointlib blcr -ckpoint-interval 20 -ckpoint-prefix /home/marcelo/TESTE/ ./teste
[proxy:0:0@server] requesting checkpoint
[proxy:0:0@server] checkpoint completed
[proxy:0:0@server] HYDT_ckpoint_blcr_checkpoint (tools/ckpoint/blcr/ckpoint_blcr.c:241): Checkpointing failed.  Make sure BLCR kernel module is loaded. Unknown error 2356
[proxy:0:0@server] ckpoint_thread (tools/ckpoint/ckpoint.c:76): blcr checkpoint returned error
[proxy:0:0@server] requesting checkpoint
[proxy:0:0@server] checkpoint completed
[proxy:0:0@server] HYDT_ckpoint_blcr_checkpoint (tools/ckpoint/blcr/ckpoint_blcr.c:241): Checkpointing failed.  Make sure BLCR kernel module is loaded. Unknown error 2356
[proxy:0:0@server] ckpoint_thread (tools/ckpoint/ckpoint.c:76): blcr checkpoint returned error
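
The log does not say why the checkpoint failed. One thing that can be ruled out independently of BLCR is whether the directory passed to -ckpoint-prefix exists and is writable by the user running mpiexec. The following is only a sketch of such a check: ckpoint_prefix_ok is a hypothetical helper (not part of MPICH or BLCR), and nothing here establishes that the directory is the cause of error 2356.

```shell
# ckpoint_prefix_ok: hypothetical helper, not part of MPICH or BLCR.
# Returns 0 if DIR exists, is a directory, and a file can actually be
# created in it by the current user; returns 1 otherwise.
ckpoint_prefix_ok() {
    dir=$1
    [ -d "$dir" ] || { echo "not a directory: $dir" >&2; return 1; }
    [ -w "$dir" ] || { echo "not writable: $dir" >&2; return 1; }
    # Create and remove a probe file, since -w alone can be misleading
    # (for example when running as root or on a read-only mount).
    probe=$(mktemp "$dir/ckpt-probe.XXXXXX") || return 1
    rm -f "$probe"
}

# Usage with the prefix from the command above:
#   ckpoint_prefix_ok /home/marcelo/TESTE/ && echo "prefix directory looks usable"
```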

Best regards,
Marcelo.