<div style="line-height:1.7;color:#000000;font-size:14px;font-family:Arial"><br>Well, thanks! I look forward to a version that supports fault tolerance.<br><br><br><br><div></div><div id="divNeteaseMailCard"></div><br>At 2014-08-18 09:58:59, "Bland, Wesley B." &lt;wbland@anl.gov&gt; wrote:<br> <blockquote id="isReplyContent" style="PADDING-LEFT: 1ex; MARGIN: 0px 0px 0px 0.8ex; BORDER-LEFT: #ccc 1px solid">
Unfortunately, you're correct that there isn't currently a solution. Until we fix that ticket, checkpointing does not work in MPICH. It's on the roadmap to be fixed, along with some new fault tolerance features, but it's not there
yet.
<div><br>
</div>
<div>Thanks,</div>
<div>Wesley<br>
<div><br>
<div>
<blockquote type="cite">
<div>On Aug 18, 2014, at 8:51 AM, myself &lt;<a href="mailto:chcdlf@126.com">chcdlf@126.com</a>&gt; wrote:</div>
<br class="Apple-interchange-newline">
<div>
<div style="line-height: 1.7; font-size: 14px; font-family: Arial;">
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">I tried to use BLCR with MPICH 3, but it does not seem to work. I compiled BLCR on CentOS, and `make test` reported no failing tests. Then I compiled MPICH
with BLCR support. The build information is as follows:</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;"><br>
</span></font></div>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">$ mpichversion </span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">MPICH Version: <span class="Apple-tab-span" style="white-space:pre">
</span>3.1.2</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">MPICH Release date:<span class="Apple-tab-span" style="white-space:pre">
</span>Mon Jul 21 16:00:21 CDT 2014</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">MPICH Device: <span class="Apple-tab-span" style="white-space:pre">
</span>ch3:nemesis</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">MPICH configure:
<span class="Apple-tab-span" style="white-space:pre"></span>--prefix=/home/test/develop/mpich3-blcr --with-device=ch3:nemesis CFLAGS=-fPIC --enable-checkpointing --with-blcr=/home/test/develop/blcr-0.8.5 --with-hydra-ckpointlib=blcr</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">MPICH CC:
<span class="Apple-tab-span" style="white-space:pre"></span>gcc -fPIC -O2</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">MPICH CXX:
<span class="Apple-tab-span" style="white-space:pre"></span>g++ -O2</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">MPICH F77:
<span class="Apple-tab-span" style="white-space:pre"></span>gfortran -O2</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">MPICH FC:
<span class="Apple-tab-span" style="white-space:pre"></span>gfortran -O2</span></font></div>
</blockquote>
<div><font color="#555555" face="Microsoft Yahei, verdana">
<div style="font-size: 12px; line-height: 19px;"><br>
</div>
<div style="font-size: 12px; line-height: 19px;">After that, I compiled my application like this:</div>
<div style="font-size: 12px; line-height: 19px;"><br>
</div>
</font></div>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div><font color="#555555" face="Microsoft Yahei, verdana">
<div style="font-size: 12px; line-height: 19px;">$ mpicc mpiblcr.c -o mpiblcr -lcr</div>
<div style="font-size: 12px; line-height: 19px;"><br>
</div>
</font></div>
</blockquote>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">When I first ran the application, it appeared to create the checkpoint files correctly (e.g. context-num0-0-0):</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;"><br>
</span></font></div>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">$ mpiexec -ckpointlib blcr -ckpoint-prefix `pwd` -ckpoint-interval 2 -n 2 ./mpiblcr</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">5411) Step 0</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">5410) Step 0</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">5410) Step 1</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">5411) Step 1</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">[proxy:0:0@node1] requesting checkpoint</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">[proxy:0:0@node1] checkpoint completed</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">5410) Step 2</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">
<div>5411) Step 2</div>
<div><br>
</div>
</span></font></div>
</blockquote>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">However, when I try to restart the processes from the checkpoint, it hangs and nothing is printed:</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;"><br>
</span></font></div>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div><font color="#555555" face="Microsoft Yahei, verdana">
<div><span style="font-size: 12px; line-height: 19px;">$ mpiexec -ckpointlib blcr -ckpoint-prefix `pwd` -n 2 -ckpoint-num 1</span></div>
<div><br>
</div>
</font></div>
</blockquote>
<div><font color="#555555" face="Microsoft Yahei, verdana">pstree shows that hydra_pmi_proxy has started the application process:</font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana"><br>
</font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana">
<div> ├─sshd─┬─3*[sshd───sshd───bash]</div>
<div> │ ├─sshd───sshd───bash───mpiexec───hydra_pmi_proxy───mpiblcr</div>
<div> │ └─sshd───sshd───bash───pstree</div>
<div><br>
</div>
</font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana">
<div style="font-size: 12px; line-height: 19px;">and `ps aux` shows that the process is defunct (a zombie):</div>
<div style="font-size: 12px; line-height: 19px;"><br>
</div>
</font></div>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;">
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">$ ps aux | grep mpiblcr</span></font></div>
<div><font color="#555555" face="Microsoft Yahei, verdana">
<div><span style="font-size: 12px; line-height: 19px;">test 15290 0.0 0.0 0 0 ? Z 21:44 0:00 [mpiblcr] <defunct></span></div>
<div><span style="font-size: 12px; line-height: 19px;"><br>
</span></div>
</font></div>
</blockquote>
<font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;">I don't know how to diagnose this problem. I also found that someone reported the same problem several years ago in
<a href="http://trac.mpich.org/projects/mpich/ticket/1144">#1144</a>, but no solution was given.<br>
</span></font>
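<div>One generic way to narrow down a hang like this is to find which process has not reaped the defunct child. The shell sketch below is a suggestion, not part of the original report: the function name <code>list_zombies</code> is hypothetical, and it assumes a Linux system with the procps <code>ps</code>.</div>

```shell
#!/bin/sh
# Print the ps header plus every zombie (defunct) process, with its parent PID.
# A zombie remains until its parent calls wait() on it; in the restart hang
# described above, the parent would presumably be hydra_pmi_proxy (assumption).
list_zombies() {
    ps -eo pid,ppid,stat,comm | awk 'NR == 1 || $3 ~ /^Z/'
}

list_zombies
```

<div>Running <code>ps -o pid,stat,comm -p &lt;PPID&gt;</code> on the reported parent PID would then show which process is holding the zombie and what state it is in.</div>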
<div><font color="#555555" face="Microsoft Yahei, verdana">
<div style="font-size: 12px; line-height: 19px;"><br>
</div>
</font></div>
<blockquote style="margin: 0 0 0 40px; border: none; padding: 0px;"><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;"><br>
</span></font></blockquote>
<div><font color="#555555" face="Microsoft Yahei, verdana"><span style="font-size: 12px; line-height: 19px;"><br>
</span></font></div>
</div>
<br>
<br>
<span title="neteasefooter"><span id="netease_mail_footer"></span></span>_______________________________________________<br>
discuss mailing list <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss">https://lists.mpich.org/mailman/listinfo/discuss</a></div>
</blockquote>
</div>
<br>
</div>
</div>
</blockquote></div><br><br><span title="neteasefooter"><span id="netease_mail_footer"></span></span>