<div dir="ltr">Hi Anatoly,<div><br></div><div>I think the problem may be the way you're aborting. MPICH catches the system abort() call and kills the entire application when it's called. Instead, I suggest calling MPI_Abort(MPI_COMM_WORLD, 1). That's what I use in my tests, and it works fine. It also seemed to work for your code when I tried it. I'll attach my modified version of your code. I switched it to C since I happened to have C++ support disabled on my local install, but that shouldn't change anything.</div>
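<div><br></div><div>To make the suggestion concrete, here is a minimal sketch of the pattern described in this thread. It is not the attached file. It assumes rank 0 is the master and rank 1 is the single slave, and that MPI_ERRORS_RETURN is set on MPI_COMM_WORLD so the master sees an error code rather than being aborted itself. The message tag, payload, and NUM_SENDS name are illustrative choices. Whether MPI_Wait actually returns an error after the slave dies depends on the MPICH and hydra versions in use.</div>

```c
#include <mpi.h>
#include <stdio.h>

#define NUM_SENDS 5 /* the slave sends 5 messages, as in the original description */

int main(int argc, char **argv)
{
    int rank, size;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* Assumption: return error codes to the caller instead of aborting,
     * so the master can notice that its peer has failed. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (size < 2) {
        fprintf(stderr, "run with at least 2 processes\n");
        MPI_Finalize();
        return 1;
    }

    if (rank == 0) {
        /* Master: post a receive and wait for it, over and over. */
        for (;;) {
            int value = 0;
            MPI_Request req;
            MPI_Status status;

            int rc = MPI_Irecv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            if (rc == MPI_SUCCESS)
                rc = MPI_Wait(&req, &status);

            if (rc != MPI_SUCCESS) {
                char msg[MPI_MAX_ERROR_STRING];
                int len = 0;
                MPI_Error_string(rc, msg, &len);
                fprintf(stderr, "master: receive failed: %s\n", msg);
                break; /* stop posting receives once the slave is gone */
            }
            printf("master: received %d\n", value);
        }
    } else if (rank == 1) {
        /* Slave: send a few messages, then simulate a failure. */
        for (int i = 0; i < NUM_SENDS; i++)
            MPI_Send(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);

        /* Use MPI_Abort rather than the system abort() call. */
        MPI_Abort(MPI_COMM_WORLD, 1);
    }

    /* Note: finalizing after a peer has failed may itself misbehave
     * on some MPI versions. */
    MPI_Finalize();
    return 0;
}
```

<div>Something like this would be built with mpicc and launched with two processes and the -disable-auto-cleanup option from your command line, so that hydra does not tear down the master when the slave exits.</div>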
<div><br></div><div><div>Thanks,</div><div>Wesley</div></div></div><div class="gmail_extra"><br><br><div class="gmail_quote">On Sun, May 18, 2014 at 5:18 AM, Anatoly G <span dir="ltr">&lt;<a href="mailto:anatolyrishon@gmail.com" target="_blank">anatolyrishon@gmail.com</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div>
<div dir="ltr">Dear MPICH2,
<div>Could you please help me understand fault tolerance in MPICH 3.0.4?</div>
<div>I have a simple MPI program:</div>
<div>Master: calls MPI_Irecv + MPI_Wait in a loop.</div>
<div>Single slave: calls MPI_Send 5 times, then calls abort.</div>
<div><br>
</div>
<div>When I run the program with the MPICH2 hydra, the master process prints a message about the slave's failure multiple times. With the MPICH3 hydra, I get a single message about the slave's failure, and then the master process waits forever for the next
Irecv to complete.<br>
</div>
<div>In both cases I compiled the program with MPICH 3.0.4.<br>
</div>
<div><br>
</div>
<div>In other words, with the MPICH2 hydra each Irecv completes (even if the slave died before the
Irecv was executed), but with the MPICH3 hydra it does not, so the wait on MPI_Irecv never returns.</div>
<div><br>
</div>
<div>If I compile the same program with MPICH2 and use the MPICH2 hydra, I get the same result as when compiling with MPICH 3.0.4 and running with the MPICH2 hydra.</div>
<div><br>
</div>
<div>Execution command:</div>
<div>
<div>mpiexec.hydra -genvall -disable-auto-cleanup -f MpiConfigMachines1.txt -launcher=rsh -n 2
mpi_irecv_ft_simple</div>
</div>
<div><br>
</div>
<div><br>
</div>
<div>Both hydras were configured with:</div>
<div> $ ./configure --prefix=/space/local/mpich-3.0.4/ --enable-error-checking=runtime --enable-g=dbg
CFLAGS=-fPIC CXXFLAGS=-fPIC
FFLAGS=-fpic --enable-threads=runtime --enable-totalview --enable-static --disable-f77 --disable-fc --no-recursion<br>
</div>
<div><br>
</div>
<div> $ ./configure --prefix=/space/local/mpich2-1.5b2/ --enable-error-checking=runtime --enable-g=dbg
CFLAGS=-fPIC CXXFLAGS=-fPIC
FFLAGS=-fpic --enable-threads=runtime --enable-totalview --enable-static --disable-f77 --disable-fc <br>
</div>
<div><br>
</div>
<div>Can you advise, please?</div>
<div><br>
</div>
<div>Regards,</div>
<div>Anatoly.</div>
</div>
</div>
</blockquote></div><br></div>