<div dir="ltr">Dear MPICH2,<div>Could you please help me understand fault tolerance in MPICH 3.0.4?</div><div>I have a simple MPI program:</div><div>The master calls MPI_Irecv followed by MPI_Wait in a loop.</div>
<div>A single slave calls MPI_Send 5 times and then calls abort.</div><div><br></div><div>When I run the program with the MPICH2 hydra, the master process prints a message about the slave failure multiple times. With the MPICH3 hydra, I get a single message about the slave failure, and then the master process waits forever for the next MPI_Irecv to complete.<br>
</div><div>In both cases I compiled the program with MPICH 3.0.4.<br></div><div><br></div><div>In other words, with the MPICH2 hydra every MPI_Irecv completes (even if the slave died before the MPI_Irecv was executed), but with the MPICH3 hydra it does not, so the wait on MPI_Irecv never returns.</div>
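<div>For reference, a minimal sketch of a program with the structure described above might look like the following. It is an illustrative reconstruction, not the exact source: the MAX_RECVS bound, the tag and payload, the use of MPI_ERRORS_RETURN, and the plain abort() call are all assumptions.</div>

```c
/* Hypothetical reconstruction of the test described above; not the original source. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

#define NUM_SENDS 5   /* the slave sends this many messages, then aborts */
#define MAX_RECVS 10  /* assumed bound so the master loop terminates */

int main(int argc, char **argv)
{
    int rank;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* Assumption: return error codes to the caller instead of aborting the job. */
    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);

    if (rank == 0) {
        /* Master: post a receive and wait for it, once per iteration. */
        for (int i = 0; i < MAX_RECVS; i++) {
            int value = -1;
            MPI_Request req;
            MPI_Status status;

            int rc = MPI_Irecv(&value, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
            if (rc == MPI_SUCCESS)
                rc = MPI_Wait(&req, &status);

            if (rc != MPI_SUCCESS) {
                char msg[MPI_MAX_ERROR_STRING];
                int len = 0;
                MPI_Error_string(rc, msg, &len);
                printf("master: receive %d failed: %s\n", i, msg);
            } else {
                printf("master: received %d\n", value);
            }
        }
    } else if (rank == 1) {
        /* Slave: send a few messages, then die without finalizing. */
        for (int i = 0; i < NUM_SENDS; i++)
            MPI_Send(&i, 1, MPI_INT, 0, 0, MPI_COMM_WORLD);
        abort();  /* assumption: simulates the crash; the original may differ */
    }

    /* Finalize may not complete cleanly once a peer has died. */
    MPI_Finalize();
    return 0;
}
```

<div>With a sketch like this, the difference I describe is whether the receives posted after the slave has died return an error to the master or never complete.</div>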
<div><br></div><div>If I compile the same program with MPICH2 and run it with the MPICH2 hydra, I get the same result as when compiling with MPICH 3.0.4 and running with the MPICH2 hydra.</div><div><br></div><div>Execution command:</div><div><div>
mpiexec.hydra -genvall -disable-auto-cleanup -f MpiConfigMachines1.txt -launcher=rsh -n 2 mpi_irecv_ft_simple</div>
</div><div><br></div><div><br></div><div>Both hydras were configured with:</div><div> $ ./configure --prefix=/space/local/mpich-3.0.4/ --enable-error-checking=runtime --enable-g=dbg CFLAGS=-fPIC CXXFLAGS=-fPIC FFLAGS=-fpic --enable-threads=runtime --enable-totalview --enable-static --disable-f77 --disable-fc --no-recursion<br>
</div><div><br></div><div> $ ./configure --prefix=/space/local/mpich2-1.5b2/ --enable-error-checking=runtime --enable-g=dbg CFLAGS=-fPIC CXXFLAGS=-fPIC FFLAGS=-fpic --enable-threads=runtime --enable-totalview --enable-static --disable-f77 --disable-fc <br>
</div><div><br></div><div>Could you please advise?</div><div><br></div><div>Regards,</div><div>Anatoly.</div></div>