<div dir="ltr"><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px">Dear MPICH team,</span><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br>
</span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px">I use MPICH2.</span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br>
</span></font><div><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px">I have the following configuration:</span></div><div><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px">The main application executes:</span></div>
<div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px">"mpiexec.hydra -genvall -disable-auto-cleanup -f MpiConfigMachines.txt -launcher=rsh -n 20 node"</span></font></div>
<div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br></span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px">When a single "node" process fails, I need to restart the whole system without restarting the main application process.</span></font></div>
<div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br></span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px">After a "node" process fails, I run some internal logic and then call MPI_Abort from the master process (rank 0) to abort all "node" processes. Then I send SIGTERM to mpiexec.hydra to terminate the Hydra process, and execute the same command again:</span></font></div>
<div><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px">"mpiexec.hydra -genvall -disable-auto-cleanup -f MpiConfigMachines.txt -launcher=rsh -n 20 node"</span><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br>
</span></font></div><div><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px"><br></span></div><div><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px">The problem:</span></div>
<div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px">Sometimes mpiexec.hydra stalls.</span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br>
</span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px">I execute:</span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><div>
::kill(mProcId, SIGTERM);</div><div>where mProcId is the process ID of mpiexec.hydra.</div></span></font><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br></span></font></div><div>
<font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px;white-space:pre">After that, the mpiexec.hydra process still exists, and sometimes hydra_pmi_proxy does too.</span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px;white-space:pre">If I use "kill -9" instead, mpiexec.hydra is always killed, and so is </span></font><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px;white-space:pre">hydra_pmi_proxy.</span></div>
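<div>For reference, the termination step above corresponds roughly to the following minimal C++ sketch. It assumes POSIX and that the main application starts mpiexec.hydra itself with fork()/exec*(), so mProcId is a direct child that has not been reaped yet. The names terminateLauncher, waitForExit and StopResult are hypothetical, and the timeouts are arbitrary. It sends SIGTERM first and escalates to SIGKILL (the equivalent of "kill -9") only if mpiexec.hydra has not exited within a grace period.</div>

```cpp
// Minimal POSIX sketch (assumption: `pid` is a direct, not-yet-reaped child of
// this process, e.g. mpiexec.hydra started with fork() + exec*()).
#include <cerrno>
#include <chrono>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <thread>
#include <unistd.h>

// Hypothetical result type for the helper below.
enum class StopResult { ExitedAfterSigterm, KilledWithSigkill, Failed };

// Polls waitpid(WNOHANG) until `pid` has exited and been reaped,
// or until `timeout` elapses. Returns true only if the child was reaped.
bool waitForExit(pid_t pid, std::chrono::milliseconds timeout) {
    const auto deadline = std::chrono::steady_clock::now() + timeout;
    for (;;) {
        int status = 0;
        const pid_t r = ::waitpid(pid, &status, WNOHANG);
        if (r == pid) return true;                    // exited and reaped
        if (r == -1 && errno != EINTR) return false;  // e.g. ECHILD: not our child
        if (std::chrono::steady_clock::now() >= deadline) return false;
        std::this_thread::sleep_for(std::chrono::milliseconds(20));
    }
}

// Hypothetical helper: ask the launcher to stop with SIGTERM, wait up to
// `grace`, and escalate to SIGKILL (which cannot be caught or ignored)
// only if it is still running afterwards.
StopResult terminateLauncher(pid_t pid, std::chrono::milliseconds grace) {
    if (::kill(pid, SIGTERM) == -1) return StopResult::Failed;  // e.g. ESRCH
    if (waitForExit(pid, grace)) return StopResult::ExitedAfterSigterm;
    if (::kill(pid, SIGKILL) == -1) return StopResult::Failed;
    return waitForExit(pid, std::chrono::seconds(5)) ? StopResult::KilledWithSigkill
                                                     : StopResult::Failed;
}
```

<div>For example, terminateLauncher(mProcId, std::chrono::seconds(3)) gives Hydra a few seconds to shut down after SIGTERM before forcing it. SIGKILL here only targets mpiexec.hydra itself; that hydra_pmi_proxy also goes away afterwards is what I observed, not something this sketch guarantees.</div>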
<div><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px;white-space:pre"><br></span></div><div><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px;white-space:pre">My question: is "kill -9" the recommended way to handle a "node" process failure?</span></div>
<div><span style="color:rgb(0,0,0);font-family:arial,sans-serif;font-size:12.666666984558105px;white-space:pre">If not, what is the recommended way?</span></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br>
</span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px">Regards,</span></font></div><div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px">Anatoly.</span></font></div>
<div><font color="#000000" face="arial, sans-serif"><span style="font-size:12.666666984558105px"><br></span></font></div></div></div>