[mpich-discuss] Using mpiexec.hydra with -disable-auto-cleanup.
Anatoly G
anatolyrishon at gmail.com
Sun Mar 30 03:44:37 CDT 2014
Dear MPICH team,
I use MPICH2.
I have the following configuration:
A Main application executes:
"mpiexec.hydra -genvall -disable-auto-cleanup -f MpiConfigMachines.txt
-launcher=rsh -n 20 node"
After a single "node" process fails, I need to restart the whole system
without restarting the Main application process.
After a "node" process fails, I execute some internal logic and then call
MPI_Abort from the master process (rank 0) to abort all "node" processes.
Then I send SIGTERM to mpiexec.hydra to terminate the hydra process, and
execute again:
"mpiexec.hydra -genvall -disable-auto-cleanup -f MpiConfigMachines.txt
-launcher=rsh -n 20 node"
The problem:
Sometimes mpiexec.hydra stalls.
I execute:
::kill(mProcId, SIGTERM)
where mProcId is the process ID of "mpiexec.hydra".
Afterwards the mpiexec.hydra process still exists, and sometimes
hydra_pmi_proxy does too.
If I execute "kill -9", mpiexec.hydra is always killed, and hydra_pmi_proxy too.
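The sequence described above (SIGTERM first, "kill -9" only if the process is still there) can be sketched as follows. This is an illustration under stated assumptions, not a recommendation from the MPICH team: it assumes a POSIX system and that the target is a direct child of the caller, so that waitpid can be used. The helper name stop_child, the grace period, and the 50 ms polling interval are hypothetical.

```cpp
#include <cerrno>
#include <chrono>
#include <thread>
#include <signal.h>
#include <sys/types.h>
#include <sys/wait.h>

// Hypothetical helper: ask a child process to terminate, escalating to
// SIGKILL if it has not exited within the grace period.
// Returns true once the child has been reaped, false on error.
bool stop_child(pid_t pid, std::chrono::milliseconds grace) {
    if (::kill(pid, SIGTERM) != 0 && errno != ESRCH) return false;

    // Poll without blocking until the child exits or the grace period ends.
    const auto deadline = std::chrono::steady_clock::now() + grace;
    int status = 0;
    while (std::chrono::steady_clock::now() < deadline) {
        pid_t r = ::waitpid(pid, &status, WNOHANG);
        if (r == pid) return true;                    // exited and reaped
        if (r == -1 && errno != EINTR) return false;  // not our child
        std::this_thread::sleep_for(std::chrono::milliseconds(50));
    }

    // Grace period expired: SIGKILL cannot be caught or ignored.
    ::kill(pid, SIGKILL);
    for (;;) {
        pid_t r = ::waitpid(pid, &status, 0);
        if (r == pid) return true;
        if (r == -1 && errno != EINTR) return false;
    }
}
```

Two general points about this sketch: a child that has exited but has not yet been waited for remains visible in the process table as a zombie, which is why the helper always reaps with waitpid; and because SIGKILL cannot be caught, a process stopped that way gets no chance to run its own cleanup.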
My question: is "kill -9" the recommended way to handle the failure of a
"node" process?
If not, what is the recommended way?
Regards,
Anatoly.