<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8"></head>
Hello!<br>
I'm new to MPI.<br>
I have a question about program execution.<br>
I built a program (for example, exp1) using mpicc and ran it on multiple hosts:<br>
mpiexec -f my_hosts -n 8 ./exp1<br>
Four exp1 processes run on host A (ranks 0-3) and four on host B (ranks 4-7).<br>
If one of them crashes, all the others are terminated too. mpiexec prints:<br>
===================================================================================<br>
= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<br>
= PID 5774 RUNNING AT slave1<br>
= EXIT CODE: 11<br>
= CLEANING UP REMAINING PROCESSES<br>
= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<br>
===================================================================================<br>
<br>
If I understand correctly, the only errors I can handle in my code are those returned by MPI functions.<br>
In my project, I need the other processes to keep running when one process terminates. For example, if the slave node loses power, the processes on the master node should keep running. The master node should detect that the processes on the slave node have terminated and, after some time, restart them on the slave node.<br>
Is this possible? If so, how?<br>
<br>
Big thanks!</html>