[mpich-discuss] Fault tolerance after MPI_Comm_connect/accept
Jim Dinan
dinan at mcs.anl.gov
Tue Mar 5 09:33:31 CST 2013
Hi Matthieu,
I created an MPI Forum ticket for this:
https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/365
In terms of what is guaranteed by the standard, the behavior is
undefined. In terms of what MPICH will do, I am not sure, although my
guess is that current MPICH will be unable to continue working after
such a failure. You may need to do some testing or read the code to
find out.
Cheers,
~Jim.
On 3/4/13 7:56 AM, Matthieu Dorier wrote:
> Hi,
>
> I'm connecting two MPI applications A and B using MPI_Comm_accept in A
> and MPI_Comm_connect in B. I would like to know a bit more about the
> behavior in case one application stops (say B): will a communication
> attempt (e.g. MPI_Send) from a process from A to a process from B crash?
> return an error? block?
> Is there a way for application A to notice that B has stopped in order
> to avoid communicating with it?
>
> Thanks.
>
> PS: by the way, for whoever is involved in the MPI Forum, an
> MPI_Comm_iaccept in the MPI-3 standard would have been useful. Something
> to keep in mind for the next version maybe ;)
>
> Matthieu Dorier
> PhD student at ENS Cachan Brittany and IRISA
> http://people.irisa.fr/Matthieu.Dorier
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>