[mpich-discuss] Fault tolerance after MPI_Comm_connect/accept

Jim Dinan dinan at mcs.anl.gov
Tue Mar 5 09:33:31 CST 2013


Hi Matthieu,

I created an MPI Forum ticket for this:

https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/365

In terms of what is guaranteed by the standard, the behavior is
undefined.  In terms of what MPICH will do, I am not sure, although my
guess is that current MPICH will be unable to continue working after
such a failure.  You may need to do some testing or read the code to
find out.

Cheers,
  ~Jim.

On 3/4/13 7:56 AM, Matthieu Dorier wrote:
> Hi,
>
> I'm connecting two MPI applications A and B using MPI_Comm_accept in A
> and MPI_Comm_connect in B. I would like to know a bit more about the
> behavior in case one application stops (say B): will a communication
> attempt (e.g. MPI_Send) from a process from A to a process from B crash?
> return an error? block?
> Is there a way for application A to notice that B has stopped in order
> to avoid communicating with it?
>
> Thanks.
>
> PS: by the way, for those involved in the MPI Forum, an
> MPI_Comm_iaccept in the MPI-3 standard would have been useful. Something
> to keep in mind for the next version maybe ;)
>
> Matthieu Dorier
> PhD student at ENS Cachan Brittany and IRISA
> http://people.irisa.fr/Matthieu.Dorier


