[mpich-discuss] Fault tolerance after MPI_Comm_connect/accept

Jim Dinan dinan at mcs.anl.gov
Tue Mar 5 10:34:29 CST 2013


Hi Matthieu,

Please ignore my earlier comments on MPICH fault tolerance (FT); my
information is clearly out of date.  It sounds like what you're looking
for should be fully supported.  :)

  ~Jim.

On 3/5/13 9:42 AM, Matthieu Dorier wrote:
> Alright, thanks for the answer (and for the ticket).
> Cheers,
>
> Matthieu
>
> ----- Original message -----
>> From: "Jim Dinan" <dinan at mcs.anl.gov>
>> To: discuss at mpich.org
>> Sent: Tuesday, March 5, 2013 16:33:31
>> Subject: Re: [mpich-discuss] Fault tolerance after MPI_Comm_connect/accept
>>
>> Hi Matthieu,
>>
>> I created an MPI Forum ticket for this:
>>
>> https://svn.mpi-forum.org/trac/mpi-forum-web/ticket/365
>>
>> In terms of what is guaranteed by the standard, the behavior is
>> undefined.  In terms of what MPICH will do, I am not sure, although my
>> guess is that current MPICH will be unable to continue working after
>> such a failure.  You may need to do some testing or read the code to
>> find out.
>>
>> Cheers,
>>    ~Jim.
>>
>> On 3/4/13 7:56 AM, Matthieu Dorier wrote:
>>> Hi,
>>>
>>> I'm connecting two MPI applications, A and B, using MPI_Comm_accept in
>>> A and MPI_Comm_connect in B. I would like to know a bit more about the
>>> behavior when one application stops (say B): will a communication
>>> attempt (e.g. MPI_Send) from a process in A to a process in B crash,
>>> return an error, or block?
>>> Is there a way for application A to notice that B has stopped, in
>>> order to avoid communicating with it?
>>>
>>> Thanks.
>>>
>>> PS: by the way, for anyone involved in the MPI Forum, an
>>> MPI_Comm_iaccept in the MPI-3 standard would have been useful.
>>> Something to keep in mind for the next version, maybe ;)
>>>
>>> Matthieu Dorier
>>> PhD student at ENS Cachan Brittany and IRISA
>>> http://people.irisa.fr/Matthieu.Dorier
>>>
>>>
>>> _______________________________________________
>>> discuss mailing list     discuss at mpich.org
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>
