[mpich-discuss] Fault tolerance after MPI_Comm_connect/accept

Pavan Balaji balaji at mcs.anl.gov
Tue Mar 5 10:43:06 CST 2013


On 03/05/2013 10:34 AM US Central Time, Jim Dinan wrote:
> You should ignore my comments on MPICH FT.  My info is clearly
> out-of-date.  It sounds like what you're looking for should be fully
> supported.  :)

Well, almost :-).

Some things could not be done cleanly while staying within MPI-3.  For
example, a wildcard receive (one posted with MPI_ANY_SOURCE) will always
return an error if any process in the communicator is dead.  The MPI
Forum is working on fixing this by allowing the user to "opt in" to this
wildcard behavior.  I'm a few months behind on the MPI-3.1
fault-tolerance proposal, but I believe the opt-in is still part of it.

 -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


