[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.1.2-53-g29d4c54

Service Account noreply at mpich.org
Thu Jul 31 09:39:54 CDT 2014


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".

The branch, master has been updated
       via  29d4c54f7a0adba04464f3cfc2ba917a3dc3f8b5 (commit)
       via  8aed7f12d7e5bbb679638f09f9722a7a14479cd8 (commit)
       via  2bdc3116b520f75856606033661fe4bece971e75 (commit)
       via  6b8cee7071209d8aacb6a88e797ac3afad234322 (commit)
       via  05cb62bdca14de1fff9b3599e901efffd8c31088 (commit)
       via  1f0ee13674dff8ad44c647c4748e677c7fcfb756 (commit)
       via  5be10ce97cdf586ccaa5ab86f29d3827bb215056 (commit)
       via  ee5173e396f43adb3dd0660e59e7b2e19cb856c5 (commit)
       via  57f6ee88801fd9d2959cc133fe4bb10b25848f4f (commit)
       via  628d2daf99811e7a426c08f3726ec8072d927178 (commit)
       via  5c71c3a8bf633063445cdc29b19f1c1104527bb9 (commit)
       via  39b958059fbd50f05d92190ce8eb507437a4878e (commit)
       via  8652e0ade03c6b5a8dcc8205a1d978413471f130 (commit)
       via  665ced285ab9e2f655852c901b9a819f6390474e (commit)
       via  782d036c4f898f786bb3a4f90b02f5d99971d9c6 (commit)
       via  3325b6f7b416647a7c66878a71cac19708096c8a (commit)
       via  ed98c9834b6b827eaea970616590e8095d0ef418 (commit)
       via  6ce715477e725d550af675fbd10cc3b2ff0c615c (commit)
       via  c83eddd9138b21558a618457910583cf7c1ba321 (commit)
       via  b68657dcaff19cd0a164f75f31eace6ef64d324b (commit)
      from  edd6daa5c47dfa19acf5836f2781d1c564116e37 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/29d4c54f7a0adba04464f3cfc2ba917a3dc3f8b5

commit 29d4c54f7a0adba04464f3cfc2ba917a3dc3f8b5
Author: Wesley Bland <wbland at anl.gov>
Date:   Wed Jul 30 16:11:31 2014 -0500

    Mark new tests as xfail
    
    The new tests don't pass yet due to some corner cases. However, we need to go
    ahead and push this into master, so they'll be xfail for now. This will get
    picked up as part of #1945.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/test/mpi/ft/testlist b/test/mpi/ft/testlist
index 5e8b54b..cdc0168 100644
--- a/test/mpi/ft/testlist
+++ b/test/mpi/ft/testlist
@@ -11,7 +11,7 @@ gather 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=f
 reduce 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 bcast 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 scatter 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
-anysource 3 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10
-revoke_nofail 4 mpiexecarg=-disable-auto-cleanup resultsTest=TestStatusNoErrors strict=false timelimit=10
-shrink 8 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10
-agree 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10
+anysource 3 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
+revoke_nofail 4 mpiexecarg=-disable-auto-cleanup resultsTest=TestStatusNoErrors strict=false timelimit=10 xfail=ticket1945
+shrink 8 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
+agree 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945

http://git.mpich.org/mpich.git/commitdiff/8aed7f12d7e5bbb679638f09f9722a7a14479cd8

commit 8aed7f12d7e5bbb679638f09f9722a7a14479cd8
Author: Wesley Bland <wbland at anl.gov>
Date:   Wed Jul 23 10:10:37 2014 -0500

    CH3_ISENDV must correctly set mpi_errno
    
    Previously, CH3_ISENDV was only setting the error value in the send request.
    This isn't enough because upper level functions don't check that. They check
    only the return value, which wasn't set at all.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/mpid/ch3/channels/nemesis/src/ch3_isendv.c b/src/mpid/ch3/channels/nemesis/src/ch3_isendv.c
index d5fe393..15ac7c8 100644
--- a/src/mpid/ch3/channels/nemesis/src/ch3_isendv.c
+++ b/src/mpid/ch3/channels/nemesis/src/ch3_isendv.c
@@ -31,6 +31,7 @@ int MPIDI_CH3_iSendv (MPIDI_VC_t *vc, MPID_Request *sreq, MPID_IOV *iov, int n_i
     if (vc->state == MPIDI_VC_STATE_MORIBUND) {
         sreq->status.MPI_ERROR = MPI_SUCCESS;
         MPIU_ERR_SET1(sreq->status.MPI_ERROR, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+        MPIU_ERR_SET1(mpi_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
         MPIDI_CH3U_Request_complete(sreq);
         goto fn_fail;
     }
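
Editorial sketch of the pattern this commit describes: a send routine that reports a failure both in the request status and in its own return value, so callers that only look at the return code still see the error. Everything below (the `sketch_` names, the error constants, the request struct) is a hypothetical stand-in for illustration and is not MPICH code.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes; not the MPICH constants. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_PROC_FAILED = 1 };

/* Hypothetical, minimal request object. */
typedef struct {
    int status_error; /* error recorded in the request status */
    int completed;    /* nonzero once the request is marked complete */
} sketch_request;

/* Start a send. If the peer is known to be dead, record the error in the
 * request AND return it, because callers may only inspect the return value. */
int sketch_isendv(int peer_is_dead, sketch_request *sreq)
{
    int err = SKETCH_SUCCESS;

    assert(sreq != NULL);
    sreq->status_error = SKETCH_SUCCESS;

    if (peer_is_dead) {
        sreq->status_error = SKETCH_ERR_PROC_FAILED; /* already done before the fix */
        err = SKETCH_ERR_PROC_FAILED;                /* analogous to the added line */
        sreq->completed = 1;
    }
    return err;
}
```

With only the first assignment in the failure branch, the function would return `SKETCH_SUCCESS` even though the request carries an error, which is the situation the commit message describes.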

http://git.mpich.org/mpich.git/commitdiff/2bdc3116b520f75856606033661fe4bece971e75

commit 2bdc3116b520f75856606033661fe4bece971e75
Author: Wesley Bland <wbland at anl.gov>
Date:   Fri Jul 18 10:45:40 2014 -0500

    Add some basic resilience to allred_group
    
    If a process is dead, collectives still do all of the communications to
    prevent a deadlock. However, if we just skip the part where the data is
    updated in the allreduce_group function, we can let it be slightly more
    resilient to failures and possibly even produce a correct answer in the
    presence of a failure.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/mpi/coll/allred_group.c b/src/mpi/coll/allred_group.c
index 396fb76..5db6b6a 100644
--- a/src/mpi/coll/allred_group.c
+++ b/src/mpi/coll/allred_group.c
@@ -165,25 +165,26 @@ int MPIR_Allreduce_group_intra(void *sendbuf, void *recvbuf, int count,
                     *errflag = TRUE;
                     MPIU_ERR_SET(mpi_errno, MPI_ERR_OTHER, "**fail");
                     MPIU_ERR_ADD(mpi_errno_ret, mpi_errno);
-                }
-
-                /* tmp_buf contains data received in this step.
-                   recvbuf contains data accumulated so far */
-
-                if (is_commutative  || (dst < group_rank)) {
-                    /* op is commutative OR the order is already right */
-                    mpi_errno = MPIR_Reduce_local_impl(tmp_buf, recvbuf, count, datatype, op);
-                    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-                }
-                else {
-                    /* op is noncommutative and the order is not right */
-                    mpi_errno = MPIR_Reduce_local_impl(recvbuf, tmp_buf, count, datatype, op);
-                    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
-                    /* copy result back into recvbuf */
-                    mpi_errno = MPIR_Localcopy(tmp_buf, count, datatype,
-                                               recvbuf, count, datatype);
-                    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+                } else {
+
+                    /* tmp_buf contains data received in this step.
+                       recvbuf contains data accumulated so far */
+
+                    if (is_commutative  || (dst < group_rank)) {
+                        /* op is commutative OR the order is already right */
+                        mpi_errno = MPIR_Reduce_local_impl(tmp_buf, recvbuf, count, datatype, op);
+                        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+                    }
+                    else {
+                        /* op is noncommutative and the order is not right */
+                        mpi_errno = MPIR_Reduce_local_impl(recvbuf, tmp_buf, count, datatype, op);
+                        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+                        /* copy result back into recvbuf */
+                        mpi_errno = MPIR_Localcopy(tmp_buf, count, datatype,
+                                recvbuf, count, datatype);
+                        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+                    }
                 }
                 mask <<= 1;
             }
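
Editorial sketch of the idea in this commit: when the exchange for one step of the reduction fails, flag the error but leave the accumulated buffer untouched, and fold in the received data only when the exchange succeeded. The function name, the integer-sum reduction, and the flag convention are assumptions made for illustration; the real code calls MPIR_Reduce_local_impl and also handles noncommutative operations.

```c
#include <assert.h>
#include <stddef.h>

/* One step of a hypothetical reduction loop.
 * exchange_failed: nonzero if the send/receive for this step reported an error.
 * tmp_buf:         data received in this step (contents undefined on failure).
 * acc:             data accumulated so far.
 * Returns 0 on success, 1 if the step was skipped because of a failure. */
int sketch_reduce_step(int exchange_failed, const int *tmp_buf, int *acc,
                       size_t count, int *errflag)
{
    size_t i;

    assert(acc != NULL && errflag != NULL);

    if (exchange_failed) {
        *errflag = 1; /* remember the failure, but do not touch acc */
        return 1;
    }

    assert(tmp_buf != NULL);
    for (i = 0; i < count; i++)
        acc[i] += tmp_buf[i]; /* commutative op, so operand order is irrelevant */
    return 0;
}
```

The caller would keep iterating over the remaining steps either way, matching the commit message's point that communication continues so that no process deadlocks.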

http://git.mpich.org/mpich.git/commitdiff/6b8cee7071209d8aacb6a88e797ac3afad234322

commit 6b8cee7071209d8aacb6a88e797ac3afad234322
Author: Wesley Bland <wbland at anl.gov>
Date:   Sat May 10 14:39:26 2014 -0500

    Don't check the tag if the coll op completed with error
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/mpi/coll/helper_fns.c b/src/mpi/coll/helper_fns.c
index cf10a9e..1ee252c 100644
--- a/src/mpi/coll/helper_fns.c
+++ b/src/mpi/coll/helper_fns.c
@@ -367,7 +367,7 @@ int MPIC_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag,
         if (MPIR_TAG_CHECK_ERROR_BIT(status->MPI_TAG)) {
             *errflag = TRUE;
             MPIR_TAG_CLEAR_ERROR_BIT(status->MPI_TAG);
-        } else {
+        } else if (MPIX_ERR_REVOKED != MPIR_ERR_GET_CLASS(status->MPI_ERROR)) {
             MPIU_Assert(status->MPI_TAG == tag);
         }
     }
@@ -486,11 +486,11 @@ int MPIC_Sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
         if (MPIR_TAG_CHECK_ERROR_BIT(status->MPI_TAG)) {
             *errflag = TRUE;
             MPIR_TAG_CLEAR_ERROR_BIT(status->MPI_TAG);
-        } else {
+        } else if (MPIX_ERR_REVOKED != MPIR_ERR_GET_CLASS(status->MPI_ERROR)) {
             MPIU_Assert(status->MPI_TAG == recvtag);
         }
     }
-    
+
  fn_exit:
     MPIU_DBG_MSG_S(PT2PT, TYPICAL, "OUT: errflag = %s", *errflag?"TRUE":"FALSE");
 
@@ -602,7 +602,7 @@ int MPIC_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
         if (MPIR_TAG_CHECK_ERROR_BIT(status->MPI_TAG)) {
             *errflag = TRUE;
             MPIR_TAG_CLEAR_ERROR_BIT(status->MPI_TAG);
-        } else {
+        } else if (MPIX_ERR_REVOKED != MPIR_ERR_GET_CLASS(status->MPI_ERROR)) {
             MPIU_Assert(status->MPI_TAG == recvtag);
         }
     }
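
Editorial sketch of the decision the three hunks above share: if the received tag carries an error bit, set the error flag and clear the bit; otherwise compare the tag only when the operation did not complete with a "revoked" error class. The bit position, the error-class constants, and the function name are hypothetical and chosen only for this illustration.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical constants; not the MPICH values. */
enum { SKETCH_TAG_ERR_BIT = 1 << 30 };
enum { SKETCH_CLASS_OK = 0, SKETCH_CLASS_REVOKED = 1 };

/* Inspect a completed receive.
 * Returns 1 if the caller should go on to assert that *status_tag equals the
 * expected tag, 0 if the comparison must be skipped. */
int sketch_should_check_tag(int *status_tag, int error_class, int *errflag)
{
    assert(status_tag != NULL && errflag != NULL);

    if (*status_tag & SKETCH_TAG_ERR_BIT) {
        *errflag = 1;                       /* peer signalled an error */
        *status_tag &= ~SKETCH_TAG_ERR_BIT; /* restore the plain tag */
        return 0;
    }
    /* With a revoked error class the status tag may not be meaningful. */
    return error_class != SKETCH_CLASS_REVOKED;
}
```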

http://git.mpich.org/mpich.git/commitdiff/05cb62bdca14de1fff9b3599e901efffd8c31088

commit 05cb62bdca14de1fff9b3599e901efffd8c31088
Author: Wesley Bland <wbland at anl.gov>
Date:   Fri May 2 10:17:34 2014 -0500

    Change MPID_Comm_valid_ptr to optionally ignore revoke
    
    Adds a third parameter, ignore_rev, to MPID_Comm_valid_ptr that controls
    whether the macro ignores the communicator's revoke flag.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
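
Editorial sketch of a validity check with an optional "ignore revoke" flag, written as a plain function rather than a macro. The struct, the error codes, and the function name are hypothetical. Checking for NULL before any field access is a choice of this sketch, not a statement about the macro in the diff.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical error codes and communicator object. */
enum { SKETCH_OK = 0, SKETCH_ERR_COMM = 1, SKETCH_ERR_REVOKED = 2 };

typedef struct {
    int refcount; /* <= 0 means the object was freed or never valid */
    int revoked;  /* nonzero once the communicator has been revoked */
} sketch_comm;

/* Validate a communicator. When ignore_rev is nonzero, a revoked communicator
 * is still accepted (for example, for attribute access); otherwise it is
 * reported as revoked (for example, before starting a collective). */
int sketch_comm_check(const sketch_comm *comm, int ignore_rev)
{
    if (comm == NULL || comm->refcount <= 0)
        return SKETCH_ERR_COMM;
    if (comm->revoked && !ignore_rev)
        return SKETCH_ERR_REVOKED;
    return SKETCH_OK;
}
```

This mirrors the call sites in the diff, where the attribute functions pass TRUE and the collective entry points pass FALSE.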

diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index 79a0bc8..882e366 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -494,14 +494,13 @@ int MPIU_Handle_free( void *((*)[]), int );
    for now */
 /* ticket #1441: check (refcount<=0) to cover the case of 0, an "over-free" of
  * -1 or similar, and the 0xecec... case when --enable-g=mem is used */
-#define MPID_Comm_valid_ptr(ptr,err) {                \
+#define MPID_Comm_valid_ptr(ptr,err,ignore_rev) {     \
      MPID_Valid_ptr_class(Comm,ptr,MPI_ERR_COMM,err); \
      if ((ptr) && MPIU_Object_get_ref(ptr) <= 0) {    \
          MPIU_ERR_SET(err,MPI_ERR_COMM,"**comm");     \
          ptr = 0;                                     \
-     } else if (ptr->revoked) {                       \
+     } else if (ptr->revoked && !ignore_rev) {        \
          MPIU_ERR_SET(err,MPIX_ERR_REVOKED,"**comm"); \
-         ptr = 0;                                     \
      }                                                \
 }
 #define MPID_Group_valid_ptr(ptr,err) MPID_Valid_ptr_class(Group,ptr,MPI_ERR_GROUP,err)
diff --git a/src/mpi/attr/attr_delete.c b/src/mpi/attr/attr_delete.c
index 7594dcc..1e624f9 100644
--- a/src/mpi/attr/attr_delete.c
+++ b/src/mpi/attr/attr_delete.c
@@ -86,7 +86,7 @@ int MPI_Attr_delete(MPI_Comm comm, int keyval)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             /* Validate keyval_ptr */
 	    MPID_Keyval_valid_ptr( keyval_ptr, mpi_errno );
diff --git a/src/mpi/attr/attr_get.c b/src/mpi/attr/attr_get.c
index f02ac02..eee5394 100644
--- a/src/mpi/attr/attr_get.c
+++ b/src/mpi/attr/attr_get.c
@@ -112,7 +112,7 @@ int MPI_Attr_get(MPI_Comm comm, int keyval, void *attribute_val, int *flag)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(attribute_val, "attribute_val", mpi_errno);
 	    MPIR_ERRTEST_ARGNULL(flag, "flag", mpi_errno);
diff --git a/src/mpi/attr/attr_put.c b/src/mpi/attr/attr_put.c
index 91b74fb..89987fd 100644
--- a/src/mpi/attr/attr_put.c
+++ b/src/mpi/attr/attr_put.c
@@ -104,7 +104,7 @@ int MPI_Attr_put(MPI_Comm comm, int keyval, void *attribute_val)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/attr/comm_delete_attr.c b/src/mpi/attr/comm_delete_attr.c
index 683960e..c184d43 100644
--- a/src/mpi/attr/comm_delete_attr.c
+++ b/src/mpi/attr/comm_delete_attr.c
@@ -140,7 +140,7 @@ int MPI_Comm_delete_attr(MPI_Comm comm, int comm_keyval)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             /* Validate keyval_ptr */
 	    MPID_Keyval_valid_ptr( keyval_ptr, mpi_errno );
diff --git a/src/mpi/attr/comm_get_attr.c b/src/mpi/attr/comm_get_attr.c
index 0cc2c56..28a8e19 100644
--- a/src/mpi/attr/comm_get_attr.c
+++ b/src/mpi/attr/comm_get_attr.c
@@ -81,7 +81,7 @@ int MPIR_CommGetAttr( MPI_Comm comm, int comm_keyval, void *attribute_val,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(attribute_val, "attr_val", mpi_errno);
 	    MPIR_ERRTEST_ARGNULL(flag, "flag", mpi_errno);
diff --git a/src/mpi/attr/comm_set_attr.c b/src/mpi/attr/comm_set_attr.c
index 71b4e6b..2aa9457 100644
--- a/src/mpi/attr/comm_set_attr.c
+++ b/src/mpi/attr/comm_set_attr.c
@@ -142,7 +142,7 @@ int MPIR_CommSetAttr( MPI_Comm comm, int comm_keyval, void *attribute_val,
             MPID_Keyval *keyval_ptr = NULL;
 
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    /* Validate keyval_ptr */
             MPID_Keyval_get_ptr( comm_keyval, keyval_ptr );
@@ -249,7 +249,7 @@ int MPI_Comm_set_attr(MPI_Comm comm, int comm_keyval, void *attribute_val)
             MPID_Keyval *keyval_ptr = NULL;
 
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    /* Validate keyval_ptr */
             MPID_Keyval_get_ptr( comm_keyval, keyval_ptr );
diff --git a/src/mpi/coll/allgather.c b/src/mpi/coll/allgather.c
index 216e601..31df564 100644
--- a/src/mpi/coll/allgather.c
+++ b/src/mpi/coll/allgather.c
@@ -933,7 +933,7 @@ int MPI_Allgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
         {
             MPID_Datatype *recvtype_ptr=NULL, *sendtype_ptr=NULL;
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (comm_ptr->comm_kind == MPID_INTERCOMM) {
diff --git a/src/mpi/coll/allgatherv.c b/src/mpi/coll/allgatherv.c
index fe12f92..674ba31 100644
--- a/src/mpi/coll/allgatherv.c
+++ b/src/mpi/coll/allgatherv.c
@@ -1045,7 +1045,7 @@ int MPI_Allgatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
             MPID_Datatype *recvtype_ptr=NULL, *sendtype_ptr=NULL;
             int i, comm_size;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
 	    if (comm_ptr->comm_kind == MPID_INTERCOMM)
diff --git a/src/mpi/coll/allreduce.c b/src/mpi/coll/allreduce.c
index bc55aaf..2d6dee5 100644
--- a/src/mpi/coll/allreduce.c
+++ b/src/mpi/coll/allreduce.c
@@ -851,7 +851,7 @@ int MPI_Allreduce(const void *sendbuf, void *recvbuf, int count,
             MPID_Datatype *datatype_ptr = NULL;
             MPID_Op *op_ptr = NULL;
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
 	    MPIR_ERRTEST_DATATYPE(datatype, "datatype", mpi_errno);
diff --git a/src/mpi/coll/alltoall.c b/src/mpi/coll/alltoall.c
index 3c6ed8b..0cae194 100644
--- a/src/mpi/coll/alltoall.c
+++ b/src/mpi/coll/alltoall.c
@@ -836,7 +836,7 @@ int MPI_Alltoall(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
         {
 	    MPID_Datatype *sendtype_ptr=NULL, *recvtype_ptr=NULL;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (sendbuf != MPI_IN_PLACE) {
diff --git a/src/mpi/coll/alltoallv.c b/src/mpi/coll/alltoallv.c
index edfe401..713b944 100644
--- a/src/mpi/coll/alltoallv.c
+++ b/src/mpi/coll/alltoallv.c
@@ -471,7 +471,7 @@ int MPI_Alltoallv(const void *sendbuf, const int *sendcounts,
             int i, comm_size;
             int check_send = (comm_ptr->comm_kind == MPID_INTRACOMM && sendbuf != MPI_IN_PLACE);
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/coll/alltoallw.c b/src/mpi/coll/alltoallw.c
index 11758bb..31e7a1e 100644
--- a/src/mpi/coll/alltoallw.c
+++ b/src/mpi/coll/alltoallw.c
@@ -466,7 +466,7 @@ int MPI_Alltoallw(const void *sendbuf, const int sendcounts[],
             int i, comm_size;
             int check_send;
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             check_send = (comm_ptr->comm_kind == MPID_INTRACOMM && sendbuf != MPI_IN_PLACE);
diff --git a/src/mpi/coll/barrier.c b/src/mpi/coll/barrier.c
index 0e1ff06..96813d8 100644
--- a/src/mpi/coll/barrier.c
+++ b/src/mpi/coll/barrier.c
@@ -403,7 +403,7 @@ int MPI_Barrier( MPI_Comm comm )
         MPID_BEGIN_ERROR_CHECKS;
         {
 	    /* Validate communicator */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
diff --git a/src/mpi/coll/bcast.c b/src/mpi/coll/bcast.c
index 38f3757..836dd4d 100644
--- a/src/mpi/coll/bcast.c
+++ b/src/mpi/coll/bcast.c
@@ -1564,7 +1564,7 @@ int MPI_Bcast( void *buffer, int count, MPI_Datatype datatype, int root,
         {
             MPID_Datatype *datatype_ptr = NULL;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
 	    MPIR_ERRTEST_DATATYPE(datatype, "datatype", mpi_errno);
diff --git a/src/mpi/coll/exscan.c b/src/mpi/coll/exscan.c
index 573770f..a83a543 100644
--- a/src/mpi/coll/exscan.c
+++ b/src/mpi/coll/exscan.c
@@ -338,7 +338,7 @@ int MPI_Exscan(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datat
             MPID_Op *op_ptr = NULL;
             int rank;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIR_ERRTEST_COMM_INTRA(comm_ptr, mpi_errno);
diff --git a/src/mpi/coll/gather.c b/src/mpi/coll/gather.c
index d31af10..cb87ef2 100644
--- a/src/mpi/coll/gather.c
+++ b/src/mpi/coll/gather.c
@@ -794,7 +794,7 @@ int MPI_Gather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
 	    MPID_Datatype *sendtype_ptr=NULL, *recvtype_ptr=NULL;
 	    int rank;
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
 	    if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/coll/gatherv.c b/src/mpi/coll/gatherv.c
index 7134af3..48a265c 100644
--- a/src/mpi/coll/gatherv.c
+++ b/src/mpi/coll/gatherv.c
@@ -319,7 +319,7 @@ int MPI_Gatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
 	    MPID_Datatype *sendtype_ptr=NULL, *recvtype_ptr=NULL;
             int i, rank, comm_size;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
 	    if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/coll/iallgather.c b/src/mpi/coll/iallgather.c
index 38c9b32..18d798c 100644
--- a/src/mpi/coll/iallgather.c
+++ b/src/mpi/coll/iallgather.c
@@ -695,7 +695,7 @@ int MPI_Iallgather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (sendbuf != MPI_IN_PLACE && HANDLE_GET_KIND(sendtype) != HANDLE_KIND_BUILTIN) {
                 MPID_Datatype *sendtype_ptr = NULL;
                 MPID_Datatype_get_ptr(sendtype, sendtype_ptr);
diff --git a/src/mpi/coll/iallgatherv.c b/src/mpi/coll/iallgatherv.c
index 8fd90a2..5d5fd15 100644
--- a/src/mpi/coll/iallgatherv.c
+++ b/src/mpi/coll/iallgatherv.c
@@ -798,7 +798,7 @@ int MPI_Iallgatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype, v
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (sendbuf != MPI_IN_PLACE) {
diff --git a/src/mpi/coll/iallreduce.c b/src/mpi/coll/iallreduce.c
index e2b2e8b..3ff69f4 100644
--- a/src/mpi/coll/iallreduce.c
+++ b/src/mpi/coll/iallreduce.c
@@ -750,7 +750,7 @@ int MPI_Iallreduce(const void *sendbuf, void *recvbuf, int count,
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
                 MPID_Datatype *datatype_ptr = NULL;
                 MPID_Datatype_get_ptr(datatype, datatype_ptr);
diff --git a/src/mpi/coll/ialltoall.c b/src/mpi/coll/ialltoall.c
index a8ef91c..6b18b47 100644
--- a/src/mpi/coll/ialltoall.c
+++ b/src/mpi/coll/ialltoall.c
@@ -626,7 +626,7 @@ int MPI_Ialltoall(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (sendbuf != MPI_IN_PLACE && HANDLE_GET_KIND(sendtype) != HANDLE_KIND_BUILTIN) {
diff --git a/src/mpi/coll/ialltoallv.c b/src/mpi/coll/ialltoallv.c
index 595fd79..93b1f63 100644
--- a/src/mpi/coll/ialltoallv.c
+++ b/src/mpi/coll/ialltoallv.c
@@ -356,7 +356,7 @@ int MPI_Ialltoallv(const void *sendbuf, const int sendcounts[], const int sdispl
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (sendbuf != MPI_IN_PLACE) {
diff --git a/src/mpi/coll/ialltoallw.c b/src/mpi/coll/ialltoallw.c
index a6727a2..9651b19 100644
--- a/src/mpi/coll/ialltoallw.c
+++ b/src/mpi/coll/ialltoallw.c
@@ -358,7 +358,7 @@ int MPI_Ialltoallw(const void *sendbuf, const int sendcounts[], const int sdispl
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (sendbuf != MPI_IN_PLACE) {
diff --git a/src/mpi/coll/ibarrier.c b/src/mpi/coll/ibarrier.c
index 7b558c7..6b36160 100644
--- a/src/mpi/coll/ibarrier.c
+++ b/src/mpi/coll/ibarrier.c
@@ -272,7 +272,7 @@ int MPI_Ibarrier(MPI_Comm comm, MPI_Request *request)
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             MPIR_ERRTEST_ARGNULL(request,"request", mpi_errno);
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
diff --git a/src/mpi/coll/ibcast.c b/src/mpi/coll/ibcast.c
index 812cfd1..a35bcd5 100644
--- a/src/mpi/coll/ibcast.c
+++ b/src/mpi/coll/ibcast.c
@@ -958,7 +958,7 @@ int MPI_Ibcast(void *buffer, int count, MPI_Datatype datatype, int root, MPI_Com
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
diff --git a/src/mpi/coll/iexscan.c b/src/mpi/coll/iexscan.c
index a1f706e..d5dbb3f 100644
--- a/src/mpi/coll/iexscan.c
+++ b/src/mpi/coll/iexscan.c
@@ -287,7 +287,7 @@ int MPI_Iexscan(const void *sendbuf, void *recvbuf, int count, MPI_Datatype data
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             MPIR_ERRTEST_COMM_INTRA(comm_ptr, mpi_errno);
             if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
                 MPID_Datatype *datatype_ptr = NULL;
diff --git a/src/mpi/coll/igather.c b/src/mpi/coll/igather.c
index f17238c..5fa603a 100644
--- a/src/mpi/coll/igather.c
+++ b/src/mpi/coll/igather.c
@@ -626,7 +626,7 @@ int MPI_Igather(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
             MPID_Datatype *sendtype_ptr=NULL, *recvtype_ptr=NULL;
             int rank;
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/coll/igatherv.c b/src/mpi/coll/igatherv.c
index 1c492b4..c5d7426 100644
--- a/src/mpi/coll/igatherv.c
+++ b/src/mpi/coll/igatherv.c
@@ -220,7 +220,7 @@ int MPI_Igatherv(const void *sendbuf, int sendcount, MPI_Datatype sendtype, void
             MPID_Datatype *sendtype_ptr=NULL, *recvtype_ptr=NULL;
             int i, rank, comm_size;
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/coll/ired_scat.c b/src/mpi/coll/ired_scat.c
index bc74aac..884c8da 100644
--- a/src/mpi/coll/ired_scat.c
+++ b/src/mpi/coll/ired_scat.c
@@ -1114,7 +1114,7 @@ int MPI_Ireduce_scatter(const void *sendbuf, void *recvbuf, const int recvcounts
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIR_ERRTEST_ARGNULL(recvcounts,"recvcounts", mpi_errno);
diff --git a/src/mpi/coll/ired_scat_block.c b/src/mpi/coll/ired_scat_block.c
index 04fa432..3368ea4 100644
--- a/src/mpi/coll/ired_scat_block.c
+++ b/src/mpi/coll/ired_scat_block.c
@@ -1018,7 +1018,7 @@ int MPI_Ireduce_scatter_block(const void *sendbuf, void *recvbuf,
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
                 MPID_Datatype *datatype_ptr = NULL;
                 MPID_Datatype_get_ptr(datatype, datatype_ptr);
diff --git a/src/mpi/coll/ireduce.c b/src/mpi/coll/ireduce.c
index 5f31903..e5736ce 100644
--- a/src/mpi/coll/ireduce.c
+++ b/src/mpi/coll/ireduce.c
@@ -869,7 +869,7 @@ int MPI_Ireduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype data
         {
             int rank;
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (HANDLE_GET_KIND(datatype) != HANDLE_KIND_BUILTIN) {
                 MPID_Datatype *datatype_ptr = NULL;
                 MPID_Datatype_get_ptr(datatype, datatype_ptr);
diff --git a/src/mpi/coll/iscan.c b/src/mpi/coll/iscan.c
index c8fa41e..ff7f6b2 100644
--- a/src/mpi/coll/iscan.c
+++ b/src/mpi/coll/iscan.c
@@ -422,7 +422,7 @@ int MPI_Iscan(const void *sendbuf, void *recvbuf, int count, MPI_Datatype dataty
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIR_ERRTEST_COMM_INTRA(comm_ptr, mpi_errno);
diff --git a/src/mpi/coll/iscatter.c b/src/mpi/coll/iscatter.c
index 8c51ed4..0f7beed 100644
--- a/src/mpi/coll/iscatter.c
+++ b/src/mpi/coll/iscatter.c
@@ -652,7 +652,7 @@ int MPI_Iscatter(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
         MPID_BEGIN_ERROR_CHECKS
         {
             MPID_Datatype *sendtype_ptr, *recvtype_ptr;
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (comm_ptr->comm_kind == MPID_INTRACOMM) {
                 MPIR_ERRTEST_INTRA_ROOT(comm_ptr, root, mpi_errno);
 
diff --git a/src/mpi/coll/iscatterv.c b/src/mpi/coll/iscatterv.c
index eca8081..56257b4 100644
--- a/src/mpi/coll/iscatterv.c
+++ b/src/mpi/coll/iscatterv.c
@@ -229,7 +229,7 @@ int MPI_Iscatterv(const void *sendbuf, const int sendcounts[], const int displs[
             MPID_Datatype *sendtype_ptr=NULL, *recvtype_ptr=NULL;
             int i, comm_size, rank;
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/coll/red_scat.c b/src/mpi/coll/red_scat.c
index ebf63e8..46949f6 100644
--- a/src/mpi/coll/red_scat.c
+++ b/src/mpi/coll/red_scat.c
@@ -1154,7 +1154,7 @@ int MPI_Reduce_scatter(const void *sendbuf, void *recvbuf, const int recvcounts[
             MPID_Op *op_ptr = NULL;
             int i, size, sum;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             size = comm_ptr->local_size; 
diff --git a/src/mpi/coll/red_scat_block.c b/src/mpi/coll/red_scat_block.c
index 2cd929e..45b6b30 100644
--- a/src/mpi/coll/red_scat_block.c
+++ b/src/mpi/coll/red_scat_block.c
@@ -1125,7 +1125,7 @@ int MPI_Reduce_scatter_block(const void *sendbuf, void *recvbuf,
 	    MPID_Datatype *datatype_ptr = NULL;
             MPID_Op *op_ptr = NULL;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIR_ERRTEST_COUNT(recvcount,mpi_errno);
diff --git a/src/mpi/coll/reduce.c b/src/mpi/coll/reduce.c
index 41acffc..ff482f2 100644
--- a/src/mpi/coll/reduce.c
+++ b/src/mpi/coll/reduce.c
@@ -1163,7 +1163,7 @@ int MPI_Reduce(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datat
             MPID_Op *op_ptr = NULL;
             int rank;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
 	    if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/coll/scan.c b/src/mpi/coll/scan.c
index 8a6557b..c9965e2 100644
--- a/src/mpi/coll/scan.c
+++ b/src/mpi/coll/scan.c
@@ -522,7 +522,7 @@ int MPI_Scan(const void *sendbuf, void *recvbuf, int count, MPI_Datatype datatyp
 	    MPID_Datatype *datatype_ptr = NULL;
             MPID_Op *op_ptr = NULL;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIR_ERRTEST_COMM_INTRA(comm_ptr, mpi_errno);
diff --git a/src/mpi/coll/scatter.c b/src/mpi/coll/scatter.c
index f3029ad..852861f 100644
--- a/src/mpi/coll/scatter.c
+++ b/src/mpi/coll/scatter.c
@@ -707,7 +707,7 @@ int MPI_Scatter(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
 	    MPID_Datatype *sendtype_ptr=NULL, *recvtype_ptr=NULL;
 	    int rank;
 
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
 	    if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/coll/scatterv.c b/src/mpi/coll/scatterv.c
index ce4b29e..485fde8 100644
--- a/src/mpi/coll/scatterv.c
+++ b/src/mpi/coll/scatterv.c
@@ -266,7 +266,7 @@ int MPI_Scatterv(const void *sendbuf, const int *sendcounts, const int *displs,
 	    MPID_Datatype *sendtype_ptr=NULL, *recvtype_ptr=NULL;
             int i, comm_size, rank;
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             if (comm_ptr->comm_kind == MPID_INTRACOMM) {
diff --git a/src/mpi/comm/comm_agree.c b/src/mpi/comm/comm_agree.c
index 3ee995e..870f43d 100644
--- a/src/mpi/comm/comm_agree.c
+++ b/src/mpi/comm/comm_agree.c
@@ -150,9 +150,8 @@ int MPIX_Comm_agree(MPI_Comm comm, int *flag)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
-            if (MPIX_ERR_REVOKED != MPIR_ERR_GET_CLASS(mpi_errno) && mpi_errno)
-                goto fn_fail;
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
+            if (mpi_errno) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
     }
diff --git a/src/mpi/comm/comm_compare.c b/src/mpi/comm/comm_compare.c
index 37d5a5d..42ee9e1 100644
--- a/src/mpi/comm/comm_compare.c
+++ b/src/mpi/comm/comm_compare.c
@@ -101,9 +101,9 @@ int MPI_Comm_compare(MPI_Comm comm1, MPI_Comm comm2, int *result)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr1, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr1, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
-            MPID_Comm_valid_ptr( comm_ptr2, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr2, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    MPIR_ERRTEST_ARGNULL( result, "result", mpi_errno );
         }
diff --git a/src/mpi/comm/comm_create.c b/src/mpi/comm/comm_create.c
index 4bb63bb..3825f6f 100644
--- a/src/mpi/comm/comm_create.c
+++ b/src/mpi/comm/comm_create.c
@@ -543,7 +543,7 @@ int MPI_Comm_create(MPI_Comm comm, MPI_Group group, MPI_Comm *newcomm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* If comm_ptr is not valid, it will be reset to null */
 
             /* only test for MPI_GROUP_NULL after attempting to convert the comm
diff --git a/src/mpi/comm/comm_create_group.c b/src/mpi/comm/comm_create_group.c
index 2ce1419..7be2949 100644
--- a/src/mpi/comm/comm_create_group.c
+++ b/src/mpi/comm/comm_create_group.c
@@ -175,7 +175,7 @@ int MPI_Comm_create_group(MPI_Comm comm, MPI_Group group, int tag, MPI_Comm * ne
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
             /* If comm_ptr is not valid, it will be reset to null */
             MPIR_ERRTEST_COMM_INTRA(comm_ptr, mpi_errno);
diff --git a/src/mpi/comm/comm_dup.c b/src/mpi/comm/comm_dup.c
index aadba80..0137833 100644
--- a/src/mpi/comm/comm_dup.c
+++ b/src/mpi/comm/comm_dup.c
@@ -147,7 +147,7 @@ int MPI_Comm_dup(MPI_Comm comm, MPI_Comm *newcomm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
             MPIR_ERRTEST_ARGNULL(newcomm, "newcomm", mpi_errno);
diff --git a/src/mpi/comm/comm_dup_with_info.c b/src/mpi/comm/comm_dup_with_info.c
index 476a9fa..dc82c4d 100644
--- a/src/mpi/comm/comm_dup_with_info.c
+++ b/src/mpi/comm/comm_dup_with_info.c
@@ -118,9 +118,8 @@ int MPI_Comm_dup_with_info(MPI_Comm comm, MPI_Info info, MPI_Comm * newcomm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
-            if (mpi_errno)
-                goto fn_fail;
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
+            if (mpi_errno) goto fn_fail;
             /* If comm_ptr is not valid, it will be reset to null */
             MPIR_ERRTEST_ARGNULL(newcomm, "newcomm", mpi_errno);
         }
diff --git a/src/mpi/comm/comm_failure_ack.c b/src/mpi/comm/comm_failure_ack.c
index 7f2dab8..4337eb3 100644
--- a/src/mpi/comm/comm_failure_ack.c
+++ b/src/mpi/comm/comm_failure_ack.c
@@ -79,7 +79,7 @@ int MPIX_Comm_failure_ack( MPI_Comm comm )
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/comm/comm_failure_get_acked.c b/src/mpi/comm/comm_failure_get_acked.c
index 83d9eda..aac0c95 100644
--- a/src/mpi/comm/comm_failure_get_acked.c
+++ b/src/mpi/comm/comm_failure_get_acked.c
@@ -83,7 +83,7 @@ int MPIX_Comm_failure_get_acked( MPI_Comm comm, MPI_Group *failedgrp )
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/comm/comm_free.c b/src/mpi/comm/comm_free.c
index 7e4eb9e..9046745 100644
--- a/src/mpi/comm/comm_free.c
+++ b/src/mpi/comm/comm_free.c
@@ -105,7 +105,7 @@ int MPI_Comm_free(MPI_Comm *comm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    
 	    /* Cannot free the predefined communicators */
diff --git a/src/mpi/comm/comm_get_info.c b/src/mpi/comm/comm_get_info.c
index f3bc876..75d99c4 100644
--- a/src/mpi/comm/comm_get_info.c
+++ b/src/mpi/comm/comm_get_info.c
@@ -107,9 +107,8 @@ int MPI_Comm_get_info(MPI_Comm comm, MPI_Info * info_used)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate pointers */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
-            if (mpi_errno)
-                goto fn_fail;
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
+            if (mpi_errno) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
     }
diff --git a/src/mpi/comm/comm_get_name.c b/src/mpi/comm/comm_get_name.c
index 574f1dd..0d05f5a 100644
--- a/src/mpi/comm/comm_get_name.c
+++ b/src/mpi/comm/comm_get_name.c
@@ -92,7 +92,7 @@ int MPI_Comm_get_name(MPI_Comm comm, char *comm_name, int *resultlen)
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-	    MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+	    MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 
 	    /* If comm_ptr is not valid, it will be reset to null */
diff --git a/src/mpi/comm/comm_group.c b/src/mpi/comm/comm_group.c
index ba758e5..075e734 100644
--- a/src/mpi/comm/comm_group.c
+++ b/src/mpi/comm/comm_group.c
@@ -148,8 +148,8 @@ int MPI_Comm_group(MPI_Comm comm, MPI_Group *group)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
-	    /* If comm_ptr is not valid, it will be reset to null */
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
+            /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
diff --git a/src/mpi/comm/comm_idup.c b/src/mpi/comm/comm_idup.c
index 5ceecc1..d434761 100644
--- a/src/mpi/comm/comm_idup.c
+++ b/src/mpi/comm/comm_idup.c
@@ -123,7 +123,7 @@ int MPI_Comm_idup(MPI_Comm comm, MPI_Comm *newcomm, MPI_Request *request)
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             MPIR_ERRTEST_ARGNULL(request, "request", mpi_errno);
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
diff --git a/src/mpi/comm/comm_rank.c b/src/mpi/comm/comm_rank.c
index 4694319..89ece48 100644
--- a/src/mpi/comm/comm_rank.c
+++ b/src/mpi/comm/comm_rank.c
@@ -81,8 +81,8 @@ int MPI_Comm_rank( MPI_Comm comm, int *rank )
         {
             MPIR_ERRTEST_ARGNULL(rank,"rank",mpi_errno);
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
-	    /* If comm_ptr is not value, it will be reset to null */
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
+            /* If comm_ptr is not value, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
diff --git a/src/mpi/comm/comm_remote_group.c b/src/mpi/comm/comm_remote_group.c
index 1a97702..c71e11c 100644
--- a/src/mpi/comm/comm_remote_group.c
+++ b/src/mpi/comm/comm_remote_group.c
@@ -129,7 +129,7 @@ int MPI_Comm_remote_group(MPI_Comm comm, MPI_Group *group)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    if (comm_ptr && comm_ptr->comm_kind != MPID_INTERCOMM) {
 		mpi_errno = MPIR_Err_create_code( MPI_SUCCESS, 
diff --git a/src/mpi/comm/comm_remote_size.c b/src/mpi/comm/comm_remote_size.c
index 72e73e4..853d7b3 100644
--- a/src/mpi/comm/comm_remote_size.c
+++ b/src/mpi/comm/comm_remote_size.c
@@ -82,7 +82,7 @@ int MPI_Comm_remote_size(MPI_Comm comm, int *size)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    if (comm_ptr && comm_ptr->comm_kind != MPID_INTERCOMM) {
 		mpi_errno = MPIR_Err_create_code( MPI_SUCCESS, 
diff --git a/src/mpi/comm/comm_set_info.c b/src/mpi/comm/comm_set_info.c
index 2bf3c8f..e6940a8 100644
--- a/src/mpi/comm/comm_set_info.c
+++ b/src/mpi/comm/comm_set_info.c
@@ -127,9 +127,8 @@ int MPI_Comm_set_info(MPI_Comm comm, MPI_Info info)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate pointers */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
-            if (mpi_errno)
-                goto fn_fail;
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
+            if (mpi_errno) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
     }
diff --git a/src/mpi/comm/comm_set_name.c b/src/mpi/comm/comm_set_name.c
index 75aa49a..3196b4f 100644
--- a/src/mpi/comm/comm_set_name.c
+++ b/src/mpi/comm/comm_set_name.c
@@ -77,7 +77,7 @@ int MPI_Comm_set_name(MPI_Comm comm, const char *comm_name)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    MPIR_ERRTEST_ARGNULL( comm_name, "comm_name", mpi_errno );
 	    /* If comm_ptr is not valid, it will be reset to null */
diff --git a/src/mpi/comm/comm_shrink.c b/src/mpi/comm/comm_shrink.c
index 7eaba35..24560e7 100644
--- a/src/mpi/comm/comm_shrink.c
+++ b/src/mpi/comm/comm_shrink.c
@@ -138,9 +138,8 @@ int MPIX_Comm_shrink(MPI_Comm comm, MPI_Comm *newcomm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
-            if (MPIX_ERR_REVOKED != MPIR_ERR_GET_CLASS(mpi_errno) && mpi_errno)
-                goto fn_fail;
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
+            if (mpi_errno) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
     }
diff --git a/src/mpi/comm/comm_size.c b/src/mpi/comm/comm_size.c
index 7fc6276..412c5ff 100644
--- a/src/mpi/comm/comm_size.c
+++ b/src/mpi/comm/comm_size.c
@@ -84,7 +84,7 @@ int MPI_Comm_size( MPI_Comm comm, int *size )
         {
 	    MPIR_ERRTEST_ARGNULL(size,"size",mpi_errno);
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/comm/comm_split.c b/src/mpi/comm/comm_split.c
index 02a8276..6a60d28 100644
--- a/src/mpi/comm/comm_split.c
+++ b/src/mpi/comm/comm_split.c
@@ -465,7 +465,7 @@ int MPI_Comm_split(MPI_Comm comm, int color, int key, MPI_Comm *newcomm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/comm/comm_split_type.c b/src/mpi/comm/comm_split_type.c
index 5e5d685..829e352 100644
--- a/src/mpi/comm/comm_split_type.c
+++ b/src/mpi/comm/comm_split_type.c
@@ -127,7 +127,7 @@ int MPI_Comm_split_type(MPI_Comm comm, int split_type, int key, MPI_Info info,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno)
                 goto fn_fail;
diff --git a/src/mpi/comm/comm_test_inter.c b/src/mpi/comm/comm_test_inter.c
index 4297c2c..46878af 100644
--- a/src/mpi/comm/comm_test_inter.c
+++ b/src/mpi/comm/comm_test_inter.c
@@ -81,7 +81,7 @@ int MPI_Comm_test_inter(MPI_Comm comm, int *flag)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(flag,"flag",mpi_errno);
diff --git a/src/mpi/comm/intercomm_create.c b/src/mpi/comm/intercomm_create.c
index d38d61a..b0a9c6e 100644
--- a/src/mpi/comm/intercomm_create.c
+++ b/src/mpi/comm/intercomm_create.c
@@ -516,7 +516,7 @@ int MPI_Intercomm_create(MPI_Comm local_comm, int local_leader,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate local_comm_ptr */
-            MPID_Comm_valid_ptr( local_comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( local_comm_ptr, mpi_errno, FALSE );
 	    if (local_comm_ptr) {
 		/*  Only check if local_comm_ptr valid */
 		MPIR_ERRTEST_COMM_INTRA(local_comm_ptr, mpi_errno );
@@ -544,7 +544,7 @@ int MPI_Intercomm_create(MPI_Comm local_comm, int local_leader,
 	{
 	    MPID_BEGIN_ERROR_CHECKS;
 	    {
-		MPID_Comm_valid_ptr( peer_comm_ptr, mpi_errno );
+		MPID_Comm_valid_ptr( peer_comm_ptr, mpi_errno, FALSE );
 		/* Note: In MPI 1.0, peer_comm was restricted to 
 		   intracommunicators.  In 1.1, it may be any communicator */
 
diff --git a/src/mpi/comm/intercomm_merge.c b/src/mpi/comm/intercomm_merge.c
index 10b662a..90b7c15 100644
--- a/src/mpi/comm/intercomm_merge.c
+++ b/src/mpi/comm/intercomm_merge.c
@@ -258,7 +258,7 @@ int MPI_Intercomm_merge(MPI_Comm intercomm, int high, MPI_Comm *newintracomm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    if (comm_ptr && comm_ptr->comm_kind != MPID_INTERCOMM) {
 		mpi_errno = MPIR_Err_create_code( MPI_SUCCESS, 
diff --git a/src/mpi/datatype/pack.c b/src/mpi/datatype/pack.c
index 21e00f7..01b007d 100644
--- a/src/mpi/datatype/pack.c
+++ b/src/mpi/datatype/pack.c
@@ -191,7 +191,7 @@ int MPI_Pack(const void *inbuf,
 	    MPIR_ERRTEST_ARGNULL(position, "position", mpi_errno);
             /* Validate comm_ptr */
 	    /* If comm_ptr is not valid, it will be reset to null */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
 	    if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
 	    MPIR_ERRTEST_DATATYPE(datatype, "datatype", mpi_errno);
diff --git a/src/mpi/datatype/pack_size.c b/src/mpi/datatype/pack_size.c
index aa48a5b..49b1cdf 100644
--- a/src/mpi/datatype/pack_size.c
+++ b/src/mpi/datatype/pack_size.c
@@ -110,7 +110,7 @@ int MPI_Pack_size(int incount,
 	    MPIR_ERRTEST_COUNT(incount, mpi_errno);
 	    MPIR_ERRTEST_ARGNULL(size, "size", mpi_errno);
 	    
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_DATATYPE(datatype, "datatype", mpi_errno);
diff --git a/src/mpi/datatype/unpack.c b/src/mpi/datatype/unpack.c
index 997c811..f5594ea 100644
--- a/src/mpi/datatype/unpack.c
+++ b/src/mpi/datatype/unpack.c
@@ -175,7 +175,7 @@ int MPI_Unpack(const void *inbuf, int insize, int *position,
 	    MPIR_ERRTEST_COUNT(outcount, mpi_errno);
 
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
 	    if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 
diff --git a/src/mpi/errhan/comm_call_errhandler.c b/src/mpi/errhan/comm_call_errhandler.c
index 36144f6..2f780eb 100644
--- a/src/mpi/errhan/comm_call_errhandler.c
+++ b/src/mpi/errhan/comm_call_errhandler.c
@@ -87,7 +87,7 @@ int MPI_Comm_call_errhandler(MPI_Comm comm, int errorcode)
         {
             /* Validate comm_ptr; if comm_ptr is not value, it will be reset
 	       to null */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
 	    if (comm_ptr->errhandler) {
diff --git a/src/mpi/errhan/comm_get_errhandler.c b/src/mpi/errhan/comm_get_errhandler.c
index c57fdf0..e705cf2 100644
--- a/src/mpi/errhan/comm_get_errhandler.c
+++ b/src/mpi/errhan/comm_get_errhandler.c
@@ -97,7 +97,7 @@ int MPI_Comm_get_errhandler(MPI_Comm comm, MPI_Errhandler *errhandler)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr; if comm_ptr is not valid, it will be reset to null  */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    MPIR_ERRTEST_ARGNULL(errhandler,"errhandler",mpi_errno);
         }
diff --git a/src/mpi/errhan/comm_set_errhandler.c b/src/mpi/errhan/comm_set_errhandler.c
index 65812e4..872de46 100644
--- a/src/mpi/errhan/comm_set_errhandler.c
+++ b/src/mpi/errhan/comm_set_errhandler.c
@@ -110,7 +110,7 @@ int MPI_Comm_set_errhandler(MPI_Comm comm, MPI_Errhandler errhandler)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr; if comm_ptr is not valid, it will be reset to null */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 
 	    if (HANDLE_GET_KIND(errhandler) != HANDLE_KIND_BUILTIN) {
 		MPID_Errhandler_valid_ptr( errhan_ptr, mpi_errno );
diff --git a/src/mpi/errhan/errhandler_get.c b/src/mpi/errhan/errhandler_get.c
index bbbe345..9d58d6b 100644
--- a/src/mpi/errhan/errhandler_get.c
+++ b/src/mpi/errhan/errhandler_get.c
@@ -93,7 +93,7 @@ int MPI_Errhandler_get(MPI_Comm comm, MPI_Errhandler *errhandler)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr; if comm_ptr is not value, it will be reset to null */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    MPIR_ERRTEST_ARGNULL(errhandler, "errhandler", mpi_errno);
         }
diff --git a/src/mpi/errhan/errhandler_set.c b/src/mpi/errhan/errhandler_set.c
index 63c85c8..1b95e2c 100644
--- a/src/mpi/errhan/errhandler_set.c
+++ b/src/mpi/errhan/errhandler_set.c
@@ -85,7 +85,7 @@ int MPI_Errhandler_set(MPI_Comm comm, MPI_Errhandler errhandler)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr; if comm_ptr is not value, it will be reset to null */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    MPIR_ERRTEST_ERRHANDLER(errhandler, mpi_errno);
 
diff --git a/src/mpi/init/abort.c b/src/mpi/init/abort.c
index fabc013..be2ce16 100644
--- a/src/mpi/init/abort.c
+++ b/src/mpi/init/abort.c
@@ -109,7 +109,7 @@ int MPI_Abort(MPI_Comm comm, int errorcode)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/pt2pt/bsend.c b/src/mpi/pt2pt/bsend.c
index 7049ced..d4d3088 100644
--- a/src/mpi/pt2pt/bsend.c
+++ b/src/mpi/pt2pt/bsend.c
@@ -121,7 +121,7 @@ int MPI_Bsend(const void *buf, int count, MPI_Datatype datatype, int dest, int t
         {
 	    MPIR_ERRTEST_COUNT(count,mpi_errno);
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    if (comm_ptr) {
diff --git a/src/mpi/pt2pt/bsend_init.c b/src/mpi/pt2pt/bsend_init.c
index b6e6e63..b495e5f 100644
--- a/src/mpi/pt2pt/bsend_init.c
+++ b/src/mpi/pt2pt/bsend_init.c
@@ -92,7 +92,7 @@ int MPI_Bsend_init(const void *buf, int count, MPI_Datatype datatype,
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/ibsend.c b/src/mpi/pt2pt/ibsend.c
index 9fc7560..af2b5cc 100644
--- a/src/mpi/pt2pt/ibsend.c
+++ b/src/mpi/pt2pt/ibsend.c
@@ -198,7 +198,7 @@ int MPI_Ibsend(const void *buf, int count, MPI_Datatype datatype, int dest, int
         {
 	    MPIR_ERRTEST_COUNT(count,mpi_errno);
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    if (comm_ptr) {
diff --git a/src/mpi/pt2pt/improbe.c b/src/mpi/pt2pt/improbe.c
index 80cd187..400f2df 100644
--- a/src/mpi/pt2pt/improbe.c
+++ b/src/mpi/pt2pt/improbe.c
@@ -83,7 +83,7 @@ int MPI_Improbe(int source, int tag, MPI_Comm comm, int *flag, MPI_Message *mess
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIR_ERRTEST_ARGNULL(flag, "flag", mpi_errno);
diff --git a/src/mpi/pt2pt/iprobe.c b/src/mpi/pt2pt/iprobe.c
index 7bfc17c..96f74d3 100644
--- a/src/mpi/pt2pt/iprobe.c
+++ b/src/mpi/pt2pt/iprobe.c
@@ -87,7 +87,7 @@ int MPI_Iprobe(int source, int tag, MPI_Comm comm, int *flag,
         MPID_BEGIN_ERROR_CHECKS;
         {
 	    /* Validate communicator */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_ARGNULL( flag, "flag", mpi_errno );
diff --git a/src/mpi/pt2pt/irecv.c b/src/mpi/pt2pt/irecv.c
index c9b832b..ee9aa85 100644
--- a/src/mpi/pt2pt/irecv.c
+++ b/src/mpi/pt2pt/irecv.c
@@ -91,7 +91,7 @@ int MPI_Irecv(void *buf, int count, MPI_Datatype datatype, int source,
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/irsend.c b/src/mpi/pt2pt/irsend.c
index 3035910..e2c6f5e 100644
--- a/src/mpi/pt2pt/irsend.c
+++ b/src/mpi/pt2pt/irsend.c
@@ -92,7 +92,7 @@ int MPI_Irsend(const void *buf, int count, MPI_Datatype datatype, int dest, int
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/isend.c b/src/mpi/pt2pt/isend.c
index dc37463..cbb6324 100644
--- a/src/mpi/pt2pt/isend.c
+++ b/src/mpi/pt2pt/isend.c
@@ -90,7 +90,7 @@ int MPI_Isend(const void *buf, int count, MPI_Datatype datatype, int dest, int t
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/issend.c b/src/mpi/pt2pt/issend.c
index 4924583..642082c 100644
--- a/src/mpi/pt2pt/issend.c
+++ b/src/mpi/pt2pt/issend.c
@@ -91,7 +91,7 @@ int MPI_Issend(const void *buf, int count, MPI_Datatype datatype, int dest, int
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/mprobe.c b/src/mpi/pt2pt/mprobe.c
index ec88688..e791ef5 100644
--- a/src/mpi/pt2pt/mprobe.c
+++ b/src/mpi/pt2pt/mprobe.c
@@ -81,7 +81,7 @@ int MPI_Mprobe(int source, int tag, MPI_Comm comm, MPI_Message *message, MPI_Sta
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
         }
diff --git a/src/mpi/pt2pt/probe.c b/src/mpi/pt2pt/probe.c
index 840258d..00eaa20 100644
--- a/src/mpi/pt2pt/probe.c
+++ b/src/mpi/pt2pt/probe.c
@@ -83,7 +83,7 @@ int MPI_Probe(int source, int tag, MPI_Comm comm, MPI_Status *status)
         MPID_BEGIN_ERROR_CHECKS;
         {
 	    /* Validate communicator */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 
 	    MPIR_ERRTEST_RECV_TAG(tag,mpi_errno);
diff --git a/src/mpi/pt2pt/recv.c b/src/mpi/pt2pt/recv.c
index bede1e0..41eeff8 100644
--- a/src/mpi/pt2pt/recv.c
+++ b/src/mpi/pt2pt/recv.c
@@ -100,7 +100,7 @@ int MPI_Recv(void *buf, int count, MPI_Datatype datatype, int source, int tag,
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/recv_init.c b/src/mpi/pt2pt/recv_init.c
index 23556ce..3f16c60 100644
--- a/src/mpi/pt2pt/recv_init.c
+++ b/src/mpi/pt2pt/recv_init.c
@@ -93,7 +93,7 @@ int MPI_Recv_init(void *buf, int count, MPI_Datatype datatype, int source,
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/rsend.c b/src/mpi/pt2pt/rsend.c
index 82394f1..eaf6f09 100644
--- a/src/mpi/pt2pt/rsend.c
+++ b/src/mpi/pt2pt/rsend.c
@@ -88,7 +88,7 @@ int MPI_Rsend(const void *buf, int count, MPI_Datatype datatype, int dest, int t
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/rsend_init.c b/src/mpi/pt2pt/rsend_init.c
index 3061168..f39284d 100644
--- a/src/mpi/pt2pt/rsend_init.c
+++ b/src/mpi/pt2pt/rsend_init.c
@@ -93,7 +93,7 @@ int MPI_Rsend_init(const void *buf, int count, MPI_Datatype datatype, int dest,
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/send.c b/src/mpi/pt2pt/send.c
index 4671235..77dc0fe 100644
--- a/src/mpi/pt2pt/send.c
+++ b/src/mpi/pt2pt/send.c
@@ -93,7 +93,7 @@ int MPI_Send(const void *buf, int count, MPI_Datatype datatype, int dest, int ta
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/send_init.c b/src/mpi/pt2pt/send_init.c
index 3002fa9..c20b136 100644
--- a/src/mpi/pt2pt/send_init.c
+++ b/src/mpi/pt2pt/send_init.c
@@ -93,7 +93,7 @@ int MPI_Send_init(const void *buf, int count, MPI_Datatype datatype, int dest,
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/sendrecv.c b/src/mpi/pt2pt/sendrecv.c
index e8a1b9f..bc9c374 100644
--- a/src/mpi/pt2pt/sendrecv.c
+++ b/src/mpi/pt2pt/sendrecv.c
@@ -104,7 +104,7 @@ int MPI_Sendrecv(const void *sendbuf, int sendcount, MPI_Datatype sendtype,
         MPID_BEGIN_ERROR_CHECKS;
         {
 	    /* Validate communicator */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    /* Validate count */
diff --git a/src/mpi/pt2pt/sendrecv_rep.c b/src/mpi/pt2pt/sendrecv_rep.c
index c928bec..3c48c9d 100644
--- a/src/mpi/pt2pt/sendrecv_rep.c
+++ b/src/mpi/pt2pt/sendrecv_rep.c
@@ -92,7 +92,7 @@ int MPI_Sendrecv_replace(void *buf, int count, MPI_Datatype datatype,
         MPID_BEGIN_ERROR_CHECKS;
         {
 	    /* Validate communicator */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    /* Validate count */
diff --git a/src/mpi/pt2pt/ssend.c b/src/mpi/pt2pt/ssend.c
index 197a4fd..60dd92e 100644
--- a/src/mpi/pt2pt/ssend.c
+++ b/src/mpi/pt2pt/ssend.c
@@ -87,7 +87,7 @@ int MPI_Ssend(const void *buf, int count, MPI_Datatype datatype, int dest, int t
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/pt2pt/ssend_init.c b/src/mpi/pt2pt/ssend_init.c
index 92470a6..2b30a47 100644
--- a/src/mpi/pt2pt/ssend_init.c
+++ b/src/mpi/pt2pt/ssend_init.c
@@ -90,7 +90,7 @@ int MPI_Ssend_init(const void *buf, int count, MPI_Datatype datatype, int dest,
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    
 	    MPIR_ERRTEST_COUNT(count, mpi_errno);
diff --git a/src/mpi/rma/win_allocate.c b/src/mpi/rma/win_allocate.c
index 47c797a..82348f5 100644
--- a/src/mpi/rma/win_allocate.c
+++ b/src/mpi/rma/win_allocate.c
@@ -106,7 +106,7 @@ int MPI_Win_allocate(MPI_Aint size, int disp_unit, MPI_Info info,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate pointers */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             if (size < 0)
                 mpi_errno = MPIR_Err_create_code( MPI_SUCCESS, 
diff --git a/src/mpi/rma/win_allocate_shared.c b/src/mpi/rma/win_allocate_shared.c
index fac911f..8da1b5d 100644
--- a/src/mpi/rma/win_allocate_shared.c
+++ b/src/mpi/rma/win_allocate_shared.c
@@ -116,7 +116,7 @@ int MPI_Win_allocate_shared(MPI_Aint size, int disp_unit, MPI_Info info, MPI_Com
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate pointers */
-	    MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+	    MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIU_ERR_CHKANDJUMP1(disp_unit <= 0, mpi_errno, MPI_ERR_ARG,
diff --git a/src/mpi/rma/win_create.c b/src/mpi/rma/win_create.c
index 5cb8d73..96e84b9 100644
--- a/src/mpi/rma/win_create.c
+++ b/src/mpi/rma/win_create.c
@@ -134,7 +134,7 @@ int MPI_Win_create(void *base, MPI_Aint size, int disp_unit, MPI_Info info,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate pointers */
-	    MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+	    MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             if (size < 0)
                 mpi_errno = MPIR_Err_create_code( MPI_SUCCESS, 
diff --git a/src/mpi/rma/win_create_dynamic.c b/src/mpi/rma/win_create_dynamic.c
index a0d9ba3..2acf081 100644
--- a/src/mpi/rma/win_create_dynamic.c
+++ b/src/mpi/rma/win_create_dynamic.c
@@ -122,7 +122,7 @@ int MPI_Win_create_dynamic(MPI_Info info, MPI_Comm comm, MPI_Win *win)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate pointers */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
diff --git a/src/mpi/spawn/comm_accept.c b/src/mpi/spawn/comm_accept.c
index dccf247..d1ab223 100644
--- a/src/mpi/spawn/comm_accept.c
+++ b/src/mpi/spawn/comm_accept.c
@@ -100,7 +100,7 @@ int MPI_Comm_accept(const char *port_name, MPI_Info info, int root, MPI_Comm com
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
         }
         MPID_END_ERROR_CHECKS;
diff --git a/src/mpi/spawn/comm_connect.c b/src/mpi/spawn/comm_connect.c
index 141f3b4..6d16e46 100644
--- a/src/mpi/spawn/comm_connect.c
+++ b/src/mpi/spawn/comm_connect.c
@@ -99,7 +99,7 @@ int MPI_Comm_connect(const char *port_name, MPI_Info info, int root, MPI_Comm co
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/spawn/comm_disconnect.c b/src/mpi/spawn/comm_disconnect.c
index 8a50774..5812a25 100644
--- a/src/mpi/spawn/comm_disconnect.c
+++ b/src/mpi/spawn/comm_disconnect.c
@@ -82,7 +82,7 @@ int MPI_Comm_disconnect(MPI_Comm * comm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno)
 	    {
diff --git a/src/mpi/spawn/comm_spawn.c b/src/mpi/spawn/comm_spawn.c
index 49c307d..7749ab2 100644
--- a/src/mpi/spawn/comm_spawn.c
+++ b/src/mpi/spawn/comm_spawn.c
@@ -95,7 +95,7 @@ int MPI_Comm_spawn(const char *command, char *argv[], int maxprocs, MPI_Info inf
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
 
diff --git a/src/mpi/spawn/comm_spawn_multiple.c b/src/mpi/spawn/comm_spawn_multiple.c
index 4d22ee4..845ef62 100644
--- a/src/mpi/spawn/comm_spawn_multiple.c
+++ b/src/mpi/spawn/comm_spawn_multiple.c
@@ -101,7 +101,7 @@ int MPI_Comm_spawn_multiple(int count, char *array_of_commands[],
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
 
diff --git a/src/mpi/topo/cart_coords.c b/src/mpi/topo/cart_coords.c
index 6b3a3c5..67f3681 100644
--- a/src/mpi/topo/cart_coords.c
+++ b/src/mpi/topo/cart_coords.c
@@ -88,7 +88,7 @@ int MPI_Cart_coords(MPI_Comm comm, int rank, int maxdims, int coords[])
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
diff --git a/src/mpi/topo/cart_create.c b/src/mpi/topo/cart_create.c
index 494557a..ff26d5f 100644
--- a/src/mpi/topo/cart_create.c
+++ b/src/mpi/topo/cart_create.c
@@ -268,7 +268,7 @@ int MPI_Cart_create(MPI_Comm comm_old, int ndims, const int dims[],
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    if (comm_ptr) {
diff --git a/src/mpi/topo/cart_get.c b/src/mpi/topo/cart_get.c
index 8b0c95b..2789010 100644
--- a/src/mpi/topo/cart_get.c
+++ b/src/mpi/topo/cart_get.c
@@ -92,7 +92,7 @@ int MPI_Cart_get(MPI_Comm comm, int maxdims, int dims[], int periods[],
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/topo/cart_map.c b/src/mpi/topo/cart_map.c
index 000df13..2bd4c81 100644
--- a/src/mpi/topo/cart_map.c
+++ b/src/mpi/topo/cart_map.c
@@ -158,7 +158,7 @@ int MPI_Cart_map(MPI_Comm comm, int ndims, const int dims[], const int periods[]
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(newrank,"newrank",mpi_errno);
diff --git a/src/mpi/topo/cart_rank.c b/src/mpi/topo/cart_rank.c
index c730bb2..9b1963e 100644
--- a/src/mpi/topo/cart_rank.c
+++ b/src/mpi/topo/cart_rank.c
@@ -117,7 +117,7 @@ int MPI_Cart_rank(MPI_Comm comm, const int coords[], int *rank)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(rank,"rank",mpi_errno);
diff --git a/src/mpi/topo/cart_shift.c b/src/mpi/topo/cart_shift.c
index 66d4007..626527c 100644
--- a/src/mpi/topo/cart_shift.c
+++ b/src/mpi/topo/cart_shift.c
@@ -150,7 +150,7 @@ int MPI_Cart_shift(MPI_Comm comm, int direction, int disp, int *rank_source,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 
diff --git a/src/mpi/topo/cart_sub.c b/src/mpi/topo/cart_sub.c
index 551ed89..7a7527c 100644
--- a/src/mpi/topo/cart_sub.c
+++ b/src/mpi/topo/cart_sub.c
@@ -91,7 +91,7 @@ int MPI_Cart_sub(MPI_Comm comm, const int remain_dims[], MPI_Comm *newcomm)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/topo/cartdim_get.c b/src/mpi/topo/cartdim_get.c
index 7f28efb..0fffc29 100644
--- a/src/mpi/topo/cartdim_get.c
+++ b/src/mpi/topo/cartdim_get.c
@@ -84,7 +84,7 @@ int MPI_Cartdim_get(MPI_Comm comm, int *ndims)
         {
 	    MPIR_ERRTEST_ARGNULL(ndims,"ndims",mpi_errno);
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
 	    /* If comm_ptr is not valid, it will be reset to null */
             if (mpi_errno) goto fn_fail;
         }
diff --git a/src/mpi/topo/dist_gr_create.c b/src/mpi/topo/dist_gr_create.c
index 1252087..e8ec351 100644
--- a/src/mpi/topo/dist_gr_create.c
+++ b/src/mpi/topo/dist_gr_create.c
@@ -124,7 +124,7 @@ int MPI_Dist_graph_create(MPI_Comm comm_old, int n, const int sources[],
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* If comm_ptr is not valid, it will be reset to null */
             if (comm_ptr) {
                 MPIR_ERRTEST_COMM_INTRA(comm_ptr, mpi_errno);
diff --git a/src/mpi/topo/dist_gr_create_adj.c b/src/mpi/topo/dist_gr_create_adj.c
index 215c18a..ddeed10 100644
--- a/src/mpi/topo/dist_gr_create_adj.c
+++ b/src/mpi/topo/dist_gr_create_adj.c
@@ -108,7 +108,7 @@ int MPI_Dist_graph_create_adjacent(MPI_Comm comm_old,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             /* If comm_ptr is not valid, it will be reset to null */
             if (comm_ptr) {
diff --git a/src/mpi/topo/dist_gr_neighb_count.c b/src/mpi/topo/dist_gr_neighb_count.c
index b0da3af..39598d4 100644
--- a/src/mpi/topo/dist_gr_neighb_count.c
+++ b/src/mpi/topo/dist_gr_neighb_count.c
@@ -104,7 +104,7 @@ int MPI_Dist_graph_neighbors_count(MPI_Comm comm, int *indegree, int *outdegree,
     {
         MPID_BEGIN_ERROR_CHECKS;
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIR_ERRTEST_ARGNULL(indegree, "indegree", mpi_errno);
diff --git a/src/mpi/topo/graph_get.c b/src/mpi/topo/graph_get.c
index 7b8d97a..74d8590 100644
--- a/src/mpi/topo/graph_get.c
+++ b/src/mpi/topo/graph_get.c
@@ -89,7 +89,7 @@ int MPI_Graph_get(MPI_Comm comm, int maxindex, int maxedges,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    
diff --git a/src/mpi/topo/graph_map.c b/src/mpi/topo/graph_map.c
index a518e17..475676b 100644
--- a/src/mpi/topo/graph_map.c
+++ b/src/mpi/topo/graph_map.c
@@ -132,7 +132,7 @@ int MPI_Graph_map(MPI_Comm comm, int nnodes, const int indx[], const int edges[]
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(newrank,"newrank",mpi_errno);
diff --git a/src/mpi/topo/graph_nbr.c b/src/mpi/topo/graph_nbr.c
index 2b0aaa1..51c1e0d 100644
--- a/src/mpi/topo/graph_nbr.c
+++ b/src/mpi/topo/graph_nbr.c
@@ -120,7 +120,7 @@ int MPI_Graph_neighbors(MPI_Comm comm, int rank, int maxneighbors,
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(neighbors,"neighbors",mpi_errno);
diff --git a/src/mpi/topo/graphcreate.c b/src/mpi/topo/graphcreate.c
index 4cb4ff5..e13b073 100644
--- a/src/mpi/topo/graphcreate.c
+++ b/src/mpi/topo/graphcreate.c
@@ -200,7 +200,7 @@ int MPI_Graph_create(MPI_Comm comm_old, int nnodes, const int indx[],
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    if (comm_ptr) {
diff --git a/src/mpi/topo/graphdimsget.c b/src/mpi/topo/graphdimsget.c
index cdf7e00..ff13528 100644
--- a/src/mpi/topo/graphdimsget.c
+++ b/src/mpi/topo/graphdimsget.c
@@ -85,7 +85,7 @@ int MPI_Graphdims_get(MPI_Comm comm, int *nnodes, int *nedges)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(nnodes, "nnodes", mpi_errno );
diff --git a/src/mpi/topo/graphnbrcnt.c b/src/mpi/topo/graphnbrcnt.c
index c601520..cf0faab 100644
--- a/src/mpi/topo/graphnbrcnt.c
+++ b/src/mpi/topo/graphnbrcnt.c
@@ -112,7 +112,7 @@ int MPI_Graph_neighbors_count(MPI_Comm comm, int rank, int *nneighbors)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    MPIR_ERRTEST_ARGNULL(nneighbors, "nneighbors", mpi_errno);
 	    /* If comm_ptr is not valid, it will be reset to null */
diff --git a/src/mpi/topo/inhb_allgather.c b/src/mpi/topo/inhb_allgather.c
index 8729486..b310890 100644
--- a/src/mpi/topo/inhb_allgather.c
+++ b/src/mpi/topo/inhb_allgather.c
@@ -190,7 +190,7 @@ int MPI_Ineighbor_allgather(const void *sendbuf, int sendcount, MPI_Datatype sen
                 if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             }
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             MPIR_ERRTEST_ARGNULL(request, "request", mpi_errno);
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
diff --git a/src/mpi/topo/inhb_allgatherv.c b/src/mpi/topo/inhb_allgatherv.c
index 5ea2111..7513582 100644
--- a/src/mpi/topo/inhb_allgatherv.c
+++ b/src/mpi/topo/inhb_allgatherv.c
@@ -195,7 +195,7 @@ int MPI_Ineighbor_allgatherv(const void *sendbuf, int sendcount, MPI_Datatype se
                 if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             }
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
 
             MPIR_ERRTEST_ARGNULL(request, "request", mpi_errno);
diff --git a/src/mpi/topo/inhb_alltoall.c b/src/mpi/topo/inhb_alltoall.c
index 9d94772..5539a32 100644
--- a/src/mpi/topo/inhb_alltoall.c
+++ b/src/mpi/topo/inhb_alltoall.c
@@ -196,7 +196,7 @@ int MPI_Ineighbor_alltoall(const void *sendbuf, int sendcount, MPI_Datatype send
                 if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             }
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             MPIR_ERRTEST_ARGNULL(request, "request", mpi_errno);
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
diff --git a/src/mpi/topo/inhb_alltoallv.c b/src/mpi/topo/inhb_alltoallv.c
index 330fdf4..2874522 100644
--- a/src/mpi/topo/inhb_alltoallv.c
+++ b/src/mpi/topo/inhb_alltoallv.c
@@ -201,7 +201,7 @@ int MPI_Ineighbor_alltoallv(const void *sendbuf, const int sendcounts[], const i
                 if (mpi_errno != MPI_SUCCESS) goto fn_fail;
             }
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             MPIR_ERRTEST_ARGNULL(request, "request", mpi_errno);
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
         }
diff --git a/src/mpi/topo/inhb_alltoallw.c b/src/mpi/topo/inhb_alltoallw.c
index 1d0dc2f..46b9db2 100644
--- a/src/mpi/topo/inhb_alltoallw.c
+++ b/src/mpi/topo/inhb_alltoallw.c
@@ -173,7 +173,7 @@ int MPI_Ineighbor_alltoallw(const void *sendbuf, const int sendcounts[], const M
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             MPIR_ERRTEST_ARGNULL(request, "request", mpi_errno);
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
diff --git a/src/mpi/topo/nhb_allgather.c b/src/mpi/topo/nhb_allgather.c
index 2d0c91d..f8823cc 100644
--- a/src/mpi/topo/nhb_allgather.c
+++ b/src/mpi/topo/nhb_allgather.c
@@ -146,7 +146,7 @@ int MPI_Neighbor_allgather(const void *sendbuf, int sendcount, MPI_Datatype send
                 MPID_Datatype_committed_ptr(recvtype_ptr, mpi_errno);
             }
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
         }
diff --git a/src/mpi/topo/nhb_allgatherv.c b/src/mpi/topo/nhb_allgatherv.c
index 2b91abb..22d9171 100644
--- a/src/mpi/topo/nhb_allgatherv.c
+++ b/src/mpi/topo/nhb_allgatherv.c
@@ -147,7 +147,7 @@ int MPI_Neighbor_allgatherv(const void *sendbuf, int sendcount, MPI_Datatype sen
                 MPID_Datatype_committed_ptr(recvtype_ptr, mpi_errno);
             }
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
         }
diff --git a/src/mpi/topo/nhb_alltoall.c b/src/mpi/topo/nhb_alltoall.c
index b5c9520..2f4d4fc 100644
--- a/src/mpi/topo/nhb_alltoall.c
+++ b/src/mpi/topo/nhb_alltoall.c
@@ -150,7 +150,7 @@ int MPI_Neighbor_alltoall(const void *sendbuf, int sendcount, MPI_Datatype sendt
                 MPID_Datatype_committed_ptr(recvtype_ptr, mpi_errno);
             }
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
         }
diff --git a/src/mpi/topo/nhb_alltoallv.c b/src/mpi/topo/nhb_alltoallv.c
index 9451ba1..1a67fe7 100644
--- a/src/mpi/topo/nhb_alltoallv.c
+++ b/src/mpi/topo/nhb_alltoallv.c
@@ -145,7 +145,7 @@ int MPI_Neighbor_alltoallv(const void *sendbuf, const int sendcounts[], const in
                 MPID_Datatype_committed_ptr(recvtype_ptr, mpi_errno);
             }
 
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
         }
diff --git a/src/mpi/topo/nhb_alltoallw.c b/src/mpi/topo/nhb_alltoallw.c
index 9ebe2ce..8be4077 100644
--- a/src/mpi/topo/nhb_alltoallw.c
+++ b/src/mpi/topo/nhb_alltoallw.c
@@ -130,7 +130,7 @@ int MPI_Neighbor_alltoallw(const void *sendbuf, const int sendcounts[], const MP
     {
         MPID_BEGIN_ERROR_CHECKS
         {
-            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, FALSE );
             /* TODO more checks may be appropriate (counts, in_place, buffer aliasing, etc) */
             if (mpi_errno != MPI_SUCCESS) goto fn_fail;
         }
diff --git a/src/mpi/topo/topo_test.c b/src/mpi/topo/topo_test.c
index e9f1a6c..62dbb5d 100644
--- a/src/mpi/topo/topo_test.c
+++ b/src/mpi/topo/topo_test.c
@@ -88,7 +88,7 @@ int MPI_Topo_test(MPI_Comm comm, int *status)
         MPID_BEGIN_ERROR_CHECKS;
         {
             /* Validate comm_ptr */
-            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
             if (mpi_errno) goto fn_fail;
 	    /* If comm_ptr is not valid, it will be reset to null */
 	    MPIR_ERRTEST_ARGNULL(status, "status", mpi_errno);
diff --git a/src/util/procmap/local_proc.c b/src/util/procmap/local_proc.c
index 471b56b..2e9f016 100644
--- a/src/util/procmap/local_proc.c
+++ b/src/util/procmap/local_proc.c
@@ -232,7 +232,7 @@ fn_fail:
 int MPIU_Get_internode_rank(MPID_Comm *comm_ptr, int r)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+    MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
     MPIU_Assert(mpi_errno == MPI_SUCCESS);
     MPIU_Assert(r < comm_ptr->remote_size);
     MPIU_Assert(comm_ptr->comm_kind == MPID_INTRACOMM);
@@ -253,7 +253,7 @@ int MPIU_Get_internode_rank(MPID_Comm *comm_ptr, int r)
 int MPIU_Get_intranode_rank(MPID_Comm *comm_ptr, int r)
 {
     int mpi_errno = MPI_SUCCESS;
-    MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+    MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
     MPIU_Assert(mpi_errno == MPI_SUCCESS);
     MPIU_Assert(r < comm_ptr->remote_size);
     MPIU_Assert(comm_ptr->comm_kind == MPID_INTRACOMM);

http://git.mpich.org/mpich.git/commitdiff/1f0ee13674dff8ad44c647c4748e677c7fcfb756

commit 1f0ee13674dff8ad44c647c4748e677c7fcfb756
Author: Wesley Bland <wbland at anl.gov>
Date:   Tue Apr 29 21:40:19 2014 -0500

    Add MPIX_Comm_agree
    
    Adds a function that implements an agreement algorithm for the user. It
    lets the user perform an agreement manually and detect unacknowledged
    failures.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
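
For illustration only, the reduction logic that MPIR_Comm_agree applies in the patch below can be modeled without MPI. This is a minimal sketch under stated assumptions, not the MPICH implementation: `struct agree_input` and `simulate_agree` are hypothetical names, no MPI calls are made, and the loop stands in for the two MPIR_Allreduce_group calls (MPI_MIN over the per-process success values, then MPI_BAND over the flags) across the group of processes that have not failed.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for one process's contribution; not an MPICH type. */
struct agree_input {
    int alive;          /* nonzero if the process is in the surviving group */
    int groups_match;   /* nonzero if its local failed group equals the global one */
    int flag;           /* value the process passes in through *flag */
};

/* Model of the two reductions: MIN over success, bitwise AND over flag.
 * Returns 1 if every surviving process reported matching failed groups,
 * 0 otherwise. The combined flag is written to *flag_out. */
static int simulate_agree(const struct agree_input *procs, size_t n, int *flag_out)
{
    int success = 1;
    int flag = ~0;      /* identity element for bitwise AND */
    size_t i;

    assert(procs != NULL && flag_out != NULL);

    for (i = 0; i < n; i++) {
        if (!procs[i].alive)
            continue;   /* failed processes take no part in the reduction */
        if (!procs[i].groups_match)
            success = 0;
        flag &= procs[i].flag;
    }

    *flag_out = flag;
    return success;
}
```

With two surviving processes contributing flags 7 and 5, the combined flag in this model is 7 & 5 = 5. If any surviving process reports a mismatch, the function returns 0 while still producing the combined flag.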

diff --git a/src/include/mpi.h.in b/src/include/mpi.h.in
index 25fa149..33e1181 100644
--- a/src/include/mpi.h.in
+++ b/src/include/mpi.h.in
@@ -1539,6 +1539,7 @@ int MPIX_Comm_failure_ack(MPI_Comm comm);
 int MPIX_Comm_failure_get_acked(MPI_Comm comm, MPI_Group *failedgrp);
 int MPIX_Comm_revoke(MPI_Comm comm);
 int MPIX_Comm_shrink(MPI_Comm comm, MPI_Comm *newcomm);
+int MPIX_Comm_agree(MPI_Comm comm, int *flag);
 
 
 /* End Prototypes */
@@ -2178,6 +2179,7 @@ int PMPIX_Comm_failure_ack(MPI_Comm comm);
 int PMPIX_Comm_failure_get_acked(MPI_Comm comm, MPI_Group *failedgrp);
 int PMPIX_Comm_revoke(MPI_Comm comm);
 int PMPIX_Comm_shrink(MPI_Comm comm, MPI_Comm *newcomm);
+int PMPIX_Comm_agree(MPI_Comm comm, int *flag);
 
 
 #endif  /* MPI_BUILD_PROFILING */
diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index 8496922..79a0bc8 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -2827,6 +2827,23 @@ int MPID_Comm_get_all_failed_procs(MPID_Comm *comm_ptr, MPID_Group **failed_grou
 int MPID_Comm_revoke(MPID_Comm *comm, int is_remote);
 
 /*@
+  MPID_Comm_agree - MPID implementation of the last phase of the agreement
+
+  Input Parameters:
+. comm - communicator
+. bitarray - Bit array of all of the failures that have been discovered in comm
+. flag - flag input for agree from MPIX_Comm_agree
+. new_fail - If there is a new failure that we need to propagate, this should be true
+
+  Output Parameters:
+. flag - Bitwise AND of all of the flag input values
+
+  Return Value:
+  'MPI_SUCCESS' or a valid MPI error code.
+@*/
+int MPID_Comm_agree(MPID_Comm *comm, uint32_t *bitarray, int *flag, int new_fail);
+
+/*@
   MPID_Send - MPID entry point for MPI_Send
 
   Notes:
@@ -3812,7 +3829,8 @@ int MPID_VCR_Get_lpid(MPID_VCR vcr, int * lpid_ptr);
 #define MPIR_TOPO_B_TAG               27
 #define MPIR_REDUCE_SCATTER_BLOCK_TAG 28
 #define MPIR_SHRINK_TAG               29
-#define MPIR_FIRST_NBC_TAG            30
+#define MPIR_AGREE_TAG                30
+#define MPIR_FIRST_NBC_TAG            31
 
 /* These macros must be used carefully. These macros will not work with
  * negative tags. By definition, users are not to use negative tags and the
@@ -4116,6 +4134,7 @@ void MPIR_Free_err_dyncodes( void );
 int MPIR_Comm_idup_impl(MPID_Comm *comm_ptr, MPID_Comm **newcomm, MPID_Request **reqp);
 
 int MPIR_Comm_shrink(MPID_Comm *comm_ptr, MPID_Comm **newcomm_ptr);
+int MPIR_Comm_agree(MPID_Comm *comm_ptr, int *flag);
 
 int MPIR_Allreduce_group(void *sendbuf, void *recvbuf, int count,
                          MPI_Datatype datatype, MPI_Op op, MPID_Comm *comm_ptr,
diff --git a/src/mpi/comm/Makefile.mk b/src/mpi/comm/Makefile.mk
index c133981..cb32747 100644
--- a/src/mpi/comm/Makefile.mk
+++ b/src/mpi/comm/Makefile.mk
@@ -30,7 +30,8 @@ mpi_sources +=                       \
     src/mpi/comm/comm_failure_ack.c            \
     src/mpi/comm/comm_failure_get_acked.c      \
     src/mpi/comm/comm_revoke.c                 \
-    src/mpi/comm/comm_shrink.c
+    src/mpi/comm/comm_shrink.c                 \
+    src/mpi/comm/comm_agree.c
 
 mpi_core_sources += \
     src/mpi/comm/commutil.c
diff --git a/src/mpi/comm/comm_agree.c b/src/mpi/comm/comm_agree.c
new file mode 100644
index 0000000..3ee995e
--- /dev/null
+++ b/src/mpi/comm/comm_agree.c
@@ -0,0 +1,189 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2001 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpiimpl.h"
+#include "mpicomm.h"
+#include <stdint.h>
+
+/* -- Begin Profiling Symbol Block for routine MPIX_Comm_agree */
+#if defined(HAVE_PRAGMA_WEAK)
+#pragma weak MPIX_Comm_agree = PMPIX_Comm_agree
+#elif defined(HAVE_PRAGMA_HP_SEC_DEF)
+#pragma _HP_SECONDARY_DEF PMPIX_Comm_agree  MPIX_Comm_agree
+#elif defined(HAVE_PRAGMA_CRI_DUP)
+#pragma _CRI duplicate MPIX_Comm_agree as PMPIX_Comm_agree
+#endif
+/* -- End Profiling Symbol Block */
+
+/* Define MPICH_MPI_FROM_PMPI if weak symbols are not supported to build
+   the MPI routines */
+#ifndef MPICH_MPI_FROM_PMPI
+#undef MPIX_Comm_agree
+#define MPIX_Comm_agree PMPIX_Comm_agree
+#endif
+
+#undef FUNCNAME
+#define FUNCNAME MPIR_Comm_agree
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPIR_Comm_agree(MPID_Comm *comm_ptr, int *flag)
+{
+    int mpi_errno = MPI_SUCCESS, mpi_errno_tmp = MPI_SUCCESS;
+    MPID_Group *comm_grp, *failed_grp, *new_group_ptr, *global_failed;
+    int result, success = 1;
+    int errflag = 0;
+    int values[2];
+
+    MPID_MPI_STATE_DECL(MPID_STATE_MPIR_COMM_AGREE);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPIR_COMM_AGREE);
+
+    MPIR_Comm_group_impl(comm_ptr, &comm_grp);
+
+    /* Get the group of failed procs that this process has acknowledged */
+    mpi_errno = MPID_Comm_failure_get_acked(comm_ptr, &failed_grp);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    /* First decide on the group of failed procs. */
+    mpi_errno = MPID_Comm_get_all_failed_procs(comm_ptr, &global_failed, MPIR_AGREE_TAG);
+    if (mpi_errno) errflag = 1;
+
+    mpi_errno = MPIR_Group_compare_impl(failed_grp, global_failed, &result);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    /* Create a subgroup without the failed procs */
+    mpi_errno = MPIR_Group_difference_impl(comm_grp, global_failed, &new_group_ptr);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    /* If that group isn't the same as what we think is failed locally, then
+     * mark it as such. */
+    if (result == MPI_UNEQUAL || errflag)
+        success = 0;
+
+    /* Do an allreduce to decide whether or not anyone thinks the group
+     * has changed */
+    mpi_errno_tmp = MPIR_Allreduce_group(MPI_IN_PLACE, &success, 1, MPI_INT, MPI_MIN, comm_ptr,
+                                         new_group_ptr, MPIR_AGREE_TAG, &errflag);
+    if (!success || errflag || mpi_errno_tmp)
+        success = 0;
+
+    values[0] = success;
+    values[1] = *flag;
+
+    /* Determine both the result of this function (mpi_errno) and the result
+     * of flag that will be returned to the user. */
+    MPIR_Allreduce_group(MPI_IN_PLACE, values, 2, MPI_INT, MPI_BAND, comm_ptr,
+                         new_group_ptr, MPIR_AGREE_TAG, &errflag);
+    /* Ignore the result of the operation this time. Everyone will either
+     * return a failure because of !success earlier or they will return
+     * something useful for flag because of this operation. If there was a new
+     * failure in between the first allreduce and the second one, it's ignored
+     * here. */
+
+    if (failed_grp != MPID_Group_empty)
+        MPIR_Group_release(failed_grp);
+    MPIR_Group_release(new_group_ptr);
+    MPIR_Group_release(comm_grp);
+    if (global_failed != MPID_Group_empty)
+        MPIR_Group_release(global_failed);
+
+    success = values[0];
+    *flag = values[1];
+
+    if (!success) {
+        MPIU_ERR_SET(mpi_errno_tmp, MPIX_ERR_PROC_FAILED, "**mpix_comm_agree");
+        MPIU_ERR_ADD(mpi_errno, mpi_errno_tmp);
+    }
+
+  fn_exit:
+    return mpi_errno;
+  fn_fail:
+    goto fn_exit;
+}
+
+#undef FUNCNAME
+#define FUNCNAME MPIX_Comm_agree
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+/*@
+MPIX_Comm_agree - Performs agreement operation on comm
+
+Input Parameters:
++ comm - communicator (handle)
+
+Input/Output Parameters:
+. flag - on input, this process's contribution; on output, the agreed value (integer)
+
+.N Threadsafe
+
+.N Fortran
+
+.N Errors
+.N MPI_SUCCESS
+.N MPI_ERR_COMM
+
+@*/
+int MPIX_Comm_agree(MPI_Comm comm, int *flag)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Comm *comm_ptr = NULL;
+    MPID_MPI_STATE_DECL(MPID_STATE_MPIX_COMM_AGREE);
+
+    MPIR_ERRTEST_INITIALIZED_ORDIE();
+
+    MPIU_THREAD_CS_ENTER(ALLFUNC,);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPIX_COMM_AGREE);
+
+    /* Validate parameters, and convert MPI object handles to object pointers */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            MPIR_ERRTEST_COMM(comm, mpi_errno);
+        }
+        MPID_END_ERROR_CHECKS;
+
+        MPID_Comm_get_ptr( comm, comm_ptr );
+
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            /* Validate comm_ptr */
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            if (MPIX_ERR_REVOKED != MPIR_ERR_GET_CLASS(mpi_errno) && mpi_errno)
+                goto fn_fail;
+        }
+        MPID_END_ERROR_CHECKS;
+    }
+#else
+    {
+        MPID_Comm_get_ptr( comm, comm_ptr );
+    }
+#endif
+
+    /* ... body of routine ... */
+    mpi_errno = MPIR_Comm_agree(comm_ptr, flag);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    /* ... end of body of routine ... */
+
+  fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPIX_COMM_AGREE);
+    MPIU_THREAD_CS_EXIT(ALLFUNC,);
+    return mpi_errno;
+
+  fn_fail:
+    /* --BEGIN ERROR HANDLING-- */
+#ifdef HAVE_ERROR_CHECKING
+    {
+        mpi_errno =
+            MPIR_Err_create_code(mpi_errno, MPIR_ERR_RECOVERABLE, FCNAME, __LINE__,
+                                 MPI_ERR_OTHER, "**mpix_comm_agree",
+                                 "**mpix_comm_agree %C", comm);
+    }
+#endif
+    mpi_errno = MPIR_Err_return_comm(comm_ptr, FCNAME, mpi_errno);
+    goto fn_exit;
+    /* --END ERROR HANDLING-- */
+}
diff --git a/src/mpi/errhan/errnames.txt b/src/mpi/errhan/errnames.txt
index 35551d0..060b3b4 100644
--- a/src/mpi/errhan/errnames.txt
+++ b/src/mpi/errhan/errnames.txt
@@ -1105,6 +1105,8 @@ is too big (> MPIU_SHMW_GHND_SZ)
 **mpix_comm_revoke %C:MPIX_Comm_revoke(%C) failed
 **mpix_comm_shrink:MPIX_Comm_shrink failed
 **mpix_comm_shrink %C %p:MPIX_Comm_shrink(%C, new_comm=%p) failed
+**mpix_comm_agree:MPIX_Comm_agree failed
+**mpix_comm_agree %C:MPIX_Comm_agree(%C) failed
 **mpi_intercomm_create:MPI_Intercomm_create failed
 **mpi_intercomm_create %C %d %C %d %d %p:MPI_Intercomm_create(%C, local_leader=%d, %C, remote_leader=%d, tag=%d, newintercomm=%p) failed
 **mpi_intercomm_merge:MPI_Intercomm_merge failed
diff --git a/src/mpid/ch3/src/Makefile.mk b/src/mpid/ch3/src/Makefile.mk
index 30c64bd..4272c26 100644
--- a/src/mpid/ch3/src/Makefile.mk
+++ b/src/mpid/ch3/src/Makefile.mk
@@ -33,6 +33,7 @@ mpi_core_sources +=                          \
     src/mpid/ch3/src/mpid_comm_failure_ack.c               \
     src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c      \
     src/mpid/ch3/src/mpid_comm_revoke.c                    \
+    src/mpid/ch3/src/mpid_comm_agree.c                     \
     src/mpid/ch3/src/mpid_finalize.c                       \
     src/mpid/ch3/src/mpid_get_universe_size.c              \
     src/mpid/ch3/src/mpid_getpname.c                       \
diff --git a/src/mpid/ch3/src/ch3u_recvq.c b/src/mpid/ch3/src/ch3u_recvq.c
index c8dbde6..ea8bf34 100644
--- a/src/mpid/ch3/src/ch3u_recvq.c
+++ b/src/mpid/ch3/src/ch3u_recvq.c
@@ -941,7 +941,7 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
         match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_COLL;
 
         if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
-            if (rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
+            if (rreq->dev.match.parts.tag != MPIR_AGREE_TAG && rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
                 MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                             "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
                             rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
@@ -973,7 +973,7 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
         match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_COLL;
 
         if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
-            if (rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
+            if (rreq->dev.match.parts.tag != MPIR_AGREE_TAG && rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
                 MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                             "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
                             rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
diff --git a/src/mpid/ch3/src/mpid_comm_agree.c b/src/mpid/ch3/src/mpid_comm_agree.c
new file mode 100644
index 0000000..6377397
--- /dev/null
+++ b/src/mpid/ch3/src/mpid_comm_agree.c
@@ -0,0 +1,119 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2011 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpidimpl.h"
+
+static int get_parent(int rank, uint32_t *bitarray)
+{
+    int parent = -1;
+    int failed = 1;
+    uint32_t mask;
+
+    while (failed) {
+        mask = 0x80000000;
+        parent = rank == 0 ? -1: (rank - 1) / 2;
+
+        /* Check whether the process is in the failed group */
+        if (parent != -1) {
+            mask >>= parent % (sizeof(uint32_t) * 8);
+            failed = bitarray[parent / (sizeof(uint32_t) * 8)] & mask;
+            if (failed) {
+                rank = parent;
+            }
+        } else failed = 0;
+    }
+
+    return parent;
+}
+
+static void get_children(int rank, int size, uint32_t *bitarray, int *children, int *nchildren)
+{
+    int i;
+    int child;
+
+    for (i = 1; i <= 2; i++) {
+        /* Calculate the child */
+        child = 2 * rank + i;
+        if (child >= size) child = -1;
+
+        /* Check if the child is alive. If not, call get_children on the child
+         * to inherit its children */
+        if (child != -1) {
+            if (bitarray[child / (sizeof(uint32_t) * 8)] & (0x80000000 >> (child % (sizeof(uint32_t) * 8)))) {
+                get_children(child, size, bitarray, children, nchildren);
+            } else {
+                children[*nchildren] = child;
+                (*nchildren)++;
+            }
+        }
+    }
+}
+
+#undef FUNCNAME
+#define FUNCNAME MPID_Comm_agree
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_Comm_agree(MPID_Comm *comm_ptr, uint32_t *bitarray, int *flag, int new_fail)
+{
+    int mpi_errno = MPI_SUCCESS;
+    int *children, nchildren = 0, parent;
+    int i;
+    int errflag = new_fail;
+    int tmp_flag;
+
+    MPID_MPI_STATE_DECL(MPID_STATE_MPID_COMM_AGREE);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPID_COMM_AGREE);
+
+    children = (int *) MPIU_Malloc(sizeof(int) * ((comm_ptr->local_size + 1) / 2));
+
+    /* Calculate my parent and children */
+    parent = get_parent(comm_ptr->rank, bitarray);
+    get_children(comm_ptr->rank, comm_ptr->local_size, bitarray, children, &nchildren);
+
+    /* Get a flag value from each of my children */
+    for (i = 0; i < nchildren; i++) {
+        if (children[i] == -1) continue;
+        mpi_errno = MPIC_Recv(&tmp_flag, 1, MPI_INT, children[i], MPIR_AGREE_TAG,
+                comm_ptr->handle, MPI_STATUS_IGNORE, &errflag);
+        if (mpi_errno) return mpi_errno;
+        if (errflag) new_fail = 1;
+
+        *flag &= tmp_flag;
+    }
+
+    /* If I'm not the root */
+    if (-1 != parent) {
+        /* Send my message to my parent */
+        mpi_errno = MPIC_Send(flag, 1, MPI_INT, parent, MPIR_AGREE_TAG,
+                comm_ptr->handle, &errflag);
+        if (mpi_errno) return mpi_errno;
+
+        /* Receive the result from my parent */
+        mpi_errno = MPIC_Recv(flag, 1, MPI_INT, parent, MPIR_AGREE_TAG,
+                comm_ptr->handle, MPI_STATUS_IGNORE, &errflag);
+        if (mpi_errno) return mpi_errno;
+        if (errflag) new_fail = 1;
+    }
+
+    /* Send my flag value to my children */
+    for (i = 0; i < nchildren; i++) {
+        if (children[i] == -1) continue;
+        mpi_errno = MPIC_Send(flag, 1, MPI_INT, children[i], MPIR_AGREE_TAG,
+                comm_ptr->handle, &errflag);
+        if (mpi_errno) return mpi_errno;
+    }
+
+    MPIU_DBG_MSG_D(CH3_OTHER, VERBOSE, "New failure: %d", new_fail);
+
+    MPIU_Free(children);
+
+    MPIU_ERR_CHKANDJUMP1(new_fail, mpi_errno, MPIX_ERR_PROC_FAILED, "**mpix_comm_agree", "**mpix_comm_agree %C", comm_ptr);
+
+  fn_exit:
+    return mpi_errno;
+  fn_fail:
+    goto fn_exit;
+}
diff --git a/src/mpid/ch3/src/mpid_iprobe.c b/src/mpid/ch3/src/mpid_iprobe.c
index ab8dea6..fa32028 100644
--- a/src/mpid/ch3/src/mpid_iprobe.c
+++ b/src/mpid/ch3/src/mpid_iprobe.c
@@ -34,6 +34,7 @@ int MPID_Iprobe(int source, int tag, MPID_Comm *comm, int context_offset,
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_irecv.c b/src/mpid/ch3/src/mpid_irecv.c
index a6732c5..1c75d20 100644
--- a/src/mpid/ch3/src/mpid_irecv.c
+++ b/src/mpid/ch3/src/mpid_irecv.c
@@ -33,6 +33,7 @@ int MPID_Irecv(void * buf, int count, MPI_Datatype datatype, int rank, int tag,
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_DBG_MSG(CH3_OTHER,VERBOSE,"Comm has been revoked. Returning from MPID_IRECV.");
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
diff --git a/src/mpid/ch3/src/mpid_irsend.c b/src/mpid/ch3/src/mpid_irsend.c
index 7d6dd52..d8a0c0a 100644
--- a/src/mpid/ch3/src/mpid_irsend.c
+++ b/src/mpid/ch3/src/mpid_irsend.c
@@ -40,6 +40,7 @@ int MPID_Irsend(const void * buf, int count, MPI_Datatype datatype, int rank, in
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_isend.c b/src/mpid/ch3/src/mpid_isend.c
index 94152dc..168608d 100644
--- a/src/mpid/ch3/src/mpid_isend.c
+++ b/src/mpid/ch3/src/mpid_isend.c
@@ -52,6 +52,7 @@ int MPID_Isend(const void * buf, int count, MPI_Datatype datatype, int rank,
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_DBG_MSG(CH3_OTHER,VERBOSE,"Communicator revoked. MPID_ISEND returning");
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
diff --git a/src/mpid/ch3/src/mpid_issend.c b/src/mpid/ch3/src/mpid_issend.c
index 2467c55..91e484a 100644
--- a/src/mpid/ch3/src/mpid_issend.c
+++ b/src/mpid/ch3/src/mpid_issend.c
@@ -39,6 +39,7 @@ int MPID_Issend(const void * buf, int count, MPI_Datatype datatype, int rank, in
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_probe.c b/src/mpid/ch3/src/mpid_probe.c
index 52fe729..9a238e7 100644
--- a/src/mpid/ch3/src/mpid_probe.c
+++ b/src/mpid/ch3/src/mpid_probe.c
@@ -28,6 +28,7 @@ int MPID_Probe(int source, int tag, MPID_Comm * comm, int context_offset,
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_recv.c b/src/mpid/ch3/src/mpid_recv.c
index e3af111..a042864 100644
--- a/src/mpid/ch3/src/mpid_recv.c
+++ b/src/mpid/ch3/src/mpid_recv.c
@@ -41,6 +41,7 @@ int MPID_Recv(void * buf, int count, MPI_Datatype datatype, int rank, int tag,
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_rsend.c b/src/mpid/ch3/src/mpid_rsend.c
index 20b8bcb..92990e6 100644
--- a/src/mpid/ch3/src/mpid_rsend.c
+++ b/src/mpid/ch3/src/mpid_rsend.c
@@ -42,6 +42,7 @@ int MPID_Rsend(const void * buf, int count, MPI_Datatype datatype, int rank, int
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_send.c b/src/mpid/ch3/src/mpid_send.c
index b4ca110..b008b68 100644
--- a/src/mpid/ch3/src/mpid_send.c
+++ b/src/mpid/ch3/src/mpid_send.c
@@ -40,6 +40,7 @@ int MPID_Send(const void * buf, int count, MPI_Datatype datatype, int rank,
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_ssend.c b/src/mpid/ch3/src/mpid_ssend.c
index f17b4e7..07c2087 100644
--- a/src/mpid/ch3/src/mpid_ssend.c
+++ b/src/mpid/ch3/src/mpid_ssend.c
@@ -39,6 +39,7 @@ int MPID_Ssend(const void * buf, int count, MPI_Datatype datatype, int rank, int
 
     /* Check to make sure the communicator hasn't already been revoked */
     if (comm->revoked &&
+            MPIR_AGREE_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask) &&
             MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/pamid/src/misc/mpid_unimpl.c b/src/mpid/pamid/src/misc/mpid_unimpl.c
index 8b92417..7f00d95 100644
--- a/src/mpid/pamid/src/misc/mpid_unimpl.c
+++ b/src/mpid/pamid/src/misc/mpid_unimpl.c
@@ -83,3 +83,9 @@ int MPID_Comm_failure_get_acked(MPID_Comm *comm_ptr, MPID_Group **failed_group_p
   MPID_abort();
   return 0;
 }
+
+int MPID_Comm_agree(MPID_Comm *comm_ptr, uint32_t *bitarray, int *flag, int new_fail)
+{
+  MPID_abort();
+  return 0;
+}
diff --git a/test/mpi/ft/Makefile.am b/test/mpi/ft/Makefile.am
index efcc162..8981bae 100644
--- a/test/mpi/ft/Makefile.am
+++ b/test/mpi/ft/Makefile.am
@@ -10,4 +10,4 @@ include $(top_srcdir)/Makefile.mtest
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
 ## correctly
-noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack anysource revoke_nofail shrink
+noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack anysource revoke_nofail shrink agree
diff --git a/test/mpi/ft/agree.c b/test/mpi/ft/agree.c
new file mode 100644
index 0000000..8ef003f
--- /dev/null
+++ b/test/mpi/ft/agree.c
@@ -0,0 +1,68 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include "mpi.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include "mpitest.h"
+
+int main(int argc, char **argv)
+{
+    int rank, size, rc, errclass, errs = 0;
+    int flag = 1;
+
+    MPI_Init(&argc, &argv);
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
+
+    if (size < 4) {
+        fprintf(stderr, "Must run with at least 4 processes\n");
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    if (2 == rank) exit(EXIT_FAILURE);
+
+    if (0 == rank) flag = 0;
+
+    rc = MPIX_Comm_agree(MPI_COMM_WORLD, &flag);
+    MPI_Error_class(rc, &errclass);
+    if (errclass != MPIX_ERR_PROC_FAILED) {
+        fprintf(stderr, "[%d] Expected MPIX_ERR_PROC_FAILED after agree. Received: %d\n", rank, errclass);
+        MPI_Abort(MPI_COMM_WORLD, 1);
+        errs++;
+    } else if (0 != flag) {
+        fprintf(stderr, "[%d] Expected flag to be 0. Received: %d\n", rank, flag);
+        errs++;
+    }
+
+    MPIX_Comm_failure_ack(MPI_COMM_WORLD);
+
+    if (0 == rank) flag = 0;
+    else flag = 1;
+    rc = MPIX_Comm_agree(MPI_COMM_WORLD, &flag);
+    MPI_Error_class(rc, &errclass);
+    if (MPI_SUCCESS != rc) {
+        fprintf(stderr, "[%d] Expected MPI_SUCCESS after agree. Received: %d\n", rank, errclass);
+        MPI_Abort(MPI_COMM_WORLD, 1);
+        errs++;
+    } else if (0 != flag) {
+        fprintf(stderr, "[%d] Expected flag to be 0. Received: %d\n", rank, flag);
+        MPI_Abort(MPI_COMM_WORLD, 1);
+        errs++;
+    }
+
+    MPI_Finalize();
+
+    if (0 == rank) {
+        if (errs == 0)
+            fprintf(stdout, " No Errors\n");
+        else
+            fprintf(stdout, " Found %d errors\n", errs);
+    }
+
+    return errs;
+}
diff --git a/test/mpi/ft/testlist b/test/mpi/ft/testlist
index 4598cde..5e8b54b 100644
--- a/test/mpi/ft/testlist
+++ b/test/mpi/ft/testlist
@@ -14,3 +14,4 @@ scatter 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=
 anysource 3 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10
 revoke_nofail 4 mpiexecarg=-disable-auto-cleanup resultsTest=TestStatusNoErrors strict=false timelimit=10
 shrink 8 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10
+agree 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10

http://git.mpich.org/mpich.git/commitdiff/5be10ce97cdf586ccaa5ab86f29d3827bb215056

commit 5be10ce97cdf586ccaa5ab86f29d3827bb215056
Author: Wesley Bland <wbland at anl.gov>
Date:   Thu Apr 24 16:38:13 2014 -0500

    Add MPIX_Comm_shrink functionality
    
    This adds a new function MPIX_COMM_SHRINK. This is a communicator creation
    function that creates a new communicator based on a previous communicator, but
    excluding any failed processes.
    
    As part of the operation, the shrink call needs to perform an agreement to
    determine the group of failed processes. This is done using the algorithm
    published by Hursey et al. in their EuroMPI '12 paper.
    
    The list of failed processes is collected using a bit array. This happens via
    a few new functions in the CH3 layer to create and send a bitarray to the
    master process and receive an updated bitarray. Obviously, this is not a very
    scalable implementation yet, but something better can easily be plugged in
    here to replace the naïve implementation. This is also a use case for an
    MPI_Recv_reduce for future reference.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/include/mpi.h.in b/src/include/mpi.h.in
index 799598e..25fa149 100644
--- a/src/include/mpi.h.in
+++ b/src/include/mpi.h.in
@@ -1538,6 +1538,7 @@ int MPI_T_category_changed(int *stamp);
 int MPIX_Comm_failure_ack(MPI_Comm comm);
 int MPIX_Comm_failure_get_acked(MPI_Comm comm, MPI_Group *failedgrp);
 int MPIX_Comm_revoke(MPI_Comm comm);
+int MPIX_Comm_shrink(MPI_Comm comm, MPI_Comm *newcomm);
 
 
 /* End Prototypes */
@@ -2176,6 +2177,7 @@ int PMPI_T_category_changed(int *stamp);
 int PMPIX_Comm_failure_ack(MPI_Comm comm);
 int PMPIX_Comm_failure_get_acked(MPI_Comm comm, MPI_Group *failedgrp);
 int PMPIX_Comm_revoke(MPI_Comm comm);
+int PMPIX_Comm_shrink(MPI_Comm comm, MPI_Comm *newcomm);
 
 
 #endif  /* MPI_BUILD_PROFILING */
diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index 3c7aa29..8496922 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -2785,6 +2785,36 @@ int MPID_Comm_failure_ack(MPID_Comm *comm);
 int MPID_Comm_failure_get_acked(MPID_Comm *comm, MPID_Group **failed_group_ptr);
 
 /*@
+  MPID_Comm_failed_bitarray - MPID function to get the bitarray including all of the failed processes
+
+  Input Parameters:
+. comm - communicator
+. acked - true if bitarray should contain only acked procs
+
+  Output Parameter:
+. bitarray - Bit array containing all of the failed processes in comm
+
+  Return Value:
+  'MPI_SUCCESS' or a valid MPI error code.
+@*/
+int MPID_Comm_failed_bitarray(MPID_Comm *comm, uint32_t **bitarray, int acked);
+
+/*@
+  MPID_Comm_get_all_failed_procs - Constructs a group of failed processes that is uniform over a communicator
+
+  Input Parameters:
+. comm - communicator
+. tag - Tag used to do communication
+
+  Output Parameters:
+. failed_grp - group of all failed processes
+
+  Return Value:
+  'MPI_SUCCESS' or a valid MPI error code.
+@*/
+int MPID_Comm_get_all_failed_procs(MPID_Comm *comm_ptr, MPID_Group **failed_group, int tag);
+
+/*@
   MPID_Comm_revoke - MPID entry point for MPI_Comm_revoke
 
   Input Parameters:
@@ -3781,7 +3811,8 @@ int MPID_VCR_Get_lpid(MPID_VCR vcr, int * lpid_ptr);
 #define MPIR_TOPO_A_TAG               26
 #define MPIR_TOPO_B_TAG               27
 #define MPIR_REDUCE_SCATTER_BLOCK_TAG 28
-#define MPIR_FIRST_NBC_TAG            29
+#define MPIR_SHRINK_TAG               29
+#define MPIR_FIRST_NBC_TAG            30
 
 /* These macros must be used carefully. These macros will not work with
  * negative tags. By definition, users are not to use negative tags and the
@@ -4084,6 +4115,8 @@ void MPIR_Free_err_dyncodes( void );
 
 int MPIR_Comm_idup_impl(MPID_Comm *comm_ptr, MPID_Comm **newcomm, MPID_Request **reqp);
 
+int MPIR_Comm_shrink(MPID_Comm *comm_ptr, MPID_Comm **newcomm_ptr);
+
 int MPIR_Allreduce_group(void *sendbuf, void *recvbuf, int count,
                          MPI_Datatype datatype, MPI_Op op, MPID_Comm *comm_ptr,
                          MPID_Group *group_ptr, int tag, int *errflag);
diff --git a/src/mpi/comm/Makefile.mk b/src/mpi/comm/Makefile.mk
index 2197991..c133981 100644
--- a/src/mpi/comm/Makefile.mk
+++ b/src/mpi/comm/Makefile.mk
@@ -29,7 +29,8 @@ mpi_sources +=                       \
     src/mpi/comm/comm_split_type.c   \
     src/mpi/comm/comm_failure_ack.c            \
     src/mpi/comm/comm_failure_get_acked.c      \
-    src/mpi/comm/comm_revoke.c
+    src/mpi/comm/comm_revoke.c                 \
+    src/mpi/comm/comm_shrink.c
 
 mpi_core_sources += \
     src/mpi/comm/commutil.c
diff --git a/src/mpi/comm/comm_shrink.c b/src/mpi/comm/comm_shrink.c
new file mode 100644
index 0000000..7eaba35
--- /dev/null
+++ b/src/mpi/comm/comm_shrink.c
@@ -0,0 +1,181 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2001 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpiimpl.h"
+#include "mpicomm.h"
+#include <stdint.h>
+
+/* This function has multiple phases.
+ *
+ * In the first phase, all alive processes must collectively decide which
+ * processes are dead. This happens via a fault-tolerant all-reduce style
+ * algorithm. This is implemented via the recursive-doubling algorithm as a
+ * first pass for simplicity.
+ *
+ * In the second phase, the remaining processes must create a new communicator
+ * based on the group determined in the first phase. This phase simply uses
+ * the existing implementation of MPI_Comm_create_group. If the call to
+ * MPI_Comm_create_group fails, then the algorithm is restarted in phase one
+ * and a new group is determined.
+ */
+
+/* -- Begin Profiling Symbol Block for routine MPIX_Comm_shrink */
+#if defined(HAVE_PRAGMA_WEAK)
+#pragma weak MPIX_Comm_shrink = PMPIX_Comm_shrink
+#elif defined(HAVE_PRAGMA_HP_SEC_DEF)
+#pragma _HP_SECONDARY_DEF PMPIX_Comm_shrink  MPIX_Comm_shrink
+#elif defined(HAVE_PRAGMA_CRI_DUP)
+#pragma _CRI duplicate MPIX_Comm_shrink as PMPIX_Comm_shrink
+#endif
+/* -- End Profiling Symbol Block */
+
+/* Define MPICH_MPI_FROM_PMPI if weak symbols are not supported to build
+   the MPI routines */
+#ifndef MPICH_MPI_FROM_PMPI
+#undef MPIX_Comm_shrink
+#define MPIX_Comm_shrink PMPIX_Comm_shrink
+#endif
+
+#undef FUNCNAME
+#define FUNCNAME MPIR_Comm_shrink
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+/* comm shrink impl; assumes that standard error checking has already taken
+ * place in the calling function */
+int MPIR_Comm_shrink(MPID_Comm *comm_ptr, MPID_Comm **newcomm_ptr)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Group *global_failed, *comm_grp, *new_group_ptr;
+    int attempts = 0;
+    int errflag = 0, tmp_errflag = 0;
+
+    MPID_MPI_STATE_DECL(MPID_STATE_MPIR_COMM_SHRINK);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPIR_COMM_SHRINK);
+
+    /* TODO - Implement this function for intercommunicators */
+    MPIR_Comm_group_impl(comm_ptr, &comm_grp);
+
+    do {
+        mpi_errno = MPID_Comm_get_all_failed_procs(comm_ptr, &global_failed, MPIR_SHRINK_TAG);
+        /* Ignore the mpi_errno value here as it will definitely communicate
+         * with failed procs */
+
+        mpi_errno = MPIR_Group_difference_impl(comm_grp, global_failed, &new_group_ptr);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+        if (MPID_Group_empty != global_failed) MPIR_Group_release(global_failed);
+
+        mpi_errno = MPIR_Comm_create_group(comm_ptr, new_group_ptr, MPIR_SHRINK_TAG, newcomm_ptr);
+        errflag = mpi_errno || *newcomm_ptr == NULL;
+
+        mpi_errno = MPIR_Allreduce_group(MPI_IN_PLACE, &errflag, 1, MPI_INT, MPI_MAX, comm_ptr,
+            new_group_ptr, MPIR_SHRINK_TAG, &tmp_errflag);
+        MPIR_Group_release(new_group_ptr);
+
+        if (errflag) MPIU_Object_set_ref(new_group_ptr, 0);
+    } while (errflag && ++attempts < 5);
+
+    if (errflag && attempts >= 5) goto fn_fail;
+    else mpi_errno = MPI_SUCCESS;
+
+  fn_exit:
+    MPIR_Group_release(comm_grp);
+    return mpi_errno;
+  fn_fail:
+    if (*newcomm_ptr) MPIU_Object_set_ref(*newcomm_ptr, 0);
+    MPIU_Object_set_ref(global_failed, 0);
+    MPIU_Object_set_ref(new_group_ptr, 0);
+    goto fn_exit;
+}
+
+#undef FUNCNAME
+#define FUNCNAME MPIX_Comm_shrink
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+/*@
+MPIX_Comm_shrink - Creates a new communicator from an existing communicator while
+                  excluding failed processes
+
+Input Parameters:
++ comm - communicator (handle)
+
+Output Parameters:
+. newcomm - new communicator (handle)
+
+.N Threadsafe
+
+.N Fortran
+
+.N Errors
+.N MPI_SUCCESS
+.N MPI_ERR_COMM
+
+@*/
+int MPIX_Comm_shrink(MPI_Comm comm, MPI_Comm *newcomm)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Comm *comm_ptr = NULL, *newcomm_ptr;
+    MPID_MPI_STATE_DECL(MPID_STATE_MPIX_COMM_SHRINK);
+
+    MPIR_ERRTEST_INITIALIZED_ORDIE();
+
+    MPIU_THREAD_CS_ENTER(ALLFUNC,);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPIX_COMM_SHRINK);
+
+    /* Validate parameters, and convert MPI object handles to object pointers */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            MPIR_ERRTEST_COMM(comm, mpi_errno);
+        }
+        MPID_END_ERROR_CHECKS;
+
+        MPID_Comm_get_ptr( comm, comm_ptr );
+
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            /* Validate comm_ptr */
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno );
+            if (MPIX_ERR_REVOKED != MPIR_ERR_GET_CLASS(mpi_errno) && mpi_errno)
+                goto fn_fail;
+        }
+        MPID_END_ERROR_CHECKS;
+    }
+#else
+    {
+        MPID_Comm_get_ptr( comm, comm_ptr );
+    }
+#endif
+
+    /* ... body of routine ... */
+    mpi_errno = MPIR_Comm_shrink(comm_ptr, &newcomm_ptr);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    if (newcomm_ptr)
+        MPIU_OBJ_PUBLISH_HANDLE(*newcomm, newcomm_ptr->handle);
+    else
+        *newcomm = MPI_COMM_NULL;
+    /* ... end of body of routine ... */
+
+  fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPIX_COMM_SHRINK);
+    MPIU_THREAD_CS_EXIT(ALLFUNC,);
+    return mpi_errno;
+
+  fn_fail:
+    /* --BEGIN ERROR HANDLING-- */
+#ifdef HAVE_ERROR_CHECKING
+    {
+        mpi_errno =
+            MPIR_Err_create_code(mpi_errno, MPIR_ERR_RECOVERABLE, FCNAME, __LINE__,
+                                 MPI_ERR_OTHER, "**mpix_comm_shrink",
+                                 "**mpix_comm_shrink %C %p", comm, newcomm);
+    }
+#endif
+    mpi_errno = MPIR_Err_return_comm(comm_ptr, FCNAME, mpi_errno);
+    goto fn_exit;
+    /* --END ERROR HANDLING-- */
+}
diff --git a/src/mpi/errhan/errnames.txt b/src/mpi/errhan/errnames.txt
index 8bd6810..35551d0 100644
--- a/src/mpi/errhan/errnames.txt
+++ b/src/mpi/errhan/errnames.txt
@@ -1103,6 +1103,8 @@ is too big (> MPIU_SHMW_GHND_SZ)
 **mpix_comm_failure_get_acked %C %p:MPIX_Comm_failure_get_acked(%C, group=%p) failed
 **mpix_comm_revoke:MPIX_Comm_revoke failed
 **mpix_comm_revoke %C:MPIX_Comm_revoke(%C) failed
+**mpix_comm_shrink:MPIX_Comm_shrink failed
+**mpix_comm_shrink %C %p:MPIX_Comm_shrink(%C, new_comm=%p) failed
 **mpi_intercomm_create:MPI_Intercomm_create failed
 **mpi_intercomm_create %C %d %C %d %d %p:MPI_Intercomm_create(%C, local_leader=%d, %C, remote_leader=%d, tag=%d, newintercomm=%p) failed
 **mpi_intercomm_merge:MPI_Intercomm_merge failed
diff --git a/src/mpid/ch3/src/Makefile.mk b/src/mpid/ch3/src/Makefile.mk
index 091613d..30c64bd 100644
--- a/src/mpid/ch3/src/Makefile.mk
+++ b/src/mpid/ch3/src/Makefile.mk
@@ -31,6 +31,7 @@ mpi_core_sources +=                          \
     src/mpid/ch3/src/mpid_comm_disconnect.c                \
     src/mpid/ch3/src/mpid_comm_spawn_multiple.c            \
     src/mpid/ch3/src/mpid_comm_failure_ack.c               \
+    src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c      \
     src/mpid/ch3/src/mpid_comm_revoke.c                    \
     src/mpid/ch3/src/mpid_finalize.c                       \
     src/mpid/ch3/src/mpid_get_universe_size.c              \
diff --git a/src/mpid/ch3/src/ch3u_recvq.c b/src/mpid/ch3/src/ch3u_recvq.c
index d84d54d..c8dbde6 100644
--- a/src/mpid/ch3/src/ch3u_recvq.c
+++ b/src/mpid/ch3/src/ch3u_recvq.c
@@ -941,11 +941,13 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
         match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_COLL;
 
         if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
-            MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
-                        "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
-                        rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
-            dequeue_and_set_error(&rreq, prev_rreq, &recvq_unexpected_head, &recvq_unexpected_tail, &error, MPI_PROC_NULL);
-            continue;
+            if (rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
+                MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                            "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
+                            rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+                dequeue_and_set_error(&rreq, prev_rreq, &recvq_unexpected_head, &recvq_unexpected_tail, &error, MPI_PROC_NULL);
+                continue;
+            }
         }
 
         prev_rreq = rreq;
@@ -971,11 +973,13 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
         match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_COLL;
 
         if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
-            MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
-                        "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
-                        rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
-            dequeue_and_set_error(&rreq, prev_rreq, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL);
-            continue;
+            if (rreq->dev.match.parts.tag != MPIR_SHRINK_TAG) {
+                MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                            "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
+                            rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+                dequeue_and_set_error(&rreq, prev_rreq, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL);
+                continue;
+            }
         }
 
         prev_rreq = rreq;
diff --git a/src/mpid/ch3/src/mpid_comm_failure_ack.c b/src/mpid/ch3/src/mpid_comm_failure_ack.c
index be64340..d01b99b 100644
--- a/src/mpid/ch3/src/mpid_comm_failure_ack.c
+++ b/src/mpid/ch3/src/mpid_comm_failure_ack.c
@@ -69,3 +69,77 @@ fn_exit:
 fn_fail:
     goto fn_exit;
 }
+
+#undef FUNCNAME
+#define FUNCNAME MPID_Comm_failed_bitarray
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_Comm_failed_bitarray(MPID_Comm *comm_ptr, uint32_t **bitarray, int acked)
+{
+    int mpi_errno = MPI_SUCCESS;
+    int size, i;
+    uint32_t bit;
+    int *failed_procs, *group_procs;
+    MPID_Group *failed_group, *comm_group;
+    MPIDI_STATE_DECL(MPID_STATE_COMM_FAILED_BITARRAY);
+    MPIU_CHKLMEM_DECL(2);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_COMM_FAILED_BITARRAY);
+
+    /* TODO - Fix this for intercommunicators */
+    size = comm_ptr->local_size;
+
+    /* We can fit sizeof(uint32_t) * 8 ranks in one uint32_t so divide the
+     * size by that */
+    /* This buffer will be handed back to the calling function so we use a
+     * "real" malloc here and expect the caller to free the buffer later. The
+     * other buffers in this function are temporary and will be automatically
+     * cleaned up at the end of the function. */
+    *bitarray = (uint32_t *) MPIU_Malloc(sizeof(uint32_t) * (size / (sizeof(uint32_t) * 8)+1));
+    if (!(*bitarray)) {
+        fprintf(stderr, "Could not allocate space for bitarray\n");
+        PMPI_Abort(MPI_COMM_WORLD, 1);
+    }
+    for (i = 0; i <= size/(sizeof(uint32_t)*8); i++) (*bitarray)[i] = 0;
+
+    mpi_errno = MPIDI_CH3U_Check_for_failed_procs();
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    if (acked)
+        MPIDI_CH3U_Get_failed_group(comm_ptr->ch.last_ack_rank, &failed_group);
+    else
+        MPIDI_CH3U_Get_failed_group(-2, &failed_group);
+
+    if (failed_group == MPID_Group_empty) goto fn_exit;
+
+
+    MPIU_CHKLMEM_MALLOC(group_procs, int *, sizeof(int)*failed_group->size, mpi_errno, "group_procs");
+    for (i = 0; i < failed_group->size; i++) group_procs[i] = i;
+    MPIU_CHKLMEM_MALLOC(failed_procs, int *, sizeof(int)*failed_group->size, mpi_errno, "failed_procs");
+
+    MPIR_Comm_group_impl(comm_ptr, &comm_group);
+
+    MPIR_Group_translate_ranks_impl(failed_group, failed_group->size, group_procs, comm_group, failed_procs);
+
+    /* The bits will actually be ordered in descending order rather than
+     * ascending, so rank 0 maps to the most significant bit of the first
+     * word. This is purely for readability since it makes no practical
+     * difference. So if the bits look like this:
+     *
+     * 10001100 01001000 00000000 00000001
+     *
+     * Then processes 0, 4, 5, 9, 12, and 31 have failed. */
+    for (i = 0; i < failed_group->size; i++) {
+        bit = 0x80000000;
+        bit >>= failed_procs[i] % (sizeof(uint32_t) * 8);
+
+        (*bitarray)[failed_procs[i] / (sizeof(uint32_t) * 8)] |= bit;
+    }
+
+    MPIR_Group_free_impl(comm_group);
+
+  fn_exit:
+    MPIU_CHKLMEM_FREEALL();
+    return mpi_errno;
+  fn_fail:
+    goto fn_exit;
+}
diff --git a/src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c b/src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c
new file mode 100644
index 0000000..882f30a
--- /dev/null
+++ b/src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c
@@ -0,0 +1,156 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2011 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpidimpl.h"
+#ifdef USE_PMI2_API
+#include "pmi2.h"
+#else
+#include "pmi.h"
+#endif
+
+/* Generates a bitarray based on orig_comm where all procs in group are marked with 1 */
+static int *group_to_bitarray(MPID_Group *group, MPID_Comm *orig_comm) {
+    uint32_t *bitarray, mask;
+    int bitarray_size = (orig_comm->local_size / 8) + (orig_comm->local_size % 8 ? 1 : 0);
+    int *group_ranks, *comm_ranks, i, index;
+
+    bitarray = (int *) MPIU_Malloc(sizeof(int) * bitarray_size);
+
+    if (group == MPID_Group_empty) {
+        for (i = 0; i < bitarray_size; i++) bitarray[i] = 0;
+        return bitarray;
+    }
+
+    group_ranks = (int *) MPIU_Malloc(sizeof(int) * group->size);
+    comm_ranks = (int *) MPIU_Malloc(sizeof(int) * group->size);
+
+    for (i = 0; i < group->size; i++) group_ranks[i] = i;
+    for (i = 0; i < bitarray_size; i++) bitarray[i] = 0;
+
+    MPIR_Group_translate_ranks_impl(group, group->size, group_ranks,
+                                    orig_comm->local_group, comm_ranks);
+
+    for (i = 0; i < group->size && comm_ranks[i] != MPI_UNDEFINED; i++) {
+        index = comm_ranks[i] / 32;
+        mask = 0x80000000 >> comm_ranks[i] % 32;
+        bitarray[index] |= mask;
+    }
+
+    MPIU_Free(group_ranks);
+    MPIU_Free(comm_ranks);
+
+    return bitarray;
+}
+
+/* Generates an MPID_Group from a bitarray */
+static MPID_Group *bitarray_to_group(MPID_Comm *comm_ptr, int *bitarray)
+{
+    MPID_Group *ret_group;
+    MPID_Group *comm_group;
+    UT_array *ranks_array;
+    int i, found = 0;
+
+    utarray_new(ranks_array, &ut_int_icd);
+
+    MPIR_Comm_group_impl(comm_ptr, &comm_group);
+
+    /* Converts the bitarray into a utarray */
+    for (i = 0; i < comm_ptr->local_size; i++) {
+        if (bitarray[i/32] & (0x80000000 >> i % 32)) {
+            utarray_push_back(ranks_array, &i);
+            found++;
+        }
+    }
+
+    if (found)
+        /* Converts the utarray into a group */
+        MPIR_Group_incl_impl(comm_group, found, ut_int_array(ranks_array), &ret_group);
+    else
+        ret_group = MPID_Group_empty;
+
+    utarray_free(ranks_array);
+    MPIR_Group_release(comm_group);
+
+    return ret_group;
+}
+
+#undef FUNCNAME
+#define FUNCNAME MPID_Comm_get_all_failed_procs
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_Comm_get_all_failed_procs(MPID_Comm *comm_ptr, MPID_Group **failed_group, int tag)
+{
+    int mpi_errno = MPI_SUCCESS;
+    int errflag = 0;
+    int i, j;
+    int *bitarray, *remote_bitarray, bitarray_size;
+    MPID_Group *local_fail;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_COMM_GET_ALL_FAILED_PROCS);
+
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPID_COMM_GET_ALL_FAILED_PROCS);
+
+    /* Kick the progress engine in case it's been a while so we get all the
+     * latest updates about failures */
+    MPIDI_CH3I_Progress(NULL, false);
+
+    /* Generate the list of failed processes */
+    MPIDI_CH3U_Check_for_failed_procs();
+
+    mpi_errno = MPIDI_CH3U_Get_failed_group(-2, &local_fail);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    /* Generate a bitarray based on the list of failed procs */
+    bitarray = group_to_bitarray(local_fail, comm_ptr);
+    bitarray_size = (comm_ptr->local_size / 8) + (comm_ptr->local_size % 8 ? 1 : 0);
+    remote_bitarray = MPIU_Malloc(sizeof(int) * bitarray_size);
+
+    /* For now, this will be implemented as a star with rank 0 serving as
+     * the source */
+    if (comm_ptr->rank == 0) {
+        for (i = 1; i < comm_ptr->local_size; i++) {
+            /* Get everyone's list of failed processes to aggregate */
+            mpi_errno = MPIC_Recv(remote_bitarray, bitarray_size, MPI_INT,
+                i, tag, comm_ptr->handle, MPI_STATUS_IGNORE, &errflag);
+            if (mpi_errno) continue;
+
+            /* Combine the received bitarray with my own */
+            for (j = 0; j < bitarray_size; j++)
+                bitarray[j] |= remote_bitarray[j];
+        }
+
+        for (i = 1; i < comm_ptr->local_size; i++) {
+            /* Send the list to each rank to be processed locally */
+            mpi_errno = MPIC_Ssend(bitarray, bitarray_size, MPI_INT, i,
+                tag, comm_ptr->handle, &errflag);
+            if (mpi_errno) errflag = 1;
+        }
+
+        /* Convert the bitarray into a group */
+        *failed_group = bitarray_to_group(comm_ptr, bitarray);
+    } else {
+        /* Send my bitarray to rank 0 */
+        mpi_errno = MPIC_Ssend(bitarray, bitarray_size, MPI_INT, 0,
+            tag, comm_ptr->handle, &errflag);
+        if (mpi_errno) errflag = 1;
+
+        /* Get the resulting bitarray back from rank 0 */
+        mpi_errno = MPIC_Recv(remote_bitarray, bitarray_size, MPI_INT, 0,
+            tag, comm_ptr->handle, MPI_STATUS_IGNORE, &errflag);
+        if (mpi_errno) errflag = 1;
+
+        /* Convert the bitarray into a group */
+        *failed_group = bitarray_to_group(comm_ptr, remote_bitarray);
+    }
+
+    MPIU_Free(bitarray);
+    MPIU_Free(remote_bitarray);
+
+  fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPID_COMM_GET_ALL_FAILED_PROCS);
+    return mpi_errno;
+  fn_fail:
+    goto fn_exit;
+}
diff --git a/src/mpid/ch3/src/mpid_iprobe.c b/src/mpid/ch3/src/mpid_iprobe.c
index 07e1205..ab8dea6 100644
--- a/src/mpid/ch3/src/mpid_iprobe.c
+++ b/src/mpid/ch3/src/mpid_iprobe.c
@@ -33,7 +33,8 @@ int MPID_Iprobe(int source, int tag, MPID_Comm *comm, int context_offset,
     }
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
 
diff --git a/src/mpid/ch3/src/mpid_irecv.c b/src/mpid/ch3/src/mpid_irecv.c
index 28e7b66..a6732c5 100644
--- a/src/mpid/ch3/src/mpid_irecv.c
+++ b/src/mpid/ch3/src/mpid_irecv.c
@@ -32,7 +32,8 @@ int MPID_Irecv(void * buf, int count, MPI_Datatype datatype, int rank, int tag,
     }
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_DBG_MSG(CH3_OTHER,VERBOSE,"Comm has been revoked. Returning from MPID_IRECV.");
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_irsend.c b/src/mpid/ch3/src/mpid_irsend.c
index d1ce07f..7d6dd52 100644
--- a/src/mpid/ch3/src/mpid_irsend.c
+++ b/src/mpid/ch3/src/mpid_irsend.c
@@ -39,7 +39,8 @@ int MPID_Irsend(const void * buf, int count, MPI_Datatype datatype, int rank, in
                 rank, tag, comm->context_id + context_offset));
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
     
diff --git a/src/mpid/ch3/src/mpid_isend.c b/src/mpid/ch3/src/mpid_isend.c
index 086dc6f..94152dc 100644
--- a/src/mpid/ch3/src/mpid_isend.c
+++ b/src/mpid/ch3/src/mpid_isend.c
@@ -51,7 +51,8 @@ int MPID_Isend(const void * buf, int count, MPI_Datatype datatype, int rank,
                   rank, tag, comm->context_id + context_offset));
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_DBG_MSG(CH3_OTHER,VERBOSE,"Communicator revoked. MPID_ISEND returning");
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
diff --git a/src/mpid/ch3/src/mpid_issend.c b/src/mpid/ch3/src/mpid_issend.c
index ce672fb..2467c55 100644
--- a/src/mpid/ch3/src/mpid_issend.c
+++ b/src/mpid/ch3/src/mpid_issend.c
@@ -38,7 +38,8 @@ int MPID_Issend(const void * buf, int count, MPI_Datatype datatype, int rank, in
                  rank, tag, comm->context_id + context_offset));
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
     
diff --git a/src/mpid/ch3/src/mpid_probe.c b/src/mpid/ch3/src/mpid_probe.c
index 3b91941..52fe729 100644
--- a/src/mpid/ch3/src/mpid_probe.c
+++ b/src/mpid/ch3/src/mpid_probe.c
@@ -27,7 +27,8 @@ int MPID_Probe(int source, int tag, MPID_Comm * comm, int context_offset,
     }
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
 
diff --git a/src/mpid/ch3/src/mpid_recv.c b/src/mpid/ch3/src/mpid_recv.c
index 45f1580..e3af111 100644
--- a/src/mpid/ch3/src/mpid_recv.c
+++ b/src/mpid/ch3/src/mpid_recv.c
@@ -40,7 +40,8 @@ int MPID_Recv(void * buf, int count, MPI_Datatype datatype, int rank, int tag,
     }
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
 
diff --git a/src/mpid/ch3/src/mpid_rsend.c b/src/mpid/ch3/src/mpid_rsend.c
index 4561804..20b8bcb 100644
--- a/src/mpid/ch3/src/mpid_rsend.c
+++ b/src/mpid/ch3/src/mpid_rsend.c
@@ -41,7 +41,8 @@ int MPID_Rsend(const void * buf, int count, MPI_Datatype datatype, int rank, int
                               rank, tag, comm->context_id + context_offset));
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
     
diff --git a/src/mpid/ch3/src/mpid_send.c b/src/mpid/ch3/src/mpid_send.c
index d96cf14..b4ca110 100644
--- a/src/mpid/ch3/src/mpid_send.c
+++ b/src/mpid/ch3/src/mpid_send.c
@@ -39,7 +39,8 @@ int MPID_Send(const void * buf, int count, MPI_Datatype datatype, int rank,
 		rank, tag, comm->context_id + context_offset));
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
 
diff --git a/src/mpid/ch3/src/mpid_ssend.c b/src/mpid/ch3/src/mpid_ssend.c
index e4ed3dc..f17b4e7 100644
--- a/src/mpid/ch3/src/mpid_ssend.c
+++ b/src/mpid/ch3/src/mpid_ssend.c
@@ -38,7 +38,8 @@ int MPID_Ssend(const void * buf, int count, MPI_Datatype datatype, int rank, int
               rank, tag, comm->context_id + context_offset));
 
     /* Check to make sure the communicator hasn't already been revoked */
-    if (comm->revoked) {
+    if (comm->revoked &&
+            MPIR_SHRINK_TAG != MPIR_TAG_MASK_ERROR_BIT(tag & ~MPIR_Process.tagged_coll_mask)) {
         MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
     }
 
diff --git a/test/mpi/ft/Makefile.am b/test/mpi/ft/Makefile.am
index 90fa619..efcc162 100644
--- a/test/mpi/ft/Makefile.am
+++ b/test/mpi/ft/Makefile.am
@@ -10,4 +10,4 @@ include $(top_srcdir)/Makefile.mtest
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
 ## correctly
-noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack anysource revoke_nofail
+noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack anysource revoke_nofail shrink
diff --git a/test/mpi/ft/shrink.c b/test/mpi/ft/shrink.c
new file mode 100644
index 0000000..6c5742c
--- /dev/null
+++ b/test/mpi/ft/shrink.c
@@ -0,0 +1,62 @@
+
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2003 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include "mpi.h"
+#include <stdio.h>
+#include <stdlib.h>
+#include "mpitest.h"
+
+/*
+ * This test ensures that shrink works correctly
+ */
+int main(int argc, char **argv)
+{
+    int rank, size, newsize, rc, errclass, errs = 0;
+    MPI_Comm newcomm;
+    int sendbuf = 0;
+    int recvbuf = 0;
+
+    MPI_Init(&argc, &argv);
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
+
+    if (size < 4) {
+        fprintf(stderr, "Must run with at least 4 processes\n");
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    if (2 == rank) exit(EXIT_FAILURE);
+
+    rc = MPIX_Comm_shrink(MPI_COMM_WORLD, &newcomm);
+    if (rc) {
+        MPI_Error_class(rc, &errclass);
+        fprintf(stderr, "Expected MPI_SUCCESS from MPIX_Comm_shrink. Received: %d\n", errclass);
+        errs++;
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    MPI_Comm_size(newcomm, &newsize);
+    if (newsize != size-1)
+        errs++;
+
+    rc = MPI_Barrier(newcomm);
+    if (rc) {
+        MPI_Error_class(rc, &errclass);
+        fprintf(stderr, "Expected MPI_SUCCESS from MPI_BARRIER. Received: %d\n", errclass);
+        errs++;
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    MPI_Comm_free(&newcomm);
+
+    if (0 == rank) fprintf(stdout, " No Errors\n");
+
+    MPI_Finalize();
+
+    return 0;
+}
diff --git a/test/mpi/ft/testlist b/test/mpi/ft/testlist
index a7317eb..4598cde 100644
--- a/test/mpi/ft/testlist
+++ b/test/mpi/ft/testlist
@@ -13,3 +13,4 @@ bcast 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=fa
 scatter 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 anysource 3 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10
 revoke_nofail 4 mpiexecarg=-disable-auto-cleanup resultsTest=TestStatusNoErrors strict=false timelimit=10
+shrink 8 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10
diff --git a/test/mpid/ch3/failed_bitmask.c b/test/mpid/ch3/failed_bitmask.c
new file mode 100644
index 0000000..51518f8
--- /dev/null
+++ b/test/mpid/ch3/failed_bitmask.c
@@ -0,0 +1,59 @@
+
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2001 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include <stdio.h>
+
+#include "mpidimpl.h"
+
+int main(int argc, char **argv)
+{
+    int rc, size, rank, ec;
+    MPID_Comm *comm_ptr;
+    uint32_t *mask;
+
+    MPI_Init(&argc, &argv);
+
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+
+    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
+
+    if (size != 16) {
+        fprintf(stderr, "Requires 16 ranks\n");
+        exit(1);
+    }
+
+    if (rank == 1  ||
+        rank == 5  ||
+        rank == 6  ||
+        rank == 9  ||
+        rank == 12)
+        exit(1);
+
+    rc = MPI_Barrier(MPI_COMM_WORLD);
+    MPI_Error_class(rc, &ec);
+    if (MPI_SUCCESS == rc) {
+        fprintf(stderr, "[%d] ERROR CLASS: %d\n", rank, rc);
+    }
+
+    MPID_Comm_get_ptr(MPI_COMM_WORLD, comm_ptr);
+
+    MPID_Comm_failed_bitarray(comm_ptr, &mask, 0);
+
+    if (mask[0] != (uint32_t) 0x46480000) {
+        fprintf(stderr, "[%d] Unexpected failure bitmask: 0x%x\n", rank, mask[0]);
+        exit(1);
+    } else {
+        fprintf(stdout, " No errors\n");
+    }
+
+    MPIU_Free(mask);
+
+    MPI_Finalize();
+
+    return 0;
+}
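
The bit layout that the failed_bitmask test checks can be reproduced without
MPI. The following is a minimal sketch, not code from the commit: the helper
names (bitarray_words, bitarray_alloc, bitarray_set, bitarray_test) are
hypothetical. It assumes the layout used above, where rank r lives in word
r / 32 and rank 0 maps to the most significant bit of word 0.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

#define BITS_PER_WORD 32

/* Number of 32-bit words needed to hold one bit per rank (rounded up). */
static size_t bitarray_words(int nranks)
{
    assert(nranks >= 0);
    return ((size_t) nranks + BITS_PER_WORD - 1) / BITS_PER_WORD;
}

/* Allocate a zeroed bitarray for nranks ranks; the caller frees it. */
static uint32_t *bitarray_alloc(int nranks)
{
    size_t words = bitarray_words(nranks);
    return calloc(words ? words : 1, sizeof(uint32_t));
}

/* Mark a rank as failed. Rank 0 is the most significant bit of word 0. */
static void bitarray_set(uint32_t *bits, int rank)
{
    assert(rank >= 0);
    bits[rank / BITS_PER_WORD] |= UINT32_C(0x80000000) >> (rank % BITS_PER_WORD);
}

/* Return nonzero if the rank is marked as failed. */
static int bitarray_test(const uint32_t *bits, int rank)
{
    assert(rank >= 0);
    return (bits[rank / BITS_PER_WORD] &
            (UINT32_C(0x80000000) >> (rank % BITS_PER_WORD))) != 0;
}
```

Marking ranks 1, 5, 6, 9, and 12 in a 16-rank array yields the word
0x46480000, which is the value the test compares against. Note the
parentheses around the dereference when such an array is reached through a
uint32_t ** output parameter: (*bitarray)[i], not *bitarray[i].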

http://git.mpich.org/mpich.git/commitdiff/ee5173e396f43adb3dd0660e59e7b2e19cb856c5

commit ee5173e396f43adb3dd0660e59e7b2e19cb856c5
Author: Wesley Bland <wbland at anl.gov>
Date:   Fri Apr 11 11:25:16 2014 -0500

    Add check for revoked communicator
    
    Piggybacking on the MPID_Comm_valid_ptr check in the HAVE_ERROR_CHECKING
    block, this checks to see if the communicator has been revoked and
    returns MPIX_ERR_REVOKED if so.
    
    This probably should move out of the HAVE_ERROR_CHECKING section, since the
    check only runs when error checking is enabled. If the user leaves it off,
    they'll never be notified. However, moving it out of the HAVE_ERROR_CHECKING
    section will probably have performance implications.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
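
    The order of the checks described above can be sketched in isolation.
    This is an illustration only, not the MPICH macro: the struct, the error
    constants, and both function names are hypothetical stand-ins. It
    assumes that an invalid reference count is reported before the revoked
    flag is consulted, and that a NULL pointer is rejected before either
    field is read.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-ins for MPI_SUCCESS, MPI_ERR_COMM, MPIX_ERR_REVOKED. */
enum { SKETCH_SUCCESS = 0, SKETCH_ERR_COMM = 5, SKETCH_ERR_REVOKED = 100 };

/* Hypothetical reduced communicator object: only the fields the check reads. */
struct sketch_comm {
    int ref_count;
    int revoked;
};

/* Validate a communicator pointer: liveness first, then the revoked flag. */
static int sketch_comm_check(const struct sketch_comm *c)
{
    if (c == NULL || c->ref_count <= 0)
        return SKETCH_ERR_COMM;
    if (c->revoked)
        return SKETCH_ERR_REVOKED;
    return SKETCH_SUCCESS;
}

/* A shrink-style caller tolerates a revoked communicator but nothing else. */
static int sketch_shrink_may_proceed(const struct sketch_comm *c)
{
    int err = sketch_comm_check(c);
    return err == SKETCH_SUCCESS || err == SKETCH_ERR_REVOKED;
}
```

    The second helper mirrors the pattern in MPIX_Comm_shrink, which jumps
    to fn_fail only when the validation error is something other than
    MPIX_ERR_REVOKED.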

diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index 8d3f2e8..3c7aa29 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -499,6 +499,9 @@ int MPIU_Handle_free( void *((*)[]), int );
      if ((ptr) && MPIU_Object_get_ref(ptr) <= 0) {    \
          MPIU_ERR_SET(err,MPI_ERR_COMM,"**comm");     \
          ptr = 0;                                     \
+     } else if (ptr->revoked) {                       \
+         MPIU_ERR_SET(err,MPIX_ERR_REVOKED,"**comm"); \
+         ptr = 0;                                     \
      }                                                \
 }
 #define MPID_Group_valid_ptr(ptr,err) MPID_Valid_ptr_class(Group,ptr,MPI_ERR_GROUP,err)
diff --git a/src/mpid/ch3/src/ch3u_recvq.c b/src/mpid/ch3/src/ch3u_recvq.c
index 0ad6d64..d84d54d 100644
--- a/src/mpid/ch3/src/ch3u_recvq.c
+++ b/src/mpid/ch3/src/ch3u_recvq.c
@@ -872,16 +872,16 @@ static inline void dequeue_and_set_error(MPID_Request **req,  MPID_Request *prev
     }
     
     /* remove from queue */
-    if (*head == *req)
+    if (*head == *req) {
+        if (*head == recvq_posted_head) MPIR_T_PVAR_LEVEL_DEC(RECVQ, posted_recvq_length, 1);
+
         *head = (*req)->dev.next;
-    else
+    } else
         prev_req->dev.next = (*req)->dev.next;
+
     if (*tail == *req)
         *tail = prev_req;
 
-    if (*head == recvq_posted_head)
-        MPIR_T_PVAR_LEVEL_DEC(RECVQ, posted_recvq_length, 1);
-
     /* set error and complete */
     (*req)->status.MPI_ERROR = *error;
     MPIDI_CH3U_Request_complete(*req);
@@ -978,6 +978,7 @@ int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
             continue;
         }
 
+        prev_rreq = rreq;
         rreq = rreq->dev.next;
     }
 

http://git.mpich.org/mpich.git/commitdiff/57f6ee88801fd9d2959cc133fe4bb10b25848f4f

commit 57f6ee88801fd9d2959cc133fe4bb10b25848f4f
Author: Wesley Bland <wbland at anl.gov>
Date:   Wed Jul 30 10:15:47 2014 -0500

    Add MPIX_Comm_revoke
    
    MPIX_Comm_revoke is a special function because it does not have a matching
    call on the "receiving side": it has to act as an out-of-band, resilient
    broadcast algorithm. For that reason, in addition to the usual functions
    that implement MPI communication calls (MPI/MPID/CH3/etc.), this commit
    adds a new CH3 packet type that handles revoking a communicator without
    involving a matching call from the MPI layer (similar to how RMA is
    currently implemented).
    
    The thing that must be handled most carefully when revoking a communicator is
    to ensure that a previously used context ID will eventually be returned to the
    pool of available context IDs and that after this occurs, no old messages will
    match the new usage of the context ID (for instance, if some messages are very
    slow and show up late). To accomplish this, revoke is implemented as an
    all-to-all algorithm. When one process calls revoke, it will send a message to
    all other processes in the communicator, which will trigger that process to
    send a message to all other processes, and so on. Once a process has already
    revoked its communicator locally, it won't send out another wave of messages.
    As each process receives revoke messages from the other processes, it tracks
    how many have arrived. Once it has received either a revoke message or a
    failure notification for every other process, it releases its refcount on
    the communicator object. After the application
    has freed all of its references to the communicator (and all requests, files,
    etc. associated with it), the context ID will be returned to the available
    pool.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
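
    The all-to-all propagation described above can be simulated in a single
    process. This is a sketch under stated assumptions, not the CH3
    implementation: every name below is hypothetical, message delivery is
    modeled as a direct function call, and failures are assumed to be known
    to everyone before the first revoke.

```c
#include <assert.h>
#include <stdbool.h>

#define SKETCH_MAX_PROCS 16

struct sketch_proc {
    bool failed;        /* process died before the revoke started */
    bool revoked;       /* communicator is revoked locally */
    int  waiting_for;   /* live peers not yet heard from */
    bool ref_released;  /* reference on the communicator dropped */
};

struct sketch_world {
    int nprocs;
    int messages_sent;
    struct sketch_proc proc[SKETCH_MAX_PROCS];
};

static void sketch_receive_revoke(struct sketch_world *w, int me);

/* First local revoke: mark the communicator, then send one wave to all peers. */
static void sketch_revoke(struct sketch_world *w, int me)
{
    struct sketch_proc *p = &w->proc[me];
    int peer;

    assert(me >= 0 && me < w->nprocs);
    if (p->revoked)
        return;                 /* already revoked: no second wave */
    p->revoked = true;

    /* A known failure counts as having heard from that peer. */
    p->waiting_for = 0;
    for (peer = 0; peer < w->nprocs; peer++)
        if (peer != me && !w->proc[peer].failed)
            p->waiting_for++;
    if (p->waiting_for == 0)
        p->ref_released = true;

    for (peer = 0; peer < w->nprocs; peer++) {
        if (peer == me || w->proc[peer].failed)
            continue;
        w->messages_sent++;
        sketch_receive_revoke(w, peer);
    }
}

/* Handle one revoke message arriving at process `me`. */
static void sketch_receive_revoke(struct sketch_world *w, int me)
{
    struct sketch_proc *p = &w->proc[me];

    sketch_revoke(w, me);       /* no-op if already revoked */
    if (--p->waiting_for == 0)
        p->ref_released = true;
}
```

    With n live processes, each one sends exactly one wave of n - 1
    messages, so the total is n * (n - 1) and every live process ends with
    its reference released. The quadratic message count is the price of not
    depending on any single process to relay the revoke.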

diff --git a/src/include/mpi.h.in b/src/include/mpi.h.in
index c2afed9..799598e 100644
--- a/src/include/mpi.h.in
+++ b/src/include/mpi.h.in
@@ -881,8 +881,9 @@ typedef int (MPIX_Grequest_wait_function)(int, void **, double, MPI_Status *);
 #define MPIX_ERR_PROC_FAILED          MPICH_ERR_FIRST_MPIX+1 /* Process failure */
 #define MPIX_ERR_PROC_FAILED_PENDING  MPICH_ERR_FIRST_MPIX+2 /* A failure has caused this request
                                                               * to be pending */
+#define MPIX_ERR_REVOKED              MPICH_ERR_FIRST_MPIX+3 /* The communication object has been revoked */
 
-#define MPICH_ERR_LAST_MPIX           MPICH_ERR_FIRST_MPIX+2
+#define MPICH_ERR_LAST_MPIX           MPICH_ERR_FIRST_MPIX+3
 
 
 /* End of MPI's error classes */
@@ -891,7 +892,7 @@ typedef int (MPIX_Grequest_wait_function)(int, void **, double, MPI_Status *);
 typedef int (MPI_Datarep_conversion_function)(void *, MPI_Datatype, int, 
              void *, MPI_Offset, void *);
 typedef int (MPI_Datarep_extent_function)(MPI_Datatype datatype, MPI_Aint *,
-					  void *);
+                      void *);
 #define MPI_CONVERSION_FN_NULL ((MPI_Datarep_conversion_function *)0)
 
 /* 
@@ -1536,6 +1537,7 @@ int MPI_T_category_changed(int *stamp);
 /* Fault Tolerance Extensions */
 int MPIX_Comm_failure_ack(MPI_Comm comm);
 int MPIX_Comm_failure_get_acked(MPI_Comm comm, MPI_Group *failedgrp);
+int MPIX_Comm_revoke(MPI_Comm comm);
 
 
 /* End Prototypes */
@@ -2173,6 +2175,7 @@ int PMPI_T_category_changed(int *stamp);
 /* Fault Tolerance Extensions */
 int PMPIX_Comm_failure_ack(MPI_Comm comm);
 int PMPIX_Comm_failure_get_acked(MPI_Comm comm, MPI_Group *failedgrp);
+int PMPIX_Comm_revoke(MPI_Comm comm);
 
 
 #endif  /* MPI_BUILD_PROFILING */
diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index cc711df..8d3f2e8 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -1240,6 +1240,9 @@ typedef struct MPID_Comm {
 				       implementting the topology routines */
     int next_sched_tag;             /* used by the NBC schedule code to allocate tags */
 
+    int revoked;                    /* Flag to track whether the communicator
+                                     * has been revoked */
+
     MPID_Info *info;                /* Hints to the communicator */
 
 #ifdef MPID_HAS_HETERO
@@ -2779,6 +2782,18 @@ int MPID_Comm_failure_ack(MPID_Comm *comm);
 int MPID_Comm_failure_get_acked(MPID_Comm *comm, MPID_Group **failed_group_ptr);
 
 /*@
+  MPID_Comm_revoke - MPID entry point for MPIX_Comm_revoke
+
+  Input Parameters:
+  comm - communicator
+  is_remote - True if we received the revoke message from a remote process
+
+  Return Value:
+  'MPI_SUCCESS' or a valid MPI error code.
+@*/
+int MPID_Comm_revoke(MPID_Comm *comm, int is_remote);
+
+/*@
   MPID_Send - MPID entry point for MPI_Send
 
   Notes:
diff --git a/src/mpi/comm/Makefile.mk b/src/mpi/comm/Makefile.mk
index 5dd2743..2197991 100644
--- a/src/mpi/comm/Makefile.mk
+++ b/src/mpi/comm/Makefile.mk
@@ -28,7 +28,8 @@ mpi_sources +=                       \
     src/mpi/comm/intercomm_merge.c   \
     src/mpi/comm/comm_split_type.c   \
     src/mpi/comm/comm_failure_ack.c            \
-    src/mpi/comm/comm_failure_get_acked.c
+    src/mpi/comm/comm_failure_get_acked.c      \
+    src/mpi/comm/comm_revoke.c
 
 mpi_core_sources += \
     src/mpi/comm/commutil.c
diff --git a/src/mpi/comm/comm_revoke.c b/src/mpi/comm/comm_revoke.c
new file mode 100644
index 0000000..7a5154f
--- /dev/null
+++ b/src/mpi/comm/comm_revoke.c
@@ -0,0 +1,112 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpiimpl.h"
+#ifdef HAVE_STDLIB_H
+#include <stdlib.h>
+#endif
+
+/* -- Begin Profiling Symbol Block for routine MPIX_Comm_revoke */
+#if defined(HAVE_PRAGMA_WEAK)
+#pragma weak MPIX_Comm_revoke = PMPIX_Comm_revoke
+#elif defined(HAVE_PRAGMA_HP_SEC_DEF)
+#pragma _HP_SECONDARY_DEF PMPIX_Comm_revoke  MPIX_Comm_revoke
+#elif defined(HAVE_PRAGMA_CRI_DUP)
+#pragma _CRI duplicate MPIX_Comm_revoke as PMPIX_Comm_revoke
+#endif
+/* -- End Profiling Symbol Block */
+
+/* Define MPICH_MPI_FROM_PMPI if weak symbols are not supported to build
+   the MPI routines */
+#ifndef MPICH_MPI_FROM_PMPI
+#undef MPIX_Comm_revoke
+#define MPIX_Comm_revoke PMPIX_Comm_revoke
+
+#endif
+
+#undef FUNCNAME
+#define FUNCNAME MPIX_Comm_revoke
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+/*@
+    MPIX_Comm_revoke - Prevent a communicator from being used in the future
+
+Input Parameters:
++ comm - communicator to revoke
+
+Notes:
+Asynchronously notifies all MPI processes associated with the communicator 'comm'
+that it has been revoked. A subsequent MPI call on the revoked communicator
+reports this by returning MPIX_ERR_REVOKED.
+
+.N Fortran
+
+.N Errors
+.N MPI_SUCCESS
+@*/
+int MPIX_Comm_revoke(MPI_Comm comm)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Comm *comm_ptr = NULL;
+    MPID_MPI_STATE_DECL(MPID_STATE_MPIX_COMM_REVOKE);
+
+    MPIR_ERRTEST_INITIALIZED_ORDIE();
+
+    MPIU_THREAD_CS_ENTER(ALLFUNC,);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPIX_COMM_REVOKE);
+
+    /* Validate parameters, especially handles needing to be converted */
+#    ifdef HAVE_ERROR_CHECKING
+    {
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            MPIR_ERRTEST_COMM(comm, mpi_errno);
+        }
+        MPID_END_ERROR_CHECKS;
+    }
+#   endif
+
+    /* Convert MPI object handles to object pointers */
+    MPID_Comm_get_ptr( comm, comm_ptr );
+
+    /* Validate parameters and objects (post conversion) */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            /* Validate comm_ptr */
+            MPID_Comm_valid_ptr( comm_ptr, mpi_errno, TRUE );
+            if (mpi_errno) goto fn_fail;
+        }
+        MPID_END_ERROR_CHECKS;
+    }
+#   endif
+
+    /* ... body of routine ... */
+
+    mpi_errno = MPID_Comm_revoke(comm_ptr, 0);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    /* ... end of body of routine ... */
+
+  fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPIX_COMM_REVOKE);
+    MPIU_THREAD_CS_EXIT(ALLFUNC,);
+    return mpi_errno;
+  fn_fail:
+    /* --BEGIN ERROR HANDLING-- */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        mpi_errno = MPIR_Err_create_code(
+            mpi_errno, MPIR_ERR_RECOVERABLE, FCNAME, __LINE__, MPI_ERR_OTHER, "**mpix_comm_revoke",
+            "**mpix_comm_revoke %C", comm);
+    }
+#   endif
+    mpi_errno = MPIR_Err_return_comm( comm_ptr, FCNAME, mpi_errno );
+    goto fn_exit;
+    /* --END ERROR HANDLING-- */
+}
diff --git a/src/mpi/comm/commutil.c b/src/mpi/comm/commutil.c
index dbe81fd..cae702f 100644
--- a/src/mpi/comm/commutil.c
+++ b/src/mpi/comm/commutil.c
@@ -135,6 +135,9 @@ int MPIR_Comm_init(MPID_Comm *comm_p)
     /* abstractions bleed a bit here... :( */
     comm_p->next_sched_tag = MPIR_FIRST_NBC_TAG;
 
+    /* Initialize the revoked flag as false */
+    comm_p->revoked = 0;
+
     /* Fields not set include context_id, remote and local size, and
        kind, since different communicator construction routines need
        different values */
diff --git a/src/mpi/errhan/baseerrnames.txt b/src/mpi/errhan/baseerrnames.txt
index 8f07168..cc71016 100644
--- a/src/mpi/errhan/baseerrnames.txt
+++ b/src/mpi/errhan/baseerrnames.txt
@@ -79,3 +79,4 @@ MPI_ERR_RMA_RANGE   55      **rmarange
 MPI_ERR_RMA_ATTACH  56      **rmaattach
 MPI_ERR_RMA_SHARED  57      **rmashared
 MPI_ERR_RMA_FLAVOR  58      **rmaflavor
+MPIX_ERR_REVOKED    59      **revoked
diff --git a/src/mpi/errhan/errnames.txt b/src/mpi/errhan/errnames.txt
index 36eaa2c..8bd6810 100644
--- a/src/mpi/errhan/errnames.txt
+++ b/src/mpi/errhan/errnames.txt
@@ -423,6 +423,7 @@ unexpected messages queued.
 **node_root_rank:Unable to get the node root rank
 **proc_failed:Process failed
 **failure_pending:Request pending due to failure
+**revoked:Communication object revoked
 # Duplicates?
 #**argnull:Invalid null parameter
 #**argnull %s:Invalid null parameter %s
@@ -1100,6 +1101,8 @@ is too big (> MPIU_SHMW_GHND_SZ)
 **mpix_comm_failure_ack %C:MPIX_Comm_failure_ack(%C) failed
 **mpix_comm_failure_get_acked:MPIX_Comm_failure_get_acked failed
 **mpix_comm_failure_get_acked %C %p:MPIX_Comm_failure_get_acked(%C, group=%p) failed
+**mpix_comm_revoke:MPIX_Comm_revoke failed
+**mpix_comm_revoke %C:MPIX_Comm_revoke(%C) failed
 **mpi_intercomm_create:MPI_Intercomm_create failed
 **mpi_intercomm_create %C %d %C %d %d %p:MPI_Intercomm_create(%C, local_leader=%d, %C, remote_leader=%d, tag=%d, newintercomm=%p) failed
 **mpi_intercomm_merge:MPI_Intercomm_merge failed
diff --git a/src/mpid/ch3/include/mpidimpl.h b/src/mpid/ch3/include/mpidimpl.h
index fe8e7ed..5e3f82f 100644
--- a/src/mpid/ch3/include/mpidimpl.h
+++ b/src/mpid/ch3/include/mpidimpl.h
@@ -1504,6 +1504,7 @@ MPID_Request * MPIDI_CH3U_Recvq_FDP_or_AEU(MPIDI_Message_match * match,
 int MPIDI_CH3U_Recvq_count_unexp(void);
 int MPIDI_CH3U_Complete_posted_with_error(MPIDI_VC_t *vc);
 int MPIDI_CH3U_Complete_disabled_anysources(void);
+int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr);
 
 
 int MPIDI_CH3U_Request_load_send_iov(MPID_Request * const sreq, 
@@ -1526,6 +1527,7 @@ int MPIDI_CH3U_Receive_data_unexpected(MPID_Request * rreq, char *buf, MPIDI_msg
 int MPIDI_CH3I_Comm_init(void);
 
 int MPIDI_CH3I_Comm_handle_failed_procs(MPID_Group *new_failed_procs);
+void MPIDI_CH3I_Comm_find(MPIR_Context_id_t context_id, MPID_Comm **comm);
 
 /* The functions below allow channels to register functions to be
    called immediately after a communicator has been created, and
@@ -1820,7 +1822,8 @@ int MPIDI_CH3_PktHandler_Close( MPIDI_VC_t *, MPIDI_CH3_Pkt_t *,
 				MPIDI_msg_sz_t *, MPID_Request ** );
 int MPIDI_CH3_PktHandler_EndCH3( MPIDI_VC_t *, MPIDI_CH3_Pkt_t *,
 				 MPIDI_msg_sz_t *, MPID_Request ** );
-
+int MPIDI_CH3_PktHandler_Revoke(MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
+                                MPIDI_msg_sz_t *buflen, MPID_Request **rreqp);
 int MPIDI_CH3_PktHandler_Init( MPIDI_CH3_PktHandler_Fcn *[], int );
 
 #ifdef MPICH_DBG_OUTPUT
diff --git a/src/mpid/ch3/include/mpidpkt.h b/src/mpid/ch3/include/mpidpkt.h
index 98c5fce..570cd56 100644
--- a/src/mpid/ch3/include/mpidpkt.h
+++ b/src/mpid/ch3/include/mpidpkt.h
@@ -100,6 +100,7 @@ typedef enum
     MPIDI_CH3_PKT_GET_ACCUM_RESP,
     MPIDI_CH3_PKT_FLOW_CNTL_UPDATE,  /* FIXME: Unused */
     MPIDI_CH3_PKT_CLOSE,
+    MPIDI_CH3_PKT_REVOKE,
     MPIDI_CH3_PKT_END_CH3,
     /* The channel can define additional types by defining the value
        MPIDI_CH3_PKT_ENUM */
@@ -411,6 +412,13 @@ typedef struct MPIDI_CH3_Pkt_close
 }
 MPIDI_CH3_Pkt_close_t;
 
+typedef struct MPIDI_CH3_Pkt_revoke
+{
+    MPIDI_CH3_Pkt_type_t type;
+    MPIR_Context_id_t revoked_comm;
+}
+MPIDI_CH3_Pkt_revoke_t;
+
 typedef union MPIDI_CH3_Pkt
 {
     MPIDI_CH3_Pkt_type_t type;
@@ -445,6 +453,7 @@ typedef union MPIDI_CH3_Pkt
     MPIDI_CH3_Pkt_fop_t fop;
     MPIDI_CH3_Pkt_fop_resp_t fop_resp;
     MPIDI_CH3_Pkt_get_accum_resp_t get_accum_resp;
+    MPIDI_CH3_Pkt_revoke_t revoke;
 # if defined(MPIDI_CH3_PKT_DECL)
     MPIDI_CH3_PKT_DECL
 # endif
diff --git a/src/mpid/ch3/include/mpidpre.h b/src/mpid/ch3/include/mpidpre.h
index 75e0214..13d9544 100644
--- a/src/mpid/ch3/include/mpidpre.h
+++ b/src/mpid/ch3/include/mpidpre.h
@@ -173,6 +173,9 @@ typedef struct MPIDI_CH3I_comm
     int eager_max_msg_sz;   /* comm-wide eager/rendezvous message threshold */
     int anysource_enabled;  /* TRUE iff this anysource recvs can be posted on this communicator */
     int last_ack_rank;      /* The rank of the last acknowledged failure */
+    int waiting_for_revoke; /* The number of other processes from which we are
+                             * waiting for a revoke message before we can release
+                             * the context id */
     struct MPID_nem_barrier_vars *barrier_vars; /* shared memory variables used in barrier */
     struct MPID_Comm *next; /* next pointer for list of communicators */
     struct MPID_Comm *prev; /* prev pointer for list of communicators */
diff --git a/src/mpid/ch3/src/Makefile.mk b/src/mpid/ch3/src/Makefile.mk
index 5160b5b..091613d 100644
--- a/src/mpid/ch3/src/Makefile.mk
+++ b/src/mpid/ch3/src/Makefile.mk
@@ -12,6 +12,7 @@ mpi_core_sources +=                          \
     src/mpid/ch3/src/ch3u_handle_connection.c              \
     src/mpid/ch3/src/ch3u_handle_recv_pkt.c                \
     src/mpid/ch3/src/ch3u_handle_recv_req.c                \
+    src/mpid/ch3/src/ch3u_handle_revoke_pkt.c              \
     src/mpid/ch3/src/ch3u_handle_send_req.c                \
     src/mpid/ch3/src/ch3u_port.c                           \
     src/mpid/ch3/src/ch3u_recvq.c                          \
@@ -30,6 +31,7 @@ mpi_core_sources +=                          \
     src/mpid/ch3/src/mpid_comm_disconnect.c                \
     src/mpid/ch3/src/mpid_comm_spawn_multiple.c            \
     src/mpid/ch3/src/mpid_comm_failure_ack.c               \
+    src/mpid/ch3/src/mpid_comm_revoke.c                    \
     src/mpid/ch3/src/mpid_finalize.c                       \
     src/mpid/ch3/src/mpid_get_universe_size.c              \
     src/mpid/ch3/src/mpid_getpname.c                       \
diff --git a/src/mpid/ch3/src/ch3u_comm.c b/src/mpid/ch3/src/ch3u_comm.c
index 89662b5..e39cd89 100644
--- a/src/mpid/ch3/src/ch3u_comm.c
+++ b/src/mpid/ch3/src/ch3u_comm.c
@@ -333,3 +333,18 @@ int MPIDI_CH3I_Comm_handle_failed_procs(MPID_Group *new_failed_procs)
  fn_fail:
     goto fn_exit;
 }
+
+void MPIDI_CH3I_Comm_find(MPIR_Context_id_t context_id, MPID_Comm **comm)
+{
+    MPIDI_STATE_DECL(MPIDI_STATE_MPIDI_CH3I_COMM_FIND);
+    MPIDI_FUNC_ENTER(MPIDI_STATE_MPIDI_CH3I_COMM_FIND);
+
+    COMM_FOREACH((*comm)) {
+        if ((*comm)->context_id == context_id) {
+            MPIU_DBG_MSG_D(CH3_OTHER,VERBOSE,"Found matching context id: %d", context_id);
+            break;
+        }
+    }
+
+    MPIDI_FUNC_EXIT(MPIDI_STATE_MPIDI_CH3I_COMM_FIND);
+}
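
The lookup added above walks the list of communicators until it finds a matching context id. A minimal, self-contained sketch of that pattern follows. The names comm_t and comm_find are hypothetical stand-ins, not MPICH identifiers, and the sketch returns NULL explicitly when nothing matches, which is the result the caller's NULL check in the packet handler relies on.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for MPID_Comm: only the fields the lookup needs. */
typedef struct comm {
    unsigned context_id;
    struct comm *next;
} comm_t;

/* Return the first communicator whose context_id matches, or NULL if the
 * list holds no such communicator. */
static comm_t *comm_find(comm_t *head, unsigned context_id)
{
    comm_t *c;

    for (c = head; c != NULL; c = c->next) {
        if (c->context_id == context_id)
            return c;
    }
    return NULL;
}
```

Returning the pointer instead of writing through an output parameter is a choice made for this sketch only; the patch itself uses an output parameter.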
diff --git a/src/mpid/ch3/src/ch3u_handle_recv_pkt.c b/src/mpid/ch3/src/ch3u_handle_recv_pkt.c
index 92d3e6b..f60ba73 100644
--- a/src/mpid/ch3/src/ch3u_handle_recv_pkt.c
+++ b/src/mpid/ch3/src/ch3u_handle_recv_pkt.c
@@ -610,6 +610,10 @@ int MPIDI_CH3_PktHandler_Init( MPIDI_CH3_PktHandler_Fcn *pktArray[],
         MPIDI_CH3_PktHandler_Get_AccumResp;
     /* End of default RMA operations */
 
+    /* Fault tolerance */
+    pktArray[MPIDI_CH3_PKT_REVOKE] =
+        MPIDI_CH3_PktHandler_Revoke;
+
  fn_fail:
     MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_PKTHANDLER_INIT);
     return mpi_errno;
diff --git a/src/mpid/ch3/src/ch3u_handle_revoke_pkt.c b/src/mpid/ch3/src/ch3u_handle_revoke_pkt.c
new file mode 100644
index 0000000..29b19e1
--- /dev/null
+++ b/src/mpid/ch3/src/ch3u_handle_revoke_pkt.c
@@ -0,0 +1,38 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2001 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpidimpl.h"
+
+#undef FUNCNAME
+#define FUNCNAME MPIDI_CH3_PktHandler_Revoke
+#undef FCNAME
+#define FCNAME MPIDI_QUOTE(FUNCNAME)
+int MPIDI_CH3_PktHandler_Revoke(MPIDI_VC_t *vc, MPIDI_CH3_Pkt_t *pkt,
+                                MPIDI_msg_sz_t *buflen, MPID_Request **rreqp)
+{
+    MPIDI_CH3_Pkt_revoke_t *revoke_pkt = &pkt->revoke;
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Comm *comm_ptr = NULL;
+
+    MPIU_DBG_MSG_D(CH3_OTHER, VERBOSE, "Received revoke pkt from %d", vc->pg_rank);
+
+    /* Search through all of the communicators to find the right context_id */
+    MPIDI_CH3I_Comm_find(revoke_pkt->revoked_comm, &comm_ptr);
+    if (comm_ptr == NULL)
+        MPIU_ERR_SETANDJUMP1(mpi_errno, MPI_ERR_OTHER, "**ch3|postrecv",
+                "**ch3|postrecv %s", "MPIDI_CH3_PKT_REVOKE");
+
+    mpi_errno = MPID_Comm_revoke(comm_ptr, 1);
+    if (mpi_errno != MPI_SUCCESS)
+        MPIU_ERR_SETANDJUMP1(mpi_errno, MPI_ERR_OTHER, "**ch3|postrecv",
+                "**ch3|postrecv %s", "MPIDI_CH3_PKT_REVOKE");
+
+    /* There is no request associated with a revoke packet */
+    *rreqp = NULL;
+
+fn_fail:
+    return mpi_errno;
+}
diff --git a/src/mpid/ch3/src/ch3u_recvq.c b/src/mpid/ch3/src/ch3u_recvq.c
index 67faec6..0ad6d64 100644
--- a/src/mpid/ch3/src/ch3u_recvq.c
+++ b/src/mpid/ch3/src/ch3u_recvq.c
@@ -860,7 +860,7 @@ static inline int req_uses_vc(const MPID_Request* req, const MPIDI_VC_t *vc)
 #define FCNAME MPIU_QUOTE(FUNCNAME)
 /* This dequeues req from the posted recv queue, set req's error code to comm_fail, and updates the req pointer.
    Note that this creates a new error code if one hasn't already been created (i.e., if *error is MPI_SUCCESS). */
-static inline void dequeue_and_set_error(MPID_Request **req,  MPID_Request *prev_req, int *error, int rank)
+static inline void dequeue_and_set_error(MPID_Request **req,  MPID_Request *prev_req, MPID_Request **head, MPID_Request **tail, int *error, int rank)
 {
     MPID_Request *next = (*req)->dev.next;
 
@@ -872,14 +872,15 @@ static inline void dequeue_and_set_error(MPID_Request **req,  MPID_Request *prev
     }
     
     /* remove from queue */
-    if (recvq_posted_head == *req)
-        recvq_posted_head = (*req)->dev.next;
+    if (*head == *req)
+        *head = (*req)->dev.next;
     else
         prev_req->dev.next = (*req)->dev.next;
-    if (recvq_posted_tail == *req)
-        recvq_posted_tail = prev_req;
+    if (*tail == *req)
+        *tail = prev_req;
 
-    MPIR_T_PVAR_LEVEL_DEC(RECVQ, posted_recvq_length, 1);
+    if (*head == recvq_posted_head)
+        MPIR_T_PVAR_LEVEL_DEC(RECVQ, posted_recvq_length, 1);
 
     /* set error and complete */
     (*req)->status.MPI_ERROR = *error;
@@ -890,6 +891,101 @@ static inline void dequeue_and_set_error(MPID_Request **req,  MPID_Request *prev
     *req = next;
 }
 
+/*
+ * MPIDI_CH3U_Clean_recvq()
+ *
+ * Looks through the entire unexpected recv queue and the posted recv queue.
+ * If a request is found that involves the provided communicator (comm_ptr),
+ * it is dequeued and marked as failed via MPIX_ERR_REVOKED.
+ *
+ * Multithread - This routine must be called from within a MSGQUEUE
+ * critical section.  If a request is allocated, it must not release
+ * the MSGQUEUE until the request is completely valid, as another thread
+ * may then find it and dequeue it.
+ *
+ */
+int MPIDI_CH3U_Clean_recvq(MPID_Comm *comm_ptr)
+{
+    int mpi_errno = MPI_SUCCESS;
+    int error = MPIX_ERR_REVOKED;
+    MPID_Request *rreq, *prev_rreq = NULL;
+    MPIDI_Message_match match;
+    MPIDI_Message_match mask;
+    MPIDI_STATE_DECL(MPIDI_CH3U_CLEAN_RECVQ);
+
+    MPIDI_FUNC_ENTER(MPIDI_CH3U_CLEAN_RECVQ);
+
+    MPIU_THREAD_CS_ASSERT_HELD(MSGQUEUE);
+
+    rreq = recvq_unexpected_head;
+    mask.parts.context_id = ~0;
+    mask.parts.rank = mask.parts.tag = 0;
+
+    /* Clear the error bit in the tag since we don't care about whether or
+     * not we're trying to report an error anymore. */
+    MPIR_TAG_CLEAR_ERROR_BIT(mask.parts.tag);
+
+    while (NULL != rreq) {
+        /* We'll have to do this matching twice. Once for the pt2pt context id
+         * and once for the collective context id */
+        match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_PT2PT;
+
+        if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+            MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                        "cleaning up unexpected pt2pt pkt rank=%d tag=%d contextid=%d",
+                        rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+            dequeue_and_set_error(&rreq, prev_rreq, &recvq_unexpected_head, &recvq_unexpected_tail, &error, MPI_PROC_NULL);
+            continue;
+        }
+
+        match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_COLL;
+
+        if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+            MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                        "cleaning up unexpected collective pkt rank=%d tag=%d contextid=%d",
+                        rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+            dequeue_and_set_error(&rreq, prev_rreq, &recvq_unexpected_head, &recvq_unexpected_tail, &error, MPI_PROC_NULL);
+            continue;
+        }
+
+        prev_rreq = rreq;
+        rreq = rreq->dev.next;
+    }
+
+    rreq = recvq_posted_head;
+    prev_rreq = NULL;
+
+    while (NULL != rreq) {
+        /* We'll have to do this matching twice. Once for the pt2pt context id
+         * and once for the collective context id */
+        match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_PT2PT;
+
+        if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+            MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                        "cleaning up posted pt2pt pkt rank=%d tag=%d contextid=%d",
+                        rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+            dequeue_and_set_error(&rreq, prev_rreq, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL);
+            continue;
+        }
+
+        match.parts.context_id = comm_ptr->recvcontext_id + MPID_CONTEXT_INTRA_COLL;
+
+        if (MATCH_WITH_LEFT_RIGHT_MASK(rreq->dev.match, match, mask)) {
+            MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
+                        "cleaning up posted collective pkt rank=%d tag=%d contextid=%d",
+                        rreq->dev.match.parts.rank, rreq->dev.match.parts.tag, rreq->dev.match.parts.context_id));
+            dequeue_and_set_error(&rreq, prev_rreq, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL);
+            continue;
+        }
+        prev_rreq = rreq;
+        rreq = rreq->dev.next;
+    }
+
+    MPIDI_FUNC_EXIT(MPIDI_CH3U_CLEAN_RECVQ);
+
+    return mpi_errno;
+}
+
 #undef FUNCNAME
 #define FUNCNAME MPIDI_CH3U_Complete_disabled_anysources
 #undef FCNAME
@@ -911,7 +1007,7 @@ int MPIDI_CH3U_Complete_disabled_anysources(void)
     prev_req = NULL;
     while (req) {
         if (req->dev.match.parts.rank == MPI_ANY_SOURCE && !MPIDI_CH3I_Comm_AS_enabled(req->comm)) {
-            dequeue_and_set_error(&req, prev_req, &error, MPI_PROC_NULL); /* we don't know the rank of the failed proc */
+            dequeue_and_set_error(&req, prev_req, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL); /* we don't know the rank of the failed proc */
         } else {
             prev_req = req;
             req = req->dev.next;
@@ -949,7 +1045,7 @@ int MPIDI_CH3U_Complete_posted_with_error(MPIDI_VC_t *vc)
     prev_req = NULL;
     while (req) {
         if (req->dev.match.parts.rank != MPI_ANY_SOURCE && req_uses_vc(req, vc)) {
-            dequeue_and_set_error(&req, prev_req, &error, vc->pg_rank);
+            dequeue_and_set_error(&req, prev_req, &recvq_posted_head, &recvq_posted_tail, &error, MPI_PROC_NULL);
         } else {
             prev_req = req;
             req = req->dev.next;
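
The queue cleanup in this file removes every request that matches a communicator while keeping the head and tail pointers consistent. A minimal sketch of that traversal is below. The types req_t and queue_t and the function clean_queue are hypothetical, and matching is reduced to a plain context-id comparison instead of the masked match the patch uses. The point to note is that the previous-node pointer advances only when a node is kept.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a queued request. */
typedef struct req {
    int context_id;
    int error;
    struct req *next;
} req_t;

/* Hypothetical queue with explicit head and tail pointers. */
typedef struct queue {
    req_t *head;
    req_t *tail;
} queue_t;

/* Unlink every request whose context_id matches, store `error` in it, and
 * return the number of requests removed. */
static int clean_queue(queue_t *q, int context_id, int error)
{
    req_t *cur = q->head;
    req_t *prev = NULL;
    int removed = 0;

    while (cur != NULL) {
        req_t *next = cur->next;

        if (cur->context_id == context_id) {
            /* If cur is not the head, an earlier node was kept, so prev
             * is non-NULL here. */
            if (q->head == cur)
                q->head = next;
            else
                prev->next = next;
            if (q->tail == cur)
                q->tail = prev;
            cur->next = NULL;
            cur->error = error;
            removed++;
        } else {
            /* Only a kept node can become the predecessor. */
            prev = cur;
        }
        cur = next;
    }
    return removed;
}
```

Passing the queue as a parameter mirrors the change to dequeue_and_set_error, which now takes the head and tail of whichever queue is being cleaned. The error value used with such a routine would be MPIX_ERR_REVOKED, which this commit registers as 59 in baseerrnames.txt.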
diff --git a/src/mpid/ch3/src/mpid_comm_disconnect.c b/src/mpid/ch3/src/mpid_comm_disconnect.c
index 67681ac..a115c6e 100644
--- a/src/mpid/ch3/src/mpid_comm_disconnect.c
+++ b/src/mpid/ch3/src/mpid_comm_disconnect.c
@@ -24,11 +24,16 @@
 #define FCNAME MPIDI_QUOTE(FUNCNAME)
 int MPID_Comm_disconnect(MPID_Comm *comm_ptr)
 {
-    int mpi_errno;
+    int mpi_errno = MPI_SUCCESS;
     MPIDI_STATE_DECL(MPID_STATE_MPID_COMM_DISCONNECT);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_COMM_DISCONNECT);
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm_ptr->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
     /* Before releasing the communicator, we need to ensure that all VCs are
        in a stable state.  In particular, if a VC is still in the process of
        connecting, complete the connection before tearing it down */
diff --git a/src/mpid/ch3/src/mpid_comm_revoke.c b/src/mpid/ch3/src/mpid_comm_revoke.c
new file mode 100644
index 0000000..7a10b7d
--- /dev/null
+++ b/src/mpid/ch3/src/mpid_comm_revoke.c
@@ -0,0 +1,108 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *   (C) 2014 by Argonne National Laboratory.
+ *       See COPYRIGHT in top-level directory.
+ */
+
+#include "mpidimpl.h"
+
+/*
+ * This function does all of the work of either revoking the communicator for
+ * the first time or keeping track of an ongoing revocation.
+ *
+ * comm_ptr  - The communicator being revoked
+ * is_remote - If we received the revocation from a remote process, this should
+ *             be set to true. This way we'll know to decrement the counter twice
+ *             (once for our local revocation and once for the remote).
+ */
+#undef FUNCNAME
+#define FUNCNAME MPID_Comm_revoke
+#undef FCNAME
+#define FCNAME MPIDI_QUOTE(FUNCNAME)
+int MPID_Comm_revoke(MPID_Comm *comm_ptr, int is_remote)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPIDI_CH3_Pkt_t upkt;
+    MPIDI_CH3_Pkt_revoke_t *revoke_pkt = &upkt.revoke;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_COMM_REVOKE);
+    MPIDI_VC_t *vc;
+    MPID_IOV iov[MPID_IOV_LIMIT];
+    int i, size, my_rank, failed = 0;
+    MPID_Request *request;
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_COMM_REVOKE);
+
+    if (0 == comm_ptr->revoked) {
+        /* Mark the communicator as revoked locally */
+        comm_ptr->revoked = 1;
+
+        /* Keep a reference to this comm so it doesn't get destroyed while
+         * it's being revoked */
+        MPIR_Comm_add_ref(comm_ptr);
+
+        /* Send out the revoke message */
+        MPIDI_Pkt_init(revoke_pkt, MPIDI_CH3_PKT_REVOKE);
+        revoke_pkt->revoked_comm = comm_ptr->context_id;
+
+        size = comm_ptr->remote_size;
+        my_rank = comm_ptr->rank;
+        for (i = 0; i < size; i++) {
+            if (i == my_rank) continue;
+            request = NULL;
+
+            MPIDI_Comm_get_vc_set_active(comm_ptr, i, &vc);
+
+            iov[0].MPID_IOV_BUF = (MPID_IOV_BUF_CAST) revoke_pkt;
+            iov[0].MPID_IOV_LEN = sizeof(*revoke_pkt);
+
+            MPIU_THREAD_CS_ENTER(CH3COMM, vc);
+            mpi_errno = MPIDI_CH3_iStartMsgv(vc, iov, 1, &request);
+            MPIU_THREAD_CS_EXIT(CH3COMM, vc);
+            if (mpi_errno) failed++;
+            if (NULL != request)
+                /* We don't need to keep a reference to this request. The
+                 * progress engine will keep a reference until it completes
+                 * later */
+                MPID_Request_release(request);
+        }
+
+        /* Start a counter to track how many revoke messages we've received from
+         * other ranks */
+        comm_ptr->ch.waiting_for_revoke = comm_ptr->local_size - 1 - is_remote - failed; /* Subtract the processes who already know about the revoke */
+        MPIU_DBG_MSG_FMT(CH3_OTHER, VERBOSE, (MPIU_DBG_FDEST, "Comm %08x waiting_for_revoke: %d", comm_ptr->handle, comm_ptr->ch.waiting_for_revoke));
+
+        /* Check to see if we are done revoking */
+        if (comm_ptr->ch.waiting_for_revoke == 0) {
+            MPIR_Comm_release(comm_ptr, 0);
+        }
+
+        /* Go clean up all of the existing operations involving this
+         * communicator. This includes completing existing MPI requests, MPID
+         * requests, and cleaning up the unexpected queue to make sure there
+         * aren't any unexpected messages hanging around. */
+
+        /* Clean up the receive and unexpected queues */
+        MPIU_THREAD_CS_ENTER(MSGQUEUE,);
+        MPIDI_CH3U_Clean_recvq(comm_ptr);
+        MPIU_THREAD_CS_EXIT(MSGQUEUE,);
+    } else if (is_remote)  { /* If this is local, we've already revoked and don't need to do it again. */
+        /* Decrement the revoke counter */
+        comm_ptr->ch.waiting_for_revoke--;
+        MPIU_DBG_MSG_FMT(CH3_OTHER, VERBOSE, (MPIU_DBG_FDEST, "Comm %08x waiting_for_revoke: %d", comm_ptr->handle, comm_ptr->ch.waiting_for_revoke));
+
+        /* Check to see if we are done revoking */
+        if (comm_ptr->ch.waiting_for_revoke == 0) {
+            MPIR_Comm_release(comm_ptr, 0);
+        }
+    }
+
+fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_COMM_REVOKE);
+    return MPI_SUCCESS;
+fn_fail:
+    if (request) {
+        MPIU_Object_set_ref(request, 0);
+        MPIDI_CH3_Request_destroy(request);
+    }
+    goto fn_exit;
+}
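
The waiting_for_revoke bookkeeping in MPID_Comm_revoke can be modeled without any MPI calls. The sketch below is a simplified model under stated assumptions: revoke_state_t and model_revoke are hypothetical names, the number of failed sends is passed in as a parameter instead of being counted while sending, and the communicator reference count is reduced to a plain integer.

```c
#include <assert.h>

/* Hypothetical model of the per-communicator revoke state. */
typedef struct revoke_state {
    int revoked;             /* set on the first revocation */
    int waiting_for_revoke;  /* remote revoke messages still expected */
    int refs;                /* stand-in for the communicator refcount */
} revoke_state_t;

/* Apply one revoke event. Returns 1 if this call completed the revocation
 * and dropped the extra reference, 0 otherwise. */
static int model_revoke(revoke_state_t *s, int local_size,
                        int is_remote, int failed)
{
    if (!s->revoked) {
        /* First revocation: hold a reference and start counting. The
         * sender of a remote revoke and any peers we could not reach
         * are not waited for. */
        s->revoked = 1;
        s->refs++;
        s->waiting_for_revoke = local_size - 1 - is_remote - failed;
    } else if (is_remote) {
        /* A further remote revoke message arrived. */
        s->waiting_for_revoke--;
    } else {
        /* Repeated local revoke: nothing to do. */
        return 0;
    }

    if (s->waiting_for_revoke == 0) {
        s->refs--;
        return 1;
    }
    return 0;
}
```

For a four-process communicator revoked locally with no failed sends, the counter starts at 3 and the extra reference is dropped when the third remote revoke message arrives.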
diff --git a/src/mpid/ch3/src/mpid_comm_spawn_multiple.c b/src/mpid/ch3/src/mpid_comm_spawn_multiple.c
index 975c040..12cd32b 100644
--- a/src/mpid/ch3/src/mpid_comm_spawn_multiple.c
+++ b/src/mpid/ch3/src/mpid_comm_spawn_multiple.c
@@ -43,6 +43,11 @@ int MPID_Comm_spawn_multiple(int count, char *array_of_commands[],
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_COMM_SPAWN_MULTIPLE);
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm_ptr->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
     /* We allow an empty implementation of this function to 
        simplify building MPICH on systems that have difficulty
        supporting process creation */
@@ -57,6 +62,8 @@ int MPID_Comm_spawn_multiple(int count, char *array_of_commands[],
 		  "**notimpl %s", FCNAME);
 #   endif
     
+fn_fail:
+fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_COMM_SPAWN_MULTIPLE);
     return mpi_errno;
 }
diff --git a/src/mpid/ch3/src/mpid_improbe.c b/src/mpid/ch3/src/mpid_improbe.c
index 5dd7b5f..4f2dfd7 100644
--- a/src/mpid/ch3/src/mpid_improbe.c
+++ b/src/mpid/ch3/src/mpid_improbe.c
@@ -30,6 +30,11 @@ int MPID_Improbe(int source, int tag, MPID_Comm *comm, int context_offset,
         goto fn_exit;
     }
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
 #ifdef ENABLE_COMM_OVERRIDES
     if (MPIDI_Anysource_improbe_fn) {
         if (source == MPI_ANY_SOURCE) {
diff --git a/src/mpid/ch3/src/mpid_iprobe.c b/src/mpid/ch3/src/mpid_iprobe.c
index 601f27b..07e1205 100644
--- a/src/mpid/ch3/src/mpid_iprobe.c
+++ b/src/mpid/ch3/src/mpid_iprobe.c
@@ -32,6 +32,11 @@ int MPID_Iprobe(int source, int tag, MPID_Comm *comm, int context_offset,
 	goto fn_exit;
     }
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
 #ifdef ENABLE_COMM_OVERRIDES
     if (MPIDI_Anysource_iprobe_fn) {
         if (source == MPI_ANY_SOURCE) {
diff --git a/src/mpid/ch3/src/mpid_irecv.c b/src/mpid/ch3/src/mpid_irecv.c
index 103a352..28e7b66 100644
--- a/src/mpid/ch3/src/mpid_irecv.c
+++ b/src/mpid/ch3/src/mpid_irecv.c
@@ -31,6 +31,12 @@ int MPID_Irecv(void * buf, int count, MPI_Datatype datatype, int rank, int tag,
         goto fn_exit;
     }
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_DBG_MSG(CH3_OTHER,VERBOSE,"Comm has been revoked. Returning from MPID_IRECV.");
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
     MPIU_THREAD_CS_ENTER(MSGQUEUE,);
     rreq = MPIDI_CH3U_Recvq_FDU_or_AEP(rank, tag, 
 				       comm->recvcontext_id + context_offset,
@@ -161,6 +167,8 @@ int MPID_Irecv(void * buf, int count, MPI_Datatype datatype, int rank, int tag,
 		   rreq->handle);
 
  fn_fail:
+    MPIU_DBG_MSG_D(CH3_OTHER,VERBOSE,"IRECV errno: 0x%08x", mpi_errno);
+    MPIU_DBG_MSG_D(CH3_OTHER,VERBOSE,"(class: %d)", MPIR_ERR_GET_CLASS(mpi_errno));
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_IRECV);
     return mpi_errno;
 }
diff --git a/src/mpid/ch3/src/mpid_irsend.c b/src/mpid/ch3/src/mpid_irsend.c
index 5477ea0..d1ce07f 100644
--- a/src/mpid/ch3/src/mpid_irsend.c
+++ b/src/mpid/ch3/src/mpid_irsend.c
@@ -37,6 +37,11 @@ int MPID_Irsend(const void * buf, int count, MPI_Datatype datatype, int rank, in
     MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                 "rank=%d, tag=%d, context=%d", 
                 rank, tag, comm->context_id + context_offset));
+
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
     
     if (rank == comm->rank && comm->comm_kind != MPID_INTERCOMM)
     {
@@ -145,6 +150,7 @@ int MPID_Irsend(const void * buf, int count, MPI_Datatype datatype, int rank, in
     }
 		  );
     
+  fn_fail:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_IRSEND);
     return mpi_errno;
 }
diff --git a/src/mpid/ch3/src/mpid_isend.c b/src/mpid/ch3/src/mpid_isend.c
index f110316..086dc6f 100644
--- a/src/mpid/ch3/src/mpid_isend.c
+++ b/src/mpid/ch3/src/mpid_isend.c
@@ -49,6 +49,12 @@ int MPID_Isend(const void * buf, int count, MPI_Datatype datatype, int rank,
     MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                   "rank=%d, tag=%d, context=%d", 
                   rank, tag, comm->context_id + context_offset));
+
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_DBG_MSG(CH3_OTHER,VERBOSE,"Communicator revoked. MPID_ISEND returning");
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
     
     if (rank == comm->rank && comm->comm_kind != MPID_INTERCOMM)
     {
@@ -179,6 +185,7 @@ int MPID_Isend(const void * buf, int count, MPI_Datatype datatype, int rank,
     }
 		  );
     
+  fn_fail:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_ISEND);
     return mpi_errno;
 }
diff --git a/src/mpid/ch3/src/mpid_issend.c b/src/mpid/ch3/src/mpid_issend.c
index 189907b..ce672fb 100644
--- a/src/mpid/ch3/src/mpid_issend.c
+++ b/src/mpid/ch3/src/mpid_issend.c
@@ -36,6 +36,11 @@ int MPID_Issend(const void * buf, int count, MPI_Datatype datatype, int rank, in
     MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
                  "rank=%d, tag=%d, context=%d", 
                  rank, tag, comm->context_id + context_offset));
+
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
     
     if (rank == comm->rank && comm->comm_kind != MPID_INTERCOMM)
     {
@@ -124,6 +129,7 @@ int MPID_Issend(const void * buf, int count, MPI_Datatype datatype, int rank, in
     }
 		  )
     
+  fn_fail:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_ISSEND);
     return mpi_errno;
 }
diff --git a/src/mpid/ch3/src/mpid_mprobe.c b/src/mpid/ch3/src/mpid_mprobe.c
index bc4aed4..550a832 100644
--- a/src/mpid/ch3/src/mpid_mprobe.c
+++ b/src/mpid/ch3/src/mpid_mprobe.c
@@ -28,6 +28,11 @@ int MPID_Mprobe(int source, int tag, MPID_Comm *comm, int context_offset,
         goto fn_exit;
     }
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
 #ifdef ENABLE_COMM_OVERRIDES
     if (MPIDI_Anysource_improbe_fn) {
         if (source == MPI_ANY_SOURCE) {
diff --git a/src/mpid/ch3/src/mpid_probe.c b/src/mpid/ch3/src/mpid_probe.c
index 6d2f41b..3b91941 100644
--- a/src/mpid/ch3/src/mpid_probe.c
+++ b/src/mpid/ch3/src/mpid_probe.c
@@ -26,6 +26,11 @@ int MPID_Probe(int source, int tag, MPID_Comm * comm, int context_offset,
 	goto fn_exit;
     }
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
 #ifdef ENABLE_COMM_OVERRIDES
     if (MPIDI_Anysource_iprobe_fn) {
         if (source == MPI_ANY_SOURCE) {
diff --git a/src/mpid/ch3/src/mpid_recv.c b/src/mpid/ch3/src/mpid_recv.c
index 76d782e..45f1580 100644
--- a/src/mpid/ch3/src/mpid_recv.c
+++ b/src/mpid/ch3/src/mpid_recv.c
@@ -39,6 +39,11 @@ int MPID_Recv(void * buf, int count, MPI_Datatype datatype, int rank, int tag,
 	goto fn_exit;
     }
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
     MPIU_THREAD_CS_ENTER(MSGQUEUE,);
     rreq = MPIDI_CH3U_Recvq_FDU_or_AEP(rank, tag, 
 				       comm->recvcontext_id + context_offset,
diff --git a/src/mpid/ch3/src/mpid_rma.c b/src/mpid/ch3/src/mpid_rma.c
index caaebe6..c112db0 100644
--- a/src/mpid/ch3/src/mpid_rma.c
+++ b/src/mpid/ch3/src/mpid_rma.c
@@ -77,6 +77,11 @@ int MPID_Win_create(void *base, MPI_Aint size, int disp_unit, MPID_Info *info,
     
     MPIDI_RMA_FUNC_ENTER(MPID_STATE_MPID_WIN_CREATE);
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm_ptr->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
     mpi_errno = win_init(size, disp_unit, MPI_WIN_FLAVOR_CREATE, MPI_WIN_UNIFIED, comm_ptr, win_ptr);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
diff --git a/src/mpid/ch3/src/mpid_rsend.c b/src/mpid/ch3/src/mpid_rsend.c
index 7cacea0..4561804 100644
--- a/src/mpid/ch3/src/mpid_rsend.c
+++ b/src/mpid/ch3/src/mpid_rsend.c
@@ -39,6 +39,11 @@ int MPID_Rsend(const void * buf, int count, MPI_Datatype datatype, int rank, int
     MPIU_DBG_MSG_FMT(CH3_OTHER,VERBOSE,(MPIU_DBG_FDEST,
 					"rank=%d, tag=%d, context=%d", 
                               rank, tag, comm->context_id + context_offset));
+
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
     
     if (rank == comm->rank && comm->comm_kind != MPID_INTERCOMM)
     {
@@ -151,6 +156,7 @@ int MPID_Rsend(const void * buf, int count, MPI_Datatype datatype, int rank, int
     }
 		  );
     
+  fn_fail:
     MPIDI_FUNC_EXIT(MPID_STATE_MPID_RSEND);
     return mpi_errno;
 }
diff --git a/src/mpid/ch3/src/mpid_send.c b/src/mpid/ch3/src/mpid_send.c
index 0328b0c..d96cf14 100644
--- a/src/mpid/ch3/src/mpid_send.c
+++ b/src/mpid/ch3/src/mpid_send.c
@@ -38,6 +38,11 @@ int MPID_Send(const void * buf, int count, MPI_Datatype datatype, int rank,
                 "rank=%d, tag=%d, context=%d", 
 		rank, tag, comm->context_id + context_offset));
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
     if (rank == comm->rank && comm->comm_kind != MPID_INTERCOMM)
     {
 	mpi_errno = MPIDI_Isend_self(buf, count, datatype, rank, tag, comm, 
diff --git a/src/mpid/ch3/src/mpid_ssend.c b/src/mpid/ch3/src/mpid_ssend.c
index 6127d09..e4ed3dc 100644
--- a/src/mpid/ch3/src/mpid_ssend.c
+++ b/src/mpid/ch3/src/mpid_ssend.c
@@ -37,6 +37,11 @@ int MPID_Ssend(const void * buf, int count, MPI_Datatype datatype, int rank, int
               "rank=%d, tag=%d, context=%d", 
               rank, tag, comm->context_id + context_offset));
 
+    /* Check to make sure the communicator hasn't already been revoked */
+    if (comm->revoked) {
+        MPIU_ERR_SETANDJUMP(mpi_errno,MPIX_ERR_REVOKED,"**revoked");
+    }
+
     if (rank == comm->rank && comm->comm_kind != MPID_INTERCOMM)
     {
 	mpi_errno = MPIDI_Isend_self(buf, count, datatype, rank, tag, comm, 
@@ -109,6 +114,7 @@ int MPID_Ssend(const void * buf, int count, MPI_Datatype datatype, int rank, int
 	   must wait until sreq completes */
     }
 
+  fn_fail:
   fn_exit:
     *request = sreq;
     
diff --git a/test/mpi/ft/Makefile.am b/test/mpi/ft/Makefile.am
index 01f5a96..90fa619 100644
--- a/test/mpi/ft/Makefile.am
+++ b/test/mpi/ft/Makefile.am
@@ -10,4 +10,4 @@ include $(top_srcdir)/Makefile.mtest
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
 ## correctly
-noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack anysource
+noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack anysource revoke_nofail
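
Editorial sketch (not part of the commit): the guard repeated in each entry point of the diff above checks the communicator's revoked flag before any work starts and jumps to a shared exit label. The toy model below illustrates only that control flow. The names demo_comm, demo_send, DEMO_SUCCESS and DEMO_ERR_REVOKED are hypothetical stand-ins and are not MPICH symbols.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical error codes standing in for MPI_SUCCESS / MPIX_ERR_REVOKED. */
enum { DEMO_SUCCESS = 0, DEMO_ERR_REVOKED = 1 };

/* Hypothetical, heavily simplified communicator. */
typedef struct {
    bool revoked;       /* set once the communicator has been revoked */
    int sends_started;  /* counts operations that got past the guard */
} demo_comm;

/* Start a send unless the communicator is revoked. The early check keeps
 * a revoked communicator from starting any new operation. */
int demo_send(demo_comm *comm)
{
    int err = DEMO_SUCCESS;

    assert(comm != NULL);

    if (comm->revoked) {
        err = DEMO_ERR_REVOKED;
        goto fn_fail;
    }

    comm->sends_started++;  /* stand-in for the real send path */

  fn_fail:
    return err;
}
```

In this sketch the success and failure paths share one exit, much as the diff places fn_fail directly above the existing exit code so the function-exit macro still runs.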

http://git.mpich.org/mpich.git/commitdiff/628d2daf99811e7a426c08f3726ec8072d927178

commit 628d2daf99811e7a426c08f3726ec8072d927178
Author: Wesley Bland <wbland at anl.gov>
Date:   Fri Mar 28 14:03:20 2014 -0500

    Add test for anysource handling
    
    This test ensures that MPI_ANY_SOURCE receives are handled correctly after a
    failure occurs. It tests both that failures are returned when they should be
    (unacknowledged failures) and not returned when they shouldn't (acknowledged
    failures).
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/test/mpi/ft/Makefile.am b/test/mpi/ft/Makefile.am
index 98b8123..01f5a96 100644
--- a/test/mpi/ft/Makefile.am
+++ b/test/mpi/ft/Makefile.am
@@ -10,4 +10,4 @@ include $(top_srcdir)/Makefile.mtest
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
 ## correctly
-noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack
+noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack anysource
diff --git a/test/mpi/ft/anysource.c b/test/mpi/ft/anysource.c
new file mode 100644
index 0000000..31459f6
--- /dev/null
+++ b/test/mpi/ft/anysource.c
@@ -0,0 +1,78 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include <mpi.h>
+#include <stdio.h>
+#include <stdlib.h>
+#include <unistd.h>
+
+/*
+ * This test makes sure the MPI_ANY_SOURCE receive operations are handled
+ * correctly. */
+int main(int argc, char **argv)
+{
+    int rank, size, err, ec;
+    char buf[10] = " No errors";
+    char error[MPI_MAX_ERROR_STRING];
+    MPI_Request request;
+    MPI_Status status;
+
+    MPI_Init(&argc, &argv);
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    if (size < 3) {
+        fprintf(stderr, "Must run with at least 3 processes\n");
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
+
+    if (rank == 1) {
+        exit(EXIT_FAILURE);
+    }
+
+    /* Make sure ANY_SOURCE returns correctly after a failure */
+    if (rank == 0) {
+        err = MPI_Recv(buf, 10, MPI_CHAR, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
+        if (MPI_SUCCESS == err) {
+            fprintf(stderr, "Expected a failure for receive from ANY_SOURCE\n");
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+
+        /* Make sure that new ANY_SOURCE operations don't work yet */
+        MPI_Irecv(buf, 10, MPI_CHAR, MPI_ANY_SOURCE, 0, MPI_COMM_WORLD, &request);
+        err = MPI_Wait(&request, &status);
+        if (MPI_SUCCESS == err) {
+            fprintf(stderr, "Expected a failure for receive from ANY_SOURCE\n");
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+
+        /* Make sure that ANY_SOURCE works after failures are acknowledged */
+        MPIX_Comm_failure_ack(MPI_COMM_WORLD);
+        err = MPI_Wait(&request, &status);
+        if (MPI_SUCCESS != err) {
+            MPI_Error_class(err, &ec);
+            MPI_Error_string(err, error, &size);
+            fprintf(stderr, "Unexpected failure after acknowledged failure (%d)\n%s", ec, error);
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+
+        fprintf(stdout, "%.10s\n", buf); /* buf holds 10 chars and no NUL */
+    } else if (rank == 2) {
+        /* Make sure we don't send our first message too early */
+        sleep(2);
+
+        err = MPI_Send(buf, 10, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
+        if (MPI_SUCCESS != err) {
+            MPI_Error_class(err, &ec);
+            MPI_Error_string(err, error, &size);
+            fprintf(stderr, "Unexpected failure from MPI_Send (%d)\n%s", ec, error);
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+    }
+
+    MPI_Finalize();
+}
diff --git a/test/mpi/ft/revoke_nofail.c b/test/mpi/ft/revoke_nofail.c
new file mode 100644
index 0000000..9e48712
--- /dev/null
+++ b/test/mpi/ft/revoke_nofail.c
@@ -0,0 +1,66 @@
+
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include <mpi.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+/*
+ * This test ensures that MPI_Comm_revoke works when a process failure has not
+ * occurred yet.
+ */
+int main(int argc, char **argv)
+{
+    int rank, size;
+    int rc, ec;
+    char error[MPI_MAX_ERROR_STRING];
+    MPI_Comm world_dup, world_dup2;
+
+    MPI_Init(&argc, &argv);
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    if (size < 4) {
+        fprintf( stderr, "Must run with at least 4 processes\n" );
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
+
+    MPI_Comm_dup(MPI_COMM_WORLD, &world_dup);
+    MPI_Comm_dup(MPI_COMM_WORLD, &world_dup2);
+
+    if (rank == 3)
+        MPIX_Comm_revoke(world_dup);
+
+    rc = MPI_Barrier(world_dup);
+    MPI_Error_class(rc, &ec);
+    if (ec != MPIX_ERR_REVOKED) {
+        MPI_Error_string(ec, error, &size);
+        fprintf(stderr, "[%d] MPI_Barrier should have returned MPIX_ERR_REVOKED (%d), but it actually returned: %d\n%s\n",
+                rank, MPIX_ERR_REVOKED, ec, error);
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    rc = MPI_Barrier(world_dup2);
+    MPI_Error_class(rc, &ec);
+    if (ec != MPI_SUCCESS) {
+        MPI_Error_string(ec, error, &size);
+        fprintf(stderr, "[%d] MPI_Barrier should have returned MPI_SUCCESS, but it actually returned: %d\n%s\n",
+                rank, ec, error);
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    MPI_Comm_free(&world_dup);
+    MPI_Comm_free(&world_dup2);
+
+    if (rank == 0)
+        fprintf(stdout, " No errors\n");
+
+    MPI_Finalize();
+
+    return 0;
+}
diff --git a/test/mpi/ft/testlist b/test/mpi/ft/testlist
index 275b1fc..a7317eb 100644
--- a/test/mpi/ft/testlist
+++ b/test/mpi/ft/testlist
@@ -11,3 +11,5 @@ gather 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=f
 reduce 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 bcast 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
 scatter 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10 xfail=ticket1945
+anysource 3 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10
+revoke_nofail 4 mpiexecarg=-disable-auto-cleanup resultTest=TestStatusNoErrors strict=false timeLimit=10

http://git.mpich.org/mpich.git/commitdiff/5c71c3a8bf633063445cdc29b19f1c1104527bb9

commit 5c71c3a8bf633063445cdc29b19f1c1104527bb9
Author: Wesley Bland <wbland at anl.gov>
Date:   Fri Mar 28 13:08:26 2014 -0500

    Remove coll_active field in MPIDI_Comm
    
    The collectively active field wasn't doing anything anymore so it's been
    removed. This was a remnant from a previous FT proposal.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/mpid/ch3/include/mpidpre.h b/src/mpid/ch3/include/mpidpre.h
index 48769aa..75e0214 100644
--- a/src/mpid/ch3/include/mpidpre.h
+++ b/src/mpid/ch3/include/mpidpre.h
@@ -171,7 +171,6 @@ typedef union {
 typedef struct MPIDI_CH3I_comm
 {
     int eager_max_msg_sz;   /* comm-wide eager/rendezvous message threshold */
-    int coll_active;        /* TRUE iff this communicator is collectively active */
     int anysource_enabled;  /* TRUE iff this anysource recvs can be posted on this communicator */
     int last_ack_rank;      /* The rank of the last acknowledged failure */
     struct MPID_nem_barrier_vars *barrier_vars; /* shared memory variables used in barrier */
diff --git a/src/mpid/ch3/src/ch3u_comm.c b/src/mpid/ch3/src/ch3u_comm.c
index 4be2c02..89662b5 100644
--- a/src/mpid/ch3/src/ch3u_comm.c
+++ b/src/mpid/ch3/src/ch3u_comm.c
@@ -203,7 +203,6 @@ int comm_created(MPID_Comm *comm, void *param)
 
     MPIDI_FUNC_ENTER(MPID_STATE_COMM_CREATED);
 
-    comm->ch.coll_active = TRUE;
     comm->ch.anysource_enabled = TRUE;
 
     /* Use the VC's eager threshold by default. */
@@ -308,7 +307,7 @@ int MPIDI_CH3I_Comm_handle_failed_procs(MPID_Group *new_failed_procs)
     COMM_FOREACH(comm) {
         /* if this comm is already collectively inactive and
            anysources are disabled, there's no need to check */
-        if (!comm->ch.coll_active && !comm->ch.anysource_enabled)
+        if (!comm->ch.anysource_enabled)
             continue;
 
         mpi_errno = nonempty_intersection(comm, new_failed_procs, &flag);
@@ -316,9 +315,8 @@ int MPIDI_CH3I_Comm_handle_failed_procs(MPID_Group *new_failed_procs)
 
         if (flag) {
             MPIU_DBG_MSG_FMT(CH3_OTHER, VERBOSE,
-                             (MPIU_DBG_FDEST, "disabling AS and coll on communicator %p (%#08x)",
+                             (MPIU_DBG_FDEST, "disabling AS on communicator %p (%#08x)",
                               comm, comm->handle));
-            comm->ch.coll_active = FALSE;
             comm->ch.anysource_enabled = FALSE;
         }
     }

http://git.mpich.org/mpich.git/commitdiff/39b958059fbd50f05d92190ce8eb507437a4878e

commit 39b958059fbd50f05d92190ce8eb507437a4878e
Author: Wesley Bland <wbland at anl.gov>
Date:   Fri Mar 28 11:16:49 2014 -0500

    Fix bug where ANY_SOURCE recv could complete when it shouldn't
    
    There was a case where an MPI_ANY_SOURCE recv call could complete successfully
    if there was already a message waiting in the unexpected receive queue when
    the call to a receive function was processed, even if any_source operations
    had already been disabled on the communicator because of an unacknowledged
    failure.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
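
Editorial sketch (not part of the commit): the fix described above comes down to ordering. The wildcard check has to run before the unexpected queue is searched, otherwise a message that is already queued can satisfy the receive. The toy model below shows that ordering under simplifying assumptions. The toy_* types, TOY_ANY_SOURCE and the result codes are hypothetical and do not correspond to MPICH's internal request or queue structures.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_ANY_SOURCE (-1)  /* hypothetical stand-in for MPI_ANY_SOURCE */

/* Hypothetical, simplified communicator state. */
typedef struct {
    bool anysource_enabled;  /* false while a failure is unacknowledged */
} toy_comm;

/* Hypothetical entry in the unexpected-message queue. */
typedef struct {
    int source;
    int tag;
} toy_msg;

typedef enum { TOY_MATCHED, TOY_POSTED, TOY_PROC_FAILED } toy_result;

/* Look for (source, tag) in the unexpected queue. The wildcard check comes
 * first, so a queued message cannot complete a wildcard receive while
 * wildcard receives are disabled on the communicator. */
toy_result toy_recv(const toy_comm *comm, const toy_msg *unexpected, size_t n,
                    int source, int tag, size_t *match_idx)
{
    assert(comm != NULL && match_idx != NULL);

    if (source == TOY_ANY_SOURCE && !comm->anysource_enabled)
        return TOY_PROC_FAILED;

    for (size_t i = 0; i < n; i++) {
        bool src_ok = (source == TOY_ANY_SOURCE) ||
                      (unexpected[i].source == source);
        if (src_ok && unexpected[i].tag == tag) {
            *match_idx = i;
            return TOY_MATCHED;
        }
    }

    return TOY_POSTED;  /* nothing queued: the receive would be posted */
}
```

With a message from rank 2 already queued and anysource_enabled set to false, a TOY_ANY_SOURCE receive in this model reports TOY_PROC_FAILED, while a receive naming rank 2 explicitly still matches.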

diff --git a/src/mpid/ch3/src/ch3u_recvq.c b/src/mpid/ch3/src/ch3u_recvq.c
index 8ec9057..67faec6 100644
--- a/src/mpid/ch3/src/ch3u_recvq.c
+++ b/src/mpid/ch3/src/ch3u_recvq.c
@@ -494,7 +494,8 @@ MPID_Request * MPIDI_CH3U_Recvq_FDU_or_AEP(int source, int tag,
                                            int context_id, MPID_Comm *comm, void *user_buf,
                                            int user_count, MPI_Datatype datatype, int * foundp)
 {
-    int found;
+    int mpi_errno = MPI_SUCCESS;
+    int found = FALSE;
     MPID_Request *rreq, *prev_rreq;
     MPIDI_Message_match match;
     MPIDI_Message_match mask;
@@ -556,8 +557,15 @@ MPID_Request * MPIDI_CH3U_Recvq_FDU_or_AEP(int source, int tag,
 	else {
 	    if (tag == MPI_ANY_TAG)
 		match.parts.tag = mask.parts.tag = 0;
-	    if (source == MPI_ANY_SOURCE)
-		match.parts.rank = mask.parts.rank = 0;
+            if (source == MPI_ANY_SOURCE) {
+                if (!MPIDI_CH3I_Comm_AS_enabled(comm)) {
+                    MPIU_ERR_SET(mpi_errno, MPIX_ERR_PROC_FAILED, "**comm_fail");
+                    rreq->status.MPI_ERROR = mpi_errno;
+                    MPIDI_CH3U_Request_complete(rreq);
+                    goto lock_exit;
+                }
+                match.parts.rank = mask.parts.rank = 0;
+            }
 
 	    do {
             MPIR_T_PVAR_COUNTER_INC(RECVQ, unexpected_recvq_match_attempts, 1);
@@ -594,8 +602,6 @@ MPID_Request * MPIDI_CH3U_Recvq_FDU_or_AEP(int source, int tag,
     /* A matching request was not found in the unexpected queue, so we 
        need to allocate a new request and add it to the posted queue */
     {
-	int mpi_errno = MPI_SUCCESS;
-
         found = FALSE;
 
 	MPIDI_Request_create_rreq( rreq, mpi_errno, goto lock_exit );

http://git.mpich.org/mpich.git/commitdiff/8652e0ade03c6b5a8dcc8205a1d978413471f130

commit 8652e0ade03c6b5a8dcc8205a1d978413471f130
Author: Wesley Bland <wbland at anl.gov>
Date:   Mon Mar 24 16:44:08 2014 -0500

    Add MPIX_Comm_failure_ack/get_acked
    
    This commit adds the new functions MPI(X)_COMM_FAILURE_ACK and
    MPI(X)_COMM_FAILURE_GET_ACKED. These two functions together allow the user to
    get the group of failed processes.
    
    Most of the implementation for this is pushed into the MPID layer since some
    systems won't support this (PAMI). The existing function
    MPIDI_CH3U_Check_for_failed_procs has been modified to give back the group of
    acknowledged failed processes. There is an inefficiency here in that the list
    of failed processes is retrieved from PMI and parsed every time the user calls
    both failure_ack and get_acked, but this means we don't have to try to cache
    the list that comes back from PMI (which could potentially be expensive, but
    would have some cost even in the failure-free case).
    
    This commit adds a field to the MPID_Comm structure. There is now a field
    called last_ack_rank. This is a single integer that stores the last
    acknowledged failure for this communicator which is used to determine when to
    stop parsing when getting back the list of acknowledged failed processes.
    
    Lastly, this commit includes a test to make sure that all of the above works
    (test/mpi/ft/failure_ack). This tests that a failure is appropriately included
    in the failed group and excluded if the failure was not previously
    acknowledged.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
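
Editorial sketch (not part of the commit): the last_ack_rank marker described above can be pictured with a toy model. It assumes the failed-process list is append-only and ordered by detection, which the message implies but does not state outright. The ack_comm type and the ack_* functions are hypothetical names for illustration, not MPICH or PMI interfaces.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical, simplified communicator state. */
typedef struct {
    int last_ack_rank;  /* rank of the last acknowledged failure, -1 if none */
} ack_comm;

void ack_comm_init(ack_comm *comm)
{
    assert(comm != NULL);
    comm->last_ack_rank = -1;
}

/* Acknowledge every failure currently in the list by remembering the
 * most recently detected failed rank. */
void ack_failure_ack(ack_comm *comm, const int *failed, size_t n_failed)
{
    assert(comm != NULL);
    if (n_failed > 0)
        comm->last_ack_rank = failed[n_failed - 1];
}

/* Copy the acknowledged ranks into out and return how many were copied.
 * Parsing stops once the marker rank has been seen, so failures detected
 * after the last acknowledgement are left out. out must have room for
 * n_failed entries. */
size_t ack_failure_get_acked(const ack_comm *comm, const int *failed,
                             size_t n_failed, int *out)
{
    size_t n = 0;

    assert(comm != NULL && out != NULL);

    if (comm->last_ack_rank < 0)
        return 0;  /* nothing acknowledged yet */

    for (size_t i = 0; i < n_failed; i++) {
        out[n++] = failed[i];
        if (failed[i] == comm->last_ack_rank)
            break;
    }
    return n;
}
```

Storing a single integer per communicator keeps the failure-free path cheap. The cost, as the commit message notes, is that the list is walked again on every query.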

diff --git a/src/include/mpi.h.in b/src/include/mpi.h.in
index d1925f2..c2afed9 100644
--- a/src/include/mpi.h.in
+++ b/src/include/mpi.h.in
@@ -1534,6 +1534,8 @@ int MPI_T_category_changed(int *stamp);
 
 /* Non-standard but public extensions to MPI */
 /* Fault Tolerance Extensions */
+int MPIX_Comm_failure_ack(MPI_Comm comm);
+int MPIX_Comm_failure_get_acked(MPI_Comm comm, MPI_Group *failedgrp);
 
 
 /* End Prototypes */
@@ -2169,6 +2171,8 @@ int PMPI_T_category_changed(int *stamp);
 
 /* Non-standard but public extensions to MPI */
 /* Fault Tolerance Extensions */
+int PMPIX_Comm_failure_ack(MPI_Comm comm);
+int PMPIX_Comm_failure_get_acked(MPI_Comm comm, MPI_Group *failedgrp);
 
 
 #endif  /* MPI_BUILD_PROFILING */
diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index 105d7b3..cc711df 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -2754,6 +2754,31 @@ int MPID_Comm_spawn_multiple(int, char *[], char **[], const int [], MPID_Info*
                              int, MPID_Comm *, MPID_Comm **, int []);
 
 /*@
+  MPID_Comm_failure_ack - MPID entry point for MPI_Comm_failure_ack
+
+  Input Parameters:
+. comm - communicator
+
+  Return Value:
+  'MPI_SUCCESS' or a valid MPI error code.
+@*/
+int MPID_Comm_failure_ack(MPID_Comm *comm);
+
+/*@
+  MPID_Comm_failure_get_acked - MPID entry point for MPI_Comm_failure_get_acked
+
+  Input Parameters:
+. comm - communicator
+
+  Output Parameters:
+. failed_group_ptr - group of failed processes
+
+  Return Value:
+  'MPI_SUCCESS' or a valid MPI error code.
+@*/
+int MPID_Comm_failure_get_acked(MPID_Comm *comm, MPID_Group **failed_group_ptr);
+
+/*@
   MPID_Send - MPID entry point for MPI_Send
 
   Notes:
diff --git a/src/mpi/comm/Makefile.mk b/src/mpi/comm/Makefile.mk
index a14b695..5dd2743 100644
--- a/src/mpi/comm/Makefile.mk
+++ b/src/mpi/comm/Makefile.mk
@@ -26,7 +26,9 @@ mpi_sources +=                       \
     src/mpi/comm/comm_test_inter.c   \
     src/mpi/comm/intercomm_create.c  \
     src/mpi/comm/intercomm_merge.c   \
-    src/mpi/comm/comm_split_type.c
+    src/mpi/comm/comm_split_type.c   \
+    src/mpi/comm/comm_failure_ack.c            \
+    src/mpi/comm/comm_failure_get_acked.c
 
 mpi_core_sources += \
     src/mpi/comm/commutil.c
diff --git a/src/mpi/comm/comm_failure_ack.c b/src/mpi/comm/comm_failure_ack.c
new file mode 100644
index 0000000..7f2dab8
--- /dev/null
+++ b/src/mpi/comm/comm_failure_ack.c
@@ -0,0 +1,114 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpiimpl.h"
+#include "mpicomm.h"
+
+/* -- Begin Profiling Symbol Block for routine MPIX_Comm_failure_ack */
+#if defined(HAVE_PRAGMA_WEAK)
+#pragma weak MPIX_Comm_failure_ack = PMPIX_Comm_failure_ack
+#elif defined(HAVE_PRAGMA_HP_SEC_DEF)
+#pragma _HP_SECONDARY_DEF PMPIX_Comm_failure_ack  MPIX_Comm_failure_ack
+#elif defined(HAVE_PRAGMA_CRI_DUP)
+#pragma _CRI duplicate MPIX_Comm_failure_ack as PMPIX_Comm_failure_ack
+#endif
+/* -- End Profiling Symbol Block */
+
+/* Define MPICH_MPI_FROM_PMPI if weak symbols are not supported to build
+   the MPI routines */
+#ifndef MPICH_MPI_FROM_PMPI
+#undef MPIX_Comm_failure_ack
+#define MPIX_Comm_failure_ack PMPIX_Comm_failure_ack
+
+#endif
+
+#undef FUNCNAME
+#define FUNCNAME MPIX_Comm_failure_ack
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+/*@
+
+MPIX_Comm_failure_ack - Acknowledge the current group of failed processes
+
+Input Parameters:
+. comm - Communicator (handle)
+
+Notes:
+.N COMMNULL
+
+.N ThreadSafe
+
+.N Fortran
+
+.N Errors
+.N MPI_SUCCESS
+.N MPI_ERR_COMM
+@*/
+int MPIX_Comm_failure_ack( MPI_Comm comm )
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Comm *comm_ptr = NULL;
+    MPID_MPI_STATE_DECL(MPID_STATE_MPIX_COMM_FAILURE_ACK);
+
+    MPIR_ERRTEST_INITIALIZED_ORDIE();
+
+    MPIU_THREAD_CS_ENTER(ALLFUNC,);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPIX_COMM_FAILURE_ACK);
+
+    /* Validate parameters, especially handles needing to be converted */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            MPIR_ERRTEST_COMM(comm, mpi_errno);
+        }
+        MPID_END_ERROR_CHECKS;
+    }
+#   endif /* HAVE_ERROR_CHECKING */
+
+    /* Convert MPI object handles to object pointers */
+    MPID_Comm_get_ptr(comm, comm_ptr);
+
+    /* Validate parameters and objects (post conversion) */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            /* Validate comm_ptr */
+            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            /* If comm_ptr is not valid, it will be reset to null */
+            if (mpi_errno) goto fn_fail;
+        }
+        MPID_END_ERROR_CHECKS;
+    }
+#   endif /* HAVE_ERROR_CHECKING */
+
+    /* ... body of routine ... */
+
+    mpi_errno = MPID_Comm_failure_ack(comm_ptr);
+    if (mpi_errno) goto fn_fail;
+
+    /* ... end of body of routine ... */
+
+fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPIX_COMM_FAILURE_ACK);
+    MPIU_THREAD_CS_EXIT(ALLFUNC,);
+    return mpi_errno;
+
+fn_fail:
+    /* --BEGIN ERROR HANDLING-- */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        mpi_errno = MPIR_Err_create_code(
+                mpi_errno, MPIR_ERR_RECOVERABLE, FCNAME, __LINE__, MPI_ERR_OTHER, "**mpix_comm_failure_ack",
+                "**mpix_comm_failure_ack %C", comm);
+    }
+#   endif
+    mpi_errno = MPIR_Err_return_comm( comm_ptr, FCNAME, mpi_errno );
+    goto fn_exit;
+    /* --END ERROR HANDLING-- */
+}
diff --git a/src/mpi/comm/comm_failure_get_acked.c b/src/mpi/comm/comm_failure_get_acked.c
new file mode 100644
index 0000000..83d9eda
--- /dev/null
+++ b/src/mpi/comm/comm_failure_get_acked.c
@@ -0,0 +1,118 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpiimpl.h"
+#include "mpicomm.h"
+
+/* -- Begin Profiling Symbol Block for routine MPIX_Comm_failure_get_acked */
+#if defined(HAVE_PRAGMA_WEAK)
+#pragma weak MPIX_Comm_failure_get_acked = PMPIX_Comm_failure_get_acked
+#elif defined(HAVE_PRAGMA_HP_SEC_DEF)
+#pragma _HP_SECONDARY_DEF PMPIX_Comm_failure_get_acked  MPIX_Comm_failure_get_acked
+#elif defined(HAVE_PRAGMA_CRI_DUP)
+#pragma _CRI duplicate MPIX_Comm_failure_get_acked as PMPIX_Comm_failure_get_acked
+#endif
+/* -- End Profiling Symbol Block */
+
+/* Define MPICH_MPI_FROM_PMPI if weak symbols are not supported to build
+   the MPI routines */
+#ifndef MPICH_MPI_FROM_PMPI
+#undef MPIX_Comm_failure_get_acked
+#define MPIX_Comm_failure_get_acked PMPIX_Comm_failure_get_acked
+
+#endif
+
+#undef FUNCNAME
+#define FUNCNAME MPIX_Comm_failure_get_acked
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+/*@
+
+MPIX_Comm_failure_get_acked - Get the group of acknowledged failures.
+
+Input Parameters:
+. comm - Communicator (handle)
+
+Output Parameters:
+. failed_group - Group (handle)
+
+Notes:
+.N COMMNULL
+
+.N ThreadSafe
+
+.N Fortran
+
+.N Errors
+.N MPI_SUCCESS
+.N MPI_ERR_COMM
+@*/
+int MPIX_Comm_failure_get_acked( MPI_Comm comm, MPI_Group *failedgrp )
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Comm *comm_ptr = NULL;
+    MPID_Group *group_ptr;
+    MPID_MPI_STATE_DECL(MPID_STATE_MPIX_COMM_FAILURE_GET_ACKED);
+
+    MPIR_ERRTEST_INITIALIZED_ORDIE();
+
+    MPIU_THREAD_CS_ENTER(ALLFUNC,);
+    MPID_MPI_FUNC_ENTER(MPID_STATE_MPIX_COMM_FAILURE_GET_ACKED);
+
+    /* Validate parameters, especially handles needing to be converted */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            MPIR_ERRTEST_COMM(comm, mpi_errno);
+        }
+        MPID_END_ERROR_CHECKS;
+    }
+#   endif /* HAVE_ERROR_CHECKING */
+
+    /* Convert MPI object handles to object pointers */
+    MPID_Comm_get_ptr(comm, comm_ptr);
+
+    /* Validate parameters and objects (post conversion) */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        MPID_BEGIN_ERROR_CHECKS;
+        {
+            /* Validate comm_ptr */
+            MPID_Comm_valid_ptr(comm_ptr, mpi_errno);
+            /* If comm_ptr is not valid, it will be reset to null */
+            if (mpi_errno) goto fn_fail;
+        }
+        MPID_END_ERROR_CHECKS;
+    }
+#   endif /* HAVE_ERROR_CHECKING */
+
+    /* ... body of routine ... */
+
+    mpi_errno = MPID_Comm_failure_get_acked(comm_ptr, &group_ptr);
+    if (mpi_errno) goto fn_fail;
+    *failedgrp = group_ptr->handle;
+    /* ... end of body of routine ... */
+
+fn_exit:
+    MPID_MPI_FUNC_EXIT(MPID_STATE_MPIX_COMM_FAILURE_GET_ACKED);
+    MPIU_THREAD_CS_EXIT(ALLFUNC,);
+    return mpi_errno;
+
+fn_fail:
+    /* --BEGIN ERROR HANDLING-- */
+#   ifdef HAVE_ERROR_CHECKING
+    {
+        mpi_errno = MPIR_Err_create_code(
+                mpi_errno, MPIR_ERR_RECOVERABLE, FCNAME, __LINE__, MPI_ERR_OTHER, "**mpix_comm_failure_get_acked",
+                "**mpix_comm_failure_get_acked %C %p", comm, failedgrp);
+    }
+#   endif
+    mpi_errno = MPIR_Err_return_comm( comm_ptr, FCNAME, mpi_errno );
+    goto fn_exit;
+    /* --END ERROR HANDLING-- */
+}
diff --git a/src/mpi/errhan/errnames.txt b/src/mpi/errhan/errnames.txt
index 82cf107..36eaa2c 100644
--- a/src/mpi/errhan/errnames.txt
+++ b/src/mpi/errhan/errnames.txt
@@ -1096,6 +1096,10 @@ is too big (> MPIU_SHMW_GHND_SZ)
 **mpi_comm_remote_size %C %p:MPI_Comm_remote_size(%C, size=%p) failed
 **mpi_comm_remote_group:MPI_Comm_remote_group failed
 **mpi_comm_remote_group %C %p:MPI_Comm_remote_group(%C, group=%p) failed
+**mpix_comm_failure_ack:MPIX_Comm_failure_ack failed
+**mpix_comm_failure_ack %C:MPIX_Comm_failure_ack(%C) failed
+**mpix_comm_failure_get_acked:MPIX_Comm_failure_get_acked failed
+**mpix_comm_failure_get_acked %C %p:MPIX_Comm_failure_get_acked(%C, group=%p) failed
 **mpi_intercomm_create:MPI_Intercomm_create failed
 **mpi_intercomm_create %C %d %C %d %d %p:MPI_Intercomm_create(%C, local_leader=%d, %C, remote_leader=%d, tag=%d, newintercomm=%p) failed
 **mpi_intercomm_merge:MPI_Intercomm_merge failed
diff --git a/src/mpid/ch3/include/mpidpre.h b/src/mpid/ch3/include/mpidpre.h
index 8c3a68e..48769aa 100644
--- a/src/mpid/ch3/include/mpidpre.h
+++ b/src/mpid/ch3/include/mpidpre.h
@@ -173,6 +173,7 @@ typedef struct MPIDI_CH3I_comm
     int eager_max_msg_sz;   /* comm-wide eager/rendezvous message threshold */
     int coll_active;        /* TRUE iff this communicator is collectively active */
     int anysource_enabled;  /* TRUE iff this anysource recvs can be posted on this communicator */
+    int last_ack_rank;      /* The rank of the last acknowledged failure */
     struct MPID_nem_barrier_vars *barrier_vars; /* shared memory variables used in barrier */
     struct MPID_Comm *next; /* next pointer for list of communicators */
     struct MPID_Comm *prev; /* prev pointer for list of communicators */
diff --git a/src/mpid/ch3/src/Makefile.mk b/src/mpid/ch3/src/Makefile.mk
index a69106a..5160b5b 100644
--- a/src/mpid/ch3/src/Makefile.mk
+++ b/src/mpid/ch3/src/Makefile.mk
@@ -29,6 +29,7 @@ mpi_core_sources +=                          \
     src/mpid/ch3/src/mpid_cancel_send.c                    \
     src/mpid/ch3/src/mpid_comm_disconnect.c                \
     src/mpid/ch3/src/mpid_comm_spawn_multiple.c            \
+    src/mpid/ch3/src/mpid_comm_failure_ack.c               \
     src/mpid/ch3/src/mpid_finalize.c                       \
     src/mpid/ch3/src/mpid_get_universe_size.c              \
     src/mpid/ch3/src/mpid_getpname.c                       \
diff --git a/src/mpid/ch3/src/ch3u_comm.c b/src/mpid/ch3/src/ch3u_comm.c
index 0fc766b..4be2c02 100644
--- a/src/mpid/ch3/src/ch3u_comm.c
+++ b/src/mpid/ch3/src/ch3u_comm.c
@@ -209,6 +209,9 @@ int comm_created(MPID_Comm *comm, void *param)
     /* Use the VC's eager threshold by default. */
     comm->ch.eager_max_msg_sz = -1;
 
+    /* Initialize the last acked failure to -1 */
+    comm->ch.last_ack_rank = -1;
+
     COMM_ADD(comm);
 
  fn_exit:
diff --git a/src/mpid/ch3/src/mpid_comm_failure_ack.c b/src/mpid/ch3/src/mpid_comm_failure_ack.c
new file mode 100644
index 0000000..be64340
--- /dev/null
+++ b/src/mpid/ch3/src/mpid_comm_failure_ack.c
@@ -0,0 +1,71 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+
+#include "mpidimpl.h"
+
+#undef FUNCNAME
+#define FUNCNAME MPID_Comm_failure_ack
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_Comm_failure_ack(MPID_Comm *comm_ptr)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_COMM_FAILURE_ACK);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_COMM_FAILURE_ACK);
+
+    /* Update the list of failed processes that we know about locally.
+     * This part could technically be turned off and be a correct
+     * implementation, but it would be slower about propagating failure
+     * information. Also, this is the failure case so speed isn't as
+     * important. */
+    MPIDI_CH3U_Check_for_failed_procs();
+
+    /* Update the marker for the last known failed process in this
+     * communicator. */
+    comm_ptr->ch.last_ack_rank = MPIDI_last_known_failed;
+
+fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_COMM_FAILURE_ACK);
+    return mpi_errno;
+fn_fail:
+    goto fn_exit;
+}
+
+#undef FUNCNAME
+#define FUNCNAME MPID_Comm_failure_get_acked
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPID_Comm_failure_get_acked(MPID_Comm *comm_ptr, MPID_Group **group_ptr)
+{
+    int mpi_errno = MPI_SUCCESS;
+    MPID_Group *failed_group, *comm_group;
+    MPIDI_STATE_DECL(MPID_STATE_MPID_COMM_FAILURE_GET_ACKED);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_MPID_COMM_FAILURE_GET_ACKED);
+
+    /* Get the group of all failed processes */
+    MPIDI_CH3U_Check_for_failed_procs();
+    MPIDI_CH3U_Get_failed_group(comm_ptr->ch.last_ack_rank, &failed_group);
+    if (failed_group == MPID_Group_empty) {
+        *group_ptr = MPID_Group_empty;
+        goto fn_exit;
+    }
+
+    MPIR_Comm_group_impl(comm_ptr, &comm_group);
+
+    /* Get the intersection of all failed processes in this communicator */
+    MPIR_Group_intersection_impl(failed_group, comm_group, group_ptr);
+
+    MPIR_Group_release(comm_group);
+    MPIR_Group_release(failed_group);
+
+fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_MPID_COMM_FAILURE_GET_ACKED);
+    return mpi_errno;
+fn_fail:
+    goto fn_exit;
+}
diff --git a/src/mpid/pamid/src/misc/mpid_unimpl.c b/src/mpid/pamid/src/misc/mpid_unimpl.c
index ff52890..8b92417 100644
--- a/src/mpid/pamid/src/misc/mpid_unimpl.c
+++ b/src/mpid/pamid/src/misc/mpid_unimpl.c
@@ -71,3 +71,15 @@ int MPID_Comm_spawn_multiple(int count,
   return 0;
 }
 #endif
+
+int MPID_Comm_failure_ack(MPID_Comm *comm_ptr)
+{
+  MPID_abort();
+  return 0;
+}
+
+int MPID_Comm_failure_get_acked(MPID_Comm *comm_ptr, MPID_Group **failed_group_ptr)
+{
+  MPID_abort();
+  return 0;
+}
diff --git a/test/mpi/ft/Makefile.am b/test/mpi/ft/Makefile.am
index b01f80a..98b8123 100644
--- a/test/mpi/ft/Makefile.am
+++ b/test/mpi/ft/Makefile.am
@@ -10,4 +10,4 @@ include $(top_srcdir)/Makefile.mtest
 ## for all programs that are just built from the single corresponding source
 ## file, we don't need per-target _SOURCES rules, automake will infer them
 ## correctly
-noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter
+noinst_PROGRAMS = die abort sendalive isendalive senddead recvdead isenddead irecvdead barrier gather reduce bcast scatter failure_ack
diff --git a/test/mpi/ft/failure_ack.c b/test/mpi/ft/failure_ack.c
new file mode 100644
index 0000000..e5acc41
--- /dev/null
+++ b/test/mpi/ft/failure_ack.c
@@ -0,0 +1,117 @@
+/* -*- Mode: C; c-basic-offset:4 ; indent-tabs-mode:nil ; -*- */
+/*
+ *
+ *  (C) 2014 by Argonne National Laboratory.
+ *      See COPYRIGHT in top-level directory.
+ */
+#include <mpi.h>
+#include <stdio.h>
+#include <stdlib.h>
+
+/*
+ * This test makes sure that after a failure, the correct group of failed
+ * processes is returned from MPIX_Comm_failure_ack/get_acked.
+ */
+int main(int argc, char **argv)
+{
+    int rank, size, err, result, i;
+    char buf[10] = " No errors";
+    char error[MPI_MAX_ERROR_STRING];
+    MPI_Group failed_grp, one_grp, world_grp;
+    int one[] = {1};
+    int world_ranks[] = {0,1,2};
+    int failed_ranks[3];
+
+    MPI_Init(&argc, &argv);
+    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
+    MPI_Comm_size(MPI_COMM_WORLD, &size);
+    if (size < 3) {
+        fprintf( stderr, "Must run with at least 3 processes\n" );
+        MPI_Abort(MPI_COMM_WORLD, 1);
+    }
+
+    MPI_Comm_set_errhandler(MPI_COMM_WORLD, MPI_ERRORS_RETURN);
+
+    if (rank == 1) {
+        exit(EXIT_FAILURE);
+    }
+
+    if (rank == 0) {
+        err = MPI_Recv(buf, 10, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
+        if (MPI_SUCCESS == err) {
+            fprintf(stderr, "Expected a failure for receive from rank 1\n");
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+
+        err = MPIX_Comm_failure_ack(MPI_COMM_WORLD);
+        if (MPI_SUCCESS != err) {
+            int ec;
+            MPI_Error_class(err, &ec);
+            MPI_Error_string(err, error, &size);
+            fprintf(stderr, "MPIX_Comm_failure_ack returned an error: %d\n%s", ec, error);
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+        err = MPIX_Comm_failure_get_acked(MPI_COMM_WORLD, &failed_grp);
+        if (MPI_SUCCESS != err) {
+            int ec;
+            MPI_Error_class(err, &ec);
+            MPI_Error_string(err, error, &size);
+            fprintf(stderr, "MPIX_Comm_failure_get_acked returned an error: %d\n%s", ec, error);
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+
+        MPI_Comm_group(MPI_COMM_WORLD, &world_grp);
+        MPI_Group_incl(world_grp, 1, one, &one_grp);
+        MPI_Group_compare(one_grp, failed_grp, &result);
+        if (MPI_IDENT != result) {
+            fprintf(stderr, "First failed group contains incorrect processes\n");
+            MPI_Group_size(failed_grp, &size);
+            MPI_Group_translate_ranks(failed_grp, size, world_ranks, world_grp, failed_ranks);
+            for (i = 0; i < size; i++)
+                fprintf(stderr, "DEAD: %d\n", failed_ranks[i]);
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+        MPI_Group_free(&failed_grp);
+
+        err = MPI_Recv(buf, 10, MPI_CHAR, 2, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
+        if (MPI_SUCCESS != err) {
+            fprintf(stderr, "First receive failed\n");
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+        err = MPI_Recv(buf, 10, MPI_CHAR, 2, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
+        if (MPI_SUCCESS == err) {
+            fprintf(stderr, "Expected a failure for receive from rank 2\n");
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+
+        err = MPIX_Comm_failure_get_acked(MPI_COMM_WORLD, &failed_grp);
+        if (MPI_SUCCESS != err) {
+            int ec;
+            MPI_Error_class(err, &ec);
+            MPI_Error_string(err, error, &size);
+            fprintf(stderr, "MPIX_Comm_failure_get_acked returned an error: %d\n%s", ec, error);
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+
+        MPI_Group_compare(one_grp, failed_grp, &result);
+        if (MPI_IDENT != result) {
+            fprintf(stderr, "Second failed group contains incorrect processes\n");
+            MPI_Group_size(failed_grp, &size);
+            MPI_Group_translate_ranks(failed_grp, size, world_ranks, world_grp, failed_ranks);
+            for (i = 0; i < size; i++)
+                fprintf(stderr, "DEAD: %d\n", failed_ranks[i]);
+            MPI_Abort(MPI_COMM_WORLD, 1);
+        }
+
+        fprintf(stdout, " No errors\n");
+    } else if (rank == 2) {
+        MPI_Ssend(buf, 10, MPI_CHAR, 0, 0, MPI_COMM_WORLD);
+
+        exit(EXIT_FAILURE);
+    }
+
+    MPI_Group_free(&failed_grp);
+    MPI_Group_free(&one_grp);
+    MPI_Group_free(&world_grp);
+    MPI_Finalize();
+}

http://git.mpich.org/mpich.git/commitdiff/665ced285ab9e2f655852c901b9a819f6390474e

commit 665ced285ab9e2f655852c901b9a819f6390474e
Author: Wesley Bland <wbland at anl.gov>
Date:   Mon Mar 24 10:54:37 2014 -0500

    Add MPIDI_CH3U_Get_failed_group
    
    This function will take a last_failed value and generate an MPID_Group. If the
    value is MPI_PROC_NULL, then it will parse the entire list. This function is
    exposed by MPID so this can be used by any functions that need the list of
    failed processes.
    
    This change necessitated changing the way the list of failed processes is
    retrieved from PMI. Rather than allocating a char array on demand every time
    we get the list from PMI, this string is allocated at init time and freed at
    finalize time now. This means that we can cache the value to be used later for
    things like just querying the list of processes that we already know have
    failed, rather than also getting the new list (which is important for the
    failure_ack/get_acked semantics).
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
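
The last_rank semantics described above can be sketched outside of MPICH as follows. This is only an illustration: collect_failed_ranks is a hypothetical helper, not the MPIDI_CH3U_Get_failed_group implementation, and it assumes the cached list is a plain comma-separated string of ranks in the order the failures were reported.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical helper, not the MPICH function: copies ranks from an
 * unsorted, comma-separated failure list into out[], stopping once
 * last_rank has been copied.
 *   last_rank == -1 : nothing acknowledged yet, returns 0
 *   last_rank <  -1 : every rank in the list is returned
 * Returns the number of ranks written, or -1 on a malformed list. */
static int collect_failed_ranks(const char *list, int last_rank,
                                int *out, int max_out)
{
    int n = 0;
    const char *c = list;

    assert(list != NULL && out != NULL);

    if (last_rank == -1 || *c == '\0')
        return 0;

    while (n < max_out) {
        char *end;
        long rank = strtol(c, &end, 10);

        if (end == c)
            return -1;          /* no digits where a rank was expected */
        out[n++] = (int) rank;
        if (*end == '\0' || rank == last_rank)
            break;              /* end of list, or last acked rank reached */
        if (*end != ',')
            return -1;          /* unexpected separator */
        c = end + 1;            /* skip ',' */
    }
    return n;
}
```

Because the list keeps its arrival order, a single integer per communicator is enough to mark how far the acknowledgement has progressed.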

diff --git a/src/mpid/ch3/include/mpidimpl.h b/src/mpid/ch3/include/mpidimpl.h
index 099e76d..fe8e7ed 100644
--- a/src/mpid/ch3/include/mpidimpl.h
+++ b/src/mpid/ch3/include/mpidimpl.h
@@ -48,6 +48,8 @@ int gethostname(char *name, size_t len);
 /* group of processes detected to have failed.  This is a subset of
    comm_world group. */
 extern MPID_Group *MPIDI_Failed_procs_group;
+extern int MPIDI_last_known_failed;
+extern char *MPIDI_failed_procs_string;
 
 extern int MPIDI_Use_pmi2_api;
 
@@ -1657,6 +1659,10 @@ int MPIDI_CH3_Channel_close( void );
 #else
 #define MPIDI_CH3_Channel_close( )   MPI_SUCCESS
 #endif
+
+/* MPIDI_CH3U_Get_failed_group() generates a group of failed processes based
+ * on the last list generated during MPIDI_CH3U_Check_for_failed_procs */
+int MPIDI_CH3U_Get_failed_group(int last_rank, MPID_Group **failed_group);
 /* MPIDI_CH3U_Check_for_failed_procs() reads PMI_dead_processes key
    and marks VCs to those processes as failed */
 int MPIDI_CH3U_Check_for_failed_procs(void);
diff --git a/src/mpid/ch3/src/ch3u_handle_connection.c b/src/mpid/ch3/src/ch3u_handle_connection.c
index 00bdb8e..352fc07 100644
--- a/src/mpid/ch3/src/ch3u_handle_connection.c
+++ b/src/mpid/ch3/src/ch3u_handle_connection.c
@@ -19,6 +19,8 @@ static volatile int MPIDI_Outstanding_close_ops = 0;
 int MPIDI_Failed_vc_count = 0;
 
 MPID_Group *MPIDI_Failed_procs_group = NULL;
+int MPIDI_last_known_failed = MPI_PROC_NULL;
+char *MPIDI_failed_procs_string = NULL;
 
 #undef FUNCNAME
 #define FUNCNAME MPIDI_CH3U_Handle_connection
@@ -430,6 +432,81 @@ static int terminate_failed_VCs(MPID_Group *new_failed_group)
             ++c;                                                                                \
     } while (0)
 
+/* There are three possible input values for `last_rank`:
+ *
+ * < -1 = All failures regardless of acknowledgement
+ * -1 (MPI_PROC_NULL) = No failures have been acknowledged yet (return MPID_Group_empty)
+ * >= 0 = The last failure acknowledged. All failures returned will have
+ *        been acknowledged previously.
+ */
+#undef FUNCNAME
+#define FUNCNAME MPIDI_CH3U_Get_failed_group
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPIDI_CH3U_Get_failed_group(int last_rank, MPID_Group **failed_group)
+{
+    char *c;
+    int i, mpi_errno = MPI_SUCCESS, rank;
+    UT_array *failed_procs = NULL;
+    MPID_Group *world_group;
+    MPIDI_STATE_DECL(MPID_STATE_GET_FAILED_GROUP);
+
+    MPIDI_FUNC_ENTER(MPID_STATE_GET_FAILED_GROUP);
+
+    MPIU_DBG_MSG_D(CH3_OTHER, VERBOSE, "Getting failed group with %d as last acknowledged\n", last_rank);
+
+    if (-1 == last_rank) {
+        MPIU_DBG_MSG(CH3_OTHER, VERBOSE, "No failure acknowledged");
+        *failed_group = MPID_Group_empty;
+        goto fn_exit;
+    }
+
+    if (*MPIDI_failed_procs_string == '\0') {
+        MPIU_DBG_MSG(CH3_OTHER, VERBOSE, "Found no failed ranks");
+        *failed_group = MPID_Group_empty;
+        goto fn_exit;
+    }
+
+    utarray_new(failed_procs, &ut_int_icd);
+
+    /* parse list of failed processes.  This is a comma separated list
+       of ranks or ranges of ranks (e.g., "1, 3-5, 11") */
+    i = 0;
+    c = MPIDI_failed_procs_string;
+    while(1) {
+        parse_rank(&rank);
+        ++i;
+        MPIU_DBG_MSG_D(CH3_OTHER, VERBOSE, "Found failed rank: %d", rank);
+        utarray_push_back(failed_procs, &rank);
+        MPIDI_last_known_failed = rank;
+        MPIU_ERR_CHKINTERNAL(*c != ',' && *c != '\0', mpi_errno, "error parsing failed process list");
+        if (*c == '\0' || last_rank == rank)
+            break;
+        ++c; /* skip ',' */
+    }
+
+    /* Create group of failed processes for comm_world.  Failed groups for other
+       communicators can be created from this one using group_intersection. */
+    mpi_errno = MPIR_Comm_group_impl(MPIR_Process.comm_world, &world_group);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    mpi_errno = MPIR_Group_incl_impl(world_group, i, ut_int_array(failed_procs), failed_group);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+    mpi_errno = MPIR_Group_release(world_group);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+fn_exit:
+    MPIDI_FUNC_EXIT(MPID_STATE_GET_FAILED_GROUP);
+    if (failed_procs)
+        utarray_free(failed_procs);
+    return mpi_errno;
+fn_oom:
+    MPIU_ERR_SET1(mpi_errno, MPI_ERR_OTHER, "**nomem", "**nomem %s", "utarray");
+fn_fail:
+    goto fn_exit;
+}
+
 #undef FUNCNAME
 #define FUNCNAME MPIDI_CH3U_Check_for_failed_procs
 #undef FCNAME
@@ -438,15 +515,9 @@ int MPIDI_CH3U_Check_for_failed_procs(void)
 {
     int mpi_errno = MPI_SUCCESS;
     int pmi_errno;
-    char *val;
-    char *c;
     int len;
     char *kvsname;
-    int rank, rank_hi;
-    int i;
-    UT_array *failed_procs = NULL;
-    MPID_Group *world_group, *prev_failed_group, *new_failed_group;
-    MPIU_CHKLMEM_DECL(1);
+    MPID_Group *prev_failed_group, *new_failed_group;
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3U_CHECK_FOR_FAILED_PROCS);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3U_CHECK_FOR_FAILED_PROCS);
@@ -460,87 +531,52 @@ int MPIDI_CH3U_Check_for_failed_procs(void)
 #ifdef USE_PMI2_API
     {
         int vallen = 0;
-        MPIU_CHKLMEM_MALLOC(val, char *, PMI2_MAX_VALLEN, mpi_errno, "val");
-        pmi_errno = PMI2_KVS_Get(kvsname, PMI2_ID_NULL, "PMI_dead_processes", val, PMI2_MAX_VALLEN, &vallen);
+        pmi_errno = PMI2_KVS_Get(kvsname, PMI2_ID_NULL, "PMI_dead_processes", MPIDI_failed_procs_string, PMI2_MAX_VALLEN, &vallen);
         MPIU_ERR_CHKANDJUMP(pmi_errno, mpi_errno, MPI_ERR_OTHER, "**pmi_kvs_get");
     }
 #else
     pmi_errno = PMI_KVS_Get_value_length_max(&len);
     MPIU_ERR_CHKANDJUMP(pmi_errno, mpi_errno, MPI_ERR_OTHER, "**pmi_kvs_get_value_length_max");
-    MPIU_CHKLMEM_MALLOC(val, char *, len, mpi_errno, "val");
-    pmi_errno = PMI_KVS_Get(kvsname, "PMI_dead_processes", val, len);
+    pmi_errno = PMI_KVS_Get(kvsname, "PMI_dead_processes", MPIDI_failed_procs_string, len);
     MPIU_ERR_CHKANDJUMP(pmi_errno, mpi_errno, MPI_ERR_OTHER, "**pmi_kvs_get");
 #endif
-    
-    MPIU_DBG_MSG_S(CH3_DISCONNECT, TYPICAL, "Received proc fail notification: %s", val);
-    
-    if (*val == '\0') {
+
+    if (*MPIDI_failed_procs_string == '\0') {
         /* there are no failed processes */
         MPIDI_Failed_procs_group = MPID_Group_empty;
         goto fn_exit;
     }
 
-    utarray_new(failed_procs, &ut_int_icd);
-    
-    /* parse list of failed processes.  This is a comma separated list
-       of ranks or ranges of ranks (e.g., "1, 3-5, 11") */
-    i = 0;
-    c = val;
-    while(1) {
-        parse_rank(&rank);
-        if (*c == '-') {
-            ++c; /* skip '-' */
-            parse_rank(&rank_hi);
-        } else
-            rank_hi = rank;
-        while (rank <= rank_hi) {
-            utarray_push_back(failed_procs, &rank);
-            ++i;
-            ++rank;
-        }
-        MPIU_ERR_CHKINTERNAL(*c != ',' && *c != '\0', mpi_errno, "error parsing failed process list");
-        if (*c == '\0')
-            break;
-        ++c; /* skip ',' */
-    }
+    MPIU_DBG_MSG_S(CH3_OTHER, TYPICAL, "Received proc fail notification: %s", MPIDI_failed_procs_string);
 
     /* save reference to previous group so we can identify new failures */
     prev_failed_group = MPIDI_Failed_procs_group;
 
-    /* Create group of failed processes for comm_world.  Failed groups for other
-       communicators can be created from this one using group_intersection. */
-    mpi_errno = MPIR_Comm_group_impl(MPIR_Process.comm_world, &world_group);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
-    mpi_errno = MPIR_Group_incl_impl(world_group, i, ut_int_array(failed_procs), &MPIDI_Failed_procs_group);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
-    mpi_errno = MPIR_Group_free_impl(world_group);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    /* Parse the list of failed processes */
+    MPIDI_CH3U_Get_failed_group(-2, &MPIDI_Failed_procs_group);
 
     /* get group of newly failed processes */
     mpi_errno = MPIR_Group_difference_impl(MPIDI_Failed_procs_group, prev_failed_group, &new_failed_group);
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
-    mpi_errno = MPIDI_CH3I_Comm_handle_failed_procs(new_failed_group);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    if (new_failed_group != MPID_Group_empty) {
+        mpi_errno = MPIDI_CH3I_Comm_handle_failed_procs(new_failed_group);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
 
-    mpi_errno = terminate_failed_VCs(new_failed_group);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-    
-    mpi_errno = MPIR_Group_free_impl(new_failed_group);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+        mpi_errno = terminate_failed_VCs(new_failed_group);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
+        mpi_errno = MPIR_Group_release(new_failed_group);
+        if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+    }
 
     /* free prev group */
     if (prev_failed_group != MPID_Group_empty) {
-        mpi_errno = MPIR_Group_free_impl(prev_failed_group);
+        mpi_errno = MPIR_Group_release(prev_failed_group);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
     }
 
  fn_exit:
-    MPIU_CHKLMEM_FREEALL();
-    if (failed_procs)
-        utarray_free(failed_procs);
     MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3U_CHECK_FOR_FAILED_PROCS);
     return mpi_errno;
 
diff --git a/src/mpid/ch3/src/mpid_finalize.c b/src/mpid/ch3/src/mpid_finalize.c
index 4ff6a17..5048ee7 100644
--- a/src/mpid/ch3/src/mpid_finalize.c
+++ b/src/mpid/ch3/src/mpid_finalize.c
@@ -145,6 +145,8 @@ int MPID_Finalize(void)
 	}
     }
     
+    MPIU_Free(MPIDI_failed_procs_string);
+
     MPIDU_Ftb_finalize();
 
  fn_exit:
diff --git a/src/mpid/ch3/src/mpid_init.c b/src/mpid/ch3/src/mpid_init.c
index 43206bb..16c425e 100644
--- a/src/mpid/ch3/src/mpid_init.c
+++ b/src/mpid/ch3/src/mpid_init.c
@@ -99,6 +99,7 @@ int MPID_Init(int *argc, char ***argv, int requested, int *provided,
     MPID_Comm * comm;
     int p;
     MPIDI_STATE_DECL(MPID_STATE_MPID_INIT);
+    int val;
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPID_INIT);
 
@@ -117,13 +118,22 @@ int MPID_Init(int *argc, char ***argv, int requested, int *provided,
     MPIDI_Use_pmi2_api = TRUE;
 #else
     {
-        int ret, val;
+        int ret;
         ret = MPL_env2bool("MPICH_USE_PMI2_API", &val);
         if (ret == 1 && val)
             MPIDI_Use_pmi2_api = TRUE;
     }
 #endif
-    
+
+    /* Create the string that will cache the last group of failed processes
+     * we received from PMI */
+#ifdef USE_PMI2_API
+    MPIDI_failed_procs_string = MPIU_Malloc(sizeof(char) * PMI2_MAX_VALLEN);
+#else
+    PMI_KVS_Get_value_length_max(&val);
+    MPIDI_failed_procs_string = MPIU_Malloc(sizeof(char) * (val+1));
+#endif
+
     /*
      * Set global process attributes.  These can be overridden by the channel 
      * if necessary.

http://git.mpich.org/mpich.git/commitdiff/782d036c4f898f786bb3a4f90b02f5d99971d9c6

commit 782d036c4f898f786bb3a4f90b02f5d99971d9c6
Author: Wesley Bland <wbland at anl.gov>
Date:   Thu Mar 20 12:51:44 2014 -0500

    Don't compress and order list of failed procs in PMI
    
    Previously, PMI provided a list of failed processes as a sorted list via a
    string. This meant the list could look something like this:
    
    1,3-5,7,10
    
    However, in the new fault tolerance specification, the function
    MPI_COMM_FAILURE_ACK needs to be able to determine the local order of the
    failures to more efficiently acknowledge them without creating a list per
    communicator. This requires that PMI not sort or compress the failure
    notification. So now, the previous string could look like this:
    
    3,1,4,5,10,7
    
    Obviously, this is less efficient if there are lots of failures. Hopefully,
    this is something that can be fixed in future versions of PMI.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
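
A minimal sketch of the append-in-arrival-order behaviour described above, for readers who do not want to trace the diff. append_dead_process is a hypothetical helper, not the Hydra code; it assumes the caller owns a fixed-size buffer holding the current list.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper, not the Hydra code: appends pid to the
 * comma-separated list in buf unless it is already present, keeping the
 * order in which failures were reported.  Returns 1 if pid was appended,
 * 0 if it was already listed, and -1 if the list is malformed or buf is
 * too small. */
static int append_dead_process(char *buf, size_t buflen, int pid)
{
    const char *c = buf;
    size_t used;
    int written;

    assert(buf != NULL && buflen > 0);

    /* Scan the existing entries without modifying the list. */
    while (*c != '\0') {
        char *end;
        long value = strtol(c, &end, 10);

        if (end == c)
            return -1;          /* not a number: malformed list */
        if (value == pid)
            return 0;           /* already recorded */
        c = (*end == ',') ? end + 1 : end;
    }

    used = strlen(buf);
    written = snprintf(buf + used, buflen - used,
                       used ? ",%d" : "%d", pid);
    if (written < 0 || (size_t) written >= buflen - used) {
        buf[used] = '\0';       /* undo a truncated append */
        return -1;
    }
    return 1;
}
```

Starting from an empty buffer, appending 3, then 1, then 3 again leaves the list as "3,1": no sorting, no range compression, no duplicates.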

diff --git a/src/pm/hydra/pm/pmiserv/pmiserv_cb.c b/src/pm/hydra/pm/pmiserv/pmiserv_cb.c
index 364294a..28007fa 100644
--- a/src/pm/hydra/pm/pmiserv/pmiserv_cb.c
+++ b/src/pm/hydra/pm/pmiserv/pmiserv_cb.c
@@ -339,100 +339,38 @@ static HYD_status control_cb(int fd, HYD_event_t events, void *userp)
                 /* FIXME: If the list of dead processes does not fit
                  * inside a single value length, set it as a
                  * multi-line value. */
-                struct proc_list {
-                    char *segment;
-                    int start;
-                    int end;
-                    struct proc_list *next;
-                } *list = NULL, *e, *tmp;
-                char *segment, *start, *end, *current_list, *str, *run;
-                int included = 0;
+                /* FIXME: This was changed from a sorted list where sequential
+                 * numbers could be compacted to an expanded list where they
+                 * couldn't. Obviously, this isn't sustainable on the PMI
+                 * side, but on the MPI side, it's necessary (see the
+                 * definition of MPI_COMM_FAILURE_ACK). In a future version of
+                 * PMI where we can pass around things other than strings,
+                 * this should improve. */
+                char *segment, *current_list, *str;
+                int included = 0, value;
 
                 /* Create a sorted list of processes */
                 current_list = HYDU_strdup(pg_scratch->dead_processes);
 
+                /* Search to see if this process is already in the list */
+                included = 0;
                 segment = strtok(current_list, ",");
                 do {
-                    HYDU_MALLOC(e, struct proc_list *, sizeof(struct proc_list), status);
-                    e->segment = HYDU_strdup(segment);
-                    e->next = NULL;
-
-                    if (list == NULL)
-                        list = e;
-                    else {
-                        for (tmp = list; tmp->next; tmp = tmp->next);
-                        tmp->next = e;
-                    }
-
-                    segment = strtok(NULL, ",");
-                } while (segment);
-
-                for (e = list; e; e = e->next) {
-                    start = strtok(e->segment, "-");
-                    end = strtok(NULL, "-");
-
-                    e->start = atoi(start);
-                    if (end)
-                        e->end = atoi(end);
-                    else
-                        e->end = atoi(start);
-
-                    if (hdr.pid == e->start - 1) {
-                        e->start = hdr.pid;
-                        included = 1;
-                    }
-                    else if (hdr.pid == e->end + 1) {
-                        e->end = hdr.pid;
+                    value = strtol(segment, NULL, 10);
+                    if (value == hdr.pid) {
                         included = 1;
+                        break;
                     }
-                }
+                    segment = strtok(NULL, ",");
+                } while (segment != NULL);
 
+                /* Add this process to the end of the list */
                 if (!included) {
-                    HYDU_MALLOC(e, struct proc_list *, sizeof(struct proc_list), status);
-                    e->start = hdr.pid;
-                    e->end = hdr.pid;
-                    e->next = NULL;
-
-                    if (hdr.pid < list->start) {
-                        e->next = list;
-                        list = e;
-                    }
-                    else {
-                        for (tmp = list; tmp->next && tmp->next->start < hdr.pid; tmp = tmp->next);
-                        e->next = tmp->next;
-                        tmp->next = e;
-                    }
-                }
+                    HYDU_MALLOC(str, char *, PMI_MAXVALLEN, status);
 
-                for (e = list; e->next;) {
-                    if (e->end == e->next->start) {
-                        e->end = e->next->end;
-                        tmp = e->next;
-                        e->next = e->next->next;
-                        HYDU_FREE(tmp);
-                    }
-                    else
-                        e = e->next;
-                }
-
-                str = NULL;
-                for (e = list; e; e = e->next) {
-                    HYDU_MALLOC(run, char *, PMI_MAXVALLEN, status);
-                    if (str) {
-                        if (e->start == e->end)
-                            HYDU_snprintf(run, PMI_MAXVALLEN, "%s,%d", str, e->start);
-                        else
-                            HYDU_snprintf(run, PMI_MAXVALLEN, "%s,%d-%d", str, e->start, e->end);
-                    }
-                    else {
-                        if (e->start == e->end)
-                            HYDU_snprintf(run, PMI_MAXVALLEN, "%d", e->start);
-                        else
-                            HYDU_snprintf(run, PMI_MAXVALLEN, "%d-%d", e->start, e->end);
-                    }
-                    if (str)
-                        HYDU_FREE(str);
-                    str = run;
+                    HYDU_snprintf(str, PMI_MAXVALLEN, "%s,%d", pg_scratch->dead_processes, hdr.pid);
+                } else {
+                    str = current_list;
                 }
                 pg_scratch->dead_processes = str;
             }

http://git.mpich.org/mpich.git/commitdiff/3325b6f7b416647a7c66878a71cac19708096c8a

commit 3325b6f7b416647a7c66878a71cac19708096c8a
Author: Wesley Bland <wbland at anl.gov>
Date:   Wed Mar 19 15:52:55 2014 -0500

    Return MPIX_ERR_PROC_FAILED_PENDING when appropriate
    
    The MPI_Waitall and MPI_Testall functions should return
    MPIX_ERR_PROC_FAILED_PENDING when a process failure prevents the operations
    from completing.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
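
The selection rule can be summarised in a few lines. The sketch below is illustrative only: the SK_* constants are stand-in values (the real ones come from mpi.h) and pending_class_for is a hypothetical helper, not part of MPI_Waitall or MPI_Testall.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in values for illustration only; the real constants are defined
 * in mpi.h and differ from these. */
enum {
    SK_SUCCESS = 0,
    SK_ERR_OTHER = 15,
    SK_ERR_PENDING = 18,
    SK_ERR_PROC_FAILED = 101,
    SK_ERR_PROC_FAILED_PENDING = 102
};

/* Hypothetical helper: given the error class of each completed request,
 * pick the class to report for requests that are still incomplete. */
static int pending_class_for(const int *completed_classes, size_t n)
{
    size_t i;

    assert(completed_classes != NULL || n == 0);

    for (i = 0; i < n; i++) {
        /* One process failure is enough to switch the pending class. */
        if (completed_classes[i] == SK_ERR_PROC_FAILED)
            return SK_ERR_PROC_FAILED_PENDING;
    }
    return SK_ERR_PENDING;
}
```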

diff --git a/src/mpi/pt2pt/testall.c b/src/mpi/pt2pt/testall.c
index 8bd65fe..fd6a41e 100644
--- a/src/mpi/pt2pt/testall.c
+++ b/src/mpi/pt2pt/testall.c
@@ -87,6 +87,7 @@ int MPI_Testall(int count, MPI_Request array_of_requests[], int *flag,
     int n_completed;
     int active_flag;
     int rc;
+    int proc_failure = 0;
     int mpi_errno = MPI_SUCCESS;
     MPIU_CHKLMEM_DECL(1);
     MPID_MPI_STATE_DECL(MPID_STATE_MPI_TESTALL);
@@ -168,10 +169,13 @@ int MPI_Testall(int count, MPI_Request array_of_requests[], int *flag,
 	if (request_ptrs[i] != NULL && MPID_Request_is_complete(request_ptrs[i]))
 	{
 	    n_completed++;
-	    if (MPIR_Request_get_error(request_ptrs[i]) != MPI_SUCCESS)
-	    {
-		mpi_errno = MPI_ERR_IN_STATUS;
-	    }
+            rc = MPIR_Request_get_error(request_ptrs[i]);
+            if (rc != MPI_SUCCESS)
+            {
+                if (MPIX_ERR_PROC_FAILED == MPIR_ERR_GET_CLASS(rc))
+                    proc_failure = 1;
+                mpi_errno = MPI_ERR_IN_STATUS;
+            }
 	}
     }
     
@@ -203,7 +207,10 @@ int MPI_Testall(int count, MPI_Request array_of_requests[], int *flag,
 		{
 		    if (mpi_errno == MPI_ERR_IN_STATUS && array_of_statuses != MPI_STATUSES_IGNORE)
 		    { 
-			array_of_statuses[i].MPI_ERROR = MPI_ERR_PENDING;
+                        if (!proc_failure)
+                            array_of_statuses[i].MPI_ERROR = MPI_ERR_PENDING;
+                        else
+                            array_of_statuses[i].MPI_ERROR = MPIX_ERR_PROC_FAILED_PENDING;
 		    }
 		}
 	    }
diff --git a/src/mpi/pt2pt/waitall.c b/src/mpi/pt2pt/waitall.c
index ccdd329..b3a87fc 100644
--- a/src/mpi/pt2pt/waitall.c
+++ b/src/mpi/pt2pt/waitall.c
@@ -47,6 +47,7 @@ int MPIR_Waitall_impl(int count, MPI_Request array_of_requests[],
     int active_flag;
     int rc;
     int n_greqs;
+    int proc_failure = 0;
     const int ignoring_statuses = (array_of_statuses == MPI_STATUSES_IGNORE);
     int optimize = ignoring_statuses; /* see NOTE-O1 */
     MPIU_CHKLMEM_DECL(1);
@@ -180,6 +181,12 @@ int MPIR_Waitall_impl(int count, MPI_Request array_of_requests[],
         {
             /* req completed with an error */
             mpi_errno = MPI_ERR_IN_STATUS;
+
+            if (!proc_failure) {
+                if (MPIX_ERR_PROC_FAILED == MPIR_ERR_GET_CLASS(rc))
+                    proc_failure = 1;
+            }
+
             if (!ignoring_statuses)
             {
                 /* set the error code for this request */
@@ -197,7 +204,10 @@ int MPIR_Waitall_impl(int count, MPI_Request array_of_requests[],
                         }
                         else
                         {
-                            array_of_statuses[j].MPI_ERROR = MPI_ERR_PENDING;
+                            if (!proc_failure)
+                                array_of_statuses[j].MPI_ERROR = MPI_ERR_PENDING;
+                            else
+                                array_of_statuses[j].MPI_ERROR = MPIX_ERR_PROC_FAILED_PENDING;
                         }
                     }
                 }

http://git.mpich.org/mpich.git/commitdiff/ed98c9834b6b827eaea970616590e8095d0ef418

commit ed98c9834b6b827eaea970616590e8095d0ef418
Author: Wesley Bland <wbland at anl.gov>
Date:   Wed Mar 19 15:39:06 2014 -0500

    Add MPIX_ERR_PROC_FAILED_PENDING
    
    This is a new error code required by the ULFM proposal. This code replaces
    MPI_ERR_PENDING in cases where the failure that would otherwise have caused
    MPI_ERR_PENDING is related to process failure (MPIX_ERR_PROC_FAILED). In that
    case, we return MPIX_ERR_PROC_FAILED_PENDING instead.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/include/mpi.h.in b/src/include/mpi.h.in
index c018671..d1925f2 100644
--- a/src/include/mpi.h.in
+++ b/src/include/mpi.h.in
@@ -797,9 +797,9 @@ typedef int (MPIX_Grequest_wait_function)(int, void **, double, MPI_Status *);
 #define MPI_ERR_UNKNOWN     13      /* Unknown error */
 #define MPI_ERR_INTERN      16      /* Internal error code    */
 
-/* Multiple completion has two special error classes */
-#define MPI_ERR_IN_STATUS   17      /* Look in status for error value */
-#define MPI_ERR_PENDING     18      /* Pending request */
+/* Multiple completion has three special error classes */
+#define MPI_ERR_IN_STATUS           17      /* Look in status for error value */
+#define MPI_ERR_PENDING             18      /* Pending request */
 
 /* New MPI-2 Error classes */
 #define MPI_ERR_ACCESS      20      /* */
@@ -866,6 +866,7 @@ typedef int (MPIX_Grequest_wait_function)(int, void **, double, MPI_Status *);
 #define MPI_T_ERR_PVAR_NO_WRITE     71  /* Pvar can't be written or reset */
 #define MPI_T_ERR_PVAR_NO_ATOMIC    72  /* Pvar can't be R/W atomically */
 
+
 #define MPI_ERR_LASTCODE    0x3fffffff  /* Last valid error code for a 
 					   predefined error class */
 /* WARNING: this is also defined in mpishared.h.  Update both locations */
@@ -877,9 +878,11 @@ typedef int (MPIX_Grequest_wait_function)(int, void **, double, MPI_Status *);
                                   * range. All MPIX error codes will be
                                   * above this value to be ABI compliant. */
 
-#define MPIX_ERR_PROC_FAILED      MPICH_ERR_FIRST_MPIX+1   /* Process failure */
+#define MPIX_ERR_PROC_FAILED          MPICH_ERR_FIRST_MPIX+1 /* Process failure */
+#define MPIX_ERR_PROC_FAILED_PENDING  MPICH_ERR_FIRST_MPIX+2 /* A failure has caused this request
+                                                              * to be pending */
 
-#define MPICH_ERR_LAST_MPIX       MPICH_ERR_FIRST_MPIX+1
+#define MPICH_ERR_LAST_MPIX           MPICH_ERR_FIRST_MPIX+2
 
 
 /* End of MPI's error classes */
diff --git a/src/mpi/errhan/baseerrnames.txt b/src/mpi/errhan/baseerrnames.txt
index 5c7dc93..8f07168 100644
--- a/src/mpi/errhan/baseerrnames.txt
+++ b/src/mpi/errhan/baseerrnames.txt
@@ -36,6 +36,7 @@ MPI_ERR_INTERN      16      **intern
 # Multiple completion has two special error classes 
 MPI_ERR_IN_STATUS   17      **instatus
 MPI_ERR_PENDING     18      **inpending
+MPIX_ERR_PROC_FAILED_PENDING 19 **failure_pending
 # New MPI-2 Error classes 
 MPI_ERR_FILE        27      **file
 MPI_ERR_ACCESS      20      **fileaccess
diff --git a/src/mpi/errhan/errnames.txt b/src/mpi/errhan/errnames.txt
index 2800002..82cf107 100644
--- a/src/mpi/errhan/errnames.txt
+++ b/src/mpi/errhan/errnames.txt
@@ -422,6 +422,7 @@ unexpected messages queued.
 **badcase %d:INTERNAL ERROR: unexpected value in case statement (value=%d)
 **node_root_rank:Unable to get the node root rank
 **proc_failed:Process failed
+**failure_pending:Request pending due to failure
 # Duplicates?
 #**argnull:Invalid null parameter
 #**argnull %s:Invalid null parameter %s

http://git.mpich.org/mpich.git/commitdiff/6ce715477e725d550af675fbd10cc3b2ff0c615c

commit 6ce715477e725d550af675fbd10cc3b2ff0c615c
Author: Wesley Bland <wbland at anl.gov>
Date:   Mon Jul 28 17:04:38 2014 -0500

    Rename error code to MPIX_ERR_PROC_FAILED
    
    Previously, MPICH was using MPIX_ERR_PROC_FAIL_STOP as the generic error
    code for process failures. The ULFM document specifies the error code to be
    MPIX_ERR_PROC_FAILED.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/include/mpi.h.in b/src/include/mpi.h.in
index 393c7d8..c018671 100644
--- a/src/include/mpi.h.in
+++ b/src/include/mpi.h.in
@@ -877,7 +877,7 @@ typedef int (MPIX_Grequest_wait_function)(int, void **, double, MPI_Status *);
                                   * range. All MPIX error codes will be
                                   * above this value to be ABI complaint. */
 
-#define MPIX_ERR_PROC_FAIL_STOP   MPICH_ERR_FIRST_MPIX+1   /* Process failure */
+#define MPIX_ERR_PROC_FAILED      MPICH_ERR_FIRST_MPIX+1   /* Process failure */
 
 #define MPICH_ERR_LAST_MPIX       MPICH_ERR_FIRST_MPIX+1
 
diff --git a/src/mpi/errhan/baseerrnames.txt b/src/mpi/errhan/baseerrnames.txt
index c66c7fe..5c7dc93 100644
--- a/src/mpi/errhan/baseerrnames.txt
+++ b/src/mpi/errhan/baseerrnames.txt
@@ -73,7 +73,7 @@ MPI_ERR_RMA_SYNC    50      **rmasync
 MPI_ERR_SIZE        51      **rmasize
 MPI_ERR_DISP        52      **rmadisp
 MPI_ERR_ASSERT      53      **assert
-MPIX_ERR_PROC_FAIL_STOP 54  **proc_fail_stop
+MPIX_ERR_PROC_FAILED 54     **proc_failed
 MPI_ERR_RMA_RANGE   55      **rmarange
 MPI_ERR_RMA_ATTACH  56      **rmaattach
 MPI_ERR_RMA_SHARED  57      **rmashared
diff --git a/src/mpi/errhan/errnames.txt b/src/mpi/errhan/errnames.txt
index 0cbe441..2800002 100644
--- a/src/mpi/errhan/errnames.txt
+++ b/src/mpi/errhan/errnames.txt
@@ -421,7 +421,7 @@ unexpected messages queued.
 **badcase:INTERNAL ERROR: unexpected value in case statement
 **badcase %d:INTERNAL ERROR: unexpected value in case statement (value=%d)
 **node_root_rank:Unable to get the node root rank
-**proc_fail_stop:A process has failed
+**proc_failed:Process failed
 # Duplicates?
 #**argnull:Invalid null parameter
 #**argnull %s:Invalid null parameter %s
diff --git a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
index a77e9e3..9496048 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/portals4/ptl_init.c
@@ -359,7 +359,7 @@ int vc_terminate(MPIDI_VC_t *vc)
         /* VC is terminated as a result of a fault.  Complete
            outstanding sends with an error and terminate
            connection immediately. */
-        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
         mpi_errno = MPID_nem_ptl_sendq_complete_with_error(vc, req_errno);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
         mpi_errno = MPID_nem_ptl_vc_terminated(vc);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c b/src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c
index 7e8ce8f..3aad34a 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c
@@ -1648,7 +1648,7 @@ fn_exit:
 fn_fail: /* comm related failures jump here */
     {
 
-        MPIU_ERR_SET1(mpi_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", sc_vc->pg_rank);
+        MPIU_ERR_SET1(mpi_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", sc_vc->pg_rank);
         mpi_errno = MPID_nem_tcp_cleanup_on_error(sc_vc, mpi_errno);
         if (mpi_errno) {
             MPIU_ERR_SET(mpi_errno, MPI_ERR_OTHER, "**tcp_cleanup_fail");
@@ -1829,11 +1829,11 @@ int MPID_nem_tcp_connpoll(int in_blocking_poll)
                 
                 MPIU_DBG_MSG(NEM_SOCK_DET, VERBOSE, "error polling fd, closing sc");
                 if (it_sc->vc) {
-                    MPIU_ERR_SET2(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d %s", it_sc->vc->pg_rank, err_str);
+                    MPIU_ERR_SET2(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d %s", it_sc->vc->pg_rank, err_str);
                     mpi_errno = MPID_nem_tcp_cleanup_on_error(it_sc->vc, req_errno);
                     MPIU_ERR_CHKANDJUMP(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**tcp_cleanup_fail");
                 } else {
-                    MPIU_ERR_SET2(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail_conn", "**comm_fail_conn %s %s", CONN_STATE_STR[it_sc->state.cstate], err_str);
+                    MPIU_ERR_SET2(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail_conn", "**comm_fail_conn %s %s", CONN_STATE_STR[it_sc->state.cstate], err_str);
                     mpi_errno = close_cleanup_and_free_sc_plfd(it_sc);
                     MPIU_ERR_CHKANDJUMP(mpi_errno, mpi_errno, MPI_ERR_OTHER, "**tcp_cleanup_fail");
                 }
diff --git a/src/mpid/ch3/channels/nemesis/netmod/tcp/tcp_init.c b/src/mpid/ch3/channels/nemesis/netmod/tcp/tcp_init.c
index 26ca89c..14ba19a 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/tcp/tcp_init.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/tcp/tcp_init.c
@@ -671,7 +671,7 @@ int MPID_nem_tcp_vc_terminate(MPIDI_VC_t *vc)
         /* VC is terminated as a result of a fault.  Complete
            outstanding sends with an error and terminate
            connection immediately. */
-        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
         mpi_errno = MPID_nem_tcp_error_out_send_queue(vc, req_errno);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
         mpi_errno = MPID_nem_tcp_vc_terminated(vc);
diff --git a/src/mpid/ch3/channels/nemesis/netmod/tcp/tcp_send.c b/src/mpid/ch3/channels/nemesis/netmod/tcp/tcp_send.c
index d250ef6..80c503d 100644
--- a/src/mpid/ch3/channels/nemesis/netmod/tcp/tcp_send.c
+++ b/src/mpid/ch3/channels/nemesis/netmod/tcp/tcp_send.c
@@ -97,7 +97,7 @@ int MPID_nem_tcp_send_queued(MPIDI_VC_t *vc, MPIDI_nem_tcp_request_queue_t *send
             int req_errno = MPI_SUCCESS;
 
             MPIU_ERR_SET(req_errno, MPI_ERR_OTHER, "**sock_closed");
-            MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+            MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
             mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
             if (mpi_errno) MPIU_ERR_POP(mpi_errno);
             goto fn_exit; /* this vc is closed now, just bail out */
@@ -112,7 +112,7 @@ int MPID_nem_tcp_send_queued(MPIDI_VC_t *vc, MPIDI_nem_tcp_request_queue_t *send
             } else {
                 int req_errno = MPI_SUCCESS;
                 MPIU_ERR_SET1(req_errno, MPI_ERR_OTHER, "**writev", "**writev %s", MPIU_Strerror(errno));
-                MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                 mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                 if (mpi_errno) MPIU_ERR_POP(mpi_errno);
                 goto fn_exit; /* this vc is closed now, just bail out */
@@ -262,7 +262,7 @@ int MPID_nem_tcp_iStartContigMsg(MPIDI_VC_t *vc, void *hdr, MPIDI_msg_sz_t hdr_s
                     int req_errno = MPI_SUCCESS;
 
                     MPIU_ERR_SET(req_errno, MPI_ERR_OTHER, "**sock_closed");
-                    MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                    MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                     mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
                     goto fn_fail;
@@ -274,7 +274,7 @@ int MPID_nem_tcp_iStartContigMsg(MPIDI_VC_t *vc, void *hdr, MPIDI_msg_sz_t hdr_s
                     else {
                         int req_errno = MPI_SUCCESS;
                         MPIU_ERR_SET1(req_errno, MPI_ERR_OTHER, "**writev", "**writev %s", MPIU_Strerror(errno));
-                        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                         mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
                         goto fn_fail;
@@ -401,7 +401,7 @@ int MPID_nem_tcp_iStartContigMsg_paused(MPIDI_VC_t *vc, void *hdr, MPIDI_msg_sz_
                 int req_errno = MPI_SUCCESS;
 
                 MPIU_ERR_SET(req_errno, MPI_ERR_OTHER, "**sock_closed");
-                MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                 mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                 if (mpi_errno) MPIU_ERR_POP(mpi_errno);
                 goto fn_fail;
@@ -413,7 +413,7 @@ int MPID_nem_tcp_iStartContigMsg_paused(MPIDI_VC_t *vc, void *hdr, MPIDI_msg_sz_
                 else {
                     int req_errno = MPI_SUCCESS;
                     MPIU_ERR_SET1(req_errno, MPI_ERR_OTHER, "**writev", "**writev %s", MPIU_Strerror(errno));
-                    MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                    MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
 
                     mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
@@ -536,7 +536,7 @@ int MPID_nem_tcp_iSendContig(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr, MPID
                     int req_errno = MPI_SUCCESS;
 
                     MPIU_ERR_SET(req_errno, MPI_ERR_OTHER, "**sock_closed");
-                    MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                    MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                     mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
                     goto fn_fail;
@@ -548,7 +548,7 @@ int MPID_nem_tcp_iSendContig(MPIDI_VC_t *vc, MPID_Request *sreq, void *hdr, MPID
                     else {
                         int req_errno = MPI_SUCCESS;
                         MPIU_ERR_SET1(req_errno, MPI_ERR_OTHER, "**writev", "**writev %s", MPIU_Strerror(errno));
-                        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                         mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
                         goto fn_fail;
@@ -695,7 +695,7 @@ int MPID_nem_tcp_SendNoncontig(MPIDI_VC_t *vc, MPID_Request *sreq, void *header,
                     int req_errno = MPI_SUCCESS;
 
                     MPIU_ERR_SET(req_errno, MPI_ERR_OTHER, "**sock_closed");
-                    MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                    MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                     mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
                     goto fn_fail;
@@ -707,7 +707,7 @@ int MPID_nem_tcp_SendNoncontig(MPIDI_VC_t *vc, MPID_Request *sreq, void *header,
                     else {
                         int req_errno = MPI_SUCCESS;
                         MPIU_ERR_SET1(req_errno, MPI_ERR_OTHER, "**writev", "**writev %s", MPIU_Strerror(errno));
-                        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                        MPIU_ERR_SET1(req_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                         mpi_errno = MPID_nem_tcp_cleanup_on_error(vc, req_errno);
                         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
                         goto fn_fail;
diff --git a/src/mpid/ch3/channels/nemesis/src/ch3_isend.c b/src/mpid/ch3/channels/nemesis/src/ch3_isend.c
index da5141a..234fc4d 100644
--- a/src/mpid/ch3/channels/nemesis/src/ch3_isend.c
+++ b/src/mpid/ch3/channels/nemesis/src/ch3_isend.c
@@ -26,7 +26,7 @@ int MPIDI_CH3_iSend (MPIDI_VC_t *vc, MPID_Request *sreq, void * hdr, MPIDI_msg_s
 
     if (vc->state == MPIDI_VC_STATE_MORIBUND) {
         sreq->status.MPI_ERROR = MPI_SUCCESS;
-        MPIU_ERR_SET1(sreq->status.MPI_ERROR, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+        MPIU_ERR_SET1(sreq->status.MPI_ERROR, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
         MPIDI_CH3U_Request_complete(sreq);
         goto fn_fail;
     }
diff --git a/src/mpid/ch3/channels/nemesis/src/ch3_isendv.c b/src/mpid/ch3/channels/nemesis/src/ch3_isendv.c
index bcceb2d..d5fe393 100644
--- a/src/mpid/ch3/channels/nemesis/src/ch3_isendv.c
+++ b/src/mpid/ch3/channels/nemesis/src/ch3_isendv.c
@@ -30,7 +30,7 @@ int MPIDI_CH3_iSendv (MPIDI_VC_t *vc, MPID_Request *sreq, MPID_IOV *iov, int n_i
 
     if (vc->state == MPIDI_VC_STATE_MORIBUND) {
         sreq->status.MPI_ERROR = MPI_SUCCESS;
-        MPIU_ERR_SET1(sreq->status.MPI_ERROR, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+        MPIU_ERR_SET1(sreq->status.MPI_ERROR, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
         MPIDI_CH3U_Request_complete(sreq);
         goto fn_fail;
     }
diff --git a/src/mpid/ch3/channels/nemesis/src/ch3_istartmsg.c b/src/mpid/ch3/channels/nemesis/src/ch3_istartmsg.c
index 4e97a0a..db1df84 100644
--- a/src/mpid/ch3/channels/nemesis/src/ch3_istartmsg.c
+++ b/src/mpid/ch3/channels/nemesis/src/ch3_istartmsg.c
@@ -33,7 +33,7 @@ int MPIDI_CH3_iStartMsg (MPIDI_VC_t *vc, void *hdr, MPIDI_msg_sz_t hdr_sz, MPID_
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_ISTARTMSG);
 
-    MPIU_ERR_CHKANDJUMP1(vc->state == MPIDI_VC_STATE_MORIBUND, mpi_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+    MPIU_ERR_CHKANDJUMP1(vc->state == MPIDI_VC_STATE_MORIBUND, mpi_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
 
     if (vc->ch.iStartContigMsg)
     {
diff --git a/src/mpid/ch3/channels/nemesis/src/ch3_istartmsgv.c b/src/mpid/ch3/channels/nemesis/src/ch3_istartmsgv.c
index 2d26c18..02d7cb3 100644
--- a/src/mpid/ch3/channels/nemesis/src/ch3_istartmsgv.c
+++ b/src/mpid/ch3/channels/nemesis/src/ch3_istartmsgv.c
@@ -41,7 +41,7 @@ int MPIDI_CH3_iStartMsgv (MPIDI_VC_t *vc, MPID_IOV *iov, int n_iov, MPID_Request
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_ISTARTMSGV);
 
-    MPIU_ERR_CHKANDJUMP1(vc->state == MPIDI_VC_STATE_MORIBUND, mpi_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+    MPIU_ERR_CHKANDJUMP1(vc->state == MPIDI_VC_STATE_MORIBUND, mpi_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
 
     if (vc->ch.iStartContigMsg)
     {
diff --git a/src/mpid/ch3/channels/nemesis/src/ch3_progress.c b/src/mpid/ch3/channels/nemesis/src/ch3_progress.c
index 2b58bfe..7232361 100644
--- a/src/mpid/ch3/channels/nemesis/src/ch3_progress.c
+++ b/src/mpid/ch3/channels/nemesis/src/ch3_progress.c
@@ -1023,7 +1023,7 @@ int MPIDI_CH3I_Complete_sendq_with_error(MPIDI_VC_t * vc)
                 MPIDI_CH3I_shm_sendq.tail = prev;
 
             req->status.MPI_ERROR = MPI_SUCCESS;
-            MPIU_ERR_SET1(req->status.MPI_ERROR, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+            MPIU_ERR_SET1(req->status.MPI_ERROR, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
             
             MPID_Request_release(req); /* ref count was incremented when added to queue */
             MPIDI_CH3U_Request_complete(req);
diff --git a/src/mpid/ch3/src/ch3u_recvq.c b/src/mpid/ch3/src/ch3u_recvq.c
index f4f9000..8ec9057 100644
--- a/src/mpid/ch3/src/ch3u_recvq.c
+++ b/src/mpid/ch3/src/ch3u_recvq.c
@@ -627,13 +627,13 @@ MPID_Request * MPIDI_CH3U_Recvq_FDU_or_AEP(int source, int tag,
             MPIDI_VC_t *vc;
             MPIDI_Comm_get_vc(comm, source, &vc);
             if (vc->state == MPIDI_VC_STATE_MORIBUND) {
-                MPIU_ERR_SET1(mpi_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", vc->pg_rank);
+                MPIU_ERR_SET1(mpi_errno, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", vc->pg_rank);
                 rreq->status.MPI_ERROR = mpi_errno;
                 MPIDI_CH3U_Request_complete(rreq);
                 goto lock_exit;
             }
         } else if (!MPIDI_CH3I_Comm_AS_enabled(comm)) {
-            MPIU_ERR_SET(mpi_errno, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail");
+            MPIU_ERR_SET(mpi_errno, MPIX_ERR_PROC_FAILED, "**comm_fail");
             rreq->status.MPI_ERROR = mpi_errno;
             MPIDI_CH3U_Request_complete(rreq);
             goto lock_exit;
@@ -860,9 +860,9 @@ static inline void dequeue_and_set_error(MPID_Request **req,  MPID_Request *prev
 
     if (*error == MPI_SUCCESS) {
         if (rank == MPI_PROC_NULL)
-            MPIU_ERR_SET(*error, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail");
+            MPIU_ERR_SET(*error, MPIX_ERR_PROC_FAILED, "**comm_fail");
         else
-            MPIU_ERR_SET1(*error, MPIX_ERR_PROC_FAIL_STOP, "**comm_fail", "**comm_fail %d", rank);
+            MPIU_ERR_SET1(*error, MPIX_ERR_PROC_FAILED, "**comm_fail", "**comm_fail %d", rank);
     }
     
     /* remove from queue */
diff --git a/test/mpi/ft/barrier.c b/test/mpi/ft/barrier.c
index 2072698..6e33f11 100644
--- a/test/mpi/ft/barrier.c
+++ b/test/mpi/ft/barrier.c
@@ -37,11 +37,11 @@ int main(int argc, char **argv)
     if (rank == 0) {
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
         MPI_Error_class(err, &errclass);
-        if (errclass == MPIX_ERR_PROC_FAIL_STOP) {
+        if (errclass == MPIX_ERR_PROC_FAILED) {
             printf(" No Errors\n");
             fflush(stdout);
         } else {
-            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         }
 #else
         if (err) {
diff --git a/test/mpi/ft/bcast.c b/test/mpi/ft/bcast.c
index 0bdae30..06cfeff 100644
--- a/test/mpi/ft/bcast.c
+++ b/test/mpi/ft/bcast.c
@@ -45,8 +45,8 @@ int main(int argc, char **argv)
 
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
     MPI_Error_class(rc, &errclass);
-    if ((rc) && (errclass != MPIX_ERR_PROC_FAIL_STOP)) {
-        fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+    if ((rc) && (errclass != MPIX_ERR_PROC_FAILED)) {
+        fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         errs++;
     }
 #endif
@@ -60,8 +60,8 @@ int main(int argc, char **argv)
 
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
     MPI_Error_class(rc, &errclass);
-    if ((rc) && (errclass != MPIX_ERR_PROC_FAIL_STOP)) {
-        fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+    if ((rc) && (errclass != MPIX_ERR_PROC_FAILED)) {
+        fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         errs++;
     }
 #endif
diff --git a/test/mpi/ft/gather.c b/test/mpi/ft/gather.c
index 6ac2a6e..997d58b 100644
--- a/test/mpi/ft/gather.c
+++ b/test/mpi/ft/gather.c
@@ -40,11 +40,11 @@ int main(int argc, char **argv)
     if (rank == 0) {
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
         MPI_Error_class(err, &errclass);
-        if (errclass == MPIX_ERR_PROC_FAIL_STOP) {
+        if (errclass == MPIX_ERR_PROC_FAILED) {
             printf(" No Errors\n");
             fflush(stdout);
         } else {
-            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         }
 #else
         if (err) {
diff --git a/test/mpi/ft/irecvdead.c b/test/mpi/ft/irecvdead.c
index bf20dc1..f97c6f5 100644
--- a/test/mpi/ft/irecvdead.c
+++ b/test/mpi/ft/irecvdead.c
@@ -11,7 +11,7 @@
 /*
  * This test attempts MPI_Irecv with the source being a dead process. It should fail
  * and return an error at completion. If we are testing sufficiently new MPICH, we
- * look for the MPIX_ERR_PROC_FAIL_STOP error code. These should be converted to look
+ * look for the MPIX_ERR_PROC_FAILED error code. These should be converted to look
  * for the standarized error code once it is finalized.
  */
 int main(int argc, char **argv)
@@ -41,11 +41,11 @@ int main(int argc, char **argv)
         err = MPI_Wait(&request, MPI_STATUS_IGNORE);
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
         MPI_Error_class(err, &errclass);
-        if (errclass == MPIX_ERR_PROC_FAIL_STOP) {
+        if (errclass == MPIX_ERR_PROC_FAILED) {
             printf(" No Errors\n");
             fflush(stdout);
         } else {
-            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         }
 #else
         if (err) {
diff --git a/test/mpi/ft/isenddead.c b/test/mpi/ft/isenddead.c
index 3440b57..a915c3e 100644
--- a/test/mpi/ft/isenddead.c
+++ b/test/mpi/ft/isenddead.c
@@ -40,8 +40,8 @@ int main(int argc, char **argv)
         err = MPI_Wait(&request, MPI_STATUS_IGNORE);
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
         MPI_Error_class(err, &errclass);
-        if ((err) && (errclass != MPIX_ERR_PROC_FAIL_STOP)) {
-            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+        if ((err) && (errclass != MPIX_ERR_PROC_FAILED)) {
+            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         } else {
             printf(" No Errors\n");
             fflush(stdout);
diff --git a/test/mpi/ft/recvdead.c b/test/mpi/ft/recvdead.c
index 5b194ca..931afdf 100644
--- a/test/mpi/ft/recvdead.c
+++ b/test/mpi/ft/recvdead.c
@@ -11,7 +11,7 @@
 /*
  * This test attempts MPI_Recv with the source being a dead process. It should fail
  * and return an error. If we are testing sufficiently new MPICH, we look for the
- * MPIX_ERR_PROC_FAIL_STOP error code. These should be converted to look for the
+ * MPIX_ERR_PROC_FAILED error code. These should be converted to look for the
  * standarized error code once it is finalized.
  */
 int main(int argc, char **argv)
@@ -36,11 +36,11 @@ int main(int argc, char **argv)
         err = MPI_Recv(buf, 1, MPI_CHAR, 1, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
         MPI_Error_class(err, &errclass);
-        if (errclass == MPIX_ERR_PROC_FAIL_STOP) {
+        if (errclass == MPIX_ERR_PROC_FAILED) {
             printf(" No Errors\n");
             fflush(stdout);
         } else {
-            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         }
 #else
         if (err) {
diff --git a/test/mpi/ft/reduce.c b/test/mpi/ft/reduce.c
index 30664b5..993fe07 100644
--- a/test/mpi/ft/reduce.c
+++ b/test/mpi/ft/reduce.c
@@ -38,11 +38,11 @@ int main(int argc, char **argv)
     if (rank == 0) {
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
         MPI_Error_class(err, &errclass);
-        if (errclass == MPIX_ERR_PROC_FAIL_STOP) {
+        if (errclass == MPIX_ERR_PROC_FAILED) {
             printf(" No Errors\n");
             fflush(stdout);
         } else {
-            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         }
 #else
         if (err) {
diff --git a/test/mpi/ft/scatter.c b/test/mpi/ft/scatter.c
index 93f7b86..d47dfc7 100644
--- a/test/mpi/ft/scatter.c
+++ b/test/mpi/ft/scatter.c
@@ -49,8 +49,8 @@ int main(int argc, char **argv)
 
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
     MPI_Error_class(rc, &errclass);
-    if ((rc) && (errclass != MPIX_ERR_PROC_FAIL_STOP)) {
-        fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+    if ((rc) && (errclass != MPIX_ERR_PROC_FAILED)) {
+        fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         errs++;
     }
 #endif
@@ -70,8 +70,8 @@ int main(int argc, char **argv)
 
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
     MPI_Error_class(rc, &errclass);
-    if ((rc) && (errclass != MPIX_ERR_PROC_FAIL_STOP)) {
-        fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+    if ((rc) && (errclass != MPIX_ERR_PROC_FAILED)) {
+        fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         errs++;
     }
 #endif
diff --git a/test/mpi/ft/senddead.c b/test/mpi/ft/senddead.c
index 6df36f1..9524789 100644
--- a/test/mpi/ft/senddead.c
+++ b/test/mpi/ft/senddead.c
@@ -35,8 +35,8 @@ int main(int argc, char **argv)
         err = MPI_Send(buf, 100000, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
 #if defined (MPICH) && (MPICH_NUMVERSION >= 30100102)
         MPI_Error_class(err, &errclass);
-        if ((err) && (errclass != MPIX_ERR_PROC_FAIL_STOP)) {
-            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAIL_STOP\n", errclass);
+        if ((err) && (errclass != MPIX_ERR_PROC_FAILED)) {
+            fprintf(stderr, "Wrong error code (%d) returned. Expected MPIX_ERR_PROC_FAILED\n", errclass);
         } else {
             printf(" No Errors\n");
             fflush(stdout);

http://git.mpich.org/mpich.git/commitdiff/c83eddd9138b21558a618457910583cf7c1ba321

commit c83eddd9138b21558a618457910583cf7c1ba321
Author: Wesley Bland <wbland at anl.gov>
Date:   Mon Jul 28 11:27:19 2014 -0500

    Destroy request object before setting it to NULL.
    
    When an isendv fails in MPIDI_CH3_EagerSyncNoncontigSend, the request is set
    to NULL before it is returned to the caller. However, the request is not
    allocated inside this function, so if we pass back NULL, we lose the handle
    on the request. If we're going to return NULL, we need to make sure the
    request is destroyed first.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/mpid/ch3/src/ch3u_eagersync.c b/src/mpid/ch3/src/ch3u_eagersync.c
index a85dc22..f88501c 100644
--- a/src/mpid/ch3/src/ch3u_eagersync.c
+++ b/src/mpid/ch3/src/ch3u_eagersync.c
@@ -79,7 +79,10 @@ int MPIDI_CH3_EagerSyncNoncontigSend( MPID_Request **sreq_p,
 	/* --BEGIN ERROR HANDLING-- */
 	if (mpi_errno != MPI_SUCCESS)
 	{
-	    MPID_Request_release(sreq);
+        /* Make sure to destroy the request before setting the pointer to
+         * NULL, otherwise we lose the handle on the request */
+        MPIU_Object_set_ref(sreq, 0);
+        MPIDI_CH3_Request_destroy(sreq);
 	    *sreq_p = NULL;
             MPIU_ERR_SETANDJUMP(mpi_errno, MPI_ERR_OTHER, "**ch3|eagermsg");
 	}
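
The ownership problem described in the commit message above can be sketched in
isolation. This is a minimal illustration, not MPICH code: demo_request,
demo_request_create, demo_request_destroy and demo_send are hypothetical names,
and the initial reference count of 2 is an assumption made so that a single
release would not free the object.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for a reference-counted request object. */
typedef struct demo_request {
    int ref_count;
} demo_request;

/* Number of request objects that have been created but not yet freed. */
static int demo_live_requests = 0;

/* The caller allocates the request (assumed here to start with two refs). */
static demo_request *demo_request_create(void)
{
    demo_request *req = malloc(sizeof *req);
    if (req != NULL) {
        req->ref_count = 2;
        demo_live_requests++;
    }
    return req;
}

/* Frees the object; only legal once no references remain. */
static void demo_request_destroy(demo_request *req)
{
    assert(req->ref_count == 0);
    free(req);
    demo_live_requests--;
}

/* Callee that receives a request it did not allocate. On failure it
 * drops all references and destroys the object BEFORE clearing the
 * caller's pointer, so the handle is not lost while the object lives. */
static int demo_send(demo_request **req_p, int simulate_failure)
{
    demo_request *req = *req_p;

    if (simulate_failure) {
        req->ref_count = 0;          /* force the count to zero */
        demo_request_destroy(req);   /* free the object itself */
        *req_p = NULL;               /* now it is safe to hand back NULL */
        return -1;
    }
    return 0;
}
```

Under these assumptions, a failed demo_send leaves the caller with a NULL
pointer and no live object. Decrementing the count once and then clearing the
pointer would instead leave one reference outstanding with nothing left to
release it.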

http://git.mpich.org/mpich.git/commitdiff/b68657dcaff19cd0a164f75f31eace6ef64d324b

commit b68657dcaff19cd0a164f75f31eace6ef64d324b
Author: Wesley Bland <wbland at anl.gov>
Date:   Wed Jul 30 09:54:11 2014 -0500

    Introduce MPICH_ERR_LAST_MPIX
    
    There were a few places in MPICH where the error class was being checked
    against MPICH_ERR_LAST_CLASS and being flagged as invalid if it was too large.
    This is incorrect now that we have a new space for MPIX error codes. Add
    MPICH_ERR_LAST_MPIX as a way of keeping track of what the actual last valid
    error class is.
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/src/include/mpi.h.in b/src/include/mpi.h.in
index c12e38c..393c7d8 100644
--- a/src/include/mpi.h.in
+++ b/src/include/mpi.h.in
@@ -877,7 +877,10 @@ typedef int (MPIX_Grequest_wait_function)(int, void **, double, MPI_Status *);
                                   * range. All MPIX error codes will be
                                   * above this value to be ABI complaint. */
 
-#define MPIX_ERR_PROC_FAIL_STOP MPICH_ERR_FIRST_MPIX+1   /* Process failure */
+#define MPIX_ERR_PROC_FAIL_STOP   MPICH_ERR_FIRST_MPIX+1   /* Process failure */
+
+#define MPICH_ERR_LAST_MPIX       MPICH_ERR_FIRST_MPIX+1
+
 
 /* End of MPI's error classes */
 
diff --git a/src/mpi/errhan/errutil.c b/src/mpi/errhan/errutil.c
index 84555e3..633697d 100644
--- a/src/mpi/errhan/errutil.c
+++ b/src/mpi/errhan/errutil.c
@@ -473,7 +473,7 @@ static int checkValidErrcode( int error_class, const char fcname[],
     int errcode = *errcode_p;
     int rc = 0;
 
-    if (error_class > MPICH_ERR_LAST_CLASS)
+    if (error_class > MPICH_ERR_LAST_MPIX)
     {
 	/* --BEGIN ERROR HANDLING-- */
 	if (errcode & ~ERROR_CLASS_MASK)
@@ -525,7 +525,7 @@ int MPIR_Err_combine_codes(int error1, int error2)
 
     error2_class = MPIR_ERR_GET_CLASS(error2_code);
     if (MPIR_ERR_GET_CLASS(error2_class) < MPI_SUCCESS ||
-	MPIR_ERR_GET_CLASS(error2_class) > MPICH_ERR_LAST_CLASS)
+	MPIR_ERR_GET_CLASS(error2_class) > MPICH_ERR_LAST_MPIX)
     {
 	error2_class = MPI_ERR_OTHER;
     }
@@ -908,7 +908,7 @@ int MPIR_Err_create_code_valist( int lastcode, int fatal, const char fcname[],
     if (error_class == MPI_ERR_OTHER)
     {
         if (MPIR_ERR_GET_CLASS(lastcode) > MPI_SUCCESS && 
-	    MPIR_ERR_GET_CLASS(lastcode) <= MPICH_ERR_LAST_CLASS)
+	    MPIR_ERR_GET_CLASS(lastcode) <= MPICH_ERR_LAST_MPIX)
 	{
 	    /* If the last class is more specific (and is valid), then pass it 
 	       through */
@@ -1274,7 +1274,7 @@ static void MPIR_Err_print_stack_string(int errcode, char *str, int maxlen )
 
 	error_class = ERROR_GET_CLASS(errcode);
 	
-	if (error_class <= MPICH_ERR_LAST_CLASS)
+	if (error_class <= MPICH_ERR_LAST_MPIX)
 	{
 	    MPIU_Snprintf(str, maxlen, "(unknown)(): %s\n", 
 			  get_class_msg(ERROR_GET_CLASS(errcode)));
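
The effect of switching the upper bound from MPICH_ERR_LAST_CLASS to
MPICH_ERR_LAST_MPIX can be sketched as follows. The DEMO_* values are
placeholders picked for illustration only (the real values are defined in
mpi.h.in), and demo_valid_old / demo_valid_new are hypothetical helpers that
mirror the comparisons changed in the diff above.

```c
#include <assert.h>

/* Placeholder values for illustration; NOT the values used by MPICH. */
#define DEMO_ERR_LAST_CLASS   53
#define DEMO_ERR_FIRST_MPIX   100
#define DEMO_ERR_PROC_FAILED  (DEMO_ERR_FIRST_MPIX + 1)
#define DEMO_ERR_LAST_MPIX    (DEMO_ERR_FIRST_MPIX + 1)

/* The extension classes sit above the standard range by construction. */
static_assert(DEMO_ERR_PROC_FAILED > DEMO_ERR_LAST_CLASS,
              "extension classes must lie above the standard classes");

/* Old check: any class above the last standard class is rejected. */
static int demo_valid_old(int error_class)
{
    return error_class >= 0 && error_class <= DEMO_ERR_LAST_CLASS;
}

/* New check: the upper bound is the last class in the extension range. */
static int demo_valid_new(int error_class)
{
    return error_class >= 0 && error_class <= DEMO_ERR_LAST_MPIX;
}
```

With these placeholder values, the old check rejects DEMO_ERR_PROC_FAILED
while the new one accepts it. Note that a single upper-bound comparison also
accepts any unused values between the two ranges. The macros here are
parenthesized so they expand safely inside larger expressions.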

-----------------------------------------------------------------------

Summary of changes:
 src/include/mpi.h.in                               |   27 +++-
 src/include/mpiimpl.h                              |   98 ++++++++++-
 src/mpi/attr/attr_delete.c                         |    2 +-
 src/mpi/attr/attr_get.c                            |    2 +-
 src/mpi/attr/attr_put.c                            |    2 +-
 src/mpi/attr/comm_delete_attr.c                    |    2 +-
 src/mpi/attr/comm_get_attr.c                       |    2 +-
 src/mpi/attr/comm_set_attr.c                       |    4 +-
 src/mpi/coll/allgather.c                           |    2 +-
 src/mpi/coll/allgatherv.c                          |    2 +-
 src/mpi/coll/allred_group.c                        |   39 ++--
 src/mpi/coll/allreduce.c                           |    2 +-
 src/mpi/coll/alltoall.c                            |    2 +-
 src/mpi/coll/alltoallv.c                           |    2 +-
 src/mpi/coll/alltoallw.c                           |    2 +-
 src/mpi/coll/barrier.c                             |    2 +-
 src/mpi/coll/bcast.c                               |    2 +-
 src/mpi/coll/exscan.c                              |    2 +-
 src/mpi/coll/gather.c                              |    2 +-
 src/mpi/coll/gatherv.c                             |    2 +-
 src/mpi/coll/helper_fns.c                          |    8 +-
 src/mpi/coll/iallgather.c                          |    2 +-
 src/mpi/coll/iallgatherv.c                         |    2 +-
 src/mpi/coll/iallreduce.c                          |    2 +-
 src/mpi/coll/ialltoall.c                           |    2 +-
 src/mpi/coll/ialltoallv.c                          |    2 +-
 src/mpi/coll/ialltoallw.c                          |    2 +-
 src/mpi/coll/ibarrier.c                            |    2 +-
 src/mpi/coll/ibcast.c                              |    2 +-
 src/mpi/coll/iexscan.c                             |    2 +-
 src/mpi/coll/igather.c                             |    2 +-
 src/mpi/coll/igatherv.c                            |    2 +-
 src/mpi/coll/ired_scat.c                           |    2 +-
 src/mpi/coll/ired_scat_block.c                     |    2 +-
 src/mpi/coll/ireduce.c                             |    2 +-
 src/mpi/coll/iscan.c                               |    2 +-
 src/mpi/coll/iscatter.c                            |    2 +-
 src/mpi/coll/iscatterv.c                           |    2 +-
 src/mpi/coll/red_scat.c                            |    2 +-
 src/mpi/coll/red_scat_block.c                      |    2 +-
 src/mpi/coll/reduce.c                              |    2 +-
 src/mpi/coll/scan.c                                |    2 +-
 src/mpi/coll/scatter.c                             |    2 +-
 src/mpi/coll/scatterv.c                            |    2 +-
 src/mpi/comm/Makefile.mk                           |    7 +-
 src/mpi/comm/comm_agree.c                          |  188 ++++++++++++++++++++
 src/mpi/comm/comm_compare.c                        |    4 +-
 src/mpi/comm/comm_create.c                         |    2 +-
 src/mpi/comm/comm_create_group.c                   |    2 +-
 src/mpi/comm/comm_dup.c                            |    2 +-
 src/mpi/comm/comm_dup_with_info.c                  |    5 +-
 src/mpi/comm/comm_failure_ack.c                    |  114 ++++++++++++
 src/mpi/comm/comm_failure_get_acked.c              |  118 ++++++++++++
 src/mpi/comm/comm_free.c                           |    2 +-
 src/mpi/comm/comm_get_info.c                       |    5 +-
 src/mpi/comm/comm_get_name.c                       |    2 +-
 src/mpi/comm/comm_group.c                          |    4 +-
 src/mpi/comm/comm_idup.c                           |    2 +-
 src/mpi/comm/comm_rank.c                           |    4 +-
 src/mpi/comm/comm_remote_group.c                   |    2 +-
 src/mpi/comm/comm_remote_size.c                    |    2 +-
 src/mpi/comm/comm_revoke.c                         |  112 ++++++++++++
 src/mpi/comm/comm_set_info.c                       |    5 +-
 src/mpi/comm/comm_set_name.c                       |    2 +-
 src/mpi/comm/comm_shrink.c                         |  180 +++++++++++++++++++
 src/mpi/comm/comm_size.c                           |    2 +-
 src/mpi/comm/comm_split.c                          |    2 +-
 src/mpi/comm/comm_split_type.c                     |    2 +-
 src/mpi/comm/comm_test_inter.c                     |    2 +-
 src/mpi/comm/commutil.c                            |    3 +
 src/mpi/comm/intercomm_create.c                    |    4 +-
 src/mpi/comm/intercomm_merge.c                     |    2 +-
 src/mpi/datatype/pack.c                            |    2 +-
 src/mpi/datatype/pack_size.c                       |    2 +-
 src/mpi/datatype/unpack.c                          |    2 +-
 src/mpi/errhan/baseerrnames.txt                    |    4 +-
 src/mpi/errhan/comm_call_errhandler.c              |    2 +-
 src/mpi/errhan/comm_get_errhandler.c               |    2 +-
 src/mpi/errhan/comm_set_errhandler.c               |    2 +-
 src/mpi/errhan/errhandler_get.c                    |    2 +-
 src/mpi/errhan/errhandler_set.c                    |    2 +-
 src/mpi/errhan/errnames.txt                        |   14 ++-
 src/mpi/errhan/errutil.c                           |    8 +-
 src/mpi/init/abort.c                               |    2 +-
 src/mpi/pt2pt/bsend.c                              |    2 +-
 src/mpi/pt2pt/bsend_init.c                         |    2 +-
 src/mpi/pt2pt/ibsend.c                             |    2 +-
 src/mpi/pt2pt/improbe.c                            |    2 +-
 src/mpi/pt2pt/iprobe.c                             |    2 +-
 src/mpi/pt2pt/irecv.c                              |    2 +-
 src/mpi/pt2pt/irsend.c                             |    2 +-
 src/mpi/pt2pt/isend.c                              |    2 +-
 src/mpi/pt2pt/issend.c                             |    2 +-
 src/mpi/pt2pt/mprobe.c                             |    2 +-
 src/mpi/pt2pt/probe.c                              |    2 +-
 src/mpi/pt2pt/recv.c                               |    2 +-
 src/mpi/pt2pt/recv_init.c                          |    2 +-
 src/mpi/pt2pt/rsend.c                              |    2 +-
 src/mpi/pt2pt/rsend_init.c                         |    2 +-
 src/mpi/pt2pt/send.c                               |    2 +-
 src/mpi/pt2pt/send_init.c                          |    2 +-
 src/mpi/pt2pt/sendrecv.c                           |    2 +-
 src/mpi/pt2pt/sendrecv_rep.c                       |    2 +-
 src/mpi/pt2pt/ssend.c                              |    2 +-
 src/mpi/pt2pt/ssend_init.c                         |    2 +-
 src/mpi/pt2pt/testall.c                            |   17 ++-
 src/mpi/pt2pt/waitall.c                            |   12 ++-
 src/mpi/rma/win_allocate.c                         |    2 +-
 src/mpi/rma/win_allocate_shared.c                  |    2 +-
 src/mpi/rma/win_create.c                           |    2 +-
 src/mpi/rma/win_create_dynamic.c                   |    2 +-
 src/mpi/spawn/comm_accept.c                        |    2 +-
 src/mpi/spawn/comm_connect.c                       |    2 +-
 src/mpi/spawn/comm_disconnect.c                    |    2 +-
 src/mpi/spawn/comm_spawn.c                         |    2 +-
 src/mpi/spawn/comm_spawn_multiple.c                |    2 +-
 src/mpi/topo/cart_coords.c                         |    2 +-
 src/mpi/topo/cart_create.c                         |    2 +-
 src/mpi/topo/cart_get.c                            |    2 +-
 src/mpi/topo/cart_map.c                            |    2 +-
 src/mpi/topo/cart_rank.c                           |    2 +-
 src/mpi/topo/cart_shift.c                          |    2 +-
 src/mpi/topo/cart_sub.c                            |    2 +-
 src/mpi/topo/cartdim_get.c                         |    2 +-
 src/mpi/topo/dist_gr_create.c                      |    2 +-
 src/mpi/topo/dist_gr_create_adj.c                  |    2 +-
 src/mpi/topo/dist_gr_neighb_count.c                |    2 +-
 src/mpi/topo/graph_get.c                           |    2 +-
 src/mpi/topo/graph_map.c                           |    2 +-
 src/mpi/topo/graph_nbr.c                           |    2 +-
 src/mpi/topo/graphcreate.c                         |    2 +-
 src/mpi/topo/graphdimsget.c                        |    2 +-
 src/mpi/topo/graphnbrcnt.c                         |    2 +-
 src/mpi/topo/inhb_allgather.c                      |    2 +-
 src/mpi/topo/inhb_allgatherv.c                     |    2 +-
 src/mpi/topo/inhb_alltoall.c                       |    2 +-
 src/mpi/topo/inhb_alltoallv.c                      |    2 +-
 src/mpi/topo/inhb_alltoallw.c                      |    2 +-
 src/mpi/topo/nhb_allgather.c                       |    2 +-
 src/mpi/topo/nhb_allgatherv.c                      |    2 +-
 src/mpi/topo/nhb_alltoall.c                        |    2 +-
 src/mpi/topo/nhb_alltoallv.c                       |    2 +-
 src/mpi/topo/nhb_alltoallw.c                       |    2 +-
 src/mpi/topo/topo_test.c                           |    2 +-
 .../channels/nemesis/netmod/portals4/ptl_init.c    |    2 +-
 src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c  |    6 +-
 .../ch3/channels/nemesis/netmod/tcp/tcp_init.c     |    2 +-
 .../ch3/channels/nemesis/netmod/tcp/tcp_send.c     |   20 +-
 src/mpid/ch3/channels/nemesis/src/ch3_isend.c      |    2 +-
 src/mpid/ch3/channels/nemesis/src/ch3_isendv.c     |    3 +-
 src/mpid/ch3/channels/nemesis/src/ch3_istartmsg.c  |    2 +-
 src/mpid/ch3/channels/nemesis/src/ch3_istartmsgv.c |    2 +-
 src/mpid/ch3/channels/nemesis/src/ch3_progress.c   |    2 +-
 src/mpid/ch3/include/mpidimpl.h                    |   11 +-
 src/mpid/ch3/include/mpidpkt.h                     |    9 +
 src/mpid/ch3/include/mpidpre.h                     |    5 +-
 src/mpid/ch3/src/Makefile.mk                       |    5 +
 src/mpid/ch3/src/ch3u_comm.c                       |   24 ++-
 src/mpid/ch3/src/ch3u_eagersync.c                  |    5 +-
 src/mpid/ch3/src/ch3u_handle_connection.c          |  154 ++++++++++------
 src/mpid/ch3/src/ch3u_handle_recv_pkt.c            |    4 +
 src/mpid/ch3/src/ch3u_handle_revoke_pkt.c          |   38 ++++
 src/mpid/ch3/src/ch3u_recvq.c                      |  143 +++++++++++++--
 src/mpid/ch3/src/mpid_comm_agree.c                 |  119 ++++++++++++
 src/mpid/ch3/src/mpid_comm_disconnect.c            |    7 +-
 src/mpid/ch3/src/mpid_comm_failure_ack.c           |  145 +++++++++++++++
 src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c  |  156 ++++++++++++++++
 src/mpid/ch3/src/mpid_comm_revoke.c                |  108 +++++++++++
 src/mpid/ch3/src/mpid_comm_spawn_multiple.c        |    7 +
 src/mpid/ch3/src/mpid_finalize.c                   |    2 +
 src/mpid/ch3/src/mpid_improbe.c                    |    5 +
 src/mpid/ch3/src/mpid_init.c                       |   14 ++-
 src/mpid/ch3/src/mpid_iprobe.c                     |    7 +
 src/mpid/ch3/src/mpid_irecv.c                      |   10 +
 src/mpid/ch3/src/mpid_irsend.c                     |    8 +
 src/mpid/ch3/src/mpid_isend.c                      |    9 +
 src/mpid/ch3/src/mpid_issend.c                     |    8 +
 src/mpid/ch3/src/mpid_mprobe.c                     |    5 +
 src/mpid/ch3/src/mpid_probe.c                      |    7 +
 src/mpid/ch3/src/mpid_recv.c                       |    7 +
 src/mpid/ch3/src/mpid_rma.c                        |    5 +
 src/mpid/ch3/src/mpid_rsend.c                      |    8 +
 src/mpid/ch3/src/mpid_send.c                       |    7 +
 src/mpid/ch3/src/mpid_ssend.c                      |    8 +
 src/mpid/pamid/src/misc/mpid_unimpl.c              |   18 ++
 src/pm/hydra/pm/pmiserv/pmiserv_cb.c               |  104 +++---------
 src/util/procmap/local_proc.c                      |    4 +-
 test/mpi/ft/Makefile.am                            |    2 +-
 test/mpi/ft/agree.c                                |   68 +++++++
 test/mpi/ft/anysource.c                            |   78 ++++++++
 test/mpi/ft/barrier.c                              |    4 +-
 test/mpi/ft/bcast.c                                |    8 +-
 test/mpi/ft/failure_ack.c                          |  117 ++++++++++++
 test/mpi/ft/gather.c                               |    4 +-
 test/mpi/ft/irecvdead.c                            |    6 +-
 test/mpi/ft/isenddead.c                            |    4 +-
 test/mpi/ft/recvdead.c                             |    6 +-
 test/mpi/ft/reduce.c                               |    4 +-
 test/mpi/ft/revoke_nofail.c                        |   66 +++++++
 test/mpi/ft/scatter.c                              |    8 +-
 test/mpi/ft/senddead.c                             |    4 +-
 test/mpi/ft/shrink.c                               |   62 +++++++
 test/mpi/ft/testlist                               |    4 +
 test/mpid/ch3/failed_bitmask.c                     |   59 ++++++
 204 files changed, 2546 insertions(+), 399 deletions(-)
 create mode 100644 src/mpi/comm/comm_agree.c
 create mode 100644 src/mpi/comm/comm_failure_ack.c
 create mode 100644 src/mpi/comm/comm_failure_get_acked.c
 create mode 100644 src/mpi/comm/comm_revoke.c
 create mode 100644 src/mpi/comm/comm_shrink.c
 create mode 100644 src/mpid/ch3/src/ch3u_handle_revoke_pkt.c
 create mode 100644 src/mpid/ch3/src/mpid_comm_agree.c
 create mode 100644 src/mpid/ch3/src/mpid_comm_failure_ack.c
 create mode 100644 src/mpid/ch3/src/mpid_comm_get_all_failed_procs.c
 create mode 100644 src/mpid/ch3/src/mpid_comm_revoke.c
 create mode 100644 test/mpi/ft/agree.c
 create mode 100644 test/mpi/ft/anysource.c
 create mode 100644 test/mpi/ft/failure_ack.c
 create mode 100644 test/mpi/ft/revoke_nofail.c
 create mode 100644 test/mpi/ft/shrink.c
 create mode 100644 test/mpid/ch3/failed_bitmask.c


hooks/post-receive
-- 
MPICH primary repository

