[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.1rc2-13-ga3e8305

mysql vizuser noreply at mpich.org
Wed Nov 27 14:26:59 CST 2013


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".

The branch, master has been updated
       via  a3e830570a6e41dc9d49e2139fa33fef604afc97 (commit)
       via  86adc1b1f98641139c7e0e5b2387d27db0cd88a9 (commit)
      from  0dcb61440ebcec65bc460c22c986b8d26e52a7ce (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/a3e830570a6e41dc9d49e2139fa33fef604afc97

commit a3e830570a6e41dc9d49e2139fa33fef604afc97
Author: James Dinan <james.dinan at intel.com>
Date:   Wed Nov 20 12:53:22 2013 -0700

    Update comm_create to use sparse ctx id allocation
    
    Update MPI_Comm_create to use sparse, rather than dense, context_id
    allocation. This fixes incorrect context ID exhaustion errors caused by
    including processes that are not in the group of the new communicator in
    the allocation operation. This bug is exercised by
    test/mpi/cerrors/comm/too_many_comms3.c.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpi/comm/comm_create.c b/src/mpi/comm/comm_create.c
index abf01ea..375f5a3 100644
--- a/src/mpi/comm/comm_create.c
+++ b/src/mpi/comm/comm_create.c
@@ -233,8 +233,8 @@ int MPIR_Comm_create_intra(MPID_Comm *comm_ptr, MPID_Group *group_ptr,
        member of the group */
     /* In the multi-threaded case, MPIR_Get_contextid assumes that the
        calling routine already holds the single criticial section */
-    /* TODO should be converted to use MPIR_Get_contextid_sparse instead */
-    mpi_errno = MPIR_Get_contextid( comm_ptr, &new_context_id );
+    mpi_errno = MPIR_Get_contextid_sparse( comm_ptr, &new_context_id,
+                                           group_ptr->rank == MPI_UNDEFINED );
     if (mpi_errno) MPIU_ERR_POP(mpi_errno);
     MPIU_Assert(new_context_id != 0);
 
@@ -278,7 +278,6 @@ int MPIR_Comm_create_intra(MPID_Comm *comm_ptr, MPID_Group *group_ptr,
     }
     else {
         /* This process is not in the group */
-        MPIR_Free_contextid( new_context_id );
         new_context_id = 0;
     }
 
@@ -294,8 +293,9 @@ fn_fail:
         MPIR_Comm_release(*newcomm_ptr, 0/*isDisconnect*/);
         new_context_id = 0; /* MPIR_Comm_release frees the new ctx id */
     }
-    if (new_context_id != 0)
+    if (new_context_id != 0 && group_ptr->rank != MPI_UNDEFINED) {
         MPIR_Free_contextid(new_context_id);
+    }
     /* --END ERROR HANDLING-- */
     goto fn_exit;
 }
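
The commit message above contrasts dense and sparse context ID allocation. The following is a minimal single-process sketch of that difference, not the MPICH implementation. It assumes each process holds a bitmask of locally free IDs and that an ID is usable only if it is free on every participating process. The names pick_context_id, NO_CONTEXT_ID, and honor_ignore are hypothetical, and the loop stands in for a collective bitwise-AND reduction. The ignore flag plays the role of the `group_ptr->rank == MPI_UNDEFINED` argument passed in the patch.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model: a set bit in masks[i] means that context ID is still
 * free on process i. None of these names come from MPICH. */
#define NO_CONTEXT_ID (-1)

/* Return the lowest ID that is free on every participating process, or
 * NO_CONTEXT_ID if there is none.
 * honor_ignore == 0: every process participates (dense allocation).
 * honor_ignore == 1: a process with ignore[i] != 0 contributes an all-ones
 *                    mask, so it cannot veto any ID (sparse allocation). */
int pick_context_id(const uint32_t *masks, const int *ignore,
                    int nprocs, int honor_ignore)
{
    uint32_t common = UINT32_MAX;
    int i, bit;

    assert(nprocs > 0);

    for (i = 0; i < nprocs; i++) {
        uint32_t contrib = (honor_ignore && ignore[i]) ? UINT32_MAX : masks[i];
        common &= contrib;      /* stands in for a bitwise-AND allreduce */
    }

    for (bit = 0; bit < 32; bit++) {
        if (common & (UINT32_C(1) << bit))
            return bit;
    }
    return NO_CONTEXT_ID;
}
```

With masks {0x0C, 0x04, 0x00} and the third process marked as ignored, the dense call reports no ID available because the third process has nothing free, while the sparse call returns ID 2. This mirrors the spurious exhaustion error that the commit message describes.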

http://git.mpich.org/mpich.git/commitdiff/86adc1b1f98641139c7e0e5b2387d27db0cd88a9

commit 86adc1b1f98641139c7e0e5b2387d27db0cd88a9
Author: James Dinan <james.dinan at intel.com>
Date:   Wed Nov 20 12:42:50 2013 -0700

    Improve context ID exhaustion error reporting
    
    Adds a check to determine if context ID allocation failed because of
    exhaustion or fragmentation and improves error reporting.
    
    Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>

diff --git a/src/mpi/comm/commutil.c b/src/mpi/comm/commutil.c
index 26facb7..8ca467e 100644
--- a/src/mpi/comm/commutil.c
+++ b/src/mpi/comm/commutil.c
@@ -1130,6 +1130,8 @@ int MPIR_Get_contextid_sparse_group(MPID_Comm *comm_ptr, MPID_Group *group_ptr,
             /* --BEGIN ERROR HANDLING-- */
             int nfree = 0;
             int ntotal = 0;
+            int minfree;
+
             if (own_mask) {
                 MPIU_THREAD_CS_ENTER(CONTEXTID,);
                 mask_in_use = 0;
@@ -1141,9 +1143,29 @@ int MPIR_Get_contextid_sparse_group(MPID_Comm *comm_ptr, MPID_Group *group_ptr,
             }
 
             MPIR_ContextMaskStats(&nfree, &ntotal);
-            MPIU_ERR_SETANDJUMP3(mpi_errno, MPI_ERR_OTHER,
-                                 "**toomanycommfrag", "**toomanycommfrag %d %d %d",
-                                 nfree, ntotal, ignore_id);
+            if (ignore_id)
+                minfree = INT_MAX;
+            else
+                minfree = nfree;
+
+            if (group_ptr != NULL) {
+                int coll_tag = tag | MPIR_Process.tagged_coll_mask; /* Shift tag into the tagged coll space */
+                mpi_errno = MPIR_Allreduce_group(MPI_IN_PLACE, &minfree, 1, MPI_INT, MPI_MIN,
+                                                 comm_ptr, group_ptr, coll_tag, &errflag);
+            } else {
+                mpi_errno = MPIR_Allreduce_impl(MPI_IN_PLACE, &minfree, 1, MPI_INT,
+                                                 MPI_MIN, comm_ptr, &errflag);
+            }
+
+            if (minfree > 0) {
+                MPIU_ERR_SETANDJUMP3(mpi_errno, MPI_ERR_OTHER,
+                                     "**toomanycommfrag", "**toomanycommfrag %d %d %d",
+                                     nfree, ntotal, ignore_id);
+            } else {
+                MPIU_ERR_SETANDJUMP3(mpi_errno, MPI_ERR_OTHER,
+                                     "**toomanycomm", "**toomanycomm %d %d %d",
+                                     nfree, ntotal, ignore_id);
+            }
             /* --END ERROR HANDLING-- */
         }
 

-----------------------------------------------------------------------
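
The second commit (86adc1b) distinguishes exhaustion from fragmentation by taking the minimum free-ID count over the participating processes. The following is a rough single-process sketch of that decision, under the assumption that the loop stands in for the MPI_MIN allreduce in the patch. The names classify_failure, FAIL_FRAGMENTED, and FAIL_EXHAUSTED are hypothetical and do not appear in MPICH.

```c
#include <assert.h>
#include <limits.h>

/* Hypothetical result type; the patch instead selects between the
 * "**toomanycommfrag" and "**toomanycomm" error messages. */
enum alloc_failure { FAIL_FRAGMENTED, FAIL_EXHAUSTED };

/* nfree[i] is the number of free context IDs on process i.
 * A process with ignore[i] != 0 contributes INT_MAX, so it never
 * determines the minimum (as in the patch's "if (ignore_id)" branch). */
enum alloc_failure classify_failure(const int *nfree, const int *ignore,
                                    int nprocs)
{
    int minfree = INT_MAX;
    int i;

    assert(nprocs > 0);

    for (i = 0; i < nprocs; i++) {
        int contrib = ignore[i] ? INT_MAX : nfree[i];
        if (contrib < minfree)
            minfree = contrib;  /* stands in for the MPI_MIN allreduce */
    }

    /* Every participant still has free IDs, yet none is free everywhere:
     * fragmentation. Otherwise some participant has run out: exhaustion. */
    return (minfree > 0) ? FAIL_FRAGMENTED : FAIL_EXHAUSTED;
}
```

For example, free counts {3, 5, 0} with the third process ignored classify as fragmentation, whereas {3, 0, 7} with no process ignored classify as exhaustion.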

Summary of changes:
 src/mpi/comm/comm_create.c |    8 ++++----
 src/mpi/comm/commutil.c    |   28 +++++++++++++++++++++++++---
 2 files changed, 29 insertions(+), 7 deletions(-)


hooks/post-receive
-- 
MPICH primary repository

