[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.2b3-209-g88ba1a7

Service Account noreply at mpich.org
Sun Jul 12 10:16:43 CDT 2015


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".

The branch, master has been updated
       via  88ba1a79d96fcf2b6e0cf40254969ba2878cac56 (commit)
       via  2b219dfe4ca8f8e04f6aaedc25a2a2cf98dc7013 (commit)
       via  b2a9a499143796a8eafa7d6585979ba661233a32 (commit)
      from  f2426e5ae73382a2b63d3ddcd7daa7a4fa56bcf5 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/88ba1a79d96fcf2b6e0cf40254969ba2878cac56

commit 88ba1a79d96fcf2b6e0cf40254969ba2878cac56
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Fri Jul 10 13:46:53 2015 -0500

    Remove xfail for ticket 2269
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/test/mpi/comm/testlist b/test/mpi/comm/testlist
index ccd5615..13780cd 100644
--- a/test/mpi/comm/testlist
+++ b/test/mpi/comm/testlist
@@ -29,7 +29,7 @@ comm_idup 4 mpiversion=3.0
 comm_idup 9 mpiversion=3.0
 comm_idup_mul 2 mpiversion=3.0
 comm_idup_overlap 2 mpiversion=3.0
-comm_idup_iallreduce 6 mpiversion=3.0 xfail=ticket2269
+comm_idup_iallreduce 6 mpiversion=3.0
 comm_idup_nb 6 mpiversion=3.0 xfail=ticket2283
 comm_idup_isend 6 mpiversion=3.0 xfail=ticket2269
 comm_idup_comm 6 mpiversion=3.0 xfail=ticket2269
diff --git a/test/mpi/threads/comm/testlist b/test/mpi/threads/comm/testlist
index de5da72..5c1126c 100644
--- a/test/mpi/threads/comm/testlist
+++ b/test/mpi/threads/comm/testlist
@@ -3,7 +3,7 @@ dup_leak_test 2
 comm_dup_deadlock 4
 comm_create_threads 4
 comm_create_group_threads 4
-comm_idup 4 mpiversion=3.0 xfail=ticket2269
+comm_idup 4 mpiversion=3.0
 ctxidup 4 mpiversion=3.0
 idup_nb 4 mpiversion=3.0 xfail=ticket2283
 idup_comm_gen 4 mpiversion=3.0 xfail=ticket2269

http://git.mpich.org/mpich.git/commitdiff/2b219dfe4ca8f8e04f6aaedc25a2a2cf98dc7013

commit 2b219dfe4ca8f8e04f6aaedc25a2a2cf98dc7013
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Fri Jul 10 07:01:34 2015 -0500

    Fixing multiple MPI_COMM_IDUPs in the same communicator
    
    The original code does not correctly handle the following case:
    calling MPI_COMM_IDUP several times on the same parent communicator and
    then calling MPI_Waitall on all the nonblocking requests. The test
    comm_idup_mul.c runs into livelock because of an ordering problem. The
    cause of the bug is discussed in more detail in ticket #2269.
    
    What's needed is to distinguish different MPI_COMM_IDUPs in the same
    communicator and give the earliest MPI_COMM_IDUP the highest priority to
    avoid the ordering problem.
    
    To do that, three counters are added to the communicator data
    structure: idup_count, idup_curr_seqnum and idup_next_seqnum.
    idup_count records how many MPI_COMM_IDUPs are duplicating from the
    current communicator at the same time. idup_curr_seqnum hands each
    MPI_COMM_IDUP a sequence number, and idup_next_seqnum is the sequence
    number that may proceed next, so the earliest MPI_COMM_IDUP (among all
    MPI_COMM_IDUPs in the same parent communicator) gets the highest priority.
    
    Fix #2269
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
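
The counter scheme described in the commit message above can be sketched
in isolation as a small ticket-style model. This is only an illustrative
sketch, not the MPICH implementation: the types parent_comm and idup_op
and the functions idup_start, idup_must_yield and idup_finish are
hypothetical names. Only the three counter names and the seqnum field are
taken from the patch. The other retry conditions (mask_in_use and the
lowestContextId check) are deliberately left out.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three counters the patch adds to MPID_Comm. */
typedef struct {
    int idup_count;        /* idup operations currently in flight on this parent */
    int idup_curr_seqnum;  /* next sequence number to hand out */
    int idup_next_seqnum;  /* sequence number that is allowed to proceed */
} parent_comm;

/* Hypothetical per-operation state; only seqnum mirrors struct gcn_state. */
typedef struct {
    parent_comm *parent;
    int seqnum;
} idup_op;

/* First iteration: register the operation and take a sequence number. */
static void idup_start(idup_op *op, parent_comm *parent)
{
    op->parent = parent;
    parent->idup_count++;
    op->seqnum = parent->idup_curr_seqnum++;
}

/* Later iterations: true if this operation has to back off and retry.
 * Only the sequence-number condition from the patch is modeled here. */
static bool idup_must_yield(const idup_op *op)
{
    return op->parent->idup_count > 1
        && op->seqnum != op->parent->idup_next_seqnum;
}

/* A context id was allocated: pass priority on to the next operation. */
static void idup_finish(idup_op *op)
{
    assert(op->parent->idup_count > 0);
    op->parent->idup_next_seqnum++;
    op->parent->idup_count--;
}
```

In this model, if three operations start on the same parent, only the
first one is allowed to proceed. Each idup_finish then hands priority to
the operation with the next sequence number, which is the ordering the
commit message asks for.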

diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index d3b491d..417026d 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -1250,6 +1250,11 @@ typedef struct MPID_Comm {
     int revoked;                    /* Flag to track whether the communicator
                                      * has been revoked */
 
+    int idup_count;              /* how many MPI_COMM_IDUPs duplicating from
+                                    the current communicator at the same time */
+    int idup_curr_seqnum;        /* give each child communicator a sequence number */
+    int idup_next_seqnum;        /* the smallest sequence number wins  */
+
     MPID_Info *info;                /* Hints to the communicator */
 
 #ifdef MPID_HAS_HETERO
diff --git a/src/mpi/comm/commutil.c b/src/mpi/comm/commutil.c
index 4628a41..f4b2930 100644
--- a/src/mpi/comm/commutil.c
+++ b/src/mpi/comm/commutil.c
@@ -138,6 +138,10 @@ int MPIR_Comm_init(MPID_Comm *comm_p)
     /* Initialize the revoked flag as false */
     comm_p->revoked = 0;
 
+    comm_p->idup_count = 0;
+    comm_p->idup_curr_seqnum = 0;
+    comm_p->idup_next_seqnum = 0;
+
     /* Fields not set include context_id, remote and local size, and
        kind, since different communicator construction routines need
        different values */
@@ -1235,6 +1239,7 @@ struct gcn_state {
     int own_mask;
     int own_eager_mask;
     int first_iter;
+    int seqnum;
     MPID_Comm *comm_ptr;
     MPID_Comm *comm_ptr_inter;
     MPID_Sched_t s;
@@ -1333,6 +1338,10 @@ static int sched_cb_gcn_allocate_cid(MPID_Comm *comm, int tag, void *state)
         MPID_SCHED_BARRIER(st->s);
     } else {
         /* Successfully allocated a context id */
+
+        st->comm_ptr->idup_next_seqnum++;
+        st->comm_ptr->idup_count--;
+
         mpi_errno = MPID_Sched_cb(&sched_cb_gcn_bcast, st, st->s);
         if (mpi_errno) MPIU_ERR_POP(mpi_errno);
         MPID_SCHED_BARRIER(st->s);
@@ -1367,11 +1376,27 @@ static int sched_cb_gcn_copy_mask(MPID_Comm *comm, int tag, void *state)
             st->own_eager_mask = 1;
         }
         st->first_iter = 0;
+
+        /* idup_count > 1 means there are multiple communicators duplicating
+         * from the current communicator at the same time. And
+         * idup_curr_seqnum gives each duplication operation a priority */
+        st->comm_ptr->idup_count++;
+        st->seqnum = st->comm_ptr->idup_curr_seqnum++;
     } else {
         if (st->comm_ptr->context_id < lowestContextId) {
             lowestContextId = st->comm_ptr->context_id;
         }
-        if (mask_in_use || (st->comm_ptr->context_id != lowestContextId)) {
+
+        /* If one of the following conditions happens, set local_mask to zero
+         * so sched_cb_gcn_allocate_cid cannot find a valid id and will retry:
+         * 1. mask is used by other threads;
+         * 2. the current MPI_COMM_IDUP operation does not have the lowestContextId;
+         * 3. when multiple communicators are duplicating from the
+         *    same communicator at the same time, the sequence number of the
+         *    current MPI_COMM_IDUP operation is not the smallest. */
+        if (mask_in_use || (st->comm_ptr->context_id != lowestContextId)
+                || (st->comm_ptr->idup_count > 1
+                    && st->seqnum != st->comm_ptr->idup_next_seqnum)) {
             memset(st->local_mask, 0, MPIR_MAX_CONTEXT_MASK * sizeof(int));
             st->own_mask = 0;
         } else {
@@ -1385,6 +1410,7 @@ static int sched_cb_gcn_copy_mask(MPID_Comm *comm, int tag, void *state)
             mask_in_use = 1;
             st->own_mask = 1;
         }
+
     }
 
     mpi_errno = st->comm_ptr->coll_fns->Iallreduce_sched(MPI_IN_PLACE, st->local_mask, MPIR_MAX_CONTEXT_MASK,

http://git.mpich.org/mpich.git/commitdiff/b2a9a499143796a8eafa7d6585979ba661233a32

commit b2a9a499143796a8eafa7d6585979ba661233a32
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date:   Tue Jun 9 15:45:39 2015 -0500

    Change MPI_Comm_idup test to be more strict
    
    Make test/mpi/comm/comm_idup_mul.c stricter so that it exposes the
    problem with multiple MPI_Comm_idup calls on the same communicator
    followed by an MPI_Waitall.
    
    See #2269
    
    Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>

diff --git a/test/mpi/comm/comm_idup_mul.c b/test/mpi/comm/comm_idup_mul.c
index 2830f02..a5f8242 100644
--- a/test/mpi/comm/comm_idup_mul.c
+++ b/test/mpi/comm/comm_idup_mul.c
@@ -11,7 +11,7 @@
 #include <stdio.h>
 #include <mpi.h>
 
-#define NUM_ITER    2
+#define NUM_ITER    10
 
 int main(int argc, char **argv)
 {

-----------------------------------------------------------------------

Summary of changes:
 src/include/mpiimpl.h          |    5 +++++
 src/mpi/comm/commutil.c        |   28 +++++++++++++++++++++++++++-
 test/mpi/comm/comm_idup_mul.c  |    2 +-
 test/mpi/comm/testlist         |    2 +-
 test/mpi/threads/comm/testlist |    2 +-
 5 files changed, 35 insertions(+), 4 deletions(-)


hooks/post-receive
-- 
MPICH primary repository

