[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.2b3-209-g88ba1a7
Service Account
noreply at mpich.org
Sun Jul 12 10:16:43 CDT 2015
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".
The branch, master has been updated
via 88ba1a79d96fcf2b6e0cf40254969ba2878cac56 (commit)
via 2b219dfe4ca8f8e04f6aaedc25a2a2cf98dc7013 (commit)
via b2a9a499143796a8eafa7d6585979ba661233a32 (commit)
from f2426e5ae73382a2b63d3ddcd7daa7a4fa56bcf5 (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/88ba1a79d96fcf2b6e0cf40254969ba2878cac56
commit 88ba1a79d96fcf2b6e0cf40254969ba2878cac56
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date: Fri Jul 10 13:46:53 2015 -0500
Remove xfail for ticket 2269
Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
diff --git a/test/mpi/comm/testlist b/test/mpi/comm/testlist
index ccd5615..13780cd 100644
--- a/test/mpi/comm/testlist
+++ b/test/mpi/comm/testlist
@@ -29,7 +29,7 @@ comm_idup 4 mpiversion=3.0
comm_idup 9 mpiversion=3.0
comm_idup_mul 2 mpiversion=3.0
comm_idup_overlap 2 mpiversion=3.0
-comm_idup_iallreduce 6 mpiversion=3.0 xfail=ticket2269
+comm_idup_iallreduce 6 mpiversion=3.0
comm_idup_nb 6 mpiversion=3.0 xfail=ticket2283
comm_idup_isend 6 mpiversion=3.0 xfail=ticket2269
comm_idup_comm 6 mpiversion=3.0 xfail=ticket2269
diff --git a/test/mpi/threads/comm/testlist b/test/mpi/threads/comm/testlist
index de5da72..5c1126c 100644
--- a/test/mpi/threads/comm/testlist
+++ b/test/mpi/threads/comm/testlist
@@ -3,7 +3,7 @@ dup_leak_test 2
comm_dup_deadlock 4
comm_create_threads 4
comm_create_group_threads 4
-comm_idup 4 mpiversion=3.0 xfail=ticket2269
+comm_idup 4 mpiversion=3.0
ctxidup 4 mpiversion=3.0
idup_nb 4 mpiversion=3.0 xfail=ticket2283
idup_comm_gen 4 mpiversion=3.0 xfail=ticket2269
http://git.mpich.org/mpich.git/commitdiff/2b219dfe4ca8f8e04f6aaedc25a2a2cf98dc7013
commit 2b219dfe4ca8f8e04f6aaedc25a2a2cf98dc7013
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date: Fri Jul 10 07:01:34 2015 -0500
Fixing multiple MPI_COMM_IDUPs in the same communicator
The original code does not handle the following case correctly:
calling MPI_COMM_IDUP multiple times on the same parent communicator and
then calling MPI_Waitall on all of the nonblocking requests. The test
comm_idup_mul.c runs into a livelock because of an ordering problem. The
cause of the bug is discussed in more detail in ticket #2269.
What's needed is to distinguish different MPI_COMM_IDUPs in the same
communicator and give the earliest MPI_COMM_IDUP the highest priority to
avoid the ordering problem.
In order to do that, three counters are added to the communicator data
structure: idup_count, idup_curr_seqnum and idup_next_seqnum.
idup_count is used to record how many MPI_COMM_IDUPs are duplicating
from the current communicator at the same time. idup_curr_seqnum and
idup_next_seqnum are used to give the earliest MPI_COMM_IDUP (among all
MPI_COMM_IDUPs in the same parent communicator) the highest priority.
Fix #2269
Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
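The counter scheme described in the message above can be sketched in
isolation as follows. This is a simplified illustration, not the MPICH
implementation: the types parent_comm and idup_op and the functions
idup_start, idup_must_retry and idup_finish are hypothetical names
introduced here, and the sketch leaves out the mask_in_use and
lowestContextId checks that the real condition also includes.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-in for the three counters added to MPID_Comm. */
typedef struct {
    int idup_count;        /* idup operations currently in flight */
    int idup_curr_seqnum;  /* next sequence number to hand out */
    int idup_next_seqnum;  /* sequence number that is allowed to proceed */
} parent_comm;

/* Hypothetical per-operation state, mirroring the seqnum field
 * that the patch adds to struct gcn_state. */
typedef struct {
    int seqnum;
} idup_op;

/* First iteration of an operation: register it on the parent and
 * take the next sequence number. */
void idup_start(parent_comm *p, idup_op *op)
{
    p->idup_count++;
    op->seqnum = p->idup_curr_seqnum++;
}

/* An operation has to retry when several operations are in flight on
 * the same parent and it does not hold the smallest sequence number. */
bool idup_must_retry(const parent_comm *p, const idup_op *op)
{
    return p->idup_count > 1 && op->seqnum != p->idup_next_seqnum;
}

/* A context id was allocated: let the next operation proceed. */
void idup_finish(parent_comm *p)
{
    assert(p->idup_count > 0);
    p->idup_next_seqnum++;
    p->idup_count--;
}
```

With three operations started in order on one parent, only the first
is allowed to proceed; each call to idup_finish then unblocks the next
one in sequence, which is the ordering guarantee the commit relies on.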
diff --git a/src/include/mpiimpl.h b/src/include/mpiimpl.h
index d3b491d..417026d 100644
--- a/src/include/mpiimpl.h
+++ b/src/include/mpiimpl.h
@@ -1250,6 +1250,11 @@ typedef struct MPID_Comm {
int revoked; /* Flag to track whether the communicator
* has been revoked */
+ int idup_count; /* how many MPI_COMM_IDUPs duplicating from
+ the current communicator at the same time */
+ int idup_curr_seqnum; /* give each child communicator a sequence number */
+ int idup_next_seqnum; /* the smallest sequence number wins */
+
MPID_Info *info; /* Hints to the communicator */
#ifdef MPID_HAS_HETERO
diff --git a/src/mpi/comm/commutil.c b/src/mpi/comm/commutil.c
index 4628a41..f4b2930 100644
--- a/src/mpi/comm/commutil.c
+++ b/src/mpi/comm/commutil.c
@@ -138,6 +138,10 @@ int MPIR_Comm_init(MPID_Comm *comm_p)
/* Initialize the revoked flag as false */
comm_p->revoked = 0;
+ comm_p->idup_count = 0;
+ comm_p->idup_curr_seqnum = 0;
+ comm_p->idup_next_seqnum = 0;
+
/* Fields not set include context_id, remote and local size, and
kind, since different communicator construction routines need
different values */
@@ -1235,6 +1239,7 @@ struct gcn_state {
int own_mask;
int own_eager_mask;
int first_iter;
+ int seqnum;
MPID_Comm *comm_ptr;
MPID_Comm *comm_ptr_inter;
MPID_Sched_t s;
@@ -1333,6 +1338,10 @@ static int sched_cb_gcn_allocate_cid(MPID_Comm *comm, int tag, void *state)
MPID_SCHED_BARRIER(st->s);
} else {
/* Successfully allocated a context id */
+
+ st->comm_ptr->idup_next_seqnum++;
+ st->comm_ptr->idup_count--;
+
mpi_errno = MPID_Sched_cb(&sched_cb_gcn_bcast, st, st->s);
if (mpi_errno) MPIU_ERR_POP(mpi_errno);
MPID_SCHED_BARRIER(st->s);
@@ -1367,11 +1376,27 @@ static int sched_cb_gcn_copy_mask(MPID_Comm *comm, int tag, void *state)
st->own_eager_mask = 1;
}
st->first_iter = 0;
+
+ /* idup_count > 1 means there are multiple communicators duplicating
+ * from the current communicator at the same time. And
+ * idup_curr_seqnum gives each duplication operation a priority */
+ st->comm_ptr->idup_count++;
+ st->seqnum = st->comm_ptr->idup_curr_seqnum++;
} else {
if (st->comm_ptr->context_id < lowestContextId) {
lowestContextId = st->comm_ptr->context_id;
}
- if (mask_in_use || (st->comm_ptr->context_id != lowestContextId)) {
+
+ /* If one of the following conditions happens, set local_mask to zero
+ * so sched_cb_gcn_allocate_cid can not find a valid id and will retry:
+ * 1. mask is used by other threads;
+ * 2. the current MPI_COMM_IDUP operation does not has the lowestContextId;
+ * 3. for the case that multiple communicators duplicating from the
+ * same communicator at the same time, the sequence number of the
+ * current MPI_COMM_IDUP operation is not the smallest. */
+ if (mask_in_use || (st->comm_ptr->context_id != lowestContextId)
+ || (st->comm_ptr->idup_count > 1
+ && st->seqnum != st->comm_ptr->idup_next_seqnum)) {
memset(st->local_mask, 0, MPIR_MAX_CONTEXT_MASK * sizeof(int));
st->own_mask = 0;
} else {
@@ -1385,6 +1410,7 @@ static int sched_cb_gcn_copy_mask(MPID_Comm *comm, int tag, void *state)
mask_in_use = 1;
st->own_mask = 1;
}
+
}
mpi_errno = st->comm_ptr->coll_fns->Iallreduce_sched(MPI_IN_PLACE, st->local_mask, MPIR_MAX_CONTEXT_MASK,
http://git.mpich.org/mpich.git/commitdiff/b2a9a499143796a8eafa7d6585979ba661233a32
commit b2a9a499143796a8eafa7d6585979ba661233a32
Author: Huiwei Lu <huiweilu at mcs.anl.gov>
Date: Tue Jun 9 15:45:39 2015 -0500
Change MPI_Comm_idup test to be more strict
Change test/mpi/comm/comm_idup_mul.c to be more strict to reveal the
problem of multiple MPI_Comm_idup calls in the same communicator
followed by an MPI_Waitall.
See #2269
Signed-off-by: Junchao Zhang <jczhang at mcs.anl.gov>
diff --git a/test/mpi/comm/comm_idup_mul.c b/test/mpi/comm/comm_idup_mul.c
index 2830f02..a5f8242 100644
--- a/test/mpi/comm/comm_idup_mul.c
+++ b/test/mpi/comm/comm_idup_mul.c
@@ -11,7 +11,7 @@
#include <stdio.h>
#include <mpi.h>
-#define NUM_ITER 2
+#define NUM_ITER 10
int main(int argc, char **argv)
{
-----------------------------------------------------------------------
Summary of changes:
src/include/mpiimpl.h | 5 +++++
src/mpi/comm/commutil.c | 28 +++++++++++++++++++++++++++-
test/mpi/comm/comm_idup_mul.c | 2 +-
test/mpi/comm/testlist | 2 +-
test/mpi/threads/comm/testlist | 2 +-
5 files changed, 35 insertions(+), 4 deletions(-)
hooks/post-receive
--
MPICH primary repository