[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.1-56-g4b35902
Service Account
noreply@mpich.org
Wed Mar 19 09:01:00 CDT 2014
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".
The branch, master, has been updated
via 4b35902a9704bb6ebdc1767d73d36443ca37fe73 (commit)
from 1f532907778162971e6bf51b754b68c95861cd66 (commit)
Those revisions listed above that are new to this repository have not
appeared in any other notification email, so we list them in full below.
- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/4b35902a9704bb6ebdc1767d73d36443ca37fe73
commit 4b35902a9704bb6ebdc1767d73d36443ca37fe73
Author: Su Huang <suhuang@us.ibm.com>
Date: Wed Mar 19 04:27:47 2014 -0400
MPICH test case linked_list_lockall hangs in MPI_Win_flush
The scenario of the hang is as follows:

Assume the job runs with 4 tasks. Task 0 is in a loop issuing the following
RMA operations to fetch a displacement; the loop ends once the displacement
has been updated:

    MPI_Get_accumulate(target rank is task 0)
    MPI_Win_flush(task 0)
Tasks 1 and 3 hang in MPI_Win_flush() waiting for a call to
MPI_Compare_and_swap() to complete; the target rank for that operation is
task 0. Task 2 hangs in MPI_Win_flush() waiting for a call to
MPI_Accumulate() to complete; its target rank is also task 0.
Task 0 is busy making MPI_Get_accumulate() and MPI_Win_flush() calls to see
whether the displacement has been updated. The target rank of the operation
is task 0 itself, so the operation is local and completes without any PAMI
dispatcher call. Meanwhile, the other three tasks issue RMA operations to
target task 0 and wait for those operations to complete. Because task 0 is
in a loop of purely local operations, the PAMI dispatcher is never invoked
and no progress is made on any remote operation, which is the root cause of
the hang.
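
To make the deadlock concrete, task 0's polling loop looks roughly like the
sketch below (a minimal illustration, assuming a window "win" over the shared
displacement; NULL_DISP and poll_for_displacement are hypothetical names, not
the actual linked_list_lockall test code):

    #include <mpi.h>

    #define NULL_DISP ((MPI_Aint)-1)   /* hypothetical "not yet set" sentinel */

    /* Task 0 polls its own window memory for a displacement that another
       task will eventually write. */
    static MPI_Aint poll_for_displacement(MPI_Win win)
    {
        MPI_Aint disp = NULL_DISP, unused = 0;
        do {
            /* Target is rank 0 itself, so on the pamid device the operation
               completes locally, without entering the PAMI dispatcher. */
            MPI_Get_accumulate(&unused, 1, MPI_AINT,  /* origin, ignored by NO_OP */
                               &disp, 1, MPI_AINT,    /* result */
                               0, 0, 1, MPI_AINT,     /* target rank 0, disp 0 */
                               MPI_NO_OP, win);
            MPI_Win_flush(0, win);  /* pre-fix: local completion satisfies the
                                       flush, the dispatcher never runs, and
                                       remote ops targeting task 0 stall */
        } while (disp == NULL_DISP);
        return disp;
    }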
The fix for the problem is to add a call to the PAMI dispatcher in
MPI_Win_flush(), made before the completion condition is checked. The
current code checks the condition first; if the condition is already
satisfied, the PAMI dispatcher is never called.
The following statement in MPI_Win_flush()
MPID_PROGRESS_WAIT_WHILE(sync->total != sync->complete)
will be replaced by
MPID_PROGRESS_WAIT_DO_WHILE(sync->total != sync->complete)
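
In other words, the change swaps a check-first loop for an execute-first
loop. A minimal sketch of the difference (the macro bodies below are
illustrative, not the real MPICH expansions, and advance_pami_dispatcher()
is a hypothetical stand-in for the device progress call):

    /* Check-first: if cond is already false, the dispatcher never runs. */
    #define PROGRESS_WAIT_WHILE(cond)    \
        while (cond)                     \
            advance_pami_dispatcher()

    /* Execute-first: the dispatcher runs at least once before cond is
       tested, so remote operations targeting this task still make
       progress even when the local flush condition is already met. */
    #define PROGRESS_WAIT_DO_WHILE(cond) \
        do {                             \
            advance_pami_dispatcher();   \
        } while (cond)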
(ibm) D196445
Signed-off-by: Michael Blocksome <blocksom@us.ibm.com>
diff --git a/src/mpid/pamid/src/onesided/mpid_win_flush.c b/src/mpid/pamid/src/onesided/mpid_win_flush.c
index c2e7e29..310d059 100644
--- a/src/mpid/pamid/src/onesided/mpid_win_flush.c
+++ b/src/mpid/pamid/src/onesided/mpid_win_flush.c
@@ -53,7 +53,7 @@ MPID_Win_flush(int rank,
return mpi_errno, "**rmasync");
}
sync = &win->mpid.sync;
- MPID_PROGRESS_WAIT_WHILE(sync->total != sync->complete);
+ MPID_PROGRESS_WAIT_DO_WHILE(sync->total != sync->complete);
sync->total = 0;
sync->started = 0;
sync->complete = 0;
-----------------------------------------------------------------------
Summary of changes:
src/mpid/pamid/src/onesided/mpid_win_flush.c | 2 +-
1 files changed, 1 insertions(+), 1 deletions(-)
hooks/post-receive
--
MPICH primary repository