[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.1-56-g4b35902

Service Account noreply at mpich.org
Wed Mar 19 09:01:00 CDT 2014


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".

The branch, master has been updated
       via  4b35902a9704bb6ebdc1767d73d36443ca37fe73 (commit)
      from  1f532907778162971e6bf51b754b68c95861cd66 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/4b35902a9704bb6ebdc1767d73d36443ca37fe73

commit 4b35902a9704bb6ebdc1767d73d36443ca37fe73
Author: Su Huang <suhuang at us.ibm.com>
Date:   Wed Mar 19 04:27:47 2014 -0400

    MPICH test case linked_list_lockall hang in MPI_Win_flush
    
    The scenario of the hang is described as follows:
    
      Assume the job runs with 4 tasks. Task 0 loops over the following RMA
      operations to fetch the displacement; the loop ends once the displacement
      has been updated.
    
        MPI_Get_accumulate(target rank is task 0)
        MPI_Win_flush(task 0)
    
      Tasks 1 and 3 hang in MPI_Win_flush() waiting for a call to
      MPI_Compare_and_swap() to complete. The target rank for this operation is
      task 0.
    
      Task 2 hangs in MPI_Win_flush() waiting for a call to MPI_Accumulate() to
      complete. The target rank for this operation is task 0 as well.
    
      Task 0 is busy calling MPI_Get_accumulate() and MPI_Win_flush() to see
      whether the displacement has been updated. The target rank of the operation
      is task 0 itself, so the operation is local and completes without a PAMI
      dispatcher call. Meanwhile, the other three tasks issue RMA operations to
      target task 0 and wait for them to complete. Because task 0 only loops over
      local operations, the PAMI dispatcher is never called and no progress is
      made on any remote operation. This is the root cause of the hang.
    
    The fix is to call the PAMI dispatcher in MPI_Win_flush() before checking the
    condition. The current code checks the condition first and, if the operations
    are already complete (sync->total == sync->complete), never calls the PAMI
    dispatcher.
    
    The following statement in MPI_Win_flush()
    
      MPID_PROGRESS_WAIT_WHILE(sync->total != sync->complete)
    
    will be replaced by
    
      MPID_PROGRESS_WAIT_DO_WHILE(sync->total != sync->complete)
    
    (ibm) D196445
    
    Signed-off-by: Michael Blocksome <blocksom at us.ibm.com>

diff --git a/src/mpid/pamid/src/onesided/mpid_win_flush.c b/src/mpid/pamid/src/onesided/mpid_win_flush.c
index c2e7e29..310d059 100644
--- a/src/mpid/pamid/src/onesided/mpid_win_flush.c
+++ b/src/mpid/pamid/src/onesided/mpid_win_flush.c
@@ -53,7 +53,7 @@ MPID_Win_flush(int       rank,
                         return mpi_errno, "**rmasync");
      }
   sync = &win->mpid.sync;
-  MPID_PROGRESS_WAIT_WHILE(sync->total != sync->complete);
+  MPID_PROGRESS_WAIT_DO_WHILE(sync->total != sync->complete);
   sync->total    = 0;
   sync->started  = 0;
   sync->complete = 0;
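
The difference between the two wait macros can be illustrated with a small
stand-alone model. This is only a sketch under stated assumptions: WAIT_WHILE,
WAIT_DO_WHILE, advance_progress(), and struct sync_state are hypothetical
stand-ins written for this illustration, not the MPICH or PAMI definitions.
The model assumes each progress call services one pending remote request.

```c
/* Hypothetical stand-in for the counters kept in win->mpid.sync. */
struct sync_state { int total; int complete; };

/* Remote requests that can only advance when this task runs progress. */
static int pending_remote;

/* Hypothetical stand-in for one pass of the progress engine. */
static void advance_progress(void)
{
    if (pending_remote > 0)
        pending_remote--;
}

/* Check first: progress runs only while the condition holds. */
#define WAIT_WHILE(cond)    do { while (cond) advance_progress(); } while (0)

/* Advance first: progress runs at least once, then the condition is checked. */
#define WAIT_DO_WHILE(cond) do { do { advance_progress(); } while (cond); } while (0)

/* Model a loop of purely local flushes using the check-first wait.
 * Returns the number of remote requests still pending afterwards. */
static int local_flushes_check_first(int iterations, int remote)
{
    struct sync_state s;
    pending_remote = remote;
    for (int i = 0; i < iterations; i++) {
        s.total = 1;
        s.complete = 1;                     /* local op is already complete */
        WAIT_WHILE(s.total != s.complete);  /* never enters the loop body */
    }
    return pending_remote;
}

/* Same loop using the advance-first wait. */
static int local_flushes_advance_first(int iterations, int remote)
{
    struct sync_state s;
    pending_remote = remote;
    for (int i = 0; i < iterations; i++) {
        s.total = 1;
        s.complete = 1;
        WAIT_DO_WHILE(s.total != s.complete);  /* one progress call per flush */
    }
    return pending_remote;
}
```

In this model, local_flushes_check_first(5, 3) leaves all 3 remote requests
pending, which corresponds to the hang described in the log, while
local_flushes_advance_first(5, 3) drains them because every flush makes at
least one progress call.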

-----------------------------------------------------------------------

Summary of changes:
 src/mpid/pamid/src/onesided/mpid_win_flush.c |    2 +-
 1 files changed, 1 insertions(+), 1 deletions(-)


hooks/post-receive
-- 
MPICH primary repository