[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.2a2-16-g8a0887b

Service Account noreply at mpich.org
Mon Nov 24 14:08:29 CST 2014


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".

The branch, master has been updated
       via  8a0887b9726c8c466b3fed6cc0c6ff42468a2aee (commit)
      from  e645371f794f4325db21e93f5e5517e02f4146c1 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/8a0887b9726c8c466b3fed6cc0c6ff42468a2aee

commit 8a0887b9726c8c466b3fed6cc0c6ff42468a2aee
Author: Xin Zhao <xinzhao3 at illinois.edu>
Date:   Tue Nov 18 21:43:21 2014 -0600

    Bug-fix: preventing completing the same RMA request twice.
    
    It is possible that a request handler of RMA request is
    called for the second time inside the first called request
    handler on the same request.
    
    Consider the following case: a req is queued up in Nemesis
    SHM queue with ref count of 2: one is for request completion
    and another is for dequeueing from SHM queue. The first
    called req handler completed this request and decrement ref
    count to 1. This request is still in the queue. However,
    within this handler, we trigger the same req handler on the
    same request again (for example making progress on SHM queue),
    and the second called handler also tries to complete this
    request, which leads to the wrong execution.
    
    In this patch we check if request has already been completed
    when entering the req handler, to prevent processing the same
    request twice. We also move the function finish_op_on_target()
    (where the same req handler can be triggered again)
    after request completion routine, so that we can mark the
    current request as completed before enter the same req handler
    for the second time.
    
    Fix #2204
    
    Signed-off-by: Pavan Balaji <balaji at anl.gov>

diff --git a/src/mpid/ch3/src/ch3u_handle_recv_req.c b/src/mpid/ch3/src/ch3u_handle_recv_req.c
index fffce27..dbf2b36 100644
--- a/src/mpid/ch3/src/ch3u_handle_recv_req.c
+++ b/src/mpid/ch3/src/ch3u_handle_recv_req.c
@@ -80,19 +80,45 @@ int MPIDI_CH3_ReqHandler_PutRecvComplete( MPIDI_VC_t *vc,
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Win *win_ptr;
+    MPI_Win source_win_handle = rreq->dev.source_win_handle;
+    MPIDI_CH3_Pkt_flags_t flags = rreq->dev.flags;
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3_REQHANDLER_PUTRECVCOMPLETE);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_REQHANDLER_PUTRECVCOMPLETE);
 
+    /* NOTE: It is possible that this request is already completed before
+       entering this handler. This happens when this req handler is called
+       within the same req handler on the same request.
+       Consider this case: req is queued up in SHM queue with ref count of 2:
+       one is for completing the request and another is for dequeueing from
+       the queue. The first called req handler on this request completed
+       this request and decrement ref counter to 1. Request is still in the
+       queue. Within this handler, we call the req handler on the same request
+       for the second time (for example when making progress on SHM queue),
+       and the second called handler also tries to complete this request,
+       which leads to wrong execution.
+       Here we check if req is already completed to prevent processing the
+       same request twice. */
+    if (MPID_Request_is_complete(rreq)) {
+        *complete = FALSE;
+        goto fn_exit;
+    }
+
     MPID_Win_get_ptr(rreq->dev.target_win_handle, win_ptr);
 
+    /* mark data transfer as complete and decrement CC */
+    MPIDI_CH3U_Request_complete(rreq);
+
+    /* NOTE: finish_op_on_target() must be called after we complete this request,
+       because inside finish_op_on_target() we may call this request handler
+       on the same request again (in release_lock()). Marking this request as
+       completed will prevent us from processing the same request twice. */
     mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_PUT,
-                                    rreq->dev.flags, rreq->dev.source_win_handle);
+                                    flags, source_win_handle);
     if (mpi_errno != MPI_SUCCESS) MPIU_ERR_POP(mpi_errno);
 
-    /* mark data transfer as complete and decrement CC */
-    MPIDI_CH3U_Request_complete(rreq);
     *complete = TRUE;
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_REQHANDLER_PUTRECVCOMPLETE);
     return MPI_SUCCESS;
@@ -115,10 +141,30 @@ int MPIDI_CH3_ReqHandler_AccumRecvComplete( MPIDI_VC_t *vc,
     int mpi_errno = MPI_SUCCESS;
     MPI_Aint true_lb, true_extent;
     MPID_Win *win_ptr;
+    MPI_Win source_win_handle = rreq->dev.source_win_handle;
+    MPIDI_CH3_Pkt_flags_t flags = rreq->dev.flags;
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3_REQHANDLER_ACCUMRECVCOMPLETE);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_REQHANDLER_ACCUMRECVCOMPLETE);
 
+    /* NOTE: It is possible that this request is already completed before
+       entering this handler. This happens when this req handler is called
+       within the same req handler on the same request.
+       Consider this case: req is queued up in SHM queue with ref count of 2:
+       one is for completing the request and another is for dequeueing from
+       the queue. The first called req handler on this request completed
+       this request and decrement ref counter to 1. Request is still in the
+       queue. Within this handler, we call the req handler on the same request
+       for the second time (for example when making progress on SHM queue),
+       and the second called handler also tries to complete this request,
+       which leads to wrong execution.
+       Here we check if req is already completed to prevent processing the
+       same request twice. */
+    if (MPID_Request_is_complete(rreq)) {
+        *complete = FALSE;
+        goto fn_exit;
+    }
+
     MPID_Win_get_ptr(rreq->dev.target_win_handle, win_ptr);
 
     MPIU_Assert(MPIDI_Request_get_type(rreq) == MPIDI_REQUEST_TYPE_ACCUM_RESP);
@@ -138,13 +184,19 @@ int MPIDI_CH3_ReqHandler_AccumRecvComplete( MPIDI_VC_t *vc,
     MPIR_Type_get_true_extent_impl(rreq->dev.datatype, &true_lb, &true_extent);
     MPIU_Free((char *) rreq->dev.final_user_buf + true_lb);
 
+    /* mark data transfer as complete and decrement CC */
+    MPIDI_CH3U_Request_complete(rreq);
+
+    /* NOTE: finish_op_on_target() must be called after we complete this request,
+       because inside finish_op_on_target() we may call this request handler
+       on the same request again (in release_lock()). Marking this request as
+       completed will prevent us from processing the same request twice. */
     mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_ACCUMULATE,
-                                    rreq->dev.flags, rreq->dev.source_win_handle);
+                                    flags, source_win_handle);
     if (mpi_errno != MPI_SUCCESS) MPIU_ERR_POP(mpi_errno);
 
-    /* mark data transfer as complete and decrement CC */
-    MPIDI_CH3U_Request_complete(rreq);
     *complete = TRUE;
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_REQHANDLER_ACCUMRECVCOMPLETE);
     return MPI_SUCCESS;
diff --git a/src/mpid/ch3/src/ch3u_handle_send_req.c b/src/mpid/ch3/src/ch3u_handle_send_req.c
index ab7d728..a1e586b 100644
--- a/src/mpid/ch3/src/ch3u_handle_send_req.c
+++ b/src/mpid/ch3/src/ch3u_handle_send_req.c
@@ -49,13 +49,29 @@ int MPIDI_CH3_ReqHandler_GetSendComplete( MPIDI_VC_t *vc ATTRIBUTE((unused)),
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Win *win_ptr;
+    MPI_Win source_win_handle = sreq->dev.source_win_handle;
+    MPIDI_CH3_Pkt_flags_t flags = sreq->dev.flags;
+
+    /* NOTE: It is possible that this request is already completed before
+       entering this handler. This happens when this req handler is called
+       within the same req handler on the same request.
+       Consider this case: req is queued up in SHM queue with ref count of 2:
+       one is for completing the request and another is for dequeueing from
+       the queue. The first called req handler on this request completed
+       this request and decrement ref counter to 1. Request is still in the
+       queue. Within this handler, we call the req handler on the same request
+       for the second time (for example when making progress on SHM queue),
+       and the second called handler also tries to complete this request,
+       which leads to wrong execution.
+       Here we check if req is already completed to prevent processing the
+       same request twice. */
+    if (MPID_Request_is_complete(sreq)) {
+        *complete = FALSE;
+        goto fn_exit;
+    }
 
     MPID_Win_get_ptr(sreq->dev.target_win_handle, win_ptr);
 
-    mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_GET,
-                                    sreq->dev.flags, sreq->dev.source_win_handle);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
     /* here we decrement the Active Target counter to guarantee the GET-like
        operation are completed when counter reaches zero. */
     win_ptr->at_completion_counter--;
@@ -63,6 +79,15 @@ int MPIDI_CH3_ReqHandler_GetSendComplete( MPIDI_VC_t *vc ATTRIBUTE((unused)),
 
     /* mark data transfer as complete and decrement CC */
     MPIDI_CH3U_Request_complete(sreq);
+
+    /* NOTE: finish_op_on_target() must be called after we complete this request,
+       because inside finish_op_on_target() we may call this request handler
+       on the same request again (in release_lock()). Marking this request as
+       completed will prevent us from processing the same request twice. */
+    mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_GET,
+                                    flags, source_win_handle);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
     *complete = TRUE;
 
  fn_exit:
@@ -81,9 +106,30 @@ int MPIDI_CH3_ReqHandler_GaccumSendComplete( MPIDI_VC_t *vc,
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Win *win_ptr;
+    MPI_Win source_win_handle = rreq->dev.source_win_handle;
+    MPIDI_CH3_Pkt_flags_t flags = rreq->dev.flags;
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3_REQHANDLER_GACCUMSENDCOMPLETE);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_REQHANDLER_GACCUMSENDCOMPLETE);
+
+    /* NOTE: It is possible that this request is already completed before
+       entering this handler. This happens when this req handler is called
+       within the same req handler on the same request.
+       Consider this case: req is queued up in SHM queue with ref count of 2:
+       one is for completing the request and another is for dequeueing from
+       the queue. The first called req handler on this request completed
+       this request and decrement ref counter to 1. Request is still in the
+       queue. Within this handler, we call the req handler on the same request
+       for the second time (for example when making progress on SHM queue),
+       and the second called handler also tries to complete this request,
+       which leads to wrong execution.
+       Here we check if req is already completed to prevent processing the
+       same request twice. */
+    if (MPID_Request_is_complete(rreq)) {
+        *complete = FALSE;
+        goto fn_exit;
+    }
+
     /* This function is triggered when sending back process of GACC/FOP/CAS
        is finished. Only GACC used user_buf. FOP and CAS can fit all data
        in response packet. */
@@ -92,17 +138,23 @@ int MPIDI_CH3_ReqHandler_GaccumSendComplete( MPIDI_VC_t *vc,
 
     MPID_Win_get_ptr(rreq->dev.target_win_handle, win_ptr);
 
-    mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_GET_ACCUM,
-                                    rreq->dev.flags, rreq->dev.source_win_handle);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
     /* here we decrement the Active Target counter to guarantee the GET-like
        operation are completed when counter reaches zero. */
     win_ptr->at_completion_counter--;
     MPIU_Assert(win_ptr->at_completion_counter >= 0);
 
     MPIDI_CH3U_Request_complete(rreq);
+
+    /* NOTE: finish_op_on_target() must be called after we complete this request,
+       because inside finish_op_on_target() we may call this request handler
+       on the same request again (in release_lock()). Marking this request as
+       completed will prevent us from processing the same request twice. */
+    mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_GET_ACCUM,
+                                    flags, source_win_handle);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
     *complete = TRUE;
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_REQHANDLER_GACCUMSENDCOMPLETE);
     return mpi_errno;
@@ -122,9 +174,30 @@ int MPIDI_CH3_ReqHandler_CASSendComplete( MPIDI_VC_t *vc,
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Win *win_ptr;
+    MPI_Win source_win_handle = rreq->dev.source_win_handle;
+    MPIDI_CH3_Pkt_flags_t flags = rreq->dev.flags;
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3_REQHANDLER_CASSENDCOMPLETE);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_REQHANDLER_CASSENDCOMPLETE);
+
+    /* NOTE: It is possible that this request is already completed before
+       entering this handler. This happens when this req handler is called
+       within the same req handler on the same request.
+       Consider this case: req is queued up in SHM queue with ref count of 2:
+       one is for completing the request and another is for dequeueing from
+       the queue. The first called req handler on this request completed
+       this request and decrement ref counter to 1. Request is still in the
+       queue. Within this handler, we call the req handler on the same request
+       for the second time (for example when making progress on SHM queue),
+       and the second called handler also tries to complete this request,
+       which leads to wrong execution.
+       Here we check if req is already completed to prevent processing the
+       same request twice. */
+    if (MPID_Request_is_complete(rreq)) {
+        *complete = FALSE;
+        goto fn_exit;
+    }
+
     /* This function is triggered when sending back process of GACC/FOP/CAS
        is finished. Only GACC used user_buf. FOP and CAS can fit all data
        in response packet. */
@@ -133,17 +206,23 @@ int MPIDI_CH3_ReqHandler_CASSendComplete( MPIDI_VC_t *vc,
 
     MPID_Win_get_ptr(rreq->dev.target_win_handle, win_ptr);
 
-    mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_CAS,
-                                    rreq->dev.flags, rreq->dev.source_win_handle);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
     /* here we decrement the Active Target counter to guarantee the GET-like
        operation are completed when counter reaches zero. */
     win_ptr->at_completion_counter--;
     MPIU_Assert(win_ptr->at_completion_counter >= 0);
 
     MPIDI_CH3U_Request_complete(rreq);
+
+    /* NOTE: finish_op_on_target() must be called after we complete this request,
+       because inside finish_op_on_target() we may call this request handler
+       on the same request again (in release_lock()). Marking this request as
+       completed will prevent us from processing the same request twice. */
+    mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_CAS,
+                                    flags, source_win_handle);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
     *complete = TRUE;
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_REQHANDLER_CASSENDCOMPLETE);
     return mpi_errno;
@@ -161,9 +240,30 @@ int MPIDI_CH3_ReqHandler_FOPSendComplete( MPIDI_VC_t *vc,
 {
     int mpi_errno = MPI_SUCCESS;
     MPID_Win *win_ptr;
+    MPI_Win source_win_handle = rreq->dev.source_win_handle;
+    MPIDI_CH3_Pkt_flags_t flags = rreq->dev.flags;
     MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3_REQHANDLER_FOPSENDCOMPLETE);
 
     MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_REQHANDLER_FOPSENDCOMPLETE);
+
+    /* NOTE: It is possible that this request is already completed before
+       entering this handler. This happens when this req handler is called
+       within the same req handler on the same request.
+       Consider this case: req is queued up in SHM queue with ref count of 2:
+       one is for completing the request and another is for dequeueing from
+       the queue. The first called req handler on this request completed
+       this request and decrement ref counter to 1. Request is still in the
+       queue. Within this handler, we call the req handler on the same request
+       for the second time (for example when making progress on SHM queue),
+       and the second called handler also tries to complete this request,
+       which leads to wrong execution.
+       Here we check if req is already completed to prevent processing the
+       same request twice. */
+    if (MPID_Request_is_complete(rreq)) {
+        *complete = FALSE;
+        goto fn_exit;
+    }
+
     /* This function is triggered when sending back process of GACC/FOP/CAS
        is finished. Only GACC used user_buf. FOP and CAS can fit all data
        in response packet. */
@@ -172,17 +272,23 @@ int MPIDI_CH3_ReqHandler_FOPSendComplete( MPIDI_VC_t *vc,
 
     MPID_Win_get_ptr(rreq->dev.target_win_handle, win_ptr);
 
-    mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_FOP,
-                                    rreq->dev.flags, rreq->dev.source_win_handle);
-    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
-
     /* here we decrement the Active Target counter to guarantee the GET-like
        operation are completed when counter reaches zero. */
     win_ptr->at_completion_counter--;
     MPIU_Assert(win_ptr->at_completion_counter >= 0);
 
     MPIDI_CH3U_Request_complete(rreq);
+
+    /* NOTE: finish_op_on_target() must be called after we complete this request,
+       because inside finish_op_on_target() we may call this request handler
+       on the same request again (in release_lock()). Marking this request as
+       completed will prevent us from processing the same request twice. */
+    mpi_errno = finish_op_on_target(win_ptr, vc, MPIDI_CH3_PKT_FOP,
+                                    flags, source_win_handle);
+    if (mpi_errno) MPIU_ERR_POP(mpi_errno);
+
     *complete = TRUE;
+
  fn_exit:
     MPIDI_FUNC_EXIT(MPID_STATE_MPIDI_CH3_REQHANDLER_FOPSENDCOMPLETE);
     return mpi_errno;

-----------------------------------------------------------------------

Summary of changes:
 src/mpid/ch3/src/ch3u_handle_recv_req.c |   64 +++++++++++++--
 src/mpid/ch3/src/ch3u_handle_send_req.c |  138 +++++++++++++++++++++++++++----
 2 files changed, 180 insertions(+), 22 deletions(-)


hooks/post-receive
-- 
MPICH primary repository


More information about the commits mailing list