[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.2b3-102-g9c4b9b1
Service Account
noreply at mpich.org
Tue Jun 16 14:40:22 CDT 2015
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".
The branch, master has been updated
via 9c4b9b172428fa991ff6e7904d96b57e28fb24c7 (commit)
via ac07f982719436df23b25484c239ad2cc23b2a9e (commit)
from 394d46b764838dc9193efc012ea297cab3a33aac (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/9c4b9b172428fa991ff6e7904d96b57e28fb24c7
commit 9c4b9b172428fa991ff6e7904d96b57e28fb24c7
Author: Lena Oden <loden at anl.gov>
Date: Fri Jun 12 16:02:50 2015 -0500
Close remaining conns before sockset is destroyed
Losers of head-to-head connections are not necessarily
closed when the sock set is destroyed. This patch
looks for all open connections, closes their sockets,
and frees the memory resources. Fixes #2180
Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
diff --git a/src/mpid/ch3/channels/sock/src/ch3_progress.c b/src/mpid/ch3/channels/sock/src/ch3_progress.c
index f09551d..0fcc20d 100644
--- a/src/mpid/ch3/channels/sock/src/ch3_progress.c
+++ b/src/mpid/ch3/channels/sock/src/ch3_progress.c
@@ -375,6 +375,8 @@ int MPIDI_CH3I_Progress_init(void)
int MPIDI_CH3I_Progress_finalize(void)
{
int mpi_errno;
+ MPIDI_CH3I_Connection_t *conn = NULL;
+
MPIDI_STATE_DECL(MPID_STATE_MPIDI_CH3I_PROGRESS_FINALIZE);
MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3I_PROGRESS_FINALIZE);
@@ -383,8 +385,16 @@ int MPIDI_CH3I_Progress_finalize(void)
mpi_errno = MPIDU_CH3I_ShutdownListener();
if (mpi_errno != MPI_SUCCESS) { MPIU_ERR_POP(mpi_errno); }
- /* FIXME: Cleanly shutdown other socks and free connection structures.
- (close protocol?) */
+
+ /* Close open connections */
+ MPIDU_Sock_close_open_sockets(MPIDI_CH3I_sock_set,(void**) &conn);
+ while (conn != NULL) {
+ conn->state = CONN_STATE_CLOSING;
+ mpi_errno = MPIDI_CH3_Sockconn_handle_close_event(conn);
+ if (mpi_errno) { MPIU_ERR_POP(mpi_errno); }
+ MPIDU_Sock_close_open_sockets(MPIDI_CH3I_sock_set,(void**) &conn);
+ }
+
/*
diff --git a/src/mpid/common/sock/mpidu_sock.h b/src/mpid/common/sock/mpidu_sock.h
index a9a9e95..9f752a9 100644
--- a/src/mpid/common/sock/mpidu_sock.h
+++ b/src/mpid/common/sock/mpidu_sock.h
@@ -229,6 +229,33 @@ int MPIDU_Sock_create_set(MPIDU_Sock_set_t * set);
/*@
+MPIDU_Sock_close_open_sockets - close the first open socket of a sock set
+
+Input Parameter:
+. set - set to be considered
+
+Output Parameter:
+. user_ptr - pointer to the user pointer of the closed socket.
+
+Return value: an MPI error code with a Sock extended error class
++ MPI_SUCCESS - open socket successfully closed
+. MPIDU_SOCK_ERR_INIT - Sock module not initialized
+. MPIDU_SOCK_ERR_BAD_SET - invalid sock set
+. MPIDU_SOCK_ERR_NOMEM - unable to allocate required memory
+- MPIDU_SOCK_ERR_FAIL - unable to close the socket
+
+
+Notes:
+This function only closes the first open socket of a sock set and returns the
+user pointer of the sock-info structure. To close all sockets, the function must
+be called repeatedly, until user_ptr == NULL. The reason for this is
+that the higher-level protocol may need the user_ptr for further cleanup.
+
+@*/
+int MPIDU_Sock_close_open_sockets(struct MPIDU_Sock_set * sock_set, void** user_ptr );
+
+
+/*@
MPIDU_Sock_destroy_set - destroy an existing sock set, releasing an internal resource associated with that set
Input Parameter:
diff --git a/src/mpid/common/sock/poll/sock_set.i b/src/mpid/common/sock/poll/sock_set.i
index 6600aea..8948e1d 100644
--- a/src/mpid/common/sock/poll/sock_set.i
+++ b/src/mpid/common/sock/poll/sock_set.i
@@ -170,6 +170,38 @@ int MPIDU_Sock_create_set(struct MPIDU_Sock_set ** sock_setp)
/* --END ERROR HANDLING-- */
}
+#undef FUNCNAME
+#define FUNCNAME MPIDU_Sock_close_open_sockets
+#undef FCNAME
+#define FCNAME MPIU_QUOTE(FUNCNAME)
+int MPIDU_Sock_close_open_sockets(struct MPIDU_Sock_set * sock_set, void** user_ptr ){
+
+ int i;
+ int mpi_errno = MPI_SUCCESS;
+ struct pollinfo * pollinfos = NULL;
+ pollinfos = sock_set->pollinfos;
+ MPIDI_STATE_DECL(MPID_STATE_MPIDU_SOCK_CLOSE_OPEN_SOCKETS);
+
+ MPIDI_FUNC_ENTER(MPID_STATE_MPIDU_SOCK_CLOSE_OPEN_SOCKETS);
+
+ MPIDU_SOCKI_VERIFY_INIT(mpi_errno, fn_exit);
+ /* wake up waiting sockets if multi-threaded */
+ *user_ptr = NULL;
+ for (i = 0; i < sock_set->poll_array_elems; i++) {
+ if(pollinfos[i].sock != NULL && pollinfos[i].type != MPIDU_SOCKI_TYPE_INTERRUPTER){
+ close(pollinfos[i].fd);
+ MPIDU_Socki_sock_free(pollinfos[i].sock);
+ *user_ptr = pollinfos[i].user_ptr;
+ break;
+ }
+ }
+#ifdef USE_SOCK_VERIFY
+ fn_exit:
+#endif
+ MPIDI_FUNC_EXIT(MPID_STATE_MPIDU_SOCK_CLOSE_OPEN_SOCKETS);
+ return mpi_errno;
+}
+
#undef FUNCNAME
#define FUNCNAME MPIDU_Sock_destroy_set
http://git.mpich.org/mpich.git/commitdiff/ac07f982719436df23b25484c239ad2cc23b2a9e
commit ac07f982719436df23b25484c239ad2cc23b2a9e
Author: Lena Oden <loden at anl.gov>
Date: Thu Jun 4 17:55:39 2015 -0500
Handling of discarded connections to avoid reconnects
The loser of a head-to-head connection sometimes tries
to reconnect later, after MPI_Finalize was called. This
can lead to several errors in the socket layer, depending
on the state of the discarded connection and the order in
which connection events appear. Refs #2180
This patch handles this in two ways:
1.)
Discarded connections are marked with CONN_STATE_DISCARD,
so they are held back from reconnecting. Furthermore, an error on
any discarded connection (because the remote side closed in
MPI_Finalize) is ignored and the connection is closed.
2.)
Add a finalize flag for process groups. If a process group is
closing and tries to close all VCs, a flag is set to mark this.
If the flag is set, a reconnection (in the socket state machine) is
refused and the connection is closed on both sides.
Both steps are necessary to catch all reconnection attempts after
MPI_Finalize was called.
Signed-off-by: Ken Raffenetti <raffenet at mcs.anl.gov>
diff --git a/src/mpid/ch3/channels/sock/src/ch3_progress.c b/src/mpid/ch3/channels/sock/src/ch3_progress.c
index 8715193..f09551d 100644
--- a/src/mpid/ch3/channels/sock/src/ch3_progress.c
+++ b/src/mpid/ch3/channels/sock/src/ch3_progress.c
@@ -458,6 +458,14 @@ static int MPIDI_CH3I_Progress_handle_sock_event(MPIDU_Sock_event_t * event)
{
MPIDI_CH3I_Connection_t * conn =
(MPIDI_CH3I_Connection_t *) event->user_ptr;
+ /* If we have a READ event on a discarded connection, we probably have
+ an error on this connection because the remote side closed during
+ MPI_Finalize. Since the connection is discarded (and therefore not
+ needed), it can be closed and the error can be ignored */
+ if(conn->state == CONN_STATE_DISCARD){
+ MPIDI_CH3_Sockconn_handle_close_event(conn);
+ break;
+ }
MPID_Request * rreq = conn->recv_active;
diff --git a/src/mpid/ch3/include/mpidimpl.h b/src/mpid/ch3/include/mpidimpl.h
index 7692704..5b0c536 100644
--- a/src/mpid/ch3/include/mpidimpl.h
+++ b/src/mpid/ch3/include/mpidimpl.h
@@ -92,6 +92,13 @@ typedef struct MPIDI_PG
find a particular process group. */
void * id;
+ /* Flag to mark a process group which is finalizing. This means that
+ the VCs for this process group are closing (normally because
+ MPI_Finalize was called). This is required to avoid a reconnection
+ of the VCs when the PG is closed due to unused elements in the event
+ queue */
+ int finalize;
+
/* Replacement abstraction for connection information */
/* Connection information needed to access processes in this process
group and to share the data with other processes. The items are
diff --git a/src/mpid/ch3/src/mpidi_pg.c b/src/mpid/ch3/src/mpidi_pg.c
index 6f35c59..2367259 100644
--- a/src/mpid/ch3/src/mpidi_pg.c
+++ b/src/mpid/ch3/src/mpidi_pg.c
@@ -200,6 +200,7 @@ int MPIDI_PG_Create(int vct_sz, void * pg_id, MPIDI_PG_t ** pg_ptr)
MPIU_Object_set_ref(pg, 0);
pg->size = vct_sz;
pg->id = pg_id;
+ pg->finalize = 0;
/* Initialize the connection information to null. Use
the appropriate MPIDI_PG_InitConnXXX routine to set up these
fields */
@@ -1216,6 +1217,7 @@ int MPIDI_PG_Close_VCs( void )
MPIDI_VC_GetStateString(vc->state)));
}
}
+ pg->finalize = 1;
pg = pg->next;
}
/* Note that we do not free the process groups within this routine, even
diff --git a/src/mpid/ch3/util/sock/ch3u_connect_sock.c b/src/mpid/ch3/util/sock/ch3u_connect_sock.c
index efe0bc6..9c4f1e0 100644
--- a/src/mpid/ch3/util/sock/ch3u_connect_sock.c
+++ b/src/mpid/ch3/util/sock/ch3u_connect_sock.c
@@ -598,11 +598,13 @@ int MPIDI_CH3_Sockconn_handle_connect_event( MPIDI_CH3I_Connection_t *conn,
}
/* --END ERROR HANDLING-- */
- if (conn->state == CONN_STATE_CONNECTING) {
+ if (conn->state == CONN_STATE_CONNECTING || conn->state == CONN_STATE_DISCARD) {
MPIDI_CH3I_Pkt_sc_open_req_t *openpkt =
(MPIDI_CH3I_Pkt_sc_open_req_t *)&conn->pkt.type;
- MPIU_DBG_CONNSTATECHANGE(conn->vc,conn,CONN_STATE_OPEN_CSEND);
- conn->state = CONN_STATE_OPEN_CSEND;
+ if(conn->state == CONN_STATE_CONNECTING){
+ MPIU_DBG_CONNSTATECHANGE(conn->vc,conn,CONN_STATE_OPEN_CSEND);
+ conn->state = CONN_STATE_OPEN_CSEND;
+ }
MPIDI_Pkt_init(openpkt, MPIDI_CH3I_PKT_SC_OPEN_REQ);
openpkt->pg_id_len = (int) strlen(MPIDI_Process.my_pg->id) + 1;
openpkt->pg_rank = MPIR_Process.comm_world->rank;
@@ -688,6 +690,16 @@ int MPIDI_CH3_Sockconn_handle_close_event( MPIDI_CH3I_Connection_t * conn )
not be referenced anymore in any case. */
conn->vc = NULL;
}
+ else if(conn->state == CONN_STATE_DISCARD) {
+ /* post close, so the socket is closed and memory leaks are avoided */
+ MPIU_DBG_MSG(CH3_DISCONNECT,TYPICAL,"CLosing sock (Post_close)");
+ conn->state = CONN_STATE_CLOSING;
+ mpi_errno = MPIDU_Sock_post_close(conn->sock);
+ if (mpi_errno != MPI_SUCCESS) {
+ MPIU_ERR_POP(mpi_errno);
+ }
+ goto fn_exit;
+ }
else {
MPIU_Assert(conn->state == CONN_STATE_LISTENING);
MPIDI_CH3I_listener_conn = NULL;
@@ -785,8 +797,9 @@ int MPIDI_CH3_Sockconn_handle_conn_event( MPIDI_CH3I_Connection_t * conn )
MPIDI_CH3I_Pkt_sc_open_resp_t *openpkt =
(MPIDI_CH3I_Pkt_sc_open_resp_t *)&conn->pkt.type;
/* FIXME: is this the correct assert? */
- MPIU_Assert( conn->state == CONN_STATE_OPEN_CRECV );
- if (openpkt->ack) {
+
+ if (openpkt->ack && conn->state != CONN_STATE_DISCARD) {
+ MPIU_Assert( conn->state == CONN_STATE_OPEN_CRECV );
MPIDI_CH3I_VC *vcch = &conn->vc->ch;
MPIU_DBG_CONNSTATECHANGE(conn->vc,conn,CONN_STATE_CONNECTED);
conn->state = CONN_STATE_CONNECTED;
@@ -812,6 +825,11 @@ int MPIDI_CH3_Sockconn_handle_conn_event( MPIDI_CH3I_Connection_t * conn )
Why isn't it changed? Is there an assert here,
such as conn->vc->conn != conn (there is another connection
chosen for the vc)? */
+ /* Answer to FIXME: */
+ /* Neither freed nor updated. This connection is the loser of
+ a head-to-head connection. The VC is still in use, but by
+ another socket connection. The refcount is not incremented
+ by changing the associated connection. */
/* MPIU_Assert( conn->vc->ch.conn != conn ); */
/* Set the candidate vc for this connection to NULL (we
are discarding this connection because (I think) we
@@ -824,6 +842,11 @@ int MPIDI_CH3_Sockconn_handle_conn_event( MPIDI_CH3I_Connection_t * conn )
conn->vc = NULL;
conn->state = CONN_STATE_CLOSING;
/* FIXME: What does post close do here? */
+ /* Answer to FIXME: */
+ /* Since the connection is discarded, the socket is
+ no longer needed and should be closed. This is initiated with the
+ post close command. This also causes the socket to be removed from
+ the socket set, so there is no more polling on this socket */
MPIU_DBG_MSG(CH3_DISCONNECT,TYPICAL,"CLosing sock (Post_close)");
mpi_errno = MPIDU_Sock_post_close(conn->sock);
if (mpi_errno != MPI_SUCCESS) {
@@ -881,6 +904,18 @@ int MPIDI_CH3_Sockconn_handle_connopen_event( MPIDI_CH3I_Connection_t * conn )
MPIDI_PG_Get_vc_set_active(pg, pg_rank, &vc);
MPIU_Assert(vc->pg_rank == pg_rank);
+ if(pg->finalize == 1) {
+ MPIDI_Pkt_init(openresp, MPIDI_CH3I_PKT_SC_OPEN_RESP);
+ openresp->ack = FALSE;
+ MPIU_DBG_CONNSTATECHANGE(conn->vc,conn,CONN_STATE_OPEN_LSEND);
+ conn->state = CONN_STATE_OPEN_LSEND;
+ mpi_errno = connection_post_send_pkt(conn);
+ if (mpi_errno != MPI_SUCCESS) {
+ MPIU_ERR_SETANDJUMP(mpi_errno,MPI_ERR_INTERN,
+ "**ch3|sock|open_lrecv_data");
+ }
+ goto fn_exit;
+ }
vcch = &vc->ch;
if (vcch->conn == NULL) {
/* no head-to-head connects, accept the connection */
@@ -902,6 +937,11 @@ int MPIDI_CH3_Sockconn_handle_connopen_event( MPIDI_CH3I_Connection_t * conn )
MPIU_DBG_MSG_FMT(CH3_CONNECT,TYPICAL,(MPIU_DBG_FDEST,
"vc=%p,conn=%p:Accept head-to-head connection (my process group), discarding vcch->conn=%p",vc,conn, vcch->conn));
+ /* mark old connection */
+ MPIDI_CH3I_Connection_t *old_conn = vcch->conn;
+ MPIU_DBG_CONNSTATECHANGE(old_conn,old_conn,CONN_STATE_DISCARD);
+ old_conn->state = CONN_STATE_DISCARD;
+
/* accept connection */
MPIU_DBG_VCCHSTATECHANGE(vc,VC_STATE_CONNECTING);
vcch->state = MPIDI_CH3I_VC_STATE_CONNECTING;
@@ -926,6 +966,10 @@ int MPIDI_CH3_Sockconn_handle_connopen_event( MPIDI_CH3I_Connection_t * conn )
if (strcmp(MPIDI_Process.my_pg->id, pg->id) < 0) {
MPIU_DBG_MSG_FMT(CH3_CONNECT,TYPICAL,(MPIU_DBG_FDEST,
"vc=%p,conn=%p:Accept head-to-head connection (two process groups), discarding vcch->conn=%p",vc,conn, vcch->conn));
+ /* mark old connection */
+ MPIDI_CH3I_Connection_t *old_conn = vcch->conn;
+ MPIU_DBG_CONNSTATECHANGE(old_conn,old_conn,CONN_STATE_DISCARD);
+ old_conn->state = CONN_STATE_DISCARD;
/* accept connection */
MPIU_DBG_VCCHSTATECHANGE(vc,VC_STATE_CONNECTING);
vcch->state = MPIDI_CH3I_VC_STATE_CONNECTING;
@@ -973,11 +1017,13 @@ int MPIDI_CH3_Sockconn_handle_connwrite( MPIDI_CH3I_Connection_t * conn )
MPIDI_FUNC_ENTER(MPID_STATE_MPIDI_CH3_SOCKCONN_HANDLE_CONNWRITE);
- if (conn->state == CONN_STATE_OPEN_CSEND) {
+ if (conn->state == CONN_STATE_OPEN_CSEND || conn->state == CONN_STATE_DISCARD) {
/* finished sending open request packet */
/* post receive for open response packet */
- MPIU_DBG_CONNSTATECHANGE(conn->vc,conn,CONN_STATE_OPEN_CRECV);
- conn->state = CONN_STATE_OPEN_CRECV;
+ if(conn->state == CONN_STATE_OPEN_CSEND){
+ MPIU_DBG_CONNSTATECHANGE(conn->vc,conn,CONN_STATE_OPEN_CRECV);
+ conn->state = CONN_STATE_OPEN_CRECV;
+ }
mpi_errno = connection_post_recv_pkt(conn);
if (mpi_errno != MPI_SUCCESS) {
MPIU_ERR_POP(mpi_errno);
diff --git a/src/mpid/ch3/util/sock/ch3u_init_sock.c b/src/mpid/ch3/util/sock/ch3u_init_sock.c
index 68de1e2..a2a89fe 100644
--- a/src/mpid/ch3/util/sock/ch3u_init_sock.c
+++ b/src/mpid/ch3/util/sock/ch3u_init_sock.c
@@ -119,6 +119,7 @@ const char * MPIDI_Conn_GetStateString(int state)
case CONN_STATE_CONNECTED: name = "CONN_STATE_CONNECTED"; break;
case CONN_STATE_CLOSING: name = "CONN_STATE_CLOSING"; break;
case CONN_STATE_CLOSED: name = "CONN_STATE_CLOSED"; break;
+ case CONN_STATE_DISCARD: name = "CONN_STATE_DISCARD"; break;
case CONN_STATE_FAILED: name = "CONN_STATE_FAILE"; break;
}
diff --git a/src/mpid/ch3/util/sock/ch3usock.h b/src/mpid/ch3/util/sock/ch3usock.h
index 0231488..8de0f22 100644
--- a/src/mpid/ch3/util/sock/ch3usock.h
+++ b/src/mpid/ch3/util/sock/ch3usock.h
@@ -25,6 +25,7 @@ typedef enum MPIDI_CH3I_Conn_state
CONN_STATE_CONNECTED,
CONN_STATE_CLOSING,
CONN_STATE_CLOSED,
+ CONN_STATE_DISCARD,
CONN_STATE_FAILED
} MPIDI_CH3I_Conn_state;
-----------------------------------------------------------------------
Summary of changes:
src/mpid/ch3/channels/sock/src/ch3_progress.c | 22 ++++++++-
src/mpid/ch3/include/mpidimpl.h | 7 +++
src/mpid/ch3/src/mpidi_pg.c | 2 +
src/mpid/ch3/util/sock/ch3u_connect_sock.c | 62 +++++++++++++++++++++---
src/mpid/ch3/util/sock/ch3u_init_sock.c | 1 +
src/mpid/ch3/util/sock/ch3usock.h | 1 +
src/mpid/common/sock/mpidu_sock.h | 27 +++++++++++
src/mpid/common/sock/poll/sock_set.i | 32 +++++++++++++
8 files changed, 144 insertions(+), 10 deletions(-)
hooks/post-receive
--
MPICH primary repository