[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.1-260-g6b5993a
Service Account
noreply at mpich.org
Thu May 22 09:44:36 CDT 2014
This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".
The branch, master has been updated
via 6b5993af5cd4aadd6648c024a6b815749c35f8a6 (commit)
from 98b5e585a61a8eccbd0224b64c66c505fa5ddf0e (commit)
Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.
- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/6b5993af5cd4aadd6648c024a6b815749c35f8a6
commit 6b5993af5cd4aadd6648c024a6b815749c35f8a6
Author: Su Huang <suhuang at us.ibm.com>
Date: Thu May 22 09:29:57 2014 -0400
pamid: task 0 hang in MPI_Init() if MP_PRINTENV=yes
In MPIDI_Print_mpenv(), when calling MPIR_Gather_impl to gather all MP environment variables
from all tasks in a job, the errflag parameter was not initialized to 0 before it was
passed to the routine:
    mpi_errno = MPIR_Gather_impl(&sender, sizeof(MPIDI_printenv_t), MPI_BYTE, gatherer,
                                 sizeof(MPIDI_printenv_t), MPI_BYTE, 0, comm_ptr,
                                 (int *) &errflag);
To process the Gather collective call, each task issued MPIC_Recv, MPIC_Send and MPIC_Wait.
MPIC_Send() sends a message with MPIR_GATHER_TAG (defined as 0x3). Since the routine had a
non-zero errflag passed in,
    if (*errflag && MPIR_CVAR_ENABLE_COLL_FT_RET)
        MPIR_TAG_SET_ERROR_BIT(tag);
bit 30 of the tag, MPIR_TAG_ERROR_BIT (1 << 30), was set to 1. The tag was therefore
changed from 0x3 to 0x40000003.
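The tag arithmetic described above can be checked with a small standalone sketch. The names prefixed with sketch_/SKETCH_ are hypothetical stand-ins written for this illustration; only the values 0x3 and (1 << 30) come from the text above, and the helper merely mimics the quoted if-statement rather than reproducing the MPICH macro.

```c
#include <stdio.h>

/* Illustrative stand-ins: the values are taken from the commit message,
 * the names are assumptions and are not the MPICH definitions. */
#define SKETCH_GATHER_TAG     0x3
#define SKETCH_TAG_ERROR_BIT  (1 << 30)

/* Returns the tag that would be sent for a given errflag value.
 * coll_ft_enabled plays the role of MPIR_CVAR_ENABLE_COLL_FT_RET. */
static int sketch_outgoing_tag(int tag, int errflag, int coll_ft_enabled)
{
    if (errflag && coll_ft_enabled)
        tag |= SKETCH_TAG_ERROR_BIT;   /* set bit 30 */
    return tag;
}

/* Prints the tag for an initialized (0) and a stray non-zero errflag. */
static void sketch_print_tags(void)
{
    printf("errflag=0:    0x%x\n",
           (unsigned) sketch_outgoing_tag(SKETCH_GATHER_TAG, 0, 1));
    printf("errflag=1234: 0x%x\n",
           (unsigned) sketch_outgoing_tag(SKETCH_GATHER_TAG, 1234, 1));
}
```

With errflag set to 0 the tag stays 0x3; any non-zero value, such as whatever an uninitialized stack variable happens to hold, yields 0x40000003 when the fault-tolerance path is enabled.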
On task 1, a message with this modified tag was sent to task 0. By the time the message
arrived at task 0, the receive for the original tag of 0x3 had already been posted.
Because the tag in the arriving message differed from the tag of the posted receive,
no match was found for the message, and this mismatch was the root cause of the hang.
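The mismatch can be pictured with a deliberately simplified matching sketch. It assumes exact comparison of source and tag against a list of posted receives; the struct and function names are hypothetical, and the real pamid matching logic is not shown in this message and may differ.

```c
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical, simplified posted-receive record for illustration only. */
struct sketch_recv {
    int source;
    int tag;
};

/* Assumed rule: a message matches only if source and tag are both equal. */
static bool sketch_matches(const struct sketch_recv *posted,
                           int msg_source, int msg_tag)
{
    return posted->source == msg_source && posted->tag == msg_tag;
}

/* Returns the index of the first matching posted receive, or -1 if none. */
static int sketch_find_match(const struct sketch_recv *posted, size_t n,
                             int msg_source, int msg_tag)
{
    for (size_t i = 0; i < n; i++) {
        if (sketch_matches(&posted[i], msg_source, msg_tag))
            return (int) i;
    }
    return -1;
}
```

Under this assumption, a receive posted for source 1 with tag 0x3 matches an incoming message tagged 0x3 but not one tagged 0x40000003, so the receive is never completed and the receiver keeps waiting.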
MPIR_TAG_SET_ERROR_BIT was added for MPI 3.0 (pe rbrew and beyond) which explains why
the job does not fail with prior releases.
(ibm) D197745
Signed-off-by: Michael Blocksome <blocksom at us.ibm.com>
diff --git a/src/mpid/pamid/src/mpidi_util.c b/src/mpid/pamid/src/mpidi_util.c
index aba5bb6..45a36d0 100644
--- a/src/mpid/pamid/src/mpidi_util.c
+++ b/src/mpid/pamid/src/mpidi_util.c
@@ -35,7 +35,7 @@
#include "mpidi_util.h"
#define PAMI_TUNE_MAX_ITER 2000
-
+#define _DEBUG 1
/* Short hand for sizes */
#define ONE (1)
#define ONEK (1<<10)
@@ -461,7 +461,7 @@ int MPIDI_Print_mpenv(int rank,int size)
char *popenptr;
char tempstr[128];
int mpi_errno;
- int errflag;
+ int errflag=0;
MPIDI_Set_mpich_env(rank,size);
memset(&sender,0,sizeof(MPIDI_printenv_t));
-----------------------------------------------------------------------
Summary of changes:
src/mpid/pamid/src/mpidi_util.c | 4 ++--
1 files changed, 2 insertions(+), 2 deletions(-)
hooks/post-receive
--
MPICH primary repository