[mpich-commits] [mpich] MPICH primary repository branch, master, updated. v3.0.4-440-g254aa2c

mysql vizuser noreply at mpich.org
Tue Aug 6 17:45:44 CDT 2013


This is an automated email from the git hooks/post-receive script. It was
generated because a ref change was pushed to the repository containing
the project "MPICH primary repository".

The branch, master has been updated
       via  254aa2cdaba145bf9c7f6a42665cae32d1b31685 (commit)
      from  4824c7620ed2da59c1bbb9411b414725462972d0 (commit)

Those revisions listed above that are new to this repository have
not appeared on any other notification email; so we list those
revisions in full, below.

- Log -----------------------------------------------------------------
http://git.mpich.org/mpich.git/commitdiff/254aa2cdaba145bf9c7f6a42665cae32d1b31685

commit 254aa2cdaba145bf9c7f6a42665cae32d1b31685
Author: Michael Blocksome <blocksom at us.ibm.com>
Date:   Tue Aug 6 11:32:15 2013 -0500

    Clean up pamid MPID_Abort() logic
    
    Removed the processing of the (undocumented) environment variable
    'PAMID_CORE_ON_ABORT' which was being checked to determine if the user
    does *not* want the process to core dump. On Blue Gene/Q the core dump
    was accomplished by calling 'abort()' which sends SIGABRT to all
    processes and all processes would then write a core file. This is not
    scalable.
    
    Instead, MPID_Abort() will invoke 'exit(1)' which will terminate all
    processes in the job. This behavior is identical for both the POE and
    the Blue Gene/Q control systems.
    
    On Blue Gene/Q the user may replicate the previous core dump behavior by
    using the environment variables 'BG_COREDUMPONERROR=1' or
    'BG_COREDUMPONEXIT=1'.
    
    Finally, the 'DYNAMIC_TASKING' #ifdef is moved up so it is checked first.
    'MPIDI_NO_ASSERT' and 'DYNAMIC_TASKING' are typically defined for PE. It
    appears that the dynamic tasking code was never being invoked.
    
    (ibm) CPS 99YURA
    
    Signed-off-by: Bob Cernohous <bobc at us.ibm.com>

diff --git a/src/mpid/pamid/src/misc/mpid_abort.c b/src/mpid/pamid/src/misc/mpid_abort.c
index ef8af06..dc618af 100644
--- a/src/mpid/pamid/src/misc/mpid_abort.c
+++ b/src/mpid/pamid/src/misc/mpid_abort.c
@@ -27,8 +27,8 @@
  *
  * \param[in] comm      The communicator associated with the failure (can be null).
  * \param[in] mpi_errno The MPI error associated with the failure (can be zero).
- * \param[in] exit_code The requested exit code, however BG features imply that exit(1) will always be used.
- * \param[in] error_msg The message to display (may be NULL_
+ * \param[in] exit_code The requested exit code.
+ * \param[in] error_msg The message to display (may be NULL)
  *
  * This is the majority of the call to MPID_Abort().  The only
  * difference is that it does not call exit.  That allows it to be
@@ -74,26 +74,28 @@ void MPIDI_Abort_core(MPID_Comm * comm, int mpi_errno, int exit_code, const char
  * \brief The central parts of the MPID_Abort call
  * \param[in] comm      The communicator associated with the failure (can be null).
  * \param[in] mpi_errno The MPI error associated with the failure (can be zero).
- * \param[in] exit_code The requested exit code, however BG features imply that exit(1) will always be used.
- * \param[in] error_msg The message to display (may be NULL_
- * \returns MPI_ERR_INTERN
+ * \param[in] exit_code The requested exit code.
+ * \param[in] error_msg The message to display (may be NULL)
+ * \return MPI_ERR_INTERN
  *
  * This function MUST NEVER return.
  */
 int MPID_Abort(MPID_Comm * comm, int mpi_errno, int exit_code, const char *error_msg)
 {
-  char* env     = getenv("PAMID_CORE_ON_ABORT");
   MPIDI_Abort_core(comm, mpi_errno, exit_code, error_msg);
 
-#ifdef MPIDI_NO_ASSERT
-  exit(1);
-#endif
-  if (env != NULL)
-    if ( (strncasecmp("no", env, 2)==0) || (strncasecmp("exit", env, 4)==0) || (strncmp("0", env, 1)==0) )
-      exit(1);
-
 #ifdef DYNAMIC_TASKING
   return PMI2_Abort(1,error_msg);
+#else
+  /* The POE and BGQ control systems both catch the exit value for additional
+   * processing. If a process exits with '1' then all processes in the job
+   * are terminated. The requested error code is lost in this process however
+   * this is acceptable, but not desirable, behavior according to the MPI
+   * standard.
+   *
+   * On BGQ, the user may force the process (rank) that exited with '1' to core
+   * dump by setting the environment variable 'BG_COREDUMPONERROR=1'.
+   */
+  exit(1);
 #endif
-  abort();
 }

-----------------------------------------------------------------------

Summary of changes:
 src/mpid/pamid/src/misc/mpid_abort.c |   30 ++++++++++++++++--------------
 1 files changed, 16 insertions(+), 14 deletions(-)
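
For readers comparing the two versions of MPID_Abort(), the following is a
minimal sketch of the decision logic before and after this commit, modeled as
pure functions so it can be checked without terminating the process. The enum,
the function names, and the integer flags standing in for the
'MPIDI_NO_ASSERT' and 'DYNAMIC_TASKING' build macros are hypothetical and are
not part of MPICH; only the ordering of the checks is taken from the diff
above.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>
#include <strings.h>   /* strncasecmp (POSIX) */

/* Hypothetical labels for the three ways the function could end. */
enum abort_action { ACTION_EXIT_1, ACTION_PMI2_ABORT, ACTION_ABORT };

/* Old ordering, as read from the removed lines of the diff.
 * no_assert / dynamic_tasking stand in for the build-time macros
 * MPIDI_NO_ASSERT / DYNAMIC_TASKING; env is the value of
 * PAMID_CORE_ON_ABORT, or NULL when unset. */
static enum abort_action old_action(int no_assert, int dynamic_tasking,
                                    const char *env)
{
    if (no_assert)
        return ACTION_EXIT_1;            /* checked first, so it always won */
    if (env != NULL &&
        (strncasecmp("no", env, 2) == 0 ||
         strncasecmp("exit", env, 4) == 0 ||
         strncmp("0", env, 1) == 0))
        return ACTION_EXIT_1;            /* user opted out of the core dump */
    if (dynamic_tasking)
        return ACTION_PMI2_ABORT;
    return ACTION_ABORT;                 /* abort(): core dump */
}

/* New ordering: DYNAMIC_TASKING is checked first, otherwise exit(1).
 * The environment variable is no longer consulted. */
static enum abort_action new_action(int dynamic_tasking)
{
    return dynamic_tasking ? ACTION_PMI2_ABORT : ACTION_EXIT_1;
}

/* Human-readable name for an action, for printing a comparison. */
static const char *action_name(enum abort_action a)
{
    switch (a) {
    case ACTION_EXIT_1:     return "exit(1)";
    case ACTION_PMI2_ABORT: return "PMI2_Abort()";
    case ACTION_ABORT:      return "abort()";
    }
    assert(0 && "unreachable");
    return "unknown";
}
```

With both flags set, as the commit message says is typical for PE,
old_action(1, 1, NULL) yields ACTION_EXIT_1 while new_action(1) yields
ACTION_PMI2_ABORT, which matches the observation that the dynamic tasking
path was previously never reached.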


hooks/post-receive
-- 
MPICH primary repository

