[mpich-discuss] turning off MPI abort messages

Jed Brown jed at jedbrown.org
Sat Feb 22 14:07:59 CST 2014


Jeff Hammond <jeff.science at gmail.com> writes:

> https://trac.mpich.org/projects/mpich/ticket/2038 has the patches.

Although I thought I once had an account on Trac, it doesn't seem to
know about me any more.  Anyway, when MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
is set, this patch passes an uninitialized abort_str on to MPID_Abort.

    char abort_str[100], comm_name[MPI_MAX_OBJECT_NAME];
    ...
    if (!MPIR_CVAR_SUPPRESS_ABORT_MESSAGE)
        /* FIXME: This is not internationalized */
        MPIU_Snprintf(abort_str, 100, "application called MPI_Abort(%s, %d) - process %d", comm_name, errorcode, comm_ptr->rank);
    mpi_errno = MPID_Abort( comm_ptr, mpi_errno, errorcode, abort_str );


==27285== Conditional jump or move depends on uninitialised value(s)
==27285==    at 0x56F2AE8: vfprintf (in /usr/lib/libc-2.19.so)
==27285==    by 0x56F5630: buffered_vfprintf (in /usr/lib/libc-2.19.so)
==27285==    by 0x56F06BD: vfprintf (in /usr/lib/libc-2.19.so)
==27285==    by 0x4E96336: MPIU_Error_printf (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
==27285==    by 0x4EC0D93: MPID_Abort (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
==27285==    by 0x40795B6: MPI_Abort (in /home/jed/usr/mpich-clang/lib/libpmpich.so.12.0.0)
==27285==    by 0x400808: main (in /home/jed/lang/mpi/a.out)
==27285== 
==27285== Syscall param write(buf) points to uninitialised byte(s)
==27285==    at 0x5783470: __write_nocancel (in /usr/lib/libc-2.19.so)
==27285==    by 0x571E472: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib/libc-2.19.so)
==27285==    by 0x571DB32: new_do_write (in /usr/lib/libc-2.19.so)
==27285==    by 0x571EA85: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib/libc-2.19.so)
==27285==    by 0x56F56C5: buffered_vfprintf (in /usr/lib/libc-2.19.so)
==27285==    by 0x56F06BD: vfprintf (in /usr/lib/libc-2.19.so)
==27285==    by 0x4E96336: MPIU_Error_printf (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
==27285==    by 0x4EC0D93: MPID_Abort (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
==27285==    by 0x40795B6: MPI_Abort (in /home/jed/usr/mpich-clang/lib/libpmpich.so.12.0.0)
==27285==    by 0x400808: main (in /home/jed/lang/mpi/a.out)
==27285==  Address 0xffeffd130 is on thread 1's stack

So I fixed this by initializing abort_str:

diff --git i/src/mpi/init/abort.c w/src/mpi/init/abort.c
index f0b4cdc..bb1a63b 100644
--- i/src/mpi/init/abort.c
+++ w/src/mpi/init/abort.c
@@ -74,7 +74,7 @@ int MPI_Abort(MPI_Comm comm, int errorcode)
     int mpi_errno = MPI_SUCCESS;
     MPID_Comm *comm_ptr = NULL;
     /* FIXME: 100 is arbitrary and may not be long enough */
-    char abort_str[100], comm_name[MPI_MAX_OBJECT_NAME];
+    char abort_str[100] = "", comm_name[MPI_MAX_OBJECT_NAME];
     int len = MPI_MAX_OBJECT_NAME;
     MPID_MPI_STATE_DECL(MPID_STATE_MPI_ABORT);
 
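
The effect of that initializer can be sketched in isolation.  This is a
minimal stand-alone sketch, not MPICH code: format_abort_message and its
suppress parameter are hypothetical stand-ins for the
MPIR_CVAR_SUPPRESS_ABORT_MESSAGE check, and plain snprintf is assumed in
place of MPIU_Snprintf.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical stand-in for the message formatting in MPI_Abort.
 * `suppress` plays the role of MPIR_CVAR_SUPPRESS_ABORT_MESSAGE. */
void format_abort_message(char *buf, size_t len, int suppress,
                          const char *comm_name, int errorcode, int rank)
{
    assert(buf != NULL && len > 0);

    /* Like `char abort_str[100] = ""`, this makes buf a valid empty
     * string before the conditional write.  (The array initializer
     * additionally zero-fills the rest of the array.) */
    buf[0] = '\0';

    if (!suppress)
        snprintf(buf, len,
                 "application called MPI_Abort(%s, %d) - process %d",
                 comm_name, errorcode, rank);
}
```

With suppress set, the caller gets an empty string rather than whatever
happened to be on the stack, so a later "%s" print reads only defined bytes.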

and now I can sort of suppress the output:

$ MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1 ./a.out

$

so it still prints a blank line, which may not be acceptable if the
program's output is consumed as a stream, but is otherwise fine.  Passing
abort_str=NULL is already used for something else ("internal ABORT"), so
that is not an option, but the following cleans up the output.

diff --git i/src/mpid/ch3/src/mpid_abort.c w/src/mpid/ch3/src/mpid_abort.c
index f0877ca..74b8a56 100644
--- i/src/mpid/ch3/src/mpid_abort.c
+++ w/src/mpid/ch3/src/mpid_abort.c
@@ -94,7 +94,7 @@ int MPID_Abort(MPID_Comm * comm, int mpi_errno, int exit_code,
 #elif defined(MPIDI_DEV_IMPLEMENTS_ABORT)
     MPIDI_CH3I_PMI_Abort(exit_code, error_msg);
 #else
-    MPIU_Error_printf("%s\n", error_msg);
+    if (error_msg[0]) MPIU_Error_printf("%s\n", error_msg);
     fflush(stderr);
 #endif
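
The guard in that hunk can be sketched on its own.  This is an
illustrative stand-alone version under stated assumptions, not the MPICH
source: print_abort_message is a hypothetical helper, fprintf stands in
for MPIU_Error_printf, and the message is assumed to be non-NULL because
the NULL case means something else.

```c
#include <assert.h>
#include <stdio.h>

/* Hypothetical helper mirroring the guarded print in MPID_Abort.
 * Returns 1 if a line was written, 0 if the message was empty. */
int print_abort_message(FILE *out, const char *error_msg)
{
    int printed = 0;

    /* Assumption: NULL is reserved for the "internal ABORT" case and
     * is handled before this point, so only "" means "suppressed". */
    assert(out != NULL && error_msg != NULL);

    if (error_msg[0]) {
        /* Non-empty message: print it followed by a newline. */
        fprintf(out, "%s\n", error_msg);
        printed = 1;
    }
    /* As in the patch, the flush happens whether or not we printed. */
    fflush(out);
    return printed;
}
```

Testing the first byte keeps the empty string as the "say nothing" signal,
so a suppressed message produces no output at all, not even a newline.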

If this is acceptable, a similar change should be applied to the other
devices.