[mpich-discuss] turning off MPI abort messages

Jeff Hammond jeff.science at gmail.com
Sat Feb 22 14:27:38 CST 2014


Thanks for catching this.  I tested my patch, but not under valgrind.

Thanks also for figuring out the line break issue.  I figured that was
coming from the device but didn't track it down.

Jeff

On Sat, Feb 22, 2014 at 2:07 PM, Jed Brown <jed at jedbrown.org> wrote:
> Jeff Hammond <jeff.science at gmail.com> writes:
>
>> https://trac.mpich.org/projects/mpich/ticket/2038 has the patches.
>
> Although I thought I once had an account on Trac, it doesn't seem to
> know about me any more.  Anyway, when the message is suppressed, this
> patch passes an uninitialized abort_str on to MPID_Abort:
>
>     char abort_str[100], comm_name[MPI_MAX_OBJECT_NAME];
>     ...
>     if (!MPIR_CVAR_SUPPRESS_ABORT_MESSAGE)
>         /* FIXME: This is not internationalized */
>         MPIU_Snprintf(abort_str, 100, "application called MPI_Abort(%s, %d) - process %d", comm_name, errorcode, comm_ptr->rank);
>     mpi_errno = MPID_Abort( comm_ptr, mpi_errno, errorcode, abort_str );
>
>
> ==27285== Conditional jump or move depends on uninitialised value(s)
> ==27285==    at 0x56F2AE8: vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x56F5630: buffered_vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x56F06BD: vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x4E96336: MPIU_Error_printf (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
> ==27285==    by 0x4EC0D93: MPID_Abort (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
> ==27285==    by 0x40795B6: MPI_Abort (in /home/jed/usr/mpich-clang/lib/libpmpich.so.12.0.0)
> ==27285==    by 0x400808: main (in /home/jed/lang/mpi/a.out)
> ==27285==
> ==27285== Syscall param write(buf) points to uninitialised byte(s)
> ==27285==    at 0x5783470: __write_nocancel (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x571E472: _IO_file_write@@GLIBC_2.2.5 (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x571DB32: new_do_write (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x571EA85: _IO_file_xsputn@@GLIBC_2.2.5 (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x56F56C5: buffered_vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x56F06BD: vfprintf (in /usr/lib/libc-2.19.so)
> ==27285==    by 0x4E96336: MPIU_Error_printf (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
> ==27285==    by 0x4EC0D93: MPID_Abort (in /home/jed/usr/mpich-clang/lib/libmpich.so.12.0.0)
> ==27285==    by 0x40795B6: MPI_Abort (in /home/jed/usr/mpich-clang/lib/libpmpich.so.12.0.0)
> ==27285==    by 0x400808: main (in /home/jed/lang/mpi/a.out)
> ==27285==  Address 0xffeffd130 is on thread 1's stack
>
> So I fixed this by initializing abort_str to an empty string:
>
> diff --git i/src/mpi/init/abort.c w/src/mpi/init/abort.c
> index f0b4cdc..bb1a63b 100644
> --- i/src/mpi/init/abort.c
> +++ w/src/mpi/init/abort.c
> @@ -74,7 +74,7 @@ int MPI_Abort(MPI_Comm comm, int errorcode)
>      int mpi_errno = MPI_SUCCESS;
>      MPID_Comm *comm_ptr = NULL;
>      /* FIXME: 100 is arbitrary and may not be long enough */
> -    char abort_str[100], comm_name[MPI_MAX_OBJECT_NAME];
> +    char abort_str[100] = "", comm_name[MPI_MAX_OBJECT_NAME];
>      int len = MPI_MAX_OBJECT_NAME;
>      MPID_MPI_STATE_DECL(MPID_STATE_MPI_ABORT);
>
>
> and now I can sort of suppress the output:
>
> $ MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1 ./a.out
>
> $
>
> so it still prints a blank line, which may not be acceptable if the
> program's output is consumed as a stream, but is otherwise fine.  Passing
> abort_str=NULL is already used for something else ("internal ABORT"), so
> instead the following skips the print when the message is empty:
>
> diff --git i/src/mpid/ch3/src/mpid_abort.c w/src/mpid/ch3/src/mpid_abort.c
> index f0877ca..74b8a56 100644
> --- i/src/mpid/ch3/src/mpid_abort.c
> +++ w/src/mpid/ch3/src/mpid_abort.c
> @@ -94,7 +94,7 @@ int MPID_Abort(MPID_Comm * comm, int mpi_errno, int exit_code,
>  #elif defined(MPIDI_DEV_IMPLEMENTS_ABORT)
>      MPIDI_CH3I_PMI_Abort(exit_code, error_msg);
>  #else
> -    MPIU_Error_printf("%s\n", error_msg);
> +    if (error_msg[0]) MPIU_Error_printf("%s\n", error_msg);
>      fflush(stderr);
>  #endif
>
> If this is acceptable, a similar change should be applied to the other
> devices.
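
For anyone reading this thread later, here is a minimal standalone sketch of
the two changes discussed above, without any MPICH dependency.  It is only an
illustration: suppress_abort_message, format_abort_message, and report_abort
are hypothetical stand-ins for MPIR_CVAR_SUPPRESS_ABORT_MESSAGE, the
formatting in MPI_Abort, and the print in MPID_Abort, and plain snprintf and
fprintf replace MPIU_Snprintf and MPIU_Error_printf.

```c
#include <stdio.h>

/* Hypothetical stand-in for MPIR_CVAR_SUPPRESS_ABORT_MESSAGE. */
static int suppress_abort_message = 0;

/* Always leaves buf as a valid string: empty when the message is
 * suppressed, otherwise the formatted abort message.  This mirrors the
 * effect of initializing abort_str to "" in the first patch. */
static void format_abort_message(char *buf, size_t n, const char *comm_name,
                                 int errorcode, int rank)
{
    buf[0] = '\0';
    if (!suppress_abort_message)
        snprintf(buf, n, "application called MPI_Abort(%s, %d) - process %d",
                 comm_name, errorcode, rank);
}

/* Prints the message followed by a newline only when it is non-empty, so a
 * suppressed message does not produce a blank line.  Returns the number of
 * characters written (0 if nothing was printed).  The NULL check is an
 * extra guard for this sketch; in MPICH, NULL has its own meaning. */
static int report_abort(FILE *out, const char *error_msg)
{
    int written = 0;
    if (error_msg && error_msg[0])
        written = fprintf(out, "%s\n", error_msg);
    fflush(out);
    return written;
}
```

A caller would declare a buffer such as char abort_str[100], call
format_abort_message on it, and pass the result to report_abort.  With
suppress_abort_message set to 1, nothing is written and no uninitialized
bytes are read.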



-- 
Jeff Hammond
jeff.science at gmail.com