[mpich-discuss] turning off MPI abort messages

Jeff Hammond jeff.science at gmail.com
Fri Feb 21 16:05:59 CST 2014


https://trac.mpich.org/projects/mpich/ticket/2038 has the patches.

Jeff

On Fri, Feb 21, 2014 at 3:47 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> Barry:
>
> Can you tolerate the following workaround for Hydra's error cleanup or
> do you need it to be internal?  I presume you know enough bash to
> generalize a.sh appropriately.
>
> alcfwl181:build jhammond$ cat a.sh
> #!/bin/sh
> $1
> true
> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.sh ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.sh ./a.out
>
> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 61123 RUNNING AT alcfwl181.alcf.anl.gov
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 61126 RUNNING AT alcfwl181.alcf.anl.gov
> =   EXIT CODE: 1
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
>
> On Fri, Feb 21, 2014 at 3:10 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>> Barry:
>>
>> Would the following behavior be acceptable to you?  I have only made
>> the changes in MPI but am looking at the process manager now.
>>
>> Jeff
>>
>>
>> # Without the process manager
>>
>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=0
>> alcfwl181:build jhammond$ ./a.out
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1
>> alcfwl181:build jhammond$ ./a.out
>>
>> alcfwl181:build jhammond$ unset MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
>> alcfwl181:build jhammond$ ./a.out
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>
>> # With the process manager
>>
>> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 61023 RUNNING AT alcfwl181.alcf.anl.gov
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out
>>
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 61026 RUNNING AT alcfwl181.alcf.anl.gov
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> alcfwl181:build jhammond$ mpiexec -n 1 ./a.out
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 61032 RUNNING AT alcfwl181.alcf.anl.gov
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>>
>>
>>
>> On Thu, Feb 20, 2014 at 11:33 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>    Is there any way to turn off MPICH (and others) printing messages about MPI_Abort?  We have already prepared and presented useful error messages to the user about the situation and would like to avoid having these additional messages printed (they often make the situation look worse than it is).
>>>
>>>     Thanks
>>>
>>>    Barry
>>>
>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>> [cli_0]: aborting job:
>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   EXIT CODE: 56
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>>
>>>
>>>
>>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com



-- 
Jeff Hammond
jeff.science at gmail.com
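
One way the a.sh workaround quoted above might be generalized, as a minimal sketch and not something taken from the patches: forward the whole command line with "$@" instead of only $1, so the wrapped program can take arguments, and always report success so that Hydra sees exit code 0. The function name run_masked and the MASK_EXIT_LOG variable are hypothetical, introduced here only for illustration.

```shell
#!/bin/sh
# Sketch of a generalized a.sh. Assumption: the goal is only to keep the
# launcher from seeing a nonzero exit code, as in the original two-line script.

run_masked() {
    # Run the command with all of its arguments; remember its real exit code.
    status=0
    "$@" || status=$?

    # Hypothetical extra: if MASK_EXIT_LOG names a file, record the real
    # exit code there so it is not lost entirely.
    if [ -n "${MASK_EXIT_LOG:-}" ]; then
        echo "$status" > "$MASK_EXIT_LOG"
    fi

    # Always report success, like the trailing `true` in a.sh.
    return 0
}

# When run as a script with arguments, wrap the given command.
if [ "$#" -gt 0 ]; then
    run_masked "$@"
fi
```

Saved under some name such as wrapper.sh (also hypothetical), it would be used the same way as a.sh: mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./wrapper.sh ./a.out arg1 arg2. The trade-off is that the launcher no longer sees the application's real exit code, so anything that relies on a failing status from the wrapped process has to get it another way, for example from the optional log file.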

