[mpich-discuss] turning off MPI abort messages

Jeff Hammond jeff.science at gmail.com
Fri Feb 21 17:01:50 CST 2014


On Fri, Feb 21, 2014 at 4:32 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>    Jeff,
>
>      Thanks. This is certainly a useful thing.

It's only half a solution right now.  Hacking Hydra is a bit more
difficult for me.  Not sure how long before I can solve that in a
manner that the MPICH folks find acceptable.

>       I never meant to kick a hornet’s nest with my initial email. I was taught by my postdoctoral advisor that any library or package that had stdout or stderr output hardwired that could not be turned off without losing functionality was rude and poorly thought out, but then that guy probably never amounted to anything, I guess, so I should just ignore him since he doesn’t represent mainstream thought.

A colleague of mine suggested that libraries shouldn't call MPI_Abort
at all, but should instead return an error code to the application and
let it decide how to handle the error. But he learned MPI from Bill
Gropp, so he might not know anything ;-)
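
To make that suggestion concrete, here is a minimal sketch in plain C.
It is not MPI code: lib_err, lib_divide, lib_strerror and
app_divide_or_default are hypothetical names invented for illustration.
Within MPI itself, the analogous mechanism is to set the
MPI_ERRORS_RETURN error handler on a communicator with
MPI_Comm_set_errhandler, so that calls return error codes instead of
aborting.

```c
#include <stdio.h>
#include <stddef.h>

/* Hypothetical error codes for an illustrative library (not MPI). */
typedef enum {
    LIB_SUCCESS = 0,
    LIB_ERR_ARG = 1,
    LIB_ERR_DIVZERO = 2
} lib_err;

/* Library routine: reports failure through its return value only.
 * It never prints and never aborts on its own. */
lib_err lib_divide(int num, int den, int *out)
{
    if (out == NULL)
        return LIB_ERR_ARG;
    if (den == 0)
        return LIB_ERR_DIVZERO;
    *out = num / den;
    return LIB_SUCCESS;
}

/* Turn a code into text, so the application can print it if it wants. */
const char *lib_strerror(lib_err e)
{
    switch (e) {
    case LIB_SUCCESS:     return "success";
    case LIB_ERR_ARG:     return "invalid argument";
    case LIB_ERR_DIVZERO: return "division by zero";
    }
    return "unknown error";
}

/* Application-side policy: the caller decides whether to log (log may
 * be NULL for silence) and how to recover (here, a fallback value). */
int app_divide_or_default(int num, int den, int fallback, FILE *log)
{
    int q = 0;
    lib_err e = lib_divide(num, den, &q);
    if (e != LIB_SUCCESS) {
        if (log != NULL)
            fprintf(log, "divide failed: %s\n", lib_strerror(e));
        return fallback;
    }
    return q;
}
```

An application that wants diagnostics passes stderr as the log
argument; one that has already reported the problem itself passes NULL
and gets no extra output. The decision to print or to terminate stays
with the application rather than the library.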

I apologize for being unpleasant earlier.

Best,

Jeff


>
>    Barry
>
> On Feb 21, 2014, at 3:10 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
>> Barry:
>>
>> Would the following behavior be acceptable to you?  I have only made
>> the changes in MPI but am looking at the process manager now.
>>
>> Jeff
>>
>>
>> # Without the process manager
>>
>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=0
>> alcfwl181:build jhammond$ ./a.out
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1
>> alcfwl181:build jhammond$ ./a.out
>>
>> alcfwl181:build jhammond$ unset MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
>> alcfwl181:build jhammond$ ./a.out
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>
>> # With the process manager
>>
>> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 61023 RUNNING AT alcfwl181.alcf.anl.gov
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out
>>
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 61026 RUNNING AT alcfwl181.alcf.anl.gov
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>> alcfwl181:build jhammond$ mpiexec -n 1 ./a.out
>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>
>> ===================================================================================
>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> =   PID 61032 RUNNING AT alcfwl181.alcf.anl.gov
>> =   EXIT CODE: 1
>> =   CLEANING UP REMAINING PROCESSES
>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> ===================================================================================
>>
>>
>>
>> On Thu, Feb 20, 2014 at 11:33 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>
>>>   Is there any way to turn off MPICH (and others) printing messages about MPI_Abort?  We have already prepared and presented useful error messages to the user about the situation and would like to avoid having these additional messages printed (that often make the situation look worse than it is)
>>>
>>>    Thanks
>>>
>>>   Barry
>>>
>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>> [cli_0]: aborting job:
>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>
>>> ===================================================================================
>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> =   EXIT CODE: 56
>>> =   CLEANING UP REMAINING PROCESSES
>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>>
>>>
>>>
>>>
>>> _______________________________________________
>>> discuss mailing list     discuss at mpich.org
>>> To manage subscription options or unsubscribe:
>>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com



-- 
Jeff Hammond
jeff.science at gmail.com


