[mpich-discuss] turning off MPI abort messages
Barry Smith
bsmith at mcs.anl.gov
Fri Feb 21 18:06:54 CST 2014
On Feb 21, 2014, at 5:01 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> On Fri, Feb 21, 2014 at 4:32 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>> Jeff,
>>
>> Thanks. This is certainly a useful thing.
>
> It's only half a solution right now. Hacking Hydra is a bit more
> difficult for me. Not sure how long before I can solve that in a
> manner that the MPICH folks find acceptable.
>
>> I never meant to kick a hornet’s nest with my initial email. I was taught by my postdoctoral advisor that any library or package with hardwired stdout or stderr output that could not be turned off without losing functionality was rude and poorly thought out. But then that guy probably never amounted to anything, I guess, so I should just ignore him since he doesn’t represent mainstream thought.
>
> A colleague of mine suggested that libraries shouldn't be calling
> MPI_Abort but rather return an error code to the application and let
> them decide how to handle it, but he learned MPI from Bill Gropp, so
> he might not know anything ;-)
Actually, "the library" isn't "calling" MPI_Abort; the library's default error handler eventually calls MPI_Abort(). The library returns error codes to the application code, and the application code is free to handle them any way it likes, as well as to set its own error handlers.
Barry
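
To make the distinction concrete, here is a minimal sketch of that pattern in Python. Every name in it (abort_job, default_handler, set_error_handler, library_solve) is hypothetical and chosen for illustration; none of them belongs to a real library's API, and abort_job only stands in for MPI_Abort so the sketch runs without MPI. The error code 56 is borrowed from the output quoted further down.

```python
import sys

# Hypothetical names throughout: this mimics the pattern described above,
# not the API of any real library.


def abort_job(code):
    """Stand-in for MPI_Abort(MPI_COMM_WORLD, code); here it only raises SystemExit."""
    raise SystemExit(code)


def default_handler(code, message):
    """Default behavior: report the error, then abort the whole job."""
    print(f"library error {code}: {message}", file=sys.stderr)
    abort_job(code)


_handler = default_handler


def set_error_handler(handler):
    """Install an application-supplied handler and return the previous one."""
    global _handler
    previous, _handler = _handler, handler
    return previous


def library_solve(x):
    """A library routine: returns 0 on success, a nonzero error code on failure."""
    if x < 0:
        code = 56
        _handler(code, "negative input")  # the default handler never returns
        return code                       # reached only with a non-aborting handler
    return 0


if __name__ == "__main__":
    seen = []
    # The application installs its own handler that records the error quietly.
    previous = set_error_handler(lambda code, message: seen.append((code, message)))
    status = library_solve(-1.0)
    print("status:", status, "recorded:", seen)
    set_error_handler(previous)
```

With the default handler left in place, the same call ends the job through abort_job; with an application handler installed, the routine simply hands the error code back to its caller.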
>
> I apologize for being unpleasant earlier.
>
> Best,
>
> Jeff
>
>
>>
>> Barry
>>
>> On Feb 21, 2014, at 3:10 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>
>>> Barry:
>>>
>>> Would the following behavior be acceptable to you? I have only made
>>> the changes in MPI but am looking at the process manager now.
>>>
>>> Jeff
>>>
>>>
>>> # Without the process manager
>>>
>>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=0
>>> alcfwl181:build jhammond$ ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>> alcfwl181:build jhammond$ export MPIR_CVAR_SUPPRESS_ABORT_MESSAGE=1
>>> alcfwl181:build jhammond$ ./a.out
>>>
>>> alcfwl181:build jhammond$ unset MPIR_CVAR_SUPPRESS_ABORT_MESSAGE
>>> alcfwl181:build jhammond$ ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>
>>> # With the process manager
>>>
>>> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 0 ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>
>>> ===================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 61023 RUNNING AT alcfwl181.alcf.anl.gov
>>> = EXIT CODE: 1
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> alcfwl181:build jhammond$ mpiexec -n 1 -env MPIR_CVAR_SUPPRESS_ABORT_MESSAGE 1 ./a.out
>>>
>>>
>>> ===================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 61026 RUNNING AT alcfwl181.alcf.anl.gov
>>> = EXIT CODE: 1
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>> alcfwl181:build jhammond$ mpiexec -n 1 ./a.out
>>> application called MPI_Abort(MPI_COMM_WORLD, 1) - process 0
>>>
>>> ===================================================================================
>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>> = PID 61032 RUNNING AT alcfwl181.alcf.anl.gov
>>> = EXIT CODE: 1
>>> = CLEANING UP REMAINING PROCESSES
>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>> ===================================================================================
>>>
>>>
>>>
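
The toggle shown in the transcript above can be sketched as a small environment-variable gate. This is only an illustration in Python, not MPICH's implementation: the thread does not show how MPICH parses the variable, so treating any nonzero integer as "suppress" and anything unset or unparsable as "print" is an assumption, and the helper names suppress_abort_message and abort_message are hypothetical.

```python
import os

VAR = "MPIR_CVAR_SUPPRESS_ABORT_MESSAGE"


def suppress_abort_message(environ=None):
    """Return True only when the variable is set to a nonzero integer (assumed parsing)."""
    environ = os.environ if environ is None else environ
    value = environ.get(VAR, "0").strip()
    try:
        return int(value) != 0
    except ValueError:
        # Assumption: an unparsable value leaves the message enabled.
        return False


def abort_message(code, rank, environ=None):
    """Return the abort line to print, or None when it is suppressed."""
    if suppress_abort_message(environ):
        return None
    return f"application called MPI_Abort(MPI_COMM_WORLD, {code}) - process {rank}"


if __name__ == "__main__":
    # Mirrors the three cases in the transcript: set to 0, set to 1, unset.
    for setting in ({VAR: "0"}, {VAR: "1"}, {}):
        print(setting, "->", abort_message(1, 0, setting))
```

This covers only the message printed by the MPI library itself. As the transcript shows, the BAD TERMINATION banner from the process manager is still printed in every case.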
>>> On Thu, Feb 20, 2014 at 11:33 AM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>>>
>>>> Is there any way to turn off MPICH (and others) printing messages about MPI_Abort? We have already prepared and presented useful error messages to the user about the situation and would like to avoid having these additional messages printed (they often make the situation look worse than it is).
>>>>
>>>> Thanks
>>>>
>>>> Barry
>>>>
>>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>> [cli_0]: aborting job:
>>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>>>>
>>>> ===================================================================================
>>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>>>> = EXIT CODE: 56
>>>> = CLEANING UP REMAINING PROCESSES
>>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>>>> ===================================================================================
>>>>
>>>>
>>>>
>>>>
>>>> _______________________________________________
>>>> discuss mailing list discuss at mpich.org
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com