[mpich-discuss] turning off MPI abort messages

Jeff Hammond jeff.science at gmail.com
Fri Feb 21 13:20:47 CST 2014


Aron told me that I should have said "we look forward to your patch"
earlier instead of causing you guys to go apoplectic over my different
definition of the word crash.

Jeff

On Fri, Feb 21, 2014 at 1:13 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>
>   Understood. But I would like to eliminate both sets of error messages and still provide a useful “return code”. Perhaps through compile-time options to the library?
>
>    Barry
>
> On Feb 21, 2014, at 12:40 PM, Jim Dinan <james.dinan at gmail.com> wrote:
>
>> A little more detail -- you're actually getting messages from two sources: (1) the MPICH library ("application called MPI_Abort...") and (2) the job launcher ("BAD TERMINATION...").  You can eliminate the messages from the job launcher by providing an error code of 0 in MPI_Abort.
>>
>>  ~Jim.
>>
>>
>>
>>
>> On Fri, Feb 21, 2014 at 1:19 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>> >> Just configure MPICH such that snprintf isn't discovered by configure
>> >> and you won't see these messages.
>> >>
>> >> The other solution is to fix PETSc so that people can't crash it so easily ;-)
>> >
>> >    Here we go again. It is not CRASHING; it has detected an error condition and is trying to terminate appropriately and cleanly. The reason it needs to use MPI_Abort() is that detecting error conditions is often not a uniformly collective thing.
>> >
>> >     Printing a suitable error message and ending is not crashing. But with all the badly formatted “error messages” that MPICH prints at the end, which I cannot control, it looks like it is crashing.
>>
>> You're returning a non-zero exit code, which I consider crashing.  I
>> apologize if this definition disagrees with yours.  If this is just
>> gentle cleanup, why not exit with code=0 as Jim suggested already?
>>
>> Jeff
>>
>> >> On Thu, Feb 20, 2014 at 3:19 PM, Jim Dinan <james.dinan at gmail.com> wrote:
>> >>> If you can find a way to call MPI_Finalize instead, you will portably
>> >>> eliminate these messages.
>> >>>
>> >>> A lesser solution would be to provide an error code of 0 (or MPI_SUCCESS) to
>> >>> MPI_Abort, e.g. MPI_Abort(MPI_COMM_WORLD, MPI_SUCCESS).  This would
>> >>> eliminate the error message that you are getting from the job launcher.
>> >>> MPICH could be modified to be quiet about the abort when the application
>> >>> aborts with an error code of MPI_SUCCESS.
>> >>>
>> >>> ~Jim.
>> >>>
>> >>>
>> >>> On Thu, Feb 20, 2014 at 12:33 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>> >>>>
>> >>>>
>> >>>>   Is there any way to turn off MPICH (and others) printing messages about
>> >>>> MPI_Abort?  We have already prepared and presented useful error messages to
>> >>>> the user about the situation and would like to avoid having these additional
>> >>>> messages printed (that often make the situation look worse than it is)
>> >>>>
>> >>>>    Thanks
>> >>>>
>> >>>>   Barry
>> >>>>
>> >>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>> >>>> [cli_0]: aborting job:
>> >>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>> >>>>
>> >>>>
>> >>>> ===================================================================================
>> >>>> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> >>>> =   EXIT CODE: 56
>> >>>> =   CLEANING UP REMAINING PROCESSES
>> >>>> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> >>>>
>> >>>> ===================================================================================
>> >>>>
>> >>>>
>> >>>>
>> >>>>
>> >>>> _______________________________________________
>> >>>> discuss mailing list     discuss at mpich.org
>> >>>> To manage subscription options or unsubscribe:
>> >>>> https://lists.mpich.org/mailman/listinfo/discuss
>> >>>
>> >>>
>> >>>
>> >>
>> >>
>> >>
>> >> --
>> >> Jeff Hammond
>> >> jeff.science at gmail.com
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com



-- 
Jeff Hammond
jeff.science at gmail.com
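
The trade-off discussed in this thread, between passing the application's real error code to MPI_Abort and passing 0 as Jim suggests to quiet the job launcher, can be sketched roughly as follows. This is only an illustration under stated assumptions: abort_code, app_terminate, and the USE_MPI build macro are hypothetical names that do not come from MPICH or PETSc, and the non-MPI branch is a stand-in so the exit-code logic can be tried without an MPI installation.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>

#ifdef USE_MPI            /* hypothetical macro: define it when building with mpicc */
#include <mpi.h>
#endif

/* Hypothetical helper: choose the code handed to MPI_Abort.
 * quiet_launcher != 0 follows Jim's suggestion (pass 0);
 * otherwise the application's own error code is preserved. */
int abort_code(int app_error, int quiet_launcher)
{
    return quiet_launcher ? 0 : app_error;
}

/* Hypothetical helper: print the application's own message, then
 * terminate the whole job from whichever rank detected the error. */
void app_terminate(int app_error, int quiet_launcher)
{
    assert(app_error != 0);   /* only meant to be called on a real error */
    fprintf(stderr, "application error %d, shutting down\n", app_error);
#ifdef USE_MPI
    MPI_Abort(MPI_COMM_WORLD, abort_code(app_error, quiet_launcher));
#else
    exit(abort_code(app_error, quiet_launcher));   /* stand-in without MPI */
#endif
}
```

With quiet_launcher set, the code 56 from Barry's example becomes 0, which is exactly the loss of a useful return code that he objects to above; with it unset, the code is preserved and the launcher's "BAD TERMINATION" banner should be expected.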


