[mpich-discuss] turning off MPI abort messages
Jeff Hammond
jeff.science at gmail.com
Fri Feb 21 13:35:39 CST 2014
I have already implemented the CVAR logic to suppress the output in
MPI_Abort, so the only convincing still needed is whether the MPICH
folks will accept this patch.
I'm not sure whether I will bother to do anything to Hydra. Seems
like one could wrap the binary in a "nofail" script (we use these on
BG a lot because of how the control system responds to error codes) to
keep Hydra from complaining. However, this is shell magic and clearly
I don't know anything about shell because I think nonzero error codes
in MPI programs imply that ls crashes when looking for missing files.
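
For what it's worth, a "nofail" wrapper could look roughly like the
sketch below. It is only an illustration under assumptions: the function
name nofail, the stderr message format, and the file name nofail.sh are
made up here and are not anything shipped with MPICH or the BG software.
It runs the real command, reports a nonzero exit status on stderr, and
always returns 0 to whatever launched it. Whether Hydra actually stays
quiet when the application calls MPI_Abort underneath such a wrapper is
not something this sketch establishes.

```shell
#!/bin/sh
# Hypothetical "nofail" wrapper (name and message format are assumptions).
# Runs the command given as arguments, reports a nonzero exit status on
# stderr, and always returns 0 so the caller never sees a failure.
nofail() {
    "$@"
    rc=$?
    if [ "$rc" -ne 0 ]; then
        echo "nofail: command exited with status $rc" >&2
    fi
    return 0
}

# When run as a script with arguments, wrap them; with no arguments, do nothing.
[ "$#" -eq 0 ] || nofail "$@"
```

Saved as an executable nofail.sh, it would be invoked as something like
"mpiexec -n 4 ./nofail.sh ./app", where ./app is a placeholder for the
real binary. The price is that the launcher no longer sees the real exit
code, so anything that needs it has to pick it up from stderr.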
Jeff
On Fri, Feb 21, 2014 at 1:28 PM, Jim Dinan <james.dinan at gmail.com> wrote:
> Barry,
>
> Do you need a portable solution that works across different MPI
> implementations, or does a solution for making just MPICH silent address
> your need? If the latter, you could probably convince someone on the MPICH
> team to add an environment variable for MPICH and a command-line "quiet"
> flag for hydra/mpiexec.
>
> ~Jim.
>
>
> On Fri, Feb 21, 2014 at 2:13 PM, Barry Smith <bsmith at mcs.anl.gov> wrote:
>>
>>
>>    Understood. But I would like to eliminate both sets of error messages
>> and still provide a useful “return code”. Perhaps compile-time options for
>> the library?
>>
>> Barry
>>
>> On Feb 21, 2014, at 12:40 PM, Jim Dinan <james.dinan at gmail.com> wrote:
>>
>> > A little more detail -- you're actually getting messages from two
>> > sources: (1) the MPICH library ("application called MPI_Abort...") and (2)
>> > the job launcher ("BAD TERMINATION..."). You can eliminate the messages
>> > from the job launcher by providing an error code of 0 in MPI_Abort.
>> >
>> > ~Jim.
>> >
>> >
>> >
>> >
>> > On Fri, Feb 21, 2014 at 1:19 PM, Jeff Hammond <jeff.science at gmail.com>
>> > wrote:
>> > >> Just configure MPICH such that snprintf isn't discovered by configure
>> > >> and you won't see these messages.
>> > >>
>> > >> The other solution is to fix PETSc so that people can't crash it so
>> > >> easily ;-)
>> > >
>> > > Here we go again. It is not CRASHING; it has detected an error
>> > > condition and is trying to terminate appropriately and cleanly. The
>> > > reason it needs to use MPI_Abort() is that detecting error conditions is
>> > > often not a uniformly collective thing.
>> > >
>> > > Printing a suitable error message and ending is not crashing. But
>> > > with all the badly formatted “error messages” that MPICH prints at the
>> > > end, which I cannot control, it looks like it is crashing.
>> >
>> > You're returning a non-zero exit code, which I consider crashing. I
>> > apologize if this definition disagrees with yours. If this is just
>> > gentle cleanup, why not exit with code=0 as Jim suggested already?
>> >
>> > Jeff
>> >
>> > >> On Thu, Feb 20, 2014 at 3:19 PM, Jim Dinan <james.dinan at gmail.com>
>> > >> wrote:
>> > >>> If you can find a way to call MPI_Finalize instead, you will portably
>> > >>> eliminate these messages.
>> > >>>
>> > >>> A lesser solution would be to provide an error code of 0 (or
>> > >>> MPI_SUCCESS) to MPI_Abort, e.g. MPI_Abort(MPI_COMM_WORLD, MPI_SUCCESS).
>> > >>> This would eliminate the error message that you are getting from the
>> > >>> job launcher. MPICH could be modified to be quiet about the abort when
>> > >>> the application aborts with an error code of MPI_SUCCESS.
>> > >>>
>> > >>> ~Jim.
>> > >>>
>> > >>>
>> > >>> On Thu, Feb 20, 2014 at 12:33 PM, Barry Smith <bsmith at mcs.anl.gov>
>> > >>> wrote:
>> > >>>>
>> > >>>>
>> > >>>>   Is there any way to turn off MPICH (and others) printing messages
>> > >>>> about MPI_Abort? We have already prepared and presented useful error
>> > >>>> messages to the user about the situation and would like to avoid
>> > >>>> having these additional messages printed (that often make the
>> > >>>> situation look worse than it is).
>> > >>>>
>> > >>>> Thanks
>> > >>>>
>> > >>>> Barry
>> > >>>>
>> > >>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>> > >>>> [cli_0]: aborting job:
>> > >>>> application called MPI_Abort(MPI_COMM_WORLD, 56) - process 0
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> ===================================================================================
>> > >>>> = BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>> > >>>> = EXIT CODE: 56
>> > >>>> = CLEANING UP REMAINING PROCESSES
>> > >>>> = YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>> > >>>>
>> > >>>>
>> > >>>> ===================================================================================
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>>
>> > >>>> _______________________________________________
>> > >>>> discuss mailing list discuss at mpich.org
>> > >>>> To manage subscription options or unsubscribe:
>> > >>>> https://lists.mpich.org/mailman/listinfo/discuss
>> > >>
>> > >>
>> > >>
>> > >> --
>> > >> Jeff Hammond
>> > >> jeff.science at gmail.com
--
Jeff Hammond
jeff.science at gmail.com