[mpich-discuss] turning off MPI abort messages

Jeff Hammond jeff.science at gmail.com
Sat Feb 22 07:41:32 CST 2014


I'll look into a general setenv-like mechanism for CVARs (control variables) rather than an abort-specific one-off. It might already be available through the MPI_T interface, which is part of the MPI standard, unlike MPIX extensions.

Jeff


> On Feb 21, 2014, at 11:35 PM, Jed Brown <jed at jedbrown.org> wrote:
> 
> Jeff Hammond <jeff.science at gmail.com> writes:
> 
>> Did you look at my patch or the demonstration yet? I posted all the details this afternoon.
> 
> Yeah, I wrote the message on the plane before I could read it.
> 
>> I tried very hard to support verbosity suppression in a reasonable way at runtime. 
>> 
>> Do you really want an MPIX call that is equivalent to setenv("<see
>> patch>")? Is the extra code worth it? (These are serious questions.)
> 
> My understanding of these variables is that they are processed in
> MPI_Init.  PETSc may not have access to MPI_Init, so we're too late to
> influence the environment variable.  But if the user gets a return code
> and calls our error-handler (as we encourage them to do if they have
> nothing better to do with it), we'd like to be able to exit cleanly.  A
> global setting isn't as good because if the user encounters an error
> condition and "crashes", it probably makes sense for MPICH to print the
> information.
> 
> 
> The reason this issue came up is that we have a sizable fraction of
> support email in which the exact question is answered in the error
> message we print when they call our error handler.  But a lot of users
> don't read the error message and worse, they don't copy the whole thing
> into the email.  This happens almost every day and requires an extra
> round-trip on the list to resolve.  Our thinking was that if we can
> make the error message visually cleaner, they may be more likely to read
> it and answer their own question or copy the message into an email in
> which case we can ask them to read the relevant lines of the error
> message.  (A large fraction of these issues are because the user has
> configured something incompatible, usually via run-time options.  It is
> very analogous to asking for a file that doesn't exist.  A
> straightforward change of run-time options will get them running
> correctly.)
> 
> The MPICH Abort messages add a lot of visual clutter that we hypothesize
> makes people less likely to read our messages (which tell them how to
> fix the problem) and more likely to believe the output contains no
> useful information.
> 
> I think something along the lines of MPIX_Abort_quietly(comm,err) would
> be the right granularity since it allows our error handler to suppress
> the clutter without influencing the behavior observed in other libraries
> or the application.
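
The timing argument in the quoted message (a setting read once at init time cannot be changed by a library that loads later, while a per-call variant can still be quiet) can be sketched with a toy model in plain C. Everything named here is hypothetical and exists only for illustration: FAKE_QUIET_ABORT, fake_init, and the fake_abort_* functions are not MPICH or MPI names, and the real variable name is whatever the patch defines.

```c
#define _POSIX_C_SOURCE 200112L  /* for setenv/unsetenv */
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* All names below are hypothetical stand-ins, not MPICH APIs. */

int quiet_cached = 0;  /* setting captured once, at init time */

/* Stand-in for MPI_Init: reads the environment exactly once. */
void fake_init(void) {
    const char *v = getenv("FAKE_QUIET_ABORT");
    quiet_cached = (v != NULL && strcmp(v, "1") == 0);
}

/* Abort governed by the global setting: returns 1 if the banner
 * would be printed, 0 if it would be suppressed. */
int fake_abort_prints_banner(void) {
    return !quiet_cached;
}

/* Per-call variant in the spirit of the proposed MPIX_Abort_quietly:
 * the caller asks for silence regardless of the cached setting. */
int fake_abort_quietly_prints_banner(void) {
    return 0;
}

/* Scenario: the application initializes before the library can act. */
void demo(void) {
    unsetenv("FAKE_QUIET_ABORT");
    fake_init();                          /* application's init runs first */
    setenv("FAKE_QUIET_ABORT", "1", 1);   /* library sets it too late */
    assert(fake_abort_prints_banner() == 1);          /* still noisy */
    assert(fake_abort_quietly_prints_banner() == 0);  /* per-call is quiet */
}
```

Calling demo() from a main function exercises the scenario. In this model the late setenv only takes effect if fake_init runs again, which is the situation a library sitting on top of an already-initialized MPI cannot arrange; the per-call function avoids the problem without changing what other callers see.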


