[mpich-discuss] how to make mpich crash hard so I can gdb bt it?

David Goodell (dgoodell) dgoodell at cisco.com
Fri Oct 11 17:22:50 CDT 2013


On Oct 11, 2013, at 4:44 PM, Jeff Hammond <jeff.science at gmail.com> wrote:

> I apologize if this is a stupid question, but I would like MPICH to
> crash and burn rather than terminate gently when an error occurs so
> that I can gdb bt it to know where the error is in my program, since I
> am absolutely certain that this error is in the application and not
> MPICH.
> 
> Ideally, I would be able to select the failure mode at runtime, e.g.
> MPICH_FAILURE_MODE={hard,soft}, so that I can get either the nice
> MPICH trace when I think MPICH is the problem and a brutal
> light-the-machine-room-on-fire-and-abort-the-universe error when I
> think it's my fault.

For assertion failures, I think I used to change MPIR_Assert_fail/MPIR_Assert_fail_fmt to just call "abort()" instead of MPID_Abort.  As long as my ulimit settings allowed core files (e.g. "ulimit -c unlimited"), I'd get a core file that I could load in the debugger.  The process manager will take care of cleaning up the other processes anyway.
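
To illustrate the idea outside of MPICH, here is a minimal sketch of an assertion-failure hook that calls abort() directly.  None of this is MPICH source: demo_assert_fail, DEMO_ASSERT, demo_enable_core_dumps and demo_run_failing_child are hypothetical names made up for the example.  It assumes a POSIX system, and it raises the core-file limit from inside the process as a stand-in for setting ulimit in the shell.

```c
#include <assert.h>
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical stand-in for an assertion-failure hook (not MPICH code).
 * It reports the failed condition and then calls abort(), which raises
 * SIGABRT so the kernel can write a core file if the limits allow it. */
static void demo_assert_fail(const char *cond, const char *file, int line)
{
    fprintf(stderr, "Assertion failed in %s at line %d: %s\n", file, line, cond);
    fflush(stderr);
    abort();
}

#define DEMO_ASSERT(c) ((c) ? (void)0 : demo_assert_fail(#c, __FILE__, __LINE__))

/* Raise the soft core-file limit to the hard limit, roughly what
 * "ulimit -c" does from the shell.  Returns 0 on success, -1 on error. */
static int demo_enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Fork a child that trips the assertion.  Returns the number of the signal
 * that killed the child, 0 if it exited normally, or -1 on error. */
static int demo_run_failing_child(void)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        demo_enable_core_dumps();
        DEMO_ASSERT(1 + 1 == 3);   /* always fails, so abort() is reached */
        _exit(0);                  /* not reached */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling demo_run_failing_child() from main should report that the child died from SIGABRT.  Whether a core file actually appears still depends on the hard limit and on the system's core-file settings.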

IIRC, you can perform a similar trick with MPIR_Err_create_code to catch error codes at the time they are created, though that may not be exactly the same spot that you want to examine.
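
A rough sketch of the same trick at the point where an error code is created, combined with the runtime switch asked for above.  Everything here is hypothetical: demo_err_create_code only stands in for an error-code constructor and does not reproduce the signature of MPIR_Err_create_code, and DEMO_FAILURE_MODE is an invented environment variable modelled on the MPICH_FAILURE_MODE suggestion, not an existing MPICH setting.

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* Returns 1 if the (hypothetical) DEMO_FAILURE_MODE variable is set to
 * "hard", and 0 otherwise, including when it is unset. */
static int demo_hard_failure_enabled(void)
{
    const char *mode = getenv("DEMO_FAILURE_MODE");
    return mode != NULL && strcmp(mode, "hard") == 0;
}

/* Hypothetical error-code constructor.  In "hard" mode it aborts at the
 * moment the error is created, so a backtrace of the core file leads to
 * the call that produced the error.  Otherwise it returns the code and
 * leaves the error to propagate to the caller as usual. */
static int demo_err_create_code(int err_class, const char *func, int line)
{
    assert(func != NULL);
    if (demo_hard_failure_enabled()) {
        fprintf(stderr, "error class %d created in %s at line %d, aborting\n",
                err_class, func, line);
        fflush(stderr);
        abort();
    }
    return err_class;
}
```

With DEMO_FAILURE_MODE unset or set to "soft", demo_err_create_code(5, __func__, __LINE__) simply returns 5.  With DEMO_FAILURE_MODE=hard, the same call terminates the process with SIGABRT.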

-Dave



