[mpich-discuss] how to make mpich crash hard so I can gdb bt it?

Jeff Hammond jeff.science at gmail.com
Fri Oct 11 17:55:48 CDT 2013


Thanks for all the pointers.  Jed's response indicates that I need to
learn how to use gdb better, but I'll cache all of these suggestions
since I imagine they are all useful in some situations.

Best,

Jeff

On Fri, Oct 11, 2013 at 5:22 PM, David Goodell (dgoodell)
<dgoodell at cisco.com> wrote:
> On Oct 11, 2013, at 4:44 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
>> I apologize if this is a stupid question, but I would like MPICH to
>> crash and burn rather than terminate gently when an error occurs so
>> that I can gdb bt it to know where the error is in my program, since I
>> am absolutely certain that this error is in the application and not
>> MPICH.
>>
>> Ideally, I would be able to select the failure mode at runtime, e.g.
>> MPICH_FAILURE_MODE={hard,soft}, so that I can get either the nice
>> MPICH trace when I think MPICH is the problem and a brutal
>> light-the-machine-room-on-fire-and-abort-the-universe error when I
>> think it's my fault.
>
> For assertion failures, I think I used to change MPIR_Assert_fail/MPIR_Assert_fail_fmt to just call "abort()" instead of MPID_Abort.  As long as my ulimit settings allowed core files (e.g. "ulimit -c unlimited"), I'd get a core file that I could load in the debugger.  The process manager will take care of cleaning up the other processes anyway.
>
> IIRC, you can perform a similar trick with MPIR_Err_create_code to catch error codes at the time they are created, though that may not be exactly the same spot that you want to examine.
>
> -Dave
>
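
For the archive, here is a minimal sketch of the abort()-plus-core-file idea Dave describes, written as plain POSIX C with no MPI dependency so it can be tried on its own. The helper names (enable_core_dumps, die_hard, crash_in_child) are hypothetical and are not part of MPICH; the sketch only assumes a POSIX system where abort() raises SIGABRT and RLIMIT_CORE governs the core file size.

```c
#include <signal.h>
#include <stdio.h>
#include <stdlib.h>
#include <sys/resource.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper: raise the soft core-file size limit to the hard
 * limit, roughly what "ulimit -c" does from the shell, capped at the
 * maximum the process is allowed.  Returns 0 on success, -1 on failure. */
static int enable_core_dumps(void)
{
    struct rlimit rl;
    if (getrlimit(RLIMIT_CORE, &rl) != 0)
        return -1;
    rl.rlim_cur = rl.rlim_max;
    return setrlimit(RLIMIT_CORE, &rl);
}

/* Hypothetical helper: fail hard.  abort() raises SIGABRT, so the
 * process dies at this call site and, limits and system settings
 * permitting, leaves a core file with the stack intact. */
static void die_hard(const char *msg)
{
    fprintf(stderr, "fatal: %s\n", msg);
    abort();
}

/* Demonstration only: run die_hard() in a child process and return the
 * signal that terminated it, 0 if it exited normally, -1 on error. */
static int crash_in_child(const char *msg)
{
    pid_t pid = fork();
    if (pid < 0)
        return -1;
    if (pid == 0) {
        enable_core_dumps();
        die_hard(msg);          /* does not return */
    }
    int status = 0;
    if (waitpid(pid, &status, 0) < 0)
        return -1;
    return WIFSIGNALED(status) ? WTERMSIG(status) : 0;
}
```

Calling crash_in_child("some message") should report SIGABRT for the child. In a real application one would call enable_core_dumps() early and die_hard() at the point of failure, then load the core in gdb and run "bt"; where the core file ends up and what it is named depends on the system configuration. In an MPI program, something like die_hard() could presumably be called from an error handler registered with MPI_Comm_create_errhandler and MPI_Comm_set_errhandler, though I have not tested that here.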



-- 
Jeff Hammond
jeff.science at gmail.com


