<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
span.EmailStyle20
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Thanks, Kurt. I think the reason your signal trap didn’t work is because `mpiexec` is trapping it first. Segfault is code error. Why would you want to trap it?<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div>
<div>
<p class="MsoNormal">-- <br>
Hui Zhou<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">"Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall@nasa.gov><br>
<b>Date: </b>Monday, April 6, 2020 at 1:52 PM<br>
<b>To: </b>"Zhou, Hui" <zhouh@anl.gov>, "discuss@mpich.org" <discuss@mpich.org><br>
<b>Subject: </b>RE: [mpich-discuss] Abnormal termination on Linux<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
<p class="MsoNormal"><span style="color:#1F497D">Hui,</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">Sorry for not mentioning that. MPICH 3.3.2 compiled with pgc++ 19.5.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">Kurt </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Zhou, Hui <zhouh@anl.gov> <br>
<b>Sent:</b> Monday, April 6, 2020 12:58 PM<br>
<b>To:</b> discuss@mpich.org<br>
<b>Cc:</b> Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall@nasa.gov><br>
<b>Subject:</b> [EXTERNAL] Re: [mpich-discuss] Abnormal termination on Linux<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Which version of MPICH were you running?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div>
<div>
<div>
<p class="MsoNormal">-- <br>
Hui Zhou<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">"Mccall, Kurt E. (MSFC-EV41) via discuss" <<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>><br>
<b>Reply-To: </b>"<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>" <<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>><br>
<b>Date: </b>Monday, April 6, 2020 at 12:45 PM<br>
<b>To: </b>"<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>" <<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>><br>
<b>Cc: </b>"Mccall, Kurt E. (MSFC-EV41)" <<a href="mailto:kurt.e.mccall@nasa.gov">kurt.e.mccall@nasa.gov</a>><br>
<b>Subject: </b>Re: [mpich-discuss] Abnormal termination on Linux</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal"> <o:p></o:p></p>
</div>
<p class="MsoNormal"><span style="color:#1F497D">I should mention that I am unable to predict in which node or process the abnormal termination occurs, so I can’t practically attach a debugger and try to intercept the error.</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D"> </span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D">Kurt</span><o:p></o:p></p>
<p class="MsoNormal"><span style="color:#1F497D"> </span><o:p></o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Mccall, Kurt E. (MSFC-EV41) <<a href="mailto:kurt.e.mccall@nasa.gov">kurt.e.mccall@nasa.gov</a>>
<br>
<b>Sent:</b> Monday, April 6, 2020 11:50 AM<br>
<b>To:</b> <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>
<b>Cc:</b> Mccall, Kurt E. (MSFC-EV41) <<a href="mailto:kurt.e.mccall@nasa.gov">kurt.e.mccall@nasa.gov</a>><br>
<b>Subject:</b> Abnormal termination on Linux<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">I have a couple of questions about abnormal termination. The EXIT CODE below is 11, which could be signal SIGSEGV, or is it something defined by MPICH? If it is SIGSEGV, it is strange because my signal handler isn’t catching it and cleaning
up properly (the signal handler calls MPI_Finalize()). Is there any way to get more information about the location of the error?<o:p></o:p></p>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">= BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES<o:p></o:p></p>
<p class="MsoNormal">= PID 14385 RUNNING AT n020.cluster.com<o:p></o:p></p>
<p class="MsoNormal">= <b><span style="color:red">EXIT CODE: 11</span></b><o:p></o:p></p>
<p class="MsoNormal">= CLEANING UP REMAINING PROCESSES<o:p></o:p></p>
<div style="border:none;border-bottom:double windowtext 2.25pt;padding:0in 0in 22.0pt 0in">
<p class="MsoNormal">= YOU CAN IGNORE THE BELOW CLEANUP MESSAGES<o:p></o:p></p>
</div>
<p class="MsoNormal"> <o:p></o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal">Kurt<o:p></o:p></p>
</div>
</body>
</html>