<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<style type="text/css" style="display:none;"> P {margin-top:0;margin-bottom:0;} </style>
</head>
<body dir="ltr">
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Hi Thomas,</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
The assertion failure indicates packet corruption. It is not clear what the cause could be unless you can provide a reproducer. In any case, mpich-3.2.1 is quite old. My first suggestion would be to try a newer MPICH release and see if the error persists.</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
<br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
-- <br>
</div>
<div style="font-family: Calibri, Arial, Helvetica, sans-serif; font-size: 12pt; color: rgb(0, 0, 0);" class="elementToProof">
Hui Zhou<br>
</div>
<div id="appendonsend"></div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Thomas Jayaseelan-External via discuss &lt;discuss@mpich.org&gt;<br>
<b>Sent:</b> Thursday, June 15, 2023 9:58 AM<br>
<b>To:</b> discuss@mpich.org &lt;discuss@mpich.org&gt;<br>
<b>Cc:</b> Thomas Jayaseelan-External &lt;thomas.jayaseelan@regeneron.com&gt;; Sundaresh Krishnasamy-External &lt;sundaresh.krishnasam@regeneron.com&gt;; Hariram Jayaram-External &lt;hariram.jayaram@regeneron.com&gt;<br>
<b>Subject:</b> Re: [mpich-discuss] Issue in MPICH while submitting jobs through slurm in NONMEM application</font>
<div> </div>
</div>
<style>
<!--
@font-face
{font-family:"Cambria Math"}
@font-face
{font-family:Calibri}
p.x_MsoNormal, li.x_MsoNormal, div.x_MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif}
p.x_MsoListParagraph, li.x_MsoListParagraph, div.x_MsoListParagraph
{margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
font-size:10.0pt;
font-family:"Calibri",sans-serif}
span.x_EmailStyle21
{font-family:"Calibri",sans-serif;
color:windowtext}
.x_MsoChpDefault
{font-size:10.0pt}
@page WordSection1
{margin:1.0in 1.0in 1.0in 1.0in}
div.x_WordSection1
{}
ol
{margin-bottom:0in}
ul
{margin-bottom:0in}
-->
</style>
<div lang="EN-US" link="#0563C1" vlink="#954F72" style="word-wrap:break-word">
<div class="x_WordSection1">
<p class="x_MsoNormal">Hi All,</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">It would be helpful if you could assist with the issue below, which we are facing in our application using MPI.</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Best Regards,</p>
<p class="x_MsoNormal">Thomas</p>
<p class="x_MsoNormal"> </p>
<div>
<div style="border:none; border-top:solid #E1E1E1 1.0pt; padding:3.0pt 0in 0in 0in">
<p class="x_MsoNormal"><b>From:</b> Thomas Jayaseelan-External <br>
<b>Sent:</b> Thursday, June 15, 2023 10:58 AM<br>
<b>To:</b> discuss@mpich.org<br>
<b>Cc:</b> Sundaresh Krishnasamy-External &lt;sundaresh.krishnasam@regeneron.com&gt;; Hariram Jayaram-External &lt;hariram.jayaram@regeneron.com&gt;<br>
<b>Subject:</b> Issue in MPICH while submitting jobs through slurm in NONMEM application</p>
</div>
</div>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Hi Team,</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">This is Thomas. I am part of the HPCOPs team at Regeneron Pharmaceuticals. We build and support the HPC cluster infrastructure for the business according to its requirements.</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">I am reaching out for help with an issue that we are currently facing with MPI. It would be great if you could help us find a solution.</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">NONMEM is a CLI-based application through which users submit jobs. When a user runs a job with a larger number of cores, the job runs for 10 to 15 hours and then stops intermittently. Please find below the error messages that we get in our output file.</p>
<p class="x_MsoNormal"> </p>
<ol start="1" type="1" style="margin-top:0in">
<li class="x_MsoListParagraph" style="margin-left:0in"><span style="font-size:11.0pt">Assertion failed in file src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 600: hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_ID_INFO || hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_TMPVC_INFO</span></li></ol>
<p class="x_MsoListParagraph"><span style="font-size:11.0pt">internal ABORT - process 1231</span></p>
<p class="x_MsoListParagraph"><span style="font-size:11.0pt">Done with nonmem execution</span></p>
<p class="x_MsoNormal"><span style=""> </span></p>
<ol start="2" type="1" style="margin-top:0in">
<li class="x_MsoListParagraph" style="margin-left:0in"><span style="font-size:11.0pt">Assertion failed in file src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c at line 600: hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_ID_INFO || hdr.pkt_type == MPIDI_NEM_TCP_SOCKSM_PKT_TMPVC_INFO</span></li></ol>
<p class="x_MsoListParagraph"><span style="font-size:11.0pt">internal ABORT - process 163</span></p>
<p class="x_MsoListParagraph"><span style="font-size:11.0pt">Done with nonmem execution</span></p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal"><b>Details:</b></p>
<p class="x_MsoNormal">NONMEM application version – NM750</p>
<p class="x_MsoNormal">Slurm version - 21.08.6</p>
<p class="x_MsoNormal">MPICH version - 3.2.1</p>
<p class="x_MsoNormal">OS – Amazon Linux 2</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Please let me know if you need anything from my end.</p>
<p class="x_MsoNormal"> </p>
<p class="x_MsoNormal">Best Regards,</p>
<p class="x_MsoNormal">Thomas</p>
<p class="x_MsoNormal"> </p>
</div>
</div>
</body>
</html>