<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:DengXian;
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:"\@DengXian";
panose-1:2 1 6 0 3 1 1 1 1 1;}
@font-face
{font-family:Menlo;
panose-1:2 11 6 9 3 8 4 2 2 4;}
@font-face
{font-family:"Lucida Grande";
panose-1:2 11 6 0 4 5 2 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
p.xmsonormal, li.xmsonormal, div.xmsonormal
{mso-style-name:x_msonormal;
margin:0in;
font-size:10.0pt;
font-family:"Calibri",sans-serif;}
p.xmsolistparagraph, li.xmsolistparagraph, div.xmsolistparagraph
{mso-style-name:x_msolistparagraph;
margin-top:0in;
margin-right:0in;
margin-bottom:0in;
margin-left:.5in;
font-size:10.0pt;
font-family:"Calibri",sans-serif;}
p.xp1, li.xp1, div.xp1
{mso-style-name:x_p1;
margin:0in;
font-size:8.5pt;
font-family:Menlo;
color:black;}
p.p1, li.p1, div.p1
{mso-style-name:p1;
margin:0in;
font-size:8.5pt;
font-family:Menlo;
color:black;}
span.xs1
{mso-style-name:x_s1;}
span.s1
{mso-style-name:s1;}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
/* List Definitions */
@list l0
{mso-list-id:17850237;
mso-list-template-ids:265439344;}
@list l0:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level2
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level3
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level4
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level5
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level6
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level7
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level8
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l0:level9
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1
{mso-list-id:1825926197;
mso-list-template-ids:405978184;}
@list l1:level1
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level2
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:1.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level3
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:1.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level4
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:2.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level5
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:2.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level6
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:3.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level7
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:3.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level8
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:4.0in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
@list l1:level9
{mso-level-number-format:bullet;
mso-level-text:\F0B7 ;
mso-level-tab-stop:4.5in;
mso-level-number-position:left;
text-indent:-.25in;
mso-ansi-font-size:10.0pt;
font-family:Symbol;}
ol
{margin-bottom:0in;}
ul
{margin-bottom:0in;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="#0563C1" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Oops. I forgot to hit send on this after I wrote it. Sorry about the delay.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">You can download the latest GA release at <a href="https://github.com/GlobalArrays/ga/releases/tag/v5.8.2">
https://github.com/GlobalArrays/ga/releases/tag/v5.8.2</a> (this is what I’ve been testing with lately). I’ve been configuring it with:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="p1"><span class="s1">../configure --enable-i4 --enable-cxx --with-mpi3 \</span></p>
<p class="p1"><span class="apple-converted-space"> </span><span class="s1">--prefix=/people/d3g293/tmp/ga-hotfiix/build_rma_314 CC=mpicc CXX=mpicxx FC=mpif90 \</span></p>
<p class="p1"><span class="apple-converted-space"> </span><span class="s1">CFLAGS="-g" CXXFLAGS="-g" FFLAGS="-g"</span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">After configuring and building, you can run the test suite using<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="p1"><span class="s1">make check-ga MPIEXEC="mpirun -n 4 "</span></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Our standard configuration for running the test suite is 4 processes on 2 nodes.<o:p></o:p></p>
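<p class="MsoNormal">A minimal sketch of how that 2-node layout could be requested explicitly, assuming the launcher is MPICH's Hydra <code>mpirun</code>/<code>mpiexec</code> running inside a two-node allocation. The <code>-ppn</code> (processes per node) option and the <code>NODES</code>/<code>PPN</code> variable names are assumptions and are not part of the command above; Open MPI's <code>mpirun</code> uses different placement options.<o:p></o:p></p>

```shell
# Assumed layout: 2 nodes with 2 processes each (4 processes in total).
NODES=2
PPN=2
NPROCS=$((NODES * PPN))

# -ppn is Hydra's processes-per-node option (MPICH launcher only).
# The trailing space mirrors the MPIEXEC string used for check-ga above.
MPIEXEC="mpirun -n ${NPROCS} -ppn ${PPN} "

# Print the command for review; drop the leading echo to actually run it.
echo make check-ga MPIEXEC="\"${MPIEXEC}\""
```

<p class="MsoNormal">Setting <code>PPN=4</code> and <code>NODES=1</code> gives the single-node layout for comparison.<o:p></o:p></p>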
<p class="MsoNormal"><o:p> </o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Zhou, Hui &lt;zhouh@anl.gov&gt;<br>
<b>Date: </b>Friday, November 4, 2022 at 2:09 PM<br>
<b>To: </b>discuss@mpich.org &lt;discuss@mpich.org&gt;<br>
<b>Cc: </b>Palmer, Bruce J &lt;Bruce.Palmer@pnnl.gov&gt;<br>
<b>Subject: </b>Re: [mpich-discuss] Crash on MPI_Rput<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;color:black">Hi Bruce,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;color:black">Is the test suite available for us to check out and test?<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;color:black">--
<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="background:white"><span style="font-size:12.0pt;color:black">Hui<o:p></o:p></span></p>
</div>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="1" width="100%" align="center">
</div>
<div id="divRplyFwdMsg">
<p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> Palmer, Bruce J via discuss &lt;discuss@mpich.org&gt;<br>
<b>Sent:</b> Friday, November 4, 2022 4:03 PM<br>
<b>To:</b> discuss@mpich.org &lt;discuss@mpich.org&gt;<br>
<b>Cc:</b> Palmer, Bruce J &lt;Bruce.Palmer@pnnl.gov&gt;<br>
<b>Subject:</b> Re: [mpich-discuss] Crash on MPI_Rput</span> </p>
<div>
<p class="MsoNormal"> </p>
</div>
</div>
<div>
<div>
<p class="xmsonormal"><span style="font-size:11.0pt">I kind of dropped this for a while, but I’d like to pick it back up. I did some more testing with different versions of MPICH and got the following results for the RMA runtime:</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">MPICH-3.1.4 configured with</span></p>
<p class="xp1"><span class="xs1">./configure --prefix=/people/d3g293/mpich/mpich-3.1.4/install --with-libfabric=embedded --enable-threads=multiple --with-slurm CC=gcc CXX=g++</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">2/80 tests fail in the GA test suite</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">MPICH-4.0.2 configured with</span></p>
<p class="xp1"><span class="xs1">unset F90</span></p>
<p class="xp1"><span class="xs1">./configure --prefix=/people/d3g293/mpich/mpich-4.0.2/install --with-device=ch4:ofi:sockets --with-libfabric=embedded --enable-threads=multiple --with-slurm CC=gcc CXX=g++</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">25/80 tests fail in the GA test suite</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">Running with MPICH-3.3.2 seems to lead to around 8 failures, but my notes on this aren’t that good.</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">If I run with OpenMPI 4.1.4, everything passes. Any idea why I’m seeing this? I haven’t really done much to this runtime in the last few years.</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">Bruce</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="xmsonormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Palmer, Bruce J via discuss &lt;discuss@mpich.org&gt;<br>
<b>Date: </b>Wednesday, September 28, 2022 at 12:30 PM<br>
<b>To: </b>'Thakur, Rajeev' &lt;thakur@anl.gov&gt;, discuss@mpich.org &lt;discuss@mpich.org&gt;, Zhou, Hui &lt;zhouh@anl.gov&gt;<br>
<b>Cc: </b>Palmer, Bruce J &lt;Bruce.Palmer@pnnl.gov&gt;<br>
<b>Subject: </b>Re: [mpich-discuss] Crash on MPI_Rput</span></p>
</div>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<div>
<p class="xmsonormal"><span style="font-size:11.0pt;color:#1F497D">I think the MPI-RMA runtime was mostly (maybe completely) working with MPICH 3.2-3.4. It may even have been working earlier with 4.0. I think there is a pretty good chance that this is a system configuration problem at our end, and I was hoping that you might have some insight into what it might be, based on the errors I’m seeing. I can try running with a few earlier versions of MPICH and see if any of them work better.</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:#1F497D"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:#1F497D">Bruce</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt;color:#1F497D"> </span></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="xmsonormal"><b><span style="font-size:11.0pt">From:</span></b><span style="font-size:11.0pt"> Thakur, Rajeev &lt;thakur@anl.gov&gt;
<br>
<b>Sent:</b> Wednesday, September 28, 2022 12:24 PM<br>
<b>To:</b> discuss@mpich.org; Zhou, Hui &lt;zhouh@anl.gov&gt;<br>
<b>Cc:</b> Palmer, Bruce J &lt;Bruce.Palmer@pnnl.gov&gt;<br>
<b>Subject:</b> Re: [mpich-discuss] Crash on MPI_Rput</span></p>
</div>
</div>
<p class="xmsonormal"> </p>
<p class="xmsonormal"><span style="font-size:11.0pt;font-family:'Lucida Grande',sans-serif">Was it working with an earlier version of MPICH? If so, which one?</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt;font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt;font-family:'Lucida Grande',sans-serif">Rajeev</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt;font-family:'Lucida Grande',sans-serif">&nbsp;</span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="xmsonormal"><b><span style="font-size:12.0pt;color:black">From: </span>
</b><span style="font-size:12.0pt;color:black">"Palmer, Bruce J via discuss" &lt;</span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Reply-To: </b>"</span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black">" &lt;</span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Date: </b>Wednesday, September 28, 2022 at 2:20 PM<br>
<b>To: </b>"Zhou, Hui" &lt;</span><a href="mailto:zhouh@anl.gov"><span style="font-size:12.0pt">zhouh@anl.gov</span></a><span style="font-size:12.0pt;color:black">&gt;, "</span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black">"
&lt;</span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Cc: </b>"Palmer, Bruce J" &lt;</span><a href="mailto:Bruce.Palmer@pnnl.gov"><span style="font-size:12.0pt">Bruce.Palmer@pnnl.gov</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Subject: </b>Re: [mpich-discuss] Crash on MPI_Rput</span></p>
</div>
<div>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
</div>
<p class="xmsonormal"><span style="font-size:11.0pt">I upgraded to mpich-4.0.2 (latest stable release) and get pretty much the same result. The failure is reproducible: I get the same error on multiple runs, so it doesn’t look like an unexpected process failure.</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">One other detail that I forgot to mention earlier is that I’m running this test with 4 processes distributed over 2 nodes. If I run 4 processes on 1 node, the code runs without error.</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">Bruce</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="xmsonormal" style="margin-bottom:12.0pt"><b><span style="font-size:12.0pt;color:black">From:
</span></b><span style="font-size:12.0pt;color:black">Zhou, Hui &lt;</span><a href="mailto:zhouh@anl.gov"><span style="font-size:12.0pt">zhouh@anl.gov</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Date: </b>Tuesday, September 27, 2022 at 2:55 PM<br>
<b>To: </b></span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black"> &lt;</span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Cc: </b>Palmer, Bruce J &lt;</span><a href="mailto:Bruce.Palmer@pnnl.gov"><span style="font-size:12.0pt">Bruce.Palmer@pnnl.gov</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Subject: </b>Re: Crash on MPI_Rput</span></p>
</div>
<p class="xmsonormal"><span style="font-size:11.0pt">Hi Bruce,</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<ul style="margin-top:0in" type="disc">
<li class="xmsonormal" style="mso-list:l0 level1 lfo3"><span style="font-size:11.0pt">srun: error: node003: task 1: Exited with exit code 7</span><o:p></o:p></li></ul>
<p class="xmsolistparagraph"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt">Looks like one of the processes crashed unexpectedly.</span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<div>
<div>
<div>
<p class="xmsonormal"><span style="font-size:11.0pt">-- <br>
Hui Zhou</span></p>
</div>
</div>
</div>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal"><span style="font-size:11.0pt"> </span></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="xmsonormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:.5in">
<b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Palmer, Bruce J via discuss &lt;</span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Date: </b>Tuesday, September 27, 2022 at 3:32 PM<br>
<b>To: </b></span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black"> &lt;</span><a href="mailto:discuss@mpich.org"><span style="font-size:12.0pt">discuss@mpich.org</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Cc: </b>Palmer, Bruce J &lt;</span><a href="mailto:Bruce.Palmer@pnnl.gov"><span style="font-size:12.0pt">Bruce.Palmer@pnnl.gov</span></a><span style="font-size:12.0pt;color:black">&gt;<br>
<b>Subject: </b>[mpich-discuss] Crash on MPI_Rput</span></p>
</div>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">Hi,</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">I’m testing the MPI-RMA runtime in Global Arrays and I’m getting a lot more crashes than I’ve seen in the past. The MPI-RMA runtime code is fairly stable and hasn’t been modified much recently, and all the tests used to pass with one of the more recent MPICH releases. However, I’m now seeing a significant number of crashes. One of them occurs in a program designed to test non-blocking communication. It creates an MPI window using MPI_Alloc_mem followed by MPI_Win_create, and then calls MPI_Win_lock_all on the window. The code currently crashes when it gets to an MPI_Rput call. I’m trying to see if there is something different in the environment that might be causing this.</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">I’m currently up to MPICH-4.0b1 configured with</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">./configure --prefix=/people/d3g293/mpich/mpich-4.0b1/install --with-device=ch4:ofi:sockets --with-libfabric=embedded --enable-threads=multiple --with-slurm CC=gcc CXX=g++</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">#./configure --prefix=/people/d3g293/mpich/mpich-3.4.1/install-newell-nocuda --with-device=ch4:ofi:sockets --with-libfabric=embedded --enable-threads=multiple --with-slurm CC=gcc
CXX=g++</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">I’ve tried other recent vintages of MPICH, but I get similar results. The error I’m seeing when the program crashes is</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">[proxy:0:1@node003.local] HYD_pmcd_pmip_control_cmd_cb (pm/pmiserv/pmip_cb.c:899): assert (!closed) failed</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">[proxy:0:1@node003.local] HYDT_dmxu_poll_wait_for_event (tools/demux/demux_poll.c:76): callback returned error status</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">srun: error: node003: task 1: Exited with exit code 7</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">[proxy:0:1@node003.local] main (pm/pmiserv/pmip.c:169): demux engine error waiting for event</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">[mpiexec@node002.local] HYDT_bscu_wait_for_completion (tools/bootstrap/utils/bscu_wait.c:74): one of the processes terminated badly; aborting</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">[mpiexec@node002.local] HYDT_bsci_wait_for_completion (tools/bootstrap/src/bsci_wait.c:21): launcher returned error waiting for completion</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">[mpiexec@node002.local] HYD_pmci_wait_for_completion (pm/pmiserv/pmiserv_pmci.c:179): launcher returned error waiting for completion</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">[mpiexec@node002.local] main (ui/mpich/mpiexec.c:325): process manager error waiting for completion</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">Any suggestions about what might be going wrong here? It could be a problem with the machine configuration, since this code seemed to be running fine a while ago and has not been
modified since then. I’ll try building the latest stable release and see if that fixes anything, but as I mentioned none of the recent releases seems to work.</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">Bruce Palmer</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">Computer Scientist</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">Pacific Northwest National Laboratory</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt">(509) 375-3899</span></p>
<p class="xmsonormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span></p>
</div>
</div>
</div>
</div>
</body>
</html>