<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:Menlo;
panose-1:2 11 6 9 3 8 4 2 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Aptos",sans-serif;
mso-ligatures:standardcontextual;}
p.p1, li.p1, div.p1
{mso-style-name:p1;
margin:0in;
font-size:8.5pt;
font-family:Menlo;
color:black;}
span.s2
{mso-style-name:s2;
background:#878A04;}
span.s1
{mso-style-name:s1;}
span.apple-converted-space
{mso-style-name:apple-converted-space;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:11.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#467886" vlink="#96607D" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hi,</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’m trying to debug a GPU-aware runtime for the Global Arrays library. A version of it worked a while ago, but it has since started failing for no obvious reason, and we are trying to track down why. The failure currently occurs in MPI_Wait, and we would like to understand what exactly is failing inside the wait call. The error we are getting is:</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="p1"><span class="s1">Abort(206752655) on node 0: Fatal error in internal_Wait: Other MPI error, error stack:</span><o:p></o:p></p>
<p class="p1"><span class="s1">internal_Wait(68205)..........: MPI_Wait(request=0x500847a0, status=0x7ffff9331800) failed</span><o:p></o:p></p>
<p class="p1"><span class="s1">MPIR_Wait(780)................:</span><o:p></o:p></p>
<p class="p1"><span class="s1">MPIR_Wait_state(737)..........:</span><o:p></o:p></p>
<p class="p1"><span class="s1">MPIDI_progress_test(134)......:</span><o:p></o:p></p>
<p class="p1"><span class="s1">MPIDI_OFI_handle_cq_error(793): OFI poll failed (ofi_events.c:793:MPIDI_OFI_handle_cq_error:Input/output error)</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’ve verified that the request handle <span class="s1">
0x500847a0 is set earlier in the code by an MPI_Isend call, and that no MPI_Wait or MPI_Test is called on it before the crash. I’m using MPICH 4.2.1 built with gcc/8.3.0. The MPICH library was configured with:<o:p></o:p></span></p>
<p class="MsoNormal"><span class="s1"><o:p> </o:p></span></p>
<p class="p1"><span class="s1">../configure --prefix=/people/d3g293/mpich/mpich-4.2.1/build_newell/install \</span><o:p></o:p></p>
<p class="p1"><span class="apple-converted-space"> </span><span class="s1">--with-device=ch4:ofi:sockets --with-libfabric=embedded \</span><o:p></o:p></p>
<p class="p1"><span class="apple-converted-space"> </span><span class="s1">--without-ucx --enable-threads=multiple --with-slurm \</span><o:p></o:p></p>
<p class="p1"><span class="apple-converted-space"> </span><span class="s1">CC=gcc CXX=g++</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’ve also tried building with UCX instead of OFI and gotten the same results.</p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Do these errors indicate corruption of the request handle, a problem with some internal MPI data structure, or something else? Any information you can provide would be appreciated.</p>
<p class="MsoNormal"><br>
Thanks,</p>
<p class="MsoNormal">Bruce</p>
</div>
</body>
</html>