<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
@font-face
{font-family:Aptos;
panose-1:2 11 0 4 2 2 2 2 2 4;}
@font-face
{font-family:Consolas;
panose-1:2 11 6 9 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:12.0pt;
font-family:"Aptos",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
span.EmailStyle210
{mso-style-type:personal-reply;
font-family:"Aptos",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;
mso-ligatures:none;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal"><span style="font-size:11.0pt">Hi Martin,<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">You are correct: we failed to warn users about the possible runtime issues when using hcoll. I will have to go back and check when the last successful tests were run. As you guessed, this integration has been somewhat neglected
recently. I have created a GitHub issue to update our documentation and hopefully backport a fix to the stable branch once it is confirmed:
<a href="https://github.com/pmodels/mpich/issues/7475">https://github.com/pmodels/mpich/issues/7475</a><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt">Ken<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-size:11.0pt"><o:p> </o:p></span></p>
<div id="mail-editor-reference-message-container">
<div>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:.5in">
<b><span style="color:black">From: </span></b><span style="color:black">Audet, Martin <Martin.Audet@cnrc-nrc.gc.ca><br>
<b>Date: </b>Monday, June 23, 2025 at 2:48</span><span style="font-family:"Arial",sans-serif;color:black"> </span><span style="color:black">PM<br>
<b>To: </b>Raffenetti, Ken <raffenet@anl.gov>, discuss@mpich.org <discuss@mpich.org><br>
<b>Subject: </b>RE: Re: [mpich-discuss] mpich 4.3.1 still have compilation problem when using --with-hcoll=/opt/mellanox/hcoll<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Thanks for this quick reply.</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">You say that hcoll doesn’t work correctly at runtime, so there should be a warning for users who try to use it (like
us). Slower but correct results are far better than faster but incorrect ones. I will recompile the library without this option so that it doesn’t create problems for the users of our cluster.</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Since which version has hcoll not worked correctly?</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">I may also disable it in the older mpich versions we keep available for our users.</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Thanks,</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">Martin</span><o:p></o:p></p>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="margin-left:.5in"><b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif">From:</span></b><span style="font-size:11.0pt;font-family:"Calibri",sans-serif"> Raffenetti, Ken <raffenet@anl.gov>
<br>
<b>Sent:</b> June 23, 2025 15:40<br>
<b>To:</b> discuss@mpich.org<br>
<b>Cc:</b> Audet, Martin <Martin.Audet@cnrc-nrc.gc.ca><br>
<b>Subject:</b> EXT: Re: [mpich-discuss] mpich 4.3.1 still have compilation problem when using --with-hcoll=/opt/mellanox/hcoll</span><o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-left:.5in"> <o:p></o:p></p>
<div>
<p class="MsoNormal" style="margin-left:.5in"> <o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt">Hi Martin,</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt">My apologies for the lack of an update on this topic. We did not include this patch because, even with successful compilation, the MPICH hcoll integration does not function correctly at runtime
in our tests. Due to other priorities, we have not yet spent the time to fix the issue.</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt">Ken</span><o:p></o:p></p>
<p class="MsoNormal" style="margin-left:.5in"><span style="font-size:11.0pt"> </span><o:p></o:p></p>
<div id="mail-editor-reference-message-container">
<div>
<div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:1.0in">
<b><span style="color:black">From: </span></b><span style="color:black">Audet, Martin via discuss <<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>><br>
<b>Date: </b>Monday, June 23, 2025 at 10:04</span><span style="font-family:"Arial",sans-serif;color:black"> </span><span style="color:black">AM<br>
<b>To: </b><a href="mailto:discuss@mpich.org">discuss@mpich.org</a> <<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>><br>
<b>Cc: </b>Audet, Martin <<a href="mailto:Martin.Audet@cnrc-nrc.gc.ca">Martin.Audet@cnrc-nrc.gc.ca</a>><br>
<b>Subject: </b>[mpich-discuss] mpich 4.3.1 still have compilation problem when using --with-hcoll=/opt/mellanox/hcoll</span><o:p></o:p></p>
</div>
<div id="divtagdefaultwrapper">
<p style="margin-left:1.0in"><span style="color:black">Hello,</span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black"> </span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black">It seems that the silly compilation problem with hcoll_rte.c that I had back in April with mpich 4.3.0, when using the --with-hcoll=/opt/mellanox/hcoll configuration option, is still present in 4.3.1. See:</span><o:p></o:p></p>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:"Calibri",sans-serif;color:black"> </span><o:p></o:p></p>
</div>
<blockquote style="margin-left:30.0pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:"Calibri",sans-serif;color:black"><a href="https://lists.mpich.org/mailman/htdig/discuss/2025-April/006725.html">https://lists.mpich.org/mailman/htdig/discuss/2025-April/006725.html</a></span><o:p></o:p></p>
</div>
</blockquote>
<p style="margin-left:1.0in"><span style="color:black"> </span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black">It seems that the following very simple patch, which I was told to try with 4.3.0, hasn't been included in 4.3.1:</span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black"> </span><o:p></o:p></p>
<blockquote style="margin-left:30.0pt;margin-top:5.0pt;margin-right:0in;margin-bottom:5.0pt">
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black">--- src/mpid/common/hcoll/hcoll_rte.c 2025-04-16 12:54:24.847337975 -0400</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black">+++ src/mpid/common/hcoll/hcoll_rte.c 2025-04-16 12:55:05.428164974 -0400</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black">@@ -55,7 +55,7 @@</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black"> /* FIXME: The hcoll library needs to be updated to return</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black"> * error codes. The progress function pointer right now</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black"> * expects that the function returns void. */</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black">- ret = hcoll_do_progress(&made_progress);</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black">+ ret = hcoll_do_progress(-1, &made_progress);</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black"> MPIR_Assert(ret == MPI_SUCCESS);</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black"> }</span><o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:Consolas;color:black"> }</span><o:p></o:p></p>
</div>
</blockquote>
<div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:"Calibri",sans-serif;color:black"> </span><o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-left:1.0in"><span style="font-family:"Calibri",sans-serif;color:black">So it looks like this code path is not compiled very often by the mpich developers or their QA process.
</span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black"> </span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black">BTW, applying the same patch fixes the compilation problem, but:</span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black"> </span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black">What does this mean for us users? Should we still use this option? BTW, hcoll is a very cool mechanism for improving the efficiency of collective operations. Is this option obsolete? Was it replaced by something
else?</span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black"> </span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black">Thanks,</span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black"> </span><o:p></o:p></p>
<p style="margin-left:1.0in"><span style="color:black">Martin Audet</span><o:p></o:p></p>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</div>
</body>
</html>