[mpich-discuss] mpich 4.3.1 still has a compilation problem when using --with-hcoll=/opt/mellanox/hcoll

Audet, Martin Martin.Audet at cnrc-nrc.gc.ca
Mon Jun 23 10:03:53 CDT 2025


Hello,


It seems that the silly compilation problem with hcoll_rte.c that I had back in April with mpich 4.3.0, when using the --with-hcoll=/opt/mellanox/hcoll configuration option, is still present in 4.3.1. See:


https://lists.mpich.org/mailman/htdig/discuss/2025-April/006725.html


It seems that the following very simple patch, which I was told to try with 4.3.0, has not been included in 4.3.1:


--- src/mpid/common/hcoll/hcoll_rte.c   2025-04-16 12:54:24.847337975 -0400
+++ src/mpid/common/hcoll/hcoll_rte.c   2025-04-16 12:55:05.428164974 -0400
@@ -55,7 +55,7 @@
         /* FIXME: The hcoll library needs to be updated to return
          * error codes.  The progress function pointer right now
          * expects that the function returns void. */
-        ret = hcoll_do_progress(&made_progress);
+        ret = hcoll_do_progress(-1, &made_progress);
         MPIR_Assert(ret == MPI_SUCCESS);
     }
 }
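
To illustrate why the unpatched line fails to build, here is a minimal standalone sketch. It assumes, based only on the patch above and not on the MPICH headers, that hcoll_do_progress now takes an extra leading int argument. The names stub_hcoll_do_progress, STUB_SUCCESS and progress_once are hypothetical stand-ins, and the meaning of the -1 value is not something verified here.

```c
#include <assert.h>
#include <stddef.h>

/* Stands in for MPI_SUCCESS; the real constant comes from mpi.h. */
#define STUB_SUCCESS 0

/* Hypothetical stand-in for hcoll_do_progress with the two-argument
 * shape implied by the patch. The first parameter is ignored here. */
static int stub_hcoll_do_progress(int first_arg, int *made_progress)
{
    (void) first_arg;
    if (made_progress != NULL)
        *made_progress = 1;     /* pretend some progress was made */
    return STUB_SUCCESS;
}

/* Mirrors the patched call site in hcoll_rte.c. */
static int progress_once(void)
{
    int made_progress = 0;
    int ret;

    /* The old one-argument form would be rejected by the compiler
     * ("too few arguments to function"):
     *     ret = stub_hcoll_do_progress(&made_progress);
     */
    ret = stub_hcoll_do_progress(-1, &made_progress);
    assert(ret == STUB_SUCCESS);

    return made_progress;
}
```

Calling progress_once() from a small main() and compiling with any C compiler is enough to see that only the two-argument call builds against a two-parameter definition.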

So it looks like this code path is not compiled very often by the mpich developers or their QA process.


BTW, applying the same patch to 4.3.1 fixes the compilation problem, but:


What does this mean for us users? Should we still use this option? BTW, hcoll is a very cool mechanism for improving the efficiency of collective operations. Is this option obsolete? Was it replaced by something else?


Thanks,


Martin Audet