[mpich-discuss] mpich 4.3.1 still has a compilation problem when using --with-hcoll=/opt/mellanox/hcoll
Audet, Martin
Martin.Audet at cnrc-nrc.gc.ca
Mon Jun 23 10:03:53 CDT 2025
Hello,
It seems that the silly compilation problem in hcoll_rte.c that I had back in April with mpich 4.3.0, when using the --with-hcoll=/opt/mellanox/hcoll configuration option, is still present in 4.3.1; see:
https://lists.mpich.org/mailman/htdig/discuss/2025-April/006725.html
It seems that the following very simple patch, which I was told to try with 4.3.0, hasn't been included in 4.3.1:
--- src/mpid/common/hcoll/hcoll_rte.c 2025-04-16 12:54:24.847337975 -0400
+++ src/mpid/common/hcoll/hcoll_rte.c 2025-04-16 12:55:05.428164974 -0400
@@ -55,7 +55,7 @@
/* FIXME: The hcoll library needs to be updated to return
* error codes. The progress function pointer right now
* expects that the function returns void. */
- ret = hcoll_do_progress(&made_progress);
+ ret = hcoll_do_progress(-1, &made_progress);
MPIR_Assert(ret == MPI_SUCCESS);
}
}
So it looks like this code path is not compiled very often by the mpich developers or by its QA process.
BTW, applying the same patch fixes the compilation problem, but:
What does this mean for us users? Should we still use this option? BTW, hcoll is a very cool mechanism for improving the efficiency of collective operations. Is this option obsolete? Was it replaced by something else?
Thanks,
Martin Audet