[mpich-discuss] mpich 4.3.0 compilation problem when using --with-hcoll=/opt/mellanox/hcoll
Audet, Martin
Martin.Audet at cnrc-nrc.gc.ca
Wed Apr 16 13:54:27 CDT 2025
Hello Hui,
I tried the patch and it works (at least, it compiles).
Thanks for your very quick response!
Martin
________________________________
From: Zhou, Hui <zhouh at anl.gov>
Sent: April 16, 2025 12:17 PM
To: discuss at mpich.org
Cc: Audet, Martin
Subject: EXT: Re: [mpich-discuss] mpich 4.3.0 compilation problem when using --with-hcoll=/opt/mellanox/hcoll
Hi Martin,
Could you try the patch in https://github.com/pmodels/mpich/pull/7047?
--
Hui
________________________________
From: Audet, Martin via discuss <discuss at mpich.org>
Sent: Wednesday, April 16, 2025 11:13 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Audet, Martin <Martin.Audet at cnrc-nrc.gc.ca>
Subject: [mpich-discuss] mpich 4.3.0 compilation problem when using --with-hcoll=/opt/mellanox/hcoll
Hello mpich community,
When I try to compile mpich 4.3.0 configured with the --with-hcoll=/opt/mellanox/hcoll option, I get a compilation error because the hcoll_do_progress() function is defined with two arguments in hcoll_init.c but is called with only one in hcoll_rte.c!
Here is the error message I get:
src/mpid/common/hcoll/hcoll_rte.c: In function 'progress':
src/mpid/common/hcoll/hcoll_rte.c:58:33: warning: passing argument 1 of 'hcoll_do_progress' makes integer from pointer without a cast [-Wint-conversion]
   58 |         ret = hcoll_do_progress(&made_progress);
      |                                 ^~~~~~~~~~~~~~
      |                                 |
      |                                 int *
In file included from ./src/mpid/ch4/netmod/include/../ucx/ucx_coll.h:11,
                 from ./src/mpid/ch4/netmod/include/../ucx/netmod_inline.h:15,
                 from ./src/mpid/ch4/netmod/include/netmod_impl.h:1589,
                 from ./src/mpid/ch4/include/mpidch4.h:448,
                 from ./src/mpid/ch4/include/mpidpost.h:10,
                 from ./src/include/mpiimpl.h:232,
                 from src/mpid/common/hcoll/hcoll_rte.c:6:
./src/mpid/ch4/netmod/include/../ucx/../../../common/hcoll/hcoll.h:42:27: note: expected 'int' but argument is of type 'int *'
   42 | int hcoll_do_progress(int vci, int *made_progress);
      |                       ~~~~^~~
src/mpid/common/hcoll/hcoll_rte.c:58:15: error: too few arguments to function 'hcoll_do_progress'
   58 |         ret = hcoll_do_progress(&made_progress);
      |               ^~~~~~~~~~~~~~~~~
In file included from ./src/mpid/ch4/netmod/include/../ucx/ucx_coll.h:11,
                 from ./src/mpid/ch4/netmod/include/../ucx/netmod_inline.h:15,
                 from ./src/mpid/ch4/netmod/include/netmod_impl.h:1589,
                 from ./src/mpid/ch4/include/mpidch4.h:448,
                 from ./src/mpid/ch4/include/mpidpost.h:10,
                 from ./src/include/mpiimpl.h:232,
                 from src/mpid/common/hcoll/hcoll_rte.c:6:
./src/mpid/ch4/netmod/include/../ucx/../../../common/hcoll/hcoll.h:42:5: note: declared here
   42 | int hcoll_do_progress(int vci, int *made_progress);
      |     ^~~~~~~~~~~~~~~~~
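To illustrate the mismatch the compiler reports, here is a minimal stand-alone sketch in C. It is not the MPICH source and not necessarily what the actual fix does: the stub body, the progress_sketch() name, and the choice of 0 for the vci argument are all assumptions made only so that the example compiles. Only the two-argument prototype is taken from the hcoll.h note in the log above.

```c
#include <assert.h>
#include <stddef.h>

/* Stand-in with the two-argument prototype shown in the hcoll.h note.
 * The body is a placeholder, not MPICH's implementation. */
int hcoll_do_progress(int vci, int *made_progress)
{
    (void) vci;                     /* unused in this stub */
    assert(made_progress != NULL);
    *made_progress = 1;             /* pretend some progress was made */
    return 0;
}

/* Hypothetical caller mirroring the failing line in hcoll_rte.c. */
int progress_sketch(int *made_progress)
{
    /* Calling hcoll_do_progress(made_progress) here would reproduce the
     * "too few arguments" error from the log. */
    const int assumed_vci = 0;      /* assumption: the real value may differ */
    return hcoll_do_progress(assumed_vci, made_progress);
}
```

Any change that makes the call site and the prototype agree on the argument list would clear this particular error; which side should change is a decision for the MPICH developers.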
I used to compile mpich versions 3.4.x, 4.1.x, and 4.2.x configured with this option (--with-hcoll=) without any problems. It looks like some recent changes in the related files introduced a problem that slipped into 4.3.0 and makes compilation impossible.
Could it be fixed? Or could the --with-hcoll option be removed if it is no longer relevant (I guess that if we use ch4:ucx, ucx may itself use hcoll internally to optimize collective operations when running in a hierarchical environment)?
Here are some details:
arch: x86_64
OS: RHEL 9.5 (up to date except kernel)
MOFED: 24.10-2.1.8.0-LTS
hcoll: 4.8.3230-1.2410068
ucx: 1.18.0-1.2410068
The complete configuration line:
./configure --with-device=ch4:ucx --with-hcoll=/opt/mellanox/hcoll --prefix=/work/software/x86_64/mpich/mpich-ch4_ucx-4.3.0 --with-xpmem --enable-g=none --enable-fast=all --enable-romio --with-file-system=ufs+nfs+lustre --enable-shared --enable-sharedlibs=gcc
Thanks,
Martin Audet