[mpich-discuss] How does MPICH determine available NICs?
Raymond, Michael
mraymond at hpe.com
Tue Mar 10 10:34:52 CDT 2026
The NIC symmetry test is specific to HPE Cray MPICH.
As a workaround, try setting MPICH_OFI_NUM_NICS=1.
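In a job script, the workaround might look like the sketch below. Only the variable name and value come from this thread; the launcher line is an assumption (launcher, node count and binary name are site-specific) and is left commented out.

```shell
# Limit HPE Cray MPICH to a single NIC per process (workaround from this thread).
export MPICH_OFI_NUM_NICS=1

# Assumed launch line; adjust the launcher, node count and binary for your site.
# srun -N 2 ./connectivity_c
```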
--
Michael Raymond
HPE HPC programming environment
From: Kevin Buckley via discuss <discuss at mpich.org>
Date: Tuesday, March 10, 2026 at 00:47
To: discuss at mpich.org <discuss at mpich.org>
Cc: Kevin Buckley <kevin.buckley.pawsey.org.au at gmail.com>
Subject: Re: [mpich-discuss] How does MPICH determine available NICs?
On 2026/03/09 23:24, Zhou, Hui wrote:
> Could you try the upstream MPICH?
I followed your suggestion to try the upstream MPICH.
Trying to configure 3.4a2, the version "inside" the cray-mpich,
modulo any vendor updates made to that base release, saw
...
configure: error: The Fortran compiler gfortran will not compile files that\
call the same routine with arguments of different types.
$
$ which gfortran
/opt/cray/pe/gcc-native/14/bin/gfortran
$
I wasn't sure whether that meant that the MPICH code is too old to
make correct compiler checks, or that the compiler is too
new to respond to them, so I shelved that line of attack.
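(This configure failure is commonly seen when older MPICH sources meet gfortran 10 or newer, which rejects mismatched argument types by default. A workaround that may help is to pass -fallow-argument-mismatch through FFLAGS and FCFLAGS before running configure. The following is a sketch only; the install prefix is a hypothetical path and the configure line is left commented out.)

```shell
# Ask gfortran (10 or newer) to downgrade argument-type mismatches to warnings.
export FFLAGS="-fallow-argument-mismatch"
export FCFLAGS="-fallow-argument-mismatch"

# Hypothetical prefix; run from the top of the unpacked MPICH source tree.
# ./configure --prefix="$HOME/mpich-install"
```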
Was able to configure and build 5.0.0, and have used that to
compile and run the same noddy "connectivity_c.c" example code
that I had been seeing the issue with.
It all seems to work without any warnings (as does compiling/
running within an OpenMPI 5.0.3 environment), in that a 2-node
job spanning a node with just one NIC, and the dual-NIC node
with one NIC disabled, completes, as does a 2-node job spanning
the node with just one NIC, and a dual-NIC node with neither
NIC disabled.
Would you, though, expect the vanilla MPICH 5.0.0 to have
warned me about the NIC_SYMMETRY inconsistencies?
Your comment in the original reply,
> The upstream MPICH should be fine with different number of
> NICs on different nodes. By default, the process picks a
> nic that is closest to process's CPU affinity; or if you
> have multiple processes on the same node, each process
> will try pick a different nic in a round-robin fashion.
> Usually a disabled nic won't be selected during init,
would suggest that "later" (than cray-mpich's 3.4a2 starting
point, plus whatever enhancements Cray and/or HPE have since
added) MPICH-s would not bother to give the warning, as those
versions would "just do the right thing" (tm)?
If so, I think it's sounding as though treating the
disabled-at-boot-time NIC as somehow being "available"
for MPI traffic is tied into something that the vendor is
doing somewhere within their overall MPI+Slingshot stack.
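The round-robin behaviour described in the quoted reply can be pictured with a small sketch. This is an illustration only, not MPICH's actual selection code: pick_nic is a hypothetical helper, it ignores CPU affinity entirely, and it assumes the node reports at least one usable NIC.

```shell
# Illustration only: NOT MPICH's actual NIC selection code.
# pick_nic is a hypothetical helper that ignores CPU affinity.
pick_nic() {
    local_rank=$1   # rank of the process within its node
    num_nics=$2     # number of usable NICs on that node (assumed >= 1)
    echo $(( local_rank % num_nics ))
}

# Four processes on a dual-NIC node alternate between NIC 0 and NIC 1.
for r in 0 1 2 3; do
    echo "local rank $r -> nic $(pick_nic "$r" 2)"
done
```

With a NIC count of 1, every local rank maps to NIC 0, which is what one would expect on the single-NIC node or on the dual-NIC node when the disabled NIC is not counted.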
Thanks again for fielding this one,
Kevin M. Buckley
--
Supercomputing Systems Administrator
Pawsey Supercomputing Centre
PERTH
Australia