[mpich-discuss] Configuring MPICH 4.1.2 without increasing the locked memory limit

Raffenetti, Ken raffenet at anl.gov
Tue Oct 24 09:32:14 CDT 2023


Hi Kurt,

This looks like an issue allocating resources on the InfiniBand device on the node. MPI_Init should not require any special system settings. Are you able to run InfiniBand diagnostics without any MPI library? ibstatus should tell you whether the IB card is online and what state it is in. From there, you could try running an ib_send_bw test across two nodes and verify that traffic is flowing.
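ibstatus reports each port's logical state and physical link state. On Linux the kernel exposes the same information under /sys/class/infiniband, which is the path that appears in the error log. The Python sketch below does this check. It assumes the usual sysfs layout, where each port directory contains `state` and `phys_state` files holding values like "4: ACTIVE" and "5: LinkUp". The helper names `parse_state` and `port_states` are hypothetical and not part of any InfiniBand tool.

```python
from pathlib import Path

# Assumed location of the kernel's InfiniBand sysfs tree (standard on Linux).
SYSFS_IB = Path("/sys/class/infiniband")


def parse_state(text):
    """Split a sysfs value such as '4: ACTIVE' into (4, 'ACTIVE')."""
    code, _, name = text.strip().partition(":")
    return int(code), name.strip()


def port_states(root=SYSFS_IB):
    """Return {(device, port): (state, phys_state)} for every port under root.

    Returns an empty dict when root does not exist (e.g. no IB driver loaded).
    """
    states = {}
    root = Path(root)
    if not root.is_dir():
        return states
    # Layout assumed: <root>/<device>/ports/<port>/{state,phys_state}
    for port_dir in sorted(root.glob("*/ports/*")):
        device = port_dir.parent.parent.name
        state = parse_state((port_dir / "state").read_text())
        phys = parse_state((port_dir / "phys_state").read_text())
        states[(device, port_dir.name)] = (state, phys)
    return states


if __name__ == "__main__":
    found = port_states()
    if not found:
        print("no InfiniBand devices found under", SYSFS_IB)
    for (device, port), (state, phys) in found.items():
        # A usable port normally reports state ACTIVE and phys_state LinkUp.
        print(f"{device} port {port}: state={state[1]} phys_state={phys[1]}")
```

A port that reports anything other than ACTIVE/LinkUp points to a driver or fabric problem rather than to MPICH itself. In that case ib_send_bw is unlikely to succeed either.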

Ken

From: "Mccall, Kurt E. (MSFC-EV41) via discuss" <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Wednesday, October 18, 2023 at 12:36 PM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] Configuring MPICH 4.1.2 without increasing the locked memory limit

Hi,

I have configured MPICH 4.1.2 with both --with-device=ch4:ofi and --with-device=ch4:ucx. My application fails in both cases because it can't allocate enough memory. For --with-device=ch4:ofi:

Unable to create send CQ of size 5080 on mlx5_0: Cannot allocate memory
n001.cluster.pssclabs.com:rank0.NeedlesMpiMM: Unable to initialize verbs NIC /sys/class/infiniband/mlx5_0 (unit 0:0)
n001.cluster.pssclabs.com:rank0: PSM3 can't open nic unit: 0 (err=23)
Abort(606197135): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)........: MPI_Init(argc=0x7ffc1cbd334c, argv=0x7ffc1cbd3340) failed
MPII_Init_thread(234)....:
MPID_Init(513)...........:
MPIDI_OFI_init_local(604):
create_vni_context(982)..: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory)

When configured with --with-device=ch4:ucx, there was a very similar error involving /sys/class/infiniband/mlx5_0 that explicitly stated that the locked memory limit (ulimit -l) needs to be set to “unlimited”. Are there any other ch4 device configuration options that don't require unlimited locked memory?
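The limit that matters is the one the MPI process actually inherits, and that may differ from what an interactive shell shows. A small check on a compute node can confirm it. The sketch below uses Python's standard `resource` module; `memlock_limits`, `format_limit` and `is_unlimited` are hypothetical helper names. `getrlimit` reports bytes, while the shell's `ulimit -l` reports KiB.

```python
import resource


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values in bytes."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


def is_unlimited(limit):
    """True if the limit is RLIM_INFINITY, i.e. what `ulimit -l` shows as 'unlimited'."""
    return limit == resource.RLIM_INFINITY


def format_limit(value):
    """Render a limit the way `ulimit -l` does: 'unlimited' or a count of KiB."""
    if is_unlimited(value):
        return "unlimited"
    return f"{value // 1024} KiB"


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("locked memory soft limit:", format_limit(soft))
    print("locked memory hard limit:", format_limit(hard))
    if not is_unlimited(soft):
        # A finite soft limit caps how much memory this process can pin,
        # which is what the UCX message above asks to be lifted.
        print("soft limit is finite; large pinned allocations may fail")
```

Running this through the same launcher used for the MPI job (rather than from a login shell) shows the value the failing ranks actually see.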

Thanks,
Kurt