[mpich-discuss] Configuring MPICH 4.1.2 without increasing the locked memory limit

Mccall, Kurt E. (MSFC-EV41) kurt.e.mccall at nasa.gov
Wed Oct 18 12:35:57 CDT 2023


I have configured MPICH 4.1.2 with both --with-device=ch4:ofi and --with-device=ch4:ucx. In both cases my application fails in MPI_Init because it can't allocate enough memory. With --with-device=ch4:ofi:

Unable to create send CQ of size 5080 on mlx5_0: Cannot allocate memory
n001.cluster.pssclabs.com:rank0.NeedlesMpiMM: Unable to initialize verbs NIC /sys/class/infiniband/mlx5_0 (unit 0:0)
n001.cluster.pssclabs.com:rank0: PSM3 can't open nic unit: 0 (err=23)
Abort(606197135): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)........: MPI_Init(argc=0x7ffc1cbd334c, argv=0x7ffc1cbd3340) failed
create_vni_context(982)..: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory)

When configured with --with-device=ch4:ucx, I got a very similar error involving /sys/class/infiniband/mlx5_0 that explicitly stated that the locked memory limit (ulimit -l) needs to be set to "unlimited". Are there any other ch4 device configuration options that don't require unlimited locked memory?
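
For reference, the locked memory limit that a process actually sees can be checked with a short script. This is only a minimal sketch, assuming Linux and Python 3; the helper names format_limit and memlock_limits are made up for illustration and are not part of MPICH. Note that getrlimit reports bytes, while "ulimit -l" in bash reports units of 1024 bytes.

```python
import resource


def format_limit(value):
    """Render a limit given in bytes, or 'unlimited' for RLIM_INFINITY."""
    if value == resource.RLIM_INFINITY:
        return "unlimited"
    return f"{value} bytes ({value // 1024} KiB)"


def memlock_limits():
    """Return the (soft, hard) RLIMIT_MEMLOCK values for this process."""
    return resource.getrlimit(resource.RLIMIT_MEMLOCK)


if __name__ == "__main__":
    soft, hard = memlock_limits()
    print("soft locked-memory limit:", format_limit(soft))
    print("hard locked-memory limit:", format_limit(hard))
```

Running it through the same launcher and scheduler as the failing job (for example "mpiexec -n 1 python3 check_memlock.py", where the file name is hypothetical) may be more informative than running it in a login shell, since the limit inherited by remotely launched ranks can differ from the interactive one.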

