[mpich-discuss] MPICH fails to allocate memory at beginning of job

Zhou, Hui zhouh at anl.gov
Mon Jul 8 16:47:32 CDT 2024


The ucx netmod provides better support on InfiniBand. Could you try configuring it with
--with-device=ch4:ucx?
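
As a rough sketch (untested here), the reconfigure could reuse the options from the original command with only the netmod changed. It assumes the same source and build layout and the /opt/mpich prefix from the report, drops --with-libfabric-embedded because libfabric belongs to the ofi netmod, and assumes configure can locate a UCX installation on the build host:

```shell
# Sketch only: same options as the original configure line, with ch4:ofi
# replaced by ch4:ucx and the libfabric option dropped.
# Assumes UCX headers and libraries are installed where configure can find them.
cd mpich-4.1.2-build
./mpich-4.1.2/configure --prefix=/opt/mpich --with-device=ch4:ucx --with-slurm --enable-debuginfo --enable-g=debug 2>&1 | tee c.txt
```

After rebuilding and reinstalling, rerunning the same mpiexec command should show whether MPI_Init still fails.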

--
Hui
________________________________
From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Monday, July 8, 2024 3:00 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] MPICH fails to allocate memory at beginning of job


I configured MPICH 4.1.2 as follows. Any clue as to what would fix the error below?

$ cd mpich-4.1.2-build
$ ./mpich-4.1.2/configure --prefix=/opt/mpich --with-device=ch4:ofi --with-libfabric-embedded --with-slurm -enable-debuginfo --enable-g=debug  2>&1 | tee c.txt

When I run a job, the error message on each node is:

n007.cluster.pssclabs.com:rank6.HaystackMpiMM: Failed to modify UD QP to INIT on mlx5_0: Operation not permitted
Abort(337761679): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)........: MPI_Init(argc=0x7ffd93a663ec, argv=0x7ffd93a663e0) failed
MPII_Init_thread(234)....:
MPID_Init(513)...........:
MPIDI_OFI_init_local(604):
create_vni_context(982)..: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory)

The run command is:

mpiexec -launcher ssh \
        -print-all-exitcodes \
        -wdir ${work_dir} \
        -np ${num_proc} \
        -ppn 1  \
        my_program

…



Thanks,

Kurt