[mpich-discuss] MPICH fails to allocate memory at beginning of job
Zhou, Hui
zhouh at anl.gov
Mon Jul 8 16:47:32 CDT 2024
The ucx netmod provides better support on InfiniBand. Could you try configuring it with
--with-device=ch4:ucx?
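For example, a minimal re-run of your configure with the device swapped (a sketch only; it keeps your other options, and --with-ucx=embedded builds MPICH's bundled UCX instead of relying on a system install):

$ ./mpich-4.1.2/configure --prefix=/opt/mpich --with-device=ch4:ucx \
      --with-ucx=embedded --with-slurm --enable-debuginfo --enable-g=debug \
      2>&1 | tee c.txt
$ make -j && make install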
--
Hui
________________________________
From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Monday, July 8, 2024 3:00 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] MPICH fails to allocate memory at beginning of job
I configured MPICH 4.1.2 as follows. Any clue as to what would fix the error below?
$ cd mpich-4.1.2-build
$ ./mpich-4.1.2/configure --prefix=/opt/mpich --with-device=ch4:ofi --with-libfabric-embedded --with-slurm --enable-debuginfo --enable-g=debug 2>&1 | tee c.txt
When I run a job, the error message on each node is:
n007.cluster.pssclabs.com:rank6.HaystackMpiMM: Failed to modify UD QP to INIT on mlx5_0: Operation not permitted
Abort(337761679): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)........: MPI_Init(argc=0x7ffd93a663ec, argv=0x7ffd93a663e0) failed
MPII_Init_thread(234)....:
MPID_Init(513)...........:
MPIDI_OFI_init_local(604):
create_vni_context(982)..: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory)
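One common cause of "Operation not permitted" / "Cannot allocate memory" when opening endpoints on mlx5 hardware is a too-small locked-memory limit on the compute nodes (an assumption here, not something confirmed above). A quick per-node check, via the same ssh path mpiexec uses:

$ ssh n007.cluster.pssclabs.com 'ulimit -l'   # InfiniBand generally wants "unlimited"

If it comes back small, raising it (e.g. "* soft memlock unlimited" and "* hard memlock unlimited" in /etc/security/limits.conf) is the usual fix.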
The run command is:
mpiexec -launcher ssh \
-print-all-exitcodes \
-wdir ${work_dir} \
-np ${num_proc} \
-ppn 1 \
my_program
…
Thanks,
Kurt