[mpich-discuss] MPICH fails to allocate memory at beginning of job
Mccall, Kurt E. (MSFC-EV41)
kurt.e.mccall at nasa.gov
Mon Jul 8 15:00:05 CDT 2024
I configured MPICH 4.1.2 as follows. Any clue as to what would fix the error below?
$ cd mpich-4.1.2-build
$ ./mpich-4.1.2/configure --prefix=/opt/mpich --with-device=ch4:ofi --with-libfabric-embedded --with-slurm --enable-debuginfo --enable-g=debug 2>&1 | tee c.txt
When I run a job, the error message on each node is:
n007.cluster.pssclabs.com:rank6.HaystackMpiMM: Failed to modify UD QP to INIT on mlx5_0: Operation not permitted
Abort(337761679): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)........: MPI_Init(argc=0x7ffd93a663ec, argv=0x7ffd93a663e0) failed
MPII_Init_thread(234)....:
MPID_Init(513)...........:
MPIDI_OFI_init_local(604):
create_vni_context(982)..: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory)
The run command is:
mpiexec -launcher ssh \
-print-all-exitcodes \
-wdir ${work_dir} \
-np ${num_proc} \
-ppn 1 \
my_program
...
Thanks,
Kurt