[mpich-discuss] MPICH fails to allocate memory at beginning of job

Mccall, Kurt E. (MSFC-EV41) kurt.e.mccall at nasa.gov
Mon Jul 8 15:00:05 CDT 2024


I configured MPICH 4.1.2 as shown below. Does anyone know what is causing the error that follows, and how to fix it?

$ cd mpich-4.1.2-build
$ ./mpich-4.1.2/configure --prefix=/opt/mpich --with-device=ch4:ofi --with-libfabric-embedded --with-slurm -enable-debuginfo --enable-g=debug  2>&1 | tee c.txt

When I run a job, the error message on each node is:

n007.cluster.pssclabs.com:rank6.HaystackMpiMM: Failed to modify UD QP to INIT on mlx5_0: Operation not permitted
Abort(337761679): Fatal error in internal_Init: Other MPI error, error stack:
internal_Init(66)........: MPI_Init(argc=0x7ffd93a663ec, argv=0x7ffd93a663e0) failed
MPII_Init_thread(234)....:
MPID_Init(513)...........:
MPIDI_OFI_init_local(604):
create_vni_context(982)..: OFI endpoint open failed (ofi_init.c:982:create_vni_context:Cannot allocate memory)


The run command is:

mpiexec -launcher ssh \
        -print-all-exitcodes \
        -wdir ${work_dir} \
        -np ${num_proc} \
        -ppn 1  \
        my_program
...
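
In case it helps narrow this down, here is a minimal sketch of a check that could be run on the compute nodes. It rests on an assumption that the error output does not confirm: that "Cannot allocate memory" during endpoint creation on an InfiniBand device may be related to the locked-memory limit (RLIMIT_MEMLOCK) seen by processes started through the ssh launcher. The function name report_memlock is hypothetical and used only for illustration.

```shell
#!/bin/bash
# Hypothetical helper (not part of MPICH): print the locked-memory limit
# that the current process sees. A small numeric value instead of
# "unlimited" would be consistent with the assumption above, but would
# not prove it.
report_memlock() {
    printf '%s: max locked memory = %s\n' "$(uname -n)" "$(ulimit -l)"
}

report_memlock

# To see the limit as inherited by launched processes, the same check can
# be started through the launcher used for the real job, for example:
#   mpiexec -launcher ssh -np ${num_proc} -ppn 1 \
#       bash -c 'echo "$(uname -n): max locked memory = $(ulimit -l)"'
```

If the value differs between an interactive login and a launched process, that would point at the launch environment rather than at the MPICH build.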

Thanks,
Kurt