[mpich-discuss] Slurm and MPI_Comm_spawn

Mccall, Kurt E. (MSFC-EV41) kurt.e.mccall at nasa.gov
Fri Jan 7 15:33:04 CST 2022

Thanks for the reply, Hui.

MPICH was configured with:

configure --prefix=/home/kmccall/mpich-install-4.0b1 --with-device=ch3:nemesis --disable-fortran -enable-debuginfo --enable-g=debug

The program is run via sbatch, which is given a bash script as an argument.

sbatch  --nodes=2  --ntasks=2  --cpus-per-task=24   <bash_script>

The bash script calls mpiexec:

mpiexec -print-all-exitcodes -enable-x -np 2  -wdir ${work_dir} -env DISPLAY localhost:10.0 --ppn 1 <cmd>
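
For reference, a minimal sketch of what such a batch script could look like. Only the mpiexec options are taken from the command above. The `launch` function, the `MPIEXEC` override, the fallback to `$PWD`, the `SLURM_JOB_ID` guard, and passing the program as script arguments are illustrative assumptions, not the actual script.

```shell
#!/bin/sh
# Sketch of a batch script handed to sbatch (hypothetical layout).

# launch DIR CMD [ARGS...]: start CMD on 2 ranks, one per node, using the
# mpiexec options quoted in this message. MPIEXEC is an assumed override
# that makes it easy to point at a specific MPICH install.
launch() {
    dir="$1"
    shift
    "${MPIEXEC:-mpiexec}" -print-all-exitcodes -enable-x -np 2 \
        -wdir "$dir" -env DISPLAY localhost:10.0 --ppn 1 "$@"
}

# Slurm sets SLURM_JOB_ID inside a job; only launch when it is present.
if [ -n "${SLURM_JOB_ID:-}" ]; then
    # work_dir is expected in the environment; fall back to the current dir.
    launch "${work_dir:-$PWD}" "$@"
fi
```

Under these assumptions the job would be submitted as `sbatch --nodes=2 --ntasks=2 --cpus-per-task=24 <bash_script> <cmd>`, since sbatch forwards arguments after the script name to the script.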

From: Zhou, Hui <zhouh at anl.gov>
Sent: Friday, January 7, 2022 2:39 PM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [EXTERNAL] Re: Slurm and MPI_Comm_spawn

MPICH uses PMI 1 by default.

How is your MPICH configured? And how do you run your program? Is it via srun?

Hui Zhou

From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Date: Friday, January 7, 2022 at 2:21 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] Slurm and MPI_Comm_spawn
My MPICH/Slurm job is failing when the call to MPI_Comm_spawn is made. The Slurm MPI guide https://slurm.schedmd.com/mpi_guide.html#mpich2 specifically states that MPI_Comm_spawn will work going through Hydra's PMI 1.1 interface.

How do I ensure that it goes through that interface?

Maybe we'll have to rebuild Slurm to support PMI 1.1. The Slurm command below lists the supported MPI types; PMI 1.1 is not among them, although PMI 2 is.

$ srun --mpi=list
srun: MPI types are...
srun: cray_shasta
srun: pmi2
srun: none
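
The same check can be scripted. This is a rough sketch that assumes the output format shown above (one `srun: <type>` line per plugin); `has_mpi_type` is a hypothetical helper name, and other Slurm versions may format the list differently.

```shell
# has_mpi_type TYPE: read `srun --mpi=list` output on stdin and succeed
# only if TYPE appears as a complete entry in the list.
has_mpi_type() {
    # Strip the "srun:" prefix, then require an exact whole-line match.
    sed -e 's/^srun:[[:space:]]*//' | grep -qx -- "$1"
}
```

A possible use is `srun --mpi=list 2>&1 | has_mpi_type pmi2 && echo "pmi2 available"`; the `2>&1` is there in case srun writes the list to stderr.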
