[mpich-discuss] MPI_Comm_spawn and SLURM

Mccall, Kurt E. (MSFC-EV41) kurt.e.mccall at nasa.gov
Sat Nov 27 12:01:27 CST 2021


Hui,

It looks like the child process failed to launch, so it may be a process manager issue. I ran "scontrol show hosts" and it showed a number of possible errors in the compute nodes' SLURM configuration (like "CoresPerSocket=1", which is wrong). It looks like our compute nodes are not configured correctly for SLURM.

Thanks,
Kurt

From: Zhou, Hui <zhouh at anl.gov>
Sent: Friday, November 26, 2021 10:13 PM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [EXTERNAL] Re: MPI_Comm_spawn and SLURM

Kurt,

Could you verify whether the child processes are launched? The way `MPI_Comm_spawn_multiple` works is -

  1.  The parent root opens a port.
  2.  The parent root calls a PMI routine to spawn the child processes - here the process manager is SLURM. If the child processes fail to spawn, it is likely a process manager issue.
  3.  The child processes run MPI_Init and, at the end, internally call MPI_Comm_connect.
  4.  Meanwhile, the parent root process waits in `MPI_Comm_accept`.

If the root process is hanging in `MPI_Comm_accept`, then either the process manager failed to launch the child processes, or the child processes were misconfigured by the process manager and never called `MPI_Comm_connect`. Let's determine where the children are.
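For reference, here is a minimal sketch of that parent/child flow in plain MPI (a single executable that spawns copies of itself; the executable name "./spawn_demo" and the child count of 2 are illustrative assumptions, not taken from the original code):

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Comm parent, intercomm;

    /* In the children, MPI_Init internally connects back to the port
     * the parent root opened (step 3 above). */
    MPI_Init(&argc, &argv);
    MPI_Comm_get_parent(&parent);

    if (parent == MPI_COMM_NULL) {
        /* Parent side: MPI_Comm_spawn opens a port, asks the process manager
         * (via PMI) to launch the children, then blocks in accept until the
         * children connect (steps 1, 2 and 4 above). A hang here means the
         * children never connected. */
        int errcodes[2];
        MPI_Comm_spawn("./spawn_demo", MPI_ARGV_NULL, 2, MPI_INFO_NULL,
                       0, MPI_COMM_WORLD, &intercomm, errcodes);
        printf("parent: children connected\n");
    } else {
        /* Child side: once MPI_Init returns, the intercommunicator to the
         * parent is available. */
        intercomm = parent;
        printf("child: connected to parent\n");
    }

    MPI_Comm_disconnect(&intercomm);
    MPI_Finalize();
    return 0;
}

If a small test like this also hangs in MPI_Comm_spawn under the same allocation, that points at the process manager setup rather than the application.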

--
Hui Zhou


From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Date: Friday, November 26, 2021 at 5:05 PM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] MPI_Comm_spawn and SLURM
I am attempting to run MPICH under SLURM for the first time, so there could be a lot of things I am doing wrong. All processes are getting launched, but the master process hangs in MPI_Comm_spawn. The stack trace is below, followed by the SLURM command I use to start the job. Even though all 20 processes are running, one per node as desired, srun reports "Requested node configuration is not available". Not sure if that is why MPI_Comm_spawn hangs. Thanks for any help.

#0  0x00007f0183d08a08 in poll () from /lib64/libc.so.6
#1  0x00007f0185aed611 in MPID_nem_tcp_connpoll (in_blocking_poll=<optimized out>)
    at ../mpich-4.0b1/src/mpid/ch3/channels/nemesis/netmod/tcp/socksm.c:1765
#2  0x00007f0185ad8186 in MPID_nem_mpich_blocking_recv (completions=<optimized out>, in_fbox=<synthetic pointer>,
    cell=<synthetic pointer>) at ../mpich-4.0b1/src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:947
#3  MPIDI_CH3I_Progress (progress_state=progress_state at entry=0x7ffcc4839760, is_blocking=is_blocking at entry=1)
    at ../mpich-4.0b1/src/mpid/ch3/channels/nemesis/src/ch3_progress.c:360
#4  0x00007f0185a8811a in MPIDI_Create_inter_root_communicator_accept (vc_pptr=<synthetic pointer>,
    comm_pptr=<synthetic pointer>, port_name=<optimized out>) at ../mpich-4.0b1/src/mpid/ch3/src/ch3u_port.c:417
#5  MPIDI_Comm_accept (port_name=<optimized out>, info=<optimized out>, root=0,
    comm_ptr=0x7f0185ef1540 <MPIR_Comm_builtin+832>, newcomm=0x7ffcc4839bb8)
    at ../mpich-4.0b1/src/mpid/ch3/src/ch3u_port.c:1176
#6  0x00007f0185ac3d45 in MPID_Comm_accept (
    port_name=port_name at entry=0x7ffcc4839a00 "tag#0$description#n001$port#55907$ifname#172.16.56.1$",
    info=info at entry=0x0, root=root at entry=0, comm=comm at entry=0x7f0185ef1540 <MPIR_Comm_builtin+832>,
    newcomm_ptr=newcomm_ptr at entry=0x7ffcc4839bb8) at ../mpich-4.0b1/src/mpid/ch3/src/mpid_port.c:130
#7  0x00007f0185a73285 in MPIDI_Comm_spawn_multiple (count=<optimized out>, commands=0x7ffcc4839b78,
    argvs=0x7ffcc4839b70, maxprocs=0x7ffcc4839b6c, info_ptrs=<optimized out>, root=<optimized out>,
    comm_ptr=0x7f0185ef1540 <MPIR_Comm_builtin+832>, intercomm=0x7ffcc4839bb8, errcodes=<optimized out>)
    at ../mpich-4.0b1/src/mpid/ch3/src/ch3u_comm_spawn_multiple.c:258
#8  0x00007f0185abec99 in MPID_Comm_spawn_multiple (count=count at entry=1,
    array_of_commands=array_of_commands at entry=0x7ffcc4839b78, array_of_argv=array_of_argv at entry=0x7ffcc4839b70,
    array_of_maxprocs=array_of_maxprocs at entry=0x7ffcc4839b6c,
    array_of_info_ptrs=array_of_info_ptrs at entry=0x7ffcc4839b60, root=root at entry=0,
    comm_ptr=0x7f0185ef1540 <MPIR_Comm_builtin+832>, intercomm=0x7ffcc4839bb8, array_of_errcodes=0x7ffcc4839c98)
    at ../mpich-4.0b1/src/mpid/ch3/src/mpid_comm_spawn_multiple.c:49
#9  0x00007f0185a34895 in MPIR_Comm_spawn_impl (command=<optimized out>, command at entry=0x995428 "NeedlesMpiMM",
    argv=<optimized out>, argv at entry=0x993f50, maxprocs=<optimized out>, maxprocs at entry=1, info_ptr=<optimized out>,
    root=root at entry=0, comm_ptr=comm_ptr at entry=0x7f0185ef1540 <MPIR_Comm_builtin+832>, p_intercomm_ptr=0x7ffcc4839bb8,
    array_of_errcodes=0x7ffcc4839c98) at ../mpich-4.0b1/src/mpi/spawn/spawn_impl.c:168
#10 0x00007f0185953637 in internal_Comm_spawn (array_of_errcodes=0x7ffcc4839c98, intercomm=0x7ffcc4839d7c,
    comm=1140850689, root=0, info=-1677721600, maxprocs=1, argv=<optimized out>, command=0x995428 "NeedlesMpiMM")
    at ../mpich-4.0b1/src/binding/c/spawn/comm_spawn.c:83
#11 PMPI_Comm_spawn (command=0x995428 "NeedlesMpiMM", argv=0x993f50, maxprocs=1, info=-1677721600, root=0,
    comm=1140850689, intercomm=0x7ffcc4839d7c, array_of_errcodes=0x7ffcc4839c98)
    at ../mpich-4.0b1/src/binding/c/spawn/comm_spawn.c:169
#12 0x000000000040cec3 in needles::NeedlesMpiMaster::spawnNewManager (this=0x995380, nodenum=0,
    host_name="n001.cluster.pssclabs.com", intercom=@0x7ffcc4839d7c: 67108864) at src/NeedlesMpiMaster.cpp:1432
#13 0x00000000004084cb in needles::NeedlesMpiMaster::init (this=0x995380, argc=23, argv=0x7ffcc483a548, rank=0,
    world_size=20) at src/NeedlesMpiMaster.cpp:246
#14 0x0000000000406799 in main (argc=23, argv=0x7ffcc483a548) at src/NeedlesMpiManagerMain.cpp:96


Here is the salloc command I use to start the job. I want one task per node, reserving the rest of the cores on each node for spawning additional processes.


$ salloc --ntasks=20 --cpus-per-task=24 --verbose

Here is what salloc reports:

salloc: -------------------- --------------------
salloc: cpus-per-task       : 24
salloc: ntasks              : 20
salloc: verbose             : 1
salloc: -------------------- --------------------
salloc: end of defined options
salloc: Linear node selection plugin loaded with argument 4
salloc: select/cons_res loaded with argument 4
salloc: Cray/Aries node selection plugin loaded
salloc: select/cons_tres loaded with argument 4
salloc: Granted job allocation 34311
srun: error: Unable to create step for job 34311: Requested node configuration is not available
