[mpich-discuss] MPI_Comm_spawn crosses node boundaries

Raffenetti, Ken raffenet at anl.gov
Fri Jan 28 10:49:02 CST 2022


Are you using mpiexec or srun when initially launching your job? Hydra (mpiexec) should support the "host" info key, but I'm not sure if srun will.
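For illustration, one way to test this is to take a Slurm allocation but launch with Hydra's mpiexec inside it rather than srun, so that Hydra manages the processes (the node count and program name below are hypothetical):

```shell
# Allocate 4 nodes from Slurm, then launch with Hydra's mpiexec
# so that process-manager info keys such as "host" are handled by Hydra.
salloc -N 4 bash -c 'mpiexec -n 1 ./manager'

# Launching with srun instead goes through Slurm's own PMI launcher,
# which may not honor the "host" info key on MPI_Comm_spawn:
# srun -n 1 ./manager
```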

Ken

On 1/28/22, 10:41 AM, "Mccall, Kurt E. (MSFC-EV41) via discuss" <discuss at mpich.org> wrote:

    Hi,

    When running MPICH under Slurm, MPI_Comm_spawn unexpectedly creates new processes on any and all of the nodes that Slurm has allocated to the job. I would like it to create new processes only on the node that called MPI_Comm_spawn.

    I’ve tried passing MPI_Comm_spawn an info struct created like this:

            MPI_Info info;
            MPI_Info_create(&info);
            MPI_Info_set(info, "host", host_name);
            MPI_Info_set(info, "bind_to", "core");

    where host_name = "n001" or even the fully qualified name "n001.cluster.pssclabs.com",

    but that doesn’t prevent the problem.  Any suggestions?
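    For context, here is a minimal sketch of the spawn call I am making with that info object. The child executable name "worker" and the spawn count of 2 are placeholders, and error handling is omitted:

```c
/* Minimal sketch: spawn children pinned to the local node.
   Assumptions: the child executable "worker" and the count of 2
   are hypothetical; error checking is omitted for brevity. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    /* Name of the node running this process. */
    char host_name[MPI_MAX_PROCESSOR_NAME];
    int  name_len;
    MPI_Get_processor_name(host_name, &name_len);

    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", host_name);   /* request spawn on this node only */
    MPI_Info_set(info, "bind_to", "core");

    MPI_Comm intercomm;
    int errcodes[2];
    MPI_Comm_spawn("worker", MPI_ARGV_NULL, 2, info,
                   0 /* root rank */, MPI_COMM_SELF, &intercomm, errcodes);

    MPI_Info_free(&info);
    MPI_Comm_free(&intercomm);
    MPI_Finalize();
    return 0;
}
```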

    Thanks,
    Kurt
