[mpich-discuss] MPI_Comm_spawn crosses node boundaries
Raffenetti, Ken
raffenet at anl.gov
Fri Jan 28 12:36:11 CST 2022
"ip_address" won't be recognized; the only keys that will be are "host", "hosts", and "hostfile". Could you run an example using "mpiexec -v" and capture/share the output? That should help tell us whether the hostname information is being passed correctly to the process manager by the spawn command.
Ken
On 1/28/22, 11:35 AM, "Mccall, Kurt E. (MSFC-EV41)" <kurt.e.mccall at nasa.gov> wrote:
Ken,
I'm using sbatch, which calls a bash script that calls mpiexec (4.0rc3). Which host name convention is correct, the short or the long host name? Would the "ip_address" info key work?
Kurt
-----Original Message-----
From: Raffenetti, Ken <raffenet at anl.gov>
Sent: Friday, January 28, 2022 10:49 AM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [EXTERNAL] Re: [mpich-discuss] MPI_Comm_spawn crosses node boundaries
Are you using mpiexec or srun when initially launching your job? Hydra (mpiexec) should support the "host" info key, but I'm not sure if srun will.
Ken
On 1/28/22, 10:41 AM, "Mccall, Kurt E. (MSFC-EV41) via discuss" <discuss at mpich.org> wrote:
Hi,
Running MPICH under Slurm, MPI_Comm_spawn unexpectedly creates new processes on any and all of the nodes that Slurm allocates to the job. I would like it to only create new processes locally on the node that called MPI_Comm_spawn.
I’ve tried passing MPI_Comm_spawn an info object created like this:
    MPI_Info info;
    MPI_Info_create(&info);
    MPI_Info_set(info, "host", host_name);
    MPI_Info_set(info, "bind_to", "core");
where host_name = "n001" or even the full name "n001.cluster.pssclabs.com",
but that doesn’t prevent the problem. Any suggestions?
Thanks,
Kurt
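The original message tries both the short host name and the fully qualified name as the value of the "host" info key, and the thread does not settle which form Hydra expects. As a minimal sketch for trying each form in turn, the helper below derives the short name from a fully qualified one by cutting at the first dot. The name short_host_name is hypothetical; it is not an MPICH or MPI function.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical helper (not part of MPICH or the MPI standard):
 * copy the part of `fqdn` before the first '.' into `out`,
 * e.g. "n001.cluster.pssclabs.com" -> "n001".
 * A name without a dot is copied unchanged.
 * Returns 0 on success, -1 on invalid arguments or if `out` is too small. */
static int short_host_name(const char *fqdn, char *out, size_t out_size)
{
    if (fqdn == NULL || out == NULL || out_size == 0)
        return -1;

    /* Number of characters before the first dot (or the whole string). */
    size_t len = strcspn(fqdn, ".");

    /* Need room for `len` characters plus the terminating NUL. */
    if (len >= out_size)
        return -1;

    memcpy(out, fqdn, len);
    out[len] = '\0';
    return 0;
}
```

The resulting string could then be passed as the value in MPI_Info_set(info, "host", ...), with the unmodified name tried as the alternative if the short form is not matched.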