[mpich-discuss] MPIExec.hydra allocates too many CPU's on SLURM

Balaji, Pavan balaji at anl.gov
Thu Apr 10 07:48:33 CDT 2014


Ruben,

If you want Hydra to use SLURM’s allocation, you should not specify your own list of nodes on the mpiexec command line.

You could do something like this:

% mpiexec.hydra -n 1 ./master : -n 7 ./slave

Hydra will automatically detect the slurm environment and pick the hosts from there.
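As a rough sketch of what this means in practice, a wrapper script could check whether it is already running inside a SLURM job and, if so, call mpiexec.hydra with no host list at all. The helper names below (in_slurm_allocation, launch_command) are made up for illustration. The check on SLURM_JOB_ID is an assumption: SLURM does set that variable inside a job, but this message does not say which variables Hydra itself inspects. The salloc fallback is likewise only one possible way to obtain an allocation first.

```shell
#!/bin/sh
# Hypothetical helper: succeed if we appear to be inside a SLURM job.
# Assumption: SLURM_JOB_ID is set by SLURM within an allocation.
in_slurm_allocation() {
    [ -n "${SLURM_JOB_ID:-}" ]
}

# Hypothetical helper: print the command line we would run.
# Inside an allocation, no -hosts, -rmk or -launcher flags are given,
# so Hydra is left to pick up the hosts from SLURM on its own.
launch_command() {
    if in_slurm_allocation; then
        echo "mpiexec.hydra -n 1 ./master : -n 7 ./slave"
    else
        # Outside an allocation, request 8 tasks first (1 master + 7 slaves).
        echo "salloc -n 8 mpiexec.hydra -n 1 ./master : -n 7 ./slave"
    fi
}

launch_command
```

The script only prints the command, so it can be inspected before anything is actually launched.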

You could also do:

% mpiexec.hydra -bootstrap slurm -rmk slurm -n 1 ./master : -n 7 ./slave

but that’s redundant, since the “rmk” and “launcher” are automatically detected. Also, “bootstrap” is deprecated at this point, so please don’t use it. You could use “launcher” instead, but that’s autodetected anyway.

The following is *not* what you want to do:

% mpiexec.hydra -hosts node2-a,node2-b -bootstrap slurm -rmk slurm -n 1 ./master : -n 7 ./slave

The “-hosts” option overrides the resource allocation from the “rmk”.  So you are telling hydra to ignore SLURM’s resource allocation and use what you have provided.
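If it helps to catch this mistake early, a job script could refuse to pass “-hosts” while running under SLURM. The following is only a sketch: the function name warn_on_hosts_override is hypothetical, and using SLURM_JOB_ID to detect a SLURM job is an assumption on my part, not something Hydra requires.

```shell
#!/bin/sh
# Hypothetical guard: return non-zero if "-hosts" appears among the
# mpiexec arguments while we are inside a SLURM job.
# Assumption: SLURM_JOB_ID is set by SLURM within an allocation.
warn_on_hosts_override() {
    for arg in "$@"; do
        if [ "$arg" = "-hosts" ] && [ -n "${SLURM_JOB_ID:-}" ]; then
            echo "warning: -hosts overrides the SLURM allocation" >&2
            return 1
        fi
    done
    return 0
}
```

A script might call it as `warn_on_hosts_override "$@" && mpiexec.hydra "$@"`, so that the launch only happens when no explicit host list is given inside a SLURM job.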

That option is intended for expert users who want to manage their own resources explicitly.

  — Pavan

On Apr 10, 2014, at 6:49 AM, Ruben Faelens <faelens at kth.se> wrote:

> Hello list,
> 
> I have been messing around with SLURM and MPICH2 for about a week now, in order to run NONMEM on my cluster.
> 
> I start my job using the following command line:
> > mpiexec.hydra -hosts node2-a,node2-b -bootstrap slurm -rmk slurm -n 1 ./master : -n 7 ./slave
> 
> Using pstree, I can see that this launches the following command:
> > srun -N 4 -n 4 /usr/local/bin/hydra_pmi_proxy --control-port node2-head:38860 --rmk slurm --launcher slurm --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id -1
> 
> Apparently, Hydra launches one single 'hydra_pmi_proxy' per node, after which the hydra_pmi_proxy launches the other processes. This completely screws up allocation rules in SLURM.
> 
> Is this normal behaviour? I would rather have mpiexec.hydra allocate the right number of resources on SLURM, instead of seemingly deciding on its own which hosts to use.
> 
> Kind regards,
> Ruben FAELENS



