[mpich-discuss] MPIExec.hydra allocates too many CPU's on SLURM

Ruben Faelens faelens at kth.se
Thu Apr 10 06:49:03 CDT 2014


Hello list,

I have been messing around with SLURM and MPICH2 for about a week now, in
order to run NONMEM on my cluster.

I start my job using the following command line:
> mpiexec.hydra -hosts node2-a,node2-b -bootstrap slurm -rmk slurm -n 1 ./master : -n 7 ./slave

Using pstree, I can see that this launches the following command:
> srun -N 4 -n 4 /usr/local/bin/hydra_pmi_proxy --control-port node2-head:38860 --rmk slurm --launcher slurm --demux poll --pgid 0 --retries 10 --usize -2 --proxy-id -1

Apparently, Hydra launches a single 'hydra_pmi_proxy' per node, and each
proxy then launches the actual MPI processes on its node. SLURM is
therefore only asked to place the proxies, not the processes they start,
which completely breaks its allocation rules.
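
To make the mismatch concrete, here is a minimal sketch that sums the
-n values on each of the two command lines above. The helper name
count_ranks is my own and is not part of Hydra or SLURM; it assumes
that only -n or -np carries a process count and treats every other
argument (including srun's capital -N) as opaque.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of Hydra or SLURM): sum the values that
# follow -n / -np across all colon-separated segments of a command line.
count_ranks() {
    local total=0
    while [ "$#" -gt 0 ]; do
        case "$1" in
            -n|-np)
                # Stop if the count is missing, so we never loop forever.
                [ "$#" -ge 2 ] || break
                total=$((total + $2))
                shift 2
                ;;
            *)
                # Any other flag, value, executable or ':' is skipped.
                shift
                ;;
        esac
    done
    echo "$total"
}

# Processes I asked mpiexec.hydra to start (1 master + 7 slaves).
requested=$(count_ranks -hosts node2-a,node2-b -bootstrap slurm -rmk slurm \
    -n 1 ./master : -n 7 ./slave)

# Tasks that the generated srun line asks SLURM to start.
launched=$(count_ranks -N 4 -n 4 /usr/local/bin/hydra_pmi_proxy)

echo "processes requested from mpiexec.hydra: $requested"
echo "tasks requested from srun:              $launched"
```

With the two command lines from this mail, this reports 8 requested
processes against 4 srun tasks, and the srun line spreads those tasks
over 4 nodes although I only named two hosts.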

Is this normal behaviour? I would rather have mpiexec.hydra allocate the
right number of resources on SLURM, instead of seemingly deciding on its
own which hosts to use.

Kind regards,
Ruben FAELENS