[mpich-discuss] MPIExec.hydra allocates too many CPUs on SLURM

Ruben Faelens faelens at kth.se
Thu Apr 10 16:33:07 CDT 2014


Hi list,

I specified the hosts because mpich2-1.2 did not seem to detect that I was
running under SLURM. Without the -hosts parameter, mpiexec did launch through
slurm, but on the host 'localhost' (which is not a slurm compute node).

I have now compiled mpich-3.1 from source, installed it on all nodes in the
cluster, and everything works beautifully:
salloc -n 8 mpiexec.hydra -n 1 ./master : -n 7 ./slave
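
For reference, the build itself was nothing special (a sketch of what I did;
the download URL and install prefix are just my setup, adjust as needed):

  wget http://www.mpich.org/static/downloads/3.1/mpich-3.1.tar.gz
  tar xzf mpich-3.1.tar.gz && cd mpich-3.1
  ./configure --prefix=/usr/local   # hydra is the default process manager
  make && make install              # repeat (or copy the install) on every node
  which mpiexec.hydra               # should resolve to the new install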

Thank you for the support!

/ Ruben


On Thu, Apr 10, 2014 at 2:24 PM, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:

> What version of mpich are you using? Why are you specifying -hosts in your
> mpiexec command? Are you allocating resources in some other way prior to
> launching your job?
>
> You are correct about the normal behavior of hydra. It launches a proxy
> on each node, which then launches the MPI processes.
>
>
> On 04/10/2014 06:49 AM, Ruben Faelens wrote:
>
>> Hello list,
>>
>> I have been messing around with SLURM and MPICH2 for about a week now,
>> in order to run NONMEM on my cluster.
>>
>> I start my job using the following command line:
>>  > mpiexec.hydra -hosts node2-a,node2-b -bootstrap slurm -rmk slurm -n 1
>> ./master : -n 7 ./slave
>>
>> Using pstree, I can see that this launches the following command:
>>  > srun -N 4 -n 4 /usr/local/bin/hydra_pmi_proxy --control-port
>> node2-head:38860 --rmk slurm --launcher slurm --demux poll --pgid 0
>> --retries 10 --usize -2 --proxy-id -1
>>
>> Apparently, Hydra launches a single 'hydra_pmi_proxy' per node, after
>> which that proxy launches the actual MPI processes. This completely
>> screws up the allocation rules in SLURM.
>>
>> Is this normal behaviour? I would rather have mpiexec.hydra request the
>> right number of resources from SLURM, instead of seemingly deciding on its
>> own which hosts to use.
>>
>> Kind regards,
>> Ruben FAELENS
>>
>>



-- 
/ Ruben FAELENS
+32 494 06 72 59