[mpich-discuss] Problems running MPICH jobs under SLURM

Markus Geimer m.geimer at fz-juelich.de
Sun Jun 2 04:31:34 CDT 2013


Hi Pavan,

> Markus: does this happen only with SLURM or can you reproduce this
> without SLURM as well?

It seems to happen only when hydra queries the host list from SLURM.
I tried executing two different setups on the head node, both listing
two compute nodes in a hostfile:

  1) mpiexec -f hostfile -n 4 ./hello

     Since the SLURM PAM module was disabled, SSH login to the nodes
     was possible and the job ran as expected, with two ranks on each
     node. SLURM's sinfo showed both nodes as state 'idle' and the
     HYDRA_DEBUG output said '--rmk user --launcher ssh'.

  2) mpiexec -f hostfile -rmk slurm -n 4 ./hello

     This job ran as well, with the nodes allocated via SLURM and
     shown as 'alloc'. Debug output: '--rmk slurm --launcher slurm'.

Specifying a hostfile with mpiexec within a SLURM batch job also
worked, but that's obviously not what you normally want to do...
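
To compare such runs side by side, a small filter over the debug output
can help. This is only a sketch: the function name hydra_settings is made
up for illustration, and the sample input line is invented (real debug
output contains much more). It assumes the '--rmk' and '--launcher'
arguments appear in the output exactly as quoted above, and that grep
supports the -o option.

```shell
#!/bin/sh
# Hypothetical helper: print the "--rmk <name>" and "--launcher <name>"
# settings found in Hydra debug output read from stdin.
hydra_settings() {
    grep -o -e '--rmk [^ ]*' -e '--launcher [^ ]*' | LC_ALL=C sort -u
}

# Invented sample line, used only to show the filter in action.
printf '%s\n' 'args: --rmk slurm --launcher slurm' | hydra_settings

# Possible use on a real run (assumes debug output is enabled via the
# HYDRA_DEBUG environment variable):
#   HYDRA_DEBUG=1 mpiexec -f hostfile -n 4 ./hello 2>&1 | hydra_settings
```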

Hope this helps. If there is anything else I should try to help
track down the issue, please let me know.

Thanks,
Markus

--
Dr. Markus Geimer
Juelich Supercomputing Centre
Institute for Advanced Simulation
Forschungszentrum Juelich GmbH
52425 Juelich, Germany

Phone:  +49-2461-61-1773
Fax:    +49-2461-61-6656
E-mail: m.geimer at fz-juelich.de
WWW:    http://www.fz-juelich.de/jsc/




