[mpich-discuss] Problems running MPICH jobs under SLURM

Pavan Balaji balaji at mcs.anl.gov
Sun Jun 2 12:35:13 CDT 2013


Hi Markus,

On 06/02/2013 04:31 AM, Markus Geimer wrote:
>> Markus: does this happen only with SLURM or can you reproduce this
>> without SLURM as well?
>
> It seems to happen only when hydra queries the host list from SLURM.
> I tried executing two different setups on the head node, both listing
> two compute nodes in a hostfile:
>
>    1) mpiexec -f hostfile -n 4 ./hello
>
>       Since the SLURM PAM module was disabled, SSH login to the nodes
>       was possible and the job ran as expected, with two ranks on each
>       node. SLURM's sinfo showed both nodes as state 'idle' and the
>       HYDRA_DEBUG output said '--rmk user --launcher ssh'.

That sounds good.

>    2) mpiexec -f hostfile -rmk slurm -n 4 ./hello
>
>       This job ran as well, with the nodes allocated via SLURM and
>       shown as 'alloc'. Debug output: '--rmk slurm --launcher slurm'.
>
> Specifying a hostfile with mpiexec within a SLURM batch job also
> worked, but that's obviously not what you normally want to do...

Hmm.  -f hostfile and -rmk slurm are contradictory options, since both 
are just ways to get the host list.  This should throw an error.  I'll 
add that into Hydra.

> Hope this helps. If there is anything else I should try out to help
> track down the issue, please let me know.

I'm still trying to find the exact option that fails.  Can you try the 
following:

# Use the slurm launcher, and user-specified resources
% mpiexec -f hostfile -launcher slurm -n 4 ./hello

# Use the ssh launcher, and user-specified resources
% mpiexec -f hostfile -launcher ssh -n 4 ./hello

# Use the ssh launcher, and slurm-specified resources
% mpiexec -rmk slurm -launcher ssh -n 4 ./hello

# Use the slurm launcher, and slurm-specified resources
% mpiexec -rmk slurm -launcher slurm -n 4 ./hello

At least one of them should throw the error you reported.
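
If it is more convenient, the four runs above can be scripted so that each combination and its exit status end up in one log. This is only a minimal sketch, assuming a POSIX shell; the variables HOSTFILE, NP, APP, and MPIEXEC and the function run_combo are placeholders introduced here for illustration, not part of Hydra.

```shell
#!/bin/sh
# Sketch: try each host-list source with each launcher and report the exit status.
# HOSTFILE, NP, APP, and MPIEXEC are illustrative placeholders; adjust as needed.
HOSTFILE=${HOSTFILE:-hostfile}
NP=${NP:-4}
APP=${APP:-./hello}
MPIEXEC=${MPIEXEC:-mpiexec}

run_combo() {
    # $1 = host-list source ("-f <hostfile>" or "-rmk slurm"), $2 = launcher
    echo "== $1 -launcher $2"
    # $1 is left unquoted on purpose so it splits into the option and its argument
    $MPIEXEC $1 -launcher "$2" -n "$NP" "$APP"
    echo "exit status: $?"
}

for src in "-f $HOSTFILE" "-rmk slurm"; do
    for launcher in slurm ssh; do
        run_combo "$src" "$launcher"
    done
done
```

Running it with HYDRA_DEBUG set, as in your earlier tests, and redirecting the output to a file should make it easy to see which combination fails.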

  -- Pavan

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


