[mpich-discuss] hydra, stdin close(), and SLURM

Aaron Knister aaron.s.knister at nasa.gov
Sat Jul 25 21:26:24 CDT 2015


I sent this off to the mvapich list yesterday and it was suggested I 
raise it here since this is the upstream project:

This is a bit of a cross post from a thread I started on the slurm dev 
list: http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176

I'd like feedback on the idea of passing "--input none" to srun when 
hydra uses the SLURM bootstrap mechanism. I figure it would be added 
somewhere around here: 
http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98


Without this argument I get spurious job aborts and confusing errors. 
The gist is that mpiexec.hydra closes stdin before it execs srun. srun 
then (possibly via the munge libraries) calls some function that does a 
lookup via NSS. We use sssd for AAA, so libnss_sssd handles this 
request. As part of its caching mechanism, sssd has the library open() 
the cache file. Because stdin is closed, the lowest available fd is 0, 
so the cache file is opened on fd 0. srun then believes it has stdin 
attached, which causes the issues outlined in the slurm dev post. I 
think passing "--input none" is the right thing to do here, since hydra 
has in fact closed srun's stdin. I tested this via the 
HYDRA_LAUNCHER_EXTRA_ARGS environment variable, and it does resolve the 
errors I described.
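
The fd reuse described above can be reproduced outside of hydra and 
srun. What follows is only a minimal Python sketch, not code from 
hydra, srun, munge, or sssd: a child process closes fd 0 and then opens 
a file, with os.devnull standing in (as an assumption) for the sssd 
cache file. The function name and the CHILD string are made up for this 
illustration.

```python
import subprocess
import sys

# Child program: close fd 0, then open a file and report the descriptor number.
# os.devnull is only a stand-in for the cache file an NSS library might open.
CHILD = (
    "import os\n"
    "os.close(0)\n"
    "print(os.open(os.devnull, os.O_RDONLY))\n"
)


def fd_of_first_open_after_closing_stdin():
    """Run CHILD in a fresh interpreter and return the fd number it printed."""
    result = subprocess.run(
        [sys.executable, "-c", CHILD],
        stdin=subprocess.DEVNULL,  # the child starts with a valid fd 0, then closes it
        capture_output=True,       # fds 1 and 2 stay open as pipes
        text=True,
        check=True,
    )
    return int(result.stdout.strip())


if __name__ == "__main__":
    print(fd_of_first_open_after_closing_stdin())
```

On a POSIX system open() returns the lowest-numbered descriptor that is 
not currently open, so the child should report 0: the first file opened 
after stdin is closed ends up where a program expects stdin to be.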

Thanks!
-Aaron

-- 
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776

