[mpich-discuss] hydra, stdin close(), and SLURM
Aaron Knister
aaron.s.knister at nasa.gov
Sat Jul 25 21:26:24 CDT 2015
I sent this off to the mvapich list yesterday and it was suggested I
raise it here since this is the upstream project:
This is a bit of a cross post from a thread I started on the slurm dev
list: http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176
I'd like to get feedback on the idea that "--input none" be passed to
srun when using the SLURM hydra bootstrap mechanism. I figured it would
be inserted somewhere around here
http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98.
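To make the proposal concrete, here is a rough sketch of the kind of change
I have in mind. It is not hydra's actual code: the helper name
append_srun_input_none and its (argv, argc, cap) interface are made up for
illustration, and the real slurm_launch.c builds its argument list
differently. It only shows the two extra arguments landing in the srun
argument vector.

```c
#include <stddef.h>

/* Hypothetical helper (not part of hydra): append "--input none" to a
 * NULL-terminated argument vector that will later be used to exec srun.
 *   argv - array with room for `cap` pointers
 *   argc - number of arguments currently stored (excluding the NULL)
 * Returns the new argument count, or the old count if there is no room. */
size_t append_srun_input_none(const char **argv, size_t argc, size_t cap)
{
    /* Need two slots for the new arguments plus one for the NULL. */
    if (argc + 3 > cap)
        return argc;

    argv[argc++] = "--input";
    argv[argc++] = "none";
    argv[argc] = NULL;   /* keep the vector NULL-terminated for exec */
    return argc;
}
```

Starting from an argv holding only "srun", the vector would become
"srun", "--input", "none", NULL.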
Without this argument I'm getting spurious job aborts and confusing
errors. The gist of it is that mpiexec.hydra closes stdin before it execs
srun. srun then (possibly via the munge libraries) calls some function
that does a lookup via NSS. We use sssd for AAA, so libnss_sss handles
this request. As part of sssd's caching mechanism, the library open()s
the cache file. Since the lowest available fd is 0, the cache file is
opened on fd 0. srun then believes it has stdin attached, which causes
the issues outlined in the slurm dev post. I
think passing "--input none" is the right thing to do here since hydra
has in fact closed stdin to srun. I tested this via the
HYDRA_LAUNCHER_EXTRA_ARGS environment variable and it does resolve the
errors I described.
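
The fd reuse behind all this is easy to reproduce outside of hydra and
srun. Below is a minimal sketch, assuming a POSIX system; the function
name fd_after_closing_stdin is my own and nothing here comes from hydra,
srun, or sssd. It closes fd 0, opens a file, and reports which descriptor
open() returned. POSIX specifies that open() returns the lowest-numbered
descriptor not currently open, so with stdin closed that descriptor is 0.

```c
#include <fcntl.h>
#include <unistd.h>

/* Illustrative only: close stdin, open `path` read-only, and return the
 * descriptor number open() handed back (or -1 if open() failed).
 * The original stdin, if there was one, is restored before returning. */
int fd_after_closing_stdin(const char *path)
{
    /* Keep a copy of stdin so it can be restored afterwards.
     * dup() fails if stdin was already closed; that case is fine too. */
    int saved = dup(STDIN_FILENO);
    if (saved >= 0)
        close(STDIN_FILENO);

    /* With fd 0 free, open() is expected to reuse it. */
    int fd = open(path, O_RDONLY);
    int result = fd;
    if (fd >= 0)
        close(fd);

    /* Put the original stdin back on fd 0. */
    if (saved >= 0) {
        dup2(saved, STDIN_FILENO);
        close(saved);
    }
    return result;
}
```

Calling fd_after_closing_stdin("/dev/null") should return 0, which is the
same situation as the sssd cache file ending up where srun expects stdin.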
Thanks!
-Aaron
--
Aaron Knister
NASA Center for Climate Simulation (Code 606.2)
Goddard Space Flight Center
(301) 286-2776