[mpich-discuss] hydra, stdin close(), and SLURM

Balaji, Pavan balaji at anl.gov
Sun Jul 26 09:34:38 CDT 2015


Thanks, Aaron.  That seems like a reasonable workaround (though the error is really in the slurm code for assuming that fd 0 is already stdin).
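
For anyone following the thread, the fd reuse is easy to reproduce outside of srun. What follows is only a minimal sketch, not hydra or slurm code: the helper name fd_reused_after_closing_stdin is made up for illustration, and /dev/null stands in for the sssd cache file. It relies on the POSIX rule that open() returns the lowest-numbered descriptor that is not currently open.

```c
#include <assert.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper (not part of hydra or slurm): close stdin, open
 * `path` read-only, and return the descriptor number open() handed back.
 * The original stdin, if there was one, is restored before returning. */
int fd_reused_after_closing_stdin(const char *path)
{
    assert(path != NULL);

    /* Keep a copy of stdin so it can be restored afterwards.  If stdin
     * was already closed, dup() fails and there is nothing to restore. */
    int saved = dup(STDIN_FILENO);
    if (saved >= 0)
        close(STDIN_FILENO);        /* mimics the launcher closing stdin before exec */

    /* Stand-in for a library opening its cache file: fd 0 is now the
     * lowest free descriptor, so open() is expected to return 0. */
    int fd = open(path, O_RDONLY);
    int result = fd;

    if (fd >= 0)
        close(fd);
    if (saved >= 0) {
        dup2(saved, STDIN_FILENO);  /* put the real stdin back on fd 0 */
        close(saved);
    }
    return result;
}
```

Calling fd_reused_after_closing_stdin("/dev/null") from a small main() should return 0, which is the situation srun ends up in when it then treats fd 0 as stdin.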

Do you think you can provide a git patch?  I can sign off on it (which is our way of "reviewing it") and include it in the upcoming release.

Regards,

  -- Pavan





On 7/25/15, 9:26 PM, "Aaron Knister" <aaron.s.knister at nasa.gov> wrote:

>I sent this off to the mvapich list yesterday and it was suggested I 
>raise it here since this is the upstream project:
>
>This is a bit of a cross post from a thread I started on the slurm dev 
>list: http://article.gmane.org/gmane.comp.distributed.slurm.devel/8176
>
>I'd like to get feedback on the idea that "--input none" be passed to 
>srun when using the SLURM hydra bootstrap mechanism. I figured it would 
>be inserted somewhere around here 
>http://trac.mpich.org/projects/mpich/browser/src/pm/hydra/tools/bootstrap/external/slurm_launch.c#L98. 
>
>
>Without this argument I'm getting spurious job aborts and confusing
>errors. The gist of it is that mpiexec.hydra closes stdin before it execs
>srun. srun then (possibly via the munge libraries) calls some function
>that does a lookup via NSS. We use sssd for AAA, so libnss_sss
>handles this request. As part of its caching mechanism, the library
>open()s the sssd cache file. The lowest fd available is 0, so
>the cache file is opened on fd 0. srun then believes it has stdin
>attached, which causes the issues outlined in the slurm dev post. I
>think passing "--input none" is the right thing to do here, since hydra
>has in fact closed stdin to srun. I tested this via the
>HYDRA_LAUNCHER_EXTRA_ARGS environment variable and it does resolve the
>errors I described.
>
>Thanks!
>-Aaron
>
>-- 
>Aaron Knister
>NASA Center for Climate Simulation (Code 606.2)
>Goddard Space Flight Center
>(301) 286-2776
>
>

