[mpich-discuss] Environment variable forwarding using Hydra and "-launcher ssh"
Raffenetti, Ken
raffenet at anl.gov
Mon Apr 22 12:06:53 CDT 2024
Hi Edric,
It looks like we may have unintentionally inverted the logic on that function return value. I’ll submit a PR to fix. Thanks for bringing it to our attention.
Ken
From: Edric Ellis via discuss <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Monday, April 22, 2024 at 9:44 AM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: Edric Ellis <eellis at mathworks.com>
Subject: [mpich-discuss] Environment variable forwarding using Hydra and "-launcher ssh"
We’re in the process of moving from mpich-3.x to mpich-4.1.2. We’ve run into some odd behaviour on SLURM related to environment variable forwarding by mpiexec. It looks like mpiexec now propagates only SLURM_* environment variables, instead of filtering them out (or intending to). Consider something like this:
$ mpiexec -launcher slurm printenv HOME SLURM_JOBID
Using mpich-3.x, the HOME variable gets forwarded. Using mpich-4.1.2, it does not. I believe that mpich-3.x intends to filter out SLURM_JOBID, but the value still seems to be present, perhaps because srun forwards it. It’s the fact that HOME doesn’t get through using mpich-4.1.2 that is causing us problems.
Running mpich-
Here’s what I think is the relevant change for SLURM: <https://github.com/pmodels/mpich/commit/95ba4ddc7efc7ddc7f25ed41480ee35248184680>. Am I reading that correctly?
The doc here <https://github.com/pmodels/mpich/blob/main/doc/wiki/how_to/Using_the_Hydra_Process_Manager.md#environment-settings> states that SLURM_* variables should be filtered out, but that doesn’t appear to be happening.
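To make the suspected behaviour concrete, here is a minimal Python sketch of a filter applied with the intended sense and with the opposite sense. It is only an illustrative model, not MPICH source: the `is_filtered` predicate, the simple `SLURM_` prefix rule, and the sample values are all assumptions, and the real Hydra filter list may well differ.

```python
def is_filtered(name: str) -> bool:
    """Hypothetical predicate: True when a variable should NOT be forwarded.

    Assumption: a plain "SLURM_" prefix test stands in for whatever
    list the launcher really uses.
    """
    return name.startswith("SLURM_")


def forward_intended(env: dict) -> dict:
    # Documented behaviour: drop the filtered variables, keep the rest.
    return {k: v for k, v in env.items() if not is_filtered(k)}


def forward_inverted(env: dict) -> dict:
    # Same predicate with its sense flipped: only filtered variables survive.
    return {k: v for k, v in env.items() if is_filtered(k)}


# Made-up sample environment for illustration.
env = {"HOME": "/home/user", "USER": "user", "SLURM_JOBID": "102437"}

print(sorted(forward_intended(env)))  # ['HOME', 'USER']
print(sorted(forward_inverted(env)))  # ['SLURM_JOBID']
```

With the flipped sense, HOME disappears and only the SLURM_* entries remain, which matches what I see with mpich-4.1.2.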
For reference, here’s what mpich-4.1.2 “mpiexec -verbose -launcher slurm” prints:
mpiexec options:
----------------
Base path: /path/to/mpich-4.1.2
Launcher: slurm
Debug level: 1
Enable X: -1
Global environment:
-------------------
SLURM_JOBID=102437
SLURM_JOB_USER=eellis
SLURM_JOB_QOS=normal
SLURM_JOB_NUM_NODES=2
SLURM_TASKS_PER_NODE=1(x2)
SLURM_TOPOLOGY_ADDR_PATTERN=node
… many more SLURM_*
And here’s what mpich-3.x prints:
mpiexec options:
----------------
Base path: /path/to/mpich-3.x
Launcher: slurm
Debug level: 1
Enable X: -1
Global environment:
-------------------
ALTERNATE_EDITOR=emacs
MAIL=/var/mail/eellis
USER=eellis
SLURM_JOB_USER=eellis
l=/local/eellis
XDG_SESSION_TYPE=unspecified
SLURM_JOB_QOS=normal
… no SLURM_JOBID
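For anyone wanting to compare two such listings mechanically, the following Python sketch extracts the variable names under "Global environment:" and reports the differences. The parsing rules are assumptions based only on the excerpts above (a header line, a dashed rule, then NAME=value lines), and the helper name `global_env_names` is hypothetical; the sample strings are shortened copies of the output shown here.

```python
def global_env_names(verbose_output: str) -> set:
    """Collect variable names listed under 'Global environment:'.

    Assumed layout (taken from the excerpts in this message): a header
    line, a rule of dashes, then one NAME=value entry per line. Parsing
    stops at the next line ending in ':' (treated as a new section).
    """
    names = set()
    in_section = False
    for line in verbose_output.splitlines():
        stripped = line.strip()
        if stripped.startswith("Global environment"):
            in_section = True
            continue
        if not in_section or not stripped:
            continue
        if set(stripped) == {"-"}:
            continue  # dashed rule under the header
        if stripped.endswith(":"):
            break  # start of another section
        if "=" in stripped:
            names.add(stripped.split("=", 1)[0])
    return names


new_out = """Global environment:
-------------------
SLURM_JOBID=102437
SLURM_JOB_USER=eellis
SLURM_JOB_QOS=normal
"""

old_out = """Global environment:
-------------------
MAIL=/var/mail/eellis
USER=eellis
SLURM_JOB_USER=eellis
SLURM_JOB_QOS=normal
"""

old_names = global_env_names(old_out)
new_names = global_env_names(new_out)
only_old = sorted(old_names - new_names)
only_new = sorted(new_names - old_names)
print("only in 3.x:  ", only_old)  # ['MAIL', 'USER']
print("only in 4.1.2:", only_new)  # ['SLURM_JOBID']
```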
Cheers,
Edric.