[mpich-discuss] Environment variable forwarding using Hydra and "-launcher ssh"

Raffenetti, Ken raffenet at anl.gov
Mon Apr 22 12:06:53 CDT 2024


Hi Edric,

It looks like we may have unintentionally inverted the logic on that function return value. I’ll submit a PR to fix. Thanks for bringing it to our attention.

Ken

From: Edric Ellis via discuss <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Monday, April 22, 2024 at 9:44 AM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: Edric Ellis <eellis at mathworks.com>
Subject: [mpich-discuss] Environment variable forwarding using Hydra and "-launcher ssh"

We’re in the process of moving from mpich-3.x to mpich-4.1.2. We’ve run into some odd behaviour on SLURM related to environment variable forwarding by mpiexec. It looks like mpiexec now propagates only SLURM_* environment variables, instead of filtering them out (or intending to). Consider something like this:

$ mpiexec -launcher slurm printenv HOME SLURM_JOBID

Using mpich-3.x, the HOME variable gets forwarded. Using mpich-4.1.2, it does not. I believe that mpich-3.x intends to filter out SLURM_JOBID, but the value still seems to be present; maybe srun forwards that. It’s the fact that HOME doesn’t get through using mpich-4.1.2 that is causing us problems.

Running mpich-

Here’s what I think is the relevant change for SLURM: https://github.com/pmodels/mpich/commit/95ba4ddc7efc7ddc7f25ed41480ee35248184680 . Am I reading that correctly?

The doc here https://github.com/pmodels/mpich/blob/main/doc/wiki/how_to/Using_the_Hydra_Process_Manager.md#environment-settings states that SLURM_ things should be filtered out, but that doesn’t appear to be happening?

For reference, here’s what mpich-4.1.2 “mpiexec -verbose -launcher slurm” prints:

mpiexec options:
----------------
  Base path: /path/to/mpich-4.1.2
  Launcher: slurm
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    SLURM_JOBID=102437
    SLURM_JOB_USER=eellis
    SLURM_JOB_QOS=normal
    SLURM_JOB_NUM_NODES=2
    SLURM_TASKS_PER_NODE=1(x2)
    SLURM_TOPOLOGY_ADDR_PATTERN=node
    … many more SLURM_*

And here’s what mpich-3.x prints:

mpiexec options:
----------------
  Base path: /path/to/mpich-3.x
  Launcher: slurm
  Debug level: 1
  Enable X: -1

  Global environment:
  -------------------
    ALTERNATE_EDITOR=emacs
    MAIL=/var/mail/eellis
    USER=eellis
    SLURM_JOB_USER=eellis
    l=/local/eellis
    XDG_SESSION_TYPE=unspecified
    SLURM_JOB_QOS=normal
    … no SLURM_JOBID
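
The difference between the two listings is what you would expect if the return value of a filter check were read the wrong way round. Here is a minimal Python sketch of that pattern. It is not MPICH source: the function names, the "SLURM_" prefix rule, and the HOME value are assumptions made up for illustration.

```python
# Hypothetical illustration only; none of these names come from Hydra.

def is_filtered(name, prefixes=("SLURM_",)):
    """Return True if the variable should be dropped before forwarding.

    The prefix rule is an assumption for this sketch.
    """
    return name.startswith(prefixes)


def forward_intended(env):
    # Keep every variable the filter does not reject.
    return {k: v for k, v in env.items() if not is_filtered(k)}


def forward_inverted(env):
    # Same loop with the check misread: keeps only the rejected names.
    return {k: v for k, v in env.items() if is_filtered(k)}


env = {
    "HOME": "/path/to/home",   # placeholder value
    "USER": "eellis",
    "SLURM_JOBID": "102437",
}

print(sorted(forward_intended(env)))  # ['HOME', 'USER']
print(sorted(forward_inverted(env)))  # ['SLURM_JOBID']
```

With the inverted check, only the SLURM_* name survives, which matches the shape of the mpich-4.1.2 listing above.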

Cheers,
Edric.