[mpich-discuss] Hydra WARNING: too many ssh connections

Zhou, Hui zhouh at anl.gov
Fri Apr 1 16:24:47 CDT 2022


The spawned process launches the same way the first MPI_COMM_WORLD is launched, i.e. in a round-robin fashion through the list of nodes.
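
As a rough illustration of the round-robin placement described above, here is a minimal sketch in plain C. It is not hydra's code: the function name `place_round_robin` is hypothetical, and it assumes one slot per host and that placement starts at the first host in the list. Real placement also depends on the slot counts in the host file and on mpiexec options.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper, not part of hydra or MPICH.
 * Simplified model: every host offers exactly one slot, so process i
 * is placed on host (i mod num_hosts), cycling through the host list.
 * placement[i] receives the index of the host chosen for process i. */
void place_round_robin(size_t num_procs, size_t num_hosts, size_t *placement)
{
    assert(num_hosts > 0);          /* an empty host list cannot be cycled */
    for (size_t i = 0; i < num_procs; i++) {
        placement[i] = i % num_hosts;
    }
}
```

Under this model, with a two-entry host list where index 0 is the local host and index 1 is a remote host, four spawned processes land on hosts 0, 1, 0, 1, so both the local and the remote host receive spawned processes.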

--
Hui
________________________________
From: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Sent: Friday, April 1, 2022 4:22 PM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Subject: Re: Hydra WARNING: too many ssh connections


Thanks Hui, is the spawned process on the local host, the remote host, or both?



Kurt



From: Zhou, Hui <zhouh at anl.gov>
Sent: Friday, April 1, 2022 4:20 PM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [EXTERNAL] Re: Hydra WARNING: too many ssh connections



Every time you call MPI_Comm_spawn, hydra launches an ssh connection to each host to create a proxy. That is certainly not ideal for applications that rely on spawning many processes.
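
To make the cost concrete, here is a back-of-the-envelope sketch in plain C that simply restates the description above as arithmetic. The function name `ssh_launches` is hypothetical, and the model assumes exactly one ssh launch per host per MPI_Comm_spawn call and nothing else.

```c
/* Hypothetical helper, not part of hydra or MPICH.
 * Model taken from the description above: each MPI_Comm_spawn call
 * starts one ssh per host, so the total is calls times hosts. */
unsigned long ssh_launches(unsigned long spawn_calls, unsigned long hosts)
{
    return spawn_calls * hosts;
}
```

Under this model, 16 separate spawn calls across 4 hosts cost 64 ssh launches, while a single call costs 4. If the application allows it, requesting several processes at once through the maxprocs argument of one MPI_Comm_spawn call is one way to keep the number of calls down.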

________________________________

From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Friday, April 1, 2022 4:08 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] Hydra WARNING: too many ssh connections



Hi,  you provided the following information about the warning “too many ssh connections”:



The particular warning is issued by hydra, MPICH’s process manager. The following excerpt is the comment from the source code:



        /* ssh has many types of security controls that do not allow a
         * user to ssh to the same node multiple times very
         * quickly. If this happens, the ssh daemons disables ssh
         * connections causing the job to fail. This is basically a
         * hack to slow down ssh connections to the same node. We
         * check for offset == 0 before applying this hack, so we only
         * slow down the cases where ssh is being used, and not the
         * cases where we fall back to fork. */



Is this just during an initial ssh connection attempt?  I’m trying to figure out where my code is triggering this warning.  Could it be from



  1.  MPI_Intercomm_create
  2.  MPI_Comm_spawn
  3.  others?



I’m calling mpiexec with the “--launcher ssh” option, MPICH 4.0.1.



Thanks,

Kurt



