[mpich-discuss] too many ssh connections warning

Zhou, Hui zhouh at anl.gov
Mon Dec 2 15:26:06 CST 2019


This particular warning is issued by Hydra, MPICH’s process manager. The following excerpt is the comment from the relevant source code:

```
        /* ssh has many types of security controls that do not allow a
         * user to ssh to the same node multiple times very
         * quickly. If this happens, the ssh daemons disables ssh
         * connections causing the job to fail. This is basically a
         * hack to slow down ssh connections to the same node. We
         * check for offset == 0 before applying this hack, so we only
         * slow down the cases where ssh is being used, and not the
         * cases where we fall back to fork. */

```

—
Hui Zhou


On Dec 2, 2019, at 3:14 PM, Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org> wrote:

My application mainly uses inter-communicators rather than intra-communicators, for fault tolerance. A particular process might have 20 inter-communicators active at one time. I’m receiving the warning

[mpiexec at n010.cluster.com] WARNING: too many ssh connections to n009.cluster.com; waiting 6 seconds

What is the cause of this? I have several guesses:

1) MPICH has an internal limit on the number of connections
2) I’m bumping up against a Linux limit on the number of connections
3) Non-blocking communication using MPI_Isend() creates a temporary ssh connection (not likely)

The other question is: what are the consequences of “waiting 6 seconds”? Are any non-blocking messages dropped?

I’m using MPICH 3.3.2, CentOS 3.10 and the Portland Group compiler pgc++ 19.5.0.


_______________________________________________
discuss mailing list: discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

