[mpich-discuss] [EXTERNAL] Re: too many ssh connections warning
Mccall, Kurt E. (MSFC-EV41)
kurt.e.mccall at nasa.gov
Tue Dec 3 08:45:23 CST 2019
Reuti,
Sorry, I forgot to mention that I am starting the job under PBS/Torque with the qsub command. I'll check with our sysadmins to see if there are firewall issues. What is PAM?
Hui Zhou,
What do you think would be making multiple SSH connections to the node? The creation of inter-communicators? Individual MPI_Iprobe/MPI_Isend/MPI_Irecv calls? If you have a guess, that would help me work out how to fix the problem.
Kurt
-----Original Message-----
From: Reuti <reuti at staff.uni-marburg.de>
Sent: Monday, December 2, 2019 3:20 PM
To: discuss at mpich.org
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [EXTERNAL] Re: [mpich-discuss] too many ssh connections warning
> On 02.12.2019 at 22:14, Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org> wrote:
>
> My application uses mainly inter-communicators rather than intra-communicators for fault tolerance. A particular process might have 20 inter-communicators active at one time. I’m receiving the warning
>
> [mpiexec at n010.cluster.com] WARNING: too many ssh connections to n009.cluster.com; waiting 6 seconds
>
> What is the cause of this? I have several guesses:
>
> 1) MPICH has an internal limit on the number of connections
> 2) I’m bumping up against a Linux limit on the number of connections
> 3) Non-blocking communication using MPI_Isend() creates a temporary ssh connection (not likely)
4) Firewall or PAM settings on the target prevent too many logins within a certain timeframe.
Are you using a queuing system? If so, do you have the option to skip SSH and let the queuing system start MPICH?
-- Reuti
> The other question is, what are the consequences of “waiting 6 seconds”? Are some non-blocking messages dropped?
>
> I’m using MPICH 3.3.2, CentOS 3.10 and the Portland Group compiler pgc++ 19.5.0.