[mpich-discuss] [EXTERNAL] Re: too many ssh connections warning

Reuti reuti at staff.uni-marburg.de
Tue Dec 3 09:01:31 CST 2019


Hi:

> On 03.12.2019 at 15:45, Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org> wrote:
> 
> Reuti,
> 
> Sorry, I forgot to mention that I am starting the job under PBS/Torque with the qsub command.

Then it should be possible to use the task manager interface without `ssh`:

http://docs.adaptivecomputing.com/torque/4-2-7/Content/topics/7-messagePassing/MPICH.htm
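Before changing how MPICH starts its processes, it can help to confirm that the job really is running inside a Torque allocation, and which nodes it received. The Python sketch below relies only on the standard Torque variables `PBS_JOBID` and `PBS_NODEFILE`. The latter names a file listing one hostname per allocated slot. The function name `torque_allocation` is hypothetical. The sketch only inspects the allocation and does not change how MPICH launches anything.

```python
import os
from collections import Counter


def torque_allocation(environ=None):
    """Return (job_id, {host: slots}) when running inside a PBS/Torque job, else None.

    Hypothetical helper: Torque sets PBS_JOBID and PBS_NODEFILE for jobs
    submitted with qsub, and the node file lists one hostname per slot.
    """
    env = os.environ if environ is None else environ
    job_id = env.get("PBS_JOBID")
    nodefile = env.get("PBS_NODEFILE")
    if not job_id or not nodefile:
        return None  # not inside a Torque job
    with open(nodefile) as f:
        hosts = [line.strip() for line in f if line.strip()]
    # Count slots per host, keeping hosts in the order they first appear.
    return job_id, dict(Counter(hosts))


if __name__ == "__main__":
    alloc = torque_allocation()
    if alloc is None:
        print("Not running inside a PBS/Torque job")
    else:
        job_id, slots = alloc
        print(f"Job {job_id}:")
        for host, n in slots.items():
            print(f"  {host}: {n} slot(s)")
```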


>   I'll check with our sysadmins to see if there are firewall issues.

This could also become an issue later on, if MPICH connects directly to other machines to talk to the daemons that have already been started there.
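If a firewall is the suspect, a plain TCP connection attempt from one compute node to another distinguishes "port blocked" from "port open". The sketch below is a minimal, hypothetical helper (`can_connect`). Port 22 is only the sshd default. Which other ports MPICH's daemons use is not covered here, so substitute ports you know to be in use.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within `timeout` seconds.

    Name resolution failures, refused connections, and timeouts (typical of a
    firewall that silently drops packets) all raise OSError and yield False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

For example, `can_connect("n009.cluster.com", 22)` run on n010 returns `False` both when the connection is refused outright and when a firewall drops it until the timeout expires.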


>   What is PAM?

https://en.wikipedia.org/wiki/Linux_PAM

Several limits can be set here, depending on your distribution:

ls /lib64/security/

lists the modules installed by default; these are then enabled and configured in the files under /etc/pam.d.
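The same inspection can be scripted. The sketch below mirrors `ls /lib64/security/`. It then looks for `pam_limits.so` in /etc/pam.d and for `maxlogins`/`maxsyslogins` entries in /etc/security/limits.conf, which is where pam_limits caps the number of concurrent logins. It is a hypothetical helper (`pam_limit_hints`) and covers only this one module. It also ignores any extra files pam_limits may read from /etc/security/limits.d/. Other modules, or a firewall, can impose their own restrictions.

```python
import os


def pam_limit_hints(security_dir="/lib64/security", pam_d="/etc/pam.d",
                    limits_conf="/etc/security/limits.conf"):
    """Collect hints about PAM settings that could cap logins (hypothetical helper).

    Missing paths are treated as empty.
    """
    # Equivalent of `ls /lib64/security/`: installed PAM modules.
    modules = sorted(os.listdir(security_dir)) if os.path.isdir(security_dir) else []

    # Services in /etc/pam.d whose non-comment lines load pam_limits.so.
    users = []
    if os.path.isdir(pam_d):
        for name in sorted(os.listdir(pam_d)):
            path = os.path.join(pam_d, name)
            if not os.path.isfile(path):
                continue
            with open(path, errors="replace") as f:
                if any("pam_limits.so" in line and not line.lstrip().startswith("#")
                       for line in f):
                    users.append(name)

    # limits.conf lines have the form: <domain> <type> <item> <value>
    maxlogins = []
    if os.path.isfile(limits_conf):
        with open(limits_conf, errors="replace") as f:
            for line in f:
                stripped = line.strip()
                if not stripped or stripped.startswith("#"):
                    continue
                fields = stripped.split()
                if len(fields) >= 4 and fields[2] in ("maxlogins", "maxsyslogins"):
                    maxlogins.append(stripped)

    return {"modules": modules, "pam_limits_users": users, "maxlogins": maxlogins}
```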

-- Reuti


> Hui Zhou,
> 
> What do you expect would be making multiple SSH connections to the node?  The creation of inter-communicators?   Individual MPI_Iprobe/MPI_Isend/MPI_Irecv commands?  If you have a guess, that would help me know how to fix the problem.
> 
> Kurt
> 
> 
> -----Original Message-----
> From: Reuti <reuti at staff.uni-marburg.de> 
> Sent: Monday, December 2, 2019 3:20 PM
> To: discuss at mpich.org
> Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
> Subject: [EXTERNAL] Re: [mpich-discuss] too many ssh connections warning
> 
> 
>> On 02.12.2019 at 22:14, Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org> wrote:
>> 
>> My application uses mainly inter-communicators rather than intra-communicators for fault tolerance. A particular process might have 20 inter-communicators active at one time. I’m receiving the warning
>> 
>> [mpiexec at n010.cluster.com] WARNING: too many ssh connections to n009.cluster.com; waiting 6 seconds
>> 
>> What is the cause of this? I have several guesses:
>> 
>> 1) MPICH has an internal limit on the number of connections
>> 2) I’m bumping up against a Linux limit on the number of connections
>> 3) Non-blocking communication using MPI_Isend() creates a temporary ssh connection (not likely)
> 
> 4) Firewall or PAM settings on the target prevent too many logins within a certain time frame.
> 
> Are you using a queuing system, and do you have the option to skip SSH and start up MPICH through the queuing system?
> 
> -- Reuti
> 
> 
>> The other question is, what are the consequences of “waiting 6 seconds”? Are some non-blocking messages dropped?
>> 
>> I’m using MPICH 3.3.2, CentOS 3.10 and the Portland Group compiler pgc++ 19.5.0.
>> 
>> 
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
> 


