[mpich-discuss] too many ssh connections warning

Reuti reuti at staff.uni-marburg.de
Mon Dec 2 15:20:02 CST 2019


> On 02.12.2019 at 22:14, Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org> wrote:
> 
> My application mainly uses inter-communicators rather than intra-communicators for fault tolerance. A particular process might have 20 inter-communicators active at one time. I’m receiving the warning:
>  
> [mpiexec at n010.cluster.com] WARNING: too many ssh connections to n009.cluster.com; waiting 6 seconds
>  
> What is the cause of this? I have several guesses:
>  
> 1) MPICH has an internal limit on the number of connections
> 2) I’m bumping up against a Linux limit on the number of connections
> 3) Non-blocking communication using MPI_Isend() creates a temporary ssh connection (not likely)

4) Firewall or PAM settings on the target prevent too many logins within a certain time frame.
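
To make the idea behind guess 4 concrete, here is a minimal Python sketch of a per-host login limit over a sliding time window. It only illustrates the general technique that could produce a "too many connections; waiting N seconds" style message. It is not taken from the MPICH source or from any PAM or firewall module, and the class name `LoginWindow` and its parameters are hypothetical.

```python
import collections


class LoginWindow:
    """Hypothetical limiter: at most `limit` logins per host within `window` seconds."""

    def __init__(self, limit, window):
        self.limit = limit
        self.window = window
        # One queue of recent login timestamps per host name.
        self.recent = collections.defaultdict(collections.deque)

    def wait_needed(self, host, now):
        """Return 0.0 and record the login if it may proceed at time `now`.

        Otherwise return the number of seconds until the oldest recorded
        login to `host` falls out of the window.
        """
        stamps = self.recent[host]
        # Forget logins that are older than the window.
        while stamps and now - stamps[0] >= self.window:
            stamps.popleft()
        if len(stamps) < self.limit:
            stamps.append(now)
            return 0.0
        return self.window - (now - stamps[0])
```

With `limit=2` and `window=10.0`, logins to one host at t=0 and t=1 proceed, while a third attempt at t=4 is told to wait 6 seconds. Logins to other hosts are counted separately.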

Are you using a queuing system, and if so, could you skip SSH and let the queuing system start MPICH instead?
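
As a rough sketch of what skipping SSH could look like, the Python helper below picks a Hydra launcher name from the environment variables a scheduler typically sets inside a job, then builds an `mpiexec -launcher ...` command line. The mapping table and the function names are assumptions made for illustration. Which launchers are actually available depends on how MPICH was built (`mpiexec -info` should list them), so verify this against your installation.

```python
import os

# Assumed mapping: environment variable set inside a job -> Hydra launcher name.
# Adjust it to the launchers your MPICH build actually provides.
SCHEDULER_HINTS = [
    ("SLURM_JOB_ID", "slurm"),  # set by Slurm inside a job
    ("PE_HOSTFILE", "sge"),     # set by (Sun/Son of) Grid Engine in parallel jobs
    ("LSB_JOBID", "lsf"),       # set by LSF inside a job
]


def pick_launcher(env):
    """Return a launcher name for the detected scheduler, or "ssh" if none is found."""
    for var, launcher in SCHEDULER_HINTS:
        if env.get(var):
            return launcher
    return "ssh"


def mpiexec_command(env, nprocs, program):
    """Build an mpiexec argument list that requests the chosen launcher."""
    return ["mpiexec", "-launcher", pick_launcher(env), "-n", str(nprocs), program]


if __name__ == "__main__":
    # "./app" is a placeholder program name.
    print(" ".join(mpiexec_command(os.environ, 4, "./app")))
```

Outside any scheduler the helper falls back to `ssh`, which is the case the warning above refers to.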

-- Reuti


> The other question is, what are the consequences of “waiting 6 seconds”? Are some non-blocking messages dropped?
>  
> I’m using MPICH 3.3.2, CentOS 3.10 and the Portland Group compiler pgc++ 19.5.0.


