[mpich-discuss] How to set port range used under LSF?

Sendu Bala sb10 at sanger.ac.uk
Tue Mar 9 07:15:21 CST 2021


Inside a job submitted with bsub, I’m running:

MPICH_PORT_RANGE="46107:46140" mpiexec mpich/examples/cpi

When I ssh to the controlling node, I see it has spawned a set of blaunch processes with `--control-port node-12-3-2:46107` as expected, but:

ss -l -p -n | grep blaunch
tcp               LISTEN              0                    128                                                                 0.0.0.0:34361            0.0.0.0:*                                                                                users:(("blaunch",pid=2823,fd=7))
tcp               LISTEN              0                    128                                                                 0.0.0.0:46107            0.0.0.0:*                                                                                users:(("cpi",pid=2839,fd=5),("blaunch",pid=2837,fd=5),("blaunch",pid=2836,fd=5),("blaunch",pid=2835,fd=5),("blaunch",pid=2834,fd=5),("blaunch",pid=2833,fd=5),("blaunch",pid=2832,fd=5),("blaunch",pid=2831,fd=5),("blaunch",pid=2830,fd=5),("blaunch",pid=2829,fd=5),("blaunch",pid=2828,fd=5),("blaunch",pid=2827,fd=5),("blaunch",pid=2826,fd=5),("blaunch",pid=2825,fd=5),("blaunch",pid=2824,fd=5),("blaunch",pid=2823,fd=5),("hydra_pmi_proxy",pid=2822,fd=5),("mpiexec",pid=2821,fd=5))
tcp               LISTEN              0                    128                                                                 0.0.0.0:43741            0.0.0.0:*                                                                                users:(("blaunch",pid=2825,fd=12))
tcp               LISTEN              0                    128                                                                 0.0.0.0:41983            0.0.0.0:*                                                                                users:(("blaunch",pid=2830,fd=22))
tcp               LISTEN              0                    128                                                                 0.0.0.0:41215            0.0.0.0:*                                                                                users:(("blaunch",pid=2832,fd=26))
tcp               LISTEN              0                    128                                                                 0.0.0.0:34433            0.0.0.0:*                                                                                users:(("blaunch",pid=2831,fd=24))
tcp               LISTEN              0                    128                                                                 0.0.0.0:33219            0.0.0.0:*                                                                                users:(("blaunch",pid=2827,fd=16))
tcp               LISTEN              0                    128                                                                 0.0.0.0:34405            0.0.0.0:*                                                                                users:(("blaunch",pid=2837,fd=36))
tcp               LISTEN              0                    128                                                                 0.0.0.0:43465            0.0.0.0:*                                                                                users:(("blaunch",pid=2836,fd=34))
tcp               LISTEN              0                    128                                                                 0.0.0.0:39755            0.0.0.0:*                                                                                users:(("blaunch",pid=2833,fd=28))
tcp               LISTEN              0                    128                                                                 0.0.0.0:38095            0.0.0.0:*                                                                                users:(("blaunch",pid=2829,fd=20))
tcp               LISTEN              0                    128                                                                 0.0.0.0:44625            0.0.0.0:*                                                                                users:(("blaunch",pid=2834,fd=30))
tcp               LISTEN              0                    128                                                                 0.0.0.0:35345            0.0.0.0:*                                                                                users:(("blaunch",pid=2835,fd=32))
tcp               LISTEN              0                    128                                                                 0.0.0.0:43827            0.0.0.0:*                                                                                users:(("blaunch",pid=2826,fd=14))
tcp               LISTEN              0                    128                                                                 0.0.0.0:40915            0.0.0.0:*                                                                                users:(("blaunch",pid=2828,fd=18))
tcp               LISTEN              0                    128                                                                 0.0.0.0:42549            0.0.0.0:*                                                                                users:(("blaunch",pid=2824,fd=9))

Apart from the control port 46107, every blaunch process is listening on a port outside my range. Why? I’ve also tried setting MPIEXEC_PORT_RANGE and MPIR_CVAR_CH3_PORT_RANGE, with the same result.

Is there any way to fully control the ports used?
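
For anyone who wants to repeat the check without reading the listing by eye, here is a minimal Python sketch that flags listeners outside a given range. It assumes the column layout shown in the `ss -l -p -n` output above (local address in the fifth whitespace-separated field). The function name `ports_outside_range` and the shortened `sample` lines are just for illustration and are not part of MPICH or LSF.

```python
import re


def ports_outside_range(ss_output, low, high):
    """Return (port, process name, pid) for every listener outside [low, high].

    Assumes each line looks like the `ss -l -p -n` output in this post:
    netid, state, recv-q, send-q, local addr:port, peer addr:port, users:(...)
    """
    offenders = []
    for line in ss_output.splitlines():
        fields = line.split()
        if len(fields) < 5:
            continue  # blank or truncated line
        # Local address is the fifth field, e.g. "0.0.0.0:34361".
        port_text = fields[4].rsplit(":", 1)[-1]
        if not port_text.isdigit():
            continue  # header line or wildcard port
        port = int(port_text)
        if low <= port <= high:
            continue  # inside the requested range
        # Pull every ("name",pid=N,...) entry out of the users:(...) column.
        for name, pid in re.findall(r'\("([^"]+)",pid=(\d+)', line):
            offenders.append((port, name, int(pid)))
    return offenders


# Two lines shortened from the listing above (whitespace and users list trimmed).
sample = """\
tcp LISTEN 0 128 0.0.0.0:34361 0.0.0.0:* users:(("blaunch",pid=2823,fd=7))
tcp LISTEN 0 128 0.0.0.0:46107 0.0.0.0:* users:(("mpiexec",pid=2821,fd=5))
"""

print(ports_outside_range(sample, 46107, 46140))
# prints [(34361, 'blaunch', 2823)]
```

On a live node the same function could be fed the real output, for example via `subprocess.run(["ss", "-l", "-p", "-n"], capture_output=True, text=True).stdout`. It only reports which processes are out of range; it says nothing about which component chose those ports.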


Cheers,
Sendu.





