<html xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=Windows-1252">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
        {font-family:"Cambria Math";
        panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
        {font-family:Calibri;
        panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
        {margin:0in;
        font-size:11.0pt;
        font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
        {mso-style-priority:99;
        color:blue;
        text-decoration:underline;}
span.apple-converted-space
        {mso-style-name:apple-converted-space;}
span.EmailStyle19
        {mso-style-type:personal-reply;
        font-family:"Calibri",sans-serif;
        color:windowtext;}
.MsoChpDefault
        {mso-style-type:export-only;
        font-size:10.0pt;}
@page WordSection1
        {size:8.5in 11.0in;
        margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
        {page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hi Sendu,<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal">Sorry for not understanding the issue earlier. `blaunch` handles the connection itself, so hydra does not need to do that. The port range option therefore does not apply and is ignored. It doesn't look like `blaunch` has a similar port range option.
<o:p></o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div>
<div>
<div>
<p class="MsoNormal">-- <br>
Hui Zhou<o:p></o:p></p>
</div>
</div>
</div>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<p class="MsoNormal"><o:p>&nbsp;</o:p></p>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:.5in">
<b><span style="font-size:12.0pt;color:black">From: </span></b><span style="font-size:12.0pt;color:black">Sendu Bala &lt;sb10@sanger.ac.uk&gt;<br>
<b>Date: </b>Tuesday, March 9, 2021 at 5:36 PM<br>
<b>To: </b>Zhou, Hui &lt;zhouh@anl.gov&gt;<br>
<b>Cc: </b>discuss@mpich.org &lt;discuss@mpich.org&gt;<br>
<b>Subject: </b>Re: How to set port range used under LSF? [EXT]<o:p></o:p></span></p>
</div>
<p class="MsoNormal" style="margin-left:.5in">Hi, <o:p></o:p></p>
<div>
<p class="MsoNormal" style="margin-left:.5in"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">As noted, I had already tried MPIEXEC_PORT_RANGE with no difference.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in"><o:p>&nbsp;</o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">I don't know whether it's normal for blaunch to use its own ports, in which case this has nothing to do with mpich or mpiexec. The follow-on issue is that if I ssh to one of the other nodes, nothing is listening on the
 control port; I only see a process listening on one of these other ports outside my specified range. I think this might be related to my fundamental problem of mpiexec failing to do anything under LSF most of the time at high host counts (see my previous
 thread on this list).<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in"><o:p>&nbsp;</o:p></p>
</div>
<div>
<div>
<p class="MsoNormal" style="margin-left:.5in"><br>
<br>
<o:p></o:p></p>
<blockquote style="margin-top:5.0pt;margin-bottom:5.0pt">
<div>
<p class="MsoNormal" style="margin-left:.5in">On 9 Mar 2021, at 16:21, Zhou, Hui &lt;<a href="mailto:zhouh@anl.gov">zhouh@anl.gov</a>&gt; wrote:<o:p></o:p></p>
</div>
<p class="MsoNormal" style="margin-left:.5in"><o:p>&nbsp;</o:p></p>
<div>
<div>
<p class="MsoNormal" style="margin-left:.5in">Hi Sendu,<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">&nbsp;<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">Could you try `MPIEXEC_PORT_RANGE` instead?<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">&nbsp;<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">I know it is confusing, and the documentation probably needs updating or correcting, but `MPICH_PORT_RANGE` is for `MPICH` rather than `MPIEXEC`.<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">&nbsp;<o:p></o:p></p>
</div>
<div>
<div>
<div>
<div>
<p class="MsoNormal" style="margin-left:.5in">-- <br>
Hui Zhou<o:p></o:p></p>
</div>
</div>
</div>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">&nbsp;<o:p></o:p></p>
</div>
<div>
<p class="MsoNormal" style="margin-left:.5in">&nbsp;<o:p></o:p></p>
</div>
<div style="border:none;border-top:solid #B5C4DF 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal" style="mso-margin-top-alt:0in;margin-right:0in;margin-bottom:12.0pt;margin-left:1.0in">
<b><span style="font-size:12.0pt">From:<span class="apple-converted-space">&nbsp;</span></span></b><span style="font-size:12.0pt">Sendu Bala via discuss &lt;<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>&gt;<br>
<b>Date:<span class="apple-converted-space">&nbsp;</span></b>Tuesday, March 9, 2021 at 7:16 AM<br>
<b>To:<span class="apple-converted-space">&nbsp;</span></b><a href="mailto:discuss@mpich.org">discuss@mpich.org</a> &lt;<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>&gt;<br>
<b>Cc:<span class="apple-converted-space">&nbsp;</span></b>Sendu Bala &lt;<a href="mailto:sb10@sanger.ac.uk">sb10@sanger.ac.uk</a>&gt;<br>
<b>Subject:<span class="apple-converted-space">&nbsp;</span></b>[mpich-discuss] How to set port range used under LSF?</span><o:p></o:p></p>
</div>
<div>
<div style="margin-left:.5in">
<p class="MsoNormal" style="margin-left:.5in">Via a bsub, I'm doing:<br>
<br>
MPICH_PORT_RANGE="46107:46140" mpiexec mpich/examples/cpi<br>
<br>
When I ssh to the controlling node, I see it has spawned a set of blaunch processes with `--control-port node-12-3-2:46107` as expected, but:<br>
<br>
ss -l -p -n | grep blaunch<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:34361            0.0.0.0:*                                                                                users:(("blaunch",pid=2823,fd=7))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:46107            0.0.0.0:*                                                                                users:(("cpi",pid=2839,fd=5),("blaunch",pid=2837,fd=5),("blaunch",pid=2836,fd=5),("blaunch",pid=2835,fd=5),("blaunch",pid=2834,fd=5),("blaunch",pid=2833,fd=5),("blaunch",pid=2832,fd=5),("blaunch",pid=2831,fd=5),("blaunch",pid=2830,fd=5),("blaunch",pid=2829,fd=5),("blaunch",pid=2828,fd=5),("blaunch",pid=2827,fd=5),("blaunch",pid=2826,fd=5),("blaunch",pid=2825,fd=5),("blaunch",pid=2824,fd=5),("blaunch",pid=2823,fd=5),("hydra_pmi_proxy",pid=2822,fd=5),("mpiexec",pid=2821,fd=5))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:43741            0.0.0.0:*                                                                                users:(("blaunch",pid=2825,fd=12))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:41983            0.0.0.0:*                                                                                users:(("blaunch",pid=2830,fd=22))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:41215            0.0.0.0:*                                                                                users:(("blaunch",pid=2832,fd=26))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:34433            0.0.0.0:*                                                                                users:(("blaunch",pid=2831,fd=24))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:33219            0.0.0.0:*                                                                                users:(("blaunch",pid=2827,fd=16))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:34405            0.0.0.0:*                                                                                users:(("blaunch",pid=2837,fd=36))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:43465            0.0.0.0:*                                                                                users:(("blaunch",pid=2836,fd=34))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:39755            0.0.0.0:*                                                                                users:(("blaunch",pid=2833,fd=28))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:38095            0.0.0.0:*                                                                                users:(("blaunch",pid=2829,fd=20))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:44625            0.0.0.0:*                                                                                users:(("blaunch",pid=2834,fd=30))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:35345            0.0.0.0:*                                                                                users:(("blaunch",pid=2835,fd=32))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:43827            0.0.0.0:*                                                                                users:(("blaunch",pid=2826,fd=14))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:40915            0.0.0.0:*                                                                                users:(("blaunch",pid=2828,fd=18))<br>
tcp               LISTEN              0                    128                                                                 0.0.0.0:42549            0.0.0.0:*                                                                                users:(("blaunch",pid=2824,fd=9))<br>
<br>
Why are these all listening on ports outside my range? I've also tried setting MPIEXEC_PORT_RANGE and MPIR_CVAR_CH3_PORT_RANGE and still have the problem.<br>
<br>
Is there any way to fully control the ports used?<br>
<br>
<br>
Cheers,<br>
Sendu.<br>
<br>
<br>
<br>
<br>
--<span class="apple-converted-space">&nbsp;</span><br>
 The Wellcome Sanger Institute is operated by Genome Research<span class="apple-converted-space">&nbsp;</span><br>
 Limited, a charity registered in England with number 1021457 and a<span class="apple-converted-space">&nbsp;</span><br>
 company registered in England with number 2742969, whose registered<span class="apple-converted-space">&nbsp;</span><br>
 office is 215 Euston Road, London, NW1 2BE.<br>
_______________________________________________<br>
discuss mailing list <span class="apple-converted-space">&nbsp;</span><a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss">https://lists.mpich.org/mailman/listinfo/discuss</a><o:p></o:p></p>
</div>
</div>
</div>
</blockquote>
</div>
<p class="MsoNormal" style="margin-left:.5in"><o:p>&nbsp;</o:p></p>
</div>
</div>
</body>
</html>