<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr"><div>Dear <span class="" id=":djs.1" tabindex="-1">MPICH</span>.</div><div>I have some additional information.</div><div>This "strange configuration" (hydra connected to a computer that is not in the list) is the result of an <span class="" id=":djs.2" tabindex="-1">unhandled</span> failure of the Main process (similar to an abort() call) that did not kill its child process (hydra). </div><div>Thus I can see that the "<span class="" id=":djs.3" tabindex="-1">init"</span> process becomes the parent of the hydra process. </div><div>Can you please refer me to a document explaining hydra's behavior when its parent process is dead (an emergency situation)?</div><div>I understand that this situation shouldn't happen and this bug will be fixed, but I'm curious about the hydra logic.</div><div><br></div><div>Regards,</div><div><span class="" id=":djs.4" tabindex="-1">Anatoly</span>.</div><br><div class="gmail_quote">---------- Forwarded message ----------<br>From: <b class="gmail_sendername"><span class="" id=":djs.5" tabindex="-1">Anatoly</span> G</b> <span dir="ltr">&lt;<span class="" id=":djs.6" tabindex="-1">anatolyrishon</span>@<a href="http://gmail.com">gmail.com</a>&gt;</span><br>Date: Wed, Dec 24, 2014 at 1:00 PM<br>Subject: <span class="" id=":djs.7" tabindex="-1">mpiexec</span>.hydra creates <span class="" id=":djs.8" tabindex="-1">unexpectable</span> <span class="" id=":djs.9" tabindex="-1">TCP</span> socket.<br>To: discuss@<span class="" id=":djs.10" tabindex="-1">mpich</span>.org<br><br><br><div dir="ltr">Dear <span><span class="" id=":djs.11" tabindex="-1">MPICH</span></span>.<div>I'm using <span><span class="" id=":djs.12" tabindex="-1">mpich</span></span> 3.1 (hydra+<span><span class="" id=":djs.13" tabindex="-1">MPI</span></span>).</div><div>I execute a main application (Main), which calls <span><span class="" id=":djs.14" tabindex="-1">mpiexec</span></span>.hydra in the following way:</div><div><br></div><div><span><span class="" id=":djs.15" 
tabindex="-1">mpiexec</span></span>.hydra -<span><span class="" id=":djs.16" tabindex="-1">genvall</span></span>  -disable-auto-cleanup  -f <span><span class="" id=":djs.17" tabindex="-1">MpiConfigMachines</span></span>.<span><span class="" id=":djs.18" tabindex="-1">txt</span></span> -launcher=ssh -n 3 <span><span class="" id=":djs.19" tabindex="-1">MPI</span></span>_<span><span class="" id=":djs.20" tabindex="-1">Prog</span></span> <br></div><div><br></div><div><span><span class="" id=":djs.21" tabindex="-1">MpiConfigMachines</span></span>.<span><span class="" id=":djs.22" tabindex="-1">txt</span></span> content:<br></div><div><div><a href="http://10.3.2.100:1" target="_blank">10.3.2.100:1</a></div><div><a href="http://10.3.2.101:2" target="_blank">10.3.2.101:2</a></div></div><div><br></div><div>Here 10.3.2.100 is the local host.</div><div>As a result I get:</div><div><ul><li>Main + a single <span><span class="" id=":djs.23" tabindex="-1">MPI</span></span>_<span><span class="" id=":djs.24" tabindex="-1">Prog</span></span> process on the local computer<br></li><li>2 <span><span class="" id=":djs.25" tabindex="-1">MPI</span></span>_<span><span class="" id=":djs.26" tabindex="-1">Prog</span></span> processes on the remote one.</li></ul><div>The Main application establishes a <span><span class="" id=":djs.27" tabindex="-1">TCP</span></span> socket with the local <span><span class="" id=":djs.28" tabindex="-1">MPI</span></span>_<span><span class="" id=":djs.29" tabindex="-1">Prog</span></span>.</div></div><div>The Main application also establishes a <span><span class="" id=":djs.30" tabindex="-1">TCP</span></span> socket with a controller on another computer, 10.3.2.170, which is not included in the <span><span class="" id=":djs.31" tabindex="-1">MpiConfigMachines</span></span>.<span><span class="" id=":djs.32" tabindex="-1">txt</span></span> file.</div><div><br></div><div>After running for some time (hours, sometimes days), I see via <span><span class="" id=":djs.33" tabindex="-1">netstat</span></span> that a
new connection has been created between <span><span class="" id=":djs.34" tabindex="-1">mpiexec</span></span>.hydra and the controller. </div><div><br></div><div>Before executing <span><span class="" id=":djs.35" tabindex="-1">mpiexec</span></span>.hydra I set the environment variable</div><div><p class="MsoNormal"><span><span class="" id=":djs.36" tabindex="-1">setenv</span></span> <span><span class="" id=":djs.37" tabindex="-1">MPIEXEC</span></span>_PORT_RANGE 50010:65535</p><p class="MsoNormal">According to the manual, this variable limits hydra destination ports to [50010:65535].</p><p class="MsoNormal"><br></p><p class="MsoNormal">I see that hydra uses these ports with <span><span class="" id=":djs.38" tabindex="-1">MPI</span></span>_<span><span class="" id=":djs.39" tabindex="-1">Prog</span></span>, but the connection with the controller is made on port 701 (on the controller computer).</p><p class="MsoNormal"><br></p><p class="MsoNormal">The controller program is a server. It can only accept connections.<br></p><p class="MsoNormal"><br></p><p class="MsoNormal">Can you please advise how to deal with this problem?</p><p class="MsoNormal">How does hydra recognize the controller <span><span class="" id=":djs.40" tabindex="-1">IP</span></span> and establish a connection with it?</p><p class="MsoNormal"><br></p><p class="MsoNormal">Sincerely,</p><p class="MsoNormal"><span><span class="" id=":djs.41" tabindex="-1">Anatoly</span></span>.</p></div><div><br></div></div>
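<div>To make the port observation above concrete, here is a minimal Python sketch that checks whether a port seen in netstat falls inside the range given in MPIEXEC_PORT_RANGE. It is only an illustration of the check, not hydra's own logic; the helper names parse_port_range and port_in_range are hypothetical, and the low:high format is taken from the setenv line above.</div>

```python
def parse_port_range(value):
    """Parse a 'low:high' string such as '50010:65535' into two integers."""
    low_text, high_text = value.split(":")
    low, high = int(low_text), int(high_text)
    if low > high:
        raise ValueError("lower bound exceeds upper bound: " + value)
    return low, high


def port_in_range(port, value):
    """Return True if port lies inside the inclusive low:high range."""
    low, high = parse_port_range(value)
    return port in range(low, high + 1)


if __name__ == "__main__":
    # Values taken from the message: the configured range and the
    # controller-side port observed in netstat.
    print(port_in_range(701, "50010:65535"))
```

<div>With the values from this message, port 701 is outside the configured range, which is what makes the connection to the controller look unexpected.</div>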
</div><br></div>
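<div>For the orphaned hydra case described at the top of this message, the following is a minimal Python sketch for checking on Linux whether a process has been adopted by init (PID 1). It assumes the usual /proc/PID/stat layout, where the parent PID is the field after the state letter; the helper names are hypothetical. On systems that use a child subreaper, an orphan may be adopted by a process other than PID 1, so treat this as a rough check only.</div>

```python
def parse_ppid(stat_line):
    """Extract the parent PID from the contents of /proc/PID/stat."""
    # The command name (second field) is wrapped in parentheses and may
    # itself contain spaces or parentheses, so split after the last ')'.
    after_comm = stat_line.rsplit(")", 1)[1]
    fields = after_comm.split()
    # fields[0] is the state letter, fields[1] is the parent PID.
    return int(fields[1])


def read_ppid(pid):
    """Read the parent PID of a running process (Linux only)."""
    with open("/proc/{}/stat".format(pid)) as handle:
        return parse_ppid(handle.read())


def is_adopted_by_init(pid):
    """Return True if the process's parent is PID 1."""
    return read_ppid(pid) == 1
```

<div>Calling is_adopted_by_init with the PID of the mpiexec.hydra process would report the same thing I observed by hand: after Main dies, the parent of hydra is init.</div>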