<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr"><div class="markdown-here-wrapper" style=""><p style="margin:1.2em 0px!important">When you pass <code style="font-size:0.85em;font-family:Consolas,Inconsolata,Courier,monospace;margin:0px 0.15em;padding:0px 0.3em;white-space:pre-wrap;border:1px solid rgb(234,234,234);background-color:rgb(248,248,248);border-top-left-radius:3px;border-top-right-radius:3px;border-bottom-right-radius:3px;border-bottom-left-radius:3px;display:inline">-disable-auto-cleanup</code> on the command line to mpiexec, you’re telling Hydra not to clean up other processes when one process in your job fails. It’s assumed that those processes will either clean themselves up or complete successfully.</p>
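<p style="margin:1.2em 0px!important">To illustrate the difference, here is a toy sketch in Python. It is not how Hydra is implemented: <code>run_job</code> and its <code>auto_cleanup</code> parameter are hypothetical names, and plain local subprocesses stand in for MPI ranks. With cleanup enabled, the launcher kills the survivors as soon as one process exits with a non-zero status; with it disabled, the survivors are left to finish or clean up on their own.</p>
<pre>
```python
import subprocess
import sys
import time


def run_job(commands, auto_cleanup=True, poll_interval=0.05):
    """Start every command and return the list of exit codes.

    Hypothetical helper: when auto_cleanup is True, the first non-zero
    exit causes all still-running processes to be killed. When it is
    False, the remaining processes are simply waited for.
    """
    procs = [subprocess.Popen(cmd) for cmd in commands]
    while True:
        # poll() returns None while a process is still running.
        codes = [p.poll() for p in procs]
        if None not in codes:
            return codes
        if auto_cleanup and any(c not in (None, 0) for c in codes):
            for p in procs:
                if p.poll() is None:
                    p.kill()
        time.sleep(poll_interval)


if __name__ == "__main__":
    # One process fails immediately, the other would normally run to completion.
    crasher = [sys.executable, "-c", "import sys; sys.exit(1)"]
    worker = [sys.executable, "-c", "import time; time.sleep(2)"]
    print("auto cleanup on: ", run_job([crasher, worker], auto_cleanup=True))
    print("auto cleanup off:", run_job([crasher, worker], auto_cleanup=False))
```
</pre>
<p style="margin:1.2em 0px!important">In the second call the worker is expected to exit normally even though its peer failed, which is the situation the surviving processes have to be prepared for.</p>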
<p style="margin:1.2em 0px!important">It’s not clear to me what your program is doing that would be erroneous, but when a process crashes, it’s usually the result of an erroneous program rather than a bug in MPICH. I’m not saying that there are no bugs in MPICH, but we’d like to be able to narrow down where to look.</p>
<p style="margin:1.2em 0px!important">Thanks,<br>Wesley</p>
</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Jan 1, 2015 at 6:35 AM, Anatoly G <span dir="ltr">&lt;<a href="mailto:anatolyrishon@gmail.com" target="_blank">anatolyrishon@gmail.com</a>&gt;</span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">



<div>
<div dir="ltr">
<div>Dear <span>MPICH</span>.</div>
<div>I have some additional information.</div>
<div>This "strange configuration" (hydra connected to a computer that is not in the list) is the result of an unhandled failure of the Main process (similar to an abort() call) that did not kill its child process (hydra).</div>
<div>As a result, I can see that the "init" process becomes the parent of the hydra process.</div>
<div>Could you please refer me to a document explaining hydra's behavior when its parent process dies (an emergency situation)?</div>
<div>I understand that this situation shouldn't happen and this bug will be fixed, but I'm curious about the hydra logic.</div>
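<div>As an aside, the re-parenting described above can be observed from inside a child process. The following Python sketch (POSIX only) is not hydra's logic, which I do not know; the function names are hypothetical. It remembers the parent's PID and polls os.getppid() until the value changes, which is what happens when the parent dies and the child is adopted by init or another reaper process.</div>
<pre>
```python
import os
import time


def parent_changed(original_ppid):
    """Return True once this process has been re-parented."""
    return os.getppid() != original_ppid


def wait_for_parent_exit(original_ppid, timeout=5.0, poll_interval=0.05):
    """Poll until the parent PID changes or the timeout is used up."""
    for _ in range(int(timeout / poll_interval)):
        if parent_changed(original_ppid):
            return True
        time.sleep(poll_interval)
    return parent_changed(original_ppid)


def demo():
    """Fork a stand-in for Main that dies and leaves its child behind.

    Returns True if the orphaned child noticed that its parent changed.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Intermediate process: plays the role of Main.
        os.close(read_fd)
        main_pid = os.getpid()  # captured before forking to avoid a race
        if os.fork() == 0:
            # Grandchild: plays the role of the launcher left behind.
            orphaned = wait_for_parent_exit(main_pid)
            os.write(write_fd, b"1" if orphaned else b"0")
            os._exit(0)
        os._exit(0)  # "Main" exits abruptly without killing its child
    os.close(write_fd)
    os.waitpid(pid, 0)
    answer = os.read(read_fd, 1)  # blocks until the grandchild reports
    os.close(read_fd)
    return answer == b"1"


if __name__ == "__main__":
    print("orphaned child noticed the parent's death:", demo())
```
</pre>
<div>Comparing against the remembered PID rather than against 1 is deliberate: on some systems an orphan is adopted by a process other than PID 1.</div>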
<div><br>
</div>
<div>Regards,</div>
<div><span>Anatoly</span>.</div>
<br>
<div class="gmail_quote">---------- Forwarded message ----------<br>
From: <b class="gmail_sendername"><span>Anatoly</span> G</b>
<span dir="ltr"><<span>anatolyrishon</span>@<a href="http://gmail.com" target="_blank">gmail.com</a>></span><br>
Date: Wed, Dec 24, 2014 at 1:00 PM<br>
Subject: <span>mpiexec</span>.hydra creates <span>
unexpectable</span> <span>TCP</span> socket.<br>
To: discuss@<span>mpich</span>.org<br>
<br>
<br>
<div dir="ltr">Dear <span><span>MPICH</span></span>.
<div>I'm using <span><span>mpich</span></span> 3.1 (hydra+<span><span>MPI</span></span>).</div>
<div>I run a main application (Main), which calls mpiexec.hydra in the following way:</div>
<div><br>
</div>
<div><span><span>mpiexec</span></span>.hydra -<span><span>genvall</span></span>  -disable-auto-cleanup  -f
<span><span>MpiConfigMachines</span></span>.<span><span>txt</span></span> -launcher=ssh -n 3
<span><span>MPI</span></span>_<span><span>Prog</span></span> <br>
</div>
<div><br>
</div>
<div><span><span>MpiConfigMachines</span></span>.<span><span>txt</span></span> content:<br>
</div>
<div>
<div>10.3.2.100:1</div>
<div>10.3.2.101:2</div>
</div>
<div><br>
</div>
<div>Here, 10.3.2.100 is the local host.</div>
<div>As a result, I get:</div>
<div>
<ul>
<li>Main plus a single MPI_Prog process on the local computer<br>
</li><li>2 MPI_Prog processes on the remote one.</li></ul>
<div>The Main application establishes a TCP socket with the local MPI_Prog.</div>
</div>
<div>The Main application also establishes a TCP socket with a controller on another computer, 10.3.2.170, which is not included in the MpiConfigMachines.txt file.</div>
<div><br>
</div>
<div>After running for some time (hours, sometimes days), I see via netstat that a new connection has been created between mpiexec.hydra and the controller.</div>
<div><br>
</div>
<div>Before executing mpiexec.hydra, I set the following environment variable:</div>
<div>
<p class="MsoNormal"><span><span>setenv</span></span>
<span><span>MPIEXEC</span></span>_PORT_RANGE 50010:65535</p>
<p class="MsoNormal">According to the manual, this variable limits hydra's destination ports to [50010:65535].</p>
<p class="MsoNormal"><br>
</p>
<p class="MsoNormal">I see that hydra uses these ports with MPI_Prog, but the connection to the controller is made on port 701 (on the controller computer).</p>
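<p class="MsoNormal">The numbers can be checked with a small Python sketch. The "low:high" format is taken from the setenv line above; the helper name parse_port_range is hypothetical, and the code says nothing about how hydra itself interprets the variable. It only shows that 701 lies outside the configured range.</p>
<pre>
```python
def parse_port_range(spec):
    """Parse a 'low:high' string into an inclusive range of port numbers."""
    low_text, high_text = spec.split(":")
    ports = range(int(low_text), int(high_text) + 1)
    if not ports:
        raise ValueError("empty port range: " + spec)
    return ports


allowed = parse_port_range("50010:65535")
print(701 in allowed)    # False
print(50010 in allowed)  # True
```
</pre>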
<p class="MsoNormal"><br>
</p>
<p class="MsoNormal">The controller program is a server; it can only accept connections.<br>
</p>
<p class="MsoNormal"><br>
</p>
<p class="MsoNormal">Could you please advise how to deal with this problem?</p>
<p class="MsoNormal">How does hydra learn the controller's IP and establish a connection with it?</p>
<p class="MsoNormal"><br>
</p>
<p class="MsoNormal">Sincerely,</p>
<p class="MsoNormal"><span><span>Anatoly</span></span>.</p>
</div>
<div><br>
</div>
</div>
</div>
<br>
</div>
</div>

</blockquote></div><br></div>