[mpich-discuss] Fwd: mpiexec.hydra creates unexpectable TCP socket.
Wesley Bland
wbland at anl.gov
Mon Jan 5 11:09:04 CST 2015
When you pass -disable-auto-cleanup on the command line to mpiexec, you’re
telling Hydra not to clean up other processes when one process in your job
fails. It’s assumed that those processes will either clean themselves up or
complete successfully.
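
As a rough illustration of what "cleaning up yourself" could look like on the launching side, here is a minimal Python sketch of a parent that starts a child in its own process group and later tears the whole group down. The helper names `launch_in_own_group` and `terminate_group` are hypothetical and not part of MPICH or Hydra, and the sketch assumes a POSIX system.

```python
import os
import signal
import subprocess


def launch_in_own_group(argv):
    """Start argv as the leader of a new session, and so of a new process group."""
    return subprocess.Popen(argv, start_new_session=True)


def terminate_group(proc, timeout=5.0):
    """Send SIGTERM to the child's process group; escalate to SIGKILL on timeout.

    Returns the child's exit status as reported by Popen.wait().
    """
    try:
        os.killpg(proc.pid, signal.SIGTERM)
    except ProcessLookupError:
        # The group is already gone; just collect the exit status.
        return proc.wait()
    try:
        return proc.wait(timeout=timeout)
    except subprocess.TimeoutExpired:
        os.killpg(proc.pid, signal.SIGKILL)
        return proc.wait()
```

In the setup described in the quoted message, `argv` would be the mpiexec.hydra command line. Note that this only helps when the parent gets a chance to call `terminate_group`; a parent that dies through something like abort() never reaches that code, and the signal only reaches processes in the local group, not anything started on other hosts.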
It’s not clear to me what your program is trying to do that would be
erroneous, but usually when a process crashes, it’s the result of an
erroneous program rather than a bug in MPICH. I’m not saying that there are
no bugs in MPICH, but we’d like to be able to narrow down where to look.
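
One small thing that may help with narrowing down: a Python sketch that checks a list of observed ports (for example, copied by hand from netstat output) against a range written in the same low:high form as the MPIEXEC_PORT_RANGE value in the quoted message. The function names are hypothetical, and the sketch only does the arithmetic; it makes no claim about how Hydra itself interprets the variable.

```python
def parse_port_range(spec):
    """Parse a 'low:high' string into an inclusive (low, high) tuple of ints."""
    low_text, high_text = spec.split(":")
    low, high = int(low_text), int(high_text)
    if not (0 < low <= high <= 65535):
        raise ValueError(f"invalid port range: {spec!r}")
    return low, high


def ports_outside_range(ports, spec):
    """Return the ports that fall outside the inclusive range given by spec."""
    low, high = parse_port_range(spec)
    return [port for port in ports if not low <= port <= high]
```

With the values from the quoted report, `ports_outside_range([50010, 701, 65535], "50010:65535")` singles out port 701, the one connection that does not fit the configured range.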
Thanks,
Wesley
On Thu, Jan 1, 2015 at 6:35 AM, Anatoly G <anatolyrishon at gmail.com> wrote:
> Dear MPICH.
> I have some additional information.
> This "strange configuration" (hydra connected to a computer not on the
> list) is the result of an unhandled failure of the Main process (similar to
> an abort() call) that did not kill its child process (hydra).
> Thus I can see that the "init" process becomes the parent of the hydra
> process.
> Can you please refer me to a document explaining hydra's behavior when its
> parent process is dead (an emergency situation)?
> I understand that this situation shouldn't happen and this bug will be
> fixed, but I'm curious about the hydra logic.
>
> Regards,
> Anatoly.
>
> ---------- Forwarded message ----------
> From: Anatoly G <anatolyrishon at gmail.com>
> Date: Wed, Dec 24, 2014 at 1:00 PM
> Subject: mpiexec.hydra creates unexpectable TCP socket.
> To: discuss at mpich.org
>
>
> Dear MPICH.
> I'm using MPICH 3.1 (hydra+MPI).
> I run a main application (Main), which calls mpiexec.hydra in the following
> way:
>
> mpiexec.hydra -genvall -disable-auto-cleanup -f MpiConfigMachines.txt
> -launcher=ssh -n 3 MPI_Prog
>
> MpiConfigMachines.txt content:
> 10.3.2.100:1
> 10.3.2.101:2
>
> where 10.3.2.100 is the local host.
> As a result I get
>
> - Main + single MPI_Prog processes on local computer
> - 2 MPI_Prog processes on remote one.
>
> The Main application establishes a TCP socket with the local MPI_Prog.
> The Main application also establishes a TCP socket with a controller on
> another computer, 10.3.2.170, which is not included in the
> MpiConfigMachines.txt file.
>
> After running for some time (hours, sometimes days), I see via netstat that
> a new connection has been created between mpiexec.hydra and the controller.
>
> Before executing mpiexec.hydra I set environment variable
>
> setenv MPIEXEC_PORT_RANGE 50010:65535
>
> According to the manual, this variable limits hydra destination ports to
> [50010:65535].
>
>
> I see that hydra uses these ports with MPI_Prog, but the connection with
> the controller is made on port 701 (on the controller computer).
>
>
> The controller program is a server. It can only accept connections.
>
>
> Can you please advise how to deal with this problem?
>
> How does hydra learn the controller's IP and establish a connection with it?
>
>
> Sincerely,
>
> Anatoly.
>
>
>