[mpich-discuss] Strange, inconsistent behaviour with MPI_Comm_spawn

Alexander Rast alex.rast.technical at gmail.com
Tue Jun 13 11:15:27 CDT 2017


OK, some progress. After more work I determined that the bulk of the truly
bizarre inconsistencies was down to the obnoxious, intrusive behaviour of
gnome-keyring-daemon in Ubuntu; see, for example, the following post:

https://askubuntu.com/questions/564821/why-cant-i-interact-with-my-ssh-agent-e-g-ssh-add-d-doesnt-work

which, by the way, has not been fixed as of Ubuntu 16.04.

It seems gnome-keyring-daemon is a particularly badly-behaved utility and
doesn't help itself with multiple autoload attempts at Ubuntu startup. You
don't know which ssh agent is actually loaded or which RSA keys it has
cached. It's also apparently very difficult to get rid of, although there
are ways. I eventually got it to stop loading itself. (Perhaps the MPI
community might whinge to Ubuntu about this behaviour? Many people have
complained about the antisocial behaviour of gnome-keyring-daemon but so
far Ubuntu's response has been: 'we can't see why this should be considered
a problem. And we have doubts about what you're trying to achieve'.)
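
For anyone wanting to check which agent a shell is actually talking to, here
is a minimal sketch. It only inspects SSH_AUTH_SOCK and runs `ssh-add -l`
(which lists the keys the agent holds). The function names are made up for
illustration, and the test for "keyring" in the socket path is a heuristic
assumption about how gnome-keyring names its socket, not a guaranteed rule.

```python
import os
import subprocess


def describe_agent(env=None):
    """Say which agent socket the given environment points at."""
    env = os.environ if env is None else env
    sock = env.get("SSH_AUTH_SOCK")
    if not sock:
        return "no agent: SSH_AUTH_SOCK is unset"
    # Assumption: gnome-keyring's ssh socket path contains "keyring".
    if "keyring" in sock:
        return "gnome-keyring agent at " + sock
    return "ssh-agent (or other) at " + sock


def list_cached_keys():
    """Return the output of `ssh-add -l`, or a note if ssh-add is missing."""
    try:
        result = subprocess.run(
            ["ssh-add", "-l"], capture_output=True, text=True
        )
    except FileNotFoundError:
        return "ssh-add not found on PATH"
    # ssh-add reports "no identities" or connection errors on stdout/stderr.
    return result.stdout.strip() or result.stderr.strip()
```

Calling print(describe_agent()) followed by print(list_cached_keys()) in the
same shell that will launch the MPI job shows whether the agent and keys are
the ones that were set up by hand.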

So now the problem has narrowed to 2 alternative error responses, both
occurring at the MPI_Comm_spawn call. I've included typical error outputs
for both scenarios, using the code posted earlier. The 2 errors occur with
slightly different versions of the configuration file used to spawn the
processes, which I'm also including. Obviously the _2 files go together.
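
To make the configuration format concrete, here is a rough sketch of how
lines of the form "-n COUNT -host HOST COMMAND" could be parsed, with lines
starting with '#' skipped as comments (which is what the second log output
appears to do). This is not the parser from the code posted earlier;
parse_spawn_line and its dictionary keys are hypothetical names chosen for
the example.

```python
import shlex


def parse_spawn_line(line):
    """Parse one '-n COUNT -host HOST COMMAND [ARGS...]' line.

    Returns None for blank lines and '#' comments, otherwise a dict.
    """
    stripped = line.strip()
    if not stripped or stripped.startswith("#"):
        return None
    tokens = shlex.split(stripped)
    entry = {"n": 1, "host": None, "command": None, "args": []}
    i = 0
    while i < len(tokens):
        if tokens[i] == "-n" and i + 1 < len(tokens):
            entry["n"] = int(tokens[i + 1])
            i += 2
        elif tokens[i] == "-host" and i + 1 < len(tokens):
            entry["host"] = tokens[i + 1]
            i += 2
        else:
            # First non-option token is the executable; the rest are its args.
            entry["command"] = tokens[i]
            entry["args"] = tokens[i + 1:]
            break
    return entry
```

Feeding each line of either configuration file through parse_spawn_line and
printing the result makes it easy to see exactly what host string and
executable path would be handed to the spawn.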

Any thoughts now on what might be causing either of these 2 problems? I
find the gethostbyname failed messages particularly perplexing, since I'm
able to ssh into the machines themselves without difficulty either by name
or IP address.
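
One thing that may be worth checking, offered only as a possibility: the
name reported in the gethostbyname error carries the pi user prefix along
with the machine name, and a string like that is something ssh understands
but a hostname lookup would not. The sketch below assumes the archived
"pi at Burns" stands for "pi@Burns" (the list archive rewrites '@'), and the
helper names are invented for the example. It splits off the user part and
tests whether a name resolves.

```python
import socket


def split_user_host(target):
    """Split 'user@host' into (user, host); user is None if there is no '@'."""
    user, sep, host = target.rpartition("@")
    return (user if sep else None), host


def resolves(name, resolver=socket.gethostbyname):
    """Return True if the resolver can look up the name, False otherwise."""
    try:
        resolver(name)
        return True
    except OSError:
        # socket.gaierror and socket.herror are both subclasses of OSError.
        return False
```

Comparing resolves("pi@Burns") with resolves(split_user_host("pi@Burns")[1])
on the launcher host would show whether the user prefix alone accounts for
the failed lookup.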

On Fri, Jun 9, 2017 at 1:53 PM, Alexander Rast <
alex.rast.technical at gmail.com> wrote:

> I've reached a limit of mystification. Attempting to run an MPI
> application using MPI_Comm_spawn from a host is resulting in bizarre,
> inconsistent behaviour of ssh and ssh-askpass.
>
> What I did is, I created an RSA keypair using ssh-keygen, copied the
> public keys into the .ssh directories on the machines I'll be running MPI
> on, put them in the authorized_keys file, placed all the machines in the
> known_hosts file on the launcher host (which is starting MPI_Comm_spawn),
> then ran eval ssh-agent and added the id_rsa file to the agent on the
> launcher host.
>
> You can verify that this part of the system is working because I can use
> ssh directly to access the worker machines that will be running the
> application.
>
> But when I actually try to run the MPI application, when it gets to the
> spawn, all sorts of weird and wild stuff happens. Sometimes a dialogue
> (which aggressively grabs focus) comes up asking for a password (OpenSSH
> Authentication). Other times the same program has just said that the
> identity/authenticity of the target machine can't be established - do I
> want to continue? (A yes causes authentication to fail). In still other
> cases, it appeared to open the connection but then MPI crashed saying it
> couldn't get the host by name. (yes, every machine has the hostnames of
> every other machine in its hosts file). And in yet another case, it seemed
> to try to run but then crashed saying unexpected end-of-file. And so on.
> There seems to be no rhyme or reason to the errors. I can't reproduce
> anything; each time I try it, some new and surprising behaviour comes up.
> What's happening? Do I have to do something unusual with machine
> configuration/environment?
>
> Here are the associated MPI files if anyone wants to look for errors. In
> fact there are probably some errors in the code itself, because it's never
> been able to be debugged (because of this weird behaviour) but I am fairly
> sure at least the sequence through to the spawn command is OK. All help
> appreciated...
>
-------------- next part --------------
-n 1 -host UOS-213873 ./MPI_Example_3_0
-n 1 -host pi@Shakespeare /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_1
-n 1 -host pi@Burns /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_2
-------------- next part --------------
#-n 1 -host UOS-213873 /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_0
-n 1 -host pi@Shakespeare /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_0
-n 1 -host pi@Shakespeare /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_1
-n 1 -host pi@Burns /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_2
-------------- next part --------------
Spawn_info object: 0x7ffd414c86c0, 0x12f8820
Line number 0, info object 0x9c000000
Spawn_info object: 0x7ffd414c86c0, 0x12f8838
Line number 1, info object 0x9c000001
Spawn_info object: 0x7ffd414c86c0, 0x12f8850
Line number 2, info object 0x9c000002
max_line_count=75
Line number: 0
Line: -n 1 -host UOS-213873 ./MPI_Example_3_0

Spawn_info object for parsing: 0x7ffd414c86c0, 0x12f8820
Line number: 1
Line: -n 1 -host pi@Shakespeare /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_1

Spawn_info object for parsing: 0x7ffd414c86c0, 0x12f8838
Line number: 2
Line: -n 1 -host pi@Burns /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_2

Spawn_info object for parsing: 0x7ffd414c86c0, 0x12f8850
Spawning 3 processes
Fatal error in MPI_Init: Other MPI error, error stack:
MPIR_Init_thread(474)..............: 
MPID_Init(190).....................: channel initialization failed
MPIDI_CH3_Init(89).................: 
MPID_nem_init(320).................: 
MPID_nem_tcp_init(173).............: 
MPID_nem_tcp_get_business_card(420): 
MPID_nem_tcp_init(379).............: gethostbyname failed, pi@Burns (errno 1)
-------------- next part --------------
Spawn_info object: 0x7fff436f4a20, 0x2455820
Line number 0, info object 0x9c000000
Spawn_info object: 0x7fff436f4a20, 0x2455838
Line number 1, info object 0x9c000001
Spawn_info object: 0x7fff436f4a20, 0x2455850
Line number 2, info object 0x9c000002
max_line_count=75
Line number: 0
Line: #-n 1 -host UOS-213873 /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_0

Line number: 0
Line: -n 1 -host pi@Shakespeare /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_0

Spawn_info object for parsing: 0x7fff436f4a20, 0x2455820
Line number: 1
Line: -n 1 -host pi@Shakespeare /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_1

Spawn_info object for parsing: 0x7fff436f4a20, 0x2455838
Line number: 2
Line: -n 1 -host pi@Burns /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_2

Spawn_info object for parsing: 0x7fff436f4a20, 0x2455850
Spawning 3 processes
/home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_0: 1: /home/pi/data/MPI/Dynamic_Procs/MPI_Example_3_0: Syntax error: end of file unexpected