[mpich-discuss] Strange, inconsistent behaviour with MPI_Comm_spawn

Alexander Rast alex.rast.technical at gmail.com
Fri Jun 9 07:53:57 CDT 2017


I've reached a limit of mystification. Attempting to run an MPI application
using MPI_Comm_spawn from a host is resulting in bizarre, inconsistent
behaviour of ssh and ssh-askpass.

I created an RSA keypair using ssh-keygen, copied the public key into the
~/.ssh directory on each machine I'll be running MPI on, added it to the
authorized_keys file there, placed all the machines in the known_hosts
file on the launcher host (the one that calls MPI_Comm_spawn), then ran
eval `ssh-agent` and added the id_rsa file to the agent on the launcher
host.

I know this part of the system is working because I can use ssh directly
to access the worker machines that will be running the application.
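
An interactive ssh login succeeding does not by itself show that a process
launcher can log in without prompting. A minimal sketch of a stricter
check, assuming the workers are reachable under the hypothetical names
node1 and node2: the standard OpenSSH option BatchMode=yes disables
password and passphrase prompts and host key confirmation requests, so ssh
fails instead of asking. The function name check_workers and the SSH
override variable are illustrative only and are not part of MPICH or
OpenSSH.

```shell
# Report whether each host accepts a login with no user interaction.
# SSH may be set to substitute another command for ssh (assumption:
# used here only to make the sketch easy to try out).
check_workers() {
    for host in "$@"; do
        # BatchMode=yes: never prompt; ConnectTimeout=5: give up after 5 s.
        if ${SSH:-ssh} -o BatchMode=yes -o ConnectTimeout=5 "$host" true 2>/dev/null; then
            echo "$host: ok"
        else
            echo "$host: FAILED"
        fi
    done
}

# Example call (hypothetical host names):
#   check_workers node1 node2
```

Running this from the same shell and environment that starts the MPI job
matters, since the agent started by ssh-agent is only visible to processes
that inherit its environment variables.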

But when I actually try to run the MPI application, all sorts of weird and
wild stuff happens once it gets to the spawn. Sometimes a dialogue (which
aggressively grabs focus) comes up asking for a password (OpenSSH
Authentication). Other times the same program says that the
identity/authenticity of the target machine can't be established and asks
whether I want to continue (answering yes causes authentication to fail).
In still other cases, it appeared to open the connection, but then MPI
crashed saying it couldn't get the host by name (yes, every machine has the
hostnames of every other machine in its hosts file). And in yet another
case, it seemed to try to run but then crashed reporting an unexpected
end-of-file. And so on. There seems to be no rhyme or reason to the errors.
I can't reproduce anything: each time I try, some new and surprising
behaviour comes up. What's happening? Do I have to do something unusual
with machine configuration/environment?
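
The "couldn't get the host by name" failure can be checked separately from
ssh. A minimal sketch, assuming a Linux system where getent is available;
the function name check_names is hypothetical. It looks each name up
through the system resolver, which consults the hosts file as configured.

```shell
# Print whether each given host name resolves via the system resolver.
check_names() {
    for host in "$@"; do
        if getent hosts "$host" >/dev/null 2>&1; then
            echo "$host: resolves"
        else
            echo "$host: NOT FOUND"
        fi
    done
}

# Example call (hypothetical host names):
#   check_names node1 node2 "$(hostname)"
```

The check is only meaningful if it is run on every machine involved,
including a lookup of the name each machine reports for itself via
hostname.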

Here are the associated MPI files if anyone wants to look for errors. In
fact there are probably some errors in the code itself, because I've never
been able to debug it (because of this weird behaviour), but I am fairly
sure at least the sequence through to the spawn command is OK. All help
appreciated...
-------------- next part --------------
A non-text attachment was scrubbed...
Name: Dynamic_Procs.tar.gz
Type: application/x-gzip
Size: 9907 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170609/9dc3e335/attachment.gz>