[mpich-discuss] Running a program on multiple computers

Mahdi, Sam sam.mahdi.846 at my.csun.edu
Sat Oct 15 19:40:25 CDT 2016


Hello everyone,

I am attempting to run a single program on 32 cores split across 4
computers (so each computer has 8 cores), and I am attempting to use mpich
for this. For now I am just testing on 2 computers. I have the program
installed on both, as well as mpich installed on both. I have created an
SSH key and can log in to the other computer using ssh without a
password. I have come across 2 problems. One, when I attempt to connect
using the command
 mpirun -np 3 --host a hostname
(where a is the IP of the computer I am attempting to connect to), I
receive the error
 unable to connect from "localhost.localdomain" to "localhost.localdomain"

This indicates that my computer's "localhost.localdomain" is attempting to
connect to another "localhost.localdomain". How can I change this so that
it connects via my IP to the other computer's IP?
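
A minimal sketch of one way to check the first problem, assuming a Linux
system where getent is available. It looks up the address that the
machine's own hostname resolves to and reports whether that address is a
loopback one. The helper name is_loopback is made up for illustration and
is not part of mpich.

```shell
#!/bin/sh
# Return success if the given address is a loopback address.
is_loopback() {
    case "$1" in
        127.*|::1) return 0 ;;
        *)         return 1 ;;
    esac
}

# Name this machine reports for itself (uname -n is a fallback for hostname).
name=$(hostname 2>/dev/null || uname -n)

# First address the resolver returns for that name, if getent is installed.
addr=""
if command -v getent >/dev/null 2>&1; then
    addr=$(getent hosts "$name" | awk '{ print $1; exit }')
fi

if [ -z "$addr" ]; then
    echo "$name: no address found"
elif is_loopback "$addr"; then
    echo "$name resolves to $addr (loopback)"
else
    echo "$name resolves to $addr"
fi
```

If the name resolves to a loopback address on both machines, each one
would announce itself as "localhost.localdomain", which would be
consistent with the error above. That is an assumption about the cause,
not something the error message confirms.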

Secondly, I attempted to use a host file instead, following the Hydra
process manager wiki. I created a hosts file containing just the IP of the
computer I am attempting to connect to. When I type in the command
 mpiexec -f hosts -n 4 ./applic

I get this error
[mpiexec@localhost.localdomain] HYDU_parse_hostfile
(./utils/args/args.c:323): unable to open host file: hosts

along with other errors (unable to parse hostfile, match handler, etc.). I
assume these are all due to it being unable to read the host file. Is there
any specific place I should save my hosts file? I have it saved directly on
my Desktop. I have attempted to give the full path to where it is located,
but I still get the same error.
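
For reference, a relative name such as hosts is looked up in the directory
mpiexec is started from, so a file on the Desktop is only found when the
command is run from there or is given the full path. The sketch below
writes a host file to a known location, checks that it is readable, and
only then calls mpiexec. The addresses (from the 192.0.2.x documentation
range), the file location, and the helper name write_hostfile are
placeholders, not values from my setup.

```shell
#!/bin/sh
# Write one host per line to the file named by the first argument.
# "host:N" asks Hydra to place up to N processes on that host.
write_hostfile() {
    out=$1
    shift
    printf '%s\n' "$@" > "$out"
}

# Hypothetical location and addresses, for illustration only.
hostfile="${TMPDIR:-/tmp}/mpi_hosts.$$"
write_hostfile "$hostfile" 192.0.2.11:8 192.0.2.12:8

# Confirm the file can be read before handing it to mpiexec.
if [ -r "$hostfile" ]; then
    echo "host file OK: $hostfile"
    cat "$hostfile"
else
    echo "cannot read $hostfile from $(pwd)" >&2
fi

# Launch only if mpiexec and the program are both present.
# ./applic is the program name used in the command above.
if command -v mpiexec >/dev/null 2>&1 && [ -x ./applic ]; then
    mpiexec -f "$hostfile" -n 4 ./applic
fi
```

Passing the absolute path in "$hostfile" removes any dependence on the
current directory. It does not explain why a full path would still fail,
so the readability check is there to rule out a typo or a permissions
problem first.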

For the first problem, I have read that I need to edit /etc/hosts manually,
using sudo, to enter the IP of the computer I am attempting to connect to.
I assume the computer is attempting to connect to itself (setting up the
program first on its own core, then sending it to another, hence
attempting to start it on localhost.localdomain).
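
Before editing /etc/hosts, it may help to see which names it currently
maps to a loopback address. The sketch below only reads the file and
changes nothing. The function name loopback_names is made up for
illustration.

```shell
#!/bin/sh
# Print every name that a hosts-format file maps to a loopback address.
loopback_names() {
    awk '
        $1 ~ /^#/ { next }                    # skip comment lines
        $1 ~ /^127\./ || $1 == "::1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break          # stop at a trailing comment
                print $i
            }
        }
    ' "$1"
}

# Read-only check of the system file, if it exists.
if [ -r /etc/hosts ]; then
    loopback_names /etc/hosts
fi
```

An entry in /etc/hosts has the form "address name [aliases...]", for
example a line such as "192.0.2.11 node1", where both the address and the
name are hypothetical.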

For the second problem, I have attempted to run the command
 mpirun --host my computer IP, the other computer IP ./program

This now gives the error that localhost.localdomain cannot connect to my
computer's IP (it cannot create a connection).

Sincerely,
Sam
