[mpich-discuss] MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection timed out

Balaji, Pavan balaji at anl.gov
Wed Mar 16 00:01:39 CDT 2016


> On Mar 15, 2016, at 10:26 PM, amelie chi zhou <amelie.czhou at gmail.com> wrote:
> Here is the full output info. Thanks!

The IP addresses and ports seem to be set up correctly, so that's not the problem.

I created my own Amazon EC2 instances to see what the problem is.  It looks like the instances are not able to communicate with each other, even though no firewall is explicitly enabled inside the Linux instances themselves.  After some digging, I found that the inbound rules in the "Security group" settings only allowed ssh.  I changed the inbound rule to "All traffic" and now I can run my jobs fine.

% ./install/bin/mpiexec -hosts ec2-52-36-15-57.us-west-2.compute.amazonaws.com,ec2-52-37-222-189.us-west-2.compute.amazonaws.com -n 4 ./examples/cpi
Process 3 of 4 is on ip-172-31-28-127
Process 2 of 4 is on ip-172-31-21-12
Process 1 of 4 is on ip-172-31-28-127
Process 0 of 4 is on ip-172-31-21-12
pi is approximately 3.1415926544231243, Error is 0.0000000008333312
wall clock time = 0.010181

Can you try that?
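
If you want to check whether the security group is what blocks traffic between the instances, a small TCP probe run from one instance against the other can help. This is only a rough sketch and not part of MPICH: the function name can_connect and the script name probe.py are made up for illustration, and it assumes that something is already listening on the port you probe on the remote instance (sshd on port 22, for example).

```python
import socket
import sys


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within `timeout` seconds."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical usage: python probe.py HOST PORT
    if len(sys.argv) != 3:
        sys.exit("usage: probe.py HOST PORT")
    host, port = sys.argv[1], int(sys.argv[2])
    status = "reachable" if can_connect(host, port) else "NOT reachable"
    print(f"{host}:{port} is {status}")
```

If the inbound rules only allow ssh, I would expect port 22 to be reported as reachable while a listener on any other port times out, which would be consistent with the "Connection timed out" error in the subject line.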

  -- Pavan


