[mpich-discuss] MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection timed out

Balaji, Pavan balaji at anl.gov
Tue Mar 15 22:08:55 CDT 2016


Amelie,

Can you run your mpiexec command with the -verbose option and paste the output here?

% mpiexec -n 2 -f host_file -verbose ./send_recv

  -- Pavan

> On Mar 15, 2016, at 10:01 PM, amelie chi zhou <amelie.czhou at gmail.com> wrote:
> 
> Hi, Ken,
> 
> I tried netcat, and the connection was established successfully.
> 
> On one machine, I ran:
> ubuntu at ip-10-235-37-156:~$ netcat -l 10000
> 
> On the other side:
> ubuntu at ip-10-169-125-85:~/mpitest$ netcat -v ec2-54-188-xx-xx.us-west-2.compute.amazonaws.com 10000
> Connection to ec2-54-188-xx-xx.us-west-2.compute.amazonaws.com 10000 port [tcp/webmin] succeeded!
> 
> On Wed, Mar 16, 2016 at 12:11 AM, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:
> I suspect there is still a firewall in the way, given that the EC2 instances are in different regions. One way to test your security group rules without MPI is to establish a connection between the two machines on a high TCP port (e.g. 10000) with a simple utility such as netcat (https://en.wikipedia.org/wiki/Netcat).
> 
> Ken
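
The netcat check described in this thread can also be done from a small C helper. The following is a minimal sketch, not part of MPICH: `tcp_probe` is a hypothetical name, and the timeout handling (non-blocking `connect` plus `select`) is just one possible approach using standard POSIX sockets.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <string.h>
#include <sys/select.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not an MPICH function): try to open a TCP
 * connection to host:port and give up after timeout_sec seconds.
 * Returns 0 if a connection was established, -1 otherwise. */
int tcp_probe(const char *host, const char *port, int timeout_sec)
{
    struct addrinfo hints, *res, *ai;
    int result = -1;

    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;     /* accept IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM; /* TCP */

    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    /* Try each resolved address until one connects. */
    for (ai = res; ai != NULL && result != 0; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0)
            continue;

        /* Switch to non-blocking mode so the timeout is under our control. */
        int flags = fcntl(fd, F_GETFL, 0);
        if (flags < 0 || fcntl(fd, F_SETFL, flags | O_NONBLOCK) < 0) {
            close(fd);
            continue;
        }

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            result = 0; /* connected immediately */
        } else if (errno == EINPROGRESS) {
            fd_set wfds;
            struct timeval tv = { timeout_sec, 0 };

            FD_ZERO(&wfds);
            FD_SET(fd, &wfds);

            /* The socket becomes writable once the attempt finishes;
             * SO_ERROR then tells us whether it succeeded. */
            if (select(fd + 1, NULL, &wfds, NULL, &tv) == 1) {
                int err = 0;
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) == 0 &&
                    err == 0)
                    result = 0;
            }
        }
        close(fd);
    }

    freeaddrinfo(res);
    return result;
}
```

For instance, calling `tcp_probe("ec2-54-188-xx-xx.us-west-2.compute.amazonaws.com", "10000", 5)` would mirror the netcat command in this thread, provided something is listening on that port on the other machine. A successful probe shows only that this one port is reachable at that address; it says nothing about other ports or other addresses of the same machine.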
> 
> 
> On 03/15/2016 10:38 AM, amelie chi zhou wrote:
> Hi, Ken,
> 
> Thanks for the reply.
> What kind of problem are you referring to?
> In the security group rules, I allow TCP connections from all IP addresses on all ports. Also, the two machines can ssh and scp to each other with no problem. In this simple test, security is not my main concern.
> 
> Regards,
> Amelie
> 
> On 15 Mar 2016, at 10:23 PM, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:
> 
> The different regions are a problem in this setup. Note that security groups in EC2 are *per region*.
> 
> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#default-security-group
> 
> I'll note that using MPI across the internet like this is a bad idea if you have concerns about security.
> 
> Ken
> 
> On 03/15/2016 06:16 AM, amelie chi zhou wrote:
> Hi,
> 
> I configured two virtual machines on Amazon EC2 to run mpich-3.2. The
> operating system is Ubuntu 12.04.2 LTS.
> 
> The two virtual machines can ssh to each other successfully
> (passwordless) and I can run a simple hello world program using the two
> machines.
> 
> ubuntu at ip-10-169-125-85:~$ mpiexec -n 2 -f host_file ./hello_world
> Hello world from processor ip-10-169-125-85, rank 1 out of 2 processors
> Hello world from processor ip-10-235-37-156, rank 0 out of 2 processors
> 
> Then I ran a simple program that uses MPI_Send and MPI_Recv to communicate
> between the two VMs. The core of the program is as follows.
> 
>   if (world_rank == 0) {
>     // If we are rank 0, set the number to -1 and send it to process 1
>     number = -1;
>     MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>   } else if (world_rank == 1) {
>     // If we are rank 1, receive the number from process 0 and print it
>     MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>     printf("Process 1 received number %d from process 0\n", number);
>   }
> 
> 
> This is the error message I encountered:
> 
> ubuntu at ip-10-169-125-85:~$ mpiexec -n 2 -f host_file ./send_recv
> Fatal error in MPI_Send: Unknown error class, error stack:
> MPI_Send(174)..............: MPI_Send(buf=0x7fff49f2759c, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD) failed
> MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection timed out
> 
> 
> I searched for similar errors and have made sure that 1) there are no
> rules in my firewall settings, and 2) a TCP port is listening on both
> sides while the send_recv program runs. I cannot think of any other way
> to fix this problem. BTW, the two virtual machines are in two different
> Amazon EC2 regions and are not in VPCs. Please help. Thanks!
> 
> Regards,
> Amelie
> 
> 
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

