[mpich-discuss] MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection timed out

amelie chi zhou amelie.czhou at gmail.com
Tue Mar 15 22:01:33 CDT 2016


Hi, Ken,

I tried netcat, and the connection was established successfully.

On one machine, I ran:
ubuntu@ip-10-235-37-156:~$ netcat -l 10000

On the other side:
ubuntu@ip-10-169-125-85:~/mpitest$ netcat -v ec2-54-188-xx-xx.us-west-2.compute.amazonaws.com 10000
Connection to ec2-54-188-xx-xx.us-west-2.compute.amazonaws.com 10000 port [tcp/webmin] succeeded!

On Wed, Mar 16, 2016 at 12:11 AM, Kenneth Raffenetti <raffenet at mcs.anl.gov>
wrote:

> I suspect that there is still a firewall in the way given that the EC2
> instances are in different regions. One way to test your security group
> rules without MPI would be to try to establish a connection between the 2
> machines on a high TCP port (e.g. 10000) with a simple utility like netcat (
> https://en.wikipedia.org/wiki/Netcat).
>
> Ken
>
>
> On 03/15/2016 10:38 AM, amelie chi zhou wrote:
>
>> Hi, Ken,
>>
>> Thanks for the reply.
>> What kind of problem are you referring to?
>> In the security group rules, I allow TCP connections from all IP
>> addresses on all ports. Also, the two machines can ssh and scp to each
>> other without any problem. In this simple test, security is not my main
>> concern.
>>
>> Regards,
>> Amelie
>>
>>> On 15 Mar 2016, at 10:23 PM, Kenneth Raffenetti <raffenet at mcs.anl.gov>
>>> wrote:
>>>
>>> The different regions are a problem in this setup. Note that security
>>> groups in EC2 are *per region*.
>>>
>>>
>>> https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#default-security-group
>>>
>>> I'll note that using MPI across the internet like this is a bad idea if
>>> you have concerns about security.
>>>
>>> Ken
>>>
>>> On 03/15/2016 06:16 AM, amelie chi zhou wrote:
>>>> Hi,
>>>>
>>>> I configured two virtual machines on Amazon EC2 to run mpich-3.2. The
>>>> system is Ubuntu 12.04.2 LTS.
>>>>
>>>> The two virtual machines can ssh to each other successfully
>>>> (passwordless) and I can run a simple hello world program using the two
>>>> machines.
>>>>
>>>> ubuntu@ip-10-169-125-85:~$ mpiexec -n 2 -f host_file ./hello_world
>>>> Hello world from processor ip-10-169-125-85, rank 1 out of 2 processors
>>>> Hello world from processor ip-10-235-37-156, rank 0 out of 2 processors
>>>>
>>>> Then I ran a simple program with MPI_Send and MPI_Recv to communicate
>>>> between the two VMs. The following is the core code of the program.
>>>>
>>>>   if (world_rank == 0) {
>>>>     // If we are rank 0, set the number to -1 and send it to process 1
>>>>     number = -1;
>>>>     MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>>>>   } else if (world_rank == 1) {
>>>>     // If we are rank 1, block until the number from process 0 arrives
>>>>     MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>>>>     printf("Process 1 received number %d from process 0\n", number);
>>>>   }
>>>>
>>>>
>>>> The following is the error message I encountered.
>>>>
>>>> ubuntu@ip-10-169-125-85:~$ mpiexec -n 2 -f host_file ./send_recv
>>>> Fatal error in MPI_Send: Unknown error class, error stack:
>>>> MPI_Send(174)..............: MPI_Send(buf=0x7fff49f2759c, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD) failed
>>>> MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection timed out
>>>>
>>>>
>>>> I googled similar errors and have made sure that (1) there are no rules
>>>> in my firewall settings, and (2) a TCP port is listening on both sides
>>>> while the send_recv program runs. I cannot think of any other way to
>>>> fix this problem. BTW, the two virtual machines are in two different
>>>> regions of Amazon EC2 and are not in VPCs. Please help. Thanks!
>>>>
>>>> Regards,
>>>> Amelie
>>>>
>>>>
>>>> _______________________________________________
>>>> discuss mailing list     discuss at mpich.org
>>>> To manage subscription options or unsubscribe:
>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>