[mpich-discuss] MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection timed out

Kenneth Raffenetti raffenet at mcs.anl.gov
Tue Mar 15 09:23:49 CDT 2016


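The different regions are the problem in this setup. Note that security
groups in EC2 are *per region*, so a rule that admits traffic from a
group in one region does not cover instances in another region.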

https://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html#default-security-group

I'll note that using MPI across the internet like this is a bad idea if 
you have concerns about security.
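
One way to confirm that the security groups are what blocks the traffic
is to try a plain TCP connection from one instance to a port the other
one is listening on, outside of MPI. Below is a minimal sketch, assuming
a POSIX system. tcp_probe is a hypothetical helper written for this
illustration, not part of MPICH, and the host and port you pass in are
whatever your own setup uses. A return value of ETIMEDOUT would be
consistent with packets being silently dropped, while ECONNREFUSED would
suggest the host is reachable but nothing is listening on that port.

```c
#include <errno.h>
#include <fcntl.h>
#include <netdb.h>
#include <poll.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper (not an MPICH function): try a TCP connect to
 * host:port and give up after timeout_ms milliseconds.
 * Returns 0 on success, an errno value such as ECONNREFUSED or
 * ETIMEDOUT on failure, or -1 if the address cannot be resolved. */
int tcp_probe(const char *host, const char *port, int timeout_ms)
{
    struct addrinfo hints, *res = NULL;
    memset(&hints, 0, sizeof hints);
    hints.ai_family = AF_UNSPEC;      /* IPv4 or IPv6 */
    hints.ai_socktype = SOCK_STREAM;  /* TCP */
    if (getaddrinfo(host, port, &hints, &res) != 0)
        return -1;

    int result = -1;
    for (struct addrinfo *ai = res; ai != NULL; ai = ai->ai_next) {
        int fd = socket(ai->ai_family, ai->ai_socktype, ai->ai_protocol);
        if (fd < 0) {
            result = errno;
            continue;
        }

        /* Non-blocking connect so that we control the timeout. */
        int flags = fcntl(fd, F_GETFL, 0);
        fcntl(fd, F_SETFL, flags | O_NONBLOCK);

        if (connect(fd, ai->ai_addr, ai->ai_addrlen) == 0) {
            result = 0;
        } else if (errno == EINPROGRESS) {
            struct pollfd pfd = { .fd = fd, .events = POLLOUT };
            int n = poll(&pfd, 1, timeout_ms);
            if (n == 0) {
                result = ETIMEDOUT;   /* no answer within the timeout */
            } else if (n < 0) {
                result = errno;
            } else {
                /* The connect finished; fetch its outcome. */
                int err = 0;
                socklen_t len = sizeof err;
                if (getsockopt(fd, SOL_SOCKET, SO_ERROR, &err, &len) < 0)
                    err = errno;
                result = err;
            }
        } else {
            result = errno;
        }
        close(fd);
        if (result == 0)
            break;
    }
    freeaddrinfo(res);
    return result;
}
```

Calling tcp_probe with the peer's address, the listening port, and a
timeout of a few seconds on each instance in turn shows whether the
connection is blocked in one direction or in both.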

Ken

On 03/15/2016 06:16 AM, amelie chi zhou wrote:
> Hi,
>
> I configured two virtual machines on Amazon EC2 to run mpich-3.2. The
> system is Ubuntu 12.04.2 LTS.
>
> The two virtual machines can ssh to each other successfully
> (passwordless) and I can run a simple hello world program using the two
> machines.
>
> ubuntu@ip-10-169-125-85:~$ mpiexec -n 2 -f host_file ./hello_world
> Hello world from processor ip-10-169-125-85, rank 1 out of 2 processors
> Hello world from processor ip-10-235-37-156, rank 0 out of 2 processors
>
> Then I ran a simple program that uses MPI_Send and MPI_Recv to
> communicate between the two VMs. The core of the program follows.
>
>   if (world_rank == 0) {
>     // If we are rank 0, set the number to -1 and send it to process 1
>     number = -1;
>     MPI_Send(&number, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
>   } else if (world_rank == 1) {
>     // If we are rank 1, receive the number from process 0 and print it
>     MPI_Recv(&number, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, MPI_STATUS_IGNORE);
>     printf("Process 1 received number %d from process 0\n", number);
>   }
>
>
> This is the error message I encountered.
>
> ubuntu@ip-10-169-125-85:~$ mpiexec -n 2 -f host_file ./send_recv
> Fatal error in MPI_Send: Unknown error class, error stack:
> MPI_Send(174)..............: MPI_Send(buf=0x7fff49f2759c, count=1, MPI_INT, dest=1, tag=0, MPI_COMM_WORLD) failed
> MPID_nem_tcp_connpoll(1835): Communication error with rank 1: Connection timed out
>
>
> I googled similar errors and have made sure that 1) there are no rules
> in my firewall settings, and 2) a TCP port is listening on both sides
> while the send_recv program runs. I cannot think of any other way to
> fix this problem. BTW, the two virtual machines are in two different
> regions of Amazon EC2 and are not in VPCs. Please help. Thanks!
>
> Regards,
> Amelie
>
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>