<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr">Hi, Pavan,<div><br></div><div>Thanks a lot. It does work now!</div><div><br></div><div>Best Regards,</div><div>Amelie</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Wed, Mar 16, 2016 at 3:26 PM, Balaji, Pavan <span dir="ltr"><<a href="mailto:balaji@anl.gov" target="_blank">balaji@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Amelie,<br>
<br>
OK, I just tried it across two different subnets. Here's the problem --<br>
<br>
The Amazon compute nodes hide their public IP addresses from the list of IP addresses visible locally. So when each node queries for its local IP address, it gets only its private IP address (which other processes obviously cannot use to connect to it from outside).<br>
<br>
You can work around this by making two changes to your environment:<br>
<br>
1. Explicitly use the public IP addresses directly instead of the hostnames in your hostfile. That is, instead of "<a href="http://ec2-52-36-15-57.us-west-2.compute.amazonaws.com" rel="noreferrer" target="_blank">ec2-52-36-15-57.us-west-2.compute.amazonaws.com</a>", use "52.36.15.57".<br>
<br>
2. Pass the -localhost option to mpiexec to give the public IP address of the host from which you are running mpiexec.<br>
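<br>
For step 1, the public IP address can usually be read straight off the public hostname. The following is only a minimal sketch: ec2_name_to_ip is a hypothetical helper (not part of MPICH), and it assumes the hostname follows the "ec2-A-B-C-D.&lt;rest&gt;" pattern seen in the example above.<br>
<br>
```shell
#!/bin/sh
# Hypothetical helper (not part of MPICH): convert an EC2 public DNS name
# such as ec2-52-36-15-57.us-west-2.compute.amazonaws.com to 52.36.15.57.
# Assumption: the name starts with "ec2-" followed by the four octets
# separated by dashes. Names that do not match produce no output.
ec2_name_to_ip() {
    printf '%s\n' "$1" |
        sed -n 's/^ec2-\([0-9]\{1,3\}\)-\([0-9]\{1,3\}\)-\([0-9]\{1,3\}\)-\([0-9]\{1,3\}\)\..*$/\1.\2.\3.\4/p'
}

# Example: the hostname from step 1.
ec2_name_to_ip ec2-52-36-15-57.us-west-2.compute.amazonaws.com
```
<br>
The printed address is what would go into the hostfile (or the -hosts list) in place of the hostname.<br>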
<br>
I created two VM instances, one on the west subnet and the other on the east subnet:<br>
<br>
<a href="http://ec2-52-35-56-228.us-west-2.compute.amazonaws.com" rel="noreferrer" target="_blank">ec2-52-35-56-228.us-west-2.compute.amazonaws.com</a><br>
<a href="http://ec2-54-172-35-159.compute-1.amazonaws.com" rel="noreferrer" target="_blank">ec2-54-172-35-159.compute-1.amazonaws.com</a><br>
<br>
To run my application, I do this:<br>
<br>
% ./install/bin/mpiexec -localhost 52.35.56.228 -hosts 52.35.56.228,54.172.35.159 -n 2 ./examples/cpi<br>
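<br>
If you have more hosts, a command line of the same form can be assembled by a small script. This is a rough sketch only: print_mpiexec_cmd is a hypothetical wrapper, it prints the command instead of running it, and it assumes one process per host and the same ./install/bin/mpiexec and ./examples/cpi paths as above.<br>
<br>
```shell
#!/bin/sh
# Hypothetical wrapper: print (not run) an mpiexec command line in the same
# form as the one above. The first argument is the public IP of the host
# launching mpiexec; the remaining arguments are the public IPs of all hosts.
# Assumption: one process per host, so -n equals the number of hosts.
print_mpiexec_cmd() {
    localhost_ip=$1
    shift
    hosts=""
    nprocs=0
    for ip in "$@"; do
        # Append with a comma separator, except before the first entry.
        hosts="${hosts:+$hosts,}$ip"
        nprocs=$((nprocs + 1))
    done
    echo "./install/bin/mpiexec -localhost $localhost_ip -hosts $hosts -n $nprocs ./examples/cpi"
}

print_mpiexec_cmd 52.35.56.228 52.35.56.228 54.172.35.159
```
<br>
With the two addresses above, this prints the same command I ran by hand.<br>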
<br>
Let us know if that works.<br>
<span class="HOEnZb"><font color="#888888"><br>
-- Pavan<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
> On Mar 16, 2016, at 12:24 AM, amelie chi zhou <<a href="mailto:amelie.czhou@gmail.com">amelie.czhou@gmail.com</a>> wrote:<br>
><br>
> Hi, Pavan,<br>
><br>
> Thanks a lot for trying that.<br>
><br>
> I have enabled inbound traffic for all protocols, including TCP, UDP, and ICMP, on all ports (0 - 65535) and from all IP addresses. I noticed that the two instances you created are in the same region (US West, I suppose). For instances in the same region, mpiexec runs successfully in my setup. But when I run MPI programs across regions, in my case between an instance in US East and an instance in US West, the error in MPI_Send appears.<br>
> It seems that there might be some problem with the firewall or network interfaces, but I have checked and ruled out those possibilities (instances in different regions can ssh and scp to each other, and there is no drop rule in my firewall settings). So that's where I'm confused.<br>
><br>
> Regards,<br>
> Amelie<br>
><br>
> On Wed, Mar 16, 2016 at 1:01 PM, Balaji, Pavan <<a href="mailto:balaji@anl.gov">balaji@anl.gov</a>> wrote:<br>
><br>
> > On Mar 15, 2016, at 10:26 PM, amelie chi zhou <<a href="mailto:amelie.czhou@gmail.com">amelie.czhou@gmail.com</a>> wrote:<br>
> > Here is the full output info. Thanks!<br>
><br>
> The IP addresses and ports seem to be set up correctly, so that's not the problem.<br>
><br>
> I created my own Amazon instances to see what the problem is. It looks like the instances are not able to communicate, even though no explicit firewall is shown as enabled inside the Linux instance. I did some digging, found the "Security group" settings, and saw that the inbound rules allowed only ssh. I changed it to "All traffic" and now I can run my jobs fine.<br>
><br>
> % ./install/bin/mpiexec -hosts <a href="http://ec2-52-36-15-57.us-west-2.compute.amazonaws.com" rel="noreferrer" target="_blank">ec2-52-36-15-57.us-west-2.compute.amazonaws.com</a>,ec2-52-37-222-189.us-west-2.compute.amazonaws.com -n 4 ./examples/cpi<br>
> Process 3 of 4 is on ip-172-31-28-127<br>
> Process 2 of 4 is on ip-172-31-21-12<br>
> Process 1 of 4 is on ip-172-31-28-127<br>
> Process 0 of 4 is on ip-172-31-21-12<br>
> pi is approximately 3.1415926544231243, Error is 0.0000000008333312<br>
> wall clock time = 0.010181<br>
><br>
> Can you try that?<br>
><br>
> -- Pavan<br>
><br>
> _______________________________________________<br>
> discuss mailing list <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>
> To manage subscription options or unsubscribe:<br>
> <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
</div></div></blockquote></div><br></div>