<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr">Hi, Pavan,<div><br></div><div>Attached is a little summary on "How to run MPICH on Amazon EC2". I'm not sure whether it's clear enough. Please check.</div><div><br></div><div>Regards,</div><div>Amelie</div></div><div class="gmail_extra"><br><div class="gmail_quote">On Thu, Mar 17, 2016 at 9:22 AM, amelie chi zhou <span dir="ltr"><<a href="mailto:amelie.czhou@gmail.com" target="_blank">amelie.czhou@gmail.com</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">Sure. I'm glad to.<br>
<span class="HOEnZb"><font color="#888888"><br>
Amelie<br>
</font></span><div class="HOEnZb"><div class="h5"><br>
> On 17 Mar 2016, at 1:58 AM, Balaji, Pavan <<a href="mailto:balaji@anl.gov">balaji@anl.gov</a>> wrote:<br>
><br>
> Hi Amelie,<br>
><br>
> Would you be willing to write up some documentation on "How to use MPICH on Amazon EC2" including details on using servers in a single region vs. multiple regions? We'd like to put this up on our FAQ page.<br>
><br>
> Thanks,<br>
><br>
> -- Pavan<br>
><br>
>> On Mar 16, 2016, at 2:53 AM, amelie chi zhou <<a href="mailto:amelie.czhou@gmail.com">amelie.czhou@gmail.com</a>> wrote:<br>
>><br>
>> Hi, Pavan,<br>
>><br>
>> Thanks a lot. It does work now!<br>
>><br>
>> Best Regards,<br>
>> Amelie<br>
>><br>
>> On Wed, Mar 16, 2016 at 3:26 PM, Balaji, Pavan <<a href="mailto:balaji@anl.gov">balaji@anl.gov</a>> wrote:<br>
>> Amelie,<br>
>><br>
>> OK, I just tried it across two different subnets. Here's the problem --<br>
>><br>
>> The Amazon compute nodes hide their public IP addresses from the list of IP addresses visible locally. So when each node queries for its local IP address, it gets only its private IP address, which is useless for processes outside that private network to connect to.<br>
>><br>
>> You can work around this by making two changes to your environment:<br>
>><br>
>> 1. Explicitly use the public IP addresses directly instead of the hostnames in your hostfile. That is, instead of "<a href="http://ec2-52-36-15-57.us-west-2.compute.amazonaws.com" rel="noreferrer" target="_blank">ec2-52-36-15-57.us-west-2.compute.amazonaws.com</a>", use "52.36.15.57".<br>
>><br>
>> 2. Pass the -localhost option to mpiexec to give the public IP address of the host from which you are running mpiexec.<br>
>><br>
>> I created two VM instances, one on the west subnet and the other on the east subnet:<br>
>><br>
>> <a href="http://ec2-52-35-56-228.us-west-2.compute.amazonaws.com" rel="noreferrer" target="_blank">ec2-52-35-56-228.us-west-2.compute.amazonaws.com</a><br>
>> <a href="http://ec2-54-172-35-159.compute-1.amazonaws.com" rel="noreferrer" target="_blank">ec2-54-172-35-159.compute-1.amazonaws.com</a><br>
>><br>
>> To run my application, I do this:<br>
>><br>
>> % ./install/bin/mpiexec -localhost 52.35.56.228 -hosts 52.35.56.228,54.172.35.159 -n 2 ./examples/cpi<br>
>><br>
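>> The two changes above can be sketched as a small Python helper. This is only an illustration, not part of MPICH: the function names public_ip and mpiexec_args are hypothetical, and it assumes the public EC2 DNS name embeds the public IPv4 address as "ec2-A-B-C-D.", as the hostnames in this thread do. The only mpiexec options it uses are the ones shown above (-localhost, -hosts, -n).<br>
>><br>
<pre>
```python
import re

# Assumption: public EC2 DNS names embed the public IPv4 address as
# "ec2-A-B-C-D.", as in the hostnames quoted in this thread.
_EC2_NAME = re.compile(r"^ec2-(\d+)-(\d+)-(\d+)-(\d+)\.")


def public_ip(host):
    """Return the dotted IPv4 address for an EC2 public DNS name.

    Anything that does not match the pattern (for example a string that
    is already an IP address) is returned unchanged.
    """
    match = _EC2_NAME.match(host)
    if match is None:
        return host
    return ".".join(match.groups())


def mpiexec_args(mpiexec, launch_host, hosts, nprocs, program):
    """Build an mpiexec argument list that uses public IPs throughout."""
    ips = [public_ip(h) for h in hosts]
    return [
        mpiexec,
        "-localhost", public_ip(launch_host),  # public IP of the launching node
        "-hosts", ",".join(ips),               # public IPs instead of hostnames
        "-n", str(nprocs),
        program,
    ]


if __name__ == "__main__":
    west = "ec2-52-35-56-228.us-west-2.compute.amazonaws.com"
    east = "ec2-54-172-35-159.compute-1.amazonaws.com"
    args = mpiexec_args("./install/bin/mpiexec", west, [west, east], 2, "./examples/cpi")
    print(" ".join(args))
```
</pre>
>> Run as a script, the sketch prints the same command line as the one shown above for the two test instances.<br>
>><br>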
>> Let us know if that works.<br>
>><br>
>> -- Pavan<br>
>><br>
>>> On Mar 16, 2016, at 12:24 AM, amelie chi zhou <<a href="mailto:amelie.czhou@gmail.com">amelie.czhou@gmail.com</a>> wrote:<br>
>>><br>
>>> Hi, Pavan,<br>
>>><br>
>>> Thanks a lot for trying that.<br>
>>><br>
>>> I have enabled inbound traffic for all protocols, including TCP, UDP, and ICMP, on all ports (0-65535) and from all IP addresses. I noticed that the two instances you created are in the same region (us-west, I suppose). For instances in the same region, mpiexec runs successfully in my setup. But when I run MPI programs across regions, in my case between an instance in us-east and an instance in us-west, the error in MPI_Send appears.<br>
>>> It looks like a problem with the firewall or network interfaces, but I have checked and ruled those out: instances in different regions can ssh and scp to each other, and there is no drop rule in my firewall settings. So that's where I'm confused.<br>
>>><br>
>>> Regards,<br>
>>> Amelie<br>
>>><br>
>>>> On Wed, Mar 16, 2016 at 1:01 PM, Balaji, Pavan <<a href="mailto:balaji@anl.gov">balaji@anl.gov</a>> wrote:<br>
>>>><br>
>>>> On Mar 15, 2016, at 10:26 PM, amelie chi zhou <<a href="mailto:amelie.czhou@gmail.com">amelie.czhou@gmail.com</a>> wrote:<br>
>>>> Here is the full output info. Thanks!<br>
>>><br>
>>> The IP addresses and ports seem to be correctly set up, so that's not the problem.<br>
>>><br>
>>> I created my own Amazon instances to see what the problem is. It looks like the instances are not able to communicate, even though no explicit firewall is shown as enabled inside the Linux instance. I did some digging, found the "Security group" settings, and saw that the inbound rules allowed only ssh. I changed them to "All traffic" and now I can run my jobs fine.<br>
>>><br>
>>> % ./install/bin/mpiexec -hosts ec2-52-36-15-57.us-west-2.compute.amazonaws.com,ec2-52-37-222-189.us-west-2.compute.amazonaws.com -n 4 ./examples/cpi<br>
>>> Process 3 of 4 is on ip-172-31-28-127<br>
>>> Process 2 of 4 is on ip-172-31-21-12<br>
>>> Process 1 of 4 is on ip-172-31-28-127<br>
>>> Process 0 of 4 is on ip-172-31-21-12<br>
>>> pi is approximately 3.1415926544231243, Error is 0.0000000008333312<br>
>>> wall clock time = 0.010181<br>
>>><br>
>>> Can you try that?<br>
>>><br>
>>> -- Pavan<br>
>>><br>
>>> _______________________________________________<br>
>>> discuss mailing list <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>
>>> To manage subscription options or unsubscribe:<br>
>>> <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
</div></div></blockquote></div><br></div>