Both instances are part of the same security group, and I made sure all inbound traffic was allowed. <br><br>StarCluster looks very useful. Can you recommend something similar to StarCluster for Windows instances? The software I am using is only available for Windows.<br>
<br>thanks,<br>Nicholas<br><br><div class="gmail_quote">On Sun, Feb 10, 2013 at 1:57 PM, Rayson Ho <span dir="ltr"><<a href="mailto:raysonlogin@gmail.com" target="_blank">raysonlogin@gmail.com</a>></span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
How did you configure the EC2 security groups? By default, EC2<br>
instances have their inbound traffic blocked, and you will need to<br>
configure security group rules to enable inbound traffic.<br>
<br>
<a href="http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html" target="_blank">http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html</a><br>
<br>
Also, is there any reason you are manually creating EC2 HPC clusters instead of<br>
using a toolkit? We are fans of MIT's StarCluster -- with it we can<br>
start up and shut down clusters very quickly (usually in a few minutes).<br>
It is Linux based, with MPICH (and/or Open MPI), Open Grid Scheduler /<br>
Grid Engine, and many tools needed for doing HPC in EC2:<br>
<br>
<a href="http://star.mit.edu/cluster/" target="_blank">http://star.mit.edu/cluster/</a><br>
<br>
And we built a 10,000-node cluster in EC2 based on StarCluster late<br>
last year, during SC12:<br>
<br>
<a href="http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html" target="_blank">http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html</a><br>
<br>
Rayson<br>
<br>
==================================================<br>
Open Grid Scheduler - The Official Open Source Grid Engine<br>
<a href="http://gridscheduler.sourceforge.net/" target="_blank">http://gridscheduler.sourceforge.net/</a><br>
<div><div><br>
<br>
On Fri, Feb 8, 2013 at 5:02 PM, Nicholas Sgro <<a href="mailto:nsgro060@gmail.com" target="_blank">nsgro060@gmail.com</a>> wrote:<br>
> Hi,<br>
> This is the command I'm using:<br>
><br>
> mpiexec.exe -machinefile machines.txt -env MPICH2_CHANNEL sock -n 2 cpi.exe<br>
><br>
> I have tried both a machine file and hosts on the command line, but I<br>
> get the same results. The program runs on a single instance with any number<br>
> of processors. I tried running mpiexec on one instance and using the other<br>
> as a single host, and that also works.<br>
><br>
> -Nicholas<br>
><br>
><br>
> On Fri, Feb 8, 2013 at 12:04 PM, Jayesh Krishna <<a href="mailto:jayesh@mcs.anl.gov" target="_blank">jayesh@mcs.anl.gov</a>> wrote:<br>
>><br>
>> Hi,<br>
>> How are you running your job (mpiexec command)? Did you try using a<br>
>> machine file to specify the hostnames when running the job?<br>
>> Does the program (cpi) execute correctly on a single ec2 instance?<br>
>><br>
>> Regards,<br>
>> Jayesh<br>
>><br>
>> ----- Original Message -----<br>
>> From: "Nicholas Sgro" <<a href="mailto:nsgro060@gmail.com" target="_blank">nsgro060@gmail.com</a>><br>
>> To: "Jayesh Krishna" <<a href="mailto:jayesh@mcs.anl.gov" target="_blank">jayesh@mcs.anl.gov</a>><br>
>> Sent: Thursday, February 7, 2013 9:57:55 PM<br>
>> Subject: Re: [mpich-discuss] Amazon ec2 Windows machine<br>
>><br>
>> I'm using version 1.4.1p1. I tried the sock channel. It doesn't seem to<br>
>> work either. With sock, I get to the point where I enter the number of<br>
>> intervals, but then it does nothing.<br>
>><br>
>> Do you know any reason it wouldn't work with ec2 instances?<br>
>><br>
>><br>
>><br>
>> On Thu, Feb 7, 2013 at 4:29 PM, Jayesh Krishna < <a href="mailto:jayesh@mcs.anl.gov" target="_blank">jayesh@mcs.anl.gov</a> ><br>
>> wrote:<br>
>><br>
>><br>
>> Hi,<br>
>> Which version of MPICH2 are you using? Did you try the "sock" channel (See<br>
>> if it works)?<br>
>><br>
>> (PS: We haven't tested MPICH2 on Windows with ec2 instances.)<br>
>> Regards,<br>
>> Jayesh<br>
>><br>
>><br>
>> ----- Original Message -----<br>
>> From: "Nicholas Sgro" < <a href="mailto:nsgro060@gmail.com" target="_blank">nsgro060@gmail.com</a> ><br>
>> To: <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
>> Sent: Thursday, February 7, 2013 11:29:57 AM<br>
>> Subject: [mpich-discuss] Amazon ec2 Windows machine<br>
>><br>
>><br>
>> Hi all,<br>
>><br>
>> I am trying to run the example cpi.exe across 2 Amazon EC2 instances<br>
>> running Windows. I have different problems depending on the channel I<br>
>> choose. If I try nemesis, I get the following error:<br>
>><br>
>> Fatal error in MPI_Init: Other MPI error, error stack:<br>
>> MPIR_Init_thread(392).................:<br>
>> MPID_Init(139)........................: channel initialization failed<br>
>> MPIDI_CH3_Init(38)....................:<br>
>> MPID_nem_init(196)....................:<br>
>> MPIDI_CH3I_Seg_commit(366)............:<br>
>> MPIU_SHMW_Hnd_deserialize(324)........:<br>
>> MPIU_SHMW_Seg_open(863)...............:<br>
>> MPIU_SHMW_Seg_create_attach_templ(763): unable to allocate shared memory -<br>
>> OpenFileMapping The system cannot find the file specified.<br>
>><br>
>> If I try to use shm, cpi.exe uses 100% of the processors on both machines,<br>
>> but makes no progress and I have to cancel the job.<br>
>><br>
>> I am attaching logs from smpd from both machines from the runs with<br>
>> nemesis and shm.<br>
>><br>
>> I don't have any experience with MPICH, so I have no idea what the problem<br>
>> is. Any guidance would be appreciated.<br>
>><br>
>> Thanks<br>
>><br>
>><br>
>> _______________________________________________<br>
>> discuss mailing list <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>
>> To manage subscription options or unsubscribe:<br>
>> <a href="https://lists.mpich.org/mailman/listinfo/discuss" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
>><br>
><br>
><br>
</div></div></blockquote></div><br>
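<br>
Since the thread turns on whether inbound traffic between the two instances is really open, one way to check is to test plain TCP reachability from each instance to the other. The following is only a rough sketch, not part of MPICH: the function names are made up for illustration, the machine file is assumed to list one host per line (optionally as host:nprocs, with # comments), and port 8676 is assumed to be the smpd listener port; confirm the actual port against your own smpd configuration.<br>

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


def hosts_from_machinefile(lines):
    """Extract host names from machine-file lines.

    Assumed format (hypothetical helper, not an MPICH API): one host per
    line, optionally 'host:nprocs', with '#' starting a comment.
    """
    hosts = []
    for line in lines:
        entry = line.split("#", 1)[0].strip()
        if not entry:
            continue
        # Keep only the host part of 'host:nprocs'.
        hosts.append(entry.split()[0].split(":", 1)[0])
    return hosts


def check_hosts(hosts, port, timeout=3.0):
    """Map each host to True/False depending on whether port is reachable."""
    return {host: can_connect(host, port, timeout) for host in hosts}
```

For example, running `check_hosts(hosts_from_machinefile(open("machines.txt")), 8676)` on each instance shows whether the assumed smpd port is reachable from that side. A True result only covers that one port: the sock channel also opens further connections between the MPI processes themselves, which this check does not exercise.<br>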