[mpich-discuss] Amazon ec2 Windows machine

Nicholas Sgro nsgro060 at gmail.com
Sun Feb 10 15:46:27 CST 2013


Both instances are part of the same security group, and I made sure all
inbound traffic was allowed.
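
A security group rule only covers the EC2 side. The Windows firewall on each instance can still drop connections, so it may be worth probing the ports directly from one instance to the other. Below is a minimal sketch in Python, not anything that ships with MPICH2. The helper names `can_connect` and `report` are made up for illustration, and port 8676 is assumed to be the smpd default, so check what your install actually listens on.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and unreachable hosts.
        return False


def report(host, ports):
    """Map each port to whether it accepted a TCP connection from this machine."""
    return {port: can_connect(host, port) for port in ports}
```

Running `print(report("10.0.0.12", [8676]))` on the first instance, with the placeholder address replaced by the other instance's private IP, shows whether that port is reachable. A `False` result with the security group wide open would point at the firewall on the instance itself.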

StarCluster looks very useful. Can you recommend something similar to
StarCluster for Windows instances? The software I am using is only
available for Windows.

thanks,
Nicholas

On Sun, Feb 10, 2013 at 1:57 PM, Rayson Ho <raysonlogin at gmail.com> wrote:

> How did you configure the EC2 security groups? By default, EC2
> instances have their inbound traffic blocked, and you will need to
> configure security group rules to enable inbound traffic.
>
>
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html
>
> Also, is there any reason you are manually creating EC2 HPC clusters
> instead of using a toolkit? We are fans of MIT's StarCluster: with it we
> can start up and shut down clusters very quickly (usually a few minutes).
> It is Linux based, with MPICH (and/or Open MPI), Open Grid Scheduler /
> Grid Engine, and many tools needed for doing HPC in EC2:
>
> http://star.mit.edu/cluster/
>
> And we built a 10,000-node cluster in EC2 based on StarCluster late
> last year, during SC12:
>
>
> http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
>
>
> On Fri, Feb 8, 2013 at 5:02 PM, Nicholas Sgro <nsgro060 at gmail.com> wrote:
> > Hi,
> > This is the command I'm using:
> >
> > mpiexec.exe -machinefile machines.txt -env MPICH2_CHANNEL sock -n 2 cpi.exe
> >
> > I have tried using both machine file and hosts in the command line, but I
> > get the same results. The program runs on a single instance with any
> > number of processors. I tried running mpiexec on one instance and using
> > the other as a single host and that also works.
> >
> > -Nicholas
> >
> >
> > On Fri, Feb 8, 2013 at 12:04 PM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:
> >>
> >> Hi,
> >>  How are you running your job (mpiexec command)? Did you try using a
> >> machine file to specify the hostnames when running the job?
> >>  Does the program (cpi) execute correctly on a single ec2 instance?
> >>
> >> Regards,
> >> Jayesh
> >>
> >> ----- Original Message -----
> >> From: "Nicholas Sgro" <nsgro060 at gmail.com>
> >> To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
> >> Sent: Thursday, February 7, 2013 9:57:55 PM
> >> Subject: Re: [mpich-discuss] Amazon ec2 Windows machine
> >>
> >> I'm using version 1.4.1p1. I tried the sock channel. It doesn't seem to
> >> work either. With sock, I get to the point where I enter the number of
> >> intervals, but then it does nothing.
> >>
> >> Do you know any reason it wouldn't work with ec2 instances?
> >>
> >>
> >>
> >> On Thu, Feb 7, 2013 at 4:29 PM, Jayesh Krishna < jayesh at mcs.anl.gov >
> >> wrote:
> >>
> >>
> >> Hi,
> >> Which version of MPICH2 are you using? Did you try the "sock" channel
> >> (See if it works)?
> >>
> >> (PS: We haven't tested MPICH2 on Windows with ec2 instances.)
> >> Regards,
> >> Jayesh
> >>
> >>
> >> ----- Original Message -----
> >> From: "Nicholas Sgro" < nsgro060 at gmail.com >
> >> To: discuss at mpich.org
> >> Sent: Thursday, February 7, 2013 11:29:57 AM
> >> Subject: [mpich-discuss] Amazon ec2 Windows machine
> >>
> >>
> >> Hi all,
> >>
> >> I am trying to run the example cpi.exe across 2 amazon ec2 instances
> >> running windows. I have different problems depending on the channel I
> >> choose. If I try nemesis, I get the following error:
> >>
> >> Fatal error in MPI_Init: Other MPI error, error stack:
> >> MPIR_Init_thread(392).................:
> >> MPID_Init(139)........................: channel initialization failed
> >> MPIDI_CH3_Init(38)....................:
> >> MPID_nem_init(196)....................:
> >> MPIDI_CH3I_Seg_commit(366)............:
> >> MPIU_SHMW_Hnd_deserialize(324)........:
> >> MPIU_SHMW_Seg_open(863)...............:
> >> MPIU_SHMW_Seg_create_attach_templ(763): unable to allocate shared memory -
> >> OpenFileMapping The system cannot find the file specified.
> >>
> >> If I try to use shm, cpi.exe uses 100% of the processors on both
> >> machines, but makes no progress and I have to cancel the job.
> >>
> >> I am attaching logs from smpd from both machines from the runs with
> >> nemesis and shm.
> >>
> >> I don't have any experience with mpich, so I have no idea what the
> >> problem is. Any guidance would be appreciated.
> >>
> >> Thanks
> >>
> >>
> >> _______________________________________________
> >> discuss mailing list discuss at mpich.org
> >> To manage subscription options or unsubscribe:
> >> https://lists.mpich.org/mailman/listinfo/discuss
> >>