[mpich-discuss] Amazon ec2 Windows machine

Nicholas Sgro nsgro060 at gmail.com
Thu Feb 14 14:34:21 CST 2013


Hi,

Setting MPICH_NO_LOCAL to 1 and using the nemesis channel seems to have solved the
problem. In case it interests you, I also tried the sock channel, and that still
does not work. As far as I am concerned, there are no longer any problems.
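
The working configuration described above (nemesis channel plus MPICH_NO_LOCAL set to 1) combines options that appear elsewhere in this thread. Below is a minimal Python sketch that assembles that command line. It assumes a `machines.txt` host file and the option order from Jayesh's example command. The helper name `build_mpiexec_argv` is hypothetical, and the option order your mpiexec accepts may differ.

```python
from typing import List, Optional


def build_mpiexec_argv(
    exe: str,
    nprocs: int = 2,
    machinefile: Optional[str] = "machines.txt",
    channel: str = "nemesis",
    no_local: bool = True,
) -> List[str]:
    """Assemble an mpiexec argument list from the options used in this thread.

    Hypothetical helper: the flags come from the thread, the function does not.
    """
    argv = ["mpiexec"]
    if machinefile:
        # Host file listing the EC2 instances (file name is an assumption).
        argv += ["-machinefile", machinefile]
    # Channel selection: "nemesis" or "sock" on MPICH2 1.4.1p1 for Windows.
    argv += ["-channel", channel]
    argv += ["-n", str(nprocs)]
    if no_local:
        # Per the thread, this should force all communication over TCP sockets
        # instead of shared memory.
        argv += ["-env", "MPICH_NO_LOCAL", "1"]
    argv.append(exe)
    return argv
```

For example, `" ".join(build_mpiexec_argv(r"c:\Progra~1\MPICH2\examples\cpi.exe"))` yields a string you can paste into a command prompt. You can also pass the list to `subprocess.run(..., check=True)` on a host where MPICH2 and smpd are installed.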

I am fairly certain that I am using version 1.4.1p1 (according to
wmpiconfig), so I am not sure why I can still choose the shm and ssm channels
(they are options in wmpiexec, and I don't get an error from the command line).

Thanks for your help,
Nicholas

On Mon, Feb 11, 2013 at 12:12 PM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:

> Hi,
>  The latest version of MPICH2 on Windows (1.4.1p1) does not support the
> shm and ssm channels. Are you sure you are using the latest version of
> MPICH2 on your machines? Please use the "-channel" option to select the
> channel ("nemesis" or "sock").
>  Can you try running the job with the environment variable
> "MPICH_NO_LOCAL" set to 1 (mpiexec -n 2 -env MPICH_NO_LOCAL 1
> c:\Progra~1\MPICH2\examples\cpi.exe)? This option should force all
> communication to go through TCP sockets.
>
> Regards,
> Jayesh
>
> ----- Original Message -----
> From: "Nicholas Sgro" <nsgro060 at gmail.com>
> To: "Rayson Ho" <raysonlogin at gmail.com>
> Cc: discuss at mpich.org
> Sent: Sunday, February 10, 2013 3:46:27 PM
> Subject: Re: [mpich-discuss] Amazon ec2 Windows machine
>
>
> Both instances are part of the same security group, and I made sure all
> inbound traffic was allowed.
>
> StarCluster looks very useful. Can you recommend something similar to
> StarCluster but for Windows instances? The software I am using is only
> available for Windows.
>
> thanks,
> Nicholas
>
>
> On Sun, Feb 10, 2013 at 1:57 PM, Rayson Ho < raysonlogin at gmail.com > wrote:
>
>
> How did you configure the EC2 security groups? By default, EC2
> instances have their inbound traffic blocked, and you will need to
> configure security group rules to enable inbound traffic.
>
>
> http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html
>
> Also, is there any reason you are manually creating EC2 HPC clusters instead
> of using a toolkit? We are fans of MIT's StarCluster: with it we can
> start up and shut down clusters very quickly (usually in a few minutes).
> It is Linux based, with MPICH (and/or Open MPI), Open Grid Scheduler /
> Grid Engine, and many tools needed for doing HPC in EC2:
>
> http://star.mit.edu/cluster/
>
> And we built a 10,000-node cluster in EC2 based on StarCluster late
> last year, during SC12:
>
>
> http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html
>
> Rayson
>
> ==================================================
> Open Grid Scheduler - The Official Open Source Grid Engine
> http://gridscheduler.sourceforge.net/
>
>
>
>
> On Fri, Feb 8, 2013 at 5:02 PM, Nicholas Sgro < nsgro060 at gmail.com > wrote:
> > Hi,
> > This is the command I'm using:
> >
> > mpiexec.exe -machinefile machines.txt -env MPICH2_CHANNEL sock -n 2 cpi.exe
> >
> > I have tried using both a machine file and hosts on the command line, but
> > I get the same results. The program runs on a single instance with any
> > number of processors. I also tried running mpiexec on one instance and
> > using the other as a single host, and that also works.
> >
> > -Nicholas
> >
> >
> > On Fri, Feb 8, 2013 at 12:04 PM, Jayesh Krishna < jayesh at mcs.anl.gov > wrote:
> >>
> >> Hi,
> >> How are you running your job (mpiexec command)? Did you try using a
> >> machine file to specify the hostnames when running the job?
> >> Does the program (cpi) execute correctly on a single ec2 instance?
> >>
> >> Regards,
> >> Jayesh
> >>
> >> ----- Original Message -----
> >> From: "Nicholas Sgro" < nsgro060 at gmail.com >
> >> To: "Jayesh Krishna" < jayesh at mcs.anl.gov >
> >> Sent: Thursday, February 7, 2013 9:57:55 PM
> >> Subject: Re: [mpich-discuss] Amazon ec2 Windows machine
> >>
> >> I'm using version 1.4.1p1. I tried the sock channel. It doesn't seem to
> >> work either. With sock, I get to the point where I enter the number of
> >> intervals, but then it does nothing.
> >>
> >> Do you know of any reason it wouldn't work with EC2 instances?
> >>
> >>
> >>
> >> On Thu, Feb 7, 2013 at 4:29 PM, Jayesh Krishna < jayesh at mcs.anl.gov > wrote:
> >>
> >>
> >> Hi,
> >> Which version of MPICH2 are you using? Did you try the "sock" channel
> >> (to see if it works)?
> >>
> >> (PS: We haven't tested MPICH2 on Windows with ec2 instances.)
> >> Regards,
> >> Jayesh
> >>
> >>
> >> ----- Original Message -----
> >> From: "Nicholas Sgro" < nsgro060 at gmail.com >
> >> To: discuss at mpich.org
> >> Sent: Thursday, February 7, 2013 11:29:57 AM
> >> Subject: [mpich-discuss] Amazon ec2 Windows machine
> >>
> >>
> >> Hi all,
> >>
> >> I am trying to run the example cpi.exe across two Amazon EC2 instances
> >> running Windows. I have different problems depending on the channel I
> >> choose. If I try nemesis, I get the following error:
> >>
> >> Fatal error in MPI_Init: Other MPI error, error stack:
> >> MPIR_Init_thread(392).................:
> >> MPID_Init(139)........................: channel initialization failed
> >> MPIDI_CH3_Init(38)....................:
> >> MPID_nem_init(196)....................:
> >> MPIDI_CH3I_Seg_commit(366)............:
> >> MPIU_SHMW_Hnd_deserialize(324)........:
> >> MPIU_SHMW_Seg_open(863)...............:
> >> MPIU_SHMW_Seg_create_attach_templ(763): unable to allocate shared memory - OpenFileMapping The system cannot find the file specified.
> >>
> >> If I try to use shm, cpi.exe uses 100% of the processors on both
> >> machines, but makes no progress and I have to cancel the job.
> >>
> >> I am attaching the smpd logs from both machines for the runs with
> >> nemesis and shm.
> >>
> >> I don't have any experience with MPICH, so I have no idea what the
> >> problem is. Any guidance would be appreciated.
> >>
> >> Thanks
> >>
> >>
> >> _______________________________________________
> >> discuss mailing list discuss at mpich.org
> >> To manage subscription options or unsubscribe:
> >> https://lists.mpich.org/mailman/listinfo/discuss
> >>
> >
> >
>
>
>
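
Rayson's security-group question and Jayesh's MPICH_NO_LOCAL suggestion both come down to whether the two instances can reach each other over TCP. Below is a minimal Python sketch of a reachability check you could run from one instance against the other. It is an illustration, not an MPICH tool. The helper name `can_connect` is hypothetical. Port 8676 in the usage note is assumed to be smpd's default listening port; check your own smpd configuration. MPI processes may also open additional ports, so a successful check on one port does not prove that every port the job needs is allowed.

```python
import socket


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to (host, port) succeeds within timeout."""
    try:
        # create_connection resolves the name and tries each returned address.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Connection refused, timed out, unreachable, or name lookup failed.
        return False
```

For example, `can_connect("10.0.0.12", 8676)` (a hypothetical private IP of the second instance) returning False would point at the security group rules or the Windows firewall rather than at MPICH itself.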