[mpich-discuss] Amazon ec2 Windows machine
Jayesh Krishna
jayesh at mcs.anl.gov
Fri Feb 15 12:00:16 CST 2013
Hi,
Good to know MPICH is working for you. Are the hostnames of the two ec2 instances the same?
If you try to use ssm/shm with 1.4.1p1, mpiexec should abort with an error message.
(PS: MPICH_NO_LOCAL=1 forces the communication in nemesis to go through tcp sockets)
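A minimal sketch of why the hostname question matters, assuming (this is not confirmed in the thread) that ranks reporting the same hostname are treated as local to one another and are therefore candidates for shared-memory communication. The helper name group_ranks_by_host and any hostnames passed to it are hypothetical. If two separate instances report the same name, their ranks fall into one group, which would be consistent with the OpenFileMapping failure in the original message.

```python
import socket
from collections import defaultdict


def group_ranks_by_host(rank_hosts):
    """Group ranks by the hostname each one reports (case-insensitive).

    rank_hosts is an iterable of (rank, hostname) pairs. Ranks that end up
    in the same group are the ones that would be assumed to share a machine
    under the assumption stated above.
    """
    groups = defaultdict(list)
    for rank, host in rank_hosts:
        groups[host.strip().lower()].append(rank)
    return dict(groups)


if __name__ == "__main__":
    # Run this on each instance and compare the printed names by hand.
    print(socket.gethostname())
```

Running the script on both instances and comparing the printed names is enough to answer the question; the grouping function only illustrates the consequence of a clash.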
Regards,
Jayesh
----- Original Message -----
From: "Nicholas Sgro" <nsgro060 at gmail.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Cc: discuss at mpich.org
Sent: Thursday, February 14, 2013 2:34:21 PM
Subject: Re: [mpich-discuss] Amazon ec2 Windows machine
Hi,
Setting MPICH_NO_LOCAL to 1 and using nemesis seems to have solved the problem. In case it is of interest, I also tried the sock channel, and that still does not work. As far as I am concerned, there are no longer any problems.
I am fairly certain that I am using version 1.4.1p1 (according to wmpiconfig), so I am not sure why I can choose the shm and ssm channels (they are options in wmpiexec, and I don't get an error from the command line).
Thanks for your help,
Nicholas
On Mon, Feb 11, 2013 at 12:12 PM, Jayesh Krishna < jayesh at mcs.anl.gov > wrote:
Hi,
The latest version of MPICH2 on Windows (1.4.1p1) does not support the shm and ssm channels. Are you sure you are using the latest version of MPICH2 on your machines? Please use the "-channel" option to select a channel ("nemesis" or "sock").
Can you try running the job by setting the environment variable "MPICH_NO_LOCAL" to 1 (mpiexec -n 2 -env MPICH_NO_LOCAL 1 c:\Progra~1\MPICH2\examples\cpi.exe)? This option should force all communication to go via tcp sockets.
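For anyone scripting this, here is a small sketch that assembles the same command line as an argument list. It uses only the options that appear in this thread (-n, -env, -machinefile); the function name build_mpiexec_args is hypothetical, and the sketch only builds the list without launching anything.

```python
def build_mpiexec_args(exe, nprocs, env=None, machinefile=None):
    """Build an mpiexec argument list from options seen in this thread.

    env is an optional dict of environment variables, each passed as
    "-env NAME VALUE"; machinefile is an optional path passed with
    "-machinefile".
    """
    args = ["mpiexec", "-n", str(nprocs)]
    if machinefile:
        args += ["-machinefile", machinefile]
    for name, value in (env or {}).items():
        args += ["-env", name, str(value)]
    args.append(exe)
    return args


if __name__ == "__main__":
    cmd = build_mpiexec_args(
        r"c:\Progra~1\MPICH2\examples\cpi.exe",
        2,
        env={"MPICH_NO_LOCAL": 1},
    )
    # Print the command instead of running it.
    print(" ".join(cmd))
```

On a machine with MPICH2 installed, the resulting list could be handed to subprocess.run.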
Regards,
Jayesh
----- Original Message -----
From: "Nicholas Sgro" < nsgro060 at gmail.com >
To: "Rayson Ho" < raysonlogin at gmail.com >
Cc: discuss at mpich.org
Sent: Sunday, February 10, 2013 3:46:27 PM
Subject: Re: [mpich-discuss] Amazon ec2 Windows machine
Both instances are part of the same security group, and I made sure all inbound traffic was allowed.
StarCluster looks very useful. Can you recommend something similar to StarCluster, but for Windows instances? The software I am using is only available for Windows.
thanks,
Nicholas
On Sun, Feb 10, 2013 at 1:57 PM, Rayson Ho < raysonlogin at gmail.com > wrote:
How did you configure the EC2 security groups? By default, EC2
instances have their inbound traffic blocked, and you will need to
configure security group rules to enable inbound traffic.
http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html
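One way to confirm that the security group rules actually let traffic through is a plain TCP connection test from one instance to the other. The sketch below is only a connectivity probe; the function name can_connect, the placeholder address, and the port are assumptions. Port 8676 is, to my knowledge, the default smpd port, but check your own installation. A successful connection to one port does not prove that every port the MPI processes choose at run time is open.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


if __name__ == "__main__":
    # Hypothetical values: replace with the other instance's private
    # address and the port your process manager listens on.
    print(can_connect("10.0.0.2", 8676))
```

Running it in both directions helps, since inbound rules apply to each instance separately.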
Also, is there any reason you are manually creating EC2 HPC clusters instead of
using a toolkit? We are fans of MIT's StarCluster -- with it we can
start up and shut down clusters very quickly (usually in a few minutes).
It is Linux based, with MPICH (and/or Open MPI), Open Grid Scheduler /
Grid Engine, and many tools needed for doing HPC in EC2:
http://star.mit.edu/cluster/
And we built a 10,000-node cluster in EC2 based on StarCluster late
last year, during SC12:
http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html
Rayson
==================================================
Open Grid Scheduler - The Official Open Source Grid Engine
http://gridscheduler.sourceforge.net/
On Fri, Feb 8, 2013 at 5:02 PM, Nicholas Sgro < nsgro060 at gmail.com > wrote:
> Hi,
> This is the command I'm using:
>
> mpiexec.exe -machinefile machines.txt -env MPICH2_CHANNEL sock -n 2 cpi.exe
>
> I have tried using both machine file and hosts in the command line, but I
> get the same results. The program runs on a single instance with any number
> of processors. I tried running mpiexec on one instance and using the other
> as a single host and that also works.
>
> -Nicholas
>
>
> On Fri, Feb 8, 2013 at 12:04 PM, Jayesh Krishna < jayesh at mcs.anl.gov > wrote:
>>
>> Hi,
>> How are you running your job (mpiexec command)? Did you try using a
>> machine file to specify the hostnames when running the job?
>> Does the program (cpi) execute correctly on a single ec2 instance?
>>
>> Regards,
>> Jayesh
>>
>> ----- Original Message -----
>> From: "Nicholas Sgro" < nsgro060 at gmail.com >
>> To: "Jayesh Krishna" < jayesh at mcs.anl.gov >
>> Sent: Thursday, February 7, 2013 9:57:55 PM
>> Subject: Re: [mpich-discuss] Amazon ec2 Windows machine
>>
>> I'm using version 1.4.1p1. I tried the sock channel. It doesn't seem to
>> work either. With sock, I get to the point where I enter the number of
>> intervals, but then it does nothing.
>>
>> Do you know any reason it wouldn't work with ec2 instances?
>>
>>
>>
>> On Thu, Feb 7, 2013 at 4:29 PM, Jayesh Krishna < jayesh at mcs.anl.gov >
>> wrote:
>>
>>
>> Hi,
>> Which version of MPICH2 are you using? Did you try the "sock" channel (See
>> if it works)?
>>
>> (PS: We haven't tested MPICH2 on Windows with ec2 instances.)
>> Regards,
>> Jayesh
>>
>>
>> ----- Original Message -----
>> From: "Nicholas Sgro" < nsgro060 at gmail.com >
>> To: discuss at mpich.org
>> Sent: Thursday, February 7, 2013 11:29:57 AM
>> Subject: [mpich-discuss] Amazon ec2 Windows machine
>>
>>
>> Hi all,
>>
>> I am trying to run the example cpi.exe across 2 amazon ec2 instances
>> running windows. I have different problems depending on the channel I
>> choose. If I try nemesis, I get the following error:
>>
>> Fatal error in MPI_Init: Other MPI error, error stack:
>> MPIR_Init_thread(392).................:
>> MPID_Init(139)........................: channel initialization failed
>> MPIDI_CH3_Init(38)....................:
>> MPID_nem_init(196)....................:
>> MPIDI_CH3I_Seg_commit(366)............:
>> MPIU_SHMW_Hnd_deserialize(324)........:
>> MPIU_SHMW_Seg_open(863)...............:
>> MPIU_SHMW_Seg_create_attach_templ(763): unable to allocate shared memory -
>> OpenFileMapping The system cannot find the file specified.
>>
>> If I try to use shm, cpi.exe uses 100% of the processors on both machines,
>> but makes no progress and I have to cancel the job.
>>
>> I am attaching logs from smpd from both machines from the runs with
>> nemesis and shm.
>>
>> I don't have any experience with mpich, so I have no idea what the problem
>> is. Any guidance would be appreciated.
>>
>> Thanks
>>
>>
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>>
>
>