[mpich-discuss] Amazon ec2 Windows machine

Jayesh Krishna jayesh at mcs.anl.gov
Mon Feb 11 11:12:56 CST 2013


Hi,
 The latest version of MPICH2 on Windows (1.4.1p1) does not support the shm and ssm channels. Are you sure you are using the latest version of MPICH2 on your machines? Please use the "-channel" option to select a channel ("nemesis" or "sock").
 Can you try running the job with the environment variable "MPICH_NO_LOCAL" set to 1 (mpiexec -n 2 -env MPICH_NO_LOCAL 1 c:\Progra~1\MPICH2\examples\cpi.exe)? This option should force all communication to go via TCP sockets.
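
 If you end up scripting these runs, the command line above can be assembled programmatically. The following is only a sketch: build_mpiexec_args is a hypothetical helper, not part of MPICH2, and it uses only the mpiexec options already mentioned in this thread (-n, -machinefile, -channel, -env). It builds the argument list and does not launch anything.

```python
def build_mpiexec_args(program, nprocs, channel=None, env=None, machinefile=None):
    """Return an argv list for mpiexec. Hypothetical helper; nothing is executed."""
    args = ["mpiexec", "-n", str(nprocs)]
    if machinefile:
        # Path to a file listing the hosts to run on.
        args += ["-machinefile", machinefile]
    if channel:
        # Channel name, e.g. "nemesis" or "sock".
        args += ["-channel", channel]
    for name, value in (env or {}).items():
        # Each environment variable is passed as: -env NAME VALUE
        args += ["-env", name, str(value)]
    args.append(program)
    return args


cmd = build_mpiexec_args(
    r"c:\Progra~1\MPICH2\examples\cpi.exe", 2, env={"MPICH_NO_LOCAL": 1}
)
print(" ".join(cmd))
# mpiexec -n 2 -env MPICH_NO_LOCAL 1 c:\Progra~1\MPICH2\examples\cpi.exe
```

 The resulting list could be handed to subprocess.run(cmd) on the machine where mpiexec is installed.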

Regards,
Jayesh

----- Original Message -----
From: "Nicholas Sgro" <nsgro060 at gmail.com>
To: "Rayson Ho" <raysonlogin at gmail.com>
Cc: discuss at mpich.org
Sent: Sunday, February 10, 2013 3:46:27 PM
Subject: Re: [mpich-discuss] Amazon ec2 Windows machine


Both instances are part of the same security group, and I made sure all inbound traffic was allowed. 

StarCluster looks very useful. Can you recommend something similar to StarCluster for Windows instances? The software I am using is only available for Windows. 

thanks, 
Nicholas 


On Sun, Feb 10, 2013 at 1:57 PM, Rayson Ho < raysonlogin at gmail.com > wrote: 


How did you configure the EC2 security groups? By default, EC2 
instances have their inbound traffic blocked, and you will need to 
configure security group rules to enable inbound traffic. 

http://docs.aws.amazon.com/AWSEC2/latest/UserGuide/using-network-security.html 
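
One way to check whether inbound traffic actually gets through is to try opening a TCP connection from one instance to the other. The sketch below is a generic reachability test, not an MPICH tool. The function name can_connect is made up for illustration, and the demo runs only against a listener on the local machine. A connection to another instance succeeds only if both the security group and any firewall on that instance allow it.

```python
import socket


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and name-resolution failures.
        return False


# Local demonstration: listen on a free loopback port and connect to it.
server = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
server.bind(("127.0.0.1", 0))  # port 0 lets the OS pick a free port
server.listen(1)
demo_port = server.getsockname()[1]
print("loopback listener reachable:", can_connect("127.0.0.1", demo_port))
server.close()
```

To test a real pair of instances, call can_connect with the other instance's private address and the port the process manager listens on (8676 is, as far as I know, the default smpd port; treat that number as an assumption and check your installation).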

Also, is there a reason you are creating EC2 HPC clusters manually instead of 
using a toolkit? We are fans of MIT's StarCluster -- with it we can 
start up and shut down clusters very quickly (usually in a few minutes). 
It is Linux-based and includes MPICH (and/or Open MPI), Open Grid Scheduler / 
Grid Engine, and many of the tools needed for doing HPC in EC2: 

http://star.mit.edu/cluster/ 

And we built a 10,000-node cluster in EC2 based on StarCluster late 
last year, during SC12: 

http://blogs.scalablelogic.com/2012/11/running-10000-node-grid-engine-cluster.html 

Rayson 

================================================== 
Open Grid Scheduler - The Official Open Source Grid Engine 
http://gridscheduler.sourceforge.net/ 




On Fri, Feb 8, 2013 at 5:02 PM, Nicholas Sgro < nsgro060 at gmail.com > wrote: 
> Hi, 
> This is the command I'm using: 
> 
> mpiexec.exe -machinefile machines.txt -env MPICH2_CHANNEL sock -n 2 cpi.exe 
> 
> I have tried using both machine file and hosts in the command line, but I 
> get the same results. The program runs on a single instance with any number 
> of processors. I tried running mpiexec on one instance and using the other 
> as a single host and that also works. 
> 
> -Nicholas 
> 
> 
> On Fri, Feb 8, 2013 at 12:04 PM, Jayesh Krishna < jayesh at mcs.anl.gov > wrote: 
>> 
>> Hi, 
>> How are you running your job (mpiexec command)? Did you try using a 
>> machine file to specify the hostnames when running the job? 
>> Does the program (cpi) execute correctly on a single ec2 instance? 
>> 
>> Regards, 
>> Jayesh 
>> 
>> ----- Original Message ----- 
>> From: "Nicholas Sgro" < nsgro060 at gmail.com > 
>> To: "Jayesh Krishna" < jayesh at mcs.anl.gov > 
>> Sent: Thursday, February 7, 2013 9:57:55 PM 
>> Subject: Re: [mpich-discuss] Amazon ec2 Windows machine 
>> 
>> I'm using version 1.4.1p1. I tried the sock channel. It doesn't seem to 
>> work either. With sock, I get to the point where I enter the number of 
>> intervals, but then it does nothing. 
>> 
>> Do you know any reason it wouldn't work with ec2 instances? 
>> 
>> 
>> 
>> On Thu, Feb 7, 2013 at 4:29 PM, Jayesh Krishna < jayesh at mcs.anl.gov > 
>> wrote: 
>> 
>> 
>> Hi, 
>> Which version of MPICH2 are you using? Did you try the "sock" channel to 
>> see if it works? 
>> 
>> (PS: We haven't tested MPICH2 on Windows with ec2 instances.) 
>> Regards, 
>> Jayesh 
>> 
>> 
>> ----- Original Message ----- 
>> From: "Nicholas Sgro" < nsgro060 at gmail.com > 
>> To: discuss at mpich.org 
>> Sent: Thursday, February 7, 2013 11:29:57 AM 
>> Subject: [mpich-discuss] Amazon ec2 Windows machine 
>> 
>> 
>> Hi all, 
>> 
>> I am trying to run the example cpi.exe across 2 Amazon EC2 instances 
>> running Windows. I have different problems depending on the channel I 
>> choose. If I try nemesis, I get the following error: 
>> 
>> Fatal error in MPI_Init: Other MPI error, error stack: 
>> MPIR_Init_thread(392).................: 
>> MPID_Init(139)........................: channel initialization failed 
>> MPIDI_CH3_Init(38)....................: 
>> MPID_nem_init(196)....................: 
>> MPIDI_CH3I_Seg_commit(366)............: 
>> MPIU_SHMW_Hnd_deserialize(324)........: 
>> MPIU_SHMW_Seg_open(863)...............: 
>> MPIU_SHMW_Seg_create_attach_templ(763): unable to allocate shared memory - 
>> OpenFileMapping The system cannot find the file specified. 
>> 
>> If I try to use shm, cpi.exe uses 100% of the processors on both machines, 
>> but makes no progress and I have to cancel the job. 
>> 
>> I am attaching logs from smpd from both machines from the runs with 
>> nemesis and shm. 
>> 
>> I don't have any experience with mpich, so I have no idea what the problem 
>> is. Any guidance would be appreciated. 
>> 
>> Thanks 
>> 
>> 
>> _______________________________________________ 
>> discuss mailing list discuss at mpich.org 
>> To manage subscription options or unsubscribe: 
>> https://lists.mpich.org/mailman/listinfo/discuss 
>> 
> 
> 

