[mpich-discuss] Amazon ec2 Windows machine

Jayesh Krishna jayesh at mcs.anl.gov
Fri Feb 8 11:04:35 CST 2013


Hi,
 How are you launching the job (what is the exact mpiexec command line)? Did you try using a machine file to specify the hostnames when running the job?
 Does the program (cpi) run correctly on a single EC2 instance?

Regards,
Jayesh
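
For readers following the thread, the machine-file suggestion can be sketched roughly as below. This is only an illustration, not the configuration used by either poster: the hostnames ec2-host-a and ec2-host-b and the file name machines.txt are hypothetical placeholders, and the helper functions are made up for this note. It assumes the simplest machine-file layout (one hostname per line) and the -machinefile, -n and -channel options of the MPICH2 Windows mpiexec. The script only builds the command line; it does not launch anything.

```python
from pathlib import Path
import tempfile


def write_machine_file(path, hosts):
    """Write one hostname per line (assumed simplest machine-file layout)."""
    path = Path(path)
    path.write_text("\n".join(hosts) + "\n")
    return path


def build_mpiexec_command(machine_file, nprocs, program, channel=None):
    """Return the mpiexec argument list; nothing is executed here."""
    cmd = ["mpiexec"]
    if channel is not None:
        # e.g. "sock", the channel suggested earlier in the thread
        cmd += ["-channel", channel]
    cmd += ["-machinefile", str(machine_file), "-n", str(nprocs), program]
    return cmd


if __name__ == "__main__":
    # Hypothetical hostnames: replace with the names the instances resolve.
    hosts = ["ec2-host-a", "ec2-host-b"]
    mf = write_machine_file(Path(tempfile.gettempdir()) / "machines.txt", hosts)
    print(" ".join(build_mpiexec_command(mf, 2, "cpi.exe", channel="sock")))
```

Running the same command with -n 1 on one instance first would answer the single-instance question before the second machine is involved.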

----- Original Message -----
From: "Nicholas Sgro" <nsgro060 at gmail.com>
To: "Jayesh Krishna" <jayesh at mcs.anl.gov>
Sent: Thursday, February 7, 2013 9:57:55 PM
Subject: Re: [mpich-discuss] Amazon ec2 Windows machine

I'm using MPICH2 version 1.4.1p1. I tried the sock channel, and it doesn't seem to work either. With sock, cpi gets as far as prompting for the number of intervals, but after I enter it nothing happens.

Do you know of any reason it wouldn't work with EC2 instances?
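
As background on the symptom above: the cpi example estimates pi by integrating 4/(1+x^2) over [0,1] with the midpoint rule. Rank 0 reads the number of intervals, the value is broadcast to all ranks (MPI_Bcast), each rank sums every nprocs-th interval, and the partial sums are combined on rank 0 (MPI_Reduce). The single-process Python sketch below imitates that division of work without MPI. It is not the MPICH source, and the function names are made up for illustration. If the real program stalls right after the interval count is entered, that would be consistent with the first cross-machine communication not completing, though that is an assumption the thread does not confirm.

```python
import math


def f(x):
    """Integrand whose integral over [0, 1] equals pi."""
    return 4.0 / (1.0 + x * x)


def partial_sum(rank, nprocs, n):
    """Midpoint-rule contribution of one rank: intervals rank+1, rank+1+nprocs, ..."""
    h = 1.0 / n
    total = 0.0
    for i in range(rank + 1, n + 1, nprocs):
        x = h * (i - 0.5)  # midpoint of interval i
        total += f(x)
    return h * total


def simulate_cpi(n, nprocs):
    """Imitate the broadcast / compute / reduce pattern in one process."""
    # Stand-in for the broadcast: every "rank" is simply handed n.
    partials = [partial_sum(rank, nprocs, n) for rank in range(nprocs)]
    # Stand-in for the reduction: add the partial results together.
    return sum(partials)


if __name__ == "__main__":
    estimate = simulate_cpi(10000, 2)
    print("estimate:", estimate, "error:", abs(estimate - math.pi))
```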



On Thu, Feb 7, 2013 at 4:29 PM, Jayesh Krishna <jayesh at mcs.anl.gov> wrote:


Hi, 
Which version of MPICH2 are you using? Have you tried the "sock" channel to see whether it works?

(PS: We haven't tested MPICH2 on Windows with EC2 instances.)
Regards, 
Jayesh 


----- Original Message ----- 
From: "Nicholas Sgro" <nsgro060 at gmail.com>
To: discuss at mpich.org 
Sent: Thursday, February 7, 2013 11:29:57 AM 
Subject: [mpich-discuss] Amazon ec2 Windows machine 


Hi all, 

I am trying to run the example cpi.exe across two Amazon EC2 instances running Windows. The failure depends on the channel I choose. With nemesis, I get the following error:

Fatal error in MPI_Init: Other MPI error, error stack: 
MPIR_Init_thread(392).................: 
MPID_Init(139)........................: channel initialization failed 
MPIDI_CH3_Init(38)....................: 
MPID_nem_init(196)....................: 
MPIDI_CH3I_Seg_commit(366)............: 
MPIU_SHMW_Hnd_deserialize(324)........: 
MPIU_SHMW_Seg_open(863)...............: 
MPIU_SHMW_Seg_create_attach_templ(763): unable to allocate shared memory - OpenFileMapping The system cannot find the file specified. 

With shm, cpi.exe uses 100% of the CPU on both machines but makes no progress, and I have to cancel the job.

I am attaching the smpd logs from both machines for the nemesis and shm runs.

I don't have any experience with MPICH, so I have no idea what the problem is. Any guidance would be appreciated.

Thanks 


_______________________________________________ 
discuss mailing list discuss at mpich.org 
To manage subscription options or unsubscribe: 
https://lists.mpich.org/mailman/listinfo/discuss 



