[mpich-discuss] Cluster with uneven CPU speeds

Balaji, Pavan balaji at anl.gov
Fri Jun 6 11:06:08 CDT 2014


You need to declare your global variables as thread private and make other similar changes.  It’s not transparent.
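
For example, a minimal sketch of the kind of change involved, assuming a plain C code with a file-scope global (the names here are invented, and the TLS storage class shown is only one of the privatization approaches AMPI supports):

/* Before: a file-scope global, shared by all AMPI virtual ranks in a process */
int iteration_count = 0;

/* After: thread-private, so each virtual rank gets its own copy */
__thread int iteration_count = 0;

void do_step(void) {
    iteration_count++;   /* now increments a per-rank copy, not shared state */
}

Similar treatment is needed for every mutable global or static variable in the code.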

  — Pavan

On Jun 6, 2014, at 10:23 AM, Junchao Zhang <jczhang at mcs.anl.gov> wrote:

> You may consider AMPI (http://charm.cs.uiuc.edu/ppl_research/ampi/), which supports automatic dynamic load balancing, a very cool idea.
> 
> --Junchao Zhang
> 
> 
> On Thu, Jun 5, 2014 at 8:51 PM, Ron Palmer <ron.palmer at pgcgroup.com.au> wrote:
> Pavan,
> thanks for your reply and comments. Unfortunately, the actual application I am running is outside my sphere of influence (though, as I am running a beta code, I will forward your thoughts in the hope that the developers will consider them).
> 
> Is there another way to address this that does not involve changing the actual application doing the work?
> 
> Regards,
> Ron
> 
> 
> On 6/06/2014 11:48, Balaji, Pavan wrote:
>> Ron,
>> 
>> In general, oversubscribing the cores of a node is a bad idea.  MPI is optimized for the common case, used by most applications, where each MPI process has at least one core to itself.  Oversubscribing adds overhead on top of that and is not recommended.
>> 
>> To deal with cores that operate at different speeds, the only good way is to restructure your algorithm to be more asynchronous in nature.  For example, if a master-worker model is possible, that might work great: workers that happen to run on faster cores simply end up doing more work than the others.  However, not all algorithms can be expressed in this model.  Other asynchronous models are possible too.
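>> 
>> As a very rough sketch (not your application's structure; the tags, counts, and "work unit" below are invented purely for illustration), a dynamic master-worker loop in MPI looks something like this, with the master handing the next unit to whichever worker reports back first, so faster cores naturally pull more units:
>> 
>> #include <mpi.h>
>> 
>> #define TAG_WORK   1
>> #define TAG_RESULT 2
>> #define TAG_STOP   3
>> 
>> int main(int argc, char **argv) {
>>     int rank, size, ntasks = 1000;   /* invented total number of work units */
>>     MPI_Init(&argc, &argv);
>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>> 
>>     if (rank == 0) {                 /* master: hand out work on demand */
>>         int next = 0, result;
>>         MPI_Status st;
>>         /* seed every worker with one unit (assumes ntasks >= number of workers) */
>>         for (int i = 1; i < size; i++) {
>>             MPI_Send(&next, 1, MPI_INT, i, TAG_WORK, MPI_COMM_WORLD);
>>             next++;
>>         }
>>         while (next < ntasks) {      /* refill whichever worker finishes first */
>>             MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT,
>>                      MPI_COMM_WORLD, &st);
>>             MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_WORK, MPI_COMM_WORLD);
>>             next++;
>>         }
>>         for (int i = 1; i < size; i++) {   /* drain results, then stop workers */
>>             MPI_Recv(&result, 1, MPI_INT, MPI_ANY_SOURCE, TAG_RESULT,
>>                      MPI_COMM_WORLD, &st);
>>             MPI_Send(&next, 1, MPI_INT, st.MPI_SOURCE, TAG_STOP, MPI_COMM_WORLD);
>>         }
>>     } else {                         /* worker: receive, compute, report, repeat */
>>         int task, result;
>>         MPI_Status st;
>>         while (1) {
>>             MPI_Recv(&task, 1, MPI_INT, 0, MPI_ANY_TAG, MPI_COMM_WORLD, &st);
>>             if (st.MPI_TAG == TAG_STOP) break;
>>             result = task * 2;       /* placeholder for the real computation */
>>             MPI_Send(&result, 1, MPI_INT, 0, TAG_RESULT, MPI_COMM_WORLD);
>>         }
>>     }
>>     MPI_Finalize();
>>     return 0;
>> }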
>> 
>> In short, I think it’s time to go back to the whiteboard and see if the algorithm used by the application is appropriate or not.
>> 
>>   — Pavan
>> 
>> On Jun 5, 2014, at 7:42 PM, Ron Palmer <ron.palmer at pgcgroup.com.au> wrote:
>> 
>> 
>>> I have a small cluster of computers whose CPUs have uneven clock speeds, and I currently run with "-np" equal to the total number of cores. However, it appears that the fastest computer has to wait for the slower ones to finish at the end (at least I believe so). The most recent run took 65 hours, so I am interested in finding ways to optimise the process.
>>> 
>>> Is it possible to, say, use a larger "-np" and increase the number of processes (slots) given to the faster machines in the machinefile, so that the faster computers do more work and, ideally, they all finish at about the same time? Will each machine finish off its first batch of processes and then start on the next batch? Or will the faster computers just get more concurrent processes, possibly slowing down the processing?
>>> 
>>> E.g., if the single CPU of PC_A has twice the clock rate of the single CPU of PC_B, and both are quad-core, then use -np 12 and have the following in the machinefile:
>>> PC_A:8
>>> PC_B:4
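>>> 
>>> (For reference, I would then launch with something like "mpiexec -f machinefile -n 12 ./myapp", assuming the Hydra launcher; "myapp" is just a placeholder for the actual binary.)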
>>> 
>>> Perhaps this is something better addressed with job scheduling software like GridEngine? Reuti?
>>> 
>>> Thanks,
>>> Ron
>>> 
>>> 
>>> -- 
>>> Ron Palmer MSc MBA.
>>> Principal Geophysicist
>>> ron.palmer at pgcgroup.com.au
>>> 0413 579 099
>>> 07 3103 4963
>>> 
> 
> -- 
> Ron Palmer MSc MBA.
> Principal Geophysicist
> ron.palmer at pgcgroup.com.au
> 0413 579 099
> 07 3103 4963
> 
> 
> 



