[mpich-discuss] Cluster with uneven CPU speeds

Gus Correa gus at ldeo.columbia.edu
Fri Jun 6 09:03:41 CDT 2014


On 06/06/2014 07:27 AM, Reuti wrote:
> Am 06.06.2014 um 02:42 schrieb Ron Palmer:
>
>> I have a small cluster of computers with uneven clock speed CPUs and currently I am running with "-np" == total number of cores. However, it appears as if the fastest computer has to wait for the slower ones to finish at the end (at least I believe so). The most recent process took 65 hours so I am interested in finding ways to optimise the process.
>>
>> Is it possible to, say, use a larger "-np" and then increase the thread number for the faster CPUs in the machine file to make the faster computers do more work so, ideally, they all finish about the same time? Will it finish off the first batch then start on the next batch? Or, will the faster computers just get more concurrent jobs, possibly slowing down the processing?
>>
>> e.g., if the single CPU of PC_A has twice the clock rating of the single CPU of PC_B, and both have quad cores, then use "-np 12" and have the following in the machinefile:
>> PC_A:8
>> PC_B:4
>>
>> Perhaps this is something better addressed with job scheduling software like GridEngine? Reuti?
>
> You could define more slots than there are cores in the queue definition for (a) particular node(s), but this would only implement the workaround you already outlined, done the SGE way.
>
> MPI was designed with equal computing power per rank in mind, I think. Clusters are often built this way nowadays; even after upgrading a cluster by adding newer/faster nodes, you could route the jobs to use only machines of the same kind in SGE. In PVM there was even an entry for the relative speed of a machine in the hostfile (a collection of different machines was common in those days), but there is nothing similar in MPI AFAIK.
>
> -- Reuti
>
>
>> Thanks,
>> Ron
>>
>>
>> --
>> Ron Palmer MSc MBA.
>> Principal Geophysicist
>> ron.palmer at pgcgroup.com.au
>> 0413 579 099
>> 07 3103 4963

Hi Ron

It may be more efficient to just partition the cluster into slow and
fast nodes, and direct the jobs to either group (based on queue name,
your group's priorities, etc).
Each job then runs on a group of similar nodes.
I guess the overall throughput of these smaller/longer-lived jobs
won't be worse than if you try to load balance big jobs across
processors of different capability
(which is not an easy task anyway, as Pavan and Reuti pointed out).
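
For comparison, the weighted machinefile Ron proposed could be generated
with a short script. This is only a sketch: it assumes clock speed is a
fair proxy for per-core throughput (often it is not), and the function
names and clock figures are made up for illustration. Note that all ranks
start at once, so a node listed with more slots than cores will timeshare
its cores rather than work through batches.

```python
# Sketch: build a "host:slots" machinefile weighted by clock speed.
# Host names, core counts and GHz values are illustrative assumptions.

def weighted_slots(hosts):
    """hosts maps name -> (cores, ghz); returns name -> slot count.

    The slowest node gets one slot per core. Faster nodes get
    proportionally more slots, rounded to a whole number and never
    fewer than their core count.
    """
    slowest = min(ghz for _, ghz in hosts.values())
    return {
        name: max(cores, round(cores * ghz / slowest))
        for name, (cores, ghz) in hosts.items()
    }


def machinefile(hosts):
    """Render the slot counts as machinefile text, one host per line."""
    slots = weighted_slots(hosts)
    return "\n".join(f"{name}:{n}" for name, n in slots.items())


if __name__ == "__main__":
    # Ron's example: PC_A has twice the clock rating of PC_B, 4 cores each.
    example = {"PC_A": (4, 3.6), "PC_B": (4, 1.8)}
    print(machinefile(example))
```

With the example values this reproduces the PC_A:8 / PC_B:4 layout from
Ron's message; the total slot count is what would go to "-np".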

I am not familiar with SGE, but I guess this partitioning
can be done there.
On Torque, this takes declaring node properties
(say, 'slow', 'fast', or '1.8GHz', '2.4GHz')
in the nodes file and, if desired,
creating separate queues that default to nodes with the appropriate
properties, then directing the jobs to the proper queues (or requesting
the nodes with the specific properties).
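
As a rough illustration of the Torque side, the nodes file entries could
be generated from a host list. This is a sketch under assumptions: the
host names, core counts, clock speeds and the 2.0 GHz cut-off are
hypothetical, and only the plain 'slow'/'fast' property labels are used.

```python
# Sketch: emit Torque nodes-file lines of the form "host np=N property".
# All host data and the fast/slow threshold are illustrative assumptions.

def node_line(name, cores, ghz, fast_threshold=2.0):
    """Return one nodes-file line tagging the host as 'fast' or 'slow'."""
    speed = "fast" if ghz >= fast_threshold else "slow"
    return f"{name} np={cores} {speed}"


def nodes_file(hosts, fast_threshold=2.0):
    """hosts maps name -> (cores, ghz); returns the nodes-file text."""
    return "\n".join(
        node_line(name, cores, ghz, fast_threshold)
        for name, (cores, ghz) in sorted(hosts.items())
    )


if __name__ == "__main__":
    cluster = {"node01": (4, 2.4), "node02": (4, 1.8)}
    print(nodes_file(cluster))
```

A job could then ask for matching nodes with a resource request along
the lines of "-l nodes=2:ppn=4:fast" (node and core counts here are
just an example).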

My two cents,
Gus Correa



