[mpich-discuss] Cluster with uneven CPU speeds

Reuti reuti at staff.uni-marburg.de
Fri Jun 6 09:11:29 CDT 2014


On 06.06.2014 at 16:03, Gus Correa wrote:

> On 06/06/2014 07:27 AM, Reuti wrote:
>> On 06.06.2014 at 02:42, Ron Palmer wrote:
>> 
>>> I have a small cluster of computers with uneven clock speed CPUs and currently I am running with "-np" == total number of cores. However, it appears as if the fastest computer has to wait for the slower ones to finish at the end (at least I believe so). The most recent process took 65 hours so I am interested in finding ways to optimise the process.
>>> 
>>> Is it possible to, say, use a larger "-np" and then increase the thread number for the faster CPUs in the machine file to make the faster computers do more work so, ideally, they all finish about the same time? Will it finish off the first batch then start on the next batch? Or, will the faster computers just get more concurrent jobs, possibly slowing down the processing?
>>> 
>>> E.g., if the single CPU of PC_A has twice the clock rate of the single CPU of PC_B, and both have quad cores, then use "-np 12" and have the following in the machinefile:
>>> PC_A:8
>>> PC_B:4
>>> 
>>> Perhaps this is something better addressed with job scheduling software like GridEngine? Reuti?
>> 
>> You could define more slots than there are cores in the queue definition for particular nodes, but this would only implement the workaround you already outlined, just in SGE's way.
>> 
>> I think MPI was designed with equal computing power per rank in mind. Clusters are often built this way nowadays; even after upgrading a cluster by adding newer/faster nodes, you could route the jobs in SGE to use only machines of the same kind. In PVM there was even an entry for the relative speed of a machine in the hostfile (as a collection of different machines was common in those days), but there is nothing similar in MPI AFAIK.
>> 
>> -- Reuti
>> 
>> 
>>> Thanks,
>>> Ron
>>> 
>>> 
>>> --
>>> Ron Palmer MSc MBA.
>>> Principal Geophysicist
>>> ron.palmer at pgcgroup.com.au
>>> 0413 579 099
>>> 07 3103 4963
> 
> Hi Ron
> 
> It may be more efficient to just partition the cluster into slow and fast nodes, and direct the jobs to either group (based on queue name,
> your group's priorities, etc).
> The jobs should then run on groups of similar nodes.
> I guess the overall job throughput of these smaller/longer-lived jobs
> won't be worse than if you try
> to load balance big jobs across processors of different capability (which is not an easy task anyway, as Pavan and Reuti pointed out).
> 
> I am not familiar with SGE, but I guess this partitioning
> can be done there.
> On Torque what this takes is to declare node properties
> (say, 'slow', 'fast',  or '1.8GHz', '2.4GHz')
> in the nodes file and, if desired,
> create separate queues that default to nodes with the appropriate

In SGE it's better to live with one queue for such a setup and create different parallel environments (PEs), like MPI1 and MPI2, which are attached to different hosts (or hostgroups) of the one and only queue. Besides requesting MPI1 or MPI2 explicitly as the PE at submission time, one can also request MPI*, and SGE will select any one of the two (or more) partitions of the cluster, but take nodes from this chosen partition only.
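
To illustrate the intended effect of a wildcard PE request, here is a minimal Python sketch. It is only a toy model and not SGE's actual scheduling algorithm: the host names, the slot counts, and the function pick_partition are all made up for the example, and the "first matching PE with enough free slots" rule is an assumption. The point it shows is that a request such as MPI* ends up in exactly one partition and never mixes hosts from two of them.

```python
from fnmatch import fnmatch

# Hypothetical layout: PE name -> {host: free slots}. Names are illustrative only.
PARTITIONS = {
    "MPI1": {"fast01": 4, "fast02": 4},
    "MPI2": {"slow01": 4, "slow02": 4, "slow03": 4},
}


def pick_partition(pe_request, slots, partitions=PARTITIONS):
    """Return (pe_name, {host: slots_used}) for the first matching PE
    that can hold the whole job, or None if no single partition fits."""
    for pe_name in sorted(partitions):
        # A wildcard request such as "MPI*" matches several PEs.
        if not fnmatch(pe_name, pe_request):
            continue
        hosts = partitions[pe_name]
        # The job must fit entirely inside one partition.
        if sum(hosts.values()) < slots:
            continue
        allocation = {}
        remaining = slots
        for host in sorted(hosts):
            take = min(hosts[host], remaining)
            if take:
                allocation[host] = take
                remaining -= take
            if remaining == 0:
                break
        return pe_name, allocation
    return None


if __name__ == "__main__":
    print(pick_partition("MPI*", 8))
    print(pick_partition("MPI*", 10))
```

In this model a 10-slot request does not fit into the hypothetical MPI1 partition (8 slots), so it lands on MPI2 as a whole instead of being spread over fast and slow hosts.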

-- Reuti


> properties, then direct the jobs to the proper queues (or request the
> nodes with the specific properties).
> 
> My two cents,
> Gus Correa
> 