[mpich-devel] P states and MPI

Rountree, Barry L. rountree4 at llnl.gov
Fri Jan 17 20:49:06 CST 2014


The released version of Adagio is here:

git at github.com:scalability-llnl/Adagio.git

The algorithm described is the most trivial of the three we looked at
in Rountree ICS 2009.  We called it "Fermata":  predict how long the
MPI rank would wait in a blocking MPI call, and if the expected
delay was long enough, drop down to the lowest CPU clock frequency
and pop back up just before the call was expected to complete.  We
saw significant (>10%) total system energy savings with <1% execution
time overhead.
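
To give a flavor of how a PMPI interposer does this, here is a minimal
sketch (not the released Adagio code).  The cpufreq sysfs path, the
frequency values, and the 10 ms threshold are placeholders, and the
real Fermata arms a timer so it can pop back up just before the
predicted completion rather than after the call returns:

#include <mpi.h>
#include <stdio.h>

#define SHIFT_THRESHOLD_SEC 0.010   /* only downshift if we expect to wait at least this long */

static double predicted_wait = 0.0; /* smoothed history of observed wait time */

/* Write a frequency (kHz) to core 0's cpufreq interface; assumes the
 * userspace governor and write permission -- placeholders, not what
 * Adagio actually does. */
static void set_frequency_khz(const char *khz)
{
    FILE *f = fopen("/sys/devices/system/cpu/cpu0/cpufreq/scaling_setspeed", "w");
    if (f) { fputs(khz, f); fclose(f); }
}

int MPI_Barrier(MPI_Comm comm)
{
    int shifted = 0;

    /* If history predicts a long enough wait, drop to the lowest frequency. */
    if (predicted_wait > SHIFT_THRESHOLD_SEC) {
        set_frequency_khz("1200000");       /* hypothetical lowest p-state */
        shifted = 1;
    }

    double t0 = MPI_Wtime();
    int rc = PMPI_Barrier(comm);
    double waited = MPI_Wtime() - t0;

    /* Pop back up for the compute phase.  (The real Fermata raises the
     * frequency just before the predicted completion, not after return.) */
    if (shifted)
        set_frequency_khz("2600000");       /* hypothetical nominal p-state */

    /* Update the prediction with simple exponential smoothing; Adagio's
     * per-call-site predictor is richer than this. */
    predicted_wait = 0.5 * predicted_wait + 0.5 * waited;

    return rc;
}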

The best algorithm, though, slowed down computation to consume the
slack time generated by blocking calls, and dropped into the lowest
frequency only if any slack remained.  That got us
20% total system energy savings on ParaDis with <1% execution time
overhead.
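
The frequency choice itself is simple arithmetic.  The sketch below is
illustrative only, under a pure-CPU-bound scaling assumption; Adagio's
actual predictor accounts for memory-boundedness and per-call-site
history:

/* Pick the slowest available frequency at which the stretched compute
 * phase still fits into compute time plus predicted slack, assuming
 * run time scales as 1/f. */
double choose_frequency_ghz(double t_compute_at_fmax, double predicted_slack,
                            const double *freqs_ghz, int nfreqs, double fmax_ghz)
{
    double budget = t_compute_at_fmax + predicted_slack;
    double best = fmax_ghz;
    for (int i = 0; i < nfreqs; i++) {
        double stretched = t_compute_at_fmax * (fmax_ghz / freqs_ghz[i]);
        if (stretched <= budget && freqs_ghz[i] < best)
            best = freqs_ghz[i];
    }
    return best;
}

Any slack still left over after stretching the computation is absorbed
at the lowest frequency inside the blocking call, as in Fermata.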

This was very much a research project and I expect bitrot has set
in by now.  Rewriting from scratch wouldn't be difficult (now that
I have 5 years more experience writing PMPI tools), but there are
a few complications that would need to be addressed before this
could be made into a production tool (OpenMP, nondeterminism caused
by turbo mode, processor performance inhomogeneity, etc.).

I think the more interesting question is how to optimize performance
under a power bound; solving that problem has the side effect of
achieving near-optimal energy use (at least in the common case).
We're working on redoing Adagio to solve this problem, using Intel
Running Average Power Limit (RAPL) technology.
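
For reference, the measurement side of RAPL is just a pair of MSRs.
The sketch below reads package energy through the Linux msr driver;
the register addresses are Intel's documented ones, but multi-socket
handling, counter wraparound, and error checking are omitted, and the
actual power bound would be set by writing MSR_PKG_POWER_LIMIT (0x610),
which is not shown:

#include <fcntl.h>
#include <stdint.h>
#include <stdio.h>
#include <unistd.h>

#define MSR_RAPL_POWER_UNIT   0x606
#define MSR_PKG_ENERGY_STATUS 0x611

static uint64_t read_msr(int fd, off_t reg)
{
    uint64_t val = 0;
    pread(fd, &val, sizeof(val), reg);   /* /dev/cpu/N/msr is addressed by register */
    return val;
}

int main(void)
{
    int fd = open("/dev/cpu/0/msr", O_RDONLY);   /* needs the msr module and privileges */
    if (fd < 0) { perror("open msr"); return 1; }

    /* Energy unit is 1 / 2^(bits 12:8 of MSR_RAPL_POWER_UNIT) joules. */
    double unit = 1.0 / (double)(1u << ((read_msr(fd, MSR_RAPL_POWER_UNIT) >> 8) & 0x1f));

    uint64_t before = read_msr(fd, MSR_PKG_ENERGY_STATUS) & 0xffffffff;
    sleep(1);
    uint64_t after  = read_msr(fd, MSR_PKG_ENERGY_STATUS) & 0xffffffff;

    /* The counter is 32 bits and wraps; this sketch ignores wraparound. */
    printf("package power over 1s: %.2f W\n", (after - before) * unit);

    close(fd);
    return 0;
}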

Best,

Barry

On 1/17/14 8:25 AM, "Bronis R. de Supinski" <bronis at llnl.gov> wrote:

>
>Jeff:
>
>Adagio has similar characteristics. Barry has recently
>released it so he should comment on its availability.
>
>Bronis
>
>
>On Fri, 17 Jan 2014, Jeff Hammond wrote:
>
>> Thanks Bill.  I agree with your assessment.  If this is the type of
>> thing that pays off in practice, then it seems to argue for the
>> necessary policy changes.
>>
>> Bronis - What impresses me about this is the ease with which it can
>> be deployed.  For the codes for which it reduces power consumption, it's
>> free joules.  If it doesn't, there is nothing lost.  If your code is
>> similarly portable and easy to deploy, then I'll have to evaluate
>> which one is more effective for the relevant workload.
>>
>> Best,
>>
>> Jeff
>>
>>
>> On Fri, Jan 17, 2014 at 8:46 AM, Bronis R. de Supinski
>><bronis at llnl.gov> wrote:
>>>
>>> Um, I would say the Dell approach has no novelty and is
>>> rather rudimentary. We have done a lot more in Adagio:
>>>
>>> http://dl.acm.org/citation.cfm?id=1542340
>>>
>>> @inproceedings{Rountree:2009:AMD:1542275.1542340,
>>>  author = {Rountree, Barry and Lowenthal, David K. and de Supinski, Bronis R.
>>>            and Schulz, Martin and Freeh, Vincent W. and Bletsch, Tyler},
>>>  title = {Adagio: Making DVS Practical for Complex HPC Applications},
>>>  booktitle = {Proceedings of the 23rd International Conference on
>>> Supercomputing},
>>>  series = {ICS '09},
>>>  year = {2009},
>>>  isbn = {978-1-60558-498-0},
>>>  location = {Yorktown Heights, NY, USA},
>>>  pages = {460--469},
>>>  numpages = {10},
>>>  url = {http://doi.acm.org/10.1145/1542275.1542340},
>>>  doi = {10.1145/1542275.1542340},
>>>  acmid = {1542340},
>>>  publisher = {ACM},
>>>  address = {New York, NY, USA},
>>>  keywords = {dvfs, dvs, energy, mpi, runtime},
>>>
>>> }
>>>
>>>
>>>
>>> On Fri, 17 Jan 2014, William Gropp wrote:
>>>
>>>> There has been some work on this for parallel programs; Wu Feng at
>>>> Virginia Tech has done some, for example.  I don't recall seeing any
>>>> work that used the profiling interface.  Some of the issues raised,
>>>> especially the required privs, mean that this is only available for
>>>> experiments, not production, until that changes.  Also, it's sometimes
>>>> possible to make predictions about wait times based on previous
>>>> iterations; that could be used to refine such control.
>>>>
>>>> Bill
>>>>
>>>> William Gropp
>>>> Director, Parallel Computing Institute
>>>> Deputy Director for Research
>>>> Institute for Advanced Computing Applications and Technologies
>>>> Thomas M. Siebel Chair in Computer Science
>>>> University of Illinois Urbana-Champaign
>>>>
>>>>
>>>>
>>>>
>>>> On Jan 16, 2014, at 9:36 PM, Jeff Hammond wrote:
>>>>
>>>>> I was impressed by
>>>>>
>>>>> http://www.hpcadvisorycouncil.com/events/2013/Spain-Workshop/pdf/5_Dell.pdf /
>>>>> http://www.bsc.es/sites/default/files/public/mare_nostrum/2013hpcac-05.pdf.
>>>>> I wonder if anyone else has seen this, done anything similar, is
>>>>> interested in doing something similar, etc.  I can imagine that a
>>>>> more integrated implementation has some performance advantages,
>>>>> albeit at much greater maintenance cost.
>>>>>
>>>>> Thanks,
>>>>>
>>>>> Jeff
>>>>>
>>>>> --
>>>>> Jeff Hammond
>>>>> jeff.science at gmail.com
>>>>
>>>>
>>>>
>>>
>>
>>
>>
>> -- 
>> Jeff Hammond
>> Argonne Leadership Computing Facility
>> University of Chicago Computation Institute
>> jhammond at anl.gov / jhammond at uchicago.edu / (630) 252-5381
>> http://www.linkedin.com/in/jeffhammond
>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>


