[mpich-devel] MPI_Recv, blocking call concept

Wed Jun 13 23:48:33 CDT 2018

Ali MoradiAlamdarloo <timndus at gmail.com> writes:

> Thanks for your comment.
> On Fri, Jun 8, 2018 at 9:08 PM, Jed Brown <jed at jedbrown.org> wrote:
>
>> Your assumption is that exascale machines will be homogeneous
>
> Of course Exascale machines will be heterogeneous in a global view. Probably
> two applications running on the system will have heterogeneous computing
> resources compared to each other, but I think each application must have its
> own homogeneous computing resources; otherwise this heterogeneity itself may
> cause load imbalance.

I meant coprocessors or distinct latency- and throughput-optimized
cores.  A CPU spinning is a trivial energy cost when connected to
several idle GPUs.

> Yes, we can do some static hardware reconfiguration and find the best
> configuration (probably heterogeneous) for a specific application, but I
> don't think this is going to be feasible in Exascale systems.
>
>> filled with dirt-cheap hardware that requires lots of power, and operated by
>> low-wage workers
>
> This is not my assumption. Yes, the hardware guys are building low-power,
> highly efficient computing cores, but currently what is going to run on
> top of them is software that wastes power during slack.
> 1- there is a time frame in which software wastes power.
> 2- power is the most constrained resource in Exascale.

It's money.  It costs $30M to buy the hardware that draws 1 MW, so you'd
need to run for 30 years for the energy bill to equal the acquisition
cost (a MW-year costs on the order of $1M).  Energy is a small fraction
of the operating cost for current HPC facilities -- hardware and staffing
are much more expensive.  Note that there exist 300 MW commercial data
centers.

> 3- there will be millions (maybe billions) of cores acting like this.

You have terrible load balance if most cores are waiting for a few, thus
an expensive mostly-idle machine.  Load balancing attempts the opposite
-- no stragglers, but it doesn't matter if a few complete very early.
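
A small sketch of why stragglers matter more than early finishers.  It
assumes a simple bulk-synchronous model in which every rank waits for the
slowest one; the timings and the helper name are made up for illustration.

```python
def idle_fraction(times):
    """Fraction of total core-time spent waiting when all ranks wait for the slowest."""
    slowest = max(times)
    return 1.0 - sum(times) / (len(times) * slowest)


# One straggler: 99 ranks finish in 1 s, one rank takes 10 s.
straggler = [1.0] * 99 + [10.0]
# One early finisher: 99 ranks take 10 s, one rank finishes in 1 s.
early = [10.0] * 99 + [1.0]

print(round(idle_fraction(straggler), 3))  # 0.891
print(round(idle_fraction(early), 3))      # 0.009
```

Under this model a single straggler leaves roughly 89% of the core-time
idle, while a single early finisher costs under 1%.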

> So this must be handled, no matter how power-efficient our computing
> cores are.
>
>> and also that applications will be highly latency
>> tolerant while also having such atrocious load imbalance that reducing
>> power while waiting on communication will significantly reduce total
>> operating costs?
>>
> Yes, there will probably be plenty of latency-insensitive Exascale
> applications.
> We would like developers to write code where every processor is doing
> pretty much the same thing, takes the same amount of time, and they all
> communicate at exactly the same time and then do it again; in the real
> world this never happens. As long as there is slack, we have a chance to
> improve efficiency.

You've already lost if a significant fraction of the core-hours on your
machine are spent waiting.  Optimizing CPU throttling when you know
you'll be waiting a long time is just picking up crumbs.