[mpich-devel] MPI_Recv, blocking call concept

Jeff Hammond jeff.science at gmail.com
Thu Jun 7 18:52:12 CDT 2018


On Thu, Jun 7, 2018 at 4:39 PM, Lisandro Dalcin <dalcinl at gmail.com> wrote:

> Though I acknowledge my solution is not very elegant, I had to find a
> workaround for an asynchronous-execution Python framework I wrote
> (mpi4py.futures), and eventually decided to go with sleep() calls with
> exponential backoff. I have no control over users' builds and don't want
> them to be forced to switch to ch3:sock, and this spinning behavior is the
> common one in other MPI implementations out there.
>
> @Ali, if you can read Python, you may find some inspiration here:
> https://bitbucket.org/mpi4py/mpi4py/src/master/src/mpi4py/futures/_lib.py
> This certainly adds latency, but you can more or less decide how much
> (well, up to the accuracy of your kernel's timer slack), and you only use
> it where it is really needed.
>

I am not fluent in Python, but you are just using nonblocking calls and
testing them carefully, right?  That's what I'd implement if I wanted to
reduce spinning.
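
In C it would look roughly like the sketch below (the helper name and the
backoff constants are invented for illustration; the real mpi4py code is
Python and handles more cases):

/* Sketch of the idea being discussed: replace a blocking MPI_Recv with
 * MPI_Irecv + MPI_Test, sleeping with exponential backoff between tests
 * so the waiting process does not spin at 100% CPU.  Run with at least
 * two ranks, e.g. mpiexec -n 2. */
#define _POSIX_C_SOURCE 199309L
#include <mpi.h>
#include <time.h>

static int recv_with_backoff(void *buf, int count, MPI_Datatype type,
                             int source, int tag, MPI_Comm comm,
                             MPI_Status *status)
{
    MPI_Request req;
    int flag = 0;
    long delay_ns = 1000;              /* start at 1 microsecond */
    const long max_delay_ns = 1000000; /* cap at 1 millisecond   */

    MPI_Irecv(buf, count, type, source, tag, comm, &req);
    MPI_Test(&req, &flag, status);
    while (!flag) {
        struct timespec ts = { 0, delay_ns };
        nanosleep(&ts, NULL);          /* yield the core while waiting */
        if (delay_ns < max_delay_ns)
            delay_ns *= 2;             /* exponential backoff          */
        MPI_Test(&req, &flag, status);
    }
    return MPI_SUCCESS;
}

int main(int argc, char **argv)
{
    int rank, msg = 0;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        recv_with_backoff(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD,
                          MPI_STATUS_IGNORE);
    }
    MPI_Finalize();
    return 0;
}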

A more exotic implementation could do something along the lines of Casper
and offload progress to a different process, which would allow the calling
process to block on an interprocess semaphore or something like that.  It
might cause problems with threads in the calling process, though, if the OS
tries to put the whole process to sleep.
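
A very rough sketch of just the blocking-wait side, using a helper thread
rather than a separate Casper-style process (so it does not sidestep the
thread caveat above, and the helper still polls, just throttled):

/* Rough sketch only: a helper thread polls the request (throttled with
 * nanosleep) and posts a semaphore when it completes, while the caller
 * blocks in sem_wait().  A Casper-style design would use a separate
 * process and an interprocess semaphore instead.  Requests
 * MPI_THREAD_MULTIPLE to be safe; run with at least two ranks. */
#define _POSIX_C_SOURCE 199309L
#include <mpi.h>
#include <pthread.h>
#include <semaphore.h>
#include <time.h>

struct waiter { MPI_Request req; sem_t done; };

static void *progress_thread(void *arg)
{
    struct waiter *w = arg;
    int flag = 0;
    struct timespec ts = { 0, 100000 };       /* poll every 100 us */
    while (!flag) {
        MPI_Test(&w->req, &flag, MPI_STATUS_IGNORE);
        if (!flag) nanosleep(&ts, NULL);
    }
    sem_post(&w->done);                       /* wake the caller */
    return NULL;
}

int main(int argc, char **argv)
{
    int provided, rank, msg = 0;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    if (rank == 0) {
        msg = 42;
        MPI_Send(&msg, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        struct waiter w;
        pthread_t tid;
        sem_init(&w.done, 0, 0);
        MPI_Irecv(&msg, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &w.req);
        pthread_create(&tid, NULL, progress_thread, &w);
        sem_wait(&w.done);                    /* caller blocks here */
        pthread_join(tid, NULL);
        sem_destroy(&w.done);
    }
    MPI_Finalize();
    return 0;
}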


> @Jeff, if you have a better implementation to suggest, please let us
> know (maybe off-list, this is unrelated to MPICH development).
>

Because blocking poll is largely incompatible with low-latency and
shared-memory protocols, I don't think any implementation is going to do a
good job at this; it would not appeal to the majority of MPI users.  The
PETSc folks appear to be the biggest proponents of blocking poll (solely for
the purpose of running dozens of MPI processes on their laptops, it seems),
and they seem to prefer ch3:sock.  I defer to their experience as to whether
a better implementation exists.

Jeff


>
> On Fri, 8 Jun 2018 at 01:00, Jeff Hammond <jeff.science at gmail.com> wrote:
> >
> > It spins because that is optimal for latency and how the shared-memory
> protocols work.  If you want blocking semantics, use ch3:sock, which will
> park the calling thread in the kernel.  It is great for oversubscription
> but terrible for performance in the common case of exact subscription or
> undersubscription.
> >
> > You can't save much power unless you drop into lower P/C-states, but the
> states that save you significant power will increase the latency a huge
> amount.  Dell did something a while back that turned down the frequency
> during MPI calls
> (http://www.hpcadvisorycouncil.com/events/2013/Spain-Workshop/pdf/5_Dell.pdf),
> which saved a bit of power.
> >
> > Jeff
> >
> > On Thu, Jun 7, 2018 at 4:27 AM, Ali MoradiAlamdarloo <timndus at gmail.com>
> wrote:
> >>
> >> Dear all,
> >>
> >> My understanding of a blocking call is something like this: when a
> process (P0) makes a blocking system call, the scheduler blocks the process
> and assigns another process (P1) to the core in order to use the CPU
> efficiently. Eventually P0's response becomes ready and the scheduler can
> map it onto a core again.
> >>
> >> But this is not what happens in MPICH's MPI_Recv function. You call it a
> BLOCKING call, but the process that calls this function doesn't actually
> block; it just keeps running on the core, WAITING for its response.
> >>
> >> Why did you decide to do this? Why do we have a process waiting on a
> valuable processing core and burning power?
> >>
> >
> >
> >
> > --
> > Jeff Hammond
> > jeff.science at gmail.com
> > http://jeffhammond.github.io/
>
>
>
> --
> Lisandro Dalcin
> ============
> Research Scientist
> Computer, Electrical and Mathematical Sciences & Engineering (CEMSE)
> Extreme Computing Research Center (ECRC)
> King Abdullah University of Science and Technology (KAUST)
> http://ecrc.kaust.edu.sa/
>
> 4700 King Abdullah University of Science and Technology
> al-Khawarizmi Bldg (Bldg 1), Office # 0109
> Thuwal 23955-6900, Kingdom of Saudi Arabia
> http://www.kaust.edu.sa
>
> Office Phone: +966 12 808-0459
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/