[mpich-discuss] Affinity with MPICH_ASYNC_PROGRESS

Jed Brown jedbrown at mcs.anl.gov
Sat Feb 23 20:00:05 CST 2013


On Sat, Feb 23, 2013 at 7:48 PM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:

> There's absolutely no reason why it should.  I am not trying to
> maximize progress; I am trying to maximize computational performance.
> Pinning the 7 comm threads to one core is going to be terrible for
> them, but I am assuming that I don't need that much progress, whereas
> I do need the computation to run at top speed.  DGEMM on 7 cores
> and 1 core of MPI should be much better than DGEMM on 8 cores where
> each core is time-shared with a comm thread.
>

Sure.
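
(For concreteness, the 7+1 split being described might look roughly like
this on Linux; a minimal sketch, assuming an 8-core node, and the comm or
progress thread would still have to be bound to the remaining core by
whoever creates it.)

#define _GNU_SOURCE
#include <sched.h>
#include <stdio.h>

int main(void)
{
    /* Keep this thread -- and any compute threads it spawns, which
       inherit the mask -- on cores 0-6, leaving core 7 for MPI. */
    cpu_set_t compute;
    CPU_ZERO(&compute);
    for (int c = 0; c < 7; c++) CPU_SET(c, &compute);
    if (sched_setaffinity(0, sizeof(compute), &compute) != 0)
        perror("sched_setaffinity");

    /* ... DGEMM / OpenMP compute region runs here on 7 cores ... */
    return 0;
}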


> You should run Intel MPI, MKL, OpenMP and TBB and only compile your
> code with Intel compilers.  That's your best chance to have everything
> work together.
>

Those are terrible programming models, oversynchronizing and inflexible,
and they don't have a concept of memory locality anyway. ;-)


>  I really don't see how Hydra is supposed to know what
> GOMP is doing and try to deal with it.
>

AFAIK, GOMP doesn't set affinity at all. Affinity doesn't really make sense
for TBB because that programming model doesn't have explicit memory locality.
But in this case, _I'm_ setting affinity for my pthreads or my OpenMP
threads because I know how I use them collectively.
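
Something along these lines, for example (a minimal sketch, assuming Linux
and a pthread-backed OpenMP runtime such as GOMP; the identity thread-to-core
mapping is only illustrative):

#define _GNU_SOURCE
#include <pthread.h>
#include <sched.h>
#include <omp.h>

/* Pin the calling thread to a single core. */
static void bind_self_to_core(int core)
{
    cpu_set_t set;
    CPU_ZERO(&set);
    CPU_SET(core, &set);
    pthread_setaffinity_np(pthread_self(), sizeof(set), &set);
}

int main(void)
{
#pragma omp parallel
    {
        /* The application chooses the mapping because it knows how the
           threads are used collectively; here it is thread i -> core i. */
        bind_self_to_core(omp_get_thread_num());
    }
    return 0;
}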


>
> What testing I have done of MPICH_NEMESIS_ASYNC_PROGRESS=1 on Cray
> XC30 indicates that NWChem is better off without comm threads, since it
> communicates and computes at fine enough granularity that the cost of
> lacking progress is smaller than the overhead of internal locking
> (because ASYNC_PROGRESS implies MPI_THREAD_MULTIPLE) and the competition
> for execution resources (even though XC30 has Intel SNB with HT
> enabled).
>

MPICH_NEMESIS_ASYNC_PROGRESS=1 is always slower when I've tried it on
Hopper, even when I have nonblocking communication running for a long time
while the application is computing. I'm hoping it works better in the
future, but for now, I still wait just as long when I get around to calling
MPI_Wait, except that everything in between runs slower due to
MPI_THREAD_MULTIPLE.
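
The pattern in question is roughly the following (a sketch, not the actual
test; the message size and ring partners are placeholders). With
MPICH_ASYNC_PROGRESS=1 (or MPICH_NEMESIS_ASYNC_PROGRESS=1 on Cray) set in
the environment, the library needs full thread support, which is where the
MPI_THREAD_MULTIPLE locking cost comes from:

#include <mpi.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int provided, rank, size;
    /* Async progress requires full thread support, so request it
       explicitly; the locking cost is paid either way. */
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int n = 1 << 20;                             /* placeholder size */
    int right = (rank + 1) % size;
    int left  = (rank + size - 1) % size;
    double *sendbuf = calloc(n, sizeof(double));
    double *recvbuf = calloc(n, sizeof(double));

    MPI_Request req[2];
    MPI_Irecv(recvbuf, n, MPI_DOUBLE, left,  0, MPI_COMM_WORLD, &req[0]);
    MPI_Isend(sendbuf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &req[1]);

    /* ... long local computation here; ideally the transfer progresses
       in the background while we compute ... */

    /* Without effective asynchronous progress, much of the transfer
       happens inside this call instead. */
    MPI_Waitall(2, req, MPI_STATUSES_IGNORE);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}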