[mpich-discuss] Affinity with MPICH_ASYNC_PROGRESS

Sun Feb 24 09:07:22 CST 2013

I'll respond to the non-MPICH aspects of this thread offline...

>> I think the only way for async progress to work well is to have
>> fine-grain locking inside of MPI, as is done in PAMID.  Any
>> implementation that resorts to fat-locking is probably better off
>> without async progress unless the application is doing something
>> really silly (like never calling MPI on a rank that is the target of
>> MPI RMA).
>
> There's more than 100k cycles between times that I enter the MPI stack.
> There are only two threads ever that contend for locks (my funneled thread
> and MPICH's async-progress thread). I'm not convinced that you need super
> fine-grained locks to make progress during that time period. FWIW, my
> experience has been that standard Nemesis does a _much_ better job of making
> asynchronous progress than Cray's implementation. If we could look at the
> code for Cray's implementation, we might be able to get a better idea of
> why.

It would be interesting to analyze async progress in detail using MPICH
and MVAPICH, since those are OSS.  We are never going to get the
CrayPICH source (I've asked, NCSA asked) so it's not prudent to debate
what we would do with it if we did.  We could reimplement the uGNI
netmod if we had nothing better to do.
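
As a starting point for that kind of analysis, here is a toy model of the
two-thread scenario quoted above: one funneled application thread that
enters the library rarely, one async-progress thread, and a single coarse
("fat") lock between them.  This is only a sketch and not MPICH code; the
names simulate, big_lock, progress_thread and app_thread are made up for
illustration, and the sleep-based timing is an assumption that stands in
for the compute phase between MPI calls.  It counts how often the progress
thread gets the lock, and says nothing about what contention costs in a
real implementation.

```python
import threading
import time


def simulate(app_calls=20, gap_s=0.005):
    """Toy model of a fat lock shared by an app thread and a progress thread.

    Hypothetical sketch: nothing here corresponds to real MPICH internals.
    """
    big_lock = threading.Lock()   # the single coarse-grained lock
    stop = threading.Event()
    counts = {"progress_polls": 0, "app_calls": 0}

    def progress_thread():
        # Stand-in for an async-progress thread: poll under the global lock.
        while not stop.is_set():
            with big_lock:
                counts["progress_polls"] += 1
            time.sleep(0)  # yield so the application thread can take the lock

    def app_thread():
        # Stand-in for a funneled thread that rarely enters the library.
        for _ in range(app_calls):
            time.sleep(gap_s)      # long compute phase outside the library
            with big_lock:         # brief entry into the "MPI stack"
                counts["app_calls"] += 1
        stop.set()

    t_prog = threading.Thread(target=progress_thread)
    t_app = threading.Thread(target=app_thread)
    t_prog.start()
    t_app.start()
    t_app.join()
    t_prog.join()
    return counts


if __name__ == "__main__":
    print(simulate())
```

Each counter is only ever updated while big_lock is held, so the counts
are consistent.  Replacing the sleep with real work and timing the
application thread's lock acquisitions would be the obvious next step.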

Jeff

-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond