[mpich-devel] ROMIO collective i/o memory use

Mon May 6 16:44:48 CDT 2013

On Mon, May 06, 2013 at 04:35:11PM -0500, Jeff Hammond wrote:
> Does alltoallv actually run faster than send-recv for the MPIO use case?
>  For >1MB messages, is alltoallv noticeably faster than a well-written
> send-recv implementation?

We only have data from Blue Gene/L: Hao Yu's paper only studied
scalability up to 1k MPI processes.  Alltoallv got the implementation
an extra 100 MiB/sec or so.

That paper cites Almasi's ICS 2005 paper "MPI collective communication on
BlueGene/L systems", but surely there has been more recent work in
this area?

What is a "well written send-recv implementation"?  One that posts a
bunch of nonblocking sends/receives and lets the MPICH progress engine
figure it out?  One that tries to schedule the sends/receives in some
way?
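
To make the second option concrete, here is a rough sketch of one
possible schedule, a plain pairwise exchange.  This is only an
illustration, not what ROMIO or MPICH actually does.  It is Python
rather than MPI code, and the names pairwise_schedule and is_consistent
are made up for this example.  In round k, each rank would send to
(rank + k) mod P and receive from (rank - k) mod P, so every rank has
exactly one send and one receive in flight per round:

```python
def pairwise_schedule(nprocs):
    """Return rounds[k-1][rank] = (send_to, recv_from) for k = 1..nprocs-1.

    Hypothetical helper: it only computes who talks to whom in each
    round; it performs no communication.
    """
    rounds = []
    for step in range(1, nprocs):
        rounds.append([((rank + step) % nprocs, (rank - step) % nprocs)
                       for rank in range(nprocs)])
    return rounds


def is_consistent(rounds):
    """Check that each send in a round is matched by a receive.

    If rank r sends to rank d in a round, rank d must be receiving
    from rank r in that same round.
    """
    for pairs in rounds:
        for rank, (send_to, _recv_from) in enumerate(pairs):
            if pairs[send_to][1] != rank:
                return False
    return True
```

The first option would skip the rounds entirely: post all P-1 receives
and P-1 sends as nonblocking operations up front and wait on the whole
set.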

==rob

-- 
Rob Latham
Mathematics and Computer Science Division
Argonne National Lab, IL USA