[mpich-devel] O(N^p) data

Dan Ibanez dan.a.ibanez at gmail.com
Wed Aug 24 13:05:45 CDT 2016


Thanks, Jeff!

Yea, I've been able to write scalable MPI-based code
that doesn't use MPI_All* functions, and the
MPI_Neighbor_all* variants are just perfect; they have
replaced lots of low-level send/recv systems.
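
In case it's useful to anyone else on the list, the pattern I mean looks
roughly like this. This is just a minimal sketch; the ring topology below is
made up and stands in for the real neighbor lists:

    /* Exchange one int with each graph neighbor via MPI_Neighbor_alltoall,
     * so per-rank user buffers are O(degree) rather than O(nproc).
     * The ring neighbors here are purely for illustration. */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char** argv) {
      MPI_Init(&argc, &argv);
      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);
      int left  = (rank + size - 1) % size;
      int right = (rank + 1) % size;
      int sources[2]      = {left, right};
      int destinations[2] = {left, right};
      MPI_Comm graph_comm;
      MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
          2, sources,      MPI_UNWEIGHTED,
          2, destinations, MPI_UNWEIGHTED,
          MPI_INFO_NULL, 0 /* no reorder */, &graph_comm);
      int sendbuf[2] = {rank, rank};
      int recvbuf[2];
      MPI_Neighbor_alltoall(sendbuf, 1, MPI_INT,
                            recvbuf, 1, MPI_INT, graph_comm);
      printf("rank %d received %d and %d\n", rank, recvbuf[0], recvbuf[1]);
      MPI_Comm_free(&graph_comm);
      MPI_Finalize();
      return 0;
    }

The nice part is that the communicator carries the topology, so the
collective only ever touches the neighbors' buffers.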

I was interested in the theoretical scalability of the
implementation, and your answer is pretty comprehensive
so I'll go read those papers.

On Wed, Aug 24, 2016 at 1:55 PM, Jeff Hammond <jeff.science at gmail.com>
wrote:

> It depends on where you look in MPICH.  I analyzed memory consumption of
> MPI on Blue Gene/Q, which was based on MPICH (and is OSS, so you can read
> all of it).  There was O(nproc) memory usage at every node, but I recall
> the prefactor was ~82 bytes, which is pretty lean (~250 MB per rank at 3M
> ranks).  I don't know if the O(nproc) data was in MPICH itself or the
> underlying layer (PAMI), or both, but it doesn't really matter from a user
> perspective.
>
> Some _networks_ might make it hard not to have O(nproc) eager buffers on
> every rank, and there are other "features" of network HW/SW that may
> require O(nproc) data.  Obviously, since this sort of thing is not
> scalable, networks that historically had such properties have evolved to
> support more scalable designs.  Some of the low-level issues are addressed
> in https://www.open-mpi.org/papers/ipdps-2006/ipdps-2006-openmpi-ib-scalability.pdf.
>
> User buffers are a separate issue.  MPI_Alltoall and MPI_Allgather act on
> O(nproc) user storage.  MPI_Allgatherv, MPI_Alltoallv, and MPI_Alltoallw
> have O(nproc) input vectors.  MPI experts often refer to the vector
> collectives as unscalable, but of course this may not matter in practice
> for many users.  And in some of the cases where MPI_Alltoallv is used, one
> can replace it with a carefully written loop over Send-Recv calls that does
> not require the user to allocate O(nproc) vectors specifically for MPI.
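
To make that Send-Recv replacement concrete, here is a sketch, assuming each
rank already knows its small set of peers and the per-peer message sizes, so
nothing O(nproc) is ever allocated for MPI (names here are illustrative):

    #include <mpi.h>
    #include <stdlib.h>

    /* Sparse exchange among a small set of known peers, standing in for
     * MPI_Alltoallv.  Only O(npeers) requests are allocated; there are no
     * O(nproc) count/displacement arrays. */
    void sparse_exchange(int npeers, const int* peers,
                         const int* sendcounts, double* const* sendbufs,
                         const int* recvcounts, double* const* recvbufs,
                         MPI_Comm comm)
    {
      MPI_Request* reqs = malloc(2 * npeers * sizeof(MPI_Request));
      /* post all receives before the sends */
      for (int i = 0; i < npeers; ++i)
        MPI_Irecv(recvbufs[i], recvcounts[i], MPI_DOUBLE,
                  peers[i], 0, comm, &reqs[i]);
      for (int i = 0; i < npeers; ++i)
        MPI_Isend(sendbufs[i], sendcounts[i], MPI_DOUBLE,
                  peers[i], 0, comm, &reqs[npeers + i]);
      MPI_Waitall(2 * npeers, reqs, MPI_STATUSES_IGNORE);
      free(reqs);
    }

Posting the receives ahead of the sends keeps most messages expected at the
receiver, which tends to reduce unexpected-message buffering.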
>
> There's a paper by Argonne+IBM that addresses this topic in more detail:
> http://www.mcs.anl.gov/~thakur/papers/mpi-million.pdf
>
> Jeff
>
>
> On Wed, Aug 24, 2016 at 10:28 AM, Dan Ibanez <dan.a.ibanez at gmail.com>
> wrote:
>
>> Hello,
>>
>> This may be a silly question, but the reason
>> I'm asking is to obtain a fairly definitive answer.
>> Basically, does MPICH have data structures
>> which are of size:
>> 1) O(N)
>> 2) O(N^2)
>> where N is the size of MPI_COMM_WORLD?
>> My initial guess would be no, because there
>> exist machines (Mira) for which it is not
>> possible to store N^2 bytes, and even N bytes
>> becomes an issue.
>> I understand there are MPI functions (MPI_Alltoall) one can
>> call that by definition will require at least O(N) memory,
>> but supposing one does not use these, would the internal
>> MPICH systems still have this memory complexity?
>>
>> Thank you for looking at this anyway
>>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>