[mpich-devel] O(N^p) data

Dan Ibanez dan.a.ibanez at gmail.com
Wed Aug 24 14:16:27 CDT 2016


Just a slight discrepancy:
> the prefactor was ~82 bytes, which is pretty lean (~25 MB per rank at 3M
ranks)
82 * 3M = 246MB
Did you mean 8.2 bytes per rank ?
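(For comparison: 8.2 bytes * 3M ranks ≈ 24.6 MB on each rank, which would match the quoted ~25 MB.)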

On Wed, Aug 24, 2016 at 2:05 PM, Dan Ibanez <dan.a.ibanez at gmail.com> wrote:

> Thanks Jeff !
>
> Yea, I've been able to write scalable MPI-based code
> that doesn't use MPI_All* functions, and the
> MPI_Neighbor_all* variants are just perfect; they have
> replaced lots of low-level send/recv systems.
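(As an illustration of that pattern, not from this thread: a minimal sketch assuming each rank already knows its degree graph neighbors in an array nbr[] and exchanges one int with each of them; all buffers and argument arrays are sized by degree, not by nproc.)

    /* Sketch (assumes: nbr[] holds this rank's degree neighbors, the
     * neighborhood is symmetric, and each pair exchanges a single int). */
    #include <mpi.h>

    void exchange_with_neighbors(const int *nbr, int degree,
                                 const int *sendvals, int *recvvals)
    {
        MPI_Comm graph;
        /* Same ranks act as sources and destinations (symmetric graph). */
        MPI_Dist_graph_create_adjacent(MPI_COMM_WORLD,
                                       degree, nbr, MPI_UNWEIGHTED,
                                       degree, nbr, MPI_UNWEIGHTED,
                                       MPI_INFO_NULL, 0 /* no reorder */,
                                       &graph);
        /* One MPI_INT to/from each neighbor; storage is O(degree). */
        MPI_Neighbor_alltoall(sendvals, 1, MPI_INT,
                              recvvals, 1, MPI_INT, graph);
        MPI_Comm_free(&graph);
    }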
>
> I was interested in the theoretical scalability of the
> implementation, and your answer is pretty comprehensive
> so I'll go read those papers.
>
> On Wed, Aug 24, 2016 at 1:55 PM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
>
>> It depends on where you look in MPICH.  I analyzed memory consumption of
>> MPI on Blue Gene/Q, which was based on MPICH (and is OSS, so you can read
>> all of it).  There was O(nproc) memory usage at every node, but I recall the
>> prefactor was ~82 bytes, which is pretty lean (~25 MB per rank at 3M
>> ranks).  I don't know if the O(nproc) data was in MPICH itself or the
>> underlying layer (PAMI), or both, but it doesn't really matter from a user
>> perspective.
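(Not from the thread, but one way to reproduce that kind of measurement: record resident memory right after MPI_Init at several job sizes and fit the slope against nproc. The sketch below assumes a Linux-style /proc/self/statm; Blue Gene/Q's compute-node kernel exposed memory information through its own interfaces instead.)

    /* Sketch: print resident memory just after MPI_Init along with the job
     * size; running at several scales lets you fit the bytes-per-rank slope.
     * Assumes a Linux-style /proc/self/statm is available.               */
    #include <mpi.h>
    #include <stdio.h>
    #include <unistd.h>

    static long resident_bytes(void)
    {
        long size = 0, resident = 0;
        FILE *f = fopen("/proc/self/statm", "r");
        if (!f || fscanf(f, "%ld %ld", &size, &resident) != 2)
            resident = -1;
        if (f) fclose(f);
        return resident < 0 ? -1 : resident * sysconf(_SC_PAGESIZE);
    }

    int main(int argc, char **argv)
    {
        int rank, nproc;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nproc);
        if (rank == 0)
            printf("nproc=%d  rss after MPI_Init=%ld bytes\n",
                   nproc, resident_bytes());
        MPI_Finalize();
        return 0;
    }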
>>
>> Some _networks_ might make it hard not to have O(nproc) eager buffers on
>> every rank, and there are other "features" of network HW/SW that may
>> require O(nproc) data.  Obviously, since this sort of thing is not
>> scalable, networks that historically had such properties have evolved to
>> support more scalable designs.  Some of the low-level issues are addressed
>> in https://www.open-mpi.org/papers/ipdps-2006/ipdps-2006-openmpi-ib-scalability.pdf.
>>
>> User buffers are a separate issue.  MPI_Alltoall and MPI_Allgather act
>> on O(nproc) user storage.  MPI_Allgatherv, MPI_Alltoallv and MPI_Alltoallw
>> have O(nproc) input vectors.  MPI experts often refer to the vector
>> collectives as unscalable, but of course this may not matter in practice
>> for many users.  And in some of the cases where MPI_Alltoallv is used, one
>> can replace it with a carefully written loop over Send-Recv calls that does
>> not require the user to allocate O(nproc) vectors specifically for MPI.
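(For concreteness, a minimal sketch of that replacement, under the assumption that each rank already knows its actual destinations and sources and the corresponding message sizes; every array below is sized by the number of communication partners, not by nproc. The names sparse_exchange, dst, src, etc. are mine, not MPICH's.)

    /* Sketch: sparse exchange with point-to-point calls in place of
     * MPI_Alltoallv.  Assumes ndst destinations (dst/sendbuf/sendcount) and
     * nsrc sources (src/recvbuf/recvcount) are already known on each rank. */
    #include <mpi.h>
    #include <stdlib.h>

    void sparse_exchange(int ndst, const int *dst, char **sendbuf,
                         const int *sendcount,
                         int nsrc, const int *src, char **recvbuf,
                         const int *recvcount, MPI_Comm comm)
    {
        MPI_Request *req = malloc((size_t)(ndst + nsrc) * sizeof *req);
        int r = 0;
        for (int i = 0; i < nsrc; ++i)   /* receives from known sources */
            MPI_Irecv(recvbuf[i], recvcount[i], MPI_BYTE, src[i], 0,
                      comm, &req[r++]);
        for (int i = 0; i < ndst; ++i)   /* sends to known destinations */
            MPI_Isend(sendbuf[i], sendcount[i], MPI_BYTE, dst[i], 0,
                      comm, &req[r++]);
        MPI_Waitall(r, req, MPI_STATUSES_IGNORE);
        free(req);
    }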
>>
>> There's a paper by Argonne+IBM that addresses this topic in more detail:
>> http://www.mcs.anl.gov/~thakur/papers/mpi-million.pdf
>>
>> Jeff
>>
>>
>> On Wed, Aug 24, 2016 at 10:28 AM, Dan Ibanez <dan.a.ibanez at gmail.com>
>> wrote:
>>
>>> Hello,
>>>
>>> This may be a silly question, but the reason
>>> I'm asking is to obtain a fairly definitive answer.
>>> Basically, does MPICH have data structures
>>> which are of size:
>>> 1) O(N)
>>> 2) O(N^2)
>>> Where N is the size of MPI_COMM_WORLD ?
>>> My initial guess would be no, because there
>>> exist machines (Mira) for which it is not
>>> possible to store N^2 bytes, and even N bytes
>>> becomes an issue.
>>> I understand there are MPI functions (MPI_Alltoall) one can
>>> call that by definition will require at least O(N) memory,
>>> but supposing one does not use these, would the internal
>>> MPICH systems still have this memory complexity ?
>>>
>>> Thank you for looking at this anyway
>>>
>>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>> _______________________________________________
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/devel
>>
>
>