No subject
Tue Jun 18 13:52:11 CDT 2019
MPI_Barrier before MPI_Allreduce can improve performance
significantly.
This is hardly the full story. It would be useful to know more about
what you are trying to accomplish.
Best,
Jeff
On Wed, Feb 25, 2015 at 3:07 PM, Junchao Zhang <jczhang at mcs.anl.gov> wrote:
> Yes. Many collectives have optimizations for power-of-two processes. In
> MPICH's source code allreduce.c, you can find the following comments.
>
> /* This is the default implementation of allreduce. The algorithm is:
>
> Algorithm: MPI_Allreduce
>
> For the heterogeneous case, we call MPI_Reduce followed by MPI_Bcast
> in order to meet the requirement that all processes must have the
> same result. For the homogeneous case, we use the following algorithms.
>
>
> For long messages and for builtin ops and if count >= pof2 (where
> pof2 is the nearest power-of-two less than or equal to the number
> of processes), we use Rabenseifner's algorithm (see
> http://www.hlrs.de/mpi/myreduce.html).
> This algorithm implements the allreduce in two steps: first a
> reduce-scatter, followed by an allgather. A recursive-halving
> algorithm (beginning with processes that are distance 1 apart) is
> used for the reduce-scatter, and a recursive doubling
> algorithm is used for the allgather. The non-power-of-two case is
> handled by dropping to the nearest lower power-of-two: the first
> few even-numbered processes send their data to their right neighbors
> (rank+1), and the reduce-scatter and allgather happen among the remaining
> power-of-two processes. At the end, the first few even-numbered
> processes get the result from their right neighbors.
>
> For the power-of-two case, the cost for the reduce-scatter is
> lgp.alpha + n.((p-1)/p).beta + n.((p-1)/p).gamma. The cost for the
> allgather lgp.alpha + n.((p-1)/p).beta. Therefore, the
> total cost is:
> Cost = 2.lgp.alpha + 2.n.((p-1)/p).beta + n.((p-1)/p).gamma
>
> For the non-power-of-two case,
> Cost = (2.floor(lgp)+2).alpha + (2.((p-1)/p) + 2).n.beta +
> n.(1+(p-1)/p).gamma
>
>
> For short messages, for user-defined ops, and for count < pof2
> we use a recursive doubling algorithm (similar to the one in
> MPI_Allgather). We use this algorithm in the case of user-defined ops
> because in this case derived datatypes are allowed, and the user
> could pass basic datatypes on one process and derived on another as
> long as the type maps are the same. Breaking up derived datatypes
> to do the reduce-scatter is tricky.
>
> Cost = lgp.alpha + n.lgp.beta + n.lgp.gamma
>
> Possible improvements:
>
> End Algorithm: MPI_Allreduce
> */
>
> --Junchao Zhang
>
> On Wed, Feb 25, 2015 at 2:59 PM, Aiman Fang <aimanf at cs.uchicago.edu> wrote:
>>
>> Hi,
>>
>> I came across a problem in experiments that makes me wondering if there is
>> any optimization of collective calls, such as MPI_Allreduce, for 2^n number
>> of ranks?
>>
>> We did some experiments on Argonne Vesta system to measure the time of
>> MPI_Allreduce calls using 511, 512 and 513 processes. (one process per
>> node). In each run, the synthetic benchmark first did some computation and
>> then called MPI_Allreduce 30 times, for total 100 loops. We measured the
>> total time spent on communication.
>>
>> We found that 512-process run gives the best performance. The time for
>> 511, 512 and 513 processes are 0.1492, 0.1449 and 0.1547 seconds
>> respectively. 512-proc outperforms 511-proc by 3.7%, and 513-proc by 6.7%.
>>
>> The mpich version we used is as follows.
>>
>> $ mpichversion
>> MPICH Version: 3.1.2
>> MPICH Release date: Mon Jul 21 16:00:21 CDT 2014
>> MPICH Device: pamid
>> MPICH configure: --prefix=/home/fujita/soft/mpich-3.1.2
>> --host=powerpc64-bgq-linux --with-device=pamid --with-file-system=gpfs:BGQ
>> --disable-wrapper-rpath
>> MPICH CC: powerpc64-bgq-linux-gcc -O2
>> MPICH CXX: powerpc64-bgq-linux-g++ -O2
>> MPICH F77: powerpc64-bgq-linux-gfortran -O2
>> MPICH FC: powerpc64-bgq-linux-gfortran -O2
>>
>> Thanks!
>>
>> Best,
>> Aiman
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss
More information about the discuss
mailing list