[mpich-discuss] Is there any optimization of collective calls (MPI_Allreduce) for 2^n ranks?
Aiman Fang
aimanf at cs.uchicago.edu
Wed Feb 25 14:59:06 CST 2015
Hi,
I ran into a result in our experiments that makes me wonder whether collective calls such as MPI_Allreduce are optimized for a power-of-two (2^n) number of ranks.
We ran experiments on the Vesta system at Argonne to measure the time of MPI_Allreduce calls using 511, 512, and 513 processes (one process per node). In each run, the synthetic benchmark performed some computation and then called MPI_Allreduce 30 times, repeating this for a total of 100 loops. We measured the total time spent on communication.
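For readers who want to reproduce the setup, here is a minimal sketch of a benchmark with this loop structure: 100 loops, each with an untimed computation phase followed by 30 timed allreduce calls. It is not the original benchmark code. The helper names `time_allreduce` and `run_with_mpi4py`, the use of mpi4py and NumPy, the message size (`count`), and the placeholder computation are all assumptions, since the post does not state them.

```python
import time


def time_allreduce(allreduce, compute, loops=100, calls_per_loop=30,
                   clock=time.perf_counter):
    """Return the total seconds spent inside the allreduce calls only."""
    comm_time = 0.0
    for _ in range(loops):
        compute()                        # untimed computation phase
        start = clock()
        for _ in range(calls_per_loop):
            allreduce()                  # timed communication phase
        comm_time += clock() - start
    return comm_time


def run_with_mpi4py(count=1):
    """Hypothetical wiring to MPI through mpi4py; launch under mpiexec."""
    from mpi4py import MPI               # assumed to be installed
    import numpy as np

    comm = MPI.COMM_WORLD
    send = np.ones(count, dtype="d")     # assumed payload: `count` doubles
    recv = np.empty(count, dtype="d")

    local = time_allreduce(
        allreduce=lambda: comm.Allreduce(send, recv, op=MPI.SUM),
        compute=lambda: float(np.sum(send * send)),  # placeholder work
        clock=MPI.Wtime,
    )
    # Report the communication time of the slowest rank.
    slowest = comm.reduce(local, op=MPI.MAX, root=0)
    if comm.Get_rank() == 0:
        print(f"{comm.Get_size()} ranks: {slowest:.4f} s in MPI_Allreduce")
```

To try it, call `run_with_mpi4py()` at the bottom of the script and launch it with, for example, `mpiexec -n 512 python bench.py` (the script name is hypothetical). Keeping the timing loop separate from the MPI wiring makes the loop structure easy to check without an MPI installation.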
We found that the 512-process run gives the best performance. The times for 511, 512, and 513 processes are 0.1492, 0.1449, and 0.1547 seconds, respectively. The 512-process run outperforms the 511-process run by 3.7% and the 513-process run by 6.7%.
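As a quick check on these figures, the sketch below recomputes the relative differences from the three rounded times listed above, using the 512-process time as the baseline. The function name `slowdown_percent` is only illustrative. With the times as listed, this gives roughly 3.0% for 511 processes and 6.8% for 513 processes, so the 3.7% figure quoted may be based on unrounded measurements or on a different baseline.

```python
def slowdown_percent(t_other, t_best):
    """Percentage by which t_other exceeds t_best."""
    return (t_other - t_best) / t_best * 100.0


# Communication times in seconds, as reported in the post.
times = {511: 0.1492, 512: 0.1449, 513: 0.1547}
best = times[512]

for ranks in (511, 513):
    pct = slowdown_percent(times[ranks], best)
    print(f"{ranks} vs 512: {pct:.2f}% slower")
```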
The MPICH version we used is as follows:
$ mpichversion
MPICH Version: 3.1.2
MPICH Release date: Mon Jul 21 16:00:21 CDT 2014
MPICH Device: pamid
MPICH configure: --prefix=/home/fujita/soft/mpich-3.1.2 --host=powerpc64-bgq-linux --with-device=pamid --with-file-system=gpfs:BGQ --disable-wrapper-rpath
MPICH CC: powerpc64-bgq-linux-gcc -O2
MPICH CXX: powerpc64-bgq-linux-g++ -O2
MPICH F77: powerpc64-bgq-linux-gfortran -O2
MPICH FC: powerpc64-bgq-linux-gfortran -O2
Thanks!
Best,
Aiman