[mpich-discuss] Is there any optimization of collective calls (MPI_Allreduce) for 2^n ranks?

Wed Feb 25 14:59:06 CST 2015

Hi,

I came across an issue in our experiments that makes me wonder whether there is any optimization of collective calls, such as MPI_Allreduce, for a power-of-two (2^n) number of ranks.

We ran experiments on Argonne's Vesta system (Blue Gene/Q) to measure the time spent in MPI_Allreduce calls using 511, 512, and 513 processes (one process per node). In each run, the synthetic benchmark first did some computation and then called MPI_Allreduce 30 times. This sequence was repeated for 100 loops in total. We measured the total time spent in communication.

We found that the 512-process run gave the best performance: the times for 511, 512, and 513 processes were 0.1492, 0.1449, and 0.1547 seconds, respectively. Relative to the 512-process time, the 511-process run was about 3.0% slower and the 513-process run about 6.8% slower.

The MPICH version we used is shown below.

$ mpichversion   
MPICH Version:    	3.1.2
MPICH Release date:	Mon Jul 21 16:00:21 CDT 2014
MPICH Device:    	pamid
MPICH configure: 	--prefix=/home/fujita/soft/mpich-3.1.2 --host=powerpc64-bgq-linux --with-device=pamid --with-file-system=gpfs:BGQ --disable-wrapper-rpath
MPICH CC: 	powerpc64-bgq-linux-gcc    -O2
MPICH CXX: 	powerpc64-bgq-linux-g++   -O2
MPICH F77: 	powerpc64-bgq-linux-gfortran   -O2
MPICH FC: 	powerpc64-bgq-linux-gfortran   -O2

Thanks!

Best,
Aiman