[mpich-discuss] MPI_Allreduce is slow for 7 processes or more

David Froger david.froger.ml at mailoo.org
Mon May 4 10:14:42 CDT 2015

Thanks Junchao.

> I don't see you measure MPI_Allreduce. 

You're right; let's call my code "a simple example to reproduce a bug" rather
than a benchmark.

> Basically you only measured some random numbers across processes.

The usleep simulates the time spent on computation in my real code
(a Computational Fluid Dynamics application). bench_mpi.cxx only does a
usleep(microseconds) and then calls MPI_Allreduce. microseconds is a constant
base time divided by mpi_size, plus a random overhead between 0% and 5%, so
that MPI_Allreduce is not called at the same wall-clock time on all processes
(though I think I should have used a different seed on each process).
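A minimal sketch of the per-rank sleep time described above, assuming the overhead is drawn uniformly from [0%, 5%) and that each rank seeds its generator with `seed + rank` so ranks get different sequences. `SleepTimer`, `next_us()` and the default seed are hypothetical names for illustration, not taken from bench_mpi.cxx.

```cpp
#include <cassert>
#include <random>

// Hypothetical helper: per-iteration "computation" time for one rank.
// base_us is the total work (microseconds) for a single process; it is split
// evenly across `size` ranks, then inflated by a random overhead in [0%, 5%).
class SleepTimer {
public:
    SleepTimer(long base_us, int rank, int size, unsigned seed = 12345)
        : base_us_(base_us), size_(size),
          // Per-rank seed: each rank draws a different overhead sequence.
          gen_(seed + static_cast<unsigned>(rank)),
          overhead_(0.0, 0.05) {
        assert(size > 0 && rank >= 0 && rank < size);
    }

    // Microseconds to sleep before the next MPI_Allreduce.
    long next_us() {
        double share = static_cast<double>(base_us_) / size_;
        return static_cast<long>(share * (1.0 + overhead_(gen_)));
    }

private:
    long base_us_;
    int size_;
    std::mt19937 gen_;
    std::uniform_real_distribution<double> overhead_;
};
```

In the real program, `rank` and `size` would come from `MPI_Comm_rank` and `MPI_Comm_size`, and each iteration would call `usleep(timer.next_us())` followed by `MPI_Allreduce`.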

Since the code does nothing but usleep(base_time / mpi_size), I expect the
wall-clock time to halve when the number of processes doubles.
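The expected scaling can be checked with a small model, sketched here under the assumption that communication cost is negligible. Because MPI_Allreduce cannot complete on any rank until every rank has contributed its value, each iteration lasts as long as the slowest rank's sleep, so the ideal wall-clock time for n iterations lies between n·base/p and n·1.05·base/p. `ideal_wall_us` is a hypothetical helper, not part of bench_mpi.cxx.

```cpp
#include <algorithm>
#include <cassert>
#include <random>
#include <vector>

// Hypothetical model (communication cost ignored): each iteration takes as
// long as the slowest rank's sleep, since the allreduce needs every rank's
// contribution. Returns the ideal wall-clock time in microseconds.
double ideal_wall_us(long base_us, int size, int iterations,
                     unsigned seed = 12345) {
    assert(size > 0 && iterations >= 0);
    // One generator per rank, seeded with seed + rank.
    std::vector<std::mt19937> gens;
    for (int r = 0; r < size; ++r) gens.emplace_back(seed + r);
    std::uniform_real_distribution<double> overhead(0.0, 0.05);

    double share = static_cast<double>(base_us) / size;
    double total = 0.0;
    for (int it = 0; it < iterations; ++it) {
        double slowest = 0.0;
        for (auto& g : gens)
            slowest = std::max(slowest, share * (1.0 + overhead(g)));
        total += slowest;  // everyone waits for the slowest rank
    }
    return total;
}
```

Under this model, going from 4 to 8 ranks should roughly halve the total (within the 5% overhead), so any measured time well above the model's prediction at 7 or more ranks would be time spent inside MPI_Allreduce itself.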

With MPICH 3.1.4, the wall-clock time increases with 7 or more processes:
MPI_Allreduce becomes very slow for no apparent reason. I'm trying to understand

discuss mailing list     discuss at mpich.org