[mpich-discuss] MPI_Allreduce is slow for 7 processes or more

David Froger david.froger.ml at mailoo.org
Mon May 4 02:35:43 CDT 2015


I've written a small MPI_Allreduce benchmark [0].

The results with MPICH2 1.4.1p1 [1] and Open MPI 1.8.4 [2] (one column for the
number of processes, and one column for the corresponding wall clock time) show
that the wall clock time halves each time the number of processes doubles.

But with MPICH 3.1.4, the wall clock time increases for 7, 8 or more processes [3].
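
To make "scales" and "stops scaling" concrete, here is a minimal sketch of how
the two-column output (number of processes, wall clock time) could be checked.
The names Sample, speedup, efficiency and first_regression are hypothetical
helpers written for this illustration; they are not part of bench_mpi.cxx, and
no measured timings are included.

```cpp
#include <cstddef>
#include <vector>

// One row of the benchmark output: process count and wall clock time.
// (Hypothetical type, introduced only for this illustration.)
struct Sample {
    int nprocs;
    double seconds;
};

// Speedup of sample s relative to a baseline sample, normalised to one
// process: S(p) = p0 * T(p0) / T(p).
double speedup(const Sample& base, const Sample& s) {
    return base.nprocs * base.seconds / s.seconds;
}

// Parallel efficiency: 1.0 means the time halves whenever the number of
// processes doubles; values well below 1.0 mean poor scaling.
double efficiency(const Sample& base, const Sample& s) {
    return speedup(base, s) / s.nprocs;
}

// Returns the first process count at which the wall clock time is larger
// than in the previous row, or -1 if the time never increases.
int first_regression(const std::vector<Sample>& samples) {
    for (std::size_t i = 1; i < samples.size(); ++i) {
        if (samples[i].seconds > samples[i - 1].seconds) {
            return samples[i].nprocs;
        }
    }
    return -1;
}
```

Feeding the rows of one result file into first_regression would report the
process count at which the time starts to rise, and efficiency against the
first row shows how far each run is from ideal scaling.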

In my real code, with all three of the MPI implementations above, I observe the
same problem for 7, 8 or more processes, whereas I expect my code to be scalable
to at least 8 or 16 processes.

So I'm trying to understand what could be happening with the small benchmark
and MPICH 3.1.4.

Thanks for reading.

Best regards,

[0] https://github.com/dfroger/issue/blob/8b8bdd8e4b2b5e265c25fc2ba7077f6a108bb34a/mpi/bench_mpi.cxx
[1] https://github.com/dfroger/issue/blob/8b8bdd8e4b2b5e265c25fc2ba7077f6a108bb34a/mpi/carla/conda-default.mpich2.1.4.1p1.txt
[2] https://github.com/dfroger/issue/blob/8b8bdd8e4b2b5e265c25fc2ba7077f6a108bb34a/mpi/carla/conda-mpi4py-channel.openmpi.1.8.4.txt
[3] https://github.com/dfroger/issue/blob/8b8bdd8e4b2b5e265c25fc2ba7077f6a108bb34a/mpi/carla/conda-mpi4py-channel.mpich.3.1.4.txt