[mpich-devel] Suboptimal MPI_Allreduce() for intercommunicators
Lisandro Dalcin
dalcinl at gmail.com
Wed Apr 30 04:44:49 CDT 2014
The implementation of Allreduce for intercommunicators
(MPIR_Allreduce_inter in src/mpi/coll/allreduce.c) uses more or less
the following algorithm (this is the Python code I'm using to test this
issue):
def allreduce_inter_mpich(obj, op, comm, tag, localcomm, low_group):
    zero = 0
    if comm.rank == 0:
        root = MPI.ROOT
    else:
        root = MPI.PROC_NULL
    if low_group:
        ignore = reduce_inter(obj, op, zero, comm, tag, localcomm)
        result = reduce_inter(obj, op, root, comm, tag, localcomm)
    else:
        result = reduce_inter(obj, op, root, comm, tag, localcomm)
        ignore = reduce_inter(obj, op, zero, comm, tag, localcomm)
    return localcomm.bcast(result, 0)
However, while the broadcasts at each group overlap, the calls to
reduce_inter() introduce serialization. A much better implementation
would be:
def allreduce_inter_dalcinl(obj, op, comm, tag, localcomm):
    result = reduce_binomial(obj, op, 0, localcomm, tag)
    if comm.rank == 0:
        result = comm.sendrecv(result, 0, tag, None, 0, tag)
    return localcomm.bcast(result, 0)
i.e., perform (overlapped) reductions in the local groups, exchange
results between local and remote rank 0, and then perform (overlapped)
broadcasts in the local groups.
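The data flow of these three phases can be sketched without MPI at all.
The following is only an illustration on two plain Python lists (one
value per rank); the name simulate_allreduce_inter and the list-based
"groups" are assumptions made up for this sketch and are not part of
the attached script or of MPICH:

```python
from functools import reduce
import operator

def simulate_allreduce_inter(group_a, group_b, op):
    """Model the three phases on two plain lists (one value per rank)."""
    # Phase 1: each group reduces its own values onto its local rank 0.
    partial_a = reduce(op, group_a)
    partial_b = reduce(op, group_b)
    # Phase 2: the two rank-0 processes swap their partial results.
    partial_a, partial_b = partial_b, partial_a
    # Phase 3: each rank 0 broadcasts what it received within its group.
    return [partial_a] * len(group_a), [partial_b] * len(group_b)

res_a, res_b = simulate_allreduce_inter(
    [1, 2, 3, 4], [10, 20, 30, 40], operator.add)
print(res_a)  # [100, 100, 100, 100]
print(res_b)  # [10, 10, 10, 10]
```

Every rank ends up with the reduction of the remote group's data, which
is what an intercommunicator allreduce is expected to deliver.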
I'm attaching a test Python script (I do not expect you to run it :-),
but perhaps you want to see the code). I'm defining a reduce operation
that artificially sleeps 1 second. Running this code on 8 cores on my
desktop clearly shows the issue with the MPICH implementation:
$ mpiexec -n 8 python test-reduce.py
[mpich] time: min=4.003491e+00 max=4.003569e+00
[dalcinl] time: min=2.002367e+00 max=2.002456e+00
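These timings are consistent with a rough cost model. The sketch below
is my own back-of-the-envelope estimate, under assumptions that are not
stated above: the 8 processes are split into two groups of 4, each
local reduction follows a binomial tree (so its critical path has
ceil(log2(n)) applications of the 1-second operation), and
communication and broadcast costs are negligible. The helper names are
hypothetical:

```python
import math

def binomial_reduce_steps(nprocs):
    """Sequential op applications on the critical path of a binomial-tree reduce."""
    return math.ceil(math.log2(nprocs)) if nprocs > 1 else 0

def estimate_times(group_size, op_seconds=1.0):
    steps = binomial_reduce_steps(group_size)
    # Two inter-group reductions executed one after the other.
    serialized = 2 * steps * op_seconds
    # Both local reductions run concurrently, followed by one exchange.
    overlapped = steps * op_seconds
    return serialized, overlapped

print(estimate_times(4))  # (4.0, 2.0)
```

Under these assumptions the serialized variant costs about 4 seconds
and the overlapped variant about 2 seconds.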
What do you think? Am I right? Or perhaps I'm missing something obvious?
--
Lisandro Dalcin
---------------
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-allreduce.py
Type: text/x-python
Size: 3404 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/devel/attachments/20140430/40383431/attachment.py>