[mpich-devel] Suboptimal MPI_Allreduce() for intercommunicators

Lisandro Dalcin dalcinl at gmail.com
Wed Apr 30 04:44:49 CDT 2014


The implementation of Allreduce for intercommunicators
(MPIR_Allreduce_inter in src/mpi/coll/allreduce.c) uses more or less
the following algorithm (this is Python code I'm using to test this
issue):

def allreduce_inter_mpich(obj, op, comm, tag, localcomm, low_group):
    zero = 0
    if comm.rank == 0:
        root = MPI.ROOT
    else:
        root = MPI.PROC_NULL
    if low_group:
        ignore = reduce_inter(obj, op, zero, comm, tag, localcomm)
        result = reduce_inter(obj, op, root, comm, tag, localcomm)
    else:
        result = reduce_inter(obj, op, root, comm, tag, localcomm)
        ignore = reduce_inter(obj, op, zero, comm, tag, localcomm)
    return localcomm.bcast(result, 0)


However, while the broadcasts in each group overlap, the two calls to
reduce_inter() run one after the other, so the two reductions are
serialized. A much better implementation would be:

def allreduce_inter_dalcinl(obj, op, comm, tag, localcomm):
    result = reduce_binomial(obj, op, 0, localcomm, tag)
    if comm.rank == 0:
        result = comm.sendrecv(result, 0, tag, None, 0, tag)
    return localcomm.bcast(result, 0)

i.e., perform (overlapped) reductions in the local groups, exchange
results between local and remote rank 0, and perform (overlapped)
broadcasts in the local groups.
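
The three phases above can be illustrated without MPI. This is only a
sketch: the two groups are plain Python lists, and the function name
allreduce_inter_simulated is hypothetical (it is not part of MPICH,
mpi4py, or the attached script). It shows the data flow only, not the
timing: each group ends up with the reduction of the *remote* group's
contributions, as an intercommunicator allreduce requires.

```python
from functools import reduce
import operator


def allreduce_inter_simulated(group_a, group_b, op):
    """Simulate the reduce / exchange / broadcast scheme on two lists.

    group_a and group_b hold one contribution per (simulated) process.
    Returns the per-process results for each group.
    """
    # Phase 1: each group reduces its own contributions. The two
    # reductions are independent, so on real hardware they can overlap.
    local_a = reduce(op, group_a)
    local_b = reduce(op, group_b)

    # Phase 2: the rank 0 processes of the two groups swap partial results.
    received_a, received_b = local_b, local_a

    # Phase 3: each rank 0 broadcasts what it received inside its own group.
    return [received_a] * len(group_a), [received_b] * len(group_b)


if __name__ == "__main__":
    res_a, res_b = allreduce_inter_simulated([1, 2, 3], [10, 20], operator.add)
    print(res_a)  # every process in group A holds the sum of group B
    print(res_b)  # every process in group B holds the sum of group A
```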

I'm attaching a test Python script (I do not expect you to run it :-),
but perhaps you want to see the code). I'm defining a reduce operation
that artificially sleeps 1 second. Running this code on 8 cores on my
desktop clearly shows the issue with the MPICH implementation:

$ mpiexec -n 8 python test-reduce.py
[mpich]   time: min=4.003491e+00 max=4.003569e+00
[dalcinl] time: min=2.002367e+00 max=2.002456e+00
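
A rough cost model that is consistent with these timings, under
assumptions not stated above: the 8 processes are split into two
groups of 4, each reduction inside a group follows a binomial tree,
each tree level costs one application of the 1-second operation, and
communication time is negligible. The function names are hypothetical
and exist only for this illustration.

```python
def binomial_reduce_levels(group_size):
    """Levels of a binomial tree over group_size processes, i.e. ceil(log2(n)).

    Assumption: one (slow) op application per level on the critical path.
    """
    return (group_size - 1).bit_length()


def model_serialized(group_size, op_seconds):
    # Two inter-group reductions run one after the other, and each one
    # is assumed to cost a full tree reduction in the sending group.
    return 2 * binomial_reduce_levels(group_size) * op_seconds


def model_overlapped(group_size, op_seconds):
    # Both groups reduce locally at the same time, so only one tree
    # reduction sits on the critical path.
    return binomial_reduce_levels(group_size) * op_seconds


if __name__ == "__main__":
    print(model_serialized(4, 1.0))
    print(model_overlapped(4, 1.0))
```

Under those assumptions the model gives 4 seconds for the serialized
scheme and 2 seconds for the overlapped one, matching the factor of
two measured above.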

What do you think? Am I right? Or perhaps I'm missing something obvious?


-- 
Lisandro Dalcin
---------------
CIMEC (UNL/CONICET)
Predio CONICET-Santa Fe
Colectora RN 168 Km 472, Paraje El Pozo
3000 Santa Fe, Argentina
Tel: +54-342-4511594 (ext 1016)
Tel/Fax: +54-342-4511169
-------------- next part --------------
A non-text attachment was scrubbed...
Name: test-allreduce.py
Type: text/x-python
Size: 3404 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/devel/attachments/20140430/40383431/attachment.py>

