[mpich-discuss] Implementation of MPICH collectives

Jiri Simsa jsimsa at cs.cmu.edu
Thu Sep 12 12:36:45 CDT 2013


Hello,

I have been trying to understand how MPICH implements collective
operations. To do so, I have been reading the MPICH source code and
stepping through mpiexec executions.

For the sake of this discussion, let's assume that all MPI processes are
executed on the same computer using: mpiexec -n <n> <mpi_program>

This is my current abstract understanding of MPICH:

- mpiexec spawns a hydra_pmi_proxy process, which in turn spawns <n>
instances of <mpi_program>
- the hydra_pmi_proxy process uses socket pairs to communicate with the
instances of <mpi_program>
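
To make the picture above concrete, here is a minimal Python sketch of the
pattern as I understand it: a parent process (standing in for
hydra_pmi_proxy) forks <n> children (standing in for instances of
<mpi_program>) and keeps one end of a socket pair per child. This is not
MPICH code; run_proxy and the "hello from rank" message are hypothetical
names made up for illustration, and it assumes a POSIX system where
os.fork is available.

```python
import os
import socket


def run_proxy(n):
    """Fork n children, each connected to the parent by its own socket pair.

    Hypothetical illustration only: the parent plays the role of the proxy,
    each child plays the role of one launched program instance.
    """
    children = []  # (parent's end of the socket pair, child pid)
    for rank in range(n):
        parent_end, child_end = socket.socketpair()
        pid = os.fork()
        if pid == 0:
            # Child: drop the descriptors that belong to the parent.
            parent_end.close()
            for sock, _ in children:
                sock.close()
            child_end.sendall(f"hello from rank {rank}\n".encode())
            child_end.close()
            os._exit(0)  # leave immediately, skipping the parent's code path
        # Parent: keep only its own end of the pair.
        child_end.close()
        children.append((parent_end, pid))

    messages = []
    for sock, pid in children:
        with sock:
            data = b""
            # Read until the child closes its end (recv returns b"").
            while chunk := sock.recv(1024):
                data += chunk
        os.waitpid(pid, 0)  # reap the child
        messages.append(data.decode().strip())
    return messages


if __name__ == "__main__":
    for line in run_proxy(3):
        print(line)
```

In this sketch each child can only talk to the parent, not to its
siblings, which is exactly why the extra listening sockets described
below puzzle me.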

I am not quite sure, though, what happens under the hood when a collective
operation, such as MPI_Allreduce, is executed. I have noticed that
instances of <mpi_program> create and listen on a socket in the course of
executing MPI_Allreduce, but I am not sure who connects to these sockets.
Any chance someone could describe the data flow inside of MPICH when a
collective operation, such as MPI_Allreduce, is executed? Thanks!

Best,

--Jiri Simsa