<div dir="ltr">Hello,<div><br></div><div>I have been trying to understand how MPICH implements collective operations. To do so, I have been reading the MPICH source code and stepping through mpiexec executions. </div><div>
<br></div><div>For the sake of this discussion, let's assume that all MPI processes run on the same machine and are launched with: mpiexec -n &lt;n&gt; &lt;mpi_program&gt;</div><div><br></div><div>This is my current high-level understanding of MPICH:</div>
<div><br></div><div>- mpiexec spawns a hydra_pmi_proxy process, which in turn spawns &lt;n&gt; instances of &lt;mpi_program&gt;</div><div>- the hydra_pmi_proxy process uses socket pairs to communicate with the instances of &lt;mpi_program&gt;<br>
</div><div><br></div><div>I am not quite sure, though, what happens under the hood when a collective operation such as MPI_Allreduce is executed. I have noticed that the instances of &lt;mpi_program&gt; create and listen on a socket while executing MPI_Allreduce, but I am not sure who connects to these sockets. Could someone describe the data flow inside MPICH when a collective operation such as MPI_Allreduce is executed? Thanks!</div>
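<div><br></div><div>To make the second bullet above concrete, here is a minimal sketch of what I mean by a socket pair between a launcher and a launched process. It is plain POSIX (socketpair, fork), not Hydra or MPICH code, and the function name proxy_child_roundtrip is made up for illustration. The parent plays the role of the proxy and the child the role of one program instance that simply echoes a message back.</div>

```c
#include <stdio.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/types.h>
#include <sys/wait.h>
#include <unistd.h>

/* Hypothetical helper (not part of MPICH or Hydra).
 * The parent stands in for the proxy, the child for one launched program.
 * Sends msg to the child over a socket pair and stores the echoed reply.
 * Returns 0 on success, -1 on any error. */
int proxy_child_roundtrip(const char *msg, char *reply, size_t reply_len)
{
    int sv[2];
    size_t len = strlen(msg);

    /* Keep the sketch simple: one short, non-empty message. */
    if (len == 0 || len > 255 || reply_len <= len)
        return -1;

    /* Create two connected, bidirectional endpoints. */
    if (socketpair(AF_UNIX, SOCK_STREAM, 0, sv) != 0)
        return -1;

    pid_t pid = fork();
    if (pid < 0) {
        close(sv[0]);
        close(sv[1]);
        return -1;
    }

    if (pid == 0) {
        /* Child: keep sv[1] and echo back whatever arrives. */
        char buf[256];
        close(sv[0]);
        ssize_t n = read(sv[1], buf, sizeof buf);
        if (n > 0 && write(sv[1], buf, (size_t)n) != n)
            _exit(1);
        close(sv[1]);
        _exit(0);
    }

    /* Parent: keep sv[0], send the message, then wait for the echo. */
    close(sv[1]);
    ssize_t n = -1;
    if (write(sv[0], msg, len) == (ssize_t)len)
        n = read(sv[0], reply, reply_len - 1);
    close(sv[0]);
    waitpid(pid, NULL, 0);

    if (n < 0)
        return -1;
    reply[n] = '\0';
    return 0;
}
```

<div>Calling proxy_child_roundtrip("hello", buf, sizeof buf) from a small main should return 0 and leave the echoed message in buf. This only illustrates the launcher-to-child channel; it says nothing about the listening sockets I see during MPI_Allreduce, which is the part I am asking about.</div>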
<div><br></div><div>Best,</div><div><br></div><div>--Jiri Simsa</div></div>