[mpich-discuss] Implementation of MPICH collectives
Pavan Balaji
balaji at mcs.anl.gov
Fri Sep 13 19:56:47 CDT 2013
Not really. It shouldn't be using the nemesis fastbox. Are you setting the environment variable correctly?
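For example, one way to make sure it reaches every MPI process is to pass it through mpiexec (assuming Hydra's mpiexec, whose -genv option exports an environment variable to all processes):

  mpiexec -n 2 -genv MPIR_PARAM_CH3_NOLOCAL 1 <program>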
-- Pavan
On Sep 13, 2013, at 7:08 PM, Jiri Simsa wrote:
> To be more precise, I don't see any such call before MPI_Bcast() returns in the root. Is MPICH buffering the data to be broadcast until some later point?
>
> --Jiri
>
>
> On Fri, Sep 13, 2013 at 7:55 PM, Jiri Simsa <jsimsa at cs.cmu.edu> wrote:
> Well, it seems like it is copying data from the "nemesis fastbox". More importantly, I don't see any call to socket(), connect(), send(), sendto(), or sendmsg() that I would expect to be part of the data transfer.
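>
> For reference, I am watching for these calls with something like:
>
>   strace -f -e trace=network mpiexec -n 2 <program>
>
> where "trace=network" selects the socket-related system calls and -f follows the children that mpiexec forks.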
>
> --Jiri
>
>
> On Fri, Sep 13, 2013 at 5:44 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> Depends on what the memcpy is doing. It might be some internal data manipulation.
>
> -- Pavan
>
> On Sep 13, 2013, at 4:34 PM, Jiri Simsa wrote:
>
> > Hm, I set that variable and then stepped through a program that calls MPI_Bcast (using mpiexec -n 2 <program> on a single node). MPI_Bcast still seems to use memcpy(), whereas I would expect it to use the sockets interface. Is the memcpy() to be expected?
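> >
> > For reference, my test is essentially this minimal sketch (not the exact program, but the same structure):
> >
> > #include <mpi.h>
> > #include <stdio.h>
> >
> > int main(int argc, char **argv) {
> >     int rank, value = 0;
> >     MPI_Init(&argc, &argv);
> >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> >     if (rank == 0) value = 42;  /* only the root fills the buffer */
> >     MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
> >     printf("rank %d: value = %d\n", rank, value);
> >     MPI_Finalize();
> >     return 0;
> > }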
> >
> > --Jiri
> >
> >
> > On Fri, Sep 13, 2013 at 10:25 AM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> >
> > Yes, you can set the environment variable MPIR_PARAM_CH3_NOLOCAL=1.
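> >
> > For example (in a Bourne-like shell):
> >
> >   export MPIR_PARAM_CH3_NOLOCAL=1
> >   mpiexec -n 2 <program>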
> >
> > -- Pavan
> >
> > On Sep 13, 2013, at 7:53 AM, Jiri Simsa wrote:
> >
> > > Pavan,
> > >
> > > Thank you for your answer. That's precisely what I was looking for. Any chance there is a way to force the intranode communication to use tcp?
> > >
> > > --Jiri
> > >
> > > Within the node, it uses shared memory. Outside the node, it depends on the netmod you configured with. tcp is the default netmod.
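> > >
> > > In other words, a default build is configured as if you had run:
> > >
> > >   ./configure --with-device=ch3:nemesis:tcp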
> > > -- Pavan
> > > On Sep 12, 2013, at 2:24 PM, Jiri Simsa wrote:
> > > > The high-order bit of my question is: What OS interface(s) does MPICH use to transfer data from one MPI process to another?
> > > >
> > > >
> > > > On Thu, Sep 12, 2013 at 1:36 PM, Jiri Simsa <jsimsa at cs.cmu.edu> wrote:
> > > > Hello,
> > > >
> > > > I have been trying to understand how MPICH implements collective operations. To do so, I have been reading the MPICH source code and stepping through mpiexec executions.
> > > >
> > > > For the sake of this discussion, let's assume that all MPI processes are executed on the same computer using: mpiexec -n <n> <mpi_program>
> > > >
> > > > This is my current abstract understanding of MPICH:
> > > >
> > > > - mpiexec spawns a hydra_pmi_proxy process, which in turn spawns <n> instances of <mpi_program>
> > > > - the hydra_pmi_proxy process uses socket pairs to communicate with the instances of <mpi_program> (see the sanity check below)
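> > > >
> > > > One way to sanity-check this process tree (assuming pstree is available):
> > > >
> > > >   pstree -p <mpiexec_pid>
> > > >
> > > > which shows mpiexec forking a hydra_pmi_proxy, which in turn forks the <n> instances of <mpi_program>.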
> > > >
> > > > I am not quite sure, though, what happens under the hood when a collective operation, such as MPI_Allreduce, is executed. I have noticed that instances of <mpi_program> create and listen on a socket in the course of executing MPI_Allreduce, but I am not sure who connects to these sockets. Any chance someone could describe the data flow inside MPICH when a collective operation, such as MPI_Allreduce, is executed? Thanks!
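> > > >
> > > > For concreteness, here is a minimal sketch of the kind of program I have been stepping through:
> > > >
> > > > #include <mpi.h>
> > > > #include <stdio.h>
> > > >
> > > > int main(int argc, char **argv) {
> > > >     int rank, sum = 0;
> > > >     MPI_Init(&argc, &argv);
> > > >     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
> > > >     /* every process contributes its rank; all receive the sum */
> > > >     MPI_Allreduce(&rank, &sum, 1, MPI_INT, MPI_SUM, MPI_COMM_WORLD);
> > > >     printf("rank %d: sum = %d\n", rank, sum);
> > > >     MPI_Finalize();
> > > >     return 0;
> > > > }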
> > > >
> > > > Best,
> > > >
> > > > --Jiri Simsa
> > > >
> > > > _______________________________________________
> > > > discuss mailing list discuss at mpich.org
> > > > To manage subscription options or unsubscribe:
> > > > https://lists.mpich.org/mailman/listinfo/discuss
> > > --
> > > Pavan Balaji
> > > http://www.mcs.anl.gov/~balaji
> >
> > --
> > Pavan Balaji
> > http://www.mcs.anl.gov/~balaji
> >
> >
>
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
>
>
>
--
Pavan Balaji
http://www.mcs.anl.gov/~balaji