[mpich-discuss] Implementation of MPICH collectives

Jiri Simsa jsimsa at cs.cmu.edu
Fri Sep 13 19:08:36 CDT 2013


To be more precise, I don't see any such call before MPI_Bcast() returns in
the root. Is MPICH buffering the data to be broadcast until some later point?
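For context, MPICH typically implements short-message MPI_Bcast as a binomial tree, and within a node each "send" is a copy into a shared-memory cell, so the root can return as soon as its own copies complete, before the data reaches every rank. The following is a hedged, MPI-free sketch of that tree ordering (plain C with simulated rank buffers, root fixed at rank 0; the actual MPICH code path differs and can vary by version):

```c
#include <assert.h>
#include <string.h>

#define NPROC 8   /* simulated number of ranks */
#define LEN   4   /* ints per broadcast buffer */

/* Simulated binomial-tree broadcast from rank 0: in the round with
 * distance d, every rank r < d that already holds the data "sends"
 * to rank r + d. In MPICH the intranode send is a memcpy into a
 * shared-memory queue (e.g. a nemesis fastbox), not a socket call. */
static void binomial_bcast(int data[NPROC][LEN])
{
    for (int dist = 1; dist < NPROC; dist <<= 1)
        for (int r = 0; r < dist; r++)
            if (r + dist < NPROC)
                memcpy(data[r + dist], data[r], sizeof data[r]);
}
```

Note that rank 0 is done after log2(NPROC) copies of its own; the last ranks receive the data strictly later, which matches a root returning from MPI_Bcast before the data is visible everywhere.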

--Jiri


On Fri, Sep 13, 2013 at 7:55 PM, Jiri Simsa <jsimsa at cs.cmu.edu> wrote:

> Well, it seems like it is copying data from the "nemesis fastbox". More
> importantly, I don't see any call to socket(), connect(), send(),
> sendto(), or sendmsg() that I would expect to be part of the data transfer.
>
> --Jiri
>
>
> On Fri, Sep 13, 2013 at 5:44 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
>>
>> Depends on what the memcpy is doing.  It might be some internal data
>> manipulation.
>>
>>  -- Pavan
>>
>> On Sep 13, 2013, at 4:34 PM, Jiri Simsa wrote:
>>
>> > Hmm, I have set that variable and then stepped through a program
>> that calls MPI_Bcast() (using mpiexec -n 2 <program> on a single node).
>> MPI_Bcast() still seems to use memcpy(), whereas I would expect it to use
>> the sockets interface. Is the memcpy() to be expected?
>> >
>> > --Jiri
>> >
>> >
>> > On Fri, Sep 13, 2013 at 10:25 AM, Pavan Balaji <balaji at mcs.anl.gov>
>> wrote:
>> >
>> > Yes, you can set the environment variable MPIR_PARAM_CH3_NOLOCAL=1.
>> >
>> >  -- Pavan
>> >
>> > On Sep 13, 2013, at 7:53 AM, Jiri Simsa wrote:
>> >
>> > > Pavan,
>> > >
>> > > Thank you for your answer. That's precisely what I was looking for.
>> Any chance there is a way to force the intranode communication to use tcp?
>> > >
>> > > --Jiri
>> > >
>> > > Within the node, it uses shared memory.  Outside the node, it depends
>> on the netmod you configured with.  tcp is the default netmod.
>> > >  -- Pavan
>> > > On Sep 12, 2013, at 2:24 PM, Jiri Simsa wrote:
>> > > > The high-order bit of my question is: What OS interface(s) does
>> MPICH use to transfer data from one MPI process to another?
>> > > >
>> > > >
>> > > > On Thu, Sep 12, 2013 at 1:36 PM, Jiri Simsa <jsimsa at cs.cmu.edu>
>> wrote:
>> > > > Hello,
>> > > >
>> > > > I have been trying to understand how MPICH implements collective
>> operations. To do so, I have been reading the MPICH source code and
>> stepping through mpiexec executions.
>> > > >
>> > > > For the sake of this discussion, let's assume that all MPI
>> processes are executed on the same computer using: mpiexec -n <n>
>> <mpi_program>
>> > > >
>> > > > This is my current abstract understanding of MPICH:
>> > > >
>> > > > - mpiexec spawns a hydra_pmi_proxy process, which in turn spawns
>> <n> instances of <mpi_program>
>> > > > - the hydra_pmi_proxy process uses socket pairs to communicate with
>> the instances of <mpi_program>
>> > > >
>> > > > I am not quite sure, though, what happens under the hood when a
>> collective operation, such as MPI_Allreduce, is executed. I have noticed
>> that instances of <mpi_program> create and listen on a socket in the course
>> of executing MPI_Allreduce, but I am not sure who connects to these sockets.
>> Any chance someone could describe the data flow inside MPICH when a
>> collective operation, such as MPI_Allreduce, is executed? Thanks!
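For short messages, MPICH's default MPI_Allreduce algorithm is recursive doubling: in round k, rank r exchanges its partial result with the rank whose ID differs in bit k, so after log2(n) rounds every rank holds the full reduction. Intranode those exchanges are shared-memory copies; internode they go through the configured netmod (tcp by default). A hedged, MPI-free sketch of the exchange pattern (rank states simulated as array slots, power-of-two rank count assumed; the real implementation also has other algorithms for large messages):

```c
#include <assert.h>

#define NPROC 8   /* simulated number of ranks; power of two assumed */

/* Simulated recursive-doubling allreduce (sum): in the round with
 * bitmask m, rank r exchanges partial sums with partner r XOR m and
 * both add what they receive. After log2(NPROC) rounds every slot
 * holds the global sum. */
static void recursive_doubling_sum(long val[NPROC])
{
    for (int mask = 1; mask < NPROC; mask <<= 1) {
        long next[NPROC];
        for (int r = 0; r < NPROC; r++)
            next[r] = val[r] + val[r ^ mask];   /* exchange + reduce */
        for (int r = 0; r < NPROC; r++)
            val[r] = next[r];
    }
}
```

Each round pairs every rank with exactly one partner, which is why a trace shows point-to-point transfers (memcpy intranode, socket traffic internode) rather than any collective-specific OS primitive.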
>> > > >
>> > > > Best,
>> > > >
>> > > > --Jiri Simsa
>> > > >
>> > > > _______________________________________________
>> > > > discuss mailing list     discuss at mpich.org
>> > > > To manage subscription options or unsubscribe:
>> > > > https://lists.mpich.org/mailman/listinfo/discuss
>> > > --
>> > > Pavan Balaji
>> > > http://www.mcs.anl.gov/~balaji
>> >
>> > --
>> > Pavan Balaji
>> > http://www.mcs.anl.gov/~balaji
>> >
>> >
>>
>> --
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
>>
>>
>

