<div dir="ltr">Yes. To verify the behavior, I wrote a simple test program:<div><br></div><div><div><font face="courier new, monospace">#include "mpi.h"</font></div><div><font face="courier new, monospace">#include <stdio.h></font></div><div><font face="courier new, monospace">#include <stdlib.h></font></div>

<div><font face="courier new, monospace">#include <string.h></font></div><div><font face="courier new, monospace"><br></font></div><div><font face="courier new, monospace">int main(int argc, char **argv) {</font></div>

<div><font face="courier new, monospace">  char message[256] = {0};  /* zero-filled so the string stays NUL-terminated on every rank */</font></div><div><font face="courier new, monospace">  int rank;</font></div><div><font face="courier new, monospace">  if (getenv("MPIR_PARAM_CH3_NO_LOCAL") != NULL) {</font></div>

<div><font face="courier new, monospace">    printf("MPIR_PARAM_CH3_NO_LOCAL = %s\n", getenv("MPIR_PARAM_CH3_NO_LOCAL"));</font></div><div><font face="courier new, monospace">  }</font></div><div><font face="courier new, monospace">  MPI_Init(&argc, &argv);</font></div>

<div><font face="courier new, monospace">  MPI_Comm_rank(MPI_COMM_WORLD, &rank);</font></div><div><font face="courier new, monospace">  if (rank == 0) { strncpy(message, "Hello!", strlen("Hello!")); }</font></div>

<div><font face="courier new, monospace">  MPI_Bcast(message, strlen("Hello!"), MPI_CHAR, 0, MPI_COMM_WORLD);</font></div><div><font face="courier new, monospace">  MPI_Finalize();</font></div><div><font face="courier new, monospace">  printf("%d: %s\n", rank, message);</font></div>

<div><font face="courier new, monospace">  return 0;</font></div><div><font face="courier new, monospace">}</font></div></div><div><br></div><div>When I run it with "mpiexec -n 2 ./simple" I get the following output:</div>

<div><br></div><div><div><font face="courier new, monospace">MPIR_PARAM_CH3_NO_LOCAL = 1</font></div><div><font face="courier new, monospace">MPIR_PARAM_CH3_NO_LOCAL = 1</font></div><div><font face="courier new, monospace">0: Hello!</font></div>

<div><font face="courier new, monospace">1: Hello!</font></div></div><div><br></div><div>I compiled mpich-3.0.4 with --enable-g=dbg,log and set the environment variables MPICH_DBG=FILE and MPICH_DBG_LEVEL=VERBOSE. I am attaching the log file for process 0, which shows (to the best of my understanding) that the broadcast still uses the nemesis fast box (fbox) and memcpy() to transfer the data. </div>
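<div><br></div><div>As a sanity check on the variable name, here is a minimal sketch that reports both spellings that have come up in this thread: MPIR_PARAM_CH3_NOLOCAL from the earlier reply and MPIR_PARAM_CH3_NO_LOCAL from my test program. The helper name describe_nolocal_env is made up for illustration, and I am not assuming that either spelling is the one mpich-3.0.4 actually reads.</div>

```c
#include <assert.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

/* The two spellings that appear in this thread. Which one (if either)
 * MPICH actually reads is an assumption to verify, not a fact. */
static const char *const candidate_names[] = {
    "MPIR_PARAM_CH3_NO_LOCAL",
    "MPIR_PARAM_CH3_NOLOCAL",
};

/* Hypothetical helper: appends one "NAME = value" or "NAME is unset" line
 * per candidate to buf and returns how many candidates are set. */
int describe_nolocal_env(char *buf, size_t buflen)
{
    int set_count = 0;
    size_t used = 0;
    size_t i;

    assert(buf != NULL && buflen > 0);
    buf[0] = '\0';

    for (i = 0; i < sizeof(candidate_names) / sizeof(candidate_names[0]); i++) {
        const char *value = getenv(candidate_names[i]);
        int n;

        if (value != NULL) {
            set_count++;
            n = snprintf(buf + used, buflen - used, "%s = %s\n",
                         candidate_names[i], value);
        } else {
            n = snprintf(buf + used, buflen - used, "%s is unset\n",
                         candidate_names[i]);
        }
        if (n < 0 || (size_t)n >= buflen - used) {
            break;  /* output error or truncation: stop appending */
        }
        used += (size_t)n;
    }
    return set_count;
}
```

<div>Calling describe_nolocal_env(buf, sizeof(buf)) before MPI_Init() and printing buf on each rank would show which spelling the launched processes actually inherit, which is only a check of the environment and says nothing about what MPICH does with it.</div>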

</div><div class="gmail_extra"><br><br><div class="gmail_quote">On Fri, Sep 13, 2013 at 8:56 PM, Pavan Balaji <span dir="ltr"><<a href="mailto:balaji@mcs.anl.gov" target="_blank">balaji@mcs.anl.gov</a>></span> wrote:<br>

<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>

Not really.  It shouldn't be using the nemesis fast box.  Are you setting the environment correctly?<br>

<span class="HOEnZb"><font color="#888888"><br>

 -- Pavan<br>

</font></span><div class="HOEnZb"><div class="h5"><br>

On Sep 13, 2013, at 7:08 PM, Jiri Simsa wrote:<br>

<br>

> To be more precise, I don't see any such call before MPI_Bcast() returns in the root. Is MPICH buffering the data to be broadcast until some later point?<br>

><br>

> --Jiri<br>

><br>

><br>

> On Fri, Sep 13, 2013 at 7:55 PM, Jiri Simsa <<a href="mailto:jsimsa@cs.cmu.edu">jsimsa@cs.cmu.edu</a>> wrote:<br>

> Well, it seems like it is copying data from the "nemesis fastbox". More importantly, I don't see any calls to socket(), connect(), send(), sendto(), or sendmsg() that I would expect to be part of the data transfer.<br>


><br>

> --Jiri<br>

><br>

><br>

> On Fri, Sep 13, 2013 at 5:44 PM, Pavan Balaji <<a href="mailto:balaji@mcs.anl.gov">balaji@mcs.anl.gov</a>> wrote:<br>

><br>

> Depends on what the memcpy is doing.  It might be some internal data manipulation.<br>

><br>

>  -- Pavan<br>

><br>

> On Sep 13, 2013, at 4:34 PM, Jiri Simsa wrote:<br>

><br>

> > Hm, I have set that variable and then I have stepped through a program that calls MPI_Bcast (using mpiexec -n 2 <program> on a single node). The MPI_Bcast still seems to use memcpy() while I would expect it to use the sockets interface. Is the memcpy() to be expected?<br>


> ><br>

> > --Jiri<br>

> ><br>

> ><br>

> > On Fri, Sep 13, 2013 at 10:25 AM, Pavan Balaji <<a href="mailto:balaji@mcs.anl.gov">balaji@mcs.anl.gov</a>> wrote:<br>

> ><br>

> > Yes, you can set the environment variable MPIR_PARAM_CH3_NOLOCAL=1.<br>

> ><br>

> >  -- Pavan<br>

> ><br>

> > On Sep 13, 2013, at 7:53 AM, Jiri Simsa wrote:<br>

> ><br>

> > > Pavan,<br>

> > ><br>

> > > Thank you for your answer. That's precisely what I was looking for. Any chance there is a way to force the intranode communication to use tcp?<br>

> > ><br>

> > > --Jiri<br>

> > ><br>

> > > Within the node, it uses shared memory.  Outside the node, it depends on the netmod you configured with.  tcp is the default netmod.<br>

> > >  -- Pavan<br>

> > > On Sep 12, 2013, at 2:24 PM, Jiri Simsa wrote:<br>

> > > > The high-order bit of my question is: What OS interface(s) does MPICH use to transfer data from one MPI process to another?<br>

> > > ><br>

> > > ><br>

> > > > On Thu, Sep 12, 2013 at 1:36 PM, Jiri Simsa <<a href="mailto:jsimsa@cs.cmu.edu">jsimsa@cs.cmu.edu</a>> wrote:<br>

> > > > Hello,<br>

> > > ><br>

> > > > I have been trying to understand how MPICH implements collective operations. To do so, I have been reading the MPICH source code and stepping through mpiexec executions.<br>

> > > ><br>

> > > > For the sake of this discussion, let's assume that all MPI processes are executed on the same computer using: mpiexec -n <n> <mpi_program><br>

> > > ><br>

> > > > This is my current abstract understanding of MPICH:<br>

> > > ><br>

> > > > - mpiexec spawns a hydra_pmi_proxy process, which in turn spawns <n> instances of <mpi_program><br>

> > > > - hydra_pmi_proxy process uses socket pairs to communicate with the instances of <mpi_program><br>

> > > ><br>

> > > > I am not quite sure, though, what happens under the hood when a collective operation, such as MPI_Allreduce, is executed. I have noticed that instances of <mpi_program> create and listen on a socket in the course of executing MPI_Allreduce, but I am not sure who connects to these sockets. Any chance someone could describe the data flow inside of MPICH when a collective operation, such as MPI_Allreduce, is executed? Thanks!<br>


> > > ><br>

> > > > Best,<br>

> > > ><br>

> > > > --Jiri Simsa<br>

> > > ><br>

> > > > _______________________________________________<br>

> > > > discuss mailing list     <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>

> > > > To manage subscription options or unsubscribe:<br>

> > > > <a href="https://lists.mpich.org/mailman/listinfo/discuss" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>

> > > --<br>

> > > Pavan Balaji<br>

> > > <a href="http://www.mcs.anl.gov/~balaji" target="_blank">http://www.mcs.anl.gov/~balaji</a><br>


</div></div></blockquote></div><br></div>