[mpich-discuss] Can't receive messages

Jeff Hammond jeff.science at gmail.com
Wed Jan 1 18:30:16 CST 2014


On Wed, Jan 1, 2014 at 6:19 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> On Jan 1, 2014, at 5:58 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>> What is this ring you keep referring to?  There’s no ring in hydra.  Setting one up will cause more trouble than it is worth, with inter-proxy connections and such.  I bet that’ll create more queries on this mailing list than it resolves.
>>
>> A ring is the minimum spanning topology required to verify
>> homogeneity.  Feel free to use something less efficient if that's
>> already available in Hydra.  I assume that Hydra provides each process
>> with the ability to communicate with _at least_ one process outside its
>> node and that the connections it has the ability to establish have the
>> ability to connect all nodes, otherwise MPI_Init wouldn't be possible.
>
> Hydra does not do any of those connections.  That’s done by the mpich library.
>
> Furthermore, the hydra proxies starting and communicating with mpiexec already assumes homogeneity.  This might not work for heterogeneous platforms at all.  So I can’t set up a ring or any other topology.
>
>> My fundamental conviction is that Hydra could verify homogeneity by
>> exchanging and comparing a magic value created by hashing the set of
>> independent datatype sizes at negligible overhead compared to whatever
>> bootstrap it does already.
>
> No, it can’t.  How will hydra exchange any information efficiently without starting a proxy?  The proxies and mpiexec cannot communicate with each other in a heterogeneous environment.  The only thing that is heterogeneity-safe is “ssh” (or some other launcher).  The moment I use ssh to spawn a proxy and expect the proxy to talk to either mpiexec or another proxy, I’m already assuming homogeneity.
>
> The only thing you can do is have mpiexec ssh to each node, and get the information needed.  This is (1) not scalable, and (2) not viable unless we start distributing a program that gets the datatype information for all datatypes we care about and compare the bits inside mpiexec.

Can't you do this in a tree rather than root-to-all?
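The idea discussed in this thread (hash the datatype sizes into a magic value, then compare it along a tree instead of root-to-all) can be sketched in C as follows. This is a hypothetical illustration, not Hydra or MPICH code: the names `datatype_magic`, `format_magic`, and `subtree_homogeneous`, the choice of 64-bit FNV-1a as the hash, and the list of types hashed are all assumptions. The tree is simulated over an in-memory array (node i has children 2i+1 and 2i+2); in a real launcher each node would instead send its encoded value only to its parent.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdio.h>
#include <string.h>

/* 64-bit FNV-1a over an arbitrary byte buffer (hash choice is an assumption). */
static uint64_t fnv1a_update(uint64_t h, const void *data, size_t len) {
    const unsigned char *p = data;
    for (size_t i = 0; i < len; i++) {
        h ^= p[i];
        h *= 1099511628211ULL; /* FNV-1a 64-bit prime */
    }
    return h;
}

/* Hypothetical "magic value": a hash of the sizes of some basic C types.
   Hashing the raw bytes of the uint64_t array also folds in byte order. */
static uint64_t datatype_magic(void) {
    const uint64_t sizes[] = {
        sizeof(char), sizeof(short), sizeof(int), sizeof(long),
        sizeof(long long), sizeof(float), sizeof(double),
        sizeof(long double), sizeof(void *)
    };
    return fnv1a_update(14695981039346656037ULL, sizes, sizeof sizes);
}

/* Encode the value as 16 hex digits so the exchange itself does not
   depend on binary integer layout. buf must hold at least 17 bytes. */
static void format_magic(uint64_t h, char buf[17]) {
    snprintf(buf, 17, "%016llx", (unsigned long long)h);
}

/* Returns 1 if every node in the binary subtree rooted at i carries the
   same magic value as node i, else 0. Each node only compares against
   its (at most two) children, so the tree depth is O(log n). */
static int subtree_homogeneous(const uint64_t *magic, size_t n, size_t i) {
    assert(i < n);
    for (size_t c = 2 * i + 1; c <= 2 * i + 2 && c < n; c++) {
        if (magic[c] != magic[i] || !subtree_homogeneous(magic, n, c))
            return 0;
    }
    return 1;
}
```

Because every node talks only to its parent and at most two children, no node handles more than three values and the check completes in O(log n) rounds, whereas root-to-all makes the root handle all n values. Since the hash covers the in-memory bytes of the size array, nodes with different byte order will almost certainly produce different values, and the hex encoding keeps the comparison itself from assuming a shared binary layout.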

Jeff

-- 
Jeff Hammond
jeff.science at gmail.com


