[mpich-discuss] Can't receive messages

Jeff Hammond jeff.science at gmail.com
Wed Jan 1 17:58:29 CST 2014

On Wed, Jan 1, 2014 at 4:34 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> On Jan 1, 2014, at 4:09 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>> 3. Doing an extra ssh to all the nodes is expensive to do every time.
>> You just need a ring to verify homogeneity.  Forming a ring is O(1) per node.
> What is this ring you keep referring to?  There’s no ring in hydra.  Setting one up will cause more trouble than it is worth, with inter-proxy connections and such.  I bet that’ll create more queries on this mailing list than it helps.

A ring is the minimum spanning topology required to verify
homogeneity.  Feel free to use something less efficient if that's
already available in Hydra.  I assume that Hydra provides each process
the ability to communicate with _at least_ one process outside its
node, and that the connections it can establish are sufficient to
connect all nodes; otherwise MPI_Init wouldn't be possible.

My fundamental conviction is that Hydra could verify homogeneity by
exchanging and comparing a magic value, created by hashing the set of
independent datatype sizes, at negligible overhead compared to whatever
bootstrap it already does.
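
To make the idea concrete, here is a rough sketch of the check.  It is
not Hydra code.  The type list, the function names (local_sizes,
magic_value, ring_is_homogeneous), and the choice of SHA-256 are all
assumptions made for illustration, and the ring is simulated inside one
process rather than over real proxy connections.  Each node hashes its
own datatype sizes into a magic value and compares it with its ring
successor's value, which is one exchange and one comparison per node.

```python
import ctypes
import hashlib

# Hypothetical selection of basic C types whose sizes can differ between
# platforms. A real launcher would pick whatever set it cares about.
BASIC_TYPES = [
    ("short", ctypes.c_short),
    ("int", ctypes.c_int),
    ("long", ctypes.c_long),
    ("long long", ctypes.c_longlong),
    ("float", ctypes.c_float),
    ("double", ctypes.c_double),
    ("long double", ctypes.c_longdouble),
    ("void *", ctypes.c_void_p),
    ("size_t", ctypes.c_size_t),
]


def local_sizes():
    """Return (type name, size in bytes) pairs as seen on this machine."""
    return [(name, ctypes.sizeof(ctype)) for name, ctype in BASIC_TYPES]


def magic_value(sizes):
    """Hash a list of (name, size) pairs into one fixed-length signature."""
    text = ";".join(f"{name}={size}" for name, size in sizes)
    return hashlib.sha256(text.encode("ascii")).hexdigest()


def ring_is_homogeneous(magics):
    """Simulate the ring: node i compares its value with node (i + 1) % n.

    Each node does a single comparison. If every adjacent pair agrees,
    all values around the ring are equal.
    """
    n = len(magics)
    return all(magics[i] == magics[(i + 1) % n] for i in range(n))


if __name__ == "__main__":
    mine = magic_value(local_sizes())
    # Pretend four nodes all reported the same signature as this machine.
    print(ring_is_homogeneous([mine] * 4))
```

In a real launcher each node would still have to report a mismatch to
someone, for example by forwarding a flag around the ring, so treat the
all() call above as a stand-in for that step.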


Jeff Hammond
jeff.science at gmail.com
