[mpich-discuss] Can't receive messages

Jeff Hammond jeff.science at gmail.com
Wed Jan 1 17:58:29 CST 2014


On Wed, Jan 1, 2014 at 4:34 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> On Jan 1, 2014, at 4:09 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>> 3. Doing an extra ssh to all the nodes is expensive to do every time.
>>
>> You just need a ring to verify homogeneity.  Forming a ring is O(1) per node.
>
> What is this ring you keep referring to?  There’s no ring in hydra.  Setting one up will lead to more trouble than be useful, with inter-proxy connections and such.  I bet that’ll create more queries on this mailing list than help.

A ring is the minimum spanning topology required to verify
homogeneity.  Feel free to use something less efficient if that's
already available in Hydra.  I assume that Hydra gives each process
the ability to communicate with _at least_ one process outside its
node, and that the connections it can establish are enough to connect
all nodes; otherwise MPI_Init wouldn't be possible.

My fundamental conviction is that Hydra could verify homogeneity by
exchanging and comparing a magic value, created by hashing the set of
independent datatype sizes, at negligible overhead compared to
whatever bootstrap it already does.
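
As a rough illustration of the idea (not Hydra code, and in Python only
for brevity), the sketch below hashes a list of C datatype sizes into a
short magic value and has each node compare its value with its ring
neighbor's.  The list of types, the helper names, and the use of
SHA-256 are all assumptions made for the example; the ring is simulated
with a plain list instead of real connections.

```python
import ctypes
import hashlib

# Hypothetical selection of types; a real check would cover whatever
# sizes the MPI library actually depends on.
C_TYPES = [
    ctypes.c_char, ctypes.c_short, ctypes.c_int, ctypes.c_long,
    ctypes.c_longlong, ctypes.c_float, ctypes.c_double,
    ctypes.c_longdouble, ctypes.c_void_p, ctypes.c_size_t,
]


def local_sizes():
    """Return the sizes (in bytes) of the chosen C types on this node."""
    return tuple(ctypes.sizeof(t) for t in C_TYPES)


def magic_value(sizes):
    """Hash a tuple of sizes into a short hex string."""
    data = ",".join(str(s) for s in sizes).encode("ascii")
    return hashlib.sha256(data).hexdigest()[:16]


def ring_is_homogeneous(magics):
    """Simulate one pass around a ring of nodes.

    Node i compares its magic value with that of node (i + 1) % n, so
    each node does a single comparison.  If every neighboring pair
    matches, all values are equal by transitivity.
    """
    n = len(magics)
    return all(magics[i] == magics[(i + 1) % n] for i in range(n))


if __name__ == "__main__":
    mine = magic_value(local_sizes())
    # Four simulated nodes that all report the same sizes.
    print(ring_is_homogeneous([mine] * 4))
```

Each node sends and compares a single fixed-length value, which is
where the O(1)-per-node cost comes from in this sketch.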

Jeff

-- 
Jeff Hammond
jeff.science at gmail.com


