[mpich-discuss] Can't receive messages
Jeff Hammond
jeff.science at gmail.com
Wed Jan 1 17:58:29 CST 2014
On Wed, Jan 1, 2014 at 4:34 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
>
> On Jan 1, 2014, at 4:09 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>> 3. Doing an extra ssh to all the nodes is expensive to do every time.
>>
>> You just need a ring to verify homogeneity. Forming a ring is O(1) per node.
>
> What is this ring you keep referring to? There’s no ring in hydra. Setting one up will lead to more trouble than it’s worth, with inter-proxy connections and such. I bet that’ll create more queries on this mailing list than help.
A ring is the minimum spanning topology required to verify
homogeneity. Feel free to use something less efficient if that's
already available in Hydra. I assume that Hydra gives each process
the ability to communicate with _at least_ one process outside its
node, and that the connections it can establish are sufficient to
connect all nodes; otherwise MPI_Init wouldn't be possible.
My fundamental conviction is that Hydra could verify homogeneity by
exchanging and comparing a magic value created by hashing the set of
independent datatype sizes, at negligible overhead compared to
whatever bootstrap it does already.
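
To make the idea concrete, here is a minimal sketch in Python. It is not
Hydra code, and nothing like it is claimed to exist in Hydra. The list of
C types, the SHA-256-based hash, and the names local_magic and
ring_is_homogeneous are all assumptions made for illustration. The ring is
simulated with a plain list, where each entry stands for the magic value
computed on one node.

```python
import ctypes
import hashlib
import struct

# Hypothetical choice of C types to fingerprint; the post does not list them.
C_TYPES = [
    ctypes.c_char, ctypes.c_short, ctypes.c_int, ctypes.c_long,
    ctypes.c_longlong, ctypes.c_float, ctypes.c_double,
    ctypes.c_longdouble, ctypes.c_void_p, ctypes.c_size_t,
]


def local_magic(sizes=None):
    """Hash a list of datatype sizes into a 64-bit magic value.

    With no argument, the sizes are measured on the machine running this code.
    """
    if sizes is None:
        sizes = [ctypes.sizeof(t) for t in C_TYPES]
    # Pack with a fixed byte order so the encoding itself is node-independent.
    packed = struct.pack("<%dI" % len(sizes), *sizes)
    return int.from_bytes(hashlib.sha256(packed).digest()[:8], "little")


def ring_is_homogeneous(magics):
    """Simulated ring check: node i compares its value with node (i + 1) % n.

    Each node does one comparison. If every adjacent pair matches, all values
    are equal by transitivity.
    """
    n = len(magics)
    return all(magics[i] == magics[(i + 1) % n] for i in range(n))
```

In a real launcher each node would only know the result of its own
comparison, so the per-node results would still have to be combined
somehow. The sketch simply ANDs them together. It also only compares
sizes, so it would not notice differences such as byte order.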
Jeff
--
Jeff Hammond
jeff.science at gmail.com