[mpich-discuss] Can't receive messages

Jeff Hammond jeff.science at gmail.com
Wed Jan 1 16:09:48 CST 2014




> On Jan 1, 2014, at 1:50 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> 
> 
>> On Jan 1, 2014, at 1:25 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>> Isn't the process manager infrastructure hetero-safe? At the very least, ssh-ing "uname -a" around the ring of procs can identify the problem at O(1) cost.
> 
> 1. The process manager is not heterogeneous-safe.

Is it possible to get a proc table that is sufficient for 3, though?

> 2. "uname -a" is not an accurate representation, since two different Linux distributions on the same architecture are not considered heterogeneous.  The datatype representation has to be different.
> 

Well, query whatever you need to query, then. MPICH could hash all the datatype sizes during configuration and store that value somewhere useful.
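
To make the hashing idea concrete, here is a rough sketch in Python rather than anything MPICH actually does. The list of types, the function name datatype_fingerprint, and the choice of SHA-256 are all assumptions for illustration; a real check would have to cover every type and layout property that MPI datatypes depend on.

```python
import ctypes
import hashlib
import sys

# Hypothetical selection of C types; not the list MPICH's configure inspects.
C_TYPES = [
    ("char", ctypes.c_char),
    ("short", ctypes.c_short),
    ("int", ctypes.c_int),
    ("long", ctypes.c_long),
    ("long long", ctypes.c_longlong),
    ("float", ctypes.c_float),
    ("double", ctypes.c_double),
    ("long double", ctypes.c_longdouble),
    ("void *", ctypes.c_void_p),
]


def datatype_fingerprint(sizes=None, byteorder=None):
    """Hash byte order plus (type name, size) pairs into one hex string.

    With no arguments, the values are read from the local platform.
    Passing explicit values lets you model some other platform.
    """
    if sizes is None:
        sizes = [(name, ctypes.sizeof(ctype)) for name, ctype in C_TYPES]
    if byteorder is None:
        byteorder = sys.byteorder

    digest = hashlib.sha256()
    digest.update(f"byteorder={byteorder};".encode("ascii"))
    for name, size in sizes:
        digest.update(f"{name}={size};".encode("ascii"))
    return digest.hexdigest()


if __name__ == "__main__":
    # Two nodes with the same printed value agree on these sizes and byte order.
    print(datatype_fingerprint())
```

Two nodes that print the same value agree on the byte order and on the size of every listed type, which is closer to what matters here than "uname -a".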

> 3. Doing an extra ssh to all the nodes is expensive to do every time.
> 

You just need a ring to verify homogeneity.  Forming a ring is O(1) per node. 
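
A toy illustration of the ring argument, simulated in a single Python process with no ssh or MPI involved. The function names and the idea of comparing one fingerprint string per node are assumptions for the sketch; how the verdict gets reported back to the launcher is left out.

```python
def ring_mismatches(fingerprints):
    """Return (rank, neighbor) pairs whose fingerprints differ.

    Each rank compares its value only with its successor in the ring,
    so the work per node is one comparison regardless of ring size.
    """
    n = len(fingerprints)
    mismatches = []
    for rank in range(n):
        neighbor = (rank + 1) % n  # wrap around to close the ring
        if fingerprints[rank] != fingerprints[neighbor]:
            mismatches.append((rank, neighbor))
    return mismatches


def is_homogeneous(fingerprints):
    """True when every rank matches its successor.

    Equality is transitive, so a ring with no mismatching edge means
    all ranks hold the same fingerprint.
    """
    return not ring_mismatches(fingerprints)


if __name__ == "__main__":
    print(is_homogeneous(["abc", "abc", "abc"]))
    print(ring_mismatches(["abc", "abc", "xyz"]))
```

If every edge of the ring matches, all nodes hold the same value, so no node ever needs to talk to more than its neighbor.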

Jeff 

>  — Pavan
> 
> --
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
> 
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss


