[mpich-discuss] Can't receive messages

Pavan Balaji balaji at mcs.anl.gov
Wed Jan 1 13:50:30 CST 2014


On Jan 1, 2014, at 1:25 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> Isn't the process manager infrastructure hetero-safe? At the very least, ssh-ing "uname -a" around the ring of procs can identify the problem on O(1) cost. 

1. The process manager is not heterogeneous-safe.

2. "uname -a" is not an accurate indicator: two different Linux distributions on the same architecture are not considered heterogeneous.  For that, the datatype representation has to be different.

3. An extra ssh to all the nodes is expensive to do every time.
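
To illustrate point 2, here is a rough sketch of what a comparison of datatype representation (rather than of "uname -a" output) might look at. This is not what MPICH or its process manager does. The function names is_little_endian and datatype_fingerprint are hypothetical, and the choice of byte order plus a handful of type sizes is an assumption about what is worth comparing.

```c
#include <stdio.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: returns 1 on a little-endian host, 0 otherwise. */
static int is_little_endian(void)
{
    const uint16_t probe = 1;
    unsigned char first_byte;
    memcpy(&first_byte, &probe, 1);   /* inspect the lowest-addressed byte */
    return first_byte == 1;
}

/* Hypothetical helper: writes a text fingerprint of the local datatype
 * representation (byte order and the sizes of a few basic C types) into
 * buf. Returns the number of characters snprintf would have written. */
static int datatype_fingerprint(char *buf, size_t len)
{
    return snprintf(buf, len,
                    "%s:int=%zu:long=%zu:ptr=%zu:double=%zu:ldouble=%zu",
                    is_little_endian() ? "LE" : "BE",
                    sizeof(int), sizeof(long), sizeof(void *),
                    sizeof(double), sizeof(long double));
}
```

Two nodes running different Linux distributions on the same architecture would typically produce the same string here even though their "uname -a" output differs, while a mismatch in byte order or type sizes would show up as different strings.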

  — Pavan

--
Pavan Balaji
http://www.mcs.anl.gov/~balaji



