[mpich-discuss] Can't receive messages
balaji at mcs.anl.gov
Wed Jan 1 13:50:30 CST 2014
On Jan 1, 2014, at 1:25 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> Isn't the process manager infrastructure hetero-safe? At the very least, ssh-ing "uname -a" around the ring of procs can identify the problem on O(1) cost.
1. The process manager is not heterogeneous-safe.
2. "uname -a" is not an accurate indicator: two different Linux distributions on the same architecture are not considered heterogeneous. Two systems count as heterogeneous only when their datatype representations differ.
3. An extra ssh to every node is expensive to pay for on every run.
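
To illustrate point 2, here is a minimal sketch of the kind of information that decides heterogeneity: byte order and the sizes of a few basic C types, rather than the OS string. This is not MPICH code. The type_repr struct and both function names are hypothetical, and a real check would likely need more fields (floating-point format, alignment, and so on).

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical descriptor of how one host represents basic C types. */
typedef struct {
    uint8_t little_endian;    /* 1 if the least significant byte comes first */
    uint8_t size_int;
    uint8_t size_long;
    uint8_t size_ptr;
    uint8_t size_long_double;
} type_repr;

/* Describe the host this code is running on. */
static type_repr local_type_repr(void)
{
    type_repr r;
    uint32_t probe = 1;
    unsigned char first;

    /* Look at the first byte in memory of a known 32-bit value. */
    memcpy(&first, &probe, 1);
    r.little_endian = (uint8_t)(first == 1);

    r.size_int = (uint8_t)sizeof(int);
    r.size_long = (uint8_t)sizeof(long);
    r.size_ptr = (uint8_t)sizeof(void *);
    r.size_long_double = (uint8_t)sizeof(long double);
    return r;
}

/* Compare field by field; returns 1 when the two descriptors match. */
static int type_repr_equal(const type_repr *a, const type_repr *b)
{
    assert(a != NULL && b != NULL);
    return a->little_endian == b->little_endian
        && a->size_int == b->size_int
        && a->size_long == b->size_long
        && a->size_ptr == b->size_ptr
        && a->size_long_double == b->size_long_double;
}
```

Two different Linux distributions on the same architecture would normally produce equal descriptors here even though their "uname -a" output differs, while a descriptor with a different byte order or a different sizeof(long) would not match.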