<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40"><head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 14 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri","sans-serif";}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:purple;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal-compose;
font-family:"Calibri","sans-serif";
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-family:"Calibri","sans-serif";}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple">
<div class="WordSection1">
<p class="MsoNormal">Hi,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I installed MPICH on a cluster deployed in a virtual environment. After that I spent a lot of time trying to figure out why it doesn’t work.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I’m new to MPICH (and even to MPI). Hence my suggestions about the system functioning may contradict to the real world. But looks like in my case the problem has the following roots:<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">1) Looks like MPICH nodes use something similar to ‘hostname –i’ to discover their own IPs<o:p></o:p></p>
<p class="MsoNormal">2) If several IP aliases exist for a single host name, MPICH picks the first of them<o:p></o:p></p>
<p class="MsoNormal">3) Order of IP aliases resembles their order in ‘/etc/hosts/’ list<o:p></o:p></p>
<p class="MsoNormal">4) VMWare hypervisor automatically adds loop back alias for the host name (when the cluster is deployed by cloning of a single node). And this alias is added as a first line in ‘/etc/hosts’<o:p></o:p></p>
<p class="MsoNormal">5) All of the above means that MPICH discovers loop back IP in my case.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">The problem disappeared when I removed loop back alias from ‘/etc/hosts/’.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Please, change IP discovery process to avoid loop back IP. Or add an explicit requirement to documentation to avoid loop back aliases in ‘/etc/hosts’.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal"> --Andrey<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><a href="https://twitter.com/Andrey_Nevolin">@Andrey_Nevolin</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
</div>
</body>
</html>