<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<!--[if !mso]><style>v\:* {behavior:url(#default#VML);}
o\:* {behavior:url(#default#VML);}
w\:* {behavior:url(#default#VML);}
.shape {behavior:url(#default#VML);}
</style><![endif]--><style><!--
/* Font Definitions */
@font-face
{font-family:"Cambria Math";
panose-1:2 4 5 3 5 4 6 3 2 4;}
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:blue;
text-decoration:underline;}
p.xmsonormal, li.xmsonormal, div.xmsonormal
{mso-style-name:x_msonormal;
margin:0in;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
span.EmailStyle21
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:windowtext;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style><!--[if gte mso 9]><xml>
<o:shapedefaults v:ext="edit" spidmax="1026" />
</xml><![endif]--><!--[if gte mso 9]><xml>
<o:shapelayout v:ext="edit">
<o:idmap v:ext="edit" data="1" />
</o:shapelayout></xml><![endif]-->
</head>
<body lang="EN-US" link="blue" vlink="purple" style="word-wrap:break-word">
<div class="WordSection1">
<p class="MsoNormal">Hui,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Please disregard my last message. I got MPICH to build with a newer version of pgc++, namely nvc++.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks<o:p></o:p></p>
<p class="MsoNormal">Kurt<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Mccall, Kurt E. (MSFC-EV41) &lt;kurt.e.mccall@nasa.gov&gt;
<br>
<b>Sent:</b> Tuesday, October 26, 2021 2:55 AM<br>
<b>To:</b> discuss@mpich.org<br>
<b>Cc:</b> Mccall, Kurt E. (MSFC-EV41) &lt;kurt.e.mccall@nasa.gov&gt;<br>
<b>Subject:</b> Re: Maximum number of inter-communicators?<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Hui,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I'm trying to build 4.0a2 with the Portland Group compiler pgc++ 19.5-0. configure seems to finish without problems.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">../mpich-4.0a2/configure -prefix=/home/kmccall/mpich-install-4.0a2 CC=pgcc CXX=pgc++ --with-pbs=/opt/torque --with-device=ch3:nemesis --disable-fortran --with-pm=hydra -enable-debuginfo 2>&1 | tee c.txt<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">but when I run make, it ends with this error (actually, many instances of the same error):<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">/home/kmccall/mpich-build-4.0a2/src/pm/hydra/.libs/libhydra.a(args.o): In function `MPL_gpu_query_pointer_attr':<o:p></o:p></p>
<p class="MsoNormal">/home/kmccall/mpich-4.0a2/src/pm/hydra/mpl/include/mpl_gpu.h:44: multiple definition of `MPL_gpu_query_pointer_attr'<o:p></o:p></p>
<p class="MsoNormal">tools/bootstrap/persist/hydra_persist-persist_server.o:/home/kmccall/mpich-4.0a2/src/pm/hydra/mpl/include/mpl_gpu.h:44: first defined here<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I've attached c.txt and m.txt from configure and make. Thanks for any help.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks,<o:p></o:p></p>
<p class="MsoNormal">Kurt <o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<div style="border:none;border-top:solid #E1E1E1 1.0pt;padding:3.0pt 0in 0in 0in">
<p class="MsoNormal"><b>From:</b> Zhou, Hui &lt;<a href="mailto:zhouh@anl.gov">zhouh@anl.gov</a>&gt;
<br>
<b>Sent:</b> Sunday, October 24, 2021 6:46 PM<br>
<b>To:</b> <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>
<b>Cc:</b> Mccall, Kurt E. (MSFC-EV41) &lt;<a href="mailto:kurt.e.mccall@nasa.gov">kurt.e.mccall@nasa.gov</a>&gt;<br>
<b>Subject:</b> [EXTERNAL] Re: Maximum number of inter-communicators?<o:p></o:p></p>
</div>
</div>
<p class="MsoNormal"><o:p> </o:p></p>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Hi Kurt,<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">There is indeed a limit on the maximum number of communicators that you can have, including both intra-communicators and inter-communicators. Try freeing the communicators that you no longer need. In
older versions of MPICH, there may be an additional limit on how many dynamic processes one can connect. If you still hit a crash after making sure there aren't too many simultaneously active communicators, could you try the latest release,
<a href="http://www.mpich.org/static/downloads/4.0a2/mpich-4.0a2.tar.gz">
http://www.mpich.org/static/downloads/4.0a2/mpich-4.0a2.tar.gz</a>, and see if the issue persists?<o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black"><o:p> </o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">-- <o:p></o:p></span></p>
</div>
<div>
<p class="MsoNormal"><span style="font-size:12.0pt;color:black">Hui<o:p></o:p></span></p>
</div>
<div class="MsoNormal" align="center" style="text-align:center">
<hr size="2" width="98%" align="center">
</div>
<div id="divRplyFwdMsg">
<p class="MsoNormal"><b><span style="color:black">From:</span></b><span style="color:black"> Mccall, Kurt E. (MSFC-EV41) via discuss &lt;<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>&gt;<br>
<b>Sent:</b> Sunday, October 24, 2021 2:37 PM<br>
<b>To:</b> <a href="mailto:discuss@mpich.org">discuss@mpich.org</a> &lt;<a href="mailto:discuss@mpich.org">discuss@mpich.org</a>&gt;<br>
<b>Cc:</b> Mccall, Kurt E. (MSFC-EV41) &lt;<a href="mailto:kurt.e.mccall@nasa.gov">kurt.e.mccall@nasa.gov</a>&gt;<br>
<b>Subject:</b> [mpich-discuss] Maximum number of inter-communicators?</span> <o:p>
</o:p></p>
<div>
<p class="MsoNormal">&nbsp;<o:p></o:p></p>
</div>
</div>
<div>
<div>
<p class="xmsonormal">Hi,<o:p></o:p></p>
<p class="xmsonormal">&nbsp;<o:p></o:p></p>
<p class="xmsonormal">Based on a paper I read about giving an MPI job some fault tolerance, I'm exclusively connecting my processes with inter-communicators.<o:p></o:p></p>
<p class="xmsonormal">I've found that if I increase the number of processes beyond a certain point, many processes don't get created at all and the whole job
<o:p></o:p></p>
<p class="xmsonormal">crashes. Am I running up against an operating system limit (such as the number of open file descriptors, which is set to 1024), or some sort of
<o:p></o:p></p>
<p class="xmsonormal">MPICH limit?<o:p></o:p></p>
<p class="xmsonormal">&nbsp;<o:p></o:p></p>
<p class="xmsonormal">If it matters, my process architecture (a tree) is as follows: one master process connected to 21 manager processes on 21 other nodes,
<o:p></o:p></p>
<p class="xmsonormal">and each manager connected to 8 worker processes on the manager's own node. This is the largest job I've been able to create
<o:p></o:p></p>
<p class="xmsonormal">without it crashing. Attempting to increase the number of workers beyond 8 results in a crash.<o:p></o:p></p>
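<p class="xmsonormal">&nbsp;<o:p></o:p></p>
<p class="xmsonormal">As a rough sketch of the scale described above, the process and inter-communicator counts for this tree can be worked out as follows. It assumes one inter-communicator per parent-child link, which the message does not state explicitly, and the function names are hypothetical helpers for illustration, not part of MPICH.<o:p></o:p></p>
<pre>
```c
/* Hypothetical helpers for illustration only; none of these names exist in MPICH. */

/* Total processes in a master/manager/worker tree:
 * one master, plus the managers, plus the workers under every manager. */
int total_processes(int managers, int workers_per_manager)
{
    return 1 + managers + managers * workers_per_manager;
}

/* Inter-communicators held by the master: one per manager (assumed). */
int master_intercomms(int managers)
{
    return managers;
}

/* Inter-communicators held by one manager:
 * one to the master plus one per worker (assumed). */
int manager_intercomms(int workers_per_manager)
{
    return 1 + workers_per_manager;
}

/* Distinct inter-communicators in the whole job:
 * one per parent-child link in the tree (assumed). */
int total_intercomms(int managers, int workers_per_manager)
{
    return managers + managers * workers_per_manager;
}
```
</pre>
<p class="xmsonormal">For the job described here, total_processes(21, 8) gives 190 processes, the master holds 21 inter-communicators, and each manager holds 9. Going to 9 workers per manager would raise these to 211 processes and 10 inter-communicators per manager.<o:p></o:p></p>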
<p class="xmsonormal">&nbsp;<o:p></o:p></p>
<p class="xmsonormal">I'm using MPICH 3.3.2 on CentOS 3.10.0. MPICH was compiled with the Portland Group compiler pgc++ 19.5-0.<o:p></o:p></p>
<p class="xmsonormal">&nbsp;<o:p></o:p></p>
<p class="xmsonormal">Thanks,<o:p></o:p></p>
<p class="xmsonormal">Kurt<o:p></o:p></p>
</div>
</div>
</div>
</body>
</html>