<html xmlns:v="urn:schemas-microsoft-com:vml" xmlns:o="urn:schemas-microsoft-com:office:office" xmlns:w="urn:schemas-microsoft-com:office:word" xmlns:m="http://schemas.microsoft.com/office/2004/12/omml" xmlns="http://www.w3.org/TR/REC-html40">
<head>
<meta http-equiv="Content-Type" content="text/html; charset=us-ascii">
<meta name="Generator" content="Microsoft Word 15 (filtered medium)">
<style><!--
/* Font Definitions */
@font-face
{font-family:Calibri;
panose-1:2 15 5 2 2 2 4 3 2 4;}
/* Style Definitions */
p.MsoNormal, li.MsoNormal, div.MsoNormal
{margin:0in;
margin-bottom:.0001pt;
font-size:11.0pt;
font-family:"Calibri",sans-serif;}
a:link, span.MsoHyperlink
{mso-style-priority:99;
color:#0563C1;
text-decoration:underline;}
a:visited, span.MsoHyperlinkFollowed
{mso-style-priority:99;
color:#954F72;
text-decoration:underline;}
span.EmailStyle17
{mso-style-type:personal;
font-family:"Calibri",sans-serif;
color:windowtext;}
span.EmailStyle18
{mso-style-type:personal-reply;
font-family:"Calibri",sans-serif;
color:#1F497D;}
.MsoChpDefault
{mso-style-type:export-only;
font-size:10.0pt;}
@page WordSection1
{size:8.5in 11.0in;
margin:1.0in 1.0in 1.0in 1.0in;}
div.WordSection1
{page:WordSection1;}
--></style>
</head>
<body lang="EN-US" link="#0563C1" vlink="#954F72">
<div class="WordSection1">
<p class="MsoNormal">Hello,<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">We are in the process of transitioning an application from mpich2 1.5 to mpich 3.2.1. I am seeing different behavior between these two versions in how stdin is handled for processes other than the rank=0 process; I will refer to these
as “slave processes”. I am trying to understand whether this behavior is expected and, if so, whether there is some way (a switch, etc.) to revert the behavior to match that of mpich2.
<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">With mpich2, when one of these slave processes reads from stdin, the read returns EOF. With mpich 3.x, when one of these processes reads from stdin, the read appears to simply block. So it appears that stdin
was previously (with mpich2) closed for these processes, but is now left open without actually receiving any input. I have tried this with mpich 3.2.1 and mpich 3.3.1 and see the same behavior in both. Below I include a simple test I ran to reproduce the behavior.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I understand that these slave processes should not be reading stdin at all. However, our source code is used in a variety of scenarios, not always as a parallel application, and it can be problematic to eliminate every case
where this can occur. We handled the EOF condition gracefully, but we do not handle the blocking behavior. So I first want to understand whether this new behavior in mpich 3.x (blocking on a stdin read) is a bug or is expected. (And also,
just out of curiosity: why did it change between mpich2 and mpich 3.x?) My guess is that this behavior is expected, or is an artifact of some other change, and that the solution is to avoid stdin reads in all processes besides rank=0. But I would like to confirm this
if possible.<o:p></o:p></p>
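<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">If the blocking behavior turns out to be expected, one possible workaround on our side would be to wait for input with a timeout instead of calling getc directly. The following is a minimal sketch under that assumption: read_char_with_timeout and READ_TIMED_OUT are hypothetical names, the timeout value is left to the caller, and the code uses only the POSIX poll and read calls.<o:p></o:p></p>

```c
#include <poll.h>
#include <stdio.h>
#include <unistd.h>

/* Hypothetical sentinel, chosen to be distinct from EOF and from any byte. */
#define READ_TIMED_OUT (EOF - 1)

/* Hypothetical helper: read one byte from fd, giving up after timeout_ms
 * milliseconds instead of blocking forever.
 * Returns the byte (0..255), EOF at end of input or on error, or
 * READ_TIMED_OUT if the descriptor is open but no data arrived in time. */
static int read_char_with_timeout(int fd, int timeout_ms)
{
    struct pollfd pfd;
    unsigned char c;
    int ready;
    ssize_t n;

    pfd.fd = fd;
    pfd.events = POLLIN;
    pfd.revents = 0;

    ready = poll(&pfd, 1, timeout_ms);
    if (ready == 0)
        return READ_TIMED_OUT;   /* open, but nothing to read */
    if (ready < 0)
        return EOF;              /* poll itself failed */

    /* Readable, hung up, or invalid: read() returns 0 at end of file
     * and -1 on error, both of which are reported as EOF. */
    n = read(fd, &c, 1);
    if (n == 1)
        return c;
    return EOF;
}
```

<p class="MsoNormal">A caller could then treat READ_TIMED_OUT the same way it already treats EOF. Because this reads the descriptor directly, it should not be mixed with buffered stdio reads (getc, fgets) on the same stream.<o:p></o:p></p>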
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">I have read discussions of a couple of stdin issues that seem related, but I can’t quite map them to my particular situation, perhaps because I don’t understand the full context of these discussions (e.g., “slurm”):<o:p></o:p></p>
<p class="MsoNormal"><a href="https://lists.mpich.org/mailman/htdig/discuss/2015-July/001104.html">https://lists.mpich.org/mailman/htdig/discuss/2015-July/001104.html</a><o:p></o:p></p>
<p class="MsoNormal"><a href="https://github.com/pmodels/mpich/issues/1782">https://github.com/pmodels/mpich/issues/1782</a><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Below is a simple C program I launched from mpiexec to test this, along with the mpiexec command I use to launch two processes running the test program. Note that it makes no calls to the MPI API, it just tries to read a character from
stdin from each process.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">When I run with mpich2 1.5, what I see from the slave/child process is that reading stdin (getc call) immediately returns EOF. With mpich 3.2.1 or 3.3.1, it appears to block on the call to getc. (And then a subsequent attempt to enter
a character, after the first process exits, causes an mpiexec failure, perhaps because the first process has exited.)<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Here are the mpiexec commands used to run the test (the test program executable is read_stdin):<o:p></o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><MPICH_PATH>/mpich2-1.5/linux/bin/mpiexec -n 1 read_stdin : -n 1 read_stdin -child<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><MPICH_PATH>/mpich-3.3.1/linux/bin/mpiexec -n 1 read_stdin : -n 1 read_stdin -child</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Test program “read_stdin.c” :<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">/***** BEGIN TEST PROGRAM *****/<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
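<p class="MsoNormal"><span style="font-family:"Courier New"">#include <stdio.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">#include <string.h><o:p></o:p></span></p>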
<p class="MsoNormal"><span style="font-family:"Courier New"">#include <sys/time.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">#include <sys/types.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">#include <unistd.h><o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">/* Simple program that reads one character from stdin - for testing mpiexec and stdin<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> redirect (for flps/mc2 issue) */<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">int main(int argc, char *argv[])<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">{<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> int stat = 0;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> pid_t pid = getpid();<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> if ((argc > 1) && (!strcmp(argv[1], "-child"))) {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fprintf(stdout, "pid=%d : child process\n", pid);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fflush(stdout);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> }<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> else {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fprintf(stdout, "pid=%d : master process\n", pid);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fflush(stdout);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> }<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fprintf(stdout, "pid=%d : Enter char: \n", pid);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fflush(stdout);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> stat = getc(stdin);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> if (stat == EOF) {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fprintf(stdout, "pid=%d : getc returned EOF\n", pid);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fflush(stdout);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> }<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> else {<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fprintf(stdout, "pid=%d : getc returned %c\n", pid, (char) stat);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fflush(stdout);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> }<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fprintf(stdout, "pid=%d : exiting\n", pid);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> fflush(stdout);<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""> return 0;<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">}<o:p></o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New""><o:p> </o:p></span></p>
<p class="MsoNormal"><span style="font-family:"Courier New"">/***** END TEST PROGRAM *****/</span><o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Thanks in advance for any information about this.<o:p></o:p></p>
<p class="MsoNormal"><o:p> </o:p></p>
<p class="MsoNormal">Devon Kehoe<o:p></o:p></p>
</div>
</body>
</html>