[mpich-discuss] stdin behavior with mpich / mpiexec 3.x

Kehoe, Devon devon_kehoe at mentor.com
Tue Jul 16 16:01:53 CDT 2019


We are in the process of transitioning an application from mpich2 1.5 to mpich 3.2.1. I am seeing different behavior between these two versions in how stdin is handled for processes other than the rank=0 process; I will refer to these as "slave processes". I am trying to understand whether this behavior is expected and, if so, whether there is some way (a switch, etc.) to revert the behavior to match that of mpich2.

The behavior I see in mpich2 is that for these slave processes, if a read occurs in stdin, an "EOF" is returned.   For mpich 3.x, when one of these processes reads stdin, it appears to just block on the read.  So, it appears as if the stdin was previously (for mpich2) closed, but is now left open, though not really receiving stdin.   I have tried this with mpich 3.2.1 and mpich 3.3.1, and see the same behavior.   Below I include a simple test I ran to reproduce the behavior.

I understand that these slave processes should not be reading stdin at all.  However, we have a source code that is used in a variety of scenarios, not always as a parallel application, and it can be problematic to just eliminate all cases where this can occur.  We handled the EOF condition gracefully, but not the blocking behavior.  So I first just want to understand whether this new behavior in mpich 3.x (blocking on stdin read) is a bug or is expected behavior with mpich 3.x.  (And also, just out of curiosity: why did it change between mpich2 and mpich 3.x?)  My guess is that this behavior is expected or an artifact of some other change, and that the solution is to avoid stdin reads for all processes besides rank=0.  But I would like to confirm this if possible.
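For illustration, one way to get the old EOF behavior back inside the application itself is to point stdin at /dev/null in every process except rank 0. The sketch below is only that, a sketch: it assumes the launcher exports the rank in an environment variable named PMI_RANK (an assumption about the process manager, not something confirmed in this message), and the helper name detach_stdin_unless_rank0 is made up for this example.

```c
#include <stdio.h>
#include <stdlib.h>

/* Hypothetical helper: redirect stdin to /dev/null unless this is rank 0.
   ASSUMPTION: the launcher sets PMI_RANK in each process's environment.
   Returns 1 if stdin was redirected, 0 if it was left alone,
   -1 if the redirect failed. */
int detach_stdin_unless_rank0(void)
{
    const char *rank = getenv("PMI_RANK");

    /* Serial run (variable not set) or rank 0: keep the real stdin.
       A non-numeric value parses as 0 and is treated like rank 0. */
    if (rank == NULL || atoi(rank) == 0)
        return 0;

    /* Any later read on stdin now returns EOF immediately. */
    if (freopen("/dev/null", "r", stdin) == NULL)
        return -1;

    return 1;
}
```

Calling this once at the top of main(), before any read, would leave serial runs and rank 0 untouched, while the other processes see EOF on stdin as they did with mpich2.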

I have read discussions about a couple of stdin issues that seem related, but I can't quite map them to my particular situation, perhaps because I don't understand the full context of these discussions (e.g., "slurm").

Below is a simple C program I launched from mpiexec to test this, along with the mpiexec command I use to launch two processes running the test program.  Note that it makes no calls to the MPI API, it just tries to read a character from stdin from each process.

When I run with mpich2 1.5, what I see from the slave/child process is that reading stdin (getc call) immediately returns EOF.  With mpich 3.2.1 or 3.3.1, it appears to block on the call to getc.  (And then a subsequent attempt to enter a character, after the first process exits, causes an mpiexec failure, perhaps because the first process has exited.)

Here are the mpiexec commands used to run the test (the test program executable is read_stdin):
<MPICH_PATH>/mpich2-1.5/linux/bin/mpiexec  -n 1 read_stdin : -n 1 read_stdin -child
<MPICH_PATH>/mpich-3.3.1/linux/bin/mpiexec -n 1 read_stdin   : -n 1 read_stdin -child

Test program "read_stdin.c" :

/*****   BEGIN TEST PROGRAM *****/

#include <stdio.h>
#include <string.h>      /* strcmp */
#include <sys/time.h>
#include <sys/types.h>
#include <unistd.h>      /* getpid */

/* Simple program that reads a character from stdin - for testing mpiexec and
   stdin redirect (for flps/mc2 issue) */

int main(int argc, char *argv[])
{
     int stat = 0;
     pid_t pid = getpid();

     /* The second process is started with "-child" so the output shows
        which process is which. */
     if ((argc > 1) && (!strcmp(argv[1], "-child"))) {
           fprintf(stdout, "pid=%d : child process\n", (int) pid);
     }
     else {
           fprintf(stdout, "pid=%d : master process\n", (int) pid);
     }

     fprintf(stdout, "pid=%d : Enter char: \n", (int) pid);

     /* Returns EOF immediately under mpich2 1.5 for the child process;
        appears to block under mpich 3.2.1 / 3.3.1. */
     stat = getc(stdin);

     if (stat == EOF) {
           fprintf(stdout, "pid=%d : getc returned EOF\n", (int) pid);
     }
     else {
           fprintf(stdout, "pid=%d : getc returned %c\n", (int) pid, (char) stat);
     }

     fprintf(stdout, "pid=%d : exiting\n", (int) pid);

     return 0;
}

/*****   END TEST PROGRAM *****/
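To tell "stdin is closed (EOF)" apart from "stdin is open but nothing ever arrives" without hanging, the read in the test program could be guarded by poll() with a timeout. The following is a minimal sketch under that assumption; the function name read_char_timeout and the READ_* codes are invented for this example, and it uses read() on the file descriptor rather than getc() so that stdio buffering does not interfere with poll().

```c
#include <assert.h>
#include <errno.h>
#include <poll.h>
#include <unistd.h>

/* Illustrative result codes (not from any library). */
enum { READ_ERROR = -1, READ_EOF = 0, READ_CHAR = 1, READ_TIMEOUT = 2 };

/* Wait up to timeout_ms for fd to become readable, then read one byte.
   READ_CHAR:    *out holds the byte that was read
   READ_EOF:     the descriptor is at end of file (the mpich2-style result)
   READ_TIMEOUT: nothing arrived in time (the blocking case)
   READ_ERROR:   poll() or read() failed */
int read_char_timeout(int fd, int timeout_ms, char *out)
{
    struct pollfd pfd = { .fd = fd, .events = POLLIN };
    ssize_t got;
    int n;

    /* A negative timeout would make poll() wait forever. */
    assert(timeout_ms >= 0);

    do {
        n = poll(&pfd, 1, timeout_ms);
    } while (n < 0 && errno == EINTR);   /* retry if interrupted by a signal */

    if (n < 0)
        return READ_ERROR;
    if (n == 0)
        return READ_TIMEOUT;

    /* Readable or hung up: read() returns 0 at end of file. */
    got = read(fd, out, 1);
    if (got < 0)
        return READ_ERROR;
    return got == 0 ? READ_EOF : READ_CHAR;
}
```

In the test program, replacing the getc(stdin) call with read_char_timeout(STDIN_FILENO, 2000, &c) would let the child process report a timeout instead of blocking indefinitely.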

Thanks in advance for any information about this.

Devon Kehoe