<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN"><html><head>
<meta http-equiv="Content-Type" content="text/html; charset=iso-8859-1">
</head>
<body bgcolor="#ffffff" text="#000000">
Jan Bierbaum wrote:
<blockquote cite="mid:557EC5FF.1010600@tudos.org" type="cite">
<pre wrap="">Hi Marvin!
</pre>
<blockquote type="cite">
<pre wrap="">Essentially, until you call MPI_Init_thread, the Linux socket command
will return 0 for all ranks except rank 0. This seems to me like a
problem.
</pre>
</blockquote>
<pre wrap=""><!---->Indeed. 'socket' should not return 0. Quoting from its manpage:
| On success, a file descriptor for the new socket is returned. On
| error, -1 is returned, and errno is set appropriately.
FD 0 is already taken by STDIN, so you should never get that return
value unless you close STDIN first.
</pre>
</blockquote>
<br>
That is not necessarily true. I have worked on systems,
namely Crays, where stdin, stdout, and stderr were all closed when the
program started, so three consecutive socket() calls would return
0, 1, and 2. MRNet (which we use in our product) contains the following
code, which is executed during startup:<br>
<br>
<tt> // Make sure any data sockets we create have descriptors !=
{1,2}<br>
// to avoid nasty bugs where fprintf(stdout/stderr) writes to data
sockets.<br>
int fd;<br>
while( (fd = socket( AF_INET, SOCK_STREAM, 0 )) <= 2 ) {<br>
/* Code for closing descriptor on exec*/<br>
int fdflag = fcntl(fd, F_GETFD );<br>
if( fdflag == -1 )<br>
{<br>
// failed to retrieve the fd flags<br>
fprintf(stderr, "F_GETFD failed!\n");<br>
}<br>
int fret = fcntl( fd, F_SETFD, fdflag | FD_CLOEXEC );<br>
if( fret == -1 )<br>
{<br>
// we failed to set the fd flags<br>
fprintf(stderr, "F_SETFD failed!\n");<br>
}<br>
<br>
if( fd == -1 ) break;<br>
} <br>
if( fd > 2 ) XPlat::SocketUtils::Close(fd);<br>
</tt><br>
Of course, the irony is that if the code above hits an error, it prints to
stderr :-)<br>
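<br>
To make the point concrete, here is a minimal sketch in plain C, not MRNet's code. It relies on the POSIX rule that a new descriptor gets the lowest free number. The function names are made up for illustration, and the second helper swaps in /dev/null to occupy the standard slots rather than burning sockets as MRNet does; that substitution is my assumption, not something from the MRNet source.<br>

```c
#include <assert.h>
#include <fcntl.h>
#include <sys/socket.h>
#include <unistd.h>

/* Hypothetical helper: close stdin, then ask for a socket.
 * POSIX hands out the lowest free descriptor, so this returns 0
 * (or -1 if socket() itself fails). */
int socket_fd_after_closing_stdin(void)
{
    close(STDIN_FILENO);
    return socket(AF_INET, SOCK_STREAM, 0);
}

/* Hypothetical helper: make sure descriptors 0, 1 and 2 are all occupied,
 * so that later data sockets cannot land on them.
 * Any free slot in 0..2 is filled with /dev/null and left open on purpose.
 * Returns 0 on success, -1 if /dev/null could not be opened. */
int reserve_std_fds(void)
{
    int fd;

    /* Each open() takes the lowest free descriptor; keep going until
     * we are handed one above 2, which means 0..2 are all in use. */
    while ((fd = open("/dev/null", O_RDWR)) != -1 && fd <= 2) {
        /* keep this descriptor open as a placeholder */
    }
    if (fd == -1)
        return -1;   /* checked before any further use of fd */

    assert(fd > 2);
    close(fd);       /* the extra descriptor is not needed */
    return 0;
}
```

Calling reserve_std_fds() once before the first socket() call should mean every later socket gets a descriptor above 2, whether or not the launcher closed the standard streams.<br>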
<br>
Cheers, John D.<br>
<blockquote cite="mid:557EC5FF.1010600@tudos.org" type="cite">
<pre wrap="">
</pre>
<blockquote type="cite">
<pre wrap="">Given the following code sample...
</pre>
</blockquote>
<pre wrap=""><!---->[...]
</pre>
<blockquote type="cite">
<pre wrap="">*The application returns this output with mpirun.
</pre>
</blockquote>
<pre wrap=""><!---->Works for me:
| $ mpirun -np 2 ./test-socket
| Pre Socket: 10
| MPI_Init
| Pre Socket: 6
| MPI_Init
| Post Socket: 14
| Post Socket: 15
|
| $ mpirun -np 3 ./test-socket
| Pre Socket: 10
| MPI_Init
| Pre Socket: 6
| MPI_Init
| Pre Socket: 6
| MPI_Init
| Post Socket: 14
| Post Socket: 15
| Post Socket: 10
</pre>
<blockquote type="cite">
<pre wrap="">I am running this code sample using Red-Hat Enterprise Linux 7.1 with
mpich
</pre>
</blockquote>
<pre wrap=""><!---->[...]
</pre>
<blockquote type="cite">
<pre wrap=""> Version: 3.0.4
</pre>
</blockquote>
<pre wrap=""><!---->For me it's MPICH 3.1.4 on Debian 8 and I get similar results with MPICH
3.1rc on a local cluster. Taking into account the weird return value you
get from 'socket' ... maybe your system is the problem.
Regards, Jan
</pre>
</blockquote>
</body>
</html>