[mpich-discuss] valgrind warning with ROMIO hint file

Audet, Martin Martin.Audet at cnrc-nrc.gc.ca
Fri Dec 22 13:26:27 CST 2017


Hi Eric,

At first I thought that your issue was related but different.

But now, like you, I think it is the same issue.

When I compile your mpich_mpiio_file_descriptor_bug.c program with the unmodified mpich 3.2.1 (ch3:sock device) and run it under valgrind, I get the same warning about read() being passed an invalid file descriptor (-1):

$ mpicc mpich_mpiio_file_descriptor_bug.c
$ valgrind ./a.out
==118187== Memcheck, a memory error detector
==118187== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==118187== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==118187== Command: ./a.out
==118187==
==118187== Warning: invalid file descriptor -1 in syscall read()
==118187==
==118187== HEAP SUMMARY:
==118187==     in use at exit: 0 bytes in 0 blocks
==118187==   total heap usage: 558 allocs, 527 frees, 21,101,877 bytes allocated
==118187==
==118187== All heap blocks were freed -- no leaks are possible
==118187==
==118187== For counts of detected and suppressed errors, rerun with: -v
==118187== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

However, when I compile your program with a modified version of mpich 3.2.1 (same ch3:sock device) that skips reading the ROMIO hint file when the file descriptor is invalid (the one-line patch shown in my original message), the valgrind warning about read() with an invalid file descriptor disappears.

If we use strace to investigate what is happening (it shows the system calls made by another command along with their return values), we see the following with the unmodified mpich 3.2.1 (at the end of the long list of system calls):

$ mpicc mpich_mpiio_file_descriptor_bug.c
$ strace ./a.out

open("/etc/romio-hints", O_RDONLY)      = -1 ENOENT (No such file or directory)
read(-1, 0x1ae2890, 4096)               = -1 EBADF (Bad file descriptor)
brk(NULL)                               = 0x1ae7000
brk(0x1b14000)                          = 0x1b14000
mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f5f60e6a000
umask(022)                              = 027
umask(027)                              = 022
open("touch.txt", O_RDONLY)             = 4
close(4)                                = 0
munmap(0x7f5f60e6a000, 16781312)        = 0
fcntl(3, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR)               = 0
close(3)                                = 0
munmap(0x7f5f61e6b000, 2756608)         = 0
munmap(0x7f5f630b4000, 1314816)         = 0
exit_group(0)                           = ?

We see that the problematic -1 file descriptor, returned by the failed open() of the ROMIO hint file "/etc/romio-hints" (which doesn't exist on my machine), is passed to read(). The invalid descriptor does not come from any system call associated with your "touch.txt" file.
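To see the failing call sequence in isolation, here is a minimal C sketch with no MPI involved. It repeats the open()/read() pair from the strace output above: open() on a missing file returns -1 (ENOENT), and the unchecked read(-1, ...) then fails with EBADF, which is the call valgrind flags. The helper name unguarded_hint_read() and the test path are hypothetical, chosen only for illustration; this is not ROMIO's code.

```c
#include <errno.h>
#include <fcntl.h>
#include <unistd.h>

/* Hypothetical helper reproducing the unpatched call sequence:
 * open() the hint file, then read() from whatever descriptor came back,
 * without checking it. Stores errno from read() in *err (0 on success). */
ssize_t unguarded_hint_read(const char *path, char *buf, size_t len, int *err)
{
    int fd = open(path, O_RDONLY);    /* -1 with errno == ENOENT if missing */
    ssize_t ret = read(fd, buf, len); /* read(-1, ...) fails with EBADF */
    *err = (ret == -1) ? errno : 0;
    if (fd >= 0)
        close(fd);
    return ret;
}
```

Called on a path that does not exist, it should return -1 with the saved errno equal to EBADF, and a program calling it under valgrind should show the same "invalid file descriptor -1 in syscall read()" warning.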

If instead we compile with my patched version of mpich 3.2.1, the problematic read() is gone:

$ mpicc mpich_mpiio_file_descriptor_bug.c
$ strace ./a.out

open("/etc/romio-hints", O_RDONLY)      = -1 ENOENT (No such file or directory)
brk(NULL)                               = 0x1297000
brk(0x12c4000)                          = 0x12c4000
mmap(NULL, 16781312, PROT_READ|PROT_WRITE, MAP_PRIVATE|MAP_ANONYMOUS, -1, 0) = 0x7f16ddffb000
umask(022)                              = 027
umask(027)                              = 022
open("touch.txt", O_RDONLY)             = 4
close(4)                                = 0
munmap(0x7f16ddffb000, 16781312)        = 0
fcntl(3, F_GETFL)                       = 0x802 (flags O_RDWR|O_NONBLOCK)
fcntl(3, F_SETFL, O_RDWR)               = 0
close(3)                                = 0
munmap(0x7f16deffc000, 2756608)         = 0
munmap(0x7f16e0245000, 1314816)         = 0
exit_group(0)                           = ?
+++ exited with 0 +++

And of course valgrind doesn't complain anymore:

$ valgrind ./a.out
==117748== Memcheck, a memory error detector
==117748== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
==117748== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
==117748== Command: ./a.out
==117748==
==117748==
==117748== HEAP SUMMARY:
==117748==     in use at exit: 0 bytes in 0 blocks
==117748==   total heap usage: 558 allocs, 527 frees, 21,101,877 bytes allocated
==117748==
==117748== All heap blocks were freed -- no leaks are possible
==117748==
==117748== For counts of detected and suppressed errors, rerun with: -v
==117748== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)

I think the MPICH developers should apply my patch (or something equivalent).
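For the "something equivalent" route, a minimal C sketch follows, assuming a hypothetical helper guarded_hint_read() that is not part of ROMIO. It checks the open() result right away and returns before read() can ever see a bad descriptor, and it always leaves a terminated (possibly empty) string in the buffer so later hint parsing finds nothing on failure.

```c
#include <fcntl.h>
#include <stddef.h>
#include <unistd.h>

/* Hypothetical helper (not ROMIO's actual code): read at most len - 1 bytes
 * of the hint file at path into buf and NUL-terminate it. Returns the byte
 * count, or -1 if the file cannot be opened or read. On failure buf holds
 * an empty string, so later hint parsing simply finds nothing. */
ssize_t guarded_hint_read(const char *path, char *buf, size_t len)
{
    if (buf == NULL || len == 0)
        return -1;
    buf[0] = '\0';

    int fd = open(path, O_RDONLY);
    if (fd < 0)                      /* e.g. ENOENT: never call read() */
        return -1;

    ssize_t ret = read(fd, buf, len - 1);
    close(fd);
    if (ret < 0) {
        buf[0] = '\0';               /* contents unspecified after failure */
        return -1;
    }
    buf[ret] = '\0';
    return ret;
}
```

Compared with the one-line patch, which keeps the single read() call and substitutes -1 for its result, the early return makes the failure path explicit. Either way read() is never passed -1, so the valgrind warning should not appear.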

Happy holidays...

Martin Audet


-----Original Message-----
From: Eric Chamberland [mailto:Eric.Chamberland at giref.ulaval.ca] 
Sent: December 21, 2017 14:36
To: discuss at mpich.org
Cc: Audet, Martin
Subject: Re: [mpich-discuss] valgrind warning with ROMIO hint file

Hi,

We have the same issue here...

I reported this 3 days ago:

https://github.com/pmodels/mpich/issues/2894

Eric

On 19/12/17 07:30 PM, Audet, Martin wrote:
> MPI_Developers,
>
> It seems that when a program using mpich uses MPI-IO, it first tries to
> open a hint file specified in the ROMIO_HINTS environment variable (or
> "/etc/romio-hints" if it is undefined) and then tries to read it. The
> problem is that it will try to read from it with the file descriptor
> returned by open() even if the open() failed and returned -1 (e.g. file
> not found). When this happens, read() also fails and also returns -1.
>
> The problem is that when the program is run under valgrind (with the
> valgrind-friendly ch3:sock device), valgrind notices that an invalid
> file descriptor was passed to read() and complains with a warning:
>
> ==113473== Memcheck, a memory error detector
> ==113473== Copyright (C) 2002-2015, and GNU GPL'd, by Julian Seward et al.
> ==113473== Using Valgrind-3.12.0 and LibVEX; rerun with -h for copyright info
> ==113473== Command: ...
> ==113473== Parent PID: 48694
> ==113473==
> ==113473== Warning: invalid file descriptor -1 in syscall read()
> ==113473==
> ==113473== HEAP SUMMARY:
> ==113473==     in use at exit: 0 bytes in 0 blocks
> ==113473==   total heap usage: 4,420 allocs, 4,209 frees, 72,241,561 bytes allocated
> ==113473==
> ==113473== All heap blocks were freed -- no leaks are possible
> ==113473==
> ==113473== For counts of detected and suppressed errors, rerun with: -v
> ==113473== ERROR SUMMARY: 0 errors from 0 contexts (suppressed: 0 from 0)
>
> The problem is that no matter how clean the program is, this warning
> will appear in its valgrind report. A developer may think that his own
> code is problematic and start investigating...
>
> The solution is, of course, not to try reading the file if the file
> descriptor is invalid. This could be done simply as follows (mpich 3.2.1):
>
> --- mpich-3.2.1/src/mpi/romio/adio/common/system_hints.c
> +++ mpich-3.2.1-patch/src/mpi/romio/adio/common/system_hints.c
> @@ -98,7 +98,7 @@
>       buffer = (char *)ADIOI_Calloc(HINTFILE_MAX_SIZE, sizeof (char));
>       if (rank == 0) {
> -       ret = read(fd, buffer, HINTFILE_MAX_SIZE);
> +       ret = (fd >= 0) ? read(fd, buffer, HINTFILE_MAX_SIZE) : -1;
>          /* any error: bad/nonexistent fd, no perms, anything: set up a null
>           * buffer and the subsequent string parsing will quit immediately */
>          if (ret == -1)
>
> Could you consider applying this patch (or an equivalent one) to mpich?
> It would be much appreciated.
>
> Thanks in advance,
>
> Martin Audet
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

