[mpich-discuss] gcc 6.0 miscompiles mpich ?

VandeVondele Joost joost.vandevondele at mat.ethz.ch
Thu Apr 7 01:47:04 CDT 2016


Hi Halim, Jeff,

Just to clarify: current gcc trunk is in stage 4, the last regression-fixing phase before release (https://gcc.gnu.org/ml/gcc/2016-03/msg00108.html), so this is the last moment to report bugs with any hope of getting them fixed before release. I ran into this issue because I have started extensive prerelease gcc testing for our project (cp2k), which includes recompiling all dependencies, mpich among them. I'm also a gcc maintainer (mostly on the Fortran side), so I'm used to filing bug reports, and I know that a good testcase is the starting point.

The issue is in mpich itself: only the mpich libraries were recompiled with the flags mentioned, while the user binary remains unchanged (it is dynamically linked).

So I'm hoping that somebody on the mpich team can try to reproduce this and analyze it in more depth.
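For concreteness, a rebuild along these lines should reproduce the setup (the tarball location and install prefixes here are illustrative, not necessarily the exact ones used; only CFLAGS differs between the two builds):

```shell
# Illustrative rebuild of the mpich libraries only; the application
# binary is untouched and picks up the new libmpich via dynamic linking.
tar xf mpich-3.1.2.tar.gz && cd mpich-3.1.2

# Build 1: the failing configuration.
./configure --prefix="$HOME/mpich-vect" CFLAGS='-O2 -ftree-vectorize -g'
make -j && make install

# Build 2: same source, same compiler, vectorization off -- this one works.
make distclean
./configure --prefix="$HOME/mpich-novect" CFLAGS='-O2 -g'
make -j && make install
```

Pointing LD_LIBRARY_PATH at one prefix or the other then selects which mpich build the unchanged application binary runs against.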

Thanks,

Joost



________________________________________
From: Halim Amer [aamer at anl.gov]
Sent: Wednesday, April 06, 2016 9:26 PM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] gcc 6.0 miscompiles mpich ?

 > If changing compiler flags alone causes MPICH to go from working to
 > nonworking, it's hard not to blame the compiler.

It wouldn't be the first time a bug shows up or disappears when
compiler flags change, and often the compiler wasn't the culprit. My
point is that we need more information to isolate the problem. If the
bug is in the user code or in MPICH, we shouldn't bother the gcc folks.
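As a concrete starting point, a toy program along the following lines should drive the code path in the backtrace: with the intra-communicator alltoall algorithm, each rank also posts a send to itself, which goes through MPIDI_Isend_self. This is an untested sketch; the per-peer count of one int is arbitrary.

```c
/* Hypothetical minimal reproducer for the reported SIGSEGV: a plain
 * MPI_Alltoall whose self-send exercises MPIDI_Isend_self. Build with
 * the mpicc from the suspect mpich build and run with a few ranks. */
#include <mpi.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One int per peer; the block destined for our own rank is the
     * self-send that shows up in the backtrace. */
    int *sendbuf = malloc(size * sizeof(int));
    int *recvbuf = malloc(size * sizeof(int));
    for (int i = 0; i < size; i++)
        sendbuf[i] = rank * size + i;

    MPI_Alltoall(sendbuf, 1, MPI_INT, recvbuf, 1, MPI_INT, MPI_COMM_WORLD);

    if (rank == 0)
        printf("alltoall completed\n");

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}
```

If this crashes only against the -ftree-vectorize build, it would rule out the application code and narrow the problem to MPICH or gcc.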

--Halim
www.mcs.anl.gov/~aamer

On 4/6/16 12:31 PM, Jeff Hammond wrote:
>
>
> On Wed, Apr 6, 2016 at 9:32 AM, Halim Amer <aamer at anl.gov> wrote:
>
>     This information is insufficient to blame the compiler.
>
>
> This statement sure seems sufficient to me:
>
> "mpich gets miscompiled when building with CFLAGS='-O2 -ftree-vectorize
> -g' (but not with just -O2 -g or -O3 -g)."
>
> If changing compiler flags alone causes MPICH to go from working to
> nonworking, it's hard not to blame the compiler.
>
> Of course, if the app flags changed, then all bets are off.
>
> Anyway, I concur that using the latest development tree of a compiler
> isn't a good idea.  I'm sure the GCC folks would readily admit that
> there are bugs in the trunk.  It never hurts to file this bug, but
> without an MCVE it will be hard for anyone to make progress.
>
> Jeff
>
>     We don't know if the user (application code), MPICH, or gcc is doing
>     something wrong.
>
>     First, you need to use a stable compiler version (not trunk) and a
>     recent MPICH (3.2, not the old 3.1 series).
>
>     Second, you need to make sure the user code is not doing something
>     wrong (e.g. MPI_Alltoall could have gotten an invalid buffer address
>     or count). If you send us a toy program that reproduces the problem,
>     we could help with this.
>
>     --Halim
>     www.mcs.anl.gov/~aamer
>
>
>     On 4/6/16 10:24 AM, VandeVondele  Joost wrote:
>
>         Hi,
>
>         when compiling mpich-3.1.2 using gcc trunk (which will become
>         gcc 6.0 in a couple of days/weeks), I noticed mpich gets
>         miscompiled when building with CFLAGS='-O2 -ftree-vectorize -g'
>         (but not with just -O2 -g or -O3 -g).
>
>         I believe that the miscompilation happens with the macro
>         MPIDI_Request_create_sreq since a typical trace looks like:
>
>         Program received signal SIGSEGV: Segmentation fault - invalid memory
>         reference.
>
>         Backtrace for this error:
>         #0  0x33ac83269f in ???
>         #1  0x7f3743953d69 in MPIDI_Isend_self
>               at src/mpid/ch3/src/mpidi_isend_self.c:34
>         #2  0x7f374394dfa3 in MPID_Isend
>               at src/mpid/ch3/src/mpid_isend.c:55
>         #3  0x7f374390793a in MPIC_Isend
>               at src/mpi/coll/helper_fns.c:646
>         #4  0x7f37438628c5 in MPIR_Alltoall_intra
>               at src/mpi/coll/alltoall.c:520
>         #5  0x7f374386376e in MPIR_Alltoall
>               at src/mpi/coll/alltoall.c:726
>         #6  0x7f3743863827 in MPIR_Alltoall_impl
>               at src/mpi/coll/alltoall.c:761
>         #7  0x7f3743863e7a in PMPI_Alltoall
>               at src/mpi/coll/alltoall.c:884
>         #8  0x7f37424181bf in pmpi_alltoall_
>               at src/binding/fortran/mpif_h/alltoallf.c:272
>
>         I don't know mpich well enough to extract a testcase, but it
>         would be great if somebody could extract one and report it to
>         the gcc team (I'm happy to do the reporting if needed).
>
>         thanks in advance,
>
>         Joost
>
>
>
>         _______________________________________________
>         discuss mailing list discuss at mpich.org
>         To manage subscription options or unsubscribe:
>         https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
>
>

