[mpich-discuss] Possible integer overflows at scale in gather.c
Ignacio Laguna
lagunaperalt1 at llnl.gov
Thu Oct 1 13:09:36 CDT 2015
I would be happy to continue the discussion too. It's just that, since
I'm new here, I don't want to violate the rules (if there are any). I know
that non-MPICH topics should be discussed on Stack Overflow and places
like that. If there are more questions, just let me know.
Ignacio
On 10/1/15 10:50 AM, Jeff Hammond wrote:
> I would be happy to see this discussion continue on the MPICH devel list
> even if it gets into LLVM weeds. Tools for ensuring count-safety are
> vital to MPICH and related efforts in the MPI community, and I think
> many of the people who are subscribed to devel at mpich.org would be
> interested in this topic. At the very
> least, they should tolerate it, since everyone on that list knows how to
> use email filters ;-)
>
> Best,
>
> Jeff
>
> On Thu, Oct 1, 2015 at 10:35 AM, Ignacio Laguna <lagunaperalt1 at llnl.gov> wrote:
>
> Rob,
>
> Thanks for replying! I'm answering your questions inline below.
>
> On 10/1/15 9:15 AM, Rob Latham wrote:
>
>
>
> On 09/30/2015 04:08 PM, Ignacio Laguna wrote:
>
> Hi,
>
> I found some 'potential' integer overflows when running mpich at large
> scale and/or with large inputs in gather.c. I believe that they are
> related to ticket #1767 (http://trac.mpich.org/projects/mpich/ticket/1767),
> but I didn't see any other bug reports about them, so I thought I should
> confirm with the mpich developers.
>
> In addition to this case in line 171:
>
> 98 int tmp_buf_size, missing;
> ...
> 169 if (nbytes < MPIR_CVAR_GATHER_VSMALL_MSG_SIZE) tmp_buf_size++;
> 170
> 171 tmp_buf_size *= nbytes;
>
> which I believe is fixed in mpich-3.2b4 (where tmp_buf_size is
> declared as a 64-bit MPI_Aint),
>
>
> I fixed this (I sure hope!) over the summer in commit 68f8c7aa798f7009.
>
>
> Yes, this is fixed (according to my analysis), so don't worry, I
> just used it as a reference.
>
>
> I found the following cases in the same file src/mpi/coll/gather.c
> (in both mpich-3.1.4 and mpich-3.2b4):
>
> Case 1:
>
> 222 mpi_errno = MPIC_Recv(((char *)recvbuf +
> 223 (((rank + mask) % comm_size)*recvcount*extent)),
> 224 recvblks * recvcount, recvtype, src,
> 225 MPIR_GATHER_TAG, comm,
> 226 &status, errflag);
>
> In line 223 I believe we get an integer overflow as follows. Suppose I
> run 2^20 = 1,048,576 ranks and do a gather with 4,096 elements. In this
> case (if I understand the algorithm well), ((rank + mask) % comm_size)
> would be 2^20 / 2 = 524,288, and recvcount = 4,096. Then the
> ((rank + mask) % comm_size)*recvcount expression would overflow:
> 524,288 * 4,096 = 2,147,483,648, and become negative.
>
>
> We're doing address math here, so I hope the compiler is not trying to
> store intermediate results in an integer. However, if LLVM is saying
> otherwise, then sure, let's promote those types.
>
> ((rank + mask) % comm_size)*recvcount is of type int*int, which is
> then multiplied by an MPI_Aint... yeah, looks suspicious.
>
>
> Even when they are intermediate operations, the compiler could place
> the result of ((rank + mask) % comm_size)*recvcount in a 32-bit
> integer register. In fact, with llvm at the -O0 optimization level,
> that's what it does. This is the portion of the code in llvm IR:
>
> %mul220 = mul nsw i32 %rem219.pre, %recvcount
> %conv221 = sext i32 %mul220 to i64
> %mul222 = mul nsw i64 %conv221, %extent.1851
> %add.ptr223 = getelementptr inbounds i8* %recvbuf, i64 %mul222
>
> In the first line it multiplies the result of ((rank + mask) % comm_size)
> by recvcount. Note that this is a 32-bit multiplication (it says i32).
> Then, in the 2nd line, it sign-extends the result to a 64-bit register
> (unfortunately, at this point %mul220 has already wrapped around) and
> stores it in %conv221. Then it multiplies that by %extent and computes
> the address of the array element (with getelementptr).
>
> I'm not sure whether this would still be a problem at higher
> optimization levels, but we should not rely on that.
>
> I also verified that these values are propagated to actual machine
> registers by looking at the assembly code. Here's an example:
>
> $ cat test.c
> #include <stdio.h>
> int main()
> {
> int val1, val2;
> long long tmp, final;
> val1 = (1 << 20) / 2;
> val2 = 4096;
> tmp = 4;
> final = val1 * val2 * tmp;
> printf("val1: %d, val2: %d, final: %lld\n", val1, val2, final);
> return 0;
> }
>
> I compiled it with clang and -O0 and this is what I get when I run it:
>
> $ ./test
> val1: 524288, val2: 4096, final: -8589934592
>
> This is the assembly code in my Mac:
>
> $ otool -vVt test
> test:
> (__TEXT,__text) section
> _main:
> 0000000100000f00 pushq %rbp
> 0000000100000f01 movq %rsp, %rbp
> 0000000100000f04 subq $0x30, %rsp
> 0000000100000f08 leaq 0x6f(%rip), %rdi
> 0000000100000f0f movl $0x0, -0x4(%rbp)
> 0000000100000f16 movl $0x80000, -0x8(%rbp)
> 0000000100000f1d movl $0x1000, -0xc(%rbp)
> 0000000100000f24 movq $0x4, -0x18(%rbp)
> 0000000100000f2c movl -0x8(%rbp), %eax
> 0000000100000f2f imull -0xc(%rbp), %eax
> 0000000100000f33 movslq %eax, %rcx
> 0000000100000f36 imulq -0x18(%rbp), %rcx
>
> The result of the first multiplication (imull) is stored in %eax, and
> the second one in %rcx, but at that point %eax has already overflowed.
>
>
> There are quite a few places where we do math on recvcount and
> recvblocks -- does your static analysis show problems with all of
> those spots?
>
>
> I checked all MPI calls in mpich-3.2b4 and the analysis doesn't
> report anything else. Now, this doesn't mean there are no more
> bugs--just that it could not find anything else :-)
>
> When multiplied by 'extent', which is size_t or MPI_Aint, it will, I
> believe, become negative or a huge positive value, which in either
> case will point to the wrong location in the recvbuf buffer, unless
> of course this wraparound behavior is intended.
>
>
> Nope, wraparound sure is not intended! We would have no hope of
> cleaning up all these locations if it were not for clang's
> -Wshorten-64-to-32 flag.
>
>
> Thanks! I was just double checking. In some programs wraparounds are
> actually intended.
>
> Case 2:
>
> There might be a similar problem in line 224 in the above code. With
> 2^20 ranks, recvblks becomes 524,288 (again, if I understand the
> algorithm well), so the recvblks * recvcount operation will also
> overflow.
>
> I might be wrong on this -- I'm catching these issues with LLVM
> symbolic analysis -- so they could be complete false positives,
>
>
> Tell me more about LLVM symbolic analysis! We have started using
> coverity for static analysis but yes, there are a lot of messages
> obscuring the genuine bugs.
>
>
> I can tell you more in a separate email (I know this mailing list is
> only for mpich discussions).
>
> Basically we are looking for scale-dependent bugs, i.e., bugs that
> will only manifest at large scale and/or with large inputs. The tool
> does data-flow analysis to understand what code depends on the size
> of the problem (in mpich and/or MPI apps this is basically code that
> depends on the size of the communicators, for example). It then
> performs tests to see if things could go wrong at large scale, e.g.,
> by substituting large values such as ranks = 1 million. Finally, it
> reports the file and line where the problem occurs.
>
> Having said that, coverity should find many more bugs; this analysis
> is specialized for scale-dependent ones.
>
>
> but I just wanted to check with the mpich developers whether they are
> valid issues. If they are, I believe the fixes are easy to implement
> (just perform all these computations in size_t).
>
>
> I promoted the prototype for MPIC_Recv to this:
>
> int MPIC_Recv(void *buf, MPI_Aint count,
> MPI_Datatype datatype, int source, int tag,
> MPID_Comm *comm_ptr, MPI_Status *status,
> MPIR_Errflag_t *errflag)
>
> Top-level MPI routines are stuck using an 'int' count, but
> internally we
> can (and now do) use a larger MPI_Aint type.
>
> True, int*int can overflow, but since that parameter to
> MPIC_Recv is of
> the larger MPI_Aint type, we're ok here. LLVM disagrees?
>
>
> Once you have a version I can download with the patches, I can run
> the analysis again and see if I can catch more things. Just let me
> know and I would be happy to test it.
>
> In general, overflows can also occur in intermediate computations,
> depending on how code is optimized, I believe.
>
> Thanks!
>
> Ignacio
>
> ==rob
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/