[mpich-discuss] Possible integer overflows at scale in gather.c

Ignacio Laguna lagunaperalt1 at llnl.gov
Thu Oct 1 16:51:21 CDT 2015


Rob, I agree, debugging these issues without the appropriate tools is 
close to impossible. Okay, now that I'm allowed to speak more about it, 
let me elaborate :-D

This started as a compiler analysis tool to catch problems before codes 
are tested at large scale or with large input sets. We see these 
problems again and again, and users spend a huge amount of time 
(sometimes months) debugging them. We are trying to see if the community 
has interest in these tools, and if so, we could potentially release them.

Basically, the analysis detects integer operations that depend on the 
scale/input size. For example, it detects loops like this:

for (i = 0; i < SCALE; ++i)
{
    ...
}

where SCALE can be a variable that comes from MPI_Comm_size or from the 
communicator size itself (e.g., in the case of gather.c). If so, it 
instruments the instructions there and saves the resulting values. Then 
it runs the program at several small scales (e.g., 4, 8, 16 ranks) and 
predicts whether the operations will overflow at, say, 2^20 ranks.
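
To give a concrete (and hypothetical) example of the kind of 
scale-dependent arithmetic the analysis flags -- the variable names 
below are mine, not from mpich -- a 32-bit product involving the 
communicator size can wrap around long before it is widened for any 
64-bit address math:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int comm_size;
    int count = 4096, type_size = 4;
    MPI_Comm_size(MPI_COMM_WORLD, &comm_size);

    /* 32-bit product: with comm_size = 2^20 ranks this reaches 2^32
       and wraps around before the widening cast below ever happens. */
    int elems = comm_size * count;
    long long bytes = (long long) elems * type_size;

    /* Doing the whole computation in a 64-bit type avoids the wrap. */
    long long safe_bytes = (long long) comm_size * count * type_size;

    printf("32-bit path: %lld bytes, 64-bit path: %lld bytes\n",
           bytes, safe_bytes);

    MPI_Finalize();
    return 0;
}

At 4, 8, or 16 ranks nothing overflows, which is exactly why the tool 
extrapolates from the instrumented small-scale runs rather than waiting 
for a failure at full scale.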

This is something you should be able to put in the mpich tests once we 
release it. The only things you need are:
- The ability to compile mpich with clang (not a problem, I did it already)
- A simple program that executes the MPI functions that you want to 
test. The more functions you test, the better. I only tested the most 
commonly used MPI functions, e.g., alltoall, gather, broadcast, etc. (a 
sketch is shown just after this list)
- A script that runs the test at multiple small scales, say mpirun -n 4, 
-n 8, and -n 16.
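
For the second item, here is a minimal sketch of such a driver (my own 
illustration, not the actual test program I used), which simply calls a 
few collectives with a fixed per-rank count:

#include <mpi.h>
#include <stdlib.h>

/* Exercise the collectives of interest so the instrumented library
   code (e.g., gather.c) actually runs at each small scale. */
int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    int rank, size;
    const int count = 4096;    /* elements contributed per rank */
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int *sendbuf = malloc(count * sizeof(int));
    int *recvbuf = malloc((size_t) count * size * sizeof(int));
    for (int i = 0; i < count; i++)
        sendbuf[i] = rank;

    /* Add whichever collectives you want the analysis to cover. */
    MPI_Gather(sendbuf, count, MPI_INT, recvbuf, count, MPI_INT,
               0, MPI_COMM_WORLD);
    MPI_Bcast(recvbuf, count, MPI_INT, 0, MPI_COMM_WORLD);

    free(sendbuf);
    free(recvbuf);
    MPI_Finalize();
    return 0;
}

The script from the third item then just runs this driver at the small 
scales, e.g., mpirun -n 4 ./driver, mpirun -n 8 ./driver, and 
mpirun -n 16 ./driver, and the analysis extrapolates from those runs.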

That's it.

I would be happy to help you incorporate this in the mpich tests later.

Thanks for your help!

Ignacio

PS: Now I'm moving to openmpi to see if I find bugs there ;-)



On 10/1/15 2:13 PM, Rob Latham wrote:
>
>
> On 10/01/2015 01:09 PM, Ignacio Laguna wrote:
>> I would be happy to continue the discussion too. It's just that since
>> I'm new here I don't want to violate the rules (if there are any). I know
>> that non-mpich topics should be discussed on Stack Overflow and places
>> like that. If there are more questions, just let me know.
>
> I think the question we all have is "when can we add this to our git
> commit hooks?" -- debugging these problems without the aid of tooling is
> basically impossible.
>
> ==rob
>
>>
>> Ignacio
>>
>>
>> On 10/1/15 10:50 AM, Jeff Hammond wrote:
>>> I would be happy to see this discussion continue on the MPICH devel list
>>> even if it gets into LLVM weeds.  Tools for ensuring count-safety are
>>> vital to MPICH and related efforts in the MPI community, and I think
>>> many of the people who are subscribed to devel at mpich.org would be
>>> interested in this topic.  At the very
>>> least, they should tolerate it, since everyone on that list knows how to
>>> use email filters ;-)
>>>
>>> Best,
>>>
>>> Jeff
>>>
>>> On Thu, Oct 1, 2015 at 10:35 AM, Ignacio Laguna
>>> <lagunaperalt1 at llnl.gov> wrote:
>>>
>>>     Rob,
>>>
>>>     Thanks for replying! I'm answering your questions inline below.
>>>
>>>     On 10/1/15 9:15 AM, Rob Latham wrote:
>>>
>>>
>>>
>>>         On 09/30/2015 04:08 PM, Ignacio Laguna wrote:
>>>
>>>             Hi,
>>>
>>>             I found some 'potential' integer overflows when running
>>>             mpich at large
>>>             scale and/or with large inputs in gather.c. I believe that
>>>             they are
>>>             related to ticket #1767
>>>             (http://trac.mpich.org/projects/mpich/ticket/1767), but I
>>>             didn't see any
>>>             other bug reports about them so I thought I should confirm
>>>             with mpich
>>>             developers.
>>>
>>>             In addition to this case in line 171:
>>>
>>>                   98        int tmp_buf_size, missing;
>>>             ...
>>>                  169        if (nbytes <
>>> MPIR_CVAR_GATHER_VSMALL_MSG_SIZE)
>>>             tmp_buf_size++;
>>>                  170
>>>                  171        tmp_buf_size *= nbytes;
>>>
>>>             which I believe is fixed in mpich-3.2b4 (where tmp_buf_size
>>>             is declared as a 64-bit MPI_Aint),
>>>
>>>
>>>         I fixed this (I sure hope!) over the summer in commit
>>>         68f8c7aa798f7009
>>>
>>>
>>>     Yes, this is fixed (according to my analysis), so don't worry, I
>>>     just used it as a reference.
>>>
>>>
>>>             I found the following cases in the same
>>>             file src/mpi/coll/gather.c (in both mpich-3.1.4 and
>>>             mpich-3.2b4):
>>>
>>>             Case 1:
>>>
>>>                  222    mpi_errno = MPIC_Recv(((char *)recvbuf +
>>>                  223                (((rank + mask) %
>>>             comm_size)*recvcount*extent)),
>>>                  224                recvblks * recvcount, recvtype, src,
>>>                  225                MPIR_GATHER_TAG, comm,
>>>                  226                &status, errflag);
>>>
>>>             In line 223 I believe we get an integer overflow as follows.
>>>             Suppose I
>>>             run 2^20 = 1,048,576 ranks and do a gather with 4,096
>>>             elements. In this
>>>             case (if I understand the algorithm well), ((rank + mask) %
>>>             comm_size)
>>>             would be 2^20 / 2 = 524,288, and recvcount = 4,096. Then the
>>>             ((rank +
>>>             mask) % comm_size)*recvcount expression would overflow:
>>>             524,288 * 4,096
>>>             = 2,147,483,648, and become negative.
>>>
>>>
>>>         We're doing address math here, so I hope the compiler is not
>>>         trying to
>>>         store intermediate results in an integer.  However, if LLVM is
>>>         saying
>>>         otherwise, then sure, let's promote those types.
>>>
>>>            ((rank + mask) % comm_size)*recvcount is of type int*int,
>>>         which is
>>>         then multiplied by an MPI_Aint... yeah, looks suspicious.
>>>
>>>
>>>     Even when they are intermediate operations, the compiler could place
>>>     the result of ((rank + mask) % comm_size)*recvcount in a 32-bit
>>>     integer register. In fact, with llvm at the -O0 optimization level
>>>     that's what it does. This is the relevant portion of the code in llvm IR:
>>>
>>>        %mul220 = mul nsw i32 %rem219.pre, %recvcount
>>>        %conv221 = sext i32 %mul220 to i64
>>>        %mul222 = mul nsw i64 %conv221, %extent.1851
>>>        %add.ptr223 = getelementptr inbounds i8* %recvbuf, i64 %mul222
>>>
>>>     In the first line it multiplies the result of ((rank + mask) %
>>>     comm_size) by recvcount. Note that this is a 32-bit multiplication
>>>     (it says i32). Then in the second line it sign-extends the result to a
>>>     64-bit register (unfortunately, at this point %mul220 has already
>>>     wrapped around) and stores it in %conv221. Then it multiplies it by
>>>     %extent and gets an element of the array (with getelementptr).
>>>
>>>     I'm not sure whether this would be a problem at higher optimization
>>>     levels, but we should not rely on that.
>>>
>>>     I also verified these things are propagated to actual machine
>>>     registers by looking at the assembly code. Here's an example:
>>>
>>>     $ cat test.c
>>>     #include <stdio.h>
>>>     int main()
>>>     {
>>>        int val1, val2;
>>>        long long tmp, final;
>>>        val1 = (1 << 20) / 2;
>>>        val2 = 4096;
>>>        tmp = 4;
>>>        final = val1 * val2 * tmp;
>>>        printf("val1: %d, val2: %d, final: %lld\n", val1, val2, final);
>>>        return 0;
>>>     }
>>>
>>>     I compiled it with clang and -O0 and this is what I get when I run
>>> it:
>>>
>>>     $ ./test
>>>     val1: 524288, val2: 4096, final: -8589934592
>>>
>>>     This is the assembly code in my Mac:
>>>
>>>     $ otool -vVt test
>>>     test:
>>>     (__TEXT,__text) section
>>>     _main:
>>>     0000000100000f00        pushq   %rbp
>>>     0000000100000f01        movq    %rsp, %rbp
>>>     0000000100000f04        subq    $0x30, %rsp
>>>     0000000100000f08        leaq    0x6f(%rip), %rdi
>>>     0000000100000f0f        movl    $0x0, -0x4(%rbp)
>>>     0000000100000f16        movl    $0x80000, -0x8(%rbp)
>>>     0000000100000f1d        movl    $0x1000, -0xc(%rbp)
>>>     0000000100000f24        movq    $0x4, -0x18(%rbp)
>>>     0000000100000f2c        movl    -0x8(%rbp), %eax
>>>     0000000100000f2f        imull   -0xc(%rbp), %eax
>>>     0000000100000f33        movslq  %eax, %rcx
>>>     0000000100000f36        imulq   -0x18(%rbp), %rcx
>>>
>>>     The result of the first multiplication (imull) is stored in %eax, and
>>>     the second one in %rcx, but at that point %eax has already overflowed.
>>>
>>>
>>>         There are quite a few places where we do math on recvcount and
>>>         recvblocks -- does your static analysis show problems with all
>>>         of those
>>>         spots?
>>>
>>>
>>>     I checked all MPI calls in mpich-3.2b4 and the analysis doesn't
>>>     report anything else. Now, this doesn't mean there are no more
>>>     bugs--just that it could not find anything else :-)
>>>
>>>             When multiplied by 'extent', which is size_t or MPI_Aint, it
>>>             will become negative, I believe, or a huge positive value; in
>>>             either case it will point to the wrong location in the recvbuf
>>>             buffer, unless of course this wraparound behavior is intended.
>>>
>>>
>>>         Nope, wraparound sure is not intended!  We would have no hope of
>>>         cleaning up all these locations if it were not for clang's
>>>         -Wshorten64-to-32 flag.
>>>
>>>
>>>     Thanks! I was just double checking. In some programs wraparounds are
>>>     actually intended.
>>>
>>>             Case 2:
>>>
>>>             There might be a similar problem in line 224 in the above
>>>             code. With
>>>             2^20 ranks, recvblks becomes 524,288 (again, if I understand
>>>             the algorithm well), so the recvblks * recvcount operation will also
>>>             overflow.
>>>
>>>             I might be wrong on this -- I'm catching these issues with
>>>             LLVM symbolic
>>>             analysis -- so they can be totally false positives,
>>>
>>>
>>>         Tell me more about LLVM symbolic analysis!  We have started
>>> using
>>>         coverity for static analysis but yes, there are a lot of
>>> messages
>>>         obscuring the genuine bugs.
>>>
>>>
>>>     I can tell you more in a separate email (I know this mailing list is
>>>     only for mpich discussions).
>>>
>>>     Basically we are looking for scale-dependent bugs, i.e., bugs that
>>>     will only manifest at large scale and/or with large inputs. It does
>>>     data-flow analysis to understand what code is dependent on things
>>>     like the size of the problem (in mpich and/or MPI apps this is
>>>     basically code that depends on the size of the communicators, for
>>>     example). It then performs tests to see if things could go wrong at
>>>     large scale, e.g., by substituting values such as ranks = 1 million, and
>>>     then it reports the file and line of code where the problem is.
>>>
>>>     Having said that, coverity should find many more bugs. This tool is
>>>     very specialized for scale-dependent bugs.
>>>
>>>
>>>             but I just wanted to
>>>             check with the mpich developers if they are valid issues or
>>>             not. If they
>>>             are, I believe fixes can be easy to implement (just make all
>>>             these
>>>             computations size_t).
>>>
>>>
>>>         I promoted the prototype for MPIC_Recv to this:
>>>
>>>         int MPIC_Recv(void *buf, MPI_Aint count,
>>>         MPI_Datatype datatype, int source, int tag,
>>>         MPID_Comm *comm_ptr, MPI_Status *status,
>>>         MPIR_Errflag_t *errflag)
>>>
>>>         Top-level MPI routines are stuck using an 'int' count, but
>>>         internally we
>>>         can (and now do) use a larger MPI_Aint type.
>>>
>>>         True, int*int can overflow, but since that parameter to
>>>         MPIC_Recv is of
>>>         the larger MPI_Aint type, we're ok here.   LLVM disagrees?
>>>
>>>
>>>     Once you have a version I can download with the patches, I can run
>>>     the analysis again and see if I can catch more things. Just let me
>>>     know and I would be happy to test it.
>>>
>>>     In general, overflows can also occur in intermediate computations,
>>>     depending on how code is optimized, I believe.
>>>
>>>     Thanks!
>>>
>>>     Ignacio
>>>
>>>         ==rob
>>>
>>>
>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> http://jeffhammond.github.io/
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

