[mpich-discuss] Segfault with MPICH 3.2+Clang but not GCC

Jeff Hammond jeff.science at gmail.com
Tue Jul 26 11:16:00 CDT 2016


Indeed, Rob is right.  I had only tested np<=2.  I didn't see a failure
until nproc=16 because of how malloc works, but in any case it is trivial
to fix this by allocating C to hold nproc elements.

Jeff

#include <mpi.h>
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

int main(int argc, char *argv[])
{
    MPI_Init(&argc, &argv);

    MPI_Comm comm = MPI_COMM_WORLD;
    uint64_t *A, *C;
    int rnk, siz;

    MPI_Comm_rank(comm, &rnk);
    MPI_Comm_size(comm, &siz);

    A = calloc(1, sizeof(uint64_t));
    /* one receive slot per rank, so C scales with the communicator size */
    C = calloc(siz, sizeof(uint64_t));
    A[0] = rnk + 1;

    MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);

    free(C);
    free(A);

    MPI_Finalize();
    return 0;
}
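
For anyone who wants to see the sizing rule without an MPI install, here
is a rough MPI-free sketch.  It is only an illustration: the names
allgather_recv_elems and simulated_allgather are hypothetical helpers made
up for this example, not MPICH functions.  It assumes all contributions
are laid out back to back in one array and simply mimics the copy pattern,
where rank r's data lands at element offset r * count of the receive
buffer, so the buffer needs nproc * count elements.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: elements the receive buffer must hold for an
   allgather over nproc ranks with recvcount elements from each rank. */
size_t allgather_recv_elems(int nproc, int recvcount)
{
    assert(nproc >= 0 && recvcount >= 0);
    return (size_t)nproc * (size_t)recvcount;
}

/* Hypothetical stand-in for the copy step of an allgather.
   contrib holds every rank's contribution back to back (nproc * count
   elements).  Returns 0 on success, or -1 if recv is too small, which is
   the case where the real call would write past the end of the buffer. */
int simulated_allgather(const uint64_t *contrib, int nproc, int count,
                        uint64_t *recv, size_t recv_elems)
{
    if (recv_elems < allgather_recv_elems(nproc, count))
        return -1;

    for (int r = 0; r < nproc; r++) {
        size_t off = (size_t)r * (size_t)count;
        memcpy(recv + off, contrib + off, (size_t)count * sizeof(uint64_t));
    }
    return 0;
}
```

With 4 ranks and count 1, a 2-element receive buffer is rejected by
simulated_allgather while a 4-element one is filled.  The original
calloc(2, ...) corresponds to the rejected case once there are more than
2 processes.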

On Tue, Jul 26, 2016 at 9:00 AM, Rob Latham <robl at mcs.anl.gov> wrote:

>
>
> On 07/26/2016 10:17 AM, Andreas Noack wrote:
>
>> On my El Capitan macbook I get a segfault when running the program below
>> with more than a single process but only when MPICH has been compiled
>> with Clang.
>>
>> I don't get that good debug info but here is some of what I got
>>
>
>
> valgrind is pretty good at sussing out these sorts of things:
>
> ==18132== Unaddressable byte(s) found during client check request
> ==18132==    at 0x504D1D7: MPIR_Localcopy (helper_fns.c:84)
> ==18132==    by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
> ==18132==    by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
> ==18132==    by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
> ==18132==    by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
> ==18132==    by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
> ==18132==    by 0x4008F5: main (noack_segv.c:18)
> ==18132==  Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
> ==18132==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18132==    by 0x4008B0: main (noack_segv.c:15)
> ==18132==
> ==18132== Invalid write of size 8
> ==18132==    at 0x4C326CB: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18132==    by 0x504D31B: MPIR_Localcopy (helper_fns.c:84)
> ==18132==    by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
> ==18132==    by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
> ==18132==    by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
> ==18132==    by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
> ==18132==    by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
> ==18132==    by 0x4008F5: main (noack_segv.c:18)
> ==18132==  Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
> ==18132==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
> ==18132==    by 0x4008B0: main (noack_segv.c:15)
>
>
>
>>      MPI_Comm_rank(comm, &rnk);
>>      A = calloc(1, sizeof(uint64_t));
>>      C = calloc(2, sizeof(uint64_t));
>>      A[0] = rnk + 1;
>>
>>      MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);
>>
>
> Your 'buf count tuple' is ok for A: every process sends one uint64.
>
> Your 'buf count tuple' is too small for C if there are any more than 2
> processes.
>
> When you say "more than one"... do you mean 2?
>
> ==rob
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/