[mpich-discuss] Segfault with MPICH 3.2+Clang but not GCC

Rob Latham robl at mcs.anl.gov
Tue Jul 26 11:00:00 CDT 2016



On 07/26/2016 10:17 AM, Andreas Noack wrote:
> On my El Capitan macbook I get a segfault when running the program below
> with more than a single process but only when MPICH has been compiled
> with Clang.
>
> I don't get that good debug info but here is some of what I got


valgrind is pretty good at sussing out these sorts of things:

==18132== Unaddressable byte(s) found during client check request
==18132==    at 0x504D1D7: MPIR_Localcopy (helper_fns.c:84)
==18132==    by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
==18132==    by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
==18132==    by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
==18132==    by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
==18132==    by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
==18132==    by 0x4008F5: main (noack_segv.c:18)
==18132==  Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
==18132==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18132==    by 0x4008B0: main (noack_segv.c:15)
==18132==
==18132== Invalid write of size 8
==18132==    at 0x4C326CB: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18132==    by 0x504D31B: MPIR_Localcopy (helper_fns.c:84)
==18132==    by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
==18132==    by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
==18132==    by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
==18132==    by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
==18132==    by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
==18132==    by 0x4008F5: main (noack_segv.c:18)
==18132==  Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
==18132==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18132==    by 0x4008B0: main (noack_segv.c:15)


>
>      MPI_Comm_rank(comm, &rnk);
>      A = calloc(1, sizeof(uint64_t));
>      C = calloc(2, sizeof(uint64_t));
>      A[0] = rnk + 1;
>
>      MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);

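Your 'buf count tuple' is OK for A: every process sends one uint64.

Your 'buf count tuple' is too small for C if there are more than 2
processes: C holds only two uint64 values (the 16-byte block in the
valgrind report), but the receive buffer needs room for one value from
every process.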
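
A minimal sketch of the sizing rule, in plain C so it does not need an MPI install to compile. The helper names allgather_recv_elems and alloc_allgather_recvbuf are hypothetical (they are not part of MPI or of the original program); the only thing taken from the thread is that each process contributes one uint64, so the receive buffer must hold one element per process.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper: number of elements the receive buffer of an
 * allgather must hold. Every one of the nprocs ranks contributes
 * recvcount elements, so the total is nprocs * recvcount. */
static size_t allgather_recv_elems(int nprocs, int recvcount)
{
    if (nprocs <= 0 || recvcount < 0)
        return 0;
    return (size_t)nprocs * (size_t)recvcount;
}

/* Hypothetical helper: allocate a zeroed uint64_t receive buffer sized
 * for nprocs ranks. Returns NULL if the size is zero or calloc fails. */
static uint64_t *alloc_allgather_recvbuf(int nprocs, int recvcount)
{
    size_t n = allgather_recv_elems(nprocs, recvcount);
    if (n == 0)
        return NULL;
    return calloc(n, sizeof(uint64_t));
}
```

In the program above, nprocs would presumably come from MPI_Comm_size(comm, &nprocs), and C = alloc_allgather_recvbuf(nprocs, 1) would replace the fixed calloc(2, sizeof(uint64_t)).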

When you say "more than one"... do you mean 2?

==rob