[mpich-discuss] Segfault with MPICH 3.2+Clang but not GCC
Rob Latham
robl at mcs.anl.gov
Tue Jul 26 11:00:00 CDT 2016
On 07/26/2016 10:17 AM, Andreas Noack wrote:
> On my El Capitan macbook I get a segfault when running the program below
> with more than a single process but only when MPICH has been compiled
> with Clang.
>
> I don't get that good debug info but here is some of what I got
valgrind is pretty good at sussing out these sorts of things:
==18132== Unaddressable byte(s) found during client check request
==18132== at 0x504D1D7: MPIR_Localcopy (helper_fns.c:84)
==18132== by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
==18132== by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
==18132== by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
==18132== by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
==18132== by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
==18132== by 0x4008F5: main (noack_segv.c:18)
==18132== Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
==18132== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18132== by 0x4008B0: main (noack_segv.c:15)
==18132==
==18132== Invalid write of size 8
==18132== at 0x4C326CB: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18132== by 0x504D31B: MPIR_Localcopy (helper_fns.c:84)
==18132== by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
==18132== by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
==18132== by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
==18132== by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
==18132== by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
==18132== by 0x4008F5: main (noack_segv.c:18)
==18132== Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
==18132== at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
==18132== by 0x4008B0: main (noack_segv.c:15)
>
> MPI_Comm_rank(comm, &rnk);
> A = calloc(1, sizeof(uint64_t));
> C = calloc(2, sizeof(uint64_t));
> A[0] = rnk + 1;
>
> MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);
Your 'buf count tuple' is ok for A: every process sends one uint64.
Your 'buf count tuple' is too small for C if there are any more than 2
processes: MPI_Allgather writes one uint64 per process into C, and C only
has room for two.
When you say "more than one"... do you mean 2?
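
For what it's worth, here is a minimal sketch of sizing the receive buffer
from the number of processes instead of hard-coding 2. The helper names
(allgather_recv_elems, alloc_allgather_recvbuf) are made up for illustration
and are not part of MPI or MPICH. The valgrind report above fits this
reading: the block of size 16 is the calloc(2, sizeof(uint64_t)), and a write
8 bytes past its end would be element index 3.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical helper (not an MPI call): number of elements the receive
 * buffer of an allgather must hold when each of nprocs ranks contributes
 * recvcount elements. */
static size_t allgather_recv_elems(int nprocs, int recvcount)
{
    assert(nprocs > 0 && recvcount >= 0);
    return (size_t)nprocs * (size_t)recvcount;
}

/* Hypothetical helper: allocate a zeroed uint64_t receive buffer large
 * enough for every rank's contribution. In the real program nprocs would
 * come from MPI_Comm_size(comm, &nprocs). Returns NULL if calloc fails. */
static uint64_t *alloc_allgather_recvbuf(int nprocs, int recvcount)
{
    return calloc(allgather_recv_elems(nprocs, recvcount), sizeof(uint64_t));
}
```

In the posted program that would mean calling MPI_Comm_size(comm, &nprocs)
first and then replacing C = calloc(2, sizeof(uint64_t)) with
C = alloc_allgather_recvbuf(nprocs, 1), so C grows with the communicator.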
==rob