[mpich-discuss] Segfault with MPICH 3.2+Clang but not GCC

Kenneth Raffenetti raffenet at mcs.anl.gov
Tue Jul 26 14:57:03 CDT 2016


Thanks for confirming. We'll make sure the fix gets applied to an 
upcoming bug-fix release.

Ken

On 07/26/2016 01:56 PM, Andreas Noack wrote:
> I just built 0d6412303488428c461986655a56639cbbdbf705 and I don't see the
> issue there. The problem is definitely present on 3.2, though.
>
> On Tue, Jul 26, 2016 at 2:17 PM, Andreas Noack
> <andreasnoackjensen at gmail.com> wrote:
>
>     Thanks for the replies, and sorry for the incomplete information. I
>     should have written that I get a segfault for np=2 but not for np=1;
>     the exact error message is below. I've also tried the program Jeff
>     sent, and I get the same segfault as before for all np>1. I'll try to
>     check out and build the development version and see if I can reproduce.
>
>     ===================================================================================
>     =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
>     =   PID 7703 RUNNING AT 30-9-81.wireless.csail.mit.edu
>     =   EXIT CODE: 11
>     =   CLEANING UP REMAINING PROCESSES
>     =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>     ===================================================================================
>     YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault: 11 (signal 11)
>     This typically refers to a problem with your application.
>     Please see the FAQ page for debugging suggestions
>
>
>     On Tue, Jul 26, 2016 at 12:16 PM, Jeff Hammond
>     <jeff.science at gmail.com> wrote:
>
>         Indeed, Rob is right. I only tested np<=2. I didn't see a failure
>         until nproc=16 because of how malloc works, but in any case it is
>         trivial to fix this by allocating C to be nproc elements, as in the
>         program below.
>
>         Jeff
>
>         #include <mpi.h>
>         #include <stdint.h>   /* uint64_t */
>         #include <stdio.h>
>         #include <stdlib.h>
>
>         int main(int argc, char *argv[])
>         {
>             MPI_Init(&argc, &argv);
>
>             MPI_Comm comm = MPI_COMM_WORLD;
>             uint64_t *A, *C;
>             int rnk, siz;
>
>             MPI_Comm_rank(comm, &rnk);
>             MPI_Comm_size(comm, &siz);
>
>             A = calloc(1, sizeof(uint64_t));    /* send buffer: one element */
>             C = calloc(siz, sizeof(uint64_t));  /* receive buffer: one element per rank */
>             A[0] = rnk + 1;
>
>             MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);
>
>             free(C);
>             free(A);
>
>             MPI_Finalize();
>             return 0;
>         }
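>
>         A quick sanity check (a sketch, not part of the original program)
>         would be to verify the gathered values right after the MPI_Allgather
>         call, since rank i contributes the value i + 1:
>
>             /* C[i] should hold i + 1 for every i in [0, siz) */
>             for (int i = 0; i < siz; i++) {
>                 if (C[i] != (uint64_t)(i + 1)) {
>                     fprintf(stderr, "rank %d: C[%d] = %llu, expected %d\n",
>                             rnk, i, (unsigned long long)C[i], i + 1);
>                 }
>             }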
>
>
>         On Tue, Jul 26, 2016 at 9:00 AM, Rob Latham <robl at mcs.anl.gov> wrote:
>
>
>
>             On 07/26/2016 10:17 AM, Andreas Noack wrote:
>
>                 On my El Capitan MacBook I get a segfault when running the
>                 program below with more than a single process, but only
>                 when MPICH has been compiled with Clang.
>
>                 I don't get very good debug info, but here is some of what
>                 I got:
>
>
>
>             valgrind is pretty good at sussing out these sorts of things:
>
>             ==18132== Unaddressable byte(s) found during client check request
>             ==18132==    at 0x504D1D7: MPIR_Localcopy (helper_fns.c:84)
>             ==18132==    by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
>             ==18132==    by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
>             ==18132==    by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
>             ==18132==    by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
>             ==18132==    by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
>             ==18132==    by 0x4008F5: main (noack_segv.c:18)
>             ==18132==  Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
>             ==18132==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>             ==18132==    by 0x4008B0: main (noack_segv.c:15)
>             ==18132==
>             ==18132== Invalid write of size 8
>             ==18132==    at 0x4C326CB: memcpy@@GLIBC_2.14 (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>             ==18132==    by 0x504D31B: MPIR_Localcopy (helper_fns.c:84)
>             ==18132==    by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
>             ==18132==    by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
>             ==18132==    by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
>             ==18132==    by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
>             ==18132==    by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
>             ==18132==    by 0x4008F5: main (noack_segv.c:18)
>             ==18132==  Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
>             ==18132==    at 0x4C2FB55: calloc (in /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>             ==18132==    by 0x4008B0: main (noack_segv.c:15)
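>
>             (For reference, a report like the one above is typically obtained
>             by running each rank under valgrind through the launcher, e.g.
>
>                 mpiexec -n 2 valgrind ./noack_segv
>
>             where the executable name is only an assumption taken from the
>             noack_segv.c frames in the backtrace.)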
>
>
>
>                      MPI_Comm_rank(comm, &rnk);
>                      A = calloc(1, sizeof(uint64_t));
>                      C = calloc(2, sizeof(uint64_t));
>                      A[0] = rnk + 1;
>
>                      MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);
>
>
>             Your 'buf count tuple' is OK for A: every process sends one
>             uint64_t.
>
>             Your 'buf count tuple' is too small for C if there are more than
>             2 processes.
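>
>             A minimal corrected allocation (a sketch; siz obtained from
>             MPI_Comm_size, as in Jeff's program above) sizes the receive
>             buffer by the communicator size:
>
>                 MPI_Comm_size(comm, &siz);
>                 C = calloc(siz, sizeof(uint64_t));  /* one uint64_t from each rank */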
>
>             When you say "more than one"... do you mean 2?
>
>             ==rob
>
>
>
>
>
>         --
>         Jeff Hammond
>         jeff.science at gmail.com
>         http://jeffhammond.github.io/
>
>
>
>
>
>
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

