[mpich-discuss] Segfault with MPICH 3.2+Clang but not GCC

Andreas Noack andreasnoackjensen at gmail.com
Tue Jul 26 13:56:14 CDT 2016


I just built 0d6412303488428c461986655a56639cbbdbf705, and with that build I
don't see the issue. It's definitely present in 3.2, though.

On Tue, Jul 26, 2016 at 2:17 PM, Andreas Noack <andreasnoackjensen at gmail.com
> wrote:

> Thanks for the replies, and sorry for the incomplete information. I should
> have written that I get a segfault for np=2 but not for np=1; the exact
> error message is below. I've also tried the program Jeff sent and I get the
> same segfault as before for all np>1. I'll try to check out and build the
> development version and see if I can reproduce.
>
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 7703 RUNNING AT 30-9-81.wireless.csail.mit.edu
> =   EXIT CODE: 11
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
>
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Segmentation fault: 11
> (signal 11)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
>
> On Tue, Jul 26, 2016 at 12:16 PM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
>
>> Indeed, Rob is right.  I only tested np<=2.  I didn't see a failure until
>> nproc=16 because of how malloc works, but in any case it is trivial to fix
>> this by allocating C to be nproc elements.
>>
>> Jeff
>>
>> #include <mpi.h>
>> #include <stdint.h>   /* for uint64_t */
>> #include <stdio.h>
>> #include <stdlib.h>
>>
>> int main(int argc, char *argv[])
>> {
>>     MPI_Init(&argc, &argv);
>>
>>     MPI_Comm comm = MPI_COMM_WORLD;
>>     uint64_t *A, *C;
>>     int rnk, siz;
>>
>>     MPI_Comm_rank(comm, &rnk);
>>     MPI_Comm_size(comm, &siz);
>>     A = calloc(1, sizeof(uint64_t));
>>     C = calloc(siz, sizeof(uint64_t));   /* one element per rank */
>>     A[0] = rnk + 1;
>>
>>     MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);
>>
>>     free(C);
>>     free(A);
>>
>>     MPI_Finalize();
>>     return 0;
>> }
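>>
>> (A brief usage note: compile with mpicc and run under mpiexec, varying -n
>> to test different process counts.)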
>>
>> On Tue, Jul 26, 2016 at 9:00 AM, Rob Latham <robl at mcs.anl.gov> wrote:
>>
>>>
>>>
>>> On 07/26/2016 10:17 AM, Andreas Noack wrote:
>>>
>>>> On my El Capitan MacBook I get a segfault when running the program below
>>>> with more than a single process, but only when MPICH has been compiled
>>>> with Clang.
>>>>
>>>> I don't get very good debug info, but here is some of what I got:
>>>>
>>>
>>>
>>> valgrind is pretty good at sussing out these sorts of things:
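>>> (To get output like the below, run the reproducer under valgrind via
>>> mpiexec, e.g. "mpiexec -n 2 valgrind ./noack_segv"; the binary name and
>>> process count are placeholders.)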
>>>
>>> ==18132== Unaddressable byte(s) found during client check request
>>> ==18132==    at 0x504D1D7: MPIR_Localcopy (helper_fns.c:84)
>>> ==18132==    by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
>>> ==18132==    by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
>>> ==18132==    by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
>>> ==18132==    by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
>>> ==18132==    by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
>>> ==18132==    by 0x4008F5: main (noack_segv.c:18)
>>> ==18132==  Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
>>> ==18132==    at 0x4C2FB55: calloc (in
>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==18132==    by 0x4008B0: main (noack_segv.c:15)
>>> ==18132==
>>> ==18132== Invalid write of size 8
>>> ==18132==    at 0x4C326CB: memcpy@@GLIBC_2.14 (in
>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==18132==    by 0x504D31B: MPIR_Localcopy (helper_fns.c:84)
>>> ==18132==    by 0x4EC8EA1: MPIR_Allgather_intra (allgather.c:169)
>>> ==18132==    by 0x4ECA5EC: MPIR_Allgather (allgather.c:791)
>>> ==18132==    by 0x4ECA7A4: MPIR_Allgather_impl (allgather.c:832)
>>> ==18132==    by 0x4EC8B5C: MPID_Allgather (mpid_coll.h:61)
>>> ==18132==    by 0x4ECB9F7: PMPI_Allgather (allgather.c:978)
>>> ==18132==    by 0x4008F5: main (noack_segv.c:18)
>>> ==18132==  Address 0x6f2f138 is 8 bytes after a block of size 16 alloc'd
>>> ==18132==    at 0x4C2FB55: calloc (in
>>> /usr/lib/valgrind/vgpreload_memcheck-amd64-linux.so)
>>> ==18132==    by 0x4008B0: main (noack_segv.c:15)
>>>
>>>
>>>
>>>>      MPI_Comm_rank(comm, &rnk);
>>>>      A = calloc(1, sizeof(uint64_t));
>>>>      C = calloc(2, sizeof(uint64_t));
>>>>      A[0] = rnk + 1;
>>>>
>>>>      MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);
>>>>
>>>
>>> Your 'buf count tuple' is ok for A: every process sends one uint64
>>>
>>> Your 'buf count tuple' is too small for C if there are any more than 2
>>> processes.
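>>>
>>> In code, the sizing rule looks like this (a minimal sketch, not taken from
>>> the original program; MPI_Allgather with recvcount=1 writes one element
>>> per rank into the receive buffer):
>>>
>>>     uint64_t *C;
>>>     int siz;
>>>     MPI_Comm_size(comm, &siz);
>>>     /* the receive buffer must hold recvcount * comm_size elements */
>>>     C = calloc((size_t)siz, sizeof(uint64_t));
>>>     MPI_Allgather(A, 1, MPI_UINT64_T, C, 1, MPI_UINT64_T, comm);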
>>>
>>> When you say "more than one"... do you mean 2?
>>>
>>> ==rob
>>>
>>
>>
>>
>> --
>> Jeff Hammond
>> jeff.science at gmail.com
>> http://jeffhammond.github.io/
>>
>
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

