[mpich-discuss] MPICH-3.2: SIGSEGV in MPID_Request_create () at src/mpid/ch3/src/ch3u_request.c:101

Mark Davis markdavisinboston at gmail.com
Wed Aug 17 15:29:40 CDT 2016


Hello,

Sure. Please see the attached program, a simple single-threaded test
that loops through many broadcasts in a row. With NPROCS=2 it runs
fine, but with NPROCS=6 it appears to hang for a couple of seconds and
then segfaults about 50% of the time (backtrace below). Again, this is
an intermittent issue that seems to affect every MPI program I try,
provided NPROCS is large enough; sometimes the program runs fine with
no hang. I'm running on an 8-core MacBook Pro. (When I run the same
application on my Linux cluster, I never have this problem.)

Secondly, I followed the recommendation to configure with
--enable-g=most,mem, and that did allow me to compile master HEAD.
However, debug symbols are either not being generated or not being
loaded by gdb. I noticed that no libpmpi.12.dylib.dSYM directories
(with DWARF debug info) were created when I built master from
source, despite the --enable-g=most,mem flag. I'm pretty sure the
MPICH 3.2 release build did create them. Is there something else I
can do during the build to force debug symbols?

Thank you,
Mark

Program received signal SIGSEGV, Segmentation fault.
[Switching to Thread 0x1313 of process 62381]
MPID_Request_init (req=0x105be2098) at src/mpid/ch3/src/ch3u_request.c:56
56          req->dev.ext_hdr_ptr       = NULL;
(gdb) bt full
#0  MPID_Request_init (req=0x105be2098) at src/mpid/ch3/src/ch3u_request.c:56
No locals.
#1  0x000000010036aeea in ?? () from /Users/m/local/lib/libpmpi.12.dylib
No symbol table info available.
#2  0x0000000100445550 in ?? () from /Users/m/local/lib/libpmpi.12.dylib
No symbol table info available.
#3  0x0000000000000000 in ?? ()
No symbol table info available.

On Tue, Aug 16, 2016 at 7:32 PM, Halim Amer <aamer at anl.gov> wrote:
> Can you send us a toy program that reproduces this problem?
>
> --Halim
> www.mcs.anl.gov/~aamer
>
>
> On 8/16/16 6:04 PM, Mark Davis wrote:
>>
>> I pulled from master (at d8bb1df from yesterday) again and then
>> recompiled with the --enable-g=most,mem flag instead of just
>> --enable-g=most.
>>
>> The good news is that the --enable-g=most,mem flag compiled successfully.
>>
>> The bad news is two-fold:
>>
>> 1. I believe I'm still getting the same SEGV as before, at
>> req->dev.ext_hdr_ptr = NULL; (although it now points to a different
>> line in src/mpid/ch3/src/ch3u_request.c: line 56 instead of line 101).
>> I'm not sure the line number is relevant; other things may have moved
>> around in that file since the 3.2 release.
>>
>> 2. I no longer have debugging symbols in my library, so my backtraces
>> are not helpful. Could these two issues be related?
>>
>> I did double check that I rebuilt my application from scratch so it
>> linked in the new library and that the library was indeed rebuilt (by
>> looking at file creation timestamps).
>>
>> Any ideas about these two issues? Thank you.
>>
>> Program received signal SIGSEGV, Segmentation fault.
>> [Switching to Thread 0x1313 of process 62381]
>> MPID_Request_init (req=0x105be2098) at src/mpid/ch3/src/ch3u_request.c:56
>> 56          req->dev.ext_hdr_ptr       = NULL;
>> (gdb) bt full
>> #0  MPID_Request_init (req=0x105be2098) at
>> src/mpid/ch3/src/ch3u_request.c:56
>> No locals.
>> #1  0x000000010036aeea in ?? () from /Users/m/local/lib/libpmpi.12.dylib
>> No symbol table info available.
>> #2  0x0000000100445550 in ?? () from /Users/m/local/lib/libpmpi.12.dylib
>> No symbol table info available.
>> #3  0x0000000000000000 in ?? ()
>> No symbol table info available.
>>
>> On Mon, Aug 15, 2016 at 5:32 PM, Halim Amer <aamer at anl.gov> wrote:
>>>
>>> Good catch! The `most` option implies `mem`, but the root configure
>>> failed to forward the `mem` option to the MPL software layer. We will
>>> push a fix, but meanwhile you can specify `--enable-g=most,mem` to get
>>> the desired behavior.
>>>
>>> --Halim
>>> www.mcs.anl.gov/~aamer
>>>
>>>
>>> On 8/12/16 9:37 PM, Mark Davis wrote:
>>>>
>>>>
>>>> I've tried both git HEAD (3ea7589) and the August 1 master
>>>> snapshot, and both fail to build with the same error. I configured
>>>> with --enable-g=most; everything else is default. I'm running on OSX
>>>> (Darwin - 15.6.0) with clang 3.8.1.
>>>>
>>>> The build fails while compiling
>>>> src/mpi/attr/lib_libpmpi_la-attr_delete.lo because of a problem with
>>>> the MPL_free macro.
>>>>
>>>> Has anyone seen this before? I'm including the full error trace below:
>>>>
>>>> Making all in .
>>>>   CC       src/mpi/attr/lib_libpmpi_la-attr_delete.lo
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:217:
>>>> ./src/include/mpir_request.h:281:13: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>             MPL_free(req->u.ureq.greq_fns);
>>>>             ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:217:
>>>> ./src/include/mpir_request.h:281:13: warning: character constant too
>>>> long for its type
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:217:
>>>> ./src/include/mpir_request.h:281:13: error: expected ';' after
>>>> expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:50: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:217:
>>>> ./src/include/mpir_request.h:281:13: error: expected expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:51: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                   ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:225:
>>>> In file included from ./src/include/mpir_cvars.h:17:
>>>> In file included from ./src/include/mpitimpl.h:18:
>>>> ./src/include/mpir_utarray.h:238:43: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>   *_dst = (*_src == NULL) ? NULL : (char*)utarray_strdup_(*_src);
>>>>                                           ^
>>>> ./src/include/mpir_utarray.h:56:33: note: expanded from macro
>>>> 'utarray_strdup_'
>>>> #define utarray_strdup_(x_)     MPL_strdup(x_)
>>>>                                 ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:17:20:
>>>> note: expanded from macro 'MPL_strdup'
>>>> #define MPL_strdup strdup
>>>>                    ^
>>>> ./src/include/mpir_mem.h:100:27: note: expanded from macro 'strdup'
>>>> #define strdup(a)         'Error use MPL_strdup' :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:225:
>>>> In file included from ./src/include/mpir_cvars.h:17:
>>>> In file included from ./src/include/mpitimpl.h:18:
>>>> ./src/include/mpir_utarray.h:238:43: warning: character constant too
>>>> long for its type
>>>> ./src/include/mpir_utarray.h:56:33: note: expanded from macro
>>>> 'utarray_strdup_'
>>>> #define utarray_strdup_(x_)     MPL_strdup(x_)
>>>>                                 ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:17:20:
>>>> note: expanded from macro 'MPL_strdup'
>>>> #define MPL_strdup strdup
>>>>                    ^
>>>> ./src/include/mpir_mem.h:100:27: note: expanded from macro 'strdup'
>>>> #define strdup(a)         'Error use MPL_strdup' :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:225:
>>>> In file included from ./src/include/mpir_cvars.h:17:
>>>> In file included from ./src/include/mpitimpl.h:18:
>>>> ./src/include/mpir_utarray.h:238:43: error: expected ';' after
>>>> expression
>>>> ./src/include/mpir_utarray.h:56:33: note: expanded from macro
>>>> 'utarray_strdup_'
>>>> #define utarray_strdup_(x_)     MPL_strdup(x_)
>>>>                                 ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:17:20:
>>>> note: expanded from macro 'MPL_strdup'
>>>> #define MPL_strdup strdup
>>>>                    ^
>>>> ./src/include/mpir_mem.h:100:50: note: expanded from macro 'strdup'
>>>> #define strdup(a)         'Error use MPL_strdup' :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:225:
>>>> In file included from ./src/include/mpir_cvars.h:17:
>>>> In file included from ./src/include/mpitimpl.h:18:
>>>> ./src/include/mpir_utarray.h:238:43: error: expected expression
>>>> ./src/include/mpir_utarray.h:56:33: note: expanded from macro
>>>> 'utarray_strdup_'
>>>> #define utarray_strdup_(x_)     MPL_strdup(x_)
>>>>                                 ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:17:20:
>>>> note: expanded from macro 'MPL_strdup'
>>>> #define MPL_strdup strdup
>>>>                    ^
>>>> ./src/include/mpir_mem.h:100:51: note: expanded from macro 'strdup'
>>>> #define strdup(a)         'Error use MPL_strdup' :::
>>>>                                                   ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:225:
>>>> In file included from ./src/include/mpir_cvars.h:17:
>>>> In file included from ./src/include/mpitimpl.h:18:
>>>> ./src/include/mpir_utarray.h:242:14: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>   if (*eltc) utarray_free_(*eltc);
>>>>              ^
>>>> ./src/include/mpir_utarray.h:54:33: note: expanded from macro
>>>> 'utarray_free_'
>>>> #define utarray_free_(x_)       MPL_free(x_)
>>>>                                 ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:225:
>>>> In file included from ./src/include/mpir_cvars.h:17:
>>>> In file included from ./src/include/mpitimpl.h:18:
>>>> ./src/include/mpir_utarray.h:242:14: warning: character constant too
>>>> long for its type
>>>> ./src/include/mpir_utarray.h:54:33: note: expanded from macro
>>>> 'utarray_free_'
>>>> #define utarray_free_(x_)       MPL_free(x_)
>>>>                                 ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:225:
>>>> In file included from ./src/include/mpir_cvars.h:17:
>>>> In file included from ./src/include/mpitimpl.h:18:
>>>> ./src/include/mpir_utarray.h:242:14: error: expected ';' after
>>>> expression
>>>> ./src/include/mpir_utarray.h:54:33: note: expanded from macro
>>>> 'utarray_free_'
>>>> #define utarray_free_(x_)       MPL_free(x_)
>>>>                                 ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:50: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:225:
>>>> In file included from ./src/include/mpir_cvars.h:17:
>>>> In file included from ./src/include/mpitimpl.h:18:
>>>> ./src/include/mpir_utarray.h:242:14: error: expected expression
>>>> ./src/include/mpir_utarray.h:54:33: note: expanded from macro
>>>> 'utarray_free_'
>>>> #define utarray_free_(x_)       MPL_free(x_)
>>>>                                 ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:51: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                   ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:61:29: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>         nIndirect = (int *) MPL_calloc(objmem->indirect_size,
>>>> sizeof(int));
>>>>                             ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:27: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:61:29: warning: character constant too
>>>> long for its type
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:27: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:61:29: error: expected ';' after
>>>> expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:50: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:61:29: error: expected expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:51: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                                                   ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:117:9: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>         MPL_free(nIndirect);
>>>>         ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:117:9: warning: character constant too
>>>> long for its type
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:117:9: error: expected ';' after
>>>> expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:50: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:117:9: error: expected expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:51: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                   ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:179:9: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>         MPL_free((*indirect)[i]);
>>>>         ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:179:9: warning: character constant too
>>>> long for its type
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:179:9: error: expected ';' after
>>>> expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:50: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:179:9: error: expected expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:51: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                   ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:182:9: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>         MPL_free(indirect);
>>>>         ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:182:9: warning: character constant too
>>>> long for its type
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:27: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:182:9: error: expected ';' after
>>>> expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:50: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:182:9: error: expected expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:110:26:
>>>> note: expanded from macro 'MPL_free'
>>>> #define MPL_free(a)      free((void *)(a))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:92:51: note: expanded from macro 'free'
>>>> #define free(a)           'Error use MPL_free'   :::
>>>>                                                   ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:249:30: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>         *indirect = (void *) MPL_calloc(indirect_num_blocks, sizeof(void
>>>> *));
>>>>                              ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:27: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:249:30: warning: character constant too
>>>> long for its type
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:27: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:249:30: error: expected ';' after
>>>> expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:50: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:249:30: error: expected expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:51: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                                                   ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:264:26: warning: multi-character
>>>> character constant [-Wmultichar]
>>>>     block_ptr = (void *) MPL_calloc(indirect_num_indices, obj_size);
>>>>                          ^
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:27: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:264:26: warning: character constant too
>>>> long for its type
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:27: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                           ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:264:26: error: expected ';' after
>>>> expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:50: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                                                  ^
>>>> In file included from src/mpi/attr/attr_delete.c:8:
>>>> In file included from ./src/include/mpiimpl.h:228:
>>>> ./src/include/mpir_handlemem.h:264:26: error: expected expression
>>>>
>>>>
>>>> /Users/m/local/src/mpich-master-v3.2-370-g0d6412303488/src/mpl/include/mpl_trmem.h:109:26:
>>>> note: expanded from macro 'MPL_calloc'
>>>> #define MPL_calloc(a,b)  calloc((size_t)(a),(size_t)(b))
>>>>                          ^
>>>> ./src/include/mpir_mem.h:91:51: note: expanded from macro 'calloc'
>>>> #define calloc(a,b)       'Error use MPL_calloc' :::
>>>>                                                   ^
>>>> 18 warnings and 18 errors generated.
>>>> make[2]: *** [src/mpi/attr/lib_libpmpi_la-attr_delete.lo] Error 1
>>>> make[1]: *** [all-recursive] Error 1
>>>> Makefile:10270: recipe for target 'all' failed
>>>> gmake: *** [all] Error 2
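
The errors in the log above all follow one pattern: MPL_free, MPL_calloc
and MPL_strdup expand to plain free, calloc and strdup, and
./src/include/mpir_mem.h has redefined those names to a deliberately
invalid token sequence so that any direct use fails at compile time. A
minimal standalone sketch of that poisoning trick follows. It is not
MPICH code: tracked_calloc, tracked_free, live_blocks and demo_roundtrip
are hypothetical names invented for illustration, and writing the
wrappers as real functions defined before the poison macros is only an
assumption about one way to keep such wrappers usable.

```c
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical allocation counter, standing in for real memory tracing. */
static size_t live_blocks = 0;

/* Wrappers are real functions defined BEFORE the poison macros below,
 * so the calloc/free calls inside them compile normally. */
static void *tracked_calloc(size_t n, size_t size)
{
    void *p = calloc(n, size);
    if (p != NULL)
        live_blocks++;
    return p;
}

static void tracked_free(void *p)
{
    if (p != NULL)
        live_blocks--;
    free(p);
}

/* Poison the raw names in the same style as the macros shown in the log.
 * From here on, a direct free(...) or calloc(...) expands to an invalid
 * token sequence and the compiler rejects it. Redefining standard
 * library names like this is non-portable in principle. */
#define calloc(a,b)  'Error use tracked_calloc' :::
#define free(a)      'Error use tracked_free' :::

/* Allocates and releases 7 ints through the wrappers only.
 * Returns how many blocks were released (1 if the allocation succeeded). */
size_t demo_roundtrip(void)
{
    int *v = tracked_calloc(7, sizeof *v);
    size_t during = live_blocks;
    tracked_free(v);
    return during - live_blocks;
}
```

If tracked_free were instead a macro expanding to free((void *)(a)), as
MPL_free does at mpl_trmem.h:110 in the log, the expansion would happen
at each call site, after the poison macros are in effect, and every use
would be rejected by the compiler in the same way as above.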
>>>>
>>>> On Thu, Aug 11, 2016 at 5:21 PM, Halim Amer <aamer at anl.gov> wrote:
>>>>>
>>>>>
>>>>> This should be related to the alignment problem reported before
>>>>> (http://lists.mpich.org/pipermail/discuss/2016-May/004764.html).
>>>>>
>>>>> We plan to include a fix in the 3.2.x bug fix release series.
>>>>> Meanwhile, please try the repo version (git.mpich.org/mpich.git),
>>>>> which should not suffer from this problem.
>>>>>
>>>>> --Halim
>>>>> www.mcs.anl.gov/~aamer
>>>>>
>>>>>
>>>>> On 8/11/16 8:48 AM, Mark Davis wrote:
>>>>>>
>>>>>>
>>>>>>
>>>>>> Hello, I'm running into a segfault when I run some relatively simple
>>>>>> MPI programs. In this particular case, I'm running a small program in
>>>>>> a loop that does MPI_Bcast, once per loop, within MPI_COMM_WORLD. The
>>>>>> buffer consists of just 7 doubles. I'm running with 6 procs on a
>>>>>> machine with 8 cores on OSX (Darwin - 15.6.0 Darwin Kernel Version
>>>>>> 15.6.0: Thu Jun 23 18:25:34 PDT 2016;
>>>>>> root:xnu-3248.60.10~1/RELEASE_X86_64 x86_64). When I run the same
>>>>>> program with a smaller number of procs, the error usually doesn't show
>>>>>> up. My compiler (both for compiling the MPICH source as well as my
>>>>>> application) is clang 3.8.1.
>>>>>>
>>>>>> When I run the same program on linux, also with MPICH-3.2 (I believe
>>>>>> the same exact source), compiled with gcc 5.3, I do not get this
>>>>>> error. This seems to be something I get only with
>>>>>>
>>>>>> gdb shows the following stack trace. I have a feeling that this has
>>>>>> something to do with my toolchain and/or libraries on my system given
>>>>>> that I never get this error on my other system (linux). However, it's
>>>>>> possible that there's an application bug as well.
>>>>>>
>>>>>> I'm running the MPICH-3.2 stable release; I haven't tried anything
>>>>>> from the repository yet.
>>>>>>
>>>>>> Does anyone have any ideas about what's going on here? I'm happy to
>>>>>> provide more details.
>>>>>>
>>>>>> Thank you,
>>>>>> Mark
>>>>>>
>>>>>>
>>>>>> Program received signal SIGSEGV, Segmentation fault.
>>>>>> MPID_Request_create () at src/mpid/ch3/src/ch3u_request.c:101
>>>>>> 101             req->dev.ext_hdr_ptr       = NULL;
>>>>>> (gdb) bt full
>>>>>> #0  MPID_Request_create () at src/mpid/ch3/src/ch3u_request.c:101
>>>>>> No locals.
>>>>>> #1  0x00000001003ac4c9 in MPIDI_CH3U_Recvq_FDP_or_AEU
>>>>>> (match=<optimized out>, foundp=0x7fff5fbfe2bc) at
>>>>>> src/mpid/ch3/src/ch3u_recvq.c:830
>>>>>>         proc_failure_bit_masked = <error reading variable
>>>>>> proc_failure_bit_masked (Cannot access memory at address 0x1)>
>>>>>>         error_bit_masked = <error reading variable error_bit_masked
>>>>>> (Cannot access memory at address 0x1)>
>>>>>>         prev_rreq = <optimized out>
>>>>>>         channel_matched = <optimized out>
>>>>>>         rreq = <optimized out>
>>>>>> #2  0x00000001003d1ffe in MPIDI_CH3_PktHandler_EagerSend
>>>>>> (vc=<optimized out>, pkt=0x1004b3fd8 <MPIU_DBG_MaxLevel>,
>>>>>> buflen=0x7fff5fbfe440, rreqp=0x7fff5fbfe438) at
>>>>>> src/mpid/ch3/src/ch3u_eager.c:629
>>>>>>         mpi_errno = <error reading variable mpi_errno (Cannot access
>>>>>> memory at address 0x0)>
>>>>>>         found = <error reading variable found (Cannot access memory at
>>>>>> address 0xefefefefefefefef)>
>>>>>>         rreq = <optimized out>
>>>>>>         data_len = <optimized out>
>>>>>>         complete = <optimized out>
>>>>>> #3  0x00000001003f6045 in MPID_nem_handle_pkt (vc=<optimized out>,
>>>>>> buf=0x102ad07e0 "", buflen=<optimized out>) at
>>>>>> src/mpid/ch3/channels/nemesis/src/ch3_progress.c:760
>>>>>>         len = 140734799800192
>>>>>>         mpi_errno = <optimized out>
>>>>>>         complete = <error reading variable complete (Cannot access
>>>>>> memory at address 0x1)>
>>>>>>         rreq = <optimized out>
>>>>>> #4  0x00000001003f4e41 in MPIDI_CH3I_Progress
>>>>>> (progress_state=0x7fff5fbfe750, is_blocking=1) at
>>>>>> src/mpid/ch3/channels/nemesis/src/ch3_progress.c:570
>>>>>>         payload_len = 4299898840
>>>>>>         cell_buf = <optimized out>
>>>>>>         rreq = <optimized out>
>>>>>>         vc = 0x102ad07e8
>>>>>>         made_progress = <error reading variable made_progress (Cannot
>>>>>> access memory at address 0x0)>
>>>>>>         mpi_errno = <optimized out>
>>>>>> #5  0x000000010035386d in MPIC_Wait (request_ptr=<optimized out>,
>>>>>> errflag=<optimized out>) at src/mpi/coll/helper_fns.c:225
>>>>>>         progress_state = {ch = {completion_count = -1409286143}}
>>>>>>         mpi_errno = <error reading variable mpi_errno (Cannot access
>>>>>> memory at address 0x0)>
>>>>>> #6  0x0000000100353b10 in MPIC_Send (buf=0x100917c30,
>>>>>> count=4299945096, datatype=-1581855963, dest=<optimized out>,
>>>>>> tag=4975608, comm_ptr=0x1004b3fd8 <MPIU_DBG_MaxLevel>,
>>>>>> errflag=<optimized out>) at src/mpi/coll/helper_fns.c:302
>>>>>>         mpi_errno = <optimized out>
>>>>>>         request_ptr = 0x1004bf7e0 <MPID_Request_direct+1760>
>>>>>> #7  0x0000000100246031 in MPIR_Bcast_binomial (buffer=<optimized out>,
>>>>>> count=<optimized out>, datatype=<optimized out>, root=<optimized out>,
>>>>>> comm_ptr=<optimized out>, errflag=<optimized out>) at
>>>>>> src/mpi/coll/bcast.c:280
>>>>>>         nbytes = <optimized out>
>>>>>>         mpi_errno_ret = <optimized out>
>>>>>>         mpi_errno = 0
>>>>>>         comm_size = <optimized out>
>>>>>>         rank = 2
>>>>>>         type_size = <optimized out>
>>>>>>         tmp_buf = 0x0
>>>>>>         position = <optimized out>
>>>>>>         relative_rank = <optimized out>
>>>>>>         mask = <optimized out>
>>>>>>         src = <optimized out>
>>>>>>         status = <optimized out>
>>>>>>         recvd_size = <optimized out>
>>>>>>         dst = <optimized out>
>>>>>> #8  0x00000001002455a3 in MPIR_SMP_Bcast (buffer=<optimized out>,
>>>>>> count=<optimized out>, datatype=<optimized out>, root=<optimized out>,
>>>>>> comm_ptr=<optimized out>, errflag=<optimized out>) at
>>>>>> src/mpi/coll/bcast.c:1087
>>>>>>         mpi_errno_ = <error reading variable mpi_errno_ (Cannot access
>>>>>> memory at address 0x0)>
>>>>>>         mpi_errno = <optimized out>
>>>>>>         mpi_errno_ret = <optimized out>
>>>>>>         nbytes = <optimized out>
>>>>>>         type_size = <optimized out>
>>>>>>         status = <optimized out>
>>>>>>         recvd_size = <optimized out>
>>>>>> #9  MPIR_Bcast_intra (buffer=0x100917c30, count=<optimized out>,
>>>>>> datatype=<optimized out>, root=1, comm_ptr=<optimized out>,
>>>>>> errflag=<optimized out>) at src/mpi/coll/bcast.c:1245
>>>>>>         nbytes = <optimized out>
>>>>>>         mpi_errno_ret = <error reading variable mpi_errno_ret (Cannot
>>>>>> access memory at address 0x0)>
>>>>>>         mpi_errno = <optimized out>
>>>>>>         type_size = <optimized out>
>>>>>>         comm_size = <optimized out>
>>>>>> #10 0x000000010024751e in MPIR_Bcast (buffer=<optimized out>,
>>>>>> count=<optimized out>, datatype=<optimized out>, root=<optimized out>,
>>>>>> comm_ptr=0x0, errflag=<optimized out>) at src/mpi/coll/bcast.c:1475
>>>>>>         mpi_errno = <optimized out>
>>>>>> #11 MPIR_Bcast_impl (buffer=0x1004bf7e0 <MPID_Request_direct+1760>,
>>>>>> count=-269488145, datatype=-16, root=0, comm_ptr=0x0,
>>>>>> errflag=0x1004bf100 <MPID_Request_direct>) at
>>>>>> src/mpi/coll/bcast.c:1451
>>>>>>         mpi_errno = <optimized out>
>>>>>> #12 0x00000001000f3c24 in MPI_Bcast (buffer=<optimized out>, count=7,
>>>>>> datatype=1275069445, root=1, comm=<optimized out>) at
>>>>>> src/mpi/coll/bcast.c:1585
>>>>>>         errflag = 2885681152
>>>>>>         mpi_errno = <optimized out>
>>>>>>         comm_ptr = <optimized out>
>>>>>> #13 0x0000000100001df7 in run_test<int> (my_rank=2,
>>>>>> num_ranks=<optimized out>, count=<optimized out>, root_rank=1,
>>>>>> datatype=@0x7fff5fbfeaec: 1275069445, iterations=<optimized out>) at
>>>>>> bcast_test.cpp:83
>>>>>> No locals.
>>>>>> #14 0x00000001000019cd in main (argc=<optimized out>, argv=<optimized
>>>>>> out>) at bcast_test.cpp:137
>>>>>>         root_rank = <optimized out>
>>>>>>         count = <optimized out>
>>>>>>         iterations = <optimized out>
>>>>>>         my_rank = 4978656
>>>>>>         num_errors = <optimized out>
>>>>>>         runtime_ns = <optimized out>
>>>>>>         stats = {<std::__1::__basic_string_common<true>> = {<No data
>>>>>> fields>}, __r_ =
>>>>>> {<std::__1::__libcpp_compressed_pair_imp<std::__1::basic_string<char,
>>>>>> std::__1::char_traits<char>, std::__1::allocator<char> >::__rep,
>>>>>> std::__1::allocator<char>, 2>> = {<std::__1::allocator<char>> = {<No
>>>>>> data fields>}, __first_ = {{__l = {__cap_ = 17289301308300324847,
>>>>>> __size_ = 17289301308300324847, __data_ = 0xefefefefefefefef <error:
>>>>>> Cannot access memory at address 0xefefefefefefefef>}
>>>>>> _______________________________________________
>>>>>> discuss mailing list     discuss at mpich.org
>>>>>> To manage subscription options or unsubscribe:
>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
-------------- next part --------------
A non-text attachment was scrubbed...
Name: bcast_test_toy_segv.cpp
Type: text/x-c++src
Size: 4200 bytes
Desc: not available
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20160817/6d7f566b/attachment.bin>