[mpich-discuss] Shared Memory Segfaults

Jeff Hammond jeff.science at gmail.com
Sat May 2 19:46:03 CDT 2015


I do not understand what this program is supposed to do.
MPI_Win_allocate_shared is collective, as is MPI_Win_fence, so I don't
see how any interesting parallelism emerges from this type of
implementation of a linked-list.

Are you just using the linked-list to manage a collection of shared
memory windows (GMR does this in
http://git.mpich.org/armci-mpi.git/blob/HEAD:/src/gmr.c, for example)?
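
To make the question concrete, here is a minimal sketch of that kind of
bookkeeping in plain C. It is not the GMR code and not the attached program:
win_handle, win_node, win_list_push, win_list_clean, fake_win_free and
freed_count are hypothetical names, and fake_win_free only stands in for
MPI_Win_free so that the sketch builds without MPI. The assumption is that the
list nodes live in ordinary per-process memory (malloc), not inside the shared
windows, and that cleanup reads the next pointer before releasing an entry.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for MPI_Win so the sketch builds without MPI. */
typedef int win_handle;

/* One bookkeeping entry; allocated with malloc, private to each process. */
typedef struct win_node {
    win_handle win;
    struct win_node *next;
} win_node;

/* Counts releases so the behaviour can be checked. */
static int freed_count = 0;

/* Hypothetical stand-in for MPI_Win_free (collective in real MPI code). */
static void fake_win_free(win_handle *win)
{
    *win = -1;          /* mark the handle as released */
    freed_count++;
}

/* Append a handle to the list. Returns 0 on success, -1 if malloc fails. */
static int win_list_push(win_node **head, win_node **tail, win_handle win)
{
    assert(head != NULL && tail != NULL);
    win_node *n = malloc(sizeof *n);
    if (n == NULL)
        return -1;
    n->win = win;
    n->next = NULL;
    if (*tail != NULL)
        (*tail)->next = n;
    else
        *head = n;
    *tail = n;
    return 0;
}

/* Release every handle and node. Returns the number of entries released. */
static int win_list_clean(win_node **head, win_node **tail)
{
    assert(head != NULL && tail != NULL);
    int released = 0;
    win_node *cur = *head;
    while (cur != NULL) {
        win_node *next = cur->next;   /* read before the node goes away */
        fake_win_free(&cur->win);
        free(cur);
        cur = next;
        released++;
    }
    *head = *tail = NULL;
    return released;
}
```

In a real MPI program every process would keep its own copy of such a list and
walk it in the same order, since MPI_Win_free is collective over the window's
group.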

If you want to do a distributed linked-list using RMA, I recall there
is a linked-list example in the MPI-3 spec or perhaps somewhere else
(somebody on this list will remember, or I can look it up).  And this
example probably uses MPI_Win_create_dynamic and MPI_Win_attach, which
means no interprocess load-store.

Jeff

On Sat, May 2, 2015 at 4:06 PM, Junchao Zhang <jczhang at mcs.anl.gov> wrote:
> I can reproduce the segfault with the latest mpich. In Clean_List(), I think
> there is a data race, since all shmem members update the head node
>             head->next_win = temp_win;
>             head->next = temp_next;
> I tried to simplify Clean_List() further as follows,
>
> void Clean_List()
> {
>     Node *cur_node = head;
>     MPI_Win cur_win = head_win;
>     Node *next_node;
>     MPI_Win next_win;
>
>     while (cur_node) {
>         next_node = cur_node->next;
>         next_win = cur_node->next_win;
>         MPI_Win_free(&cur_win);
>         cur_node = next_node;
>         cur_win = next_win;
>     }
>
>     head = tail = NULL;
> }
>
> But I still get a segfault. Under gdb, the segfault disappears. If I comment
> out the call to Clean_List in main(), the error also disappears.
> I Cc'ed our local RMA expert Xin to see if she has new findings.
>
> --Junchao Zhang
>
> On Sat, May 2, 2015 at 10:26 AM, Brian Cornille <bcornille at wisc.edu> wrote:
>>
>> Hello,
>>
>>
>> In working on a project that is attempting to use MPI shared memory (from
>> MPI_Win_allocate_shared) I began getting inconsistent segfaults in portions
>> of the code that appeared to have no memory errors when investigated with
>> gdb.  I believe I have somewhat reproduced this error in a small code
>> (attached) that creates a linked list of MPI shared memory allocated
>> elements.
>>
>>
>> The attached program segfaults for me when run with more than one process.
>> However, it does not segfault if run in gdb (e.g. mpirun -n 2 xterm -e gdb
>> ./mpi_shmem_ll).  I have done what I can to eliminate any apparent race
>> conditions.  Any help in this matter would be much appreciated.
>>
>>
>> Thanks and best,
>>
>> Brian Cornille
>>
>>
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/