[mpich-discuss] Is it allowed to attach automatic array for remote access with MPI_Win_attach?

Jeff Hammond jeff.science at gmail.com
Tue Apr 19 10:48:19 CDT 2016


When you use dynamic windows, you must use the virtual address of the
remote memory as the offset. That means you must attach a buffer and then
get its address with MPI_GET_ADDRESS. Then you must share that address
with any process that targets that memory, for example with MPI_SEND/MPI_RECV
or MPI_ALLGATHER of an address-sized integer (MPI_AINT is the MPI datatype
corresponding to the MPI_Aint C type, i.e. INTEGER(KIND=MPI_ADDRESS_KIND)
in Fortran). It appears you are not doing this.
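
Roughly, in Fortran the address exchange looks like the sketch below. This
is just an outline: my_disp and all_disps are names I made up, and I am
reusing the other names from your routine further down.

  INTEGER(KIND=MPI_ADDRESS_KIND) :: my_disp
  INTEGER(KIND=MPI_ADDRESS_KIND), ALLOCATABLE :: all_disps(:)

  ALLOCATE(all_disps(comm_size))
  CALL MPI_Win_attach(win, recv_buffer, win_size, ierror)
  ! The displacement a remote origin must use is the address of the
  ! buffer attached on this rank.
  CALL MPI_Get_address(recv_buffer, my_disp, ierror)
  ! Give every rank the attached address of every other rank.
  CALL MPI_Allgather(my_disp, 1, MPI_AINT, all_disps, 1, MPI_AINT, &
                     MPI_COMM_WORLD, ierror)
  ! all_disps(my_rank) holds the address attached by rank my_rank-1
  ! (1-based array index, 0-based ranks), so the Put becomes:
  CALL MPI_Put(send_buffer, buffer_size, MPI_REAL8, my_rank - 1, &
               all_disps(my_rank), buffer_size, MPI_REAL8, win, ierror)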

This issue should affect you whether you use automatic arrays or heap
data...

It does not appear to be a problem here, but if you use automatic arrays
with RMA, you must guarantee that they remain in scope for as long as they
may be accessed remotely. I think you are doing this sufficiently with a
barrier. However, at the point at which you call a barrier to ensure they
stay in scope, you lose all of the benefits of fine-grained synchronization
from PSCW. You might as well just use MPI_Win_fence.
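
For this exchange pattern, a fence epoch is simpler and gives the same
guarantee. A sketch, reusing the names from your routine (disp here must
still be the remote address obtained as above, not zero):

  CALL MPI_Win_fence(MPI_MODE_NOPRECEDE, win, ierror)
  IF (my_rank /= 0) THEN
    CALL MPI_Put(send_buffer, buffer_size, MPI_REAL8, &
                 my_rank - 1, disp, buffer_size, MPI_REAL8, win, ierror)
  END IF
  ! The closing fence ends the epoch on every rank; after it returns,
  ! recv_buffer can be read locally, and the automatic arrays only have
  ! to outlive this fence rather than a separate barrier.
  CALL MPI_Win_fence(MPI_MODE_NOSUCCEED, win, ierror)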

There is a sentence in the MPI spec that says that, strictly speaking,
using memory not allocated by MPI_Alloc_mem (or MPI_Win_allocate(_shared),
of course) in RMA is not portable, but I don't know of any implementation
that actually behaves this way. MPICH has an active-message implementation
of RMA, which does not care what storage class is involved, apart from
performance differences (interprocess shared memory is faster in some cases).
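
If you want to stay strictly within the letter of the standard, the receive
buffer can come from MPI_Alloc_mem instead of being an automatic array. A
sketch only (the USE line belongs with your other declarations; the shape
is the one from your routine):

  USE, INTRINSIC :: ISO_C_BINDING
  TYPE(C_PTR) :: baseptr
  REAL(KIND=8), POINTER :: recv_buffer(:,:,:)
  INTEGER(KIND=MPI_ADDRESS_KIND) :: nbytes

  nbytes = 8_MPI_ADDRESS_KIND * halo_size * rows * levels
  CALL MPI_Alloc_mem(nbytes, MPI_INFO_NULL, baseptr, ierror)
  CALL C_F_POINTER(baseptr, recv_buffer, (/ halo_size, rows, levels /))
  ! ... attach recv_buffer to the window, run the RMA epoch, detach ...
  CALL MPI_Free_mem(recv_buffer, ierror)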

This is a fairly complicated topic and it is possible that I have been a
bit crude in summarizing the MPI standard, so I apologize to any MPI Forum
experts who can find fault in what I've written :-)

Jeff

On Tue, Apr 19, 2016 at 5:51 AM, Maciej Szpindler <m.szpindler at icm.edu.pl>
wrote:

> This is a simplified version of my routine. It may look odd, but I am
> trying to migrate from a send/recv scheme to one-sided PSCW, which is
> the reason for the buffers etc. As long as dynamic windows are not used,
> it works fine (I believe). When I switch to dynamic windows, it fails
> with a segmentation fault. I would appreciate any comments and
> suggestions on how to improve this.
>
> SUBROUTINE swap_simple_rma(field, row_length, rows, levels, halo_size)
>
> USE mpi
>
> IMPLICIT NONE
>
> INTEGER, INTENT(IN) :: row_length
> INTEGER, INTENT(IN) :: rows
> INTEGER, INTENT(IN) :: levels
> INTEGER, INTENT(IN) :: halo_size
> REAL(KIND=8), INTENT(INOUT) :: field(1:row_length, 1:rows, levels)
> REAL(KIND=8) :: send_buffer(halo_size, rows, levels)
> REAL(KIND=8) :: recv_buffer(halo_size, rows, levels)
> INTEGER  :: buffer_size
> INTEGER ::  i,j,k
> INTEGER(kind=MPI_INTEGER_KIND)  :: ierror
> INTEGER(kind=MPI_INTEGER_KIND) :: my_rank, comm_size
> Integer(kind=MPI_INTEGER_KIND) :: win, win_info
> Integer(kind=MPI_INTEGER_KIND) :: my_group, origin_group, target_group
> Integer(kind=MPI_INTEGER_KIND), DIMENSION(1) :: target_rank, origin_rank
> Integer(kind=MPI_ADDRESS_KIND) :: win_size, disp
>
>  CALL MPI_Comm_Rank(MPI_COMM_WORLD, my_rank, ierror)
>  CALL MPI_Comm_Size(MPI_COMM_WORLD, comm_size, ierror)
>
>  buffer_size = halo_size * rows * levels
>
>  CALL MPI_Info_create(win_info, ierror)
>  CALL MPI_Info_set(win_info, "no_locks", "true", ierror)
>
>  CALL MPI_Comm_group(MPI_COMM_WORLD, my_group, ierror)
>
>  If (my_rank /= comm_size - 1) Then
>    origin_rank = my_rank + 1
>    CALL MPI_Group_incl(my_group, 1, origin_rank, origin_group, ierror)
>    win_size = 8*buffer_size
>  Else
>    origin_group = MPI_GROUP_EMPTY
>    win_size = 0
>  End If
>
>  CALL MPI_Win_create_dynamic(win_info, MPI_COMM_WORLD, win, ierror)
> !! CALL MPI_Win_create(recv_buffer, win_size,      &
> !!        8, win_info, MPI_COMM_WORLD, win, ierror)
>  CALL MPI_Win_attach(win, recv_buffer, win_size, ierror)
>
>  CALL MPI_Barrier(MPI_COMM_WORLD, ierror)
>
>  CALL MPI_Win_post(origin_group, MPI_MODE_NOSTORE, win, ierror)
>
>  ! Prepare buffer
>     DO k=1,levels
>       DO j=1,rows
>         DO i=1,halo_size
>           send_buffer(i,j,k)=field(i,j,k)
>         END DO ! I
>        END DO ! J
>      END DO ! K
>
>  If (my_rank /= 0 ) Then
>     target_rank = my_rank - 1
>     CALL MPI_Group_incl(my_group, 1, target_rank, target_group, ierror)
>  Else
>     target_group = MPI_GROUP_EMPTY
>  End If
>
>  CALL MPI_Win_start(target_group, 0, win, ierror)
>
>  disp = 0
>
>  If (my_rank /= 0) Then
>    CALL MPI_Put(send_buffer, buffer_size, MPI_REAL8,   &
>        my_rank - 1, disp, buffer_size, MPI_REAL8, win, ierror)
>  End If
>  CALL MPI_Win_complete(win, ierror)
>
>  CALL MPI_Barrier(MPI_COMM_WORLD, ierror)
>  write (0,*) 'Put OK'
>  CALL MPI_Barrier(MPI_COMM_WORLD, ierror)
>
>  CALL MPI_Win_wait(win, ierror)
>
>  ! Read from buffer
>  If (my_rank /= comm_size -1 ) Then
>      DO k=1,levels
>        DO j=1,rows
>          DO i=1,halo_size
>            field(row_length+i,j,k) =  recv_buffer(i,j,k)
>          END DO
>        END DO
>      END DO
>  End if
>
>  CALL MPI_Win_detach(win, recv_buffer, ierror)
>  CALL MPI_Win_free(win, ierror)
>
> END SUBROUTINE swap_simple_rma
>
> Best Regards,
> Maciej
>
> On 14.04.2016 at 19:21, Thakur, Rajeev wrote:
>
>> After the Win_attach, did you add a barrier or some other form of
>> synchronization? The put shouldn’t happen before Win_attach returns.
>>
>> Rajeev
>>
>>> On Apr 14, 2016, at 10:56 AM, Maciej Szpindler <m.szpindler at icm.edu.pl>
>>> wrote:
>>>
>>> Dear All,
>>>
>>> I am trying to use dynamic RMA windows in Fortran. In my case I would
>>> like to attach an automatic array to a dynamic window. The question is
>>> whether this is correct and allowed in MPICH. I feel that it is not
>>> working, at least in cray-mpich/7.3.2.
>>>
>>> I have a subroutine that uses RMA windows:
>>>
>>> SUBROUTINE foo(x, y, z , ...)
>>>
>>> USE mpi
>>> ...
>>>
>>> INTEGER, INTENT(IN) :: x, y, z
>>> REAL(KIND=8) :: buffer(x, y, z)
>>> INTEGER(kind=MPI_INTEGER_KIND) :: win_info, win, comm
>>> INTEGER(kind=MPI_INTEGER_KIND) :: buff_size
>>> ...
>>>
>>> buff_size = x*y*z*8
>>>
>>> CALL MPI_Info_create(win_info, ierror)
>>> CALL MPI_Info_set(win_info, "no_locks", "true", ierror)
>>>
>>> CALL MPI_Win_create_dynamic(win_info, comm, win, ierror)
>>>
>>> CALL MPI_Win_attach(win, buffer, buff_size, ierror)
>>>
>>> ...
>>>
>>> This produces a segmentation fault when MPI_Put is called on the
>>> window, while exactly the same routine with a static MPI_Win_create on
>>> the buffer, instead of create_dynamic+attach, works fine. As far as I
>>> understand, the buffer is in this case "simply contiguous" in the sense
>>> of the MPI Standard. Any help would be appreciated!
>>>
>>> Best Regards,
>>> Maciej



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/