[mpich-discuss] Is it allowed to attach automatic array for remote access with MPI_Win_attach?

Maciej Szpindler m.szpindler at icm.edu.pl
Wed Apr 20 09:31:39 CDT 2016


Many thanks for the correction and explanation, Jeff. Now it is fixed.

I am wondering, though this is likely not MPICH specific, whether my
approach is correct. I would like to replace the send/receive halo exchange
module of a larger application with an RMA PSCW scheme. The basic
implementation performs poorly and I was looking for improvement. The code
structure requires this module to initialize memory windows with every halo
exchange. I have tried to address this with dynamic windows but, as you
have pointed out, additional synchronization is then required and the
potential benefits of the PSCW approach are gone.

Should a good implementation of the PSCW scheme be expected to perform
comparably to message passing?

Regards,
Maciej

On 19.04.2016 at 17:48, Jeff Hammond wrote:
> When you use dynamic windows, you must use the virtual address of the
> remote memory as the offset. That means you must attach a buffer and then
> get the address with MPI_GET_ADDRESS. Then you must share that address
> with any processes that target that memory, perhaps using MPI_SEND/MPI_RECV
> or MPI_ALLGATHER of an address-sized integer (MPI_AINT is the MPI type
> corresponding to the MPI_Aint C type). It appears you are not doing this.
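>
> Schematically, that exchange might look like the sketch below (my_disp and
> all_disps are illustrative names, not anything from your code):
>
>   INTEGER(kind=MPI_ADDRESS_KIND) :: my_disp
>   INTEGER(kind=MPI_ADDRESS_KIND), ALLOCATABLE :: all_disps(:)
>
>   ALLOCATE(all_disps(comm_size))
>
>   ! Publish the virtual address of the locally attached buffer to all ranks.
>   CALL MPI_Get_address(recv_buffer, my_disp, ierror)
>   CALL MPI_Allgather(my_disp, 1, MPI_AINT, all_disps, 1, MPI_AINT, &
>                      MPI_COMM_WORLD, ierror)
>
>   ! Use the target's published address as the displacement; for a dynamic
>   ! window the displacement unit is 1 byte, and all_disps(r+1) holds the
>   ! address attached by rank r.
>   CALL MPI_Put(send_buffer, buffer_size, MPI_REAL8, my_rank - 1, &
>                all_disps(my_rank), buffer_size, MPI_REAL8, win, ierror)
>
> With MPI_Win_create the zero displacement you use is fine, because
> displacements are relative to the window base; a dynamic window has no
> base, which is presumably why the put segfaults.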
>
> This issue should affect you whether you use automatic arrays or heap
> data...
>
> It does not appear to be a problem here, but if you use automatic arrays
> with RMA, you must guarantee that they remain in scope throughout the
> duration of when they will be accessed remotely.  I think you are doing
> this sufficiently with a barrier.  However, at the point at which you are
> calling barrier to ensure they stay in scope, you lose all of the benefits
> of fine-grain synchronization from PSCW.  You might as well just use
> MPI_Win_fence.
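>
> A sketch of the fence version, reusing the names from your routine (the
> displacement caveat above still applies if the window stays dynamic):
>
>   CALL MPI_Win_fence(0, win, ierror)    ! open the epoch on every rank
>   IF (my_rank /= 0) THEN
>     CALL MPI_Put(send_buffer, buffer_size, MPI_REAL8, my_rank - 1, &
>                  disp, buffer_size, MPI_REAL8, win, ierror)
>   END IF
>   CALL MPI_Win_fence(0, win, ierror)    ! close the epoch; puts are complete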
>
> There is a sentence in the MPI spec that says that, strictly speaking,
> using memory not allocated by MPI_Alloc_mem (or MPI_Win_allocate(_shared),
> of course) in RMA is not portable, but I don't know any implementation that
> actually behaves this way.  MPICH has an active-message implementation of
> RMA, which does not care what storage class is involved, up to performance
> differences (interprocess shared memory is faster in some cases).
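>
> For completeness, the strictly portable route is to let MPI allocate the
> window memory itself; a sketch, assuming the TYPE(C_PTR) overload of
> MPI_Win_allocate is available from the mpi module:
>
>   USE, INTRINSIC :: iso_c_binding
>   TYPE(C_PTR) :: base
>   REAL(KIND=8), POINTER :: recv_buffer(:,:,:)
>
>   ! MPI allocates win_size bytes and exposes them in the window directly.
>   CALL MPI_Win_allocate(win_size, 8, win_info, MPI_COMM_WORLD, base, win, ierror)
>   CALL C_F_POINTER(base, recv_buffer, [halo_size, rows, levels])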
>
> This is a fairly complicated topic and it is possible that I have been a
> bit crude in summarizing the MPI standard, so I apologize to any MPI Forum
> experts who can find fault in what I've written :-)
>
> Jeff
>
> On Tue, Apr 19, 2016 at 5:51 AM, Maciej Szpindler <m.szpindler at icm.edu.pl>
> wrote:
>
>> This is a simplified version of my routine. It may look odd, but I am
>> trying to migrate from a send/recv scheme to one-sided PSCW, and that
>> is the reason for the buffers etc. As long as dynamic windows are not
>> used, it works fine (I believe). When I switch to dynamic windows, it
>> fails with a segmentation fault. I would appreciate any comments and
>> suggestions on how to improve this.
>>
>> SUBROUTINE swap_simple_rma(field, row_length, rows, levels, halo_size)
>>
>> USE mpi
>>
>> IMPLICIT NONE
>>
>> INTEGER, INTENT(IN) :: row_length
>> INTEGER, INTENT(IN) :: rows
>> INTEGER, INTENT(IN) :: levels
>> INTEGER, INTENT(IN) :: halo_size
>> REAL(KIND=8), INTENT(INOUT) :: field(1:row_length, 1:rows, levels)
>> REAL(KIND=8) :: send_buffer(halo_size, rows, levels)
>> REAL(KIND=8) :: recv_buffer(halo_size, rows, levels)
>> INTEGER  :: buffer_size
>> INTEGER ::  i,j,k
>> INTEGER(kind=MPI_INTEGER_KIND)  :: ierror
>> INTEGER(kind=MPI_INTEGER_KIND) :: my_rank, comm_size
>> Integer(kind=MPI_INTEGER_KIND) :: win, win_info
>> Integer(kind=MPI_INTEGER_KIND) :: my_group, origin_group, target_group
>> Integer(kind=MPI_INTEGER_KIND), DIMENSION(1) :: target_rank, origin_rank
>> Integer(kind=MPI_ADDRESS_KIND) :: win_size, disp
>>
>>   CALL MPI_Comm_Rank(MPI_COMM_WORLD, my_rank, ierror)
>>   CALL MPI_Comm_Size(MPI_COMM_WORLD, comm_size, ierror)
>>
>>   buffer_size = halo_size * rows * levels
>>
>>   CALL MPI_Info_create(win_info, ierror)
>>   CALL MPI_Info_set(win_info, "no_locks", "true", ierror)
>>
>>   CALL MPI_Comm_group(MPI_COMM_WORLD, my_group, ierror)
>>
>>   If (my_rank /= comm_size - 1) Then
>>     origin_rank = my_rank + 1
>>     CALL MPI_Group_incl(my_group, 1, origin_rank, origin_group, ierror)
>>     win_size = 8*buffer_size
>>   Else
>>     origin_group = MPI_GROUP_EMPTY
>>     win_size = 0
>>   End If
>>
>>   CALL MPI_Win_create_dynamic(win_info, MPI_COMM_WORLD, win, ierror)
>> !! CALL MPI_Win_create(recv_buffer, win_size,      &
>> !!        8, win_info, MPI_COMM_WORLD, win, ierror)
>>   CALL MPI_Win_attach(win, recv_buffer, win_size, ierror)
>>
>>   CALL MPI_Barrier(MPI_COMM_WORLD, ierror)
>>
>>   CALL MPI_Win_post(origin_group, MPI_MODE_NOSTORE, win, ierror)
>>
>>   ! Prepare buffer
>>      DO k=1,levels
>>        DO j=1,rows
>>          DO i=1,halo_size
>>            send_buffer(i,j,k)=field(i,j,k)
>>          END DO ! I
>>         END DO ! J
>>       END DO ! K
>>
>>   If (my_rank /= 0 ) Then
>>      target_rank = my_rank - 1
>>      CALL MPI_Group_incl(my_group, 1, target_rank, target_group, ierror)
>>   Else
>>      target_group = MPI_GROUP_EMPTY
>>   End If
>>
>>   CALL MPI_Win_start(target_group, 0, win, ierror)
>>
>>   disp = 0
>>
>>   If (my_rank /= 0) Then
>>     CALL MPI_Put(send_buffer, buffer_size, MPI_REAL8,   &
>>         my_rank - 1, disp, buffer_size, MPI_REAL8, win, ierror)
>>   End If
>>   CALL MPI_Win_complete(win, ierror)
>>
>>   CALL MPI_Barrier(MPI_COMM_WORLD, ierror)
>>   write (0,*) 'Put OK'
>>   CALL MPI_Barrier(MPI_COMM_WORLD, ierror)
>>
>>   CALL MPI_Win_wait(win, ierror)
>>
>>   ! Read from buffer
>>   If (my_rank /= comm_size -1 ) Then
>>       DO k=1,levels
>>         DO j=1,rows
>>           DO i=1,halo_size
>>             field(row_length+i,j,k) =  recv_buffer(i,j,k)
>>           END DO
>>         END DO
>>       END DO
>>   End if
>>
>>   CALL MPI_Win_detach(win, recv_buffer, ierror)
>>   CALL MPI_Win_free(win, ierror)
>>
>> END SUBROUTINE swap_simple_rma
>>
>> Best Regards,
>> Maciej
>>
>> On 14.04.2016 at 19:21, Thakur, Rajeev wrote:
>>
>>> After the Win_attach, did you add a barrier or some other form of
>>> synchronization? The put shouldn’t happen before Win_attach returns.
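>>>
>>> For example, a sketch reusing the names from your snippet:
>>>
>>>   CALL MPI_Win_attach(win, buffer, buff_size, ierror)
>>>   ! make sure every process has finished attaching before any access epoch
>>>   CALL MPI_Barrier(comm, ierror)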
>>>
>>> Rajeev
>>>
>>> On Apr 14, 2016, at 10:56 AM, Maciej Szpindler <m.szpindler at icm.edu.pl> wrote:
>>>>
>>>> Dear All,
>>>>
>>>> I am trying to use dynamic RMA windows in Fortran. In my case I would
>>>> like to attach an automatic array to a dynamic window. The question is
>>>> whether this is correct and allowed in MPICH. My impression is that it
>>>> is not working, at least in cray-mpich/7.3.2.
>>>>
>>>> I have a subroutine that uses RMA windows:
>>>>
>>>> SUBROUTINE foo(x, y, z , ...)
>>>>
>>>> USE mpi
>>>> ...
>>>>
>>>> INTEGER, INTENT(IN) :: x, y, z
>>>> REAL(KIND=8) :: buffer(x, y, z)
>>>> INTEGER(kind=MPI_INTEGER_KIND) :: win_info, win, comm
>>>> INTEGER(kind=MPI_INTEGER_KIND) :: buff_size
>>>> ...
>>>>
>>>> buff_size = x*y*z*8
>>>>
>>>> CALL MPI_Info_create(win_info, ierror)
>>>> CALL MPI_Info_set(win_info, "no_locks", "true", ierror)
>>>>
>>>> CALL MPI_Win_create_dynamic(win_info, comm, win, ierror)
>>>>
>>>> CALL MPI_Win_attach(win, buffer, buff_size, ierror)
>>>>
>>>> ...
>>>>
>>>> This produces a segmentation fault when MPI_Put is called on the window,
>>>> while exactly the same routine with a static MPI_Win_create on buffer
>>>> instead of create_dynamic+attach works fine. As far as I understand,
>>>> buffer is in this case "simply contiguous" in the sense of the MPI
>>>> Standard. Any help would be appreciated!
>>>>
>>>> Best Regards,
>>>> Maciej
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

