[mpich-discuss] MPI RMA
Jim Dinan
dinan at mcs.anl.gov
Fri Feb 8 13:08:59 CST 2013
Hi Nick,
That is technically an invalid MPI program.
However, it will work fine in MPICH. In MPICH 3.0, I redesigned the
synchronization error detection; as of 3.0, this is the only invalid RMA
synchronization where we don't flag an error. That is less because we
chose to support it as an extension to RMA than because the flush state
needs to change collectively, which makes the error difficult (not O(1))
to detect.
~Jim.
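
To make the ambiguity concrete, here is a minimal sketch in plain C (no
MPI calls) of the fence rule quoted from MPI-3 11.5.1 further down,
applied to one process's call sequence on a single window. The event
names and both functions are hypothetical, made up for illustration;
this is not how MPICH tracks synchronization state. The sketch also
assumes that a put issued inside a lock/unlock pair does not count
toward a fence epoch.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical event names for one process's calls on a single window. */
typedef enum { EV_FENCE, EV_PUT, EV_LOCK, EV_UNLOCK } rma_event;

/* Returns 1 if trace[i] is a fence that starts an access epoch under the
 * quoted rule: it is followed by another fence, and the process issues at
 * least one put between the two fences.
 * Assumption of this sketch: puts inside a lock/unlock pair belong to the
 * passive-target epoch and are not counted. */
int fence_starts_access_epoch(const rma_event *trace, size_t n, size_t i)
{
    int in_lock = 0, saw_put = 0;

    assert(trace != NULL);
    if (i >= n || trace[i] != EV_FENCE)
        return 0;

    /* The answer depends only on calls made AFTER the fence. */
    for (size_t j = i + 1; j < n; j++) {
        switch (trace[j]) {
        case EV_LOCK:   in_lock = 1; break;
        case EV_UNLOCK: in_lock = 0; break;
        case EV_PUT:    if (!in_lock) saw_put = 1; break;
        case EV_FENCE:  return saw_put;
        }
    }
    return 0; /* no later fence, so this fence starts no epoch */
}

/* Returns 1 if the lock at trace[k] falls inside a fence access epoch,
 * i.e. the nearest preceding fence starts an epoch. */
int lock_inside_fence_epoch(const rma_event *trace, size_t n, size_t k)
{
    assert(trace != NULL);
    if (k >= n || trace[k] != EV_LOCK)
        return 0;

    /* Walk backwards from the lock to the nearest preceding fence. */
    for (size_t j = k; j-- > 0; ) {
        if (trace[j] == EV_FENCE)
            return fence_starts_access_epoch(trace, n, j);
    }
    return 0; /* no fence before the lock */
}
```

For the src sequence in the message quoted below (fence, put, fence,
lock, put, unlock, put, fence), lock_inside_fence_epoch(trace, 8, 3)
returns 1. With only the first six calls visible (n = 6) it returns 0.
The classification of the lock depends on calls that have not been made
yet, so a check performed inside MPI_Win_lock cannot settle it from
local state alone.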
On 2/8/13 12:45 PM, Nick Radcliffe wrote:
> Thanks for the quick response. Just to clarify, what if I did something like this:
>
> MPI_Win_fence(0, win)
> if (rank == src) MPI_Put(dest)
> MPI_Win_fence(0, win)
>
> if (rank == src) {
> MPI_Win_lock(dest, win)
> MPI_Put
> MPI_Win_unlock(dest, win)
> }
>
> if (rank == src) MPI_Put(dest)
> MPI_Win_fence(0, win)
>
> The second call to MPI_Win_fence closes-and-reopens an exposure epoch for the dest rank, because the second call to MPI_Win_fence is followed by another call to MPI_Win_fence, and the dest rank is the target of an RMA operation.
>
> The problem is that there is no way for the call to MPI_Win_lock to know if the previous call to MPI_Win_fence simply ended an exposure epoch, or if it ended-and-reopened an exposure epoch. What I'm trying to understand is how the call to MPI_Win_lock could do error checking to verify that it is not locking a rank in an exposure epoch, since whether MPI_Win_fence closes or closes-and-reopens an exposure epoch seems to depend on whether there are any future calls to MPI_Put/MPI_Win_fence.
>
> -Nick
> ________________________________________
> From: discuss-bounces at mpich.org [discuss-bounces at mpich.org] on behalf of Jeff Hammond [jhammond at alcf.anl.gov]
> Sent: Friday, February 08, 2013 11:58 AM
> To: discuss at mpich.org
> Subject: Re: [mpich-discuss] MPI RMA
>
> From MPI-3 11.5.1:
>
> - The MPI call MPI_WIN_FENCE(assert, win) synchronizes RMA calls on win.
> - The call completes an RMA access epoch if it was preceded by another
> fence call and the local process issued RMA communication calls on win
> between these two calls.
> - The call completes an RMA exposure epoch if it was preceded by
> another fence call and the local window was the target of RMA accesses
> between these two calls.
> - The call starts an RMA access epoch if it is followed by another
> fence call and by RMA communication calls issued between these two
> fence calls.
> - The call starts an exposure epoch if it is followed by another fence
> call and the local window is the target of RMA accesses between these
> two fence calls.
>
> MPI_Win_fence may either start or complete an RMA epoch, but it may also
> complete one epoch and start another, depending on the context and
> the assertions. All of the tests below are valid in my opinion.
>
> What the assertions below in the first example are saying is that the
> first call to MPI_Win_fence need not complete any RMA calls, which is
> true because it is the first sync call. The last MPI_Win_fence
> asserts the same thing in reverse. The middle one asserts nothing
> because it is both completing and starting an epoch.
>
> I think that MPI_Win_fence is poorly designed but that ship has sailed
> and I believe that the usage is well-defined in the standard despite
> the confusing properties of this function.
>
> Jeff
>
> On Fri, Feb 8, 2013 at 11:36 AM, Nick Radcliffe <nradclif at cray.com> wrote:
>> Hi,
>>
>> I have a question about MPI RMA, and the ANL regression tests for RMA in particular. The tests mixedsync.c and epochtest.c seem to have contradictory views of fence synchronization.
>>
>> epochtest.c seems to suggest that access/exposure epochs opened by a call to MPI_Win_fence are not closed until a call to MPI_Win_fence with assert==MPI_MODE_NOSUCCEED. The test looks roughly like this:
>>
>> MPI_Win_fence(MPI_MODE_NOPRECEDE, win)
>> if (rank == src) MPI_Put
>> MPI_Win_fence(0, win)
>> if (rank == dest) MPI_Put
>> etc...
>> MPI_Win_fence(MPI_MODE_NOSUCCEED, win)
>>
>> Since there is a call to MPI_Put after the second call to MPI_Win_fence, it would seem that the second call could not have ended the access epoch for dest, or the exposure epoch for src (which is the target of the second Put).
>>
>> On the other hand, the test mixedsync.c looks roughly like this:
>>
>> if (rank == src) {
>> MPI_Win_lock(...,win)
>> MPI_Put
>> MPI_Win_unlock(...,win)
>> }
>>
>> MPI_Win_fence(0, win)
>> if (rank == src) MPI_Put
>> MPI_Win_fence(0, win)
>>
>> if (rank == src) {
>> MPI_Win_lock(...,win)
>> MPI_Put
>> MPI_Win_unlock(...,win)
>> }
>>
>> The problem is that it is erroneous to call MPI_Win_lock on a window while that window is exposed due to a call to MPI_Win_fence. If mixedsync.c is not erroneous, then the second call to MPI_Win_fence must end the exposure epoch on win, contradicting what's implied about fence synchronization by epochtest.c.
>>
>> Sorry for the long post, but if anyone can shed some light on this for me, I would greatly appreciate it.
>>
>> _______________________________________________
>> discuss mailing list discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
> --
> Jeff Hammond
> Argonne Leadership Computing Facility
> University of Chicago Computation Institute
> jhammond at alcf.anl.gov / (630) 252-5381
> http://www.linkedin.com/in/jeffhammond
> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond