[mpich-discuss] One-sided semantics

Palmer, Bruce J Bruce.Palmer at pnnl.gov
Mon Mar 7 10:15:14 CST 2016


I put the barrier in at line 81, but it doesn't seem to have much of an effect; I still get the same results. I'm using MPICH 3.2 built with gcc 5.2.0 and am running on an InfiniBand cluster. What are you using?
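
For reference, the pattern in question looks roughly like this (a sketch with placeholder names, not the actual code around line 81 of testmpi.c):

  #include <mpi.h>
  #define N (200*200)

  int main(int argc, char **argv)
  {
      double *buf;
      MPI_Win win;
      int i, rank;

      MPI_Init(&argc, &argv);
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Win_allocate(N * sizeof(double), sizeof(double),
                       MPI_INFO_NULL, MPI_COMM_WORLD, &buf, &win);

      for (i = 0; i < N; i++)
          buf[i] = rank * N + i;       /* local initialization */

      /* the barrier under discussion: without it, a neighbor's put
         can land in buf while this rank is still initializing it */
      MPI_Barrier(MPI_COMM_WORLD);

      /* ... put/get traffic goes here ... */

      MPI_Win_free(&win);
      MPI_Finalize();
      return 0;
  }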

As for the flush_local/wait calls and the rest, I've put those in so that we can reproduce the GA/ARMCI semantics. I've tried to get things to work without them, but I can't seem to get communication progress otherwise. The GA/ARMCI semantics for blocking calls are that the local buffer is safe to reuse as soon as the put/get/accumulate returns, and the only way I know of to achieve that in MPI is to use the combinations that are in the test program. However, if you have any suggestions on how to get rid of some of them, I'm all ears.
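
Concretely, the two local-completion idioms I mean look like this (a sketch; buf, count, target, disp, and win are placeholders, and the window is assumed to already be in a passive epoch via MPI_Win_lock_all):

  /* variant 1: request-based operation plus MPI_Wait */
  MPI_Request req;
  MPI_Rput(buf, count, MPI_DOUBLE, target, disp,
           count, MPI_DOUBLE, win, &req);
  MPI_Wait(&req, MPI_STATUS_IGNORE);   /* buf is now safe to reuse */

  /* variant 2: plain operation plus a local flush to the target */
  MPI_Put(buf, count, MPI_DOUBLE, target, disp,
          count, MPI_DOUBLE, win);
  MPI_Win_flush_local(target, win);    /* buf is now safe to reuse */

Neither idiom guarantees remote completion; it only makes the local buffer reusable, which is what the GA/ARMCI blocking semantics require.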

I didn't mention it in the previous email, but you can select the different options by commenting or uncommenting the preprocessor symbols MPI_USE_REQUESTS, MPI_USE_FLUSH_LOCAL, and USE_SYNC. If you define neither of the first two, the program uses the lock/unlock approach; if you define MPI_USE_REQUESTS, it uses the request-based protocol with MPI_Wait; and if you define MPI_USE_FLUSH_LOCAL, it uses MPI_Win_flush_local on the target rank. Defining or undefining USE_SYNC controls whether a synchronization operation is placed between the put and the get.
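
In outline, the selection looks like this (a condensed sketch of the structure, not the verbatim test program; src, count, next, and win are placeholders):

  #if defined(MPI_USE_REQUESTS)
      /* request-based protocol: window held under MPI_Win_lock_all */
      MPI_Request req;
      MPI_Rput(src, count, MPI_DOUBLE, next, 0, count, MPI_DOUBLE, win, &req);
      MPI_Wait(&req, MPI_STATUS_IGNORE);
  #elif defined(MPI_USE_FLUSH_LOCAL)
      /* flush-based protocol: window held under MPI_Win_lock_all */
      MPI_Put(src, count, MPI_DOUBLE, next, 0, count, MPI_DOUBLE, win);
      MPI_Win_flush_local(next, win);
  #else
      /* default: lock/unlock around each individual operation
         (no long-lived lock_all epoch in this path) */
      MPI_Win_lock(MPI_LOCK_SHARED, next, 0, win);
      MPI_Put(src, count, MPI_DOUBLE, next, 0, count, MPI_DOUBLE, win);
      MPI_Win_unlock(next, win);
  #endif

  #ifdef USE_SYNC
  #if defined(MPI_USE_REQUESTS) || defined(MPI_USE_FLUSH_LOCAL)
      MPI_Win_flush_all(win);    /* only valid inside the lock_all epoch */
  #endif
      MPI_Barrier(MPI_COMM_WORLD);
  #endif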

The COMEX port is in the GA developer build under comex/src-mpi3. I think that is available via anonymous SVN, but I'll check.

Bruce

-----Original Message-----
From: Balaji, Pavan [mailto:balaji at anl.gov] 
Sent: Friday, March 04, 2016 8:02 PM
To: discuss at mpich.org
Subject: Re: [mpich-discuss] One-sided semantics


And, of course, I assume you'll eventually fix all the performance shortcomings with respect to doing extra flush/flush_locals or datatype commits inside the for loop.

We'll be happy to review the Comex port for you, once it's ready, if you like.

  -- Pavan

> On Mar 4, 2016, at 9:59 PM, Balaji, Pavan <balaji at anl.gov> wrote:
> 
> Bruce,
> 
> You are missing an MPI_Barrier on line 81 (after the initialization).  Without this, a remote process might update your buffer while you are still initializing.  The program works with the barrier.
> 
>  -- Pavan
> 
>> On Mar 4, 2016, at 6:22 PM, Palmer, Bruce J <Bruce.Palmer at pnnl.gov> wrote:
>> 
>> Hi,
>> 
>> I’ve been working on a thin implementation of the COMEX runtime over MPI-3. The COMEX interface has been used by most of the MPI-based runtimes in GA. One of the COMEX tests has processors writing to and then immediately reading from neighboring processes multiple times. The GA semantics are that for multiple consecutive operations between the same pair of processes, the operations are ordered on the remote process in the same order as on the originating process. The test for this frequently fails for the MPI-3 based implementation. I’ve tried testing this independently of GA but the results are confusing.
>> 
>> The implementation I’ve been working on uses three different strategies to implement one-sided communication calls that follow, or are at least close to, the GA communication semantics. The first uses MPI_Put/MPI_Get/MPI_Accumulate and surrounds each call with an MPI_Win_lock/MPI_Win_unlock pair immediately before and after the one-sided communication call. My understanding is that this forces completion both locally and remotely. The second approach calls MPI_Win_lock_all on the MPI window immediately after creation and MPI_Win_unlock_all when the window is destroyed, so that the window is always in a passive synchronization epoch. The put/get/accumulate calls are implemented with the request-based calls MPI_Rput/MPI_Rget/MPI_Raccumulate, each followed immediately by a call to MPI_Wait on the request handle. Again, from my understanding, this should force local completion of the operation but not necessarily remote completion. The last implementation again uses MPI_Win_lock_all to keep the window in a permanent passive synchronization epoch, implements put/get/accumulate with MPI_Put/MPI_Get/MPI_Accumulate, and uses MPI_Win_flush_local to force local completion. The first implementation should require only a barrier to force synchronization between all processors; the latter two include a call to MPI_Win_flush_all in conjunction with a barrier to synchronize the data on all processors.
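>> 
>> Roughly, the passive-target setup used by the second and third approaches looks like this (a sketch with placeholder names, not the verbatim code):
>> 
>>   MPI_Win_allocate(size, sizeof(double), MPI_INFO_NULL,
>>                    MPI_COMM_WORLD, &buf, &win);
>>   MPI_Win_lock_all(0, win);     /* window enters a passive epoch ... */
>>   /* ... all puts/gets/accumulates happen here ... */
>>   MPI_Win_unlock_all(win);      /* ... which lasts until destruction */
>>   MPI_Win_free(&win);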
>> 
>> I’ve written a small test code that implements all three schemes and attached it to this email. It creates a 200x200 array of doubles, fills each process’s array with unique numbers, writes a portion of the array to the next higher rank using put, and then reads it back using get (cyclic boundary conditions are used for the first and last ranks). This is repeated 2000 times, with each test using a slightly different set of numbers from the previous one. I’ve done this for all three implementations, both with a synchronization between the put and the get and without it. The code has been run on an InfiniBand cluster using 2 processors on 2 separate SMP nodes. The results I get are that the request-based implementation and the flush_local implementation without synchronization work fairly consistently, while the tests with synchronization all fail. The lock/unlock implementation also fails both with and without synchronization. Most failing tests get through at least a few put/get cycles before failing, but none complete all 2000 iterations.
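>> 
>> In outline, each iteration of the test does something like this (a sketch; put_op and get_op are stand-ins for whichever of the three protocols is compiled in, and next = (rank+1) % nprocs gives the cyclic neighbor):
>> 
>>   for (iter = 0; iter < 2000; iter++) {
>>       for (i = 0; i < count; i++)
>>           src[i] = rank + 1000.0 * iter + i;   /* values unique to this pass */
>>       put_op(src, count, next, win);           /* write slice to next rank */
>>   #ifdef USE_SYNC
>>       MPI_Win_flush_all(win);                  /* optional sync between put and get */
>>       MPI_Barrier(MPI_COMM_WORLD);
>>   #endif
>>       get_op(dst, count, next, win);           /* read the same slice back */
>>       for (i = 0; i < count; i++)              /* verify the put is visible */
>>           if (dst[i] != src[i])
>>               fprintf(stderr, "mismatch at iteration %d\n", iter);
>>   }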
>> 
>> I’ve also tried this using Open MPI. In that case, there doesn’t appear to be much of an effect from using synchronization. In addition, the lock/unlock algorithm does not fail consistently, although it fails more frequently than the other two.
>> 
>> Does anyone have a suggestion as to what I’m doing wrong here? From my understanding of the MPI-3 standard, all three implementations should work with synchronization. I’m not completely sure if they should work without synchronization.
>> 
>> Bruce Palmer
>> 
>> <testmpi.c>
> 

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss