[mpich-discuss] MCS lock and MPI RMA problem

Balaji, Pavan balaji at anl.gov
Sun Mar 12 20:55:27 CDT 2017


Shouldn't winsize be 3 integers in your code?  (sorry, I spent only 30 seconds looking at the code, so I might have missed something).

  -- Pavan

> On Mar 12, 2017, at 2:44 PM, Ask Jakobsen <afj at qeye-labs.com> wrote:
> 
> Interestingly, the paper you suggested appears to include a similar test in its pseudocode: https://htor.inf.ethz.ch/publications/img/hpclocks.pdf (see Listing 3 in the paper).
> 
> Unfortunately, removing the test in the release protocol did not solve the problem. The race condition is much more difficult to provoke, but I managed to trigger it with a communicator of size 3 (I had only tested even sizes before).
> 
> Following Jeff's suggestion, I have attempted to rewrite the code to remove local loads and stores in the MPI_Win_lock_all epochs, using MPI_Fetch_and_op instead (see the attached main.c, mcs-lock-fop.c and mcs-lock.h).
> 
> This version behaves very similarly to the original code and also fails from time to time. Putting a sleep into the acquire busy loop (usleep(100)) makes the code "much more robust" (a hack, I know, but it points to some underlying race condition?!). Let me know if you see any problems in the way I am using MPI_Fetch_and_op in a busy loop. Flushing or syncing is not necessary in this case, right?
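> 
> For reference, the busy-wait pattern I am asking about looks roughly like this (a sketch, not the attached file verbatim; 'blocked' is the displacement of my own flag and myrank is the calling process):
> 
>   /* Spin on my own 'blocked' flag via an atomic read instead of a local load. */
>   int flag, dummy = 0;
>   do {
>       MPI_Fetch_and_op(&dummy, &flag, MPI_INT, myrank, blocked,
>                        MPI_NO_OP, win);
>       MPI_Win_flush(myrank, win);  /* completes the read; is this actually required here? */
>       /* usleep(100);                 the hack that makes it "more robust" */
>   } while (flag == 1);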
> 
> All work is done with export MPIR_CVAR_ASYNC_PROGRESS=1 on mpich-3.2 and mpich-3.3a2
> 
> On Wed, Mar 8, 2017 at 4:21 PM, Halim Amer <aamer at anl.gov> wrote:
> I cannot claim that I have thoroughly verified the correctness of that code, so take it with a grain of salt. Please keep in mind that it is test code from a tutorial book; such codes are meant for learning purposes, not for deployment.
> 
> If your goal is a high-performance RMA lock, I suggest you look into the recent HPDC'16 paper "High-Performance Distributed RMA Locks".
> 
> Halim
> www.mcs.anl.gov/~aamer
> 
> On 3/8/17 3:06 AM, Ask Jakobsen wrote:
> You are absolutely correct, Halim. Removing the test lmem[nextRank] == -1
> in release fixes the problem. Great work. Now I will try to understand why
> you are right. I hope the authors of the book will credit you for
> discovering the bug.
> 
> So, in conclusion, you need to remove the above-mentioned test AND enable
> asynchronous progress using the environment variable
> MPIR_CVAR_ASYNC_PROGRESS=1 in MPICH (BTW, I still can't get the code to
> work in openmpi).
> 
> On Tue, Mar 7, 2017 at 5:37 PM, Halim Amer <aamer at anl.gov> wrote:
> 
> "... detect that another process is being enqueued or has already been
> enqueued in the MCS queue."
> 
> Actually, the problem occurs only when the waiting process has already
> enqueued itself, i.e., when its accumulate operation on the nextRank field
> has succeeded.
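> 
> For context, inside the MPI_Win_lock_all epoch the acquire path does roughly the following (a sketch from memory; lockrank/lockTail, predecessor and the flush calls are placeholders, not the book's exact code):
> 
>   /* Atomically swap ourselves onto the lock's global tail. */
>   MPI_Fetch_and_op(&myrank, &predecessor, MPI_INT,
>                    lockrank, lockTail, MPI_REPLACE, win);
>   MPI_Win_flush(lockrank, win);
>   if (predecessor != -1) {
>       /* The accumulate in question: enqueue ourselves by publishing
>          our rank into the predecessor's nextRank slot ... */
>       MPI_Accumulate(&myrank, 1, MPI_INT, predecessor, nextRank,
>                      1, MPI_INT, MPI_REPLACE, win);
>       /* ... then spin on our own 'blocked' flag until released. */
>       do {
>           MPI_Win_sync(win);
>       } while (lmem[blocked] == 1);
>   }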
> 
> Halim
> www.mcs.anl.gov/~aamer
> 
> 
> On 3/7/17 10:29 AM, Halim Amer wrote:
> 
> In the Release protocol, try removing this test:
> 
> if (lmem[nextRank] == -1) {
>    If-Block;
> }
> 
> but keep the If-Block.
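> 
> With the test removed, the release would look roughly like this (a sketch; nullrank (holding -1), curtail, zero and lockrank/lockTail are placeholder names, not the book's exact code):
> 
>   MPI_Win_lock_all(0, win);
>   /* Former If-Block, now executed unconditionally: try to swing the
>      tail back to "empty" and see whether anyone is queued behind us. */
>   MPI_Compare_and_swap(&nullrank, &myrank, &curtail, MPI_INT,
>                        lockrank, lockTail, win);
>   MPI_Win_flush(lockrank, win);
>   if (curtail != myrank) {
>       /* Someone is enqueuing (or already enqueued) behind us: wait
>          until it has written its rank into our nextRank slot ... */
>       do {
>           MPI_Win_sync(win);
>       } while (lmem[nextRank] == -1);
>       /* ... then release it by clearing its 'blocked' flag. */
>       MPI_Accumulate(&zero, 1, MPI_INT, lmem[nextRank], blocked,
>                      1, MPI_INT, MPI_REPLACE, win);
>   }
>   MPI_Win_unlock_all(win);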
> 
> The hang occurs because the process releasing the MCS lock fails to detect
> that another process is being enqueued or has already been enqueued in the
> MCS queue.
> 
> Halim
> www.mcs.anl.gov/~aamer
> 
> 
> On 3/7/17 6:43 AM, Ask Jakobsen wrote:
> 
> Thanks, Halim. I have now enabled asynchronous progress in MPICH (I can't
> find anything similar in openmpi), and now all ranks acquire the lock and
> the program finishes as expected. However, if I put a while(1) loop around
> the acquire/release code in main.c, it fails again at random and goes into
> an infinite loop. The simple unfair lock does not have this problem.
> 
> On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <aamer at anl.gov> wrote:
> 
> My understanding is that this code assumes asynchronous progress.
> An example of when the processes hang is as follows:
> 
> 1) P0 Finishes MCSLockAcquire()
> 2) P1 is busy waiting in MCSLockAcquire() at
>    do {
>        MPI_Win_sync(win);
>    } while (lmem[blocked] == 1);
> 3) P0 executes MCSLockRelease()
> 4) P0 waits on MPI_Win_lock_all() inside MCSLockRelease()
> 
> Hang!
> 
> For P1 to get out of the loop, P0 has to get out of MPI_Win_lock_all() and
> execute its Compare_and_swap().
> 
> For P0 to get out of MPI_Win_lock_all(), it needs an ACK from P1 that it
> got the lock.
> 
> P1 does not make communication progress because MPI_Win_sync is not
> required to do so. It only synchronizes private and public copies.
> 
> For this hang to disappear, one can either trigger progress manually by
> using heavy-duty synchronization calls instead of Win_sync (e.g.,
> Win_unlock_all + Win_lock_all), or enable asynchronous progress.
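> 
> For example, the busy-wait in MCSLockAcquire could be rewritten roughly as follows (a sketch; much heavier-weight than the original Win_sync-only loop):
> 
>   do {
>       MPI_Win_unlock_all(win);   /* closing the epoch completes RMA and drives progress */
>       MPI_Win_lock_all(0, win);  /* reopen the access epoch */
>       MPI_Win_sync(win);         /* synchronize the public and private window copies */
>   } while (lmem[blocked] == 1);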
> 
> To enable asynchronous progress in MPICH, set the MPIR_CVAR_ASYNC_PROGRESS
> env var to 1.
> 
> Halim
> www.mcs.anl.gov/~aamer
> 
> 
> On 3/6/17 1:11 PM, Ask Jakobsen wrote:
> 
> I am testing on an x86_64 platform.
> 
> I have tried to build both mpich and the MCS lock code with -O0 to avoid
> aggressive optimization. Following your suggestion, I have also tried making
> a volatile int *pblocked pointing to lmem[blocked] in the MCSLockAcquire
> function and a volatile int *pnextrank pointing to lmem[nextRank] in
> MCSLockRelease, but it does not appear to make a difference.
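> 
> That is, the acquire spin now reads the flag through a volatile pointer, roughly:
> 
>   volatile int *pblocked = &lmem[blocked];  /* keep the compiler from caching the flag */
>   do {
>       MPI_Win_sync(win);
>   } while (*pblocked == 1);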
> 
> On a suggestion from Richard Warren I have also tried building the code
> with openmpi-2.0.2, without any luck (although it appears to acquire the
> lock a couple of extra times before failing), which I find troubling.
> 
> I think I will give up on using local loads/stores and will see if I can
> figure out how to rewrite it using MPI calls like MPI_Fetch_and_op, as you
> suggest.
> Thanks for your help.
> 
> On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <jeff.science at gmail.com>
> wrote:
> 
> What processor architecture are you testing?
> 
> 
> Maybe set lmem to volatile or read it with MPI_Fetch_and_op rather than a
> load. MPI_Win_sync cannot prevent the compiler from caching *lmem in a
> register.
> 
> Jeff
> 
> On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <afj at qeye-labs.com>
> wrote:
> 
> Hi,
> 
> 
> I have downloaded the source code for the MCS lock from the excellent
> book "Using Advanced MPI":
> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c
> 
> I have made a very simple piece of test code for the MCS lock, but it only
> works intermittently and often never escapes the busy loops in the acquire
> and release functions (see the attached source code). The code appears
> semantically correct to my eyes.
> 
> #include <stdio.h>
> #include <mpi.h>
> #include "mcs-lock.h"
> 
> int main(int argc, char *argv[])
> {
>   MPI_Win win;
>   MPI_Init( &argc, &argv );
> 
>   MCSLockInit(MPI_COMM_WORLD, &win);
> 
>   int rank, size;
>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>   MPI_Comm_size(MPI_COMM_WORLD, &size);
> 
>   printf("rank: %d, size: %d\n", rank, size);
> 
> 
>   MCSLockAcquire(win);
>   printf("rank %d aquired lock\n", rank);   fflush(stdout);
>   MCSLockRelease(win);
> 
> 
>   MPI_Win_free(&win);
>   MPI_Finalize();
>   return 0;
> }
> 
> 
> I have tested on several hardware platforms with mpich-3.2 and mpich-3.3a2,
> but with no luck.
> 
> It appears that MPI_Win_sync is not "refreshing" the local data, or I have
> a bug I can't spot.
> 
> A simple unfair lock like
> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c
> works perfectly.
> 
> Best regards, Ask Jakobsen
> 
> 
> 
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> 

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


More information about the discuss mailing list