[mpich-discuss] MCS lock and MPI RMA problem
Halim Amer
aamer at anl.gov
Wed Mar 8 12:29:58 CST 2017
The bug Ask was referring to concerns the branch "if (lmem[nextRank] == -1)",
as I explained previously.
Halim
www.mcs.anl.gov/~aamer
On 3/8/17 12:09 PM, Jeff Hammond wrote:
> It's not a bug in the example. There is a bug in the MPI standard that
> tolerates implementations that do not provide asynchronous progress
> (https://github.com/mpi-forum/mpi-forum-historic/issues/359) and thus
> many of them do not do so by default (because doing so often negatively
> impacts the performance of other features). Fortunately, as noted
> already, Casper or environment variables fix this issue.
>
> Jeff, who is an asynchronous RMA zealot :-)
>
> On Wed, Mar 8, 2017 at 1:06 AM, Ask Jakobsen <afj at qeye-labs.com> wrote:
>
>     You are absolutely correct, Halim. Removing the test lmem[nextRank] == -1
>     in the release fixes the problem. Great work. Now I will try to understand
>     why you are right. I hope the authors of the book will credit you for
>     discovering the bug.
>
>     So, in conclusion, you need to remove the above-mentioned test AND enable
>     asynchronous progress using the environment variable
>     MPIR_CVAR_ASYNC_PROGRESS=1 in MPICH (BTW I still can't get the code to
>     work in openmpi).
>
>
>     On Tue, Mar 7, 2017 at 5:37 PM, Halim Amer <aamer at anl.gov> wrote:
>
>         > detect that another process is being enqueued or has already been enqueued in the MCS queue.
>
>         Actually, the problem occurs only when the waiting process has
>         already enqueued itself, i.e., when the accumulate operation on the
>         nextRank field succeeded.
>
> Halim
>         www.mcs.anl.gov/~aamer
>
>
> On 3/7/17 10:29 AM, Halim Amer wrote:
>
> In the Release protocol, try removing this test:
>
> if (lmem[nextRank] == -1) {
> If-Block;
> }
>
> but keep the If-Block.
>
>             The hang occurs because the process releasing the MCS lock
>             fails to detect that another process is being enqueued or has
>             already been enqueued in the MCS queue.
>
> Halim
>             www.mcs.anl.gov/~aamer
>
> On 3/7/17 6:43 AM, Ask Jakobsen wrote:
>
>                 Thanks, Halim. I have now enabled asynchronous progress in
>                 MPICH (I can't find something similar in openmpi) and now
>                 all ranks acquire the lock and the program finishes as
>                 expected. However, if I put a while(1) loop around the
>                 acquire-release code in main.c, it fails again at random
>                 and goes into an infinite loop. The simple unfair lock does
>                 not have this problem.
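>
>                 (For reference, a sketch of the stress test being described,
>                 based on the main.c posted at the bottom of this thread; a
>                 finite bound replaces the while(1) only so that a successful
>                 run can terminate:)
>
>                 #include <stdio.h>
>                 #include <mpi.h>
>                 #include "mcs-lock.h"
>
>                 int main(int argc, char *argv[])
>                 {
>                   MPI_Win win;
>                   int rank, i;
>
>                   MPI_Init(&argc, &argv);
>                   MCSLockInit(MPI_COMM_WORLD, &win);
>                   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>
>                   /* Repeatedly acquire and release the lock; the original
>                      report wrapped a single acquire/release pair in while(1). */
>                   for (i = 0; i < 100000; i++) {
>                     MCSLockAcquire(win);
>                     MCSLockRelease(win);
>                   }
>                   printf("rank %d done\n", rank); fflush(stdout);
>
>                   MPI_Win_free(&win);
>                   MPI_Finalize();
>                   return 0;
>                 }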
>
>                 On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <aamer at anl.gov> wrote:
>
>                     My understanding is that this code assumes asynchronous progress.
> An example of when the processes hang is as follows:
>
> 1) P0 Finishes MCSLockAcquire()
> 2) P1 is busy waiting in MCSLockAcquire() at
> do {
> MPI_Win_sync(win);
> } while (lmem[blocked] == 1);
> 3) P0 executes MCSLockRelease()
>                     4) P0 waits on MPI_Win_lock_all() inside MCSLockRelease()
>
> Hang!
>
>                     For P1 to get out of the loop, P0 has to get out of
>                     MPI_Win_lock_all() and execute its Compare_and_swap().
>
>                     For P0 to get out of MPI_Win_lock_all(), it needs an
>                     ACK from P1 that it got the lock.
>
>                     P1 does not make communication progress because
>                     MPI_Win_sync is not required to do so. It only
>                     synchronizes the private and public copies.
>
>                     For this hang to disappear, one can either trigger
>                     progress manually by using heavy-duty synchronization
>                     calls instead of Win_sync (e.g., Win_unlock_all +
>                     Win_lock_all), or enable asynchronous progress.
>
>                     To enable asynchronous progress in MPICH, set the
>                     MPIR_CVAR_ASYNC_PROGRESS env var to 1.
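>
>                     (A sketch of the first option; the helper name
>                     wait_until_cleared is made up here, and closing and
>                     reopening the epoch is what forces progress:)
>
>                     #include <mpi.h>
>
>                     /* Spin on a flag in the local window memory while
>                        forcing MPI progress. 'flagp' points into the window
>                        memory attached to 'win'. */
>                     static void wait_until_cleared(volatile int *flagp, MPI_Win win)
>                     {
>                         do {
>                             MPI_Win_unlock_all(win);  /* close the epoch; completes pending RMA */
>                             MPI_Win_lock_all(0, win); /* reopen the passive-target epoch */
>                             MPI_Win_sync(win);        /* refresh the private window copy */
>                         } while (*flagp == 1);
>                     }
>
>                     In MCSLockAcquire, this would replace the
>                     do { MPI_Win_sync(win); } while (lmem[blocked] == 1);
>                     loop, e.g. as wait_until_cleared(&lmem[blocked], win).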
>
> Halim
>                     www.mcs.anl.gov/~aamer
>
>
> On 3/6/17 1:11 PM, Ask Jakobsen wrote:
>
>                         I am testing on an x86_64 platform.
>
>                         I have tried building both the mpich and the mcs
>                         lock code with -O0 to avoid aggressive optimization.
>                         Following your suggestion I have also tried making a
>                         volatile int *pblocked point to lmem[blocked] in the
>                         MCSLockAcquire function and a volatile int *pnextrank
>                         point to lmem[nextRank] in MCSLockRelease, but it
>                         does not appear to make a difference.
>
>                         On a suggestion from Richard Warren I have also
>                         tried building the code with openmpi-2.0.2, without
>                         any luck (however, it appears to acquire the lock a
>                         couple of extra times before failing), which I find
>                         troubling.
>
>                         I think I will give up on using local loads/stores
>                         and will see if I can rewrite it using MPI calls
>                         like MPI_Fetch_and_op, as you suggest.
>                         Thanks for your help.
>
>                         On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
> What processor architecture are you testing?
>
>
>                             Maybe set lmem to volatile or read it with
>                             MPI_Fetch_and_op rather than a load.
>                             MPI_Win_sync cannot prevent the compiler from
>                             caching *lmem in a register.
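>
>                             (A sketch of the second suggestion: re-reading
>                             the flag through an atomic MPI call so the
>                             compiler cannot keep it cached in a register;
>                             the helper name rma_read_int is made up here:)
>
>                             #include <mpi.h>
>
>                             /* Atomically read one int of window memory
>                                instead of using a plain load. MPI_NO_OP
>                                turns MPI_Fetch_and_op into an atomic fetch. */
>                             static int rma_read_int(MPI_Win win, int target_rank, MPI_Aint disp)
>                             {
>                                 int val, dummy = 0;
>                                 MPI_Fetch_and_op(&dummy, &val, MPI_INT, target_rank,
>                                                  disp, MPI_NO_OP, win);
>                                 MPI_Win_flush(target_rank, win);  /* make sure 'val' has arrived */
>                                 return val;
>                             }
>
>                             The acquire loop could then spin on
>                             rma_read_int(win, myrank, blocked) instead of
>                             reading lmem[blocked] directly (assuming the
>                             lock_all epoch is open at that point, as it is
>                             around the Win_sync loop, and that the window's
>                             displacement unit lets 'blocked' serve as the
>                             displacement).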
>
> Jeff
>
>                             On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <afj at qeye-labs.com> wrote:
>
> Hi,
>
>
>                                 I have downloaded the source code for the
>                                 MCS lock from the excellent book "Using
>                                 Advanced MPI" from
>                                 http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c
>
>                                 I have made a very simple piece of test code
>                                 for testing the MCS lock, but it works at
>                                 random and often never escapes the busy
>                                 loops in the acquire and release functions
>                                 (see attached source code). The code appears
>                                 semantically correct to my eyes.
>
>                                 #include <stdio.h>
>                                 #include <mpi.h>
>                                 #include "mcs-lock.h"
>
>                                 int main(int argc, char *argv[])
>                                 {
>                                   MPI_Win win;
>                                   MPI_Init( &argc, &argv );
>
>                                   MCSLockInit(MPI_COMM_WORLD, &win);
>
>                                   int rank, size;
>                                   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>                                   MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>                                   printf("rank: %d, size: %d\n", rank, size);
>
>                                   MCSLockAcquire(win);
>                                   printf("rank %d acquired lock\n", rank); fflush(stdout);
>                                   MCSLockRelease(win);
>
>                                   MPI_Win_free(&win);
>                                   MPI_Finalize();
>                                   return 0;
>                                 }
>
>
>                                 I have tested on several hardware platforms
>                                 with mpich-3.2 and mpich-3.3a2, but with no
>                                 luck.
>
>                                 It appears that MPI_Win_sync is not
>                                 "refreshing" the local data, or I have a bug
>                                 I can't spot.
>
>                                 A simple unfair lock like
>                                 http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c
>                                 works perfectly.
>
> Best regards, Ask Jakobsen
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
>
>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss