[mpich-discuss] MCS lock and MPI RMA problem

Halim Amer aamer at anl.gov
Wed Mar 8 12:29:58 CST 2017


The bug Ask was referring to concerns the branch "if (lmem[nextRank]
== -1)" in the Release protocol, as I explained previously.

Halim
www.mcs.anl.gov/~aamer

On 3/8/17 12:09 PM, Jeff Hammond wrote:
> It's not a bug in the example.  There is a bug in the MPI standard that
> tolerates implementations that do not provide asynchronous progress
> (https://github.com/mpi-forum/mpi-forum-historic/issues/359) and thus
> many of them do not do so by default (because doing so often negatively
> impacts the performance of other features).  Fortunately, as noted
> already, Casper or environment variables fix this issue.
>
> Jeff, who is an asynchronous RMA zealot :-)
>
> On Wed, Mar 8, 2017 at 1:06 AM, Ask Jakobsen <afj at qeye-labs.com> wrote:
>
>     You are absolutely correct, Halim. Removing the test lmem[nextRank]
>     == -1 in release fixes the problem. Great work. Now I will try to
>     understand why you are right. I hope the authors of the book will
>     credit you for discovering the bug.
>
>     So in conclusion you need to remove the above-mentioned test AND
>     enable asynchronous progress using the environment variable
>     MPIR_CVAR_ASYNC_PROGRESS=1 in MPICH (BTW I still can't get the code
>     to work in openmpi).
>
>
>     On Tue, Mar 7, 2017 at 5:37 PM, Halim Amer <aamer at anl.gov> wrote:
>
>         > detect that another process is being enqueued, or has already been enqueued, in the MCS queue.
>
>         Actually, the problem occurs only when the waiting process has
>         already enqueued itself, i.e., when its accumulate operation on
>         the nextRank field has succeeded.
>
>         Halim
>         www.mcs.anl.gov/~aamer
>
>
>         On 3/7/17 10:29 AM, Halim Amer wrote:
>
>             In the Release protocol, try removing this test:
>
>             if (lmem[nextRank] == -1) {
>                If-Block;
>             }
>
>             but keep the If-Block.
>
>             The hang occurs because the process releasing the MCS lock
>             fails to detect that another process is being enqueued, or
>             has already been enqueued, in the MCS queue.
>
>             Halim
>             www.mcs.anl.gov/~aamer
>
>             On 3/7/17 6:43 AM, Ask Jakobsen wrote:
>
>                 Thanks, Halim. I have now enabled asynchronous progress
>                 in MPICH (I can't find anything similar in openmpi), and
>                 now all ranks acquire the lock and the program finishes
>                 as expected. However, if I put a while(1) loop around the
>                 acquire-release code in main.c, it fails again at random
>                 and goes into an infinite loop. The simple unfair lock
>                 does not have this problem.
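>
>                 For concreteness, the stress loop is roughly the
>                 following (a sketch of my test, not the book's code):
>
>                     while (1) {
>                       MCSLockAcquire(win);
>                       printf("rank %d acquired lock\n", rank); fflush(stdout);
>                       MCSLockRelease(win);
>                     }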
>
>                 On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <aamer at anl.gov> wrote:
>
>                     My understanding is that this code assumes
>                     asynchronous progress.
>                     An example of when the processes hang is as follows:
>
>                     1) P0 Finishes MCSLockAcquire()
>                     2) P1 is busy waiting in MCSLockAcquire() at
>                     do {
>                           MPI_Win_sync(win);
>                        } while (lmem[blocked] == 1);
>                     3) P0 executes MCSLockRelease()
>                     4) P0 waits on MPI_Win_lock_all() inside MCSLockRelease()
>
>                     Hang!
>
>                     For P1 to get out of the loop, P0 has to get out of
>                     MPI_Win_lock_all() and execute its Compare_and_swap().
>
>                     For P0 to get out of MPI_Win_lock_all(), it needs an
>                     ACK from P1 that it got the lock.
>
>                     P1 does not make communication progress because
>                     MPI_Win_sync is not
>                     required to do so. It only synchronizes private and
>                     public copies.
>
>                     For this hang to disappear, one can either trigger
>                     progress manually by
>                     using heavy-duty synchronization calls instead of
>                     Win_sync (e.g.,
>                     Win_unlock_all + Win_lock_all), or enable
>                     asynchronous progress.
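>
>                     A rough sketch of the former alternative (an
>                     illustration, not the book's code), applied to the
>                     busy-wait in MCSLockAcquire:
>
>                         do {
>                            /* ending and re-opening the passive-target
>                               epoch forces progress and a flush */
>                            MPI_Win_unlock_all(win);
>                            MPI_Win_lock_all(0, win);
>                         } while (lmem[blocked] == 1);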
>
>                     To enable asynchronous progress in MPICH, set the
>                     MPIR_CVAR_ASYNC_PROGRESS
>                     env var to 1.
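>
>                     For example (the binary name here is just a
>                     placeholder):
>
>                         MPIR_CVAR_ASYNC_PROGRESS=1 mpiexec -n 4 ./mcs_test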
>
>                     Halim
>                     www.mcs.anl.gov/~aamer
>
>
>                     On 3/6/17 1:11 PM, Ask Jakobsen wrote:
>
>                         I am testing on an x86_64 platform.
>
>                         I have tried to build both the mpich and the mcs
>                         lock code with -O0 to avoid aggressive
>                         optimization. After your suggestion I have also
>                         tried making a volatile int *pblocked point to
>                         lmem[blocked] in the MCSLockAcquire function and
>                         a volatile int *pnextrank point to lmem[nextRank]
>                         in MCSLockRelease, but it does not appear to make
>                         a difference.
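>
>                         For reference, the change I tried looks roughly
>                         like this (a sketch; blocked is the index of the
>                         flag in the local window memory):
>
>                             volatile int *pblocked = &lmem[blocked];
>                             do {
>                               MPI_Win_sync(win);
>                             } while (*pblocked == 1);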
>
>                         On a suggestion from Richard Warren I have also
>                         tried building the code with openmpi-2.0.2,
>                         without any luck (however, it appears to acquire
>                         the lock a couple of extra times before failing),
>                         which I find troubling.
>
>                         I think I will give up on using local loads/stores
>                         and will see if I can figure out how to rewrite
>                         the code using MPI calls like MPI_Fetch_and_op,
>                         as you suggest. Thanks for your help.
>
>                         On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
>                         What processor architecture are you testing?
>
>
>                             Maybe set lmem to volatile or read it with
>                             MPI_Fetch_and_op rather
>                             than a
>                             load.  MPI_Win_sync cannot prevent the
>                             compiler from caching *lmem
>                             in a
>                             register.
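>
>                             For instance, something along these lines
>                             (a sketch; myrank and the blocked
>                             displacement are assumptions about the
>                             book's window layout):
>
>                                 int flag, dummy = 0;
>                                 /* atomic read of our own blocked flag
>                                    instead of a plain load of lmem */
>                                 MPI_Fetch_and_op(&dummy, &flag, MPI_INT,
>                                                  myrank, blocked,
>                                                  MPI_NO_OP, win);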
>
>                             Jeff
>
>                             On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <afj at qeye-labs.com> wrote:
>
>                             Hi,
>
>
>                                 I have downloaded the source code for
>                                 the MCS lock from the excellent book
>                                 "Using Advanced MPI" from
>                                 http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c
>
>                                 I have made a very simple piece of test
>                                 code for testing the MCS lock, but it
>                                 works at random and often never escapes
>                                 the busy loops in the acquire and release
>                                 functions (see attached source code). The
>                                 code appears semantically correct to my
>                                 eyes.
>
>                                 #include <stdio.h>
>                                 #include <mpi.h>
>                                 #include "mcs-lock.h"
>
>                                 int main(int argc, char *argv[])
>                                 {
>                                   MPI_Win win;
>                                   MPI_Init( &argc, &argv );
>
>                                   MCSLockInit(MPI_COMM_WORLD, &win);
>
>                                   int rank, size;
>                                   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>                                   MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>                                   printf("rank: %d, size: %d\n", rank,
>                                 size);
>
>
>                                   MCSLockAcquire(win);
>                                   printf("rank %d acquired lock\n",
>                                 rank);   fflush(stdout);
>                                   MCSLockRelease(win);
>
>
>                                   MPI_Win_free(&win);
>                                   MPI_Finalize();
>                                   return 0;
>                                 }
>
>
>                                 I have tested on several hardware
>                                 platforms with mpich-3.2 and mpich-3.3a2,
>                                 but with no luck.
>
>                                 It appears that the MPI_Win_sync calls
>                                 are not "refreshing" the local data, or I
>                                 have a bug I can't spot.
>
>                                 A simple unfair lock like
>                                 http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c
>                                 works perfectly.
>
>                                 Best regards, Ask Jakobsen
>
>
>
>
>
>
>                             --
>                             Jeff Hammond
>                             jeff.science at gmail.com
>                             http://jeffhammond.github.io/
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
>
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

