[mpich-discuss] MCS lock and MPI RMA problem

Jeff Hammond jeff.science at gmail.com
Wed Mar 8 12:09:53 CST 2017


It's not a bug in the example.  There is a bug in the MPI standard that
tolerates implementations that do not provide asynchronous progress
(https://github.com/mpi-forum/mpi-forum-historic/issues/359) and thus many
of them do not do so by default (because doing so often negatively impacts
the performance of other features).  Fortunately, as noted already, Casper
or environment variables fix this issue.

Jeff, who is an asynchronous RMA zealot :-)

On Wed, Mar 8, 2017 at 1:06 AM, Ask Jakobsen <afj at qeye-labs.com> wrote:

> You are absolutely correct, Halim. Removing the test lmem[nextRank] == -1
> in release fixes the problem. Great work. Now I will try to understand why
> you are right. I hope the authors of the book will credit you for
> discovering the bug.
>
> So, in conclusion, you need to remove the above-mentioned test AND enable
> asynchronous progress using the environment variable
> MPIR_CVAR_ASYNC_PROGRESS=1 in MPICH (BTW, I still can't get the code to
> work in Open MPI).
>
>
> On Tue, Mar 7, 2017 at 5:37 PM, Halim Amer <aamer at anl.gov> wrote:
>
>> > detect that another process is being or already enqueued in the MCS
>> queue.
>>
>> Actually the problem occurs only when the waiting process already
>> enqueued itself, i.e., the accumulate operation on the nextRank field
>> succeeded.
>>
>> Halim
>> www.mcs.anl.gov/~aamer <http://www.mcs.anl.gov/%7Eaamer>
>>
>>
>> On 3/7/17 10:29 AM, Halim Amer wrote:
>>
>>> In the Release protocol, try removing this test:
>>>
>>> if (lmem[nextRank] == -1) {
>>>    If-Block;
>>> }
>>>
>>> but keep the If-Block.
>>>
>>> The hang occurs because the process releasing the MCS lock fails to
>>> detect that another process is being enqueued, or is already enqueued,
>>> in the MCS queue.
>>>
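The control flow of a release with that test removed can be sketched as follows. This is a shared-memory model using C11 atomics, not MPI RMA, and it cannot reproduce the hang; array indices stand in for ranks. It assumes the guarded If-Block is the usual MCS compare-and-swap on the queue tail followed by a wait for the successor link, which the thread does not show. The names `NPROCS`, `NONE`, `tail`, `model_init`, `model_enqueue`, and `model_release` are hypothetical; `next_rank` and `blocked` model `lmem[nextRank]` and `lmem[blocked]`.

```c
#include <assert.h>
#include <stdatomic.h>

#define NPROCS 4      /* hypothetical number of modeled ranks */
#define NONE   (-1)   /* "no rank", as with the -1 used in the thread */

/* Shared-memory stand-ins for the per-rank window fields. */
static atomic_int tail = NONE;         /* rank at the end of the queue */
static atomic_int next_rank[NPROCS];   /* models lmem[nextRank] per rank */
static atomic_int blocked[NPROCS];     /* models lmem[blocked] per rank */

static void model_init(void)
{
    atomic_store(&tail, NONE);
    for (int r = 0; r < NPROCS; r++) {
        atomic_store(&next_rank[r], NONE);
        atomic_store(&blocked[r], 0);
    }
}

/* Enqueue step of acquire. Returns the predecessor rank, or NONE if the
   lock was free. A real acquire would then spin while blocked[me] == 1. */
static int model_enqueue(int me)
{
    assert(me >= 0 && me < NPROCS);
    atomic_store(&next_rank[me], NONE);
    atomic_store(&blocked[me], 1);
    int pred = atomic_exchange(&tail, me);
    if (pred == NONE)
        atomic_store(&blocked[me], 0);       /* lock was free */
    else
        atomic_store(&next_rank[pred], me);  /* link behind predecessor */
    return pred;
}

/* Release with the initial next_rank[me] == NONE test removed:
   the compare-and-swap on the tail is always attempted. */
static void model_release(int me)
{
    int expected = me;
    if (atomic_compare_exchange_strong(&tail, &expected, NONE))
        return;                              /* nobody queued behind us */
    /* Someone replaced the tail; wait until they have linked themselves. */
    while (atomic_load(&next_rank[me]) == NONE)
        ;                                    /* spin */
    /* Hand the lock to the successor by clearing its blocked flag. */
    atomic_store(&blocked[atomic_load(&next_rank[me])], 0);
}
```

In the MPI version each of these atomic operations would be an RMA call on the window, so whether the spin loop ever observes the successor link depends on the progress behavior discussed elsewhere in this thread.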
>>> Halim
>>> www.mcs.anl.gov/~aamer <http://www.mcs.anl.gov/%7Eaamer>
>>>
>>> On 3/7/17 6:43 AM, Ask Jakobsen wrote:
>>>
>>>> Thanks, Halim. I have now enabled asynchronous progress in MPICH (can't
>>>> find something similar in Open MPI) and now all ranks acquire the lock
>>>> and the program finishes as expected. However, if I put a while(1) loop
>>>> around the acquire-release code in main.c it will fail again at random
>>>> and go into an infinite loop. The simple unfair lock does not have this
>>>> problem.
>>>>
>>>> On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <aamer at anl.gov> wrote:
>>>>
>>>>> My understanding is that this code assumes asynchronous progress.
>>>>> An example of when the processes hang is as follows:
>>>>>
>>>>> 1) P0 finishes MCSLockAcquire()
>>>>> 2) P1 is busy waiting in MCSLockAcquire() at
>>>>>    do {
>>>>>       MPI_Win_sync(win);
>>>>>    } while (lmem[blocked] == 1);
>>>>> 3) P0 executes MCSLockRelease()
>>>>> 4) P0 waits on MPI_Win_lock_all() inside MCSLockRelease()
>>>>>
>>>>> Hang!
>>>>>
>>>>> For P1 to get out of the loop, P0 has to get out of MPI_Win_lock_all()
>>>>> and execute its Compare_and_swap().
>>>>>
>>>>> For P0 to get out of MPI_Win_lock_all(), it needs an ACK from P1 that
>>>>> it got the lock.
>>>>>
>>>>> P1 does not make communication progress because MPI_Win_sync is not
>>>>> required to do so. It only synchronizes private and public copies.
>>>>>
>>>>> For this hang to disappear, one can either trigger progress manually by
>>>>> using heavy-duty synchronization calls instead of Win_sync (e.g.,
>>>>> Win_unlock_all + Win_lock_all), or enable asynchronous progress.
>>>>>
>>>>> To enable asynchronous progress in MPICH, set the
>>>>> MPIR_CVAR_ASYNC_PROGRESS env var to 1.
>>>>>
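The circular wait in steps 1) to 4) can be illustrated with a toy model in plain C. It is not MPI code and makes no claim about how any MPI library is implemented: it only encodes the two dependencies described above (P0 needs an ACK from P1, and P1 only leaves its loop after P0 acts) and shows that the loop terminates only if the call inside it drives progress. All names here (`struct model`, `p1_wait_step`, `p0_release_step`, `simulate`) are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy state: one request from P0 is waiting to be serviced by P1. */
struct model {
    bool p0_request_pending;  /* P0's lock request sitting at P1 */
    bool p0_has_ack;          /* P1 has acknowledged that request */
    int  p1_blocked;          /* models lmem[blocked] on P1 */
};

/* One iteration of P1's wait loop. The request is serviced only when the
   call made inside the loop drives communication progress. */
static void p1_wait_step(struct model *m, bool call_drives_progress)
{
    assert(m != 0);
    if (call_drives_progress && m->p0_request_pending) {
        m->p0_request_pending = false;
        m->p0_has_ack = true;
    }
}

/* One step of P0's release: it can clear P1's flag only after the ACK. */
static void p0_release_step(struct model *m)
{
    assert(m != 0);
    if (m->p0_has_ack)
        m->p1_blocked = 0;
}

/* Returns the iteration at which P1 leaves its loop, or -1 if it is
   still blocked after max_steps iterations (the modeled hang). */
static int simulate(bool call_drives_progress, int max_steps)
{
    struct model m = { true, false, 1 };
    for (int step = 0; step < max_steps; step++) {
        p1_wait_step(&m, call_drives_progress);
        p0_release_step(&m);
        if (m.p1_blocked != 1)
            return step;
    }
    return -1;
}
```

In this model, `simulate(false, n)` corresponds to a wait loop built on a call that is not required to make progress, and `simulate(true, n)` corresponds to either of the two remedies named above: heavier synchronization calls in the loop, or asynchronous progress.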
>>>>> Halim
>>>>> www.mcs.anl.gov/~aamer <http://www.mcs.anl.gov/%7Eaamer>
>>>>>
>>>>>
>>>>> On 3/6/17 1:11 PM, Ask Jakobsen wrote:
>>>>>
>>>>>> I am testing on an x86_64 platform.
>>>>>>
>>>>>> I have tried to build both MPICH and the MCS lock code with -O0 to
>>>>>> avoid aggressive optimization. After your suggestion I have also tried
>>>>>> making a volatile int *pblocked point to lmem[blocked] in the
>>>>>> MCSLockAcquire function and a volatile int *pnextrank point to
>>>>>> lmem[nextRank] in MCSLockRelease, but it does not appear to make a
>>>>>> difference.
>>>>>>
>>>>>> On a suggestion from Richard Warren I have also tried building the
>>>>>> code using openmpi-2.0.2 without any luck (however, it appears to
>>>>>> acquire the lock a couple of extra times before failing), which I find
>>>>>> troubling.
>>>>>>
>>>>>> I think I will give up using local loads/stores and will see if I can
>>>>>> figure out how to rewrite it using MPI calls like MPI_Fetch_and_op, as
>>>>>> you suggest. Thanks for your help.
>>>>>>
>>>>>> On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <jeff.science at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What processor architecture are you testing?
>>>>>>>
>>>>>>> Maybe set lmem to volatile or read it with MPI_Fetch_and_op rather
>>>>>>> than a load.  MPI_Win_sync cannot prevent the compiler from caching
>>>>>>> *lmem in a register.
>>>>>>>
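The volatile suggestion can be sketched in plain C. The helpers below are hypothetical (`load_fresh` and `spin_while_set` are not part of mcs-lock.c); they only show how a volatile-qualified access makes the compiler emit a real load on every iteration instead of reusing a value held in a register. This addresses compiler caching only and does not by itself cause a remote update to arrive.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical helper: read one int through a volatile-qualified lvalue
   so the compiler must reload it from memory on every call. */
static int load_fresh(const int *addr)
{
    assert(addr != NULL);
    return *(const volatile int *)addr;
}

/* Spin while the flag at addr equals 1, giving up after max_spins reads.
   Returns the number of failed reads before the flag changed, or -1 if
   it never changed. In the real lock, MPI_Win_sync(win) would be called
   inside this loop. */
static int spin_while_set(const int *addr, int max_spins)
{
    for (int i = 0; i < max_spins; i++) {
        if (load_fresh(addr) != 1)
            return i;
    }
    return -1;
}
```

The bounded loop is a choice made for this sketch so that it always terminates; the loops in the lock code discussed here are unbounded.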
>>>>>>> Jeff
>>>>>>>
>>>>>>> On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <afj at qeye-labs.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have downloaded the source code for the MCS lock from the
>>>>>>>> excellent book "Using Advanced MPI" from
>>>>>>>> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c
>>>>>>>>
>>>>>>>> I have written a very simple piece of test code for the MCS lock,
>>>>>>>> but it works only at random and often never escapes the busy loops
>>>>>>>> in the acquire and release functions (see attached source code).
>>>>>>>> The code appears semantically correct to my eyes.
>>>>>>>>
>>>>>>>> #include <stdio.h>
>>>>>>>> #include <mpi.h>
>>>>>>>> #include "mcs-lock.h"
>>>>>>>>
>>>>>>>> int main(int argc, char *argv[])
>>>>>>>> {
>>>>>>>>   MPI_Win win;
>>>>>>>>   MPI_Init( &argc, &argv );
>>>>>>>>
>>>>>>>>   MCSLockInit(MPI_COMM_WORLD, &win);
>>>>>>>>
>>>>>>>>   int rank, size;
>>>>>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>>>>
>>>>>>>>   printf("rank: %d, size: %d\n", rank, size);
>>>>>>>>
>>>>>>>>
>>>>>>>>   MCSLockAcquire(win);
>>>>>>>>   printf("rank %d acquired lock\n", rank);   fflush(stdout);
>>>>>>>>   MCSLockRelease(win);
>>>>>>>>
>>>>>>>>
>>>>>>>>   MPI_Win_free(&win);
>>>>>>>>   MPI_Finalize();
>>>>>>>>   return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> I have tested on several hardware platforms with mpich-3.2 and
>>>>>>>> mpich-3.3a2, but with no luck.
>>>>>>>>
>>>>>>>> It appears that the MPI_Win_sync calls are not "refreshing" the
>>>>>>>> local data, or I have a bug I can't spot.
>>>>>>>>
>>>>>>>> A simple unfair lock like
>>>>>>>> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c
>>>>>>>> works perfectly.
>>>>>>>>
>>>>>>>> Best regards, Ask Jakobsen
>>>>>>>>
>>>>>>>>
>>>>>>>> _______________________________________________
>>>>>>>> discuss mailing list     discuss at mpich.org
>>>>>>>> To manage subscription options or unsubscribe:
>>>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>>>>>
>>>>>>>>
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Hammond
>>>>>>> jeff.science at gmail.com
>>>>>>> http://jeffhammond.github.io/
>>>>>>>



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/