[mpich-discuss] MCS lock and MPI RMA problem
Halim Amer
aamer at anl.gov
Wed Mar 8 09:21:43 CST 2017
I cannot claim that I have thoroughly verified the correctness of that
code, so take it with a grain of salt. Please keep in mind that it is test
code from a tutorial book; such codes are meant for learning purposes, not
for deployment.

If your goal is a high-performance RMA lock, I suggest you look into the
recent HPDC'16 paper "High-Performance Distributed RMA Locks".
Halim
www.mcs.anl.gov/~aamer
On 3/8/17 3:06 AM, Ask Jakobsen wrote:
> You are absolutely correct, Halim. Removing the test lmem[nextRank] == -1
> in the release fixes the problem. Great work. Now I will try to understand
> why you are right. I hope the authors of the book will credit you for
> discovering the bug.
>
> So, in conclusion, you need to remove the above-mentioned test AND enable
> asynchronous progress using the environment variable
> MPIR_CVAR_ASYNC_PROGRESS=1 in MPICH. (BTW, I still can't get the code to
> work in Open MPI.)
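>
> For reference, here is a sketch of the fixed release protocol, based on
> the book's mcs-lock.c (identifiers and setup follow that code; this is
> untested, so treat it as an illustration rather than a verified fix):
>
>     void MCSLockRelease(MPI_Win win)
>     {
>         int nullrank = -1, zero = 0, myrank, curtail, flag, *lmem;
>         void *attrval;
>
>         MPI_Comm_rank(MPI_COMM_WORLD, &myrank);
>         MPI_Win_get_attr(win, MPI_WIN_BASE, &attrval, &flag);
>         lmem = attrval;
>
>         MPI_Win_lock_all(0, win);
>         /* The old guard if (lmem[nextRank] == -1) is removed: that
>          * plain load could miss a successor that had already enqueued
>          * itself. Instead, always ask the tail who is last in line. */
>         MPI_Compare_and_swap(&nullrank, &myrank, &curtail, MPI_INT,
>                              0, lockTail, win);
>         MPI_Win_flush(0, win);
>         if (curtail == myrank) {
>             /* We were still the tail: nobody is waiting. */
>             MPI_Win_unlock_all(win);
>             return;
>         }
>         /* A successor exists; wait until it publishes its rank. */
>         do {
>             MPI_Win_sync(win);
>         } while (lmem[nextRank] == -1);
>         /* Hand over the lock by clearing the successor's blocked flag. */
>         MPI_Accumulate(&zero, 1, MPI_INT, lmem[nextRank], blocked,
>                        1, MPI_INT, MPI_REPLACE, win);
>         MPI_Win_unlock_all(win);
>     }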
>
> On Tue, Mar 7, 2017 at 5:37 PM, Halim Amer <aamer at anl.gov> wrote:
>
>>> detect that another process is being or already enqueued in the MCS
>>> queue.
>>
>> Actually, the problem occurs only when the waiting process has already
>> enqueued itself, i.e., the accumulate operation on the nextRank field
>> succeeded.
>>
>> Halim
>> www.mcs.anl.gov/~aamer
>>
>>
>> On 3/7/17 10:29 AM, Halim Amer wrote:
>>
>>> In the Release protocol, try removing this test:
>>>
>>> if (lmem[nextRank] == -1) {
>>>     If-Block;
>>> }
>>>
>>> but keep the If-Block.
>>>
>>> The hang occurs because the process releasing the MCS lock fails to
>>> detect that another process is being or already enqueued in the MCS queue.
>>>
>>> Halim
>>> www.mcs.anl.gov/~aamer
>>>
>>> On 3/7/17 6:43 AM, Ask Jakobsen wrote:
>>>
>>>> Thanks, Halim. I have now enabled asynchronous progress in MPICH (I
>>>> can't find anything similar in openmpi), and now all ranks acquire the
>>>> lock and the program finishes as expected. However, if I put a while(1)
>>>> loop around the acquire-release code in main.c, it fails again at random
>>>> and goes into an infinite loop. The simple unfair lock does not have
>>>> this problem.
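>>>>
>>>> The failing stress variant is simply the acquire-release pair from
>>>> main.c wrapped in a loop, roughly:
>>>>
>>>>     while (1) {
>>>>         MCSLockAcquire(win);
>>>>         printf("rank %d acquired lock\n", rank); fflush(stdout);
>>>>         MCSLockRelease(win);
>>>>     }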
>>>>
>>>> On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <aamer at anl.gov> wrote:
>>>>
>>>>> My understanding is that this code assumes asynchronous progress.
>>>>> An example of when the processes hang is as follows:
>>>>>
>>>>> 1) P0 finishes MCSLockAcquire()
>>>>> 2) P1 is busy waiting in MCSLockAcquire() at
>>>>>        do {
>>>>>            MPI_Win_sync(win);
>>>>>        } while (lmem[blocked] == 1);
>>>>> 3) P0 executes MCSLockRelease()
>>>>> 4) P0 waits on MPI_Win_lock_all() inside MCSLockRelease()
>>>>>
>>>>> Hang!
>>>>>
>>>>> For P1 to get out of the loop, P0 has to get out of MPI_Win_lock_all()
>>>>> and execute its Compare_and_swap().
>>>>>
>>>>> For P0 to get out of MPI_Win_lock_all(), it needs an ACK from P1 that
>>>>> it got the lock.
>>>>>
>>>>> P1 does not make communication progress because MPI_Win_sync is not
>>>>> required to do so. It only synchronizes private and public copies.
>>>>>
>>>>> For this hang to disappear, one can either trigger progress manually by
>>>>> using heavy-duty synchronization calls instead of Win_sync (e.g.,
>>>>> Win_unlock_all + Win_lock_all), or enable asynchronous progress.
>>>>>
>>>>> To enable asynchronous progress in MPICH, set the
>>>>> MPIR_CVAR_ASYNC_PROGRESS env var to 1.
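>>>>>
>>>>> As a sketch (untested), the Win_sync spin in MCSLockAcquire() would
>>>>> become:
>>>>>
>>>>>     do {
>>>>>         /* Unlike Win_sync, unlock_all + lock_all is guaranteed to
>>>>>          * make communication progress while we wait. */
>>>>>         MPI_Win_unlock_all(win);
>>>>>         MPI_Win_lock_all(0, win);
>>>>>     } while (lmem[blocked] == 1);
>>>>>
>>>>> Alternatively, launch with the variable set, e.g.
>>>>> MPIR_CVAR_ASYNC_PROGRESS=1 mpiexec -n 2 ./a.out (binary name arbitrary).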
>>>>>
>>>>> Halim
>>>>> www.mcs.anl.gov/~aamer
>>>>>
>>>>>
>>>>> On 3/6/17 1:11 PM, Ask Jakobsen wrote:
>>>>>
>>>>>> I am testing on an x86_64 platform.
>>>>>>
>>>>>> I have tried to build both mpich and the MCS lock code with -O0 to
>>>>>> avoid aggressive optimization. Following your suggestion, I have also
>>>>>> tried making a volatile int *pblocked point to lmem[blocked] in the
>>>>>> MCSLockAcquire function and a volatile int *pnextrank point to
>>>>>> lmem[nextRank] in MCSLockRelease, but it does not appear to make a
>>>>>> difference.
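>>>>>>
>>>>>> (For reference, the volatile experiment in MCSLockAcquire was along
>>>>>> these lines:
>>>>>>
>>>>>>     volatile int *pblocked = &lmem[blocked];
>>>>>>     do {
>>>>>>         MPI_Win_sync(win);   /* synchronize public/private copies */
>>>>>>     } while (*pblocked == 1);
>>>>>>
>>>>>> with the analogous pnextrank loop in MCSLockRelease.)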
>>>>>>
>>>>>> At Richard Warren's suggestion, I have also tried building the code
>>>>>> using openmpi-2.0.2, without any luck (though it appears to acquire the
>>>>>> lock a couple of extra times before failing), which I find troubling.
>>>>>>
>>>>>> I think I will give up on using local loads/stores and will see if I
>>>>>> can figure out how to rewrite it using MPI calls like MPI_Fetch_and_op,
>>>>>> as you suggest. Thanks for your help.
>>>>>>
>>>>>> On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <jeff.science at gmail.com>
>>>>>> wrote:
>>>>>>
>>>>>>> What processor architecture are you testing?
>>>>>>>
>>>>>>> Maybe set lmem to volatile or read it with MPI_Fetch_and_op rather
>>>>>>> than a load. MPI_Win_sync cannot prevent the compiler from caching
>>>>>>> *lmem in a register.
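>>>>>>>
>>>>>>> Something like the following sketch (untested; MPI_NO_OP turns
>>>>>>> MPI_Fetch_and_op into a pure atomic read):
>>>>>>>
>>>>>>>     int flagval;
>>>>>>>     do {
>>>>>>>         /* Atomic RMA read of our own blocked flag; unlike a
>>>>>>>          * plain load, this cannot be cached in a register. */
>>>>>>>         MPI_Fetch_and_op(NULL, &flagval, MPI_INT, myrank,
>>>>>>>                          blocked, MPI_NO_OP, win);
>>>>>>>         MPI_Win_flush(myrank, win);
>>>>>>>     } while (flagval == 1);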
>>>>>>>
>>>>>>> Jeff
>>>>>>>
>>>>>>> On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <afj at qeye-labs.com>
>>>>>>> wrote:
>>>>>>>
>>>>>>>> Hi,
>>>>>>>>
>>>>>>>> I have downloaded the source code for the MCS lock from the excellent
>>>>>>>> book "Using Advanced MPI":
>>>>>>>> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c
>>>>>>>>
>>>>>>>> I have made a very simple piece of test code for testing the MCS lock,
>>>>>>>> but it works only sporadically and often never escapes the busy loops
>>>>>>>> in the acquire and release functions (see attached source code). The
>>>>>>>> code appears semantically correct to my eyes.
>>>>>>>>
>>>>>>>> #include <stdio.h>
>>>>>>>> #include <mpi.h>
>>>>>>>> #include "mcs-lock.h"
>>>>>>>>
>>>>>>>> int main(int argc, char *argv[])
>>>>>>>> {
>>>>>>>>     MPI_Win win;
>>>>>>>>     MPI_Init(&argc, &argv);
>>>>>>>>
>>>>>>>>     MCSLockInit(MPI_COMM_WORLD, &win);
>>>>>>>>
>>>>>>>>     int rank, size;
>>>>>>>>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>>>>
>>>>>>>>     printf("rank: %d, size: %d\n", rank, size);
>>>>>>>>
>>>>>>>>     MCSLockAcquire(win);
>>>>>>>>     printf("rank %d acquired lock\n", rank); fflush(stdout);
>>>>>>>>     MCSLockRelease(win);
>>>>>>>>
>>>>>>>>     MPI_Win_free(&win);
>>>>>>>>     MPI_Finalize();
>>>>>>>>     return 0;
>>>>>>>> }
>>>>>>>>
>>>>>>>>
>>>>>>>> I have tested on several hardware platforms with mpich-3.2 and
>>>>>>>> mpich-3.3a2, but with no luck.
>>>>>>>>
>>>>>>>> It appears that MPI_Win_sync is not "refreshing" the local data, or I
>>>>>>>> have a bug I can't spot.
>>>>>>>>
>>>>>>>> A simple unfair lock like
>>>>>>>> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c
>>>>>>>> works perfectly.
>>>>>>>>
>>>>>>>> Best regards, Ask Jakobsen
>>>>>>>>
>>>>>>>
>>>>>>> --
>>>>>>> Jeff Hammond
>>>>>>> jeff.science at gmail.com
>>>>>>> http://jeffhammond.github.io/
>>>>>>>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss