[mpich-discuss] MCS lock and MPI RMA problem

Halim Amer aamer at anl.gov
Tue Mar 7 10:37:03 CST 2017


> detect that another process is being or already enqueued in the MCS queue.

Actually, the problem occurs only when the waiting process has already
enqueued itself, i.e., the accumulate operation on the nextRank field
succeeded.

Halim
www.mcs.anl.gov/~aamer

On 3/7/17 10:29 AM, Halim Amer wrote:
> In the Release protocol, try removing this test:
>
> if (lmem[nextRank] == -1) {
>    If-Block;
> }
>
> but keep the If-Block.
>
> The hang occurs because the process releasing the MCS lock fails to
> detect that another process is being or already enqueued in the MCS queue.
>
> Halim
> www.mcs.anl.gov/~aamer
>
> On 3/7/17 6:43 AM, Ask Jakobsen wrote:
>> Thanks, Halim. I have now enabled asynchronous progress in MPICH (I can't
>> find anything similar in Open MPI), and now all ranks acquire the lock and
>> the program finishes as expected. However, if I put a while(1) loop around
>> the acquire-release code in main.c, it fails again at random and goes into
>> an infinite loop. The simple unfair lock does not have this problem.
>>
>> On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <aamer at anl.gov> wrote:
>>
>>> My understanding is that this code assumes asynchronous progress.
>>> An example of when the processes hang is as follows:
>>>
>>> 1) P0 finishes MCSLockAcquire()
>>> 2) P1 is busy waiting in MCSLockAcquire() at
>>>    do {
>>>       MPI_Win_sync(win);
>>>    } while (lmem[blocked] == 1);
>>> 3) P0 executes MCSLockRelease()
>>> 4) P0 waits on MPI_Win_lock_all() inside MCSLockRelease()
>>>
>>> Hang!
>>>
>>> For P1 to get out of the loop, P0 has to get out of MPI_Win_lock_all()
>>> and execute its Compare_and_swap().
>>>
>>> For P0 to get out of MPI_Win_lock_all(), it needs an ACK from P1 that it
>>> got the lock.
>>>
>>> P1 does not make communication progress because MPI_Win_sync is not
>>> required to do so. It only synchronizes private and public copies.
>>>
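The circular wait described above can be made concrete with a toy model. This is only a sketch: it uses no MPI at all, and toy_state, p1_spin_once, p0_try_release, and simulate_wait are hypothetical names introduced for illustration. It assumes, as in the description, that P0 cannot leave MPI_Win_lock_all() until P1 answers, and that a loop iteration consisting only of MPI_Win_sync() never answers.

```c
#include <stdbool.h>

/* Toy state for the two-process scenario; all names are hypothetical. */
typedef struct {
    bool p0_has_ack;  /* P0 has received P1's ACK for MPI_Win_lock_all() */
    int  p1_blocked;  /* models lmem[blocked] on P1 (1 = still waiting)  */
} toy_state;

/* One iteration of P1's busy-wait loop. When p1_makes_progress is false
 * the iteration models MPI_Win_sync(): memory is synchronized, but the
 * pending request from P0 is never answered. */
static void p1_spin_once(toy_state *s, bool p1_makes_progress)
{
    if (p1_makes_progress)
        s->p0_has_ack = true;
}

/* P0 can leave MPI_Win_lock_all() and clear P1's flag only after the ACK. */
static void p0_try_release(toy_state *s)
{
    if (s->p0_has_ack)
        s->p1_blocked = 0;
}

/* Returns the number of loop iterations P1 needed to get out, or -1 if it
 * is still blocked after max_iters iterations (the "Hang!" case). */
int simulate_wait(bool p1_makes_progress, int max_iters)
{
    toy_state s = { false, 1 };
    for (int i = 1; i <= max_iters; i++) {
        p1_spin_once(&s, p1_makes_progress);
        p0_try_release(&s);
        if (s.p1_blocked == 0)
            return i;
    }
    return -1;
}
```

Under these assumptions simulate_wait(false, 1000) returns -1 (the hang), while simulate_wait(true, 1000) returns 1, because P1 answers P0 on its first iteration.
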
>>> For this hang to disappear, one can either trigger progress manually by
>>> using heavy-duty synchronization calls instead of Win_sync (e.g.,
>>> Win_unlock_all + Win_lock_all), or enable asynchronous progress.
>>>
>>> To enable asynchronous progress in MPICH, set the
>>> MPIR_CVAR_ASYNC_PROGRESS environment variable to 1.
>>>
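A program can also check at startup whether the variable is set, for example to print a warning before entering the lock code. The sketch below is only an illustration: cvar_is_one and async_progress_requested are hypothetical helpers, and only the literal value 1 mentioned above is recognized.

```c
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper: true only for the exact string "1". */
int cvar_is_one(const char *value)
{
    return value != NULL && strcmp(value, "1") == 0;
}

/* Hypothetical helper: reports whether MPIR_CVAR_ASYNC_PROGRESS is set
 * to 1 in this process's environment. It inspects the environment only
 * and does not query the MPI library. */
int async_progress_requested(void)
{
    return cvar_is_one(getenv("MPIR_CVAR_ASYNC_PROGRESS"));
}
```

A positive result says only that the variable was requested in the environment, not that the MPI library actually started its progress thread.
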
>>> Halim
>>> www.mcs.anl.gov/~aamer
>>>
>>>
>>> On 3/6/17 1:11 PM, Ask Jakobsen wrote:
>>>
>>>> I am testing on an x86_64 platform.
>>>>
>>>> I have tried to build both MPICH and the MCS lock code with -O0 to
>>>> avoid aggressive optimization. After your suggestion I have also tried
>>>> making volatile int *pblocked point to lmem[blocked] in the
>>>> MCSLockAcquire function and volatile int *pnextrank point to
>>>> lmem[nextRank] in MCSLockRelease, but it does not appear to make a
>>>> difference.
>>>>
>>>> On a suggestion from Richard Warren I have also tried building the code
>>>> using openmpi-2.0.2, without any luck (although it appears to acquire
>>>> the lock a couple of extra times before failing), which I find troubling.
>>>>
>>>> I think I will give up on using local loads/stores and see if I can
>>>> figure out how to rewrite the code using MPI calls like MPI_Fetch_and_op,
>>>> as you suggest. Thanks for your help.
>>>>
>>>> On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <jeff.science at gmail.com>
>>>> wrote:
>>>>
>>>>> What processor architecture are you testing on?
>>>>>
>>>>> Maybe set lmem to volatile or read it with MPI_Fetch_and_op rather than
>>>>> a load. MPI_Win_sync cannot prevent the compiler from caching *lmem in
>>>>> a register.
>>>>>
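As a rough illustration of the volatile suggestion, the sketch below reads a flag through a volatile-qualified pointer so the compiler has to reload it on every iteration. The helper names load_fresh and wait_while_equal and the max_spins bound are assumptions made for this example; the MPI_Win_sync(win) call that the real loop would contain is only indicated in a comment so the snippet compiles without MPI.

```c
/* Hypothetical helper: load base[index] through a volatile-qualified
 * pointer, which forbids the compiler from reusing a value it cached in
 * a register on an earlier iteration. */
int load_fresh(const int *base, int index)
{
    const volatile int *p = base + index;
    return *p;
}

/* Spin while base[index] == busy_value, giving up after max_spins loads.
 * Returns 1 if the value changed, 0 if the bound was reached. */
int wait_while_equal(const int *base, int index, int busy_value,
                     long max_spins)
{
    for (long i = 0; i < max_spins; i++) {
        /* In the MPI code, MPI_Win_sync(win) would be called here. */
        if (load_fresh(base, index) != busy_value)
            return 1;
    }
    return 0;
}
```

A volatile read only rules out register caching; it does not by itself make the MPI library progress communication.
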
>>>>> Jeff
>>>>>
>>>>> On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <afj at qeye-labs.com>
>>>>> wrote:
>>>>>
>>>>>> Hi,
>>>>>>
>>>>>> I have downloaded the source code for the MCS lock from the excellent
>>>>>> book "Using Advanced MPI" from
>>>>>> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c
>>>>>>
>>>>>> I have written a very simple piece of test code for the MCS lock, but
>>>>>> it works only at random and often never escapes the busy loops in the
>>>>>> acquire and release functions (see attached source code). The code
>>>>>> appears semantically correct to my eyes.
>>>>>>
>>>>>> #include <stdio.h>
>>>>>> #include <mpi.h>
>>>>>> #include "mcs-lock.h"
>>>>>>
>>>>>> int main(int argc, char *argv[])
>>>>>> {
>>>>>>   MPI_Win win;
>>>>>>   MPI_Init( &argc, &argv );
>>>>>>
>>>>>>   MCSLockInit(MPI_COMM_WORLD, &win);
>>>>>>
>>>>>>   int rank, size;
>>>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>>
>>>>>>   printf("rank: %d, size: %d\n", rank, size);
>>>>>>
>>>>>>
>>>>>>   MCSLockAcquire(win);
>>>>>>   printf("rank %d acquired lock\n", rank);   fflush(stdout);
>>>>>>   MCSLockRelease(win);
>>>>>>
>>>>>>
>>>>>>   MPI_Win_free(&win);
>>>>>>   MPI_Finalize();
>>>>>>   return 0;
>>>>>> }
>>>>>>
>>>>>>
>>>>>> I have tested on several hardware platforms with mpich-3.2 and
>>>>>> mpich-3.3a2, but with no luck.
>>>>>>
>>>>>> It appears that the MPI_Win_sync calls are not "refreshing" the local
>>>>>> data, or I have a bug I can't spot.
>>>>>>
>>>>>> A simple unfair lock like
>>>>>> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c
>>>>>> works perfectly.
>>>>>>
>>>>>> Best regards, Ask Jakobsen
>>>>>>
>>>>>>
>>>>>> _______________________________________________
>>>>>> discuss mailing list     discuss at mpich.org
>>>>>> To manage subscription options or unsubscribe:
>>>>>> https://lists.mpich.org/mailman/listinfo/discuss
>>>>>>
>>>>>>
>>>>>
>>>>>
>>>>> --
>>>>> Jeff Hammond
>>>>> jeff.science at gmail.com
>>>>> http://jeffhammond.github.io/

