Interestingly, according to the paper you suggested, it appears to include a
similar test in pseudo code: https://htor.inf.ethz.ch/publications/img/hpclocks.pdf
(see Listing 3 in the paper).

Unfortunately, removing the test in the release protocol did not solve the
problem (see the sketch just below for exactly what I removed). The race
condition is much harder to provoke, but I managed to trigger it when setting
the size of the communicator to 3 (I had only tested even sizes so far).

Following Jeff's suggestion, I have attempted to rewrite the code so that it
avoids local loads and stores in the MPI_Win_lock_all epochs and uses
MPI_Fetch_and_op instead (see attached files).
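
To be concrete about the first change: this is roughly what MCSLockRelease
looks like with that test removed (my sketch, using the names from the book's
mcs-lock.c linked further down in the thread, so the details may be slightly
off; the guard is gone but its body still runs):

void MCSLockRelease(MPI_Win win)
{
  int nullrank = -1, zero = 0, myrank, curtail, flag, *lmem;
  void *attrval;

  MPI_Win_get_attr(win, MCS_LOCKRANK, &attrval, &flag);
  myrank = (int)(MPI_Aint)attrval;
  MPI_Win_get_attr(win, MPI_WIN_BASE, &lmem, &flag);
  MPI_Win_lock_all(0, win);

  /* Formerly guarded by "if (lmem[nextRank] == -1)"; now always runs. */
  MPI_Compare_and_swap(&nullrank, &myrank, &curtail, MPI_INT,
                       0, lockTail, win);
  if (curtail == myrank) {
    /* We were the only process in the queue. */
    MPI_Win_unlock_all(win);
    return;
  }
  /* Someone else is enqueueing (or already enqueued); wait until our
     nextRank field has been filled in. */
  do {
    MPI_Win_sync(win);
  } while (lmem[nextRank] == -1);

  /* Notify the successor with an atomic update of its "blocked" flag. */
  MPI_Accumulate(&zero, 1, MPI_INT, lmem[nextRank], blocked,
                 1, MPI_INT, MPI_REPLACE, win);
  MPI_Win_unlock_all(win);
}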

This version behaves very similarly to the original code and also fails from
time to time. Putting a sleep into the acquire busy loop (usleep(100)) makes
the code "much more robust" (a hack, I know, but it does point to some
underlying race condition?!). Let me know if you see any problems in the way
I am using MPI_Fetch_and_op in a busy loop. Flushing or syncing is not
necessary in this case, right?
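
For concreteness, the acquire spin in the attached mcs-lock-fop.c is
essentially the fragment below. The MPI_Win_flush after the fetch is not in
the attached file; it is my own guess at what might be missing, on the
assumption that the fetched value is only guaranteed to be in fetch_blocked
once a flush (or unlock) has completed the operation. That is exactly the
point I would like confirmed or refuted:

  /* Spin on our own "blocked" flag by atomically reading it back with
     MPI_Fetch_and_op + MPI_NO_OP inside the MPI_Win_lock_all epoch. */
  do {
    MPI_Fetch_and_op(&dummy, &fetch_blocked, MPI_INT,
                     myrank, blocked, MPI_NO_OP, win);
    MPI_Win_flush(myrank, win); /* my addition: complete the fetch before
                                   reading fetch_blocked -- needed? */
    usleep(100);                /* the "robustness" hack */
  } while (fetch_blocked == 1);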

All tests are run with export MPIR_CVAR_ASYNC_PROGRESS=1, on both mpich-3.2
and mpich-3.3a2.
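
For reference, the heavier alternative Halim mentions further down in the
thread (Win_unlock_all + Win_lock_all instead of Win_sync) would look roughly
like this in the original load/store version of the acquire spin (a sketch
only, untested):

  /* Close and reopen the passive-target epoch on every iteration so the
     waiting process itself drives communication progress. */
  do {
    MPI_Win_unlock_all(win);
    MPI_Win_lock_all(0, win);
  } while (lmem[blocked] == 1);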

On Wed, Mar 8, 2017 at 4:21 PM, Halim Amer <aamer at anl.gov> wrote:

> I cannot claim that I thoroughly verified the correctness of that code, so
> take it with a grain of salt. Please keep in mind that it is test code
> from a tutorial book; those codes are meant for learning purposes, not
> for deployment.
>
> If your goal is to have a high-performance RMA lock, I suggest you look
> into the recent HPDC'16 paper "High-Performance Distributed RMA Locks".
>
> Halim
> www.mcs.anl.gov/~aamer
>
> On 3/8/17 3:06 AM, Ask Jakobsen wrote:
>
>> You are absolutely correct, Halim. Removing the test lmem[nextRank] == -1
>> in release fixes the problem. Great work. Now I will try to understand why
>> you are right. I hope the authors of the book will credit you for
>> discovering the bug.
>>
>> So, in conclusion, you need to remove the above-mentioned test AND enable
>> asynchronous progress using the environment variable
>> MPIR_CVAR_ASYNC_PROGRESS=1 in MPICH (BTW, I still can't get the code to
>> work in openmpi).
>>
>> On Tue, Mar 7, 2017 at 5:37 PM, Halim Amer <aamer at anl.gov> wrote:
>>
>>>> detect that another process is being enqueued, or has already been
>>>> enqueued, in the MCS queue.
>>>
>>> Actually, the problem occurs only when the waiting process has already
>>> enqueued itself, i.e., when the accumulate operation on the nextRank
>>> field has succeeded.
>>>
>>> Halim
>>> www.mcs.anl.gov/~aamer
>>>
>>>
>>> On 3/7/17 10:29 AM, Halim Amer wrote:
>>>
>>>> In the Release protocol, try removing this test:
>>>>
>>>> if (lmem[nextRank] == -1) {
>>>>    If-Block;
>>>> }
>>>>
>>>> but keep the If-Block.
>>>>
>>>> The hang occurs because the process releasing the MCS lock fails to
>>>> detect that another process is being enqueued, or has already been
>>>> enqueued, in the MCS queue.
>>>>
>>>> Halim
>>>> www.mcs.anl.gov/~aamer
>>>>
>>>>
>>>> On 3/7/17 6:43 AM, Ask Jakobsen wrote:
>>>>
>>>>> Thanks, Halim. I have now enabled asynchronous progress in MPICH (I
>>>>> can't find anything similar in openmpi), and now all ranks acquire the
>>>>> lock and the program finishes as expected. However, if I put a while(1)
>>>>> loop around the acquire-release code in main.c, it will fail again at
>>>>> random and go into an infinite loop. The simple unfair lock does not
>>>>> have this problem.
>>>>>
>>>>> On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <aamer at anl.gov> wrote:
>>>>>
>>>>>> My understanding is that this code assumes asynchronous progress.
>>>>>
>>>>>> An example of when the processes hang is as follows:
>>>>>>
>>>>>> 1) P0 Finishes MCSLockAcquire()
>>>>>> 2) P1 is busy waiting in MCSLockAcquire() at
>>>>>> do {
>>>>>>       MPI_Win_sync(win);
>>>>>>    } while (lmem[blocked] == 1);
>>>>>> 3) P0 executes MCSLockRelease()
>>>>>> 4) P0 waits on MPI_Win_lock_all() inside MCSLockRelease()
>>>>>>
>>>>>> Hang!
>>>>>>
>>>>>> For P1 to get out of the loop, P0 has to get out of
>>>>>> MPI_Win_lock_all() and execute its Compare_and_swap().
>>>>>>
>>>>>> For P0 to get out of MPI_Win_lock_all(), it needs an ACK from P1
>>>>>> that it got the lock.
>>>>>>
>>>>>> P1 does not make communication progress because MPI_Win_sync is not
>>>>>> required to do so. It only synchronizes private and public copies.
>>>>>>
>>>>>> For this hang to disappear, one can either trigger progress manually
>>>>>> by
>>>>>> using heavy-duty synchronization calls instead of Win_sync (e.g.,
>>>>>> Win_unlock_all + Win_lock_all), or enable asynchronous progress.
>>>>>>
>>>>>> To enable asynchronous progress in MPICH, set the
>>>>>> MPIR_CVAR_ASYNC_PROGRESS
>>>>>> env var to 1.
>>>>>>
>>>>>> Halim
>>>>>> www.mcs.anl.gov/~aamer
>>>>>>
>>>>>>
>>>>>> On 3/6/17 1:11 PM, Ask Jakobsen wrote:
>>>>>>
>>>>>>> I am testing on an x86_64 platform.
>>>>>>>
>>>>>>> I have tried to build both mpich and the MCS lock code with -O0 to
>>>>>>> avoid aggressive optimization. After your suggestion I have also
>>>>>>> tried making a volatile int *pblocked point to lmem[blocked] in the
>>>>>>> MCSLockAcquire function and a volatile int *pnextrank point to
>>>>>>> lmem[nextRank] in MCSLockRelease, but it does not appear to make a
>>>>>>> difference.
>>>>>>>
>>>>>>> At Richard Warren's suggestion I have also tried building the code
>>>>>>> with openmpi-2.0.2, without any luck (although it appears to acquire
>>>>>>> the lock a couple of extra times before failing), which I find
>>>>>>> troubling.
>>>>>>>
>>>>>>> I think I will give up on local loads/stores and see if I can rewrite
>>>>>>> the code using MPI calls like MPI_Fetch_and_op, as you suggest.
>>>>>>> Thanks for your help.
>>>>>>>
>>>>>>> On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>>>>>>
>>>>>>>> What processor architecture are you testing?
>>>>>>>
>>>>>>>
>>>>>>>> Maybe set lmem to volatile or read it with MPI_Fetch_and_op rather
>>>>>>>> than a
>>>>>>>> load.  MPI_Win_sync cannot prevent the compiler from caching *lmem
>>>>>>>> in a
>>>>>>>> register.
>>>>>>>>
>>>>>>>> Jeff
>>>>>>>>
>>>>>>>> On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <afj at qeye-labs.com>
>>>>>>>> wrote:
>>>>>>>>
>>>>>>>>> Hi,
>>>>>>>>>
>>>>>>>>> I have downloaded the source code for the MCS lock from the
>>>>>>>>> excellent book "Using Advanced MPI":
>>>>>>>>> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c
>>>>>>>>>
>>>>>>>>> I have made a very simple piece of test code for testing the MCS
>>>>>>>>> lock, but it only succeeds at random and often never escapes the
>>>>>>>>> busy loops in the acquire and release functions (see attached
>>>>>>>>> source code). The code appears semantically correct to my eyes.
>>>>>>>>>
>>>>>>>>> #include <stdio.h>
>>>>>>>>> #include <mpi.h>
>>>>>>>>> #include "mcs-lock.h"
>>>>>>>>>
>>>>>>>>> int main(int argc, char *argv[])
>>>>>>>>> {
>>>>>>>>>   MPI_Win win;
>>>>>>>>>   MPI_Init( &argc, &argv );
>>>>>>>>>
>>>>>>>>>   MCSLockInit(MPI_COMM_WORLD, &win);
>>>>>>>>>
>>>>>>>>>   int rank, size;
>>>>>>>>>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>>>>>>>>>   MPI_Comm_size(MPI_COMM_WORLD, &size);
>>>>>>>>>
>>>>>>>>>   printf("rank: %d, size: %d\n", rank, size);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   MCSLockAcquire(win);
>>>>>>>>>   printf("rank %d acquired lock\n", rank);   fflush(stdout);
>>>>>>>>>   MCSLockRelease(win);
>>>>>>>>>
>>>>>>>>>
>>>>>>>>>   MPI_Win_free(&win);
>>>>>>>>>   MPI_Finalize();
>>>>>>>>>   return 0;
>>>>>>>>> }
>>>>>>>>>
>>>>>>>>>
>>>>>>>>> I have tested on several hardware platforms with mpich-3.2 and
>>>>>>>>> mpich-3.3a2, but with no luck.
>>>>>>>>>
>>>>>>>>> It appears that MPI_Win_sync is not "refreshing" the local data,
>>>>>>>>> or I have a bug I can't spot.
>>>>>>>>>
>>>>>>>>> A simple unfair lock like
>>>>>>>>> http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c
>>>>>>>>> works perfectly.
>>>>>>>>>
>>>>>>>>> Best regards, Ask Jakobsen
>>>>>>>>>
>>>>>>>>>
>>>>>>>> --
>>>>>>>> Jeff Hammond
>>>>>>>> jeff.science at gmail.com
>>>>>>>> http://jeffhammond.github.io/


Attachment: main.c

#include <stdio.h>
#include <mpi.h>
#include "mcs-lock.h"

int main(int argc, char *argv[])
{
  MPI_Win win;
  MPI_Init( &argc, &argv );

  MCSLockInit(MPI_COMM_WORLD, &win);

  int rank, size;
  MPI_Comm_rank(MPI_COMM_WORLD, &rank);
  MPI_Comm_size(MPI_COMM_WORLD, &size);

  printf("rank: %d, size: %d\n", rank, size);

  int count = 0;
  while(1) {
    MCSLockAcquire(win);
    //printf("rank %d aquired lock\n", rank); fflush(stdout);
    MCSLockRelease(win);
    if (count>1000)
      break;
    count++;
  }

  MPI_Win_free(&win);
  MPI_Finalize();
  return 0;
}

Attachment: mcs-lock-fop.c

#include <unistd.h>

#include "mpi.h"

static int MCS_LOCKRANK = MPI_KEYVAL_INVALID;
enum { nextRank=0, blocked=1, lockTail=2 };

void MCSLockInit(MPI_Comm comm, MPI_Win *win)
{
  int      *lmem, rank;
  MPI_Aint winsize;
  MPI_Comm_rank(comm,&rank);

  if (MCS_LOCKRANK == MPI_KEYVAL_INVALID)
    MPI_Win_create_keyval(MPI_WIN_NULL_COPY_FN,
                          MPI_WIN_NULL_DELETE_FN,
                          &MCS_LOCKRANK, (void*)0);

  winsize = 2 * sizeof(int);
  if (rank == 0) winsize += sizeof(int);
  MPI_Win_allocate(winsize, sizeof(int), MPI_INFO_NULL, comm,
                   &lmem, win);
  lmem[nextRank] = -1;
  lmem[blocked]  = 0;
  if (rank == 0) {
    lmem[lockTail] = -1;
  }
  MPI_Win_set_attr(*win, MCS_LOCKRANK, (void*)(MPI_Aint)rank);
  MPI_Barrier(comm);
}

void MCSLockAcquire(MPI_Win win)
{
  int  flag, myrank, predecessor, *lmem;
  void *attrval;
  int fetch_blocked, dummy;

  MPI_Win_get_attr(win, MCS_LOCKRANK, &attrval, &flag);
  myrank = (int)(MPI_Aint)attrval;
  MPI_Win_get_attr(win, MPI_WIN_BASE, &lmem, &flag);
  lmem[blocked] = 1; /* In case we are blocked */
  MPI_Win_lock_all(0, win);
  MPI_Fetch_and_op(&myrank, &predecessor, MPI_INT,
                   0, lockTail, MPI_REPLACE, win);
  MPI_Win_flush(0, win);
  if (predecessor != -1) {
    /* We didn't get the lock.  Add us to the tail of the list */
    MPI_Accumulate(&myrank, 1, MPI_INT, predecessor,
                   nextRank, 1, MPI_INT, MPI_REPLACE, win);
    /* Now spin on our local value "blocked" until we are
       given the lock */
    do {
      MPI_Fetch_and_op(&dummy, &fetch_blocked, MPI_INT,
                   myrank, blocked, MPI_NO_OP, win);
      usleep(100);
    } while (fetch_blocked==1);

  }
  // else we have the lock
  MPI_Win_unlock_all(win);
}
void MCSLockRelease(MPI_Win win)
{
  int nullrank = -1, zero=0, myrank, curtail, flag, *lmem;
  void *attrval;
  int fetch_nextrank, dummy;

  MPI_Win_get_attr(win, MCS_LOCKRANK, &attrval, &flag);
  myrank = (int)(MPI_Aint)attrval;
  MPI_Win_get_attr(win, MPI_WIN_BASE, &lmem, &flag);
  MPI_Win_lock_all(0, win);
  MPI_Fetch_and_op(&dummy, &fetch_nextrank, MPI_INT,
                   myrank, nextRank, MPI_NO_OP, win);

  if (fetch_nextrank == -1) {
    /* See if we're waiting for the next to notify us */
    MPI_Compare_and_swap(&nullrank, &myrank, &curtail, MPI_INT,
                         0, lockTail, win);
    if (curtail == myrank) {
      /* We are the only process in the list */
      MPI_Win_unlock_all(win);
      return;
    }
    /* Otherwise, someone else has added themselves to the list.*/
    do {
      MPI_Fetch_and_op(&dummy, &fetch_nextrank, MPI_INT,
                 myrank, nextRank, MPI_NO_OP, win);
    } while (fetch_nextrank==-1);

  }
  /* Now we can notify them.  Use accumulate with replace instead
     of put since we want an atomic update of the location */
  MPI_Accumulate(&zero, 1, MPI_INT, fetch_nextrank, blocked,
                 1, MPI_INT, MPI_REPLACE, win);
  MPI_Win_unlock_all(win);
}

Attachment: mcs-lock.h

void MCSLockInit(MPI_Comm comm, MPI_Win *win);
void MCSLockAcquire(MPI_Win win);
void MCSLockRelease(MPI_Win win);


_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

