<meta http-equiv="Content-Type" content="text/html; charset=utf-8"><div dir="ltr">I don't think so. Rank 0 also holds the tail, i.e., the rank of the process that most recently requested the mutex, so only rank 0 needs room for a third integer.<br></div><div class="gmail_extra"><br><div class="gmail_quote">On Mon, Mar 13, 2017 at 2:55 AM, Balaji, Pavan <span dir="ltr"><<a href="mailto:balaji@anl.gov" target="_blank">balaji@anl.gov</a>></span> wrote:<br><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><br>


Shouldn't winsize be 3 integers in your code? (Sorry, I spent only 30 seconds looking at the code, so I might have missed something.)<br>


<br>


  -- Pavan<br>


<div><div class="h5"><br>


> On Mar 12, 2017, at 2:44 PM, Ask Jakobsen <<a href="mailto:afj@qeye-labs.com">afj@qeye-labs.com</a>> wrote:<br>


><br>


> Interestingly, the paper you suggested appears to include a similar test in its pseudocode <a href="https://htor.inf.ethz.ch/publications/img/hpclocks.pdf" rel="noreferrer" target="_blank">https://htor.inf.ethz.ch/<wbr>publications/img/hpclocks.pdf</a> (see Listing 3 in the paper).<br>


><br>


> Unfortunately, removing the test in the release protocol did not solve the problem. The race condition is much harder to provoke, but I managed to trigger it by setting the communicator size to 3 (so far I had only tested even sizes).<br>


><br>


> Following Jeff's suggestion, I have rewritten the code to remove local loads and stores in the MPI_Win_lock_all epochs, using MPI_Fetch_and_op instead (see attached files).<br>


><br>


> This version behaves very similarly to the original code and also fails from time to time. Putting a sleep (usleep(100)) into the acquire busy loop makes the code "much more robust" (a hack, I know, but it suggests some underlying race condition?!). Let me know if you see any problems in the way I am using MPI_Fetch_and_op in a busy loop. Flushing or syncing is not necessary in this case, right?<br>


><br>


> All tests were run with export MPIR_CVAR_ASYNC_PROGRESS=1 on mpich-3.2 and mpich-3.3a2.<br>


><br>


> On Wed, Mar 8, 2017 at 4:21 PM, Halim Amer <<a href="mailto:aamer@anl.gov">aamer@anl.gov</a>> wrote:<br>


> I cannot claim that I thoroughly verified the correctness of that code, so take it with a grain of salt. Please keep in mind that it is test code from a tutorial book; such codes are meant for learning purposes, not for deployment.<br>


><br>


> If your goal is a high-performance RMA lock, I suggest you look into the recent HPDC'16 paper "High-Performance Distributed RMA Locks".<br>


><br>


> Halim<br>


> <a href="http://www.mcs.anl.gov/~aamer" rel="noreferrer" target="_blank">www.mcs.anl.gov/~aamer</a><br>


><br>


> On 3/8/17 3:06 AM, Ask Jakobsen wrote:<br>


> You are absolutely correct, Halim. Removing the test lmem[nextRank] == -1<br>


> in release fixes the problem. Great work. Now I will try to understand why<br>


> you are right. I hope the authors of the book will credit you for<br>


> discovering the bug.<br>


><br>


> So, in conclusion, you need to remove the above-mentioned test AND enable asynchronous progress using the environment variable MPIR_CVAR_ASYNC_PROGRESS=1 in MPICH (BTW, I still can't get the code to work in Open MPI).<br>


><br>


> On Tue, Mar 7, 2017 at 5:37 PM, Halim Amer <<a href="mailto:aamer@anl.gov">aamer@anl.gov</a>> wrote:<br>


><br>


> detect that another process is being enqueued or is already enqueued in the MCS queue.<br>


><br>


> Actually, the problem occurs only when the waiting process has already enqueued itself, i.e., when its accumulate operation on the nextRank field has succeeded.<br>


><br>


> Halim<br>




><br>


><br>


> On 3/7/17 10:29 AM, Halim Amer wrote:<br>


><br>


> In the Release protocol, try removing this test:<br>


><br>


> if (lmem[nextRank] == -1) {<br>


>    If-Block;<br>


> }<br>


><br>


> but keep the If-Block.<br>


><br>


> The hang occurs because the process releasing the MCS lock fails to detect that another process is being enqueued or is already enqueued in the MCS queue.<br>


><br>


> Halim<br>




><br>


><br>


> On 3/7/17 6:43 AM, Ask Jakobsen wrote:<br>


><br>


> Thanks, Halim. I have now enabled asynchronous progress in MPICH (I can't find anything similar in Open MPI), and now all ranks acquire the lock and the program finishes as expected. However, if I put a while(1) loop around the acquire-release code in main.c, it fails again at random and goes into an infinite loop. The simple unfair lock does not have this problem.<br>


><br>


> On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <<a href="mailto:aamer@anl.gov">aamer@anl.gov</a>> wrote:<br>


><br>


> My understanding is that this code assumes asynchronous progress.<br>


> An example of when the processes hang is as follows:<br>


><br>


> 1) P0 finishes MCSLockAcquire()<br>


> 2) P1 is busy waiting in MCSLockAcquire() at<br>


> do {<br>


>       MPI_Win_sync(win);<br>


>    } while (lmem[blocked] == 1);<br>


> 3) P0 executes MCSLockRelease()<br>


> 4) P0 waits in MPI_Win_lock_all() inside MCSLockRelease()<br>


><br>


> Hang!<br>


><br>


> For P1 to get out of the loop, P0 has to get out of MPI_Win_lock_all() and execute its Compare_and_swap().<br>


><br>


> For P0 to get out of MPI_Win_lock_all(), it needs an ACK from P1 confirming that P0 got the lock.<br>


><br>


> P1 does not make communication progress because MPI_Win_sync is not required to do so; it only synchronizes the private and public copies of the window.<br>


><br>


> For this hang to disappear, one can either trigger progress manually by<br>


> using heavy-duty synchronization calls instead of Win_sync (e.g.,<br>


> Win_unlock_all + Win_lock_all), or enable asynchronous progress.<br>


><br>


> To enable asynchronous progress in MPICH, set the MPIR_CVAR_ASYNC_PROGRESS environment variable to 1.<br>


><br>


> Halim<br>




><br>


><br>


> On 3/6/17 1:11 PM, Ask Jakobsen wrote:<br>


><br>


> I am testing on an x86_64 platform.<br>


><br>


> I have tried building both MPICH and the MCS lock code with -O0 to avoid aggressive optimization. Following your suggestion, I have also tried making a volatile int *pblocked point to lmem[blocked] in the MCSLockAcquire function and a volatile int *pnextrank point to lmem[nextRank] in MCSLockRelease, but it does not appear to make a difference.<br>


><br>


> At Richard Warren's suggestion, I have also tried building the code with openmpi-2.0.2, without any luck (although it appears to acquire the lock a couple of extra times before failing), which I find troubling.<br>


><br>


> I think I will give up on local loads/stores and see if I can rewrite the code using MPI calls like MPI_Fetch_and_op, as you suggest.<br>


> Thanks for your help.<br>


><br>


> On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <<a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a>> wrote:<br>


><br>


> What processor architecture are you testing?<br>


><br>


><br>


> Maybe declare lmem volatile, or read it with MPI_Fetch_and_op rather than a load. MPI_Win_sync cannot prevent the compiler from caching *lmem in a register.<br>


><br>


> Jeff<br>


><br>


> On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <<a href="mailto:afj@qeye-labs.com">afj@qeye-labs.com</a>> wrote:<br>


><br>


> Hi,<br>


><br>


><br>


> I have downloaded the source code for the MCS lock from the excellent book "Using Advanced MPI" from <a href="http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c</a><br>


><br>


> I have written a very simple test program for the MCS lock, but it works only intermittently and often never escapes the busy loops in the acquire and release functions (see attached source code). The code appears semantically correct to my eyes.<br>


><br>


> #include <stdio.h><br>


> #include <mpi.h><br>


> #include "mcs-lock.h"<br>


><br>


> int main(int argc, char *argv[])<br>


> {<br>


>   MPI_Win win;<br>


>   MPI_Init( &argc, &argv );<br>


><br>


>   MCSLockInit(MPI_COMM_WORLD, &win);<br>


><br>


>   int rank, size;<br>


>   MPI_Comm_rank(MPI_COMM_WORLD, &rank);<br>


>   MPI_Comm_size(MPI_COMM_WORLD, &size);<br>


><br>


>   printf("rank: %d, size: %d\n", rank, size);<br>


><br>


><br>


>   MCSLockAcquire(win);<br>


>   printf("rank %d acquired lock\n", rank);   fflush(stdout);<br>


>   MCSLockRelease(win);<br>


><br>


><br>


>   MPI_Win_free(&win);<br>


>   MPI_Finalize();<br>


>   return 0;<br>


> }<br>


><br>


><br>


> I have tested on several hardware platforms with mpich-3.2 and mpich-3.3a2, but with no luck.<br>


><br>


> It appears that either the MPI_Win_sync calls are not "refreshing" the local data or I have a bug I can't spot.<br>


><br>


> A simple unfair lock like <a href="http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c" rel="noreferrer" target="_blank">http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c</a> works perfectly.<br>


><br>


> Best regards, Ask Jakobsen<br>


><br>


><br>


> ______________________________<wbr>_________________<br>


> discuss mailing list     <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>


> To manage subscription options or unsubscribe:<br>


> <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/<wbr>mailman/listinfo/discuss</a><br>


><br>


><br>


><br>


><br>


> --<br>


> Jeff Hammond<br>


> <a href="mailto:jeff.science@gmail.com">jeff.science@gmail.com</a><br>


> <a href="http://jeffhammond.github.io/" rel="noreferrer" target="_blank">http://jeffhammond.github.io/</a><br>




</div></div>> <main.c><mcs-lock-fop.c><mcs-lock.h><br>
<div class="HOEnZb"><div class="h5">


<br>




</div></div></blockquote></div><br><br clear="all"><br>-- <br><div class="gmail_signature" data-smartmail="gmail_signature"><div dir="ltr"><div><div dir="ltr"><font size="1"><b>Ask Jakobsen</b><br>R&D<br><br><span style="color:rgb(255,153,102)">Q</span>eye Labs<br>Lersø Parkallé 107<br>2100 Copenhagen Ø <br>Denmark<br><br>mobile: +45 2834 6936<br>email: afj@Qeye-Labs.com<br></font></div></div></div></div>


</div>