I don't think so. Rank 0 also holds the tail, which points to the process that most recently requested the mutex.
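
For reference, a minimal sketch of the window allocation as it appears in the book's original mcs-lock.c (from memory, so take the details with a grain of salt): every rank holds two integers (blocked and nextRank), and rank 0 additionally holds the tail. The displacement names below are placeholders, not the names used in the attached files.

#include <mpi.h>

/* Sketch only (not the attached code): window layout where rank 0 also
 * stores the lock tail.  Displacement names are assumed for illustration. */
enum { blockedDisp = 0, nextRankDisp = 1, lockTailDisp = 2 };

static void SketchLockInit(MPI_Comm comm, int **lmem_out, MPI_Win *win)
{
    int rank, *lmem;
    MPI_Aint winsize;

    MPI_Comm_rank(comm, &rank);
    winsize = 2 * sizeof(int);            /* blocked and nextRank on every rank */
    if (rank == 0)
        winsize += sizeof(int);           /* rank 0 additionally holds the tail */

    MPI_Win_allocate(winsize, sizeof(int), MPI_INFO_NULL, comm, &lmem, win);
    lmem[blockedDisp] = 0;
    lmem[nextRankDisp] = -1;
    if (rank == 0)
        lmem[lockTailDisp] = -1;          /* -1 means the lock is currently free */

    MPI_Win_fence(0, *win);               /* make the initial values visible */
    *lmem_out = lmem;
}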

On Mon, Mar 13, 2017 at 2:55 AM, Balaji, Pavan <balaji@anl.gov> wrote:

Shouldn't winsize be 3 integers in your code? (Sorry, I spent only 30 seconds looking at the code, so I might have missed something.)

-- Pavan

> On Mar 12, 2017, at 2:44 PM, Ask Jakobsen <afj@qeye-labs.com> wrote:
>
> Interestingly, the paper you suggested appears to include a similar test in its pseudocode: https://htor.inf.ethz.ch/publications/img/hpclocks.pdf (see Listing 3 in the paper).
>
> Unfortunately, removing the test in the release protocol did not solve the problem. The race condition is much more difficult to provoke, but I managed to hit it when setting the size of the communicator to 3 (I had only tested even sizes so far).
>
> Following Jeff's suggestion, I have attempted to rewrite the code, removing the local loads and stores in the MPI_Win_lock_all epochs and using MPI_Fetch_and_op instead (see attached files).
>
> This version behaves very similarly to the original code and also fails from time to time. Putting a sleep into the acquire busy loop (usleep(100)) makes the code "much more robust" (a hack, I know, but doesn't that point to some underlying race condition?). Let me know if you see any problems in the way I am using MPI_Fetch_and_op in a busy loop. Flushing or syncing is not necessary in this case, right?
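>
> For illustration, a minimal sketch (not the attached mcs-lock-fop.c) of busy-waiting on the local blocked flag with an atomic read; MPI_NO_OP turns MPI_Fetch_and_op into a pure fetch, and the displacement name is assumed. Whether the flush is strictly required is exactly the question above; without at least local completion the result buffer is not guaranteed to hold the fetched value.
>
> int flag, dummy = 0;
> do {
>     /* atomic read of lmem[blockedDisp] on this rank (MPI_NO_OP = fetch only) */
>     MPI_Fetch_and_op(&dummy, &flag, MPI_INT, rank, blockedDisp, MPI_NO_OP, win);
>     MPI_Win_flush(rank, win);   /* complete the fetch before inspecting 'flag' */
> } while (flag == 1);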
>
> All work is done with export MPIR_CVAR_ASYNC_PROGRESS=1 on mpich-3.2 and mpich-3.3a2.
>
> On Wed, Mar 8, 2017 at 4:21 PM, Halim Amer <aamer@anl.gov> wrote:
>
> I cannot claim that I thoroughly verified the correctness of that code, so take it with a grain of salt. Please keep in mind that it is test code from a tutorial book; those codes are meant for learning purposes, not for deployment.
>
> If your goal is a high-performance RMA lock, I suggest you look into the recent HPDC'16 paper "High-Performance Distributed RMA Locks".
>
> Halim
> www.mcs.anl.gov/~aamer
>
> On 3/8/17 3:06 AM, Ask Jakobsen wrote:
>
> You are absolutely correct, Halim. Removing the test lmem[nextRank] == -1 in release fixes the problem. Great work. Now I will try to understand why you are right. I hope the authors of the book will credit you for discovering the bug.
>
> So, in conclusion, you need to remove the above-mentioned test AND enable asynchronous progress using the environment variable MPIR_CVAR_ASYNC_PROGRESS=1 in MPICH (BTW, I still can't get the code to work in Open MPI).
>
> On Tue, Mar 7, 2017 at 5:37 PM, Halim Amer <aamer@anl.gov> wrote:
>
> > detect that another process is being or already enqueued in the MCS queue.
>
> Actually the problem occurs only when the waiting process has already enqueued itself, i.e., the accumulate operation on the nextRank field succeeded.
>
> Halim
> www.mcs.anl.gov/~aamer
>
> On 3/7/17 10:29 AM, Halim Amer wrote:
>
> In the Release protocol, try removing this test:
>
> if (lmem[nextRank] == -1) {
>     If-Block;
> }
>
> but keep the If-Block.
>
> The hang occurs because the process releasing the MCS lock fails to detect that another process is being or already enqueued in the MCS queue.
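>
> For illustration, a sketch of the release protocol with that change applied, as I read the suggestion. The variable and displacement names (nullrank, curtail, lockTailDisp, blockedDisp) are modeled on the book's mcs-lock.c and are assumptions, not the exact code:
>
> void SketchLockRelease(int rank, int *lmem, MPI_Win win)
> {
>     int nullrank = -1, zero = 0, curtail;
>
>     MPI_Win_lock_all(0, win);
>
>     /* Always try to swing the tail back to "free", even if our local
>      * nextRank still reads -1: a waiter may already have swapped itself
>      * into the tail without having updated our nextRank field yet. */
>     MPI_Compare_and_swap(&nullrank, &rank, &curtail, MPI_INT,
>                          0, lockTailDisp, win);
>     MPI_Win_flush(0, win);
>
>     if (curtail != rank) {
>         /* Someone is queued behind us: wait until they have published
>          * their rank in our nextRank field, then unblock them. */
>         do {
>             MPI_Win_sync(win);
>         } while (lmem[nextRankDisp] == -1);
>
>         MPI_Accumulate(&zero, 1, MPI_INT, lmem[nextRankDisp], blockedDisp,
>                        1, MPI_INT, MPI_REPLACE, win);
>     }
>     MPI_Win_unlock_all(win);
> }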
>
> Halim
> www.mcs.anl.gov/~aamer
>
> On 3/7/17 6:43 AM, Ask Jakobsen wrote:
>
> Thanks, Halim. I have now enabled asynchronous progress in MPICH (I can't find something similar in Open MPI), and now all ranks acquire the lock and the program finishes as expected. However, if I put a while(1) loop around the acquire-release code in main.c, it fails again at random and goes into an infinite loop. The simple unfair lock does not have this problem.
>
> On Tue, Mar 7, 2017 at 12:44 AM, Halim Amer <aamer@anl.gov> wrote:
>
> My understanding is that this code assumes asynchronous progress. An example of when the processes hang is as follows:
>
> 1) P0 finishes MCSLockAcquire().
> 2) P1 is busy waiting in MCSLockAcquire() at
>        do {
>            MPI_Win_sync(win);
>        } while (lmem[blocked] == 1);
> 3) P0 executes MCSLockRelease().
> 4) P0 waits on MPI_Win_lock_all() inside MCSLockRelease().
>
> Hang!
>
> For P1 to get out of the loop, P0 has to get out of MPI_Win_lock_all() and execute its Compare_and_swap().
>
> For P0 to get out of MPI_Win_lock_all(), it needs an ACK from P1 that it got the lock.
>
> P1 does not make communication progress because MPI_Win_sync is not required to do so. It only synchronizes the private and public copies.
>
> For this hang to disappear, one can either trigger progress manually by using heavy-duty synchronization calls instead of Win_sync (e.g., Win_unlock_all + Win_lock_all), or enable asynchronous progress.
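>
> For illustration, a sketch (not the book's code) of the manual-progress variant of the busy loop; the heavy-duty unlock/lock pair forces the MPI library to progress outstanding RMA, at the cost of much more synchronization. The displacement name is assumed:
>
> while (lmem[blockedDisp] == 1) {
>     MPI_Win_unlock_all(win);    /* completes and progresses pending RMA */
>     MPI_Win_lock_all(0, win);   /* re-open the passive-target epoch     */
> }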
>
> To enable asynchronous progress in MPICH, set the MPIR_CVAR_ASYNC_PROGRESS env var to 1.
>
> Halim
> www.mcs.anl.gov/~aamer
>
> On 3/6/17 1:11 PM, Ask Jakobsen wrote:
>
> I am testing on the x86_64 platform.
>
> I have tried to build both MPICH and the MCS lock code with -O0 to avoid aggressive optimization. Following your suggestion, I have also tried making a volatile int *pblocked point to lmem[blocked] in the MCSLockAcquire function and a volatile int *pnextrank point to lmem[nextRank] in MCSLockRelease, but it does not appear to make a difference.
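>
> For reference, a minimal sketch of the volatile-pointer variant of the acquire busy loop described above (displacement name assumed); the volatile qualifier keeps the compiler from caching the flag in a register across MPI_Win_sync:
>
> volatile int *pblocked = &lmem[blockedDisp];
> do {
>     MPI_Win_sync(win);
> } while (*pblocked == 1);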
>
> At Richard Warren's suggestion I have also tried building the code using openmpi-2.0.2, without any luck (although it appears to acquire the lock a couple of extra times before failing), which I find troubling.
>
> I think I will give up on local loads/stores and will see if I can figure out how to rewrite it using MPI calls like MPI_Fetch_and_op, as you suggest. Thanks for your help.
>
> On Mon, Mar 6, 2017 at 7:20 PM, Jeff Hammond <jeff.science@gmail.com> wrote:
>
> What processor architecture are you testing?
>
> Maybe set lmem to volatile or read it with MPI_Fetch_and_op rather than a load. MPI_Win_sync cannot prevent the compiler from caching *lmem in a register.
>
> Jeff
>
> On Sat, Mar 4, 2017 at 12:30 AM, Ask Jakobsen <afj@qeye-labs.com> wrote:
>
> Hi,
>
> I have downloaded the source code for the MCS lock from the excellent book "Using Advanced MPI": http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/mcs-lock.c
>
> I have made a very simple piece of test code for exercising the MCS lock, but it only works intermittently and often never escapes the busy loops in the acquire and release functions (see attached source code). The code appears semantically correct to my eyes.
>
> #include <stdio.h>
> #include <mpi.h>
> #include "mcs-lock.h"
>
> int main(int argc, char *argv[])
> {
>     MPI_Win win;
>     MPI_Init(&argc, &argv);
>
>     MCSLockInit(MPI_COMM_WORLD, &win);
>
>     int rank, size;
>     MPI_Comm_rank(MPI_COMM_WORLD, &rank);
>     MPI_Comm_size(MPI_COMM_WORLD, &size);
>
>     printf("rank: %d, size: %d\n", rank, size);
>
>     MCSLockAcquire(win);
>     printf("rank %d acquired lock\n", rank);
>     fflush(stdout);
>     MCSLockRelease(win);
>
>     MPI_Win_free(&win);
>     MPI_Finalize();
>     return 0;
> }
>
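> For completeness, a typical way to build and run this test under MPICH (assuming the book's mcs-lock.c alongside; the binary name is arbitrary):
>
>     mpicc main.c mcs-lock.c -o mcs_test
>     mpiexec -n 4 ./mcs_test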
>
> I have tested on several hardware platforms with mpich-3.2 and mpich-3.3a2, but with no luck.
>
> It appears that MPI_Win_sync is not "refreshing" the local data, or I have a bug I can't spot.
>
> A simple unfair lock like http://www.mcs.anl.gov/research/projects/mpi/usingmpi/examples-advmpi/rma2/ga_mutex1.c works perfectly.
>
> Best regards, Ask Jakobsen
>
> <main.c>  <mcs-lock-fop.c>  <mcs-lock.h>
<div class="HOEnZb"><div class="h5">> discuss mailing list <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>
> To manage subscription options or unsubscribe:<br>
> <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/<wbr>mailman/listinfo/discuss</a><br>
<br>
______________________________<wbr>_________________<br>
discuss mailing list <a href="mailto:discuss@mpich.org">discuss@mpich.org</a><br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/<wbr>mailman/listinfo/discuss</a><br>

--
Ask Jakobsen
R&D

Qeye Labs
Lersø Parkallé 107
2100 Copenhagen Ø
Denmark

mobile: +45 2834 6936
email: afj@Qeye-Labs.com