<html><head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div class="BodyFragment"><font size="2"><span style="font-size:10pt;">
<div class="PlainText"><br>
Thanks. That's a bug in your code. In mcs-lock-fop.c:92, before you do the accumulate that notifies the next rank, you need to reset your lmem[nextRank] back to -1. Otherwise, in the next iteration, you'll read the stale value and think that it is set even though it is not. You can
do the reset either with a local store followed by an MPI_WIN_SYNC, or with MPI_Put or MPI_Accumulate. After that fix, your program seems to work correctly.<br>
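<br>
To make the stale-value problem concrete, here is a rough single-process model in plain C. It is only a sketch under assumptions, not the code from mcs-lock-fop.c and not MPI code: lock_slot, enqueue_successor, release_without_reset and release_with_reset are hypothetical names, and the plain store stands in for whichever reset you choose (local store plus MPI_WIN_SYNC, MPI_Put, or MPI_Accumulate).<br>
<br>
```c
#include <assert.h>

#define NO_SUCCESSOR (-1)   /* sentinel, same -1 as discussed above */

/* Hypothetical stand-in for one rank's local lock memory. */
typedef struct {
    int next_rank;          /* models lmem[nextRank] */
} lock_slot;

/* Models a waiting rank registering itself as this rank's successor. */
void enqueue_successor(lock_slot *slot, int rank)
{
    assert(rank >= 0);      /* real ranks are never negative */
    slot->next_rank = rank;
}

/* Buggy release: reports the successor but leaves the old value in place,
 * so a later release still sees it. Returns the rank that would be notified. */
int release_without_reset(lock_slot *slot)
{
    return slot->next_rank;
}

/* Fixed release: read the successor, reset the slot to the sentinel,
 * and only then notify. Returns the rank that would be notified. */
int release_with_reset(lock_slot *slot)
{
    int next = slot->next_rank;
    slot->next_rank = NO_SUCCESSOR;  /* assumption: stands in for the real reset */
    return next;
}
```
<br>
Calling release_without_reset twice after a single enqueue reports the same rank both times, while release_with_reset reports it once and then the -1 sentinel.<br>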
<br>
I also found that it was not a bug in the MPI implementation, after all. Even without the MODE_NOCHECK hint, the program seems to work correctly now.<br>
<br>
Nevertheless, this is a good regression test to have, and we'd like to include it in our test harness. Would you mind providing it as a contribution to MPICH? You'd need to sign the contributors' agreement for it:<br>
<br>
<a href="http://www.mpich.org/documentation/contributor-docs/">http://www.mpich.org/documentation/contributor-docs/</a><br>
<br>
FWIW, here's the new code that includes the send/recv optimization that I pointed out in my previous email.<br>
<br>
Thanks,<br>
<br>
-- Pavan<br>
<br>
</div>
</span></font></div>
<div class="BodyFragment"><font size="2"><span style="font-size:10pt;">
<div class="PlainText"><br>
> On Mar 14, 2017, at 2:03 PM, Ask Jakobsen <afj@qeye-labs.com> wrote:<br>
> <br>
> $ mpicc -Wall main.c mcs-lock-fop.c<br>
> <br>
> $ while true; do mpiexec -n 2 ./a.out ; done<br>
> <br>
> At some point it locks up.<br>
> <br>
> On Tue, Mar 14, 2017 at 7:52 PM, Balaji, Pavan <balaji@anl.gov> wrote:<br>
> <br>
> Thanks. I can't seem to reproduce the deadlock anymore after adding the MODE_NOCHECK hint. Can you tell us how to reproduce it?<br>
> <br>
> -- Pavan<br>
> <br>
> > On Mar 14, 2017, at 1:43 PM, Ask Jakobsen <afj@qeye-labs.com> wrote:<br>
> ><br>
> > Sorry, the header file is now attached.<br>
> ><br>
> > On Tue, Mar 14, 2017 at 5:11 PM, Balaji, Pavan <balaji@anl.gov> wrote:<br>
> ><br>
> > > On Mar 14, 2017, at 4:58 AM, Ask Jakobsen <afj@qeye-labs.com> wrote:<br>
> > > This is the MPI_Fetch_and_op version, which tries to implement Pavan's ideas as I understand them (no async progress necessary, but also tested with it enabled).<br>
> ><br>
> > The header file is missing.<br>
> ><br>
> > -- Pavan<br>
> ><br>
> > _______________________________________________<br>
> > discuss mailing list discuss@mpich.org<br>
> > To manage subscription options or unsubscribe:<br>
> > <a href="https://lists.mpich.org/mailman/listinfo/discuss">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
> ><br>
> ><br>
> > <mcs-lock.h><br>
> <br>
> <br>
> <br>
> <br>
> -- <br>
> Ask Jakobsen<br>
> R&D<br>
> <br>
> Qeye Labs<br>
> Lersø Parkallé 107<br>
> 2100 Copenhagen Ø <br>
> Denmark<br>
> <br>
> mobile: +45 2834 6936<br>
> email: afj@Qeye-Labs.com<br>
<br>
</div>
</span></font></div>
</body>
</html>