[mpich-discuss] MCS lock and MPI RMA problem
Balaji, Pavan
balaji at anl.gov
Tue Mar 14 16:46:37 CDT 2017
Thanks. That's a bug in your code. In mcs-lock-fop.c:92, before you do an accumulate to notify the next rank, you need to reset your lmem[nextRank] back to -1. Otherwise, in the next iteration, you'll read the stale value and think that it has been set even if it has not. You can do the reset either with a local store followed by an MPI_WIN_SYNC, or with MPI_Put or MPI_Accumulate. After that fix, your program seems to work correctly.
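To illustrate the stale-value problem described above, here is a minimal single-process sketch in plain C. It is not the code from mcs-lock-fop.c and it makes no MPI calls: the slot layout, the function names (enqueue_successor, release_lock, second_round_notification) and the reset_before_notify switch are all hypothetical, and the release path is simplified to "read the slot, then notify". In the real program the reset would be done as described above (a local store followed by MPI_WIN_SYNC, or MPI_Put / MPI_Accumulate) and the notification would be the accumulate.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical layout, modeled on lmem[nextRank] from the thread. */
enum { NEXT_RANK = 0, LMEM_SLOTS = 1 };
#define NO_SUCCESSOR (-1)

/* Models a waiting rank writing its own rank into the holder's slot.
 * In the real program this would be a remote write into the window. */
static void enqueue_successor(int lmem[LMEM_SLOTS], int rank)
{
    assert(rank >= 0);
    lmem[NEXT_RANK] = rank;
}

/* Models the release path. Returns the rank that would be notified,
 * or NO_SUCCESSOR if the slot says nobody is waiting.
 * With reset_before_notify == false the slot keeps its old value,
 * which is the bug discussed in the thread. */
static int release_lock(int lmem[LMEM_SLOTS], bool reset_before_notify)
{
    int next = lmem[NEXT_RANK];
    if (next == NO_SUCCESSOR)
        return NO_SUCCESSOR;

    if (reset_before_notify)
        lmem[NEXT_RANK] = NO_SUCCESSOR;   /* the fix: clear the slot first */

    /* The real code would notify `next` here (the accumulate). */
    return next;
}

/* Two rounds on the same slot: rank 1 enqueues in round one, nobody
 * enqueues in round two. Returns what round two would notify. */
static int second_round_notification(bool reset_before_notify)
{
    int lmem[LMEM_SLOTS] = { NO_SUCCESSOR };

    enqueue_successor(lmem, 1);
    int first = release_lock(lmem, reset_before_notify);
    assert(first == 1);                   /* round one always notifies rank 1 */

    return release_lock(lmem, reset_before_notify);
}
```

Calling second_round_notification(false) reports rank 1 again even though nobody enqueued in the second round, while second_round_notification(true) reports NO_SUCCESSOR. That is the difference the reset makes in this simplified model.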
I also found that it was not a bug in the MPI implementation, after all. Even without the MODE_NOCHECK hint, the program seems to work correctly now.
Nevertheless, this is a good regression test to have and we'd like to include it in our test harness. Would you mind providing this as a contribution to mpich? You'd need to sign the contributors' agreement for it:
http://www.mpich.org/documentation/contributor-docs/
FWIW, here's the new code that includes the send/recv optimization that I pointed out in my previous email.
Thanks,
-- Pavan
> On Mar 14, 2017, at 2:03 PM, Ask Jakobsen <afj at qeye-labs.com> wrote:
>
> $ mpicc -Wall main.c mcs-lock-fop.c
>
> $ while true; do mpiexec -n 2 ./a.out ; done
>
> At some point it locks up.
>
> On Tue, Mar 14, 2017 at 7:52 PM, Balaji, Pavan <balaji at anl.gov> wrote:
>
> Thanks. I can't seem to reproduce the deadlock anymore after the MODE_NOCHECK hint. Can you tell us how to reproduce it?
>
> -- Pavan
>
> > On Mar 14, 2017, at 1:43 PM, Ask Jakobsen <afj at qeye-labs.com> wrote:
> >
> > Sorry, now the header file is attached.
> >
> > On Tue, Mar 14, 2017 at 5:11 PM, Balaji, Pavan <balaji at anl.gov> wrote:
> >
> > > On Mar 14, 2017, at 4:58 AM, Ask Jakobsen <afj at qeye-labs.com> wrote:
> > > The MPI_Fetch_and_op version tries to implement Pavan's ideas as I understand them (no async progress necessary, but also tested with it enabled).
> >
> > The header file is missing.
> >
> > -- Pavan
> >
> > _______________________________________________
> > discuss mailing list discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss
> >
> >
> > <mcs-lock.h>
>
>
> --
> Ask Jakobsen
> R&D
>
> Qeye Labs
> Lersø Parkallé 107
> 2100 Copenhagen Ø
> Denmark
>
> mobile: +45 2834 6936
> email: afj at Qeye-Labs.com
-------------- next part --------------
A non-text attachment was scrubbed...
Name: main.c
Type: application/octet-stream
Size: 649 bytes
Desc: main.c
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170314/7c900c92/attachment.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mcs-lock-fop.c
Type: application/octet-stream
Size: 3946 bytes
Desc: mcs-lock-fop.c
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170314/7c900c92/attachment-0001.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mcs-lock.c
Type: application/octet-stream
Size: 2920 bytes
Desc: mcs-lock.c
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170314/7c900c92/attachment-0002.obj>
-------------- next part --------------
A non-text attachment was scrubbed...
Name: mcs-lock.h
Type: application/octet-stream
Size: 195 bytes
Desc: mcs-lock.h
URL: <http://lists.mpich.org/pipermail/discuss/attachments/20170314/7c900c92/attachment-0003.obj>