[mpich-discuss] spurious lock ups on collective merge intercom
Dmitriy Lyubimov
dlieu.7 at gmail.com
Wed Jan 11 11:34:29 CST 2017
Thanks.
It would not be easy for me to do that right away, as I am using a proprietary
Scala binding API for MPI.
It would help me to know whether there has been a known problem like this in the
past, or whether the mergeIntercomm API is generally known to work on hundreds of
processes.
It sounds like there are no known issues with that.
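
The loop from the original message (quoted below) can be modeled without MPI.
What follows is only a toy sketch in plain Python, not MPI code: threads stand
in for processes and threading.Barrier stands in for the collective merge. The
names merge_round and grow and the absent parameter are hypothetical, invented
for illustration; they are not part of MPICH or of any binding. The sketch
shows that the merge step completes only when all n+1 participants enter it,
and that one missing participant blocks everyone else until the timeout.

```python
import threading


def merge_round(n_existing, absent=0, timeout=1.0):
    """Model one accept/connect/merge round (steps 1-3).

    n_existing members plus 1 joiner must all reach the barrier, which
    stands in for the collective merge. `absent` is a hypothetical knob:
    that many participants never arrive, imitating a stuck process.
    Returns True if the merge completed, False if it timed out.
    """
    parties = n_existing + 1
    barrier = threading.Barrier(parties)
    results = []
    lock = threading.Lock()

    def participant():
        try:
            # Blocks until all `parties` participants have arrived.
            barrier.wait(timeout=timeout)
            ok = True
        except threading.BrokenBarrierError:
            # Raised in every waiter once one of them times out.
            ok = False
        with lock:
            results.append(ok)

    threads = [threading.Thread(target=participant)
               for _ in range(parties - absent)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return len(results) == parties and all(results)


def grow(start, joins, timeout=1.0):
    """Repeat the round `joins` times (step 4); return the final group size."""
    size = start
    for _ in range(joins):
        if not merge_round(size, timeout=timeout):
            raise RuntimeError(f"merge timed out at group size {size}")
        size += 1  # the joiner is now part of the merged group
    return size
```

In this model, grow(3, 4) runs four rounds and ends with a group of 7, while
merge_round(5, absent=1, timeout=0.2) returns False, which is how a lock-up in
step 3 would show up here.
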
On Tue, Jan 10, 2017 at 11:53 PM, Oden, Lena <loden at anl.gov> wrote:
> Hello Dmitriy,
>
> Can you maybe create a simple example program to reproduce this failure?
> It is often easier to look at a code example to identify a
> problem.
>
> Thanks,
> Lena
> > On Jan 11, 2017, at 2:45 AM, Dmitriy Lyubimov <dlieu.7 at gmail.com> wrote:
> >
> > Hello,
> >
> > (mpich 3.2)
> >
> > I have a scenario where I add a few extra processes to an existing intracomm.
> >
> > It works as a simple loop:
> > (1) n processes accept on the n-process intracomm
> > (2) 1 process connects
> > (3) the resulting intercomm is merged into an (n+1)-process intracomm; the
> > intercomm and the n-process intracomm are closed
> > (4) repeat 1-3 as needed.
> >
> > Occasionally, I observe that step 3 spuriously locks up (once I get into
> > the range of 100+ processes). From what I can tell, all processes in step 3
> > are accounted for and are waiting on the merge, but nothing happens. The
> > collective barrier locks up.
> >
> > I am really having trouble resolving this issue; any ideas are appreciated!
> >
> > Thank you very much.
> > -Dmitriy
> >
> >
> > _______________________________________________
> > discuss mailing list discuss at mpich.org
> > To manage subscription options or unsubscribe:
> > https://lists.mpich.org/mailman/listinfo/discuss