[mpich-discuss] spurious lock ups on collective merge intercom

Dmitriy Lyubimov dlieu.7 at gmail.com
Tue Jan 10 19:45:29 CST 2017


Hello,

(mpich 3.2)

I have a scenario where I add a few extra processes to an existing communicator.

It works as a simple loop:
(1) n processes accept on the n-process intracomm
(2) 1 process connects
(3) the resulting intercomm is merged into an (n+1)-process intracomm; the
    intercomm and the old n-process intracomm are closed
(4) repeat 1-3 as needed.

Occasionally, I observe that step 3 spuriously locks up (once I get into the
range of 100+ processes). From what I can tell, all processes in step 3 are
accounted for and are waiting on the merge, but nothing happens. The
collective barrier locks up.

I am really having trouble resolving this issue; any ideas are appreciated!

Thank you very much.
-Dmitriy