[mpich-discuss] Maximum number of inter-communicators?
Mccall, Kurt E. (MSFC-EV41)
kurt.e.mccall at nasa.gov
Sun Oct 24 14:37:00 CDT 2021
Hi,
Based on a paper I read about giving an MPI job some fault tolerance, I'm connecting my processes exclusively with inter-communicators.
I've found that if I increase the number of processes beyond a certain point, many processes are never created and the whole job
crashes. Am I running up against an operating system limit (such as the number of open file descriptors, which is set to 1024) or some sort of
MPICH limit?
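To make the descriptor question easier to check, here is a minimal sketch of how a process could report its own open-file limit using the POSIX getrlimit() call. The helper names get_fd_limits and print_fd_limits are hypothetical, and nothing here establishes that MPICH's descriptor use actually approaches the limit. It only shows the value each process sees.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: fetch the soft and hard limits on open file
   descriptors for the calling process. Returns 0 on success, -1 on error. */
int get_fd_limits(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;

    *soft = rl.rlim_cur;   /* limit currently enforced (what "ulimit -n" shows) */
    *hard = rl.rlim_max;   /* ceiling the soft limit may be raised to */
    return 0;
}

/* Hypothetical helper: print both limits, e.g. from every manager and
   worker right after startup, to compare against the 1024 setting. */
void print_fd_limits(void)
{
    rlim_t soft, hard;

    if (get_fd_limits(&soft, &hard) != 0) {
        perror("getrlimit");
        return;
    }
    if (soft == RLIM_INFINITY)
        printf("soft fd limit: unlimited\n");
    else
        printf("soft fd limit: %llu\n", (unsigned long long)soft);

    if (hard == RLIM_INFINITY)
        printf("hard fd limit: unlimited\n");
    else
        printf("hard fd limit: %llu\n", (unsigned long long)hard);
}
```

Calling print_fd_limits() early in each process would show whether the spawned processes inherit the same 1024 limit as the launching shell.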
If it matters, my process architecture (a tree) is as follows: one master process connected to 21 manager processes on 21 other nodes,
and each manager connected to 8 worker processes on the manager's own node. This is the largest job I've been able to create
without it crashing. Attempting to increase the number of workers beyond 8 results in a crash.
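The size of the tree described above can be worked out with a short sketch. It assumes one inter-communicator per parent-child link, which is an illustrative assumption and not something stated about the actual code. The struct and function names are hypothetical.

```c
#include <stdio.h>

/* Hypothetical summary of a master/manager/worker tree. */
struct tree_counts {
    int processes;             /* master + managers + workers */
    int links;                 /* parent-child links (assumed: one inter-communicator each) */
    int max_links_per_process; /* largest number of links held by any single process */
    int procs_per_manager_node;/* manager plus its workers on one node */
};

struct tree_counts count_tree(int managers, int workers_per_manager)
{
    struct tree_counts c;
    int workers = managers * workers_per_manager;
    int manager_links = 1 + workers_per_manager; /* one to the master, one per worker */

    c.processes = 1 + managers + workers;
    c.links = managers + workers;
    c.max_links_per_process = (managers > manager_links) ? managers : manager_links;
    c.procs_per_manager_node = 1 + workers_per_manager;
    return c;
}

void print_tree_counts(int managers, int workers_per_manager)
{
    struct tree_counts c = count_tree(managers, workers_per_manager);

    printf("processes: %d\n", c.processes);
    printf("links: %d\n", c.links);
    printf("max links per process: %d\n", c.max_links_per_process);
    printf("processes per manager node: %d\n", c.procs_per_manager_node);
}
```

For the largest working job, count_tree(21, 8) gives 190 processes and 189 links, with the master holding 21 links, each manager holding 9, and 9 processes on each manager node.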
I'm using MPICH 3.3.2 on CentOS with Linux kernel 3.10.0. MPICH was compiled with the Portland Group compiler pgc++ 19.5-0.
Thanks,
Kurt