[mpich-discuss] Maximum number of inter-communicators?
Mccall, Kurt E. (MSFC-EV41)
kurt.e.mccall at nasa.gov
Sun Oct 24 14:37:00 CDT 2021
Based on a paper I read about giving an MPI job some fault tolerance, I'm exclusively connecting my processes with inter-communicators.
I've found that if I increase the number of processes beyond a certain point, many processes don't get created at all and the whole job
crashes. Am I running up against an operating system limit (like the number of open file descriptors - it is set at 1024), or some sort of
If it matters, my process architecture (a tree) is as follows: one master process connected to 21 manager processes on 21 other nodes,
and each manager connected to 8 worker processes on the manager's own node. This is the largest job I've been able to create
without it crashing. Attempting to increase the number of workers per manager beyond 8 results in a crash.
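For reference, the size of the tree described above can be worked out with a short sketch. It assumes one inter-communicator per parent-child link (my reading of the layout, not something MPICH reports), and the helper names `tree_job_counts` and `open_file_limit` are made up for illustration. It only prints the per-process open-file limit next to those counts; it does not measure how many descriptors MPICH actually opens.

```python
import resource  # Unix-only standard library module


def tree_job_counts(managers, workers_per_manager):
    """Return (processes, intercomms) for a master/manager/worker tree.

    Assumption: every parent-child link uses exactly one inter-communicator.
    """
    workers = managers * workers_per_manager
    processes = 1 + managers + workers   # master + managers + workers
    intercomms = managers + workers      # one per tree edge (assumed)
    return processes, intercomms


def open_file_limit():
    """Return the (soft, hard) per-process limit on open file descriptors."""
    return resource.getrlimit(resource.RLIMIT_NOFILE)


if __name__ == "__main__":
    soft, hard = open_file_limit()
    print(f"open-file limit: soft={soft}, hard={hard}")
    # 8 workers per manager is the largest job that ran; 9 is the first that crashed.
    for workers_per_manager in (8, 9):
        procs, links = tree_job_counts(21, workers_per_manager)
        print(f"{workers_per_manager} workers/manager: "
              f"{procs} processes, {links} inter-communicators")
```

Under that assumption the largest working job has 190 processes and 189 inter-communicators, and going to 9 workers per manager gives 211 processes and 210 inter-communicators. Both counts are well below a limit of 1024, though that comparison alone does not rule out descriptor exhaustion.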
I'm using MPICH 3.3.2 on CentOS (kernel 3.10.0). MPICH was compiled with the Portland Group compiler pgc++ 19.5-0.