[mpich-discuss] Maximum number of inter-communicators?

Mccall, Kurt E. (MSFC-EV41) kurt.e.mccall at nasa.gov
Sun Oct 24 14:37:00 CDT 2021


Hi,

Based on a paper I read about adding fault tolerance to an MPI job, I'm connecting my processes exclusively with inter-communicators.
I've found that if I increase the number of processes beyond a certain point, many processes don't get created at all and the whole job
crashes. Am I running up against an operating system limit (such as the number of open file descriptors, which is set to 1024) or some
sort of MPICH limit?
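
To see which descriptor limit each process actually inherits, a minimal sketch along these lines could be called early in every process. It assumes a POSIX system; the helper names query_nofile_limit and print_nofile_limit are made up for illustration. Whether MPICH really consumes one descriptor per connection in this setup is not confirmed here.

```c
#include <stdio.h>
#include <sys/resource.h>

/* Hypothetical helper: query the per-process limit on open file
 * descriptors (RLIMIT_NOFILE). Returns 0 on success, -1 on failure. */
int query_nofile_limit(rlim_t *soft, rlim_t *hard)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0)
        return -1;
    *soft = rl.rlim_cur;   /* limit currently enforced */
    *hard = rl.rlim_max;   /* ceiling the soft limit may be raised to */
    return 0;
}

/* Hypothetical helper: print the limits, or the error if the query fails. */
void print_nofile_limit(void)
{
    rlim_t soft, hard;

    if (query_nofile_limit(&soft, &hard) == 0)
        printf("RLIMIT_NOFILE: soft=%llu hard=%llu\n",
               (unsigned long long)soft, (unsigned long long)hard);
    else
        perror("getrlimit");
}
```

Calling print_nofile_limit() from the master, a manager, and a worker would show whether all three see the same 1024 limit, since spawned processes may inherit a different environment than the launching shell.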

If it matters, my process architecture (a tree) is as follows: one master process connected to 21 manager processes on 21 other nodes,
and each manager connected to 8 worker processes on the manager's own node. This is the largest job I've been able to create
without it crashing. Attempting to increase the number of workers per manager beyond 8 results in a crash.
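
To put numbers on that tree, here is a small sketch of the arithmetic. It assumes exactly one inter-communicator per parent-child link, which the description above suggests but does not state outright; the struct and function names are hypothetical.

```c
/* Hypothetical summary of a master/manager/worker tree. */
struct tree_size {
    int processes;           /* master + managers + all workers */
    int intercomms;          /* one per parent-child link (assumption) */
    int master_intercomms;   /* links held by the master */
    int manager_intercomms;  /* links held by each manager */
};

/* Compute the sizes for one master, `managers` managers, and
 * `workers_per_manager` workers under each manager. */
struct tree_size compute_tree_size(int managers, int workers_per_manager)
{
    struct tree_size t;
    int workers = managers * workers_per_manager;

    t.processes = 1 + managers + workers;
    t.intercomms = managers + workers;
    t.master_intercomms = managers;
    t.manager_intercomms = 1 + workers_per_manager;
    return t;
}
```

Under that assumption, compute_tree_size(21, 8) gives 190 processes and 189 inter-communicators, with 21 held by the master and 9 by each manager. With 9 workers per manager it gives 211 processes and 210 inter-communicators.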

I'm using MPICH 3.3.2 on CentOS (Linux kernel 3.10.0). MPICH was compiled with the Portland Group compiler pgc++ 19.5-0.

Thanks,
Kurt