[mpich-discuss] MPICH -- too many open files
Zhou, Hui
zhouh at anl.gov
Tue Mar 22 13:19:45 CDT 2022
MPICH currently launches an individual proxy for each spawn, which probably accounts for the flood of fds on the server. For now, the workaround is to ask your system admin to increase the fd limit. Feel free to open an issue at https://github.com/pmodels/mpich/issues and we'll prioritize the enhancement.
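In the meantime, a process can at least raise its own soft limit up to the hard limit. A minimal sketch below (the hard limit itself, and the limit inherited by mpiexec inside the Slurm job, still have to be raised by the admin or via "ulimit -n" in the batch script before mpiexec runs):

/* Sketch: report this process's open-file limits and raise the soft
 * limit to the hard limit.  This only helps the process that runs it;
 * the hard cap is set by the system configuration. */
#include <stdio.h>
#include <sys/resource.h>

int main(void)
{
    struct rlimit rl;

    if (getrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("getrlimit");
        return 1;
    }
    printf("soft fd limit %llu, hard fd limit %llu\n",
           (unsigned long long) rl.rlim_cur,
           (unsigned long long) rl.rlim_max);

    rl.rlim_cur = rl.rlim_max;   /* raise soft limit to the hard cap */
    if (setrlimit(RLIMIT_NOFILE, &rl) != 0) {
        perror("setrlimit");
        return 1;
    }
    return 0;
}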
--
Hui
________________________________
From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Tuesday, March 22, 2022 12:55 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] MPICH -- too many open files
My application, which spawns multiple subprocesses via MPI_Comm_spawn, eventually fails on one Slurm cluster as I scale up the number of processes, with the error:
[mpiexec at n002.cluster.pssclabs.com] HYDU_create_process (../../../../mpich-4.0.1/src/pm/hydra/utils/launch/launch.c:21): pipe error (Too many open files)
[mpiexec at n002.cluster.pssclabs.com] HYDT_bscd_common_launch_procs (../../../../mpich-4.0.1/src/pm/hydra/tools/bootstrap/external/external_common_launch.c:296): create process returned error
free(): invalid pointer
/var/spool/slurm/job235999/slurm_script: line 296: 3778907 Aborted (core dumped)
It works fine on a different (Torque) cluster for very large job sizes.
“ulimit -n” (number of open files) on both machines returns 1024.
I’m hoping that there is some other system setting on the Slurm cluster that would allow larger jobs. I can provide the “-verbose” output file if that would help.
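For reference, the spawn pattern is conceptually like the sketch below (placeholder names and counts, not the actual code): each worker is started with its own MPI_Comm_spawn call, and the number of calls grows as the job scales up.

/* Minimal sketch of the spawn pattern (hypothetical "./worker" binary). */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);

    enum { NSPAWNS = 16 };            /* scaled up much higher in the real runs */
    MPI_Comm children[NSPAWNS];

    for (int i = 0; i < NSPAWNS; i++) {
        /* one separate spawn call per worker */
        MPI_Comm_spawn("./worker", MPI_ARGV_NULL, 1 /* procs */,
                       MPI_INFO_NULL, 0 /* root */, MPI_COMM_SELF,
                       &children[i], MPI_ERRCODES_IGNORE);
    }

    /* ... work over the intercommunicators elided ... */

    MPI_Finalize();
    return 0;
}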
Thanks,
Kurt