[mpich-discuss] Only Run as root
Tanner L. Nelson
Tanner.Nelson at inl.gov
Mon Feb 17 11:42:46 CST 2025
Good morning,
We have a small cluster in an isolated environment. Our lead has spearheaded a reinstall to get away from Bright Cluster Manager and CentOS. This is the first time our project has done anything like this, and we have hit a snag that we believe is MPI-related.
As root:
* Job runs as expected (verified with 2 nodes)
* Running mpirun directly with a host file gives the same result
As user account:
* Single node works
* With 2 nodes, the job hangs with no error output that we can find
To us and our colleagues at our main Computing Center, this looks like a permissions or authentication problem. But everything we know to check has been checked several times (e.g., the user can ssh to nodes 001-002, the same nodes where the job succeeded as root and hung as the user; permissions on the MPI directory are 755, and we even tried 777 just to rule it out).
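For anyone wanting to reproduce the comparison, here is a minimal Python sketch of the checks described above, meant to be run once as root and once as the user account. The helper names (`ssh_check_cmd`, `mpi_check_cmd`, `run_with_timeout`, `report`) are hypothetical, the node names are whatever is passed on the command line, and the `-hosts` option assumes MPICH's default Hydra launcher. The timeout only turns a silent hang into a visible message; it does not diagnose the cause.

```python
import getpass
import subprocess
import sys


def ssh_check_cmd(node):
    # BatchMode=yes makes ssh fail instead of prompting for a password,
    # so a missing key or host-key prompt shows up as an error.
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5",
            node, "hostname"]


def mpi_check_cmd(nodes):
    # Launch a non-MPI command (hostname) across the nodes; this exercises
    # only the launcher, not the MPI library inside the application.
    # Assumes the Hydra mpiexec shipped with MPICH.
    return ["mpiexec", "-n", str(len(nodes)), "-hosts", ",".join(nodes),
            "hostname"]


def run_with_timeout(cmd, seconds=30):
    # Returns (returncode, stdout, stderr); returncode is None when the
    # command hung past the timeout or could not be started.
    try:
        proc = subprocess.run(cmd, capture_output=True, text=True,
                              timeout=seconds)
        return proc.returncode, proc.stdout.strip(), proc.stderr.strip()
    except subprocess.TimeoutExpired:
        return None, "", "timed out (possible hang)"
    except FileNotFoundError:
        return None, "", "command not found: " + cmd[0]


def report(nodes):
    print("running as:", getpass.getuser())
    for node in nodes:
        print(node, "ssh:", run_with_timeout(ssh_check_cmd(node)))
    print("mpiexec:", run_with_timeout(mpi_check_cmd(nodes)))


if __name__ == "__main__" and len(sys.argv) > 1:
    # Node names are taken from the command line, nothing is hard-coded.
    report(sys.argv[1:])
```

Saved under a name of your choosing and invoked with the node names as arguments, it prints one line per ssh check and one for the mpiexec launch, so the root and user outputs can be compared side by side.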
Details I believe may be asked:
* RHEL 9
* MPICH 4.2.3, compiled with GCC 11.5.0
* Results are the same whether the job is launched through the scheduler or with mpirun directly
* Tested HPL and our primary application, both built against the same MPICH version
Hoping we can get some suggestions. Thanks in advance!
Respectfully,
Tanner Nelson