[mpich-discuss] Only Run as root

Tanner L. Nelson Tanner.Nelson at inl.gov
Mon Feb 17 11:42:46 CST 2025


Good morning,

We have a small cluster in an isolated environment. Our lead has spearheaded a reinstall to get away from Bright Cluster Manager and CentOS. This is the first time our project has done anything like this, and we have hit a snag that we believe is MPI-related.

As root:

* Job runs as expected (verified with 2 nodes)
* mpirun with a host file: same result

As a regular user account:

* Single node: works
* Two nodes: job hangs, with no error output that we can find

To us, and to our colleagues at our main Computing Center, this looks related to permissions or authentication. But everything we know to check has been checked several times. For example, the user can ssh to nodes 001 and 002, the same nodes where root's run succeeded and the user's did not; permissions on the MPI directory are 755, and we even tried 777 just to rule that out.

Details I expect may be asked:

* RHEL 9
* MPICH 4.2.3, compiled with GCC 11.5.0
* Same results whether launched through the scheduler or directly with mpirun
* Tested HPL and our primary application, both built against the same MPICH version


Hoping we can get some suggestions. Thanks in advance!

Respectfully,

Tanner Nelson




