[mpich-discuss] Only Run as root
Raffenetti, Ken
raffenet at anl.gov
Mon Feb 17 11:52:28 CST 2025
Can you add -v to your mpiexec command? It will enable verbose output and hopefully give some clues to where things are getting stuck.
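For reference, here is a minimal sketch of what that could look like. The host file name (hosts.txt), the process count, and the use of hostname as a trivial test program are assumptions for illustration, not details from your setup; substitute your own. It only runs the command if mpiexec and the host file are found, and saves the verbose output to a log you can post back to the list.

```shell
#!/bin/sh
# Sketch only: hosts.txt, the process count, and the test program
# (hostname) are illustrative assumptions, not taken from the report.

# Print the verbose mpiexec command line for a host file and process count.
build_cmd() {
    hostfile=$1
    nprocs=$2
    # -v: verbose launcher output; -f: host file; -n: number of processes.
    printf 'mpiexec -v -f %s -n %s hostname\n' "$hostfile" "$nprocs"
}

cmd=$(build_cmd hosts.txt 2)
echo "Command: $cmd"

# Run only when mpiexec is on PATH and the host file exists,
# keeping a copy of the verbose output in mpiexec-verbose.log.
if command -v mpiexec >/dev/null 2>&1 && [ -f hosts.txt ]; then
    $cmd 2>&1 | tee mpiexec-verbose.log
fi
```

Running this as the affected user rather than as root is the case of interest, since that is where the hang shows up.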
Ken
From: Tanner L. Nelson via discuss <discuss at mpich.org>
Date: Monday, February 17, 2025 at 11:43 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Tanner L. Nelson <Tanner.Nelson at inl.gov>
Subject: Re: [mpich-discuss] Only Run as root
Good morning,
We have a small cluster in an isolated environment. Our lead has spearheaded a reinstall to get away from Bright Cluster Manager and CentOS. This is the first time our project has done anything like this, and we have hit a snag that we believe is MPI-related.
As root:
· Job runs as expected (verified with 2 nodes)
· mpirun with host file, same result
As user account:
· Single node works
· 2 nodes, job hangs with no error output that we can find
To us, and to our colleagues at our main Computing Center, it seems related to permissions or authentication. But everything we know to check has been checked several times (e.g., the user can ssh to nodes 001-002, the same nodes where the job succeeded as root but not as the user; permissions on the MPI directory are 755, and we even tried 777 just to rule it out).
Details I believe may be asked:
· RHEL9
· MPICH 4.2.3, compiled with GCC 11.5.0
· Consistent results with scheduler and mpirun
· Tested HPL and our primary application, both built against the same mpich version
Hoping we can get some suggestions. Thanks in advance!
Respectfully,
Tanner Nelson