[mpich-discuss] Only Run as root

Raffenetti, Ken raffenet at anl.gov
Mon Feb 17 11:52:28 CST 2025


Can you add -v to your mpiexec command? It will enable verbose output and hopefully give some clues to where things are getting stuck.
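
As a rough sketch of that suggestion, a small Python wrapper can run a trivial command under `mpiexec -v` with a timeout, so that a hang still leaves the verbose output to inspect. The host file name `hosts`, the process count, the `hostname` test program, and the 60-second timeout are assumptions for illustration, not details from this thread.

```python
import shutil
import subprocess


def build_mpiexec_cmd(hostfile="hosts", nprocs=2, program=("hostname",)):
    """Build a verbose mpiexec command line.

    The host file name and the test program are hypothetical placeholders.
    """
    return ["mpiexec", "-v", "-f", hostfile, "-n", str(nprocs), *program]


def run_verbose(cmd, timeout_s=60):
    """Run cmd and return (returncode, output); returncode is None on
    timeout or when the executable is not on PATH."""
    if shutil.which(cmd[0]) is None:
        return None, f"{cmd[0]} not found on PATH"
    try:
        proc = subprocess.run(
            cmd,
            stdout=subprocess.PIPE,
            stderr=subprocess.STDOUT,  # keep verbose output in one stream
            text=True,
            timeout=timeout_s,
        )
        return proc.returncode, proc.stdout
    except subprocess.TimeoutExpired as exc:
        # Partial output may come back as bytes even in text mode.
        out = exc.stdout or ""
        if isinstance(out, bytes):
            out = out.decode(errors="replace")
        return None, out


if __name__ == "__main__":
    code, output = run_verbose(build_mpiexec_cmd())
    print("return code:", code)
    print(output)
```

Running this once as root and once as the user account, then comparing the two outputs, may show where the user run stops making progress.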

Ken

From: Tanner L. Nelson via discuss <discuss at mpich.org>
Date: Monday, February 17, 2025 at 11:43 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Tanner L. Nelson <Tanner.Nelson at inl.gov>
Subject: Re: [mpich-discuss] Only Run as root

Good morning,

We have a small cluster in an isolated environment. Our lead has spearheaded a reinstall to get away from Bright Cluster Manager and CentOS. This is the first time our project has done anything like this, and we have hit a snag that we believe is MPI-related.

As root:

- Job runs as expected (verified with 2 nodes)
- mpirun with a host file gives the same result

As user account:

- Single node works
- With 2 nodes, the job hangs with no error output that we can find

To us, and to our colleagues at our main Computing Center, it seems related to permissions or authentication. But everything we know to check has been checked several times (e.g., the user can ssh to nodes 001-002, the same nodes where the job succeeded as root but not as the user; permissions on the MPI directory are 755, and we even tried 777 just to rule it out).
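
One way to make the ssh check repeatable is to test it non-interactively, since a process launcher that uses ssh cannot answer a password or host-key prompt. The sketch below is an illustration under assumptions, not a diagnosis: the host names `node001` and `node002` are hypothetical, and it only confirms that `ssh` succeeds in batch mode for the account running it.

```python
import subprocess


def build_ssh_check(host):
    """Build an ssh command that fails instead of prompting."""
    return ["ssh", "-o", "BatchMode=yes", "-o", "ConnectTimeout=5", host, "true"]


def check_hosts(hosts, runner=subprocess.run):
    """Return {host: True/False} for non-interactive ssh reachability.

    `runner` is injectable so the function can be exercised without a network.
    """
    results = {}
    for host in hosts:
        try:
            proc = runner(
                build_ssh_check(host),
                capture_output=True,
                text=True,
                timeout=15,
            )
            results[host] = proc.returncode == 0
        except (subprocess.TimeoutExpired, OSError):
            results[host] = False
    return results


if __name__ == "__main__":
    # Hypothetical host names; substitute the entries from the host file.
    for host, ok in check_hosts(["node001", "node002"]).items():
        print(host, "ok" if ok else "FAILED")
```

Running it as the user from each node toward the other covers both directions, which a single manual login from the head node does not.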

Details I believe may be asked:

- RHEL 9
- MPICH 4.2.3 was compiled with GCC 11.5.0
- Results are consistent whether the job is launched through the scheduler or with mpirun
- Tested HPL and our primary application, both built against the same MPICH version

Hoping we can get some suggestions. Thanks in advance!


Respectfully,
Tanner Nelson
