[mpich-discuss] Running issue on CENTOS Cluster with SLURM
Yaser Afshar
ya.afshar at gmail.com
Mon Apr 20 09:03:02 CDT 2020
Hi,
System:
- AMD cluster (AMD Opteron(tm) Processor 6344) with InfiniPath_QLE7240
- CentOS Linux release 7.7.1908 (Core)
Scheduler:
- slurm
MPICH version:
3.3.2
MPICH configuration:
./configure --prefix=/home/yaser/bin/mpich/3.3.2/intel CC=icc CXX=icpc FC=ifort \
    --with-hwloc=/opt/HWLOC/2.2.0 --with-ucx=/opt/UCX/1.8.0 \
    --with-knem=/opt/KNEM/1.1.3 --with-device=ch4:ucx --enable-mpi-cxx \
    --enable-mpi1-compatibility --enable-threads=multiple --with-pmi --with-pm
I am testing with the OSU micro-benchmarks
(http://mvapich.cse.ohio-state.edu/benchmarks).
Running the job with `srun` succeeds without any problem:
```
srun --mpi=pmi2 ./osu_bw
```
When I use `mpiexec` or `mpirun`, it fails with:
```
[proxy:0:1@pd-compute-3-40.local] HYDU_sock_connect (utils/sock/sock.c:145): unable to connect from "pd-compute-3-40.local" to "pd-compute-1-6.local" (Connection refused)
[proxy:0:1@pd-compute-3-40.local] main (pm/pmiserv/pmip.c:183): unable to connect to server pd-compute-1-6.local at port 42216 (check for firewalls!)
srun: error: pd-compute-3-40: task 1: Exited with exit code 5
```
The firewall is off, so that should not be the cause:
```
> systemctl status firewalld
● firewalld.service
Loaded: masked (/dev/null; bad)
Active: inactive (dead)
```
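Since `firewalld` being inactive does not by itself prove that one node can reach the other over TCP, a direct connection test may help narrow this down. The following is only a minimal sketch: the helper name `can_connect` is hypothetical, and it assumes Python 3 is available on the compute nodes. The host name and port are the ones from the log above. The port is presumably chosen anew for each run, so the test is only meaningful while some process is actually listening on that port on the target node.

```python
import socket
import sys


def can_connect(host: str, port: int, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host name and opens a TCP connection;
        # the context manager closes the socket again right away.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers "Connection refused", timeouts, and name-resolution failures.
        return False


if __name__ == "__main__" and len(sys.argv) == 3:
    # Hypothetical usage: python3 check_port.py pd-compute-1-6.local 42216
    target_host, target_port = sys.argv[1], int(sys.argv[2])
    reachable = can_connect(target_host, target_port)
    print(f"{target_host}:{target_port} reachable: {reachable}")
```

Run from pd-compute-3-40 against pd-compute-1-6, a `False` result while a listener is known to be up on that port would point to the network path or name resolution rather than to MPICH itself.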
I could not find any hint in the MPICH FAQ or anything useful elsewhere.
Could you help me resolve this issue?
Many thanks.