<div dir="ltr"><div>Hi,</div><div><br></div><div><div>System: <br></div><div>- AMD cluster (AMD Opteron(tm) Processor 6344) with InfiniPath_QLE7240</div><div>- CentOS Linux release 7.7.1908 (Core)</div><div><br></div><div>Scheduler: <br></div><div>- Slurm <br></div><div><br></div><div>MPICH version:</div><div>3.3.2</div><div><br></div><div>MPICH configuration:</div><div>./configure --prefix=/home/yaser/bin/mpich/3.3.2/intel
CC=icc CXX=icpc FC=ifort --with-hwloc=/opt/HWLOC/2.2.0
--with-ucx=/opt/UCX/1.8.0 --with-knem=/opt/KNEM/1.1.3
--with-device=ch4:ucx --enable-mpi-cxx --enable-mpi1-compatibility
--enable-threads=multiple --with-pmi --with-pm</div><div><br></div><div>I am testing with the OSU Micro-Benchmarks (<a href="http://mvapich.cse.ohio-state.edu/benchmarks" target="_blank">http://mvapich.cse.ohio-state.edu/benchmarks</a>).</div><div><br></div><div>Running the job with `srun` succeeds without any problem:</div><div>```<br></div><div>srun --mpi=pmi2 ./osu_bw</div><div>```</div><div><br></div><div>When I use `mpiexec` or `mpirun`, it fails with:</div><div>```<br></div><div>[proxy:0:1@pd-compute-3-40.local]
HYDU_sock_connect (utils/sock/sock.c:145): unable to connect from
"pd-compute-3-40.local" to "pd-compute-1-6.local" (Connection refused)<br>[proxy:0:1@pd-compute-3-40.local] main (pm/pmiserv/pmip.c:183): unable to connect to server pd-compute-1-6.local at port 42216 (check for firewalls!)<br>srun: error: pd-compute-3-40: task 1: Exited with exit code 5</div><div>```</div><div><br></div><div>The firewall is disabled, so that is not the cause:</div><div><br></div><div>```</div><div>> systemctl status firewalld</div><div>● firewalld.service<br> Loaded: masked (/dev/null; bad)<br> Active: inactive (dead)</div><div>```</div><div><br></div><div>I could not find any hint in the MPICH FAQ or anything useful elsewhere. <br></div><div>Could you help me resolve this issue? <br></div><div><br></div><div>Many thanks. </div></div></div>