[mpich-discuss] MPICH configure

Palmer, Bruce J Bruce.Palmer at pnnl.gov
Thu Apr 16 17:47:50 CDT 2020


I’ve been building MPICH on our aging InfiniBand cluster using the following formula:

./configure --prefix=/people/d3g293/mpich/mpich-3.3.2/install --with-device=ch4:ofi:sockets --with-libfabric=embedded --enable-threads=multiple --with-slurm CC=gcc CXX=g++

It’s been working pretty well, but I recently tried to build mpich-3.3.2 and mpich-3.4a2 and, although the build seems to work okay, I’m having problems actually running anything. If I run on 2 nodes, the code seems to hang in MPI_Init and appears to produce the following error message:

[proxy:0:1@node013.local] HYDU_sock_connect (utils/sock/sock.c:145): unable to connect from "node013.local" to "node012.local" (Connection refused)

[proxy:0:1@node013.local] main (pm/pmiserv/pmip.c:183): unable to connect to server node012.local at port 37769 (check for firewalls!)

srun: error: node013: task 1: Exited with exit code 5

If I run on a single node, things seem to work. Any idea what is going on here? I’ve got a working build of mpich-3.3, so things were okay up until recently. Has something changed in MPICH so that my configure line is no longer valid, or is this more likely to be due to some system modification?
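
One way to separate an MPICH change from a system change is to check plain TCP connectivity between the two nodes outside of MPI. The following is only a minimal sketch in Python, not anything taken from MPICH or Hydra: the helper names open_listener and can_connect are made up for illustration, and it assumes Python 3 is available on the compute nodes. The port in the log above (37769) is presumably chosen per run, so the sketch opens its own listener instead of probing that number.

```python
import socket


def open_listener(host="", port=0):
    """Open a listening TCP socket (hypothetical helper).

    port=0 asks the OS for a free ephemeral port, roughly the kind of
    port a launcher would listen on. Returns (socket, chosen_port).
    """
    srv = socket.socket(socket.AF_INET, socket.SOCK_STREAM)
    srv.setsockopt(socket.SOL_SOCKET, socket.SO_REUSEADDR, 1)
    srv.bind((host, port))
    srv.listen(1)
    return srv, srv.getsockname()[1]


def can_connect(host, port, timeout=3.0):
    """Return True if a TCP connection to (host, port) succeeds (hypothetical helper).

    A refused, filtered, or timed-out connection returns False.
    """
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False


if __name__ == "__main__":
    # Loopback self-check only; for the real test, run open_listener() on
    # one node and can_connect() from the other, as described below.
    srv, port = open_listener("127.0.0.1")
    print("loopback connect ok:", can_connect("127.0.0.1", port))
    srv.close()
```

To use it across nodes, call open_listener() in an interactive session on node012 and note the port it returns, then call can_connect("node012.local", port) from node013. If that also fails, the cause is more likely a firewall or name-resolution change on the system than anything in the MPICH build.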

Bruce Palmer
Senior Research Scientist
Pacific Northwest National Laboratory
Richland, WA 99352
(509) 375-3899
