[mpich-discuss] MPICH configure

Zhou, Hui zhouh at anl.gov
Mon Apr 20 12:24:16 CDT 2020

The error is from `hydra`, which should not have changed much between the versions. Could you verify that 3.3.1 still works for you?
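
Since the message reports "Connection refused" and suggests checking for firewalls, it may also be worth ruling out a plain TCP problem between the two nodes. Below is a minimal sketch, not an MPICH or Hydra tool: `check_port` is a hypothetical helper name, and it assumes bash (for `/dev/tcp` redirection) and the coreutils `timeout` command are available on the compute nodes.

```shell
#!/usr/bin/env bash
# Hypothetical helper (not part of MPICH or Hydra): report whether a TCP
# connection to HOST:PORT can be opened within 3 seconds.
# Assumes bash's /dev/tcp redirection and the coreutils `timeout` command.
check_port() {
  local host="$1" port="$2"
  if timeout 3 bash -c "exec 3<>/dev/tcp/${host}/${port}" 2>/dev/null; then
    echo "open"
  else
    echo "closed"
  fi
}

# Example (run on node013 while some test listener is active on node012;
# port 40000 is an arbitrary choice for illustration):
# check_port node012.local 40000
```

Hydra appears to pick a different port on each launch, so probing 37769 after the fact will not tell you much. Instead, start any test listener on node012 on a high port (for instance `python3 -m http.server 40000`, assuming Python 3 is installed) and run the check from node013. Running `getent hosts node012.local` on node013 would also show which address the name resolves to there.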

Hui Zhou

From: "Palmer, Bruce J via discuss" <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Thursday, April 16, 2020 at 5:48 PM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: "Palmer, Bruce J" <Bruce.Palmer at pnnl.gov>
Subject: [mpich-discuss] MPICH configure


I’ve been building MPICH on our aging InfiniBand cluster using the following formula:

./configure --prefix=/people/d3g293/mpich/mpich-3.3.2/install --with-device=ch4:ofi:sockets --with-libfabric=embedded --enable-threads=multiple --with-slurm CC=gcc CXX=g++

It’s been working pretty well, but I recently tried to build mpich-3.3.2 and mpich-3.4a2, and although the build seems to work okay, I’m having problems actually running anything. If I run on 2 nodes, the code seems to hang in MPI_Init and appears to produce the following error message:

[proxy:0:1@node013.local] HYDU_sock_connect (utils/sock/sock.c:145): unable to connect from "node013.local" to "node012.local" (Connection refused)
[proxy:0:1@node013.local] main (pm/pmiserv/pmip.c:183): unable to connect to server node012.local at port 37769 (check for firewalls!)
srun: error: node013: task 1: Exited with exit code 5

If I run on a single node, things seem to work. Any idea what is going on here? I’ve got a working build of mpich-3.3, so things were okay up until recently. Has something in MPICH changed so that my configuration formula is no longer valid, or is this more likely due to some system modification?

Bruce Palmer
Senior Research Scientist
Pacific Northwest National Laboratory
Richland, WA 99352
(509) 375-3899
