[mpich-discuss] A few inconveniences with MPICH 3.3 build

Martin Cuma martin.cuma at utah.edu
Tue Feb 12 15:36:03 CST 2019


Hi everyone,

I was wondering if someone could give me some feedback on a couple of quirks 
I found in MPICH 3.3 that I did not observe in 3.2.1.

1. If CUDA is found on the machine where MPICH is built, 
/lib64/libXNVCtrl.so.0 gets linked into the MPICH library (libmpi.so). I 
tried the --without-x and --with-x=no options, but neither affects this. 
If I build on a machine that does not have CUDA, /lib64/libXNVCtrl.so.0 is 
not pulled in and the library seems to work the same (on typical MPI 
applications).
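
For reference, the stray dependency is easy to spot with ldd or readelf 
(the install path below is just an example, adjust to your prefix):

   # check whether libmpi.so picked up libXNVCtrl as a NEEDED dependency
   ldd /path/to/mpich/lib/libmpi.so | grep -i nvctrl
   readelf -d /path/to/mpich/lib/libmpi.so | grep -i nvctrl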

This is annoying if we want to have a common MPICH build for machines with 
and without CUDA installed, e.g. compute nodes with and without GPUs.

2. It seems that MPICH now requires the --with-slurm option to pick up the 
SLURM hostlist (i.e., to run without the -machinefile option). This was not 
the case with 3.2.1, where MPICH picked up the SLURM hostlist without being 
built with --with-slurm. The path to SLURM also gets encoded in the RPATH 
of mpirun and friends; I'd rather control that myself if possible.
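
The baked-in path shows up in the dynamic section of the launcher (path is 
again just an example, and this assumes mpirun is the hydra ELF binary 
rather than a wrapper script):

   # show the RPATH/RUNPATH entries encoded into mpirun
   readelf -d /path/to/mpich/bin/mpirun | grep -E 'RPATH|RUNPATH'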

This is an inconvenience for us as well, since we have different clusters 
with SLURM installed on different file systems/paths, and I'd also like to 
use the same MPICH build on desktops that don't have SLURM at all. Since we 
tend to run the same SLURM version on all the clusters, I could put $ORIGIN 
in the RPATH and copy libslurm.so into the MPICH bin directory (sketched 
below), but that's still clunky.
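
A minimal sketch of that workaround, assuming patchelf is available and 
that mpiexec.hydra is the actual ELF binary (paths and the libslurm 
soname are placeholders):

   # copy the common libslurm next to the launcher
   cp /path/to/slurm/lib/libslurm.so.<ver> /path/to/mpich/bin/
   # replace the build host's SLURM path with $ORIGIN, keeping ../lib so 
   # MPICH's own libraries are still found at run time
   patchelf --set-rpath '$ORIGIN:$ORIGIN/../lib' /path/to/mpich/bin/mpiexec.hydra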

Again, I'd appreciate it if someone could shed some light on this or 
suggest some good workarounds.

Thanks,
MC

-- 
Martin Cuma
Center for High Performance Computing
Department of Geology and Geophysics
University of Utah


