[mpich-discuss] A few inconveniences with MPICH 3.3 build

Brice Goglin Brice.Goglin at inria.fr
Tue Feb 12 15:43:34 CST 2019


Hello

This might be caused by hwloc. hwloc links with libXNVCtrl when
available (because some people want hwloc to expose the locality of
NVIDIA X11 displays). Open MPI disables some hwloc backends by passing
things like enable_gl=no before invoking hwloc's configury:

https://github.com/open-mpi/ompi/blob/master/opal/mca/hwloc/hwloc201/configure.m4#L95

enable_gl=no is the one that matters here, but the other backends could
be disabled in MPICH too, unless MPICH explicitly uses the
corresponding hwloc objects.
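A sketch of what that could look like when building MPICH with its embedded hwloc: configure treats VAR=VALUE arguments as shell variable assignments, which is how Open MPI pre-seeds hwloc's enable_gl. Whether MPICH forwards these to hwloc's configury the same way is an assumption here, and the prefix path is just an example; verify afterwards by inspecting the library's DT_NEEDED entries.

```shell
# Sketch (assumes MPICH forwards variables to hwloc's embedded configure,
# as Open MPI does): pre-seed enable_gl=no so hwloc skips the GL backend
# and libmpi.so does not pick up a libXNVCtrl dependency.
./configure --prefix=/opt/mpich-3.3 enable_gl=no
make -j8 && make install

# Verify: libXNVCtrl should no longer appear among the needed libraries.
readelf -d /opt/mpich-3.3/lib/libmpi.so | grep NEEDED
```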

Brice



On 12/02/2019 at 22:36, Martin Cuma via discuss wrote:
> Hi everyone,
>
> I was wondering if someone could give me some feedback on a couple of
> quirks I found with MPICH 3.3 that I did not observe with 3.2.1.
>
> 1. If CUDA is found on a machine where MPICH is built,
> /lib64/libXNVCtrl.so.0 gets included in the MPICH library (libmpi.so).
> I tried the --without-x and --with-x=no options, but they do not
> affect this. If I build on a machine that does not have CUDA,
> /lib64/libXNVCtrl.so.0 is not included and the library seems to work
> the same (on typical MPI applications).
>
> This is annoying if we want to have a common MPICH for machines with
> and without CUDA installed, e.g. compute nodes with and without GPUs.
>
> 2. It seems that MPICH now requires the --with-slurm option to pick up
> the SLURM hostlist (i.e., to run without the -machinefile option).
> This was not the case with 3.2.1, where MPICH picked up the SLURM
> hostlist without needing to be built with --with-slurm. The path to
> SLURM also gets encoded in the RPATH of mpirun and friends. I'd rather
> control that myself if possible.
>
> This is an inconvenience for us as well, since we have different
> clusters with SLURM installed on different file systems/paths, and I'd
> also like to use the same MPICH build on desktops that don't have
> SLURM at all. I could put $ORIGIN in the RPATH and copy libslurm.so
> into the MPICH bin directory, since we tend to run the same SLURM
> version on all the clusters, but, still, it's clunky.
>
> Again, I'd appreciate it if someone could shed some light on this or
> suggest some good workarounds.
>
> Thanks,
> MC
>
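The $ORIGIN workaround described in point 2 can be applied after the build, for example with patchelf (the tool choice and all paths here are assumptions, not from the thread; chrpath can handle simple cases too):

```shell
# Sketch of the $ORIGIN workaround: point mpirun's RPATH at its own
# directory so a libslurm.so copied next to it is found at runtime.
# Adjust paths to your MPICH install and SLURM library location.
BIN=/opt/mpich-3.3/bin
cp /usr/lib64/libslurm.so "$BIN/"             # same SLURM version on all clusters
patchelf --set-rpath '$ORIGIN' "$BIN/mpirun"  # replaces the baked-in SLURM path
patchelf --print-rpath "$BIN/mpirun"          # verify the new RPATH
```

On desktops without SLURM this leaves mpirun usable as long as libslurm.so is resolvable or unneeded at runtime; the clunkiness the poster mentions (shipping libslurm.so alongside the binaries) remains.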


