[mpich-discuss] libmpi size unusually large when building with CUDA support

Lucas Zacchi de Medeiros lucasd at cadence.com
Mon Sep 18 09:29:06 CDT 2023


Thanks for your answer, Jeff.

I’m going to double-check the ROCm 5 build later on and get back with the results, but for now the build commands used for the CUDA version are:

./configure --prefix=${install_dir}/ucx_1.14.1 --without-bfd --with-cuda=nvidia/Linux_x86_64/21.2/cuda/11.2/ --without-knem --without-rocm --enable-gtest --enable-examples
make -j16
make install
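
As a sanity check on our side, something along these lines should confirm whether that UCX build actually picked up CUDA (just a sketch; it assumes the ucx_info binary from this install is the one being run):

$ ${install_dir}/ucx_1.14.1/bin/ucx_info -v            # prints the UCX version and the configure options it was built with
$ ${install_dir}/ucx_1.14.1/bin/ucx_info -d | grep -i cuda   # CUDA transports (cuda_copy, cuda_ipc) should show up if support is in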


I see your comment on GitHub (https://github.com/pmodels/mpich/issues/6675) as well; I’ll change the value of the yaksa-depth option and check the results. Thank you for your insight on that.

Best,
Lucas.

From: Jeff Hammond <jeff.science at gmail.com>
Sent: Monday, 18 September 2023 13:26
To: discuss at mpich.org
Cc: Lucas Zacchi de Medeiros <lucasd at cadence.com>
Subject: Re: [mpich-discuss] libmpi size unusually large when building with CUDA support

If the CPU-only and ROCm builds are the same size, I wonder if ROCm support was compiled in at all.  How did you verify that?
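
For example, something along these lines (just a sketch, using the library paths from your du output) should show whether the ROCm runtime was actually linked into libmpi:

$ ldd LINUX_gcc9.3_glibc2.28_rocm5.4.3_ucx1.14.1/lib/libmpi.so.12.3.0 | grep -i -E 'amdhip|hsa'
# a ROCm-enabled build should pull in libamdhip64 / libhsa-runtime64 here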

Can you provide the full build commands for each so I can reproduce these?

You might see if -yaksa-depth=1 (or 0 perhaps) changes the results.  There is a performance tradeoff, but if your applications don't use interesting noncontiguous datatypes, it shouldn't matter.

Jeff

On Fri, Sep 15, 2023 at 5:53 PM Lucas Zacchi de Medeiros via discuss <discuss at mpich.org> wrote:
I work on a project that supports different architectures, so we've built 3 separate versions of MPICH 4.1.1 (CUDA 11.2, ROCm 5.4.3, and CPU-only, with UCX 1.14.1).
The CPU-only and the ROCm 5 libraries are around 50 MB each. The CUDA version, on the other hand, is close to 1.8 GB! There doesn’t seem to be anything out of the ordinary with the builds, and all versions are working as expected.
This is the output of du -sh performed on the install directories:
First the rocm5 build:
$ du -sh LINUX_gcc9.3_glibc2.28_rocm5.4.3_ucx1.14.1/lib/*
(...)
53M      LINUX_gcc9.3_glibc2.28_rocm5.4.3_ucx1.14.1/lib/libmpi.a
40M      LINUX_gcc9.3_glibc2.28_rocm5.4.3_ucx1.14.1/lib/libmpi.so.12.3.0
And then the CUDA build:
$ du -sh LINUX_gcc9.3_glibc2.17_cuda11.2_ucx1.14.1/lib/*
(...)
1.8G     LINUX_gcc9.3_glibc2.17_cuda11.2_ucx1.14.1/lib/libmpi.a
1.7G     LINUX_gcc9.3_glibc2.17_cuda11.2_ucx1.14.1/lib/libmpi.so.12.3.0


This issue really complicates packaging and distribution: since we provide both the archive and the shared libraries, I am looking at more than 3 GB for MPICH alone.
After some investigation, the problem doesn’t seem to be coming from our end. Is it possible that something in MPICH’s build process is causing this excessive file size?
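
For reference, this is roughly how I have been looking at where the bytes go in the CUDA build (a sketch, assuming GNU binutils; paths as in the du output above):

$ ar -tv LINUX_gcc9.3_glibc2.17_cuda11.2_ucx1.14.1/lib/libmpi.a | sort -n -r -k3 | head
# lists the largest object files in the static archive by size
$ size -A -d LINUX_gcc9.3_glibc2.17_cuda11.2_ucx1.14.1/lib/libmpi.so.12.3.0 | sort -n -r -k2 | head
# shows which ELF sections dominate the shared library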

Thanks for the help.

Kind regards,
Lucas

_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss


--
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

