[mpich-discuss] Recommended configure options for MPICH 4.3.x with Valgrind or address-sanitizer
Eric Chamberland
Eric.Chamberland at giref.ulaval.ca
Thu Oct 2 08:33:20 CDT 2025
Hi,
I have been building MPICH with the following configure options for a
long time, mainly to keep my code “Valgrind-clean”:
===
./configure \
--enable-g=dbg,meminit \
--with-device=ch3:sock \
--enable-romio
===
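For context, this is roughly how I run my tests under Valgrind
(simplified; "./my_test" is just a stand-in for our actual test binaries):
===
# 2 ranks, each running under Valgrind
mpiexec -n 2 valgrind --leak-check=full --track-origins=yes ./my_test
===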
This setup has worked reasonably well in the past, but recently I have
been seeing occasional errors with AddressSanitizer or Valgrind (with
4.3.0 on a single node), such as:
===
Fatal error in internal_Allreduce_c: Unknown error class, error stack:
internal_Allreduce_c(347)...................:
MPI_Allreduce_c(sendbuf=0x7ffdeb0b8e90, recvbuf=0x7ffdeb0b8e98, count=1,
dtype=0x4c00083a, MPI_SUM, comm=0x84000003) failed
MPIR_Allreduce_impl(4826)...................:
MPIR_Allreduce_allcomm_auto(4732)...........:
MPIR_Allreduce_intra_recursive_doubling(115):
MPIC_Sendrecv(266)..........................:
MPIC_Wait(90)...............................:
MPIR_Wait(751)..............................:
MPIR_Wait_state(708)........................:
MPIDI_CH3i_Progress_wait(187)...............: an error occurred while
handling an event returned by MPIDI_CH3I_Sock_Wait()
MPIDI_CH3I_Progress_handle_sock_event(385)..:
MPIDI_CH3I_Socki_handle_read(3647)..........: connection failure
(set=0,sock=1,errno=104:Connection reset by peer)
===
Is CH3 considered legacy?
I would also like to ask:
1. What are the recommended configure options in 2025 for building
MPICH in a way that works well with Valgrind?
2. Is it preferable now to move to CH4 (e.g. ch4:ofi or ch4:shm) when
debugging with Valgrind?
3. Are there any other options (besides --enable-g=dbg,meminit) that
you would suggest for catching memory errors while keeping Valgrind
reports as clean as possible?
4. Is
https://github.com/pmodels/mpich/blob/main/doc/wiki/design/Support_for_Debugging_Memory_Allocation.md
up-to-date?
Any guidance on the “best practice” configuration for this use case
would be greatly appreciated.
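In case it helps as a starting point, this is the kind of CH4 build I was
considering trying (only a guess on my part, reusing flags from my current
build and from configurations I have seen elsewhere; I do not know whether
this is the recommended combination):
===
./configure \
--with-device=ch4:ofi \
--enable-g=dbg,meminit \
--enable-fast=no \
--enable-error-messages=all \
--enable-romio
===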
The PETSc developers have some debug-related configure options
(https://gitlab.com/petsc/petsc/-/blob/main/config/BuildSystem/config/packages/MPICH.py#L94)
but still use CH3 by default. However, Satish uses the configuration
shown in his message quoted below, at least for the Valgrind CI.
Thanks a lot,
Eric
--
Eric Chamberland, ing., M. Ing
Professionnel de recherche
GIREF/Université Laval
On 2025-10-02 08:13, Balay, Satish wrote:
> We currently use:
>
> balay at petsc-gpu-02:~$
> /nfs/gce/projects/petsc/soft/u22.04/mpich-4.3.0-p2-ucx/bin/mpichversion
> MPICH Version: 4.3.0
> MPICH Release date: Mon Feb 3 09:09:47 AM CST 2025
> MPICH ABI: 17:0:5
> MPICH Device: ch4:ucx
> MPICH configure:
> --prefix=/nfs/gce/projects/petsc/soft/u22.04/mpich-4.3.0-p2-ucx
> --enable-shared --with-device=ch4:ucx --with-pm=hydra --enable-fast=no
> --enable-error-messages=all --enable-g=meminit --disable-java
> --without-hwloc --disable-opencl --without-cuda --without-hip
> MPICH CC: gcc -O0
> MPICH CXX: g++ -O0
> MPICH F77: gfortran -O0
> MPICH FC: gfortran -O0
> MPICH features: threadcomm
>
> With:
>
> #MPICH OFI/UCX/valgrind
> export FI_PROVIDER=^psm3
> export UCX_SYSV_HUGETLB_MODE=n
> export UCX_LOG_LEVEL=error
>
> Satish
> ------------------------------------------------------------------------