[mpich-discuss] Assertion Failure
Zhou, Hui
zhouh at anl.gov
Thu Jun 19 11:34:29 CDT 2025
Hi Chris,
Can you dump the value of *comm at the point of the assertion? I suspect the value is corrupted, which results in an invalid comm handle being passed to MPI_Comm_free.
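In case it helps, here is a minimal sketch of a helper for logging the value. It assumes MPI_Comm is a plain int handle, as it is in MPICH's mpi.h. The mask is copied from the assertion text. The function names and the output format are hypothetical, not part of MPICH, and the sketch does not interpret the bits above the mask; it only prints them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>

/* Mask copied from the assertion text: ((*comm) & (0x03ffffff)). */
#define HANDLE_LOW_MASK 0x03ffffffu

/* Low 26 bits of the handle: the quantity the assertion compares
 * against MPIR_COMM_N_BUILTIN. */
unsigned handle_low_bits(int handle)
{
    return (unsigned)handle & HANDLE_LOW_MASK;
}

/* Remaining top 6 bits, shifted down so they print compactly. */
unsigned handle_high_bits(int handle)
{
    return (unsigned)handle >> 26;
}

/* Format one log line for a handle value (hypothetical layout).
 * Returns the snprintf result, i.e. the length of the full line. */
int format_handle(char *buf, size_t len, int handle)
{
    assert(buf != NULL);
    return snprintf(buf, len, "comm=0x%08x high=0x%02x low=0x%07x",
                    (unsigned)handle,
                    handle_high_bits(handle),
                    handle_low_bits(handle));
}
```

Calling format_handle(buf, sizeof buf, (int)*comm) right before the MPI_Comm_free call and printing buf would show the raw handle together with the masked value. Since the build has -g, print/x *comm in the PMPI_Comm_free frame in gdb should give the same raw value.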
Hui
________________________________
From: Chris Hewson via discuss <discuss at mpich.org>
Sent: Wednesday, June 18, 2025 1:41 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Chris Hewson <chris at resfrac.com>
Subject: [mpich-discuss] Assertion Failure
Hi All,
I am having a consistent issue when using MPICH with PETSc. When calling another external library (MKL's PARDISO), I get the following assertion failure:
Assertion failed in file src/binding/c/c_binding.c at line 29448: ((*comm)&(0x03ffffff)) < MPIR_COMM_N_BUILTIN
I'm not really sure why. If I turn assertions off in MPICH, the program segfaults later on. The stack trace for the assertion failure looks like this:
Assertion failed in file src/binding/c/c_binding.c at line 29448: ((*comm)&(0x03ffffff)) < MPIR_COMM_N_BUILTIN
resfrac73bfaea(+0x1e992e0) [0x55b9357d52e0]
resfrac73bfaea(+0x1e07ca8) [0x55b935743ca8]
resfrac73bfaea(PMPI_Comm_free+0x580) [0x55b935642570]
resfrac73bfaea(MKLMPI_Comm_free+0x38) [0x55b9385830f8]
resfrac73bfaea(mkl_pds_lp64_reduce_rhs_real+0x4aa) [0x55b938edc2aa]
resfrac73bfaea(mkl_pds_lp64_slv_omp_real+0x1c46) [0x55b93869a4c6]
resfrac73bfaea(mkl_pds_lp64_solve_slave+0xd84) [0x55b935ba8514]
resfrac73bfaea(mkl_pds_lp64_cluster_sparse_solver+0xec4) [0x55b935ba4b74]
resfrac73bfaea(+0xdef3de) [0x55b93472b3de]
resfrac73bfaea(MatSolve+0x521) [0x55b9349f17ef]
resfrac73bfaea(+0x1a29db7) [0x55b935365db7]
resfrac73bfaea(PCApply+0x73c) [0x55b934c53065]
resfrac73bfaea(+0x19ee12d) [0x55b93532a12d]
resfrac73bfaea(+0x115a74c) [0x55b934a9674c]
resfrac73bfaea(KSPSolve+0x1b) [0x55b934a98d04]
resfrac73bfaea(_ZN11nextstim_ns18PetscSolveParallelERNS_17SystemOfEquationsE+0x28b7) [0x55b934046767]
resfrac73bfaea(_ZN11nextstim_ns21SlavePoolWaitAndSolveEv+0x5c7) [0x55b9341d5037]
resfrac73bfaea(main+0x5db) [0x55b933eb292b]
/lib/x86_64-linux-gnu/libc.so.6(+0x29d90) [0x7f14c99d5d90]
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x80) [0x7f14c99d5e40]
resfrac73bfaea(_start+0x25) [0x55b933ecb0d5]
The MPICH configuration and version that we're using:
MPICH Version: 4.3.0
MPICH Release date: Mon Feb 3 09:09:47 AM CST 2025
MPICH ABI: 17:0:5
MPICH Device: ch3:sock
MPICH configure: --prefix=/opt/anl/petsc-3.23.2 MAKE=/usr/bin/make --libdir=/opt/anl/petsc-3.23.2/lib CC=gcc CFLAGS=-fPIC -Wno-lto-type-mismatch -Wno-stringop-overflow -O0 AR=/usr/bin/ar ARFLAGS=cr CXX=g++ CXXFLAGS=-Wno-lto-type-mismatch -Wno-psabi -O0 -std=gnu++20 FFLAGS=-ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -O0 -fallow-argument-mismatch FC=gfortran F77=gfortran FCFLAGS=-ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -O0 -fallow-argument-mismatch --disable-shared --with-pm=hydra --disable-java --with-hwloc=embedded --enable-fast=no --enable-error-messages=all --with-device=ch3:sock --enable-g=meminit,dbg PYTHON=/opt/intel/oneapi/intelpython/python3.12/bin/python3 --disable-maintainer-mode --disable-dependency-tracking
MPICH CC: gcc -fPIC -Wno-lto-type-mismatch -Wno-stringop-overflow -O0 -g -O0
MPICH CXX: g++ -Wno-lto-type-mismatch -Wno-psabi -O0 -std=gnu++20 -g -O0
MPICH F77: gfortran -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -O0 -fallow-argument-mismatch -g -O0
MPICH FC: gfortran -ffree-line-length-none -ffree-line-length-0 -Wno-lto-type-mismatch -O0 -fallow-argument-mismatch -g -O0
Any information or help on this would be greatly appreciated.
Chris Hewson
Senior Reservoir Simulation Engineer
ResFrac
+1.587.575.9792