[mpich-discuss] MPI_Comm_free assertion failure

Zhou, Hui zhouh at anl.gov
Tue Oct 10 16:12:06 CDT 2023


In `../mpich-4.1.2/src/mpid/ch3/src/ch3u_request.c`, line 480 is:
```
MPIDI_Datatype_get_info(rreq->dev.user_count, rreq->dev.datatype,
                        dt_contig, userbuf_sz, dt_ptr, dt_true_lb);
```
That is a macro, and the assertion is inside the macro expansion, which is why you do not see an assertion on that line. Usually, this assertion indicates a wrong datatype handle. Could you dump the datatype handle as a hex value?
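
Something along these lines could be used to dump and sanity-check the handle. This is only a minimal sketch, not MPICH code: the helper names are hypothetical, and the bit layout (top two bits for the handle kind, next four bits for the object kind, with 3 meaning datatype) is my understanding of MPICH's internal handle encoding, which is not part of the MPI standard and could differ between versions.

```c
#include <stdio.h>

/* Assumption: these values mirror MPICH's internal handle layout.
 * They are implementation details, not part of the MPI standard. */
enum { SKETCH_KIND_SHIFT = 30, SKETCH_MPI_KIND_SHIFT = 26 };
enum { SKETCH_MPI_KIND_DATATYPE = 3 };

/* Top two bits: 0 = invalid, 1 = builtin, 2 = direct, 3 = indirect. */
static unsigned handle_kind(int handle)
{
    return ((unsigned) handle) >> SKETCH_KIND_SHIFT;
}

/* Next four bits: which kind of MPI object the handle names. */
static unsigned handle_mpi_kind(int handle)
{
    return (((unsigned) handle) & 0x3c000000u) >> SKETCH_MPI_KIND_SHIFT;
}

/* Hypothetical helper: returns 1 if the handle at least looks like a
 * datatype handle under the assumed layout, 0 otherwise. */
static int looks_like_datatype(int handle)
{
    return handle_kind(handle) != 0 &&
           handle_mpi_kind(handle) == SKETCH_MPI_KIND_DATATYPE;
}

/* Hypothetical helper: writes the handle into buf as 0x followed by
 * eight hex digits; returns the number of characters snprintf reports. */
static int format_handle_hex(int handle, char *buf, size_t len)
{
    return snprintf(buf, len, "0x%08x", (unsigned) handle);
}
```

In the application, the value to pass would be the `MPI_Datatype` given to the receive call, cast to `int` (this assumes an MPICH build, where `MPI_Datatype` is an integer handle). A value whose object-kind bits are not 3, or whose top two bits are 0, would point to a bad handle.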

--
Hui
________________________________
From: Mccall, Kurt E. (MSFC-EV41) via discuss <discuss at mpich.org>
Sent: Tuesday, October 10, 2023 11:05 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Mccall, Kurt E. (MSFC-EV41) <kurt.e.mccall at nasa.gov>
Subject: [mpich-discuss] MPI_Comm_free assertion failure


Before calling MPI_Finalize, I am calling MPI_Comm_free at both ends (processes) of an intercommunicator created by MPI_Intercomm_create, and am crashing with the following error in one of the processes:



Assertion failed in file ../mpich-4.1.2/src/mpid/ch3/src/ch3u_request.c at line 480:
/opt/mpich/lib/libmpi.so.12(+0x2d3bf3) [0x7f6d10e1fbf3]
/opt/mpich/lib/libmpi.so.12(+0x225124) [0x7f6d10d71124]
/opt/mpich/lib/libmpi.so.12(+0x33016) [0x7f6d10b7f016]
/opt/mpich/lib/libmpi.so.12(+0x29eada) [0x7f6d10deaada]
/opt/mpich/lib/libmpi.so.12(+0x1dfb7c) [0x7f6d10d2bb7c]
/opt/mpich/lib/libmpi.so.12(+0x1d78cd) [0x7f6d10d238cd]
/opt/mpich/lib/libmpi.so.12(PMPI_Comm_free+0xe7) [0x7f6d10bfe6b7]
/home/kmccall/Needles2/NeedlesMpiMM(_ZN7needles12IntercomList20handleFailedIntercomERSt17_Rb_tree_iteratorISt4pairIKiPNS_16IntercomListElemEEEiPKc+0xae) [0x4c348a]
/home/kmccall/Needles2/NeedlesMpiMM(_ZN7needles12IntercomList19disconnectIntercomsERilb+0xcf) [0x4c36fb]
/home/kmccall/Needles2/NeedlesMpiMM(_ZN7needles16NeedlesMpiMaster8finalizeEv+0x25f) [0x496e25]
/home/kmccall/Needles2/NeedlesMpiMM(_ZN7needles16NeedlesMpiMaster4loopEv+0x74e) [0x49355a]
/home/kmccall/Needles2/NeedlesMpiMM(main+0x4af) [0x48d1e5]
/lib64/libc.so.6(__libc_start_main+0xf3) [0x7f6d0f186ca3]
/home/kmccall/Needles2/NeedlesMpiMM(_start+0x2e) [0x48cc7e]
internal ABORT - process 0



I looked in ch3u_request.c and there seems to be no assertion at line 480. If I forgo the MPI_Comm_free in the first process, the same error then occurs in the second process. Does this give you any clue as to what I am doing wrong?



Thanks,

Kurt