[mpich-discuss] Intermittent hang in MPI_Finalize with PGI 20.1
Raffenetti, Kenneth J.
raffenet at mcs.anl.gov
Wed Jun 24 11:04:55 CDT 2020
Thanks for your report. We have not seen this issue with any compiler/OS combination in our nightly tests. We are using PGI 19.4 at this time. I will request 20.1 be installed so we can investigate further.
On 6/23/20, 8:26 AM, "Kent Cheung via discuss" <discuss at mpich.org> wrote:
I'm running into an issue where processes sometimes hang when calling MPI_Finalize. This happens with both versions 3.3.2 and 3.4a2 on a single-node RedHat 7.5 x86-64 machine when MPICH is compiled with PGI 20.1 using these configuration flags:
--enable-debug --enable-shared --enable-debuginfo --enable-sharedlib=gcc
If I change the default optimization level (-O2) by configuring with
as well, the hang doesn't occur. Another data point is that the hang does not occur with PGI 19.5 at either optimization level.
I have been testing with the cpi.c code in the examples folder built with just
mpiexec -n 3 ./a.out
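Because the hang is intermittent, re-running the command under a timeout is one way to estimate how often it occurs. The following is only a minimal sketch, assuming Python 3 is available on the node; the helper name `count_hangs`, the run count, and the 30-second timeout are illustrative choices, not anything provided by MPICH.

```python
import subprocess


def count_hangs(cmd, runs=20, timeout_s=30.0):
    """Run `cmd` repeatedly and count the runs that exceed `timeout_s` seconds.

    Hypothetical helper for illustration. A run that times out is treated
    as a hang; the launched process is killed before the next attempt.
    """
    hangs = 0
    for _ in range(runs):
        try:
            subprocess.run(
                cmd,
                stdout=subprocess.DEVNULL,
                stderr=subprocess.DEVNULL,
                timeout=timeout_s,
            )
        except subprocess.TimeoutExpired:
            # subprocess.run kills the direct child (here, mpiexec) on timeout.
            # Ranks started by the launcher may survive and need manual cleanup.
            hangs += 1
    return hangs


# Example call (assumes mpiexec and ./a.out exist in the current directory):
# count_hangs(["mpiexec", "-n", "3", "./a.out"])
```

The timeout should be well above the normal run time of cpi so that a slow run is not miscounted as a hang.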
Here is the backtrace from one of the hanging processes:
#0 MPID_nem_mpich_blocking_recv ()
#1 MPIDI_CH3I_Progress () at ../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:506
#2 0x00000000004fc88d in MPIDI_CH3U_VC_WaitForClose ()
#3 0x0000000000442364 in MPID_Finalize () at ../src/mpid/ch3/src/mpid_finalize.c:110
#4 0x0000000000408621 in PMPI_Finalize () at ../src/mpi/init/finalize.c:260
#5 0x00000000004023e5 in main () at cpi.c:59
Is there a potential fix to be made to MPICH to prevent processes hanging when MPICH is compiled with the default optimization level?