[mpich-discuss] Intermittent hang in MPI_Finalize with PGI 20.1

Raffenetti, Kenneth J. raffenet at mcs.anl.gov
Wed Jun 24 11:04:55 CDT 2020

Hi Kent,

Thanks for your report. We have not seen this issue with any compiler/OS combination in our nightly tests. We are using PGI 19.4 at this time. I will request 20.1 be installed so we can investigate further.


On 6/23/20, 8:26 AM, "Kent Cheung via discuss" <discuss at mpich.org> wrote:

    I'm running into an issue where processes sometimes hang when calling MPI_Finalize. This happens with both versions 3.3.2 and 3.4a2 on a single-node RedHat 7.5 x86-64 machine, when MPICH is compiled with PGI 20.1 using these configuration flags:
     --enable-debug --enable-shared --enable-debuginfo --enable-sharedlib=gcc
    If I change the default optimization level (-O2) by configuring with
    as well, the hang doesn't occur. Another data point is that the hang does not occur with PGI 19.5 at either optimization level.
    I have been testing with the cpi.c code in the examples folder, built and run with just
    mpicc cpi.c
    mpiexec -n 3 ./a.out
    Here is the backtrace from one of the hanging processes:
    (gdb) bt
    #0  MPID_nem_mpich_blocking_recv ()
        at /tmp/mpich-3.3.2/build/../src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:1038
    #1  MPIDI_CH3I_Progress () at ../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:506
    #2  0x00000000004fc88d in MPIDI_CH3U_VC_WaitForClose ()
        at ../src/mpid/ch3/src/ch3u_handle_connection.c:383
    #3  0x0000000000442364 in MPID_Finalize () at ../src/mpid/ch3/src/mpid_finalize.c:110
    #4  0x0000000000408621 in PMPI_Finalize () at ../src/mpi/init/finalize.c:260
    #5  0x00000000004023e5 in main () at cpi.c:59
    Is there a potential fix to be made to MPICH to prevent processes hanging when MPICH is compiled with the default optimization level?
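
Because the hang is intermittent, one way to estimate how often it occurs is to run the example repeatedly under a time limit and count the runs that never finish. The following is only a minimal sketch, assuming GNU coreutils `timeout` is available; the function name count_timeouts and its arguments are hypothetical and not part of the original report or of MPICH.

```shell
# count_timeouts RUNS LIMIT CMD [ARGS...]
# Hypothetical helper: runs CMD up to RUNS times, each under a LIMIT-second
# time limit, and prints how many runs had to be killed by timeout.
count_timeouts() {
    runs=$1
    limit=$2
    shift 2
    hangs=0
    i=0
    while [ "$i" -lt "$runs" ]; do
        # GNU timeout exits with status 124 when the time limit expires.
        timeout "$limit" "$@" > /dev/null 2>&1
        if [ "$?" -eq 124 ]; then
            hangs=$((hangs + 1))
        fi
        i=$((i + 1))
    done
    echo "$hangs"
}
```

For the case in this thread it could be called as `count_timeouts 100 30 mpiexec -n 3 ./a.out`, where the run count and the 30-second limit are arbitrary choices. A nonzero result only shows that some runs exceeded the limit; attaching gdb to a stuck process is still needed to confirm it is waiting in MPI_Finalize.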
