[mpich-discuss] Intermittent hang in MPI_Finalize with PGI 20.1

Kent Cheung Kent.Cheung at arm.com
Mon Sep 21 04:51:15 CDT 2020


Are there any updates on this issue? Thanks.

Kent

________________________________
From: Raffenetti, Kenneth J. <raffenet at mcs.anl.gov>
Sent: 24 June 2020 17:04
To: discuss at mpich.org <discuss at mpich.org>
Cc: Kent Cheung <Kent.Cheung at arm.com>
Subject: Re: [mpich-discuss] Intermittent hang in MPI_Finalize with PGI 20.1

Hi Kent,

Thanks for your report. We have not seen this issue with any compiler/OS combination in our nightly tests. We are using PGI 19.4 at this time. I will request 20.1 be installed so we can investigate further.

Ken

On 6/23/20, 8:26 AM, "Kent Cheung via discuss" <discuss at mpich.org> wrote:

    I'm running into an issue where processes sometimes hang when calling MPI_Finalize. This happens with both versions 3.3.2 and 3.4a2 on a single-node Red Hat 7.5 x86-64 machine, when MPICH is compiled with PGI 20.1 using these configuration flags:


     --enable-debug --enable-shared --enable-debuginfo --enable-sharedlib=gcc

    If I change the default optimization level (-O2) by configuring with

    --enable-fast=O1


    as well, the hang doesn't occur. Another data point is that the hang does not occur with PGI 19.5 at either optimization level.

    I have been testing with the cpi.c code in the examples folder built with just

    mpicc cpi.c
    mpiexec -n 3 ./a.out
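
    Because the hang is intermittent, it can help to run the reproducer many times and count how often it fails to exit. The following is only a sketch: count_hangs is a hypothetical helper, not part of MPICH, and it assumes that GNU coreutils timeout is available and that any run exceeding the time limit is a hang.

    ```shell
    # Hypothetical helper (not part of MPICH): run a command repeatedly and
    # count the runs that exceed a time limit, which are treated as hangs.
    # Usage: count_hangs RUNS SECONDS COMMAND [ARGS...]
    count_hangs() {
        runs=$1
        limit=$2
        shift 2
        hangs=0
        i=0
        while [ "$i" -lt "$runs" ]; do
            # GNU timeout exits with status 124 when the command timed out.
            timeout "$limit" "$@" >/dev/null 2>&1
            if [ "$?" -eq 124 ]; then
                hangs=$((hangs + 1))
            fi
            i=$((i + 1))
        done
        # Print the number of runs that timed out.
        echo "$hangs"
    }
    ```

    For the case above, this might be invoked as "count_hangs 100 30 mpiexec -n 3 ./a.out"; the run count and the 30-second limit are arbitrary choices.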


    Here is the backtrace from one of the processes that is hanging:

    (gdb) bt
    #0  MPID_nem_mpich_blocking_recv ()
        at /tmp/mpich-3.3.2/build/../src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:1038
    #1  MPIDI_CH3I_Progress () at ../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:506
    #2  0x00000000004fc88d in MPIDI_CH3U_VC_WaitForClose ()
        at ../src/mpid/ch3/src/ch3u_handle_connection.c:383
    #3  0x0000000000442364 in MPID_Finalize () at ../src/mpid/ch3/src/mpid_finalize.c:110
    #4  0x0000000000408621 in PMPI_Finalize () at ../src/mpi/init/finalize.c:260
    #5  0x00000000004023e5 in main () at cpi.c:59



    Is there a potential fix to be made to MPICH to prevent processes hanging when MPICH is compiled with the default optimization level?

    Thanks,
    Kent



    IMPORTANT NOTICE: The contents of this email and any attachments are confidential and may also be privileged. If you are not the intended recipient, please notify the sender immediately and do not disclose the contents to any other person, use it for any purpose, or store or copy the information in any medium. Thank you.
