[mpich-discuss] Intermittent hang in MPI_Finalize with PGI 20.1
Zhou, Hui
zhouh at anl.gov
Mon Sep 21 08:59:54 CDT 2020
Hi Kent,
I just tried PGI 20.1 with mpich v3.3.2. I think I hit a hang once while checking it manually, but then I couldn't reproduce it even after 1000 repeated runs. In any case, we have made some changes to the ch3 header structures that potentially make the code more standards-compliant. Could you try the latest development version on GitHub and see whether the issue still occurs on your end?
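As a rough sketch of what building the latest development source looks like (the install prefix, configure flags, and parallel make level below are illustrative assumptions, not specific recommendations):

    # clone the current development branch and its embedded submodules
    git clone https://github.com/pmodels/mpich.git
    cd mpich
    git submodule update --init
    # generate configure, then build and install to a scratch prefix
    ./autogen.sh
    ./configure --prefix=$HOME/mpich-dev --enable-debug --enable-shared
    make -j8 && make install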
--
Hui Zhou
From: Kent Cheung via discuss <discuss at mpich.org>
Reply-To: "discuss at mpich.org" <discuss at mpich.org>
Date: Monday, September 21, 2020 at 4:52 AM
To: "discuss at mpich.org" <discuss at mpich.org>
Cc: Kent Cheung <Kent.Cheung at arm.com>
Subject: Re: [mpich-discuss] Intermittent hang in MPI_Finalize with PGI 20.1
Are there any updates on this issue? Thanks.
Kent
________________________________
From: Raffenetti, Kenneth J. <raffenet at mcs.anl.gov>
Sent: 24 June 2020 17:04
To: discuss at mpich.org <discuss at mpich.org>
Cc: Kent Cheung <Kent.Cheung at arm.com>
Subject: Re: [mpich-discuss] Intermittent hang in MPI_Finalize with PGI 20.1
Hi Kent,
Thanks for your report. We have not seen this issue with any compiler/OS combination in our nightly tests. We are using PGI 19.4 at this time. I will request 20.1 be installed so we can investigate further.
Ken
On 6/23/20, 8:26 AM, "Kent Cheung via discuss" <discuss at mpich.org> wrote:
I'm running into an issue where processes sometimes hang when calling MPI_Finalize. This happens with both versions 3.3.2 and 3.4a2 on a single-node Red Hat 7.5 x86-64 machine, when MPICH is compiled with PGI 20.1 with these configuration flags:
--enable-debug --enable-shared --enable-debuginfo --enable-sharedlib=gcc
If I change the default optimization level (-O2) by configuring with
--enable-fast=O1
as well, the hang doesn't occur. Another data point is that the hang does not occur with PGI 19.5 at either optimization level.
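For reference, the workaround configuration would look roughly like this (CC=pgcc is an assumption; only the --enable-* flags come from the description above):

    ./configure CC=pgcc --enable-debug --enable-shared --enable-debuginfo \
                --enable-sharedlib=gcc --enable-fast=O1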
I have been testing with the cpi.c code in the examples folder built with just
mpicc cpi.c
mpiexec -n 3 ./a.out
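Because the failure is intermittent, a repeated-run loop along these lines can flush it out (the 1000 iterations and the 60-second timeout are arbitrary choices, not values from this report):

    # rerun the example many times; a run killed by `timeout` likely hung in MPI_Finalize
    for i in $(seq 1 1000); do
        timeout 60 mpiexec -n 3 ./a.out > /dev/null || echo "run $i did not finish cleanly"
    done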
Here is the backtrace from one of the hanging processes:
(gdb) bt
#0 MPID_nem_mpich_blocking_recv ()
at /tmp/mpich-3.3.2/build/../src/mpid/ch3/channels/nemesis/include/mpid_nem_inline.h:1038
#1 MPIDI_CH3I_Progress () at ../src/mpid/ch3/channels/nemesis/src/ch3_progress.c:506
#2 0x00000000004fc88d in MPIDI_CH3U_VC_WaitForClose ()
at ../src/mpid/ch3/src/ch3u_handle_connection.c:383
#3 0x0000000000442364 in MPID_Finalize () at ../src/mpid/ch3/src/mpid_finalize.c:110
#4 0x0000000000408621 in PMPI_Finalize () at ../src/mpi/init/finalize.c:260
#5 0x00000000004023e5 in main () at cpi.c:59
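To collect the same information from a hung run, one can attach gdb to each remaining rank roughly as follows (the pgrep pattern assumes the example binary is still named a.out, as above):

    # print a full backtrace for every process matching the example binary
    for pid in $(pgrep -f ./a.out); do
        echo "=== backtrace for pid $pid ==="
        gdb -p "$pid" -batch -ex "thread apply all bt"
    done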
Is there a potential fix that could be made to MPICH to prevent processes from hanging when it is compiled at the default optimization level?
Thanks,
Kent