[mpich-discuss] the MPI daemon triggers an assertion on an ARM-based Linux system

Zhou, Hui zhouh at anl.gov
Wed Jun 16 14:37:26 CDT 2021


I believe this is the same issue as https://github.com/pmodels/mpich/issues/5309. While we are resolving it, you could try the patch mentioned in the issue, or configure with ch4 using the latest release.
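
For anyone following along, here is a minimal sketch of how one might check which device an installed MPICH was built with before trying a ch4 build. It assumes the `mpichversion` utility is on the PATH and prints a "MPICH Device:" line; the `mpich_device` helper name and the install prefix are hypothetical, and `ch4:ofi` is only one possible netmod choice (`ch4:ucx` is the other common one).

```shell
# Sketch only: report which device an MPICH install was built with.
# Assumes `mpichversion` is on PATH and prints a "MPICH Device:" line.
mpich_device() {
    # Keep only the value that follows the "MPICH Device:" label.
    sed -n 's/^MPICH Device:[[:space:]]*//p'
}

if command -v mpichversion >/dev/null 2>&1; then
    # A ch3:nemesis result would match the source path in the assertion.
    mpichversion | mpich_device
fi

# Building a release from source with ch4 instead (prefix is a placeholder):
#   ./configure --prefix="$HOME/mpich-ch4" --with-device=ch4:ofi
#   make && make install
```

If the reported device starts with ch3, the installation uses the code path named in the assertion message, and a ch4 build installed under a separate prefix can be tested without removing the distribution package.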

--
Hui Zhou


From: Fabrice Ducos via discuss <discuss at mpich.org>
Date: Wednesday, June 16, 2021 at 3:54 AM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Fabrice Ducos <fabrice.ducos at univ-lille.fr>
Subject: [mpich-discuss] the MPI daemon triggers an assertion on an ARM-based Linux system
Greetings,

When running an atmospheric science application with MPICH on an AWS ARM (not x86) instance running Ubuntu Server 20.04,
our process crashes at the end of processing.

MPICH was installed as a precompiled package using Ubuntu's apt tool:
$ sudo apt install -y mpich

$ apt list
[only relevant lines displayed for brevity]
libmpich-dev/focal,now 3.3.2-2build1 arm64 [residual-config]
libmpich12/focal,now 3.3.2-2build1 arm64 [installed,auto-removable]
mpich-doc/focal 3.3.2-2build1 all
mpich/focal 3.3.2-2build1 arm64

Luckily, we got some debug output pointing into mpid that may be useful:

Assertion failed in file src/mpid/ch3/channels/nemesis/src/ch3_progress.c at line 530: payload_len >= sizeof (MPIDI_CH3_Pkt_t)
0xffff86833f5f ???
???:0
0xffff86881eef ???
???:0
0xffff8683793f ???
???:0
0xffff8686c543 ???
???:0
0xffff8676d4b3 ???
???:0
0xffff8637e6eb ???
???:0
0xffff8637e85b ???
???:0
0xffff86369093 ???
???:0
0xaaaac5491e6f ???
???:0
internal ABORT - process 22

The same application has been used for years with several MPI implementations (MPICH, OpenMPI, Intel MPI) on x86 systems without problems.

It was tested successfully with MPICH on Ubuntu Server 20.04 x86 at about the same time as the ARM test.
We also tested the application with another MPI implementation (OpenMPI) on the same ARM instance, and it worked.

We are perfectly fine using another MPI implementation in this specific case, but we thought that this issue would be of some interest to the MPICH maintenance team.

Best regards

Fabrice Ducos
Ingénieur d’études CNRS
Laboratoire d’Optique Atmosphérique - UMR CNRS 8518

Faculté des Sciences et Technologies
Bâtiment P5 - Bureau 325
Université de Lille - Cité Scientifique
59655 Villeneuve d’Ascq


