[mpich-discuss] the MPI daemon triggers an assertion on an ARM-based Linux system

Fabrice Ducos fabrice.ducos at univ-lille.fr
Wed Jun 16 03:54:28 CDT 2021


Greetings,

When running an atmospheric science application with MPICH on an AWS ARM (not x86) instance running Ubuntu Server 20.04,
our process crashes at the end of processing.

MPICH was installed as a precompiled package with the Ubuntu apt tool:
$ sudo apt install -y mpich

$ apt list
[only relevant lines displayed for brevity]
libmpich-dev/focal,now 3.3.2-2build1 arm64 [residual-config]
libmpich12/focal,now 3.3.2-2build1 arm64 [installed,auto-removable]
mpich-doc/focal 3.3.2-2build1 all
mpich/focal 3.3.2-2build1 arm64
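
For anyone who wants to produce the same kind of filtered listing on their own machine, here is a minimal sketch. The helper name filter_mpich is hypothetical (it is not part of apt or MPICH), and the pattern assumes the package names of interest start with "mpich" or "libmpich", as in the lines above.

```shell
# Hypothetical helper (not part of apt or MPICH): keep only the lines of an
# `apt list` style listing whose package name starts with "mpich" or "libmpich".
filter_mpich() {
    grep -E '^(lib)?mpich'
}

# Intended use, assuming apt is available (stderr is silenced because apt
# warns that its CLI output is not a stable interface):
#   apt list 2>/dev/null | filter_mpich
```

The function only reads standard input, so it can also be applied to a saved listing (for example, one captured on the failing instance and compared with one from the x86 machine).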

Luckily, we got some debug information from mpid that may be valuable:

Assertion failed in file src/mpid/ch3/channels/nemesis/src/ch3_progress.c at line 530: payload_len >= sizeof (MPIDI_CH3_Pkt_t)
0xffff86833f5f ???
	???:0
0xffff86881eef ???
	???:0
0xffff8683793f ???
	???:0
0xffff8686c543 ???
	???:0
0xffff8676d4b3 ???
	???:0
0xffff8637e6eb ???
	???:0
0xffff8637e85b ???
	???:0
0xffff86369093 ???
	???:0
0xaaaac5491e6f ???
	???:0
internal ABORT - process 22

The same application has been used for years with several MPI implementations (MPICH, OpenMPI, Intel MPI) on x86 systems without problem.

It was also tested successfully with MPICH on Ubuntu Server 20.04 x86 at about the same time as the ARM test.
We also tested the application with another MPI implementation (namely, OpenMPI) on the same ARM instance, and it worked.

We are perfectly fine using another MPI implementation in this specific case, but we thought that this issue would be of some interest to the MPICH maintenance team.

Best regards

Fabrice Ducos
Ingénieur d’études CNRS
Laboratoire d’Optique Atmosphérique - UMR CNRS 8518

Faculté des Sciences et Technologies
Bâtiment P5 - Bureau 325
Université de Lille - Cité Scientifique
59655 Villeneuve d’Ascq


