[mpich-discuss] Hang during MPI_Finalize using ch4:ofi:shm in mpich-4.1.2

Edric Ellis eellis at mathworks.com
Wed Dec 13 07:05:05 CST 2023


I'm working on getting a build of mpich-4.1.2 ready to replace our old build of mpich-3.3.2. With older MPICH releases, we used the "nemesis" channel via ch3 to provide support for shared-memory configurations as well as TCP/IP. In ch4, I thought the nearest equivalent would be:

--with-device=ch4:ofi:tcp,shm

The "tcp" provider seems to work fine, but with "shm" the run hangs (I think during MPI_Finalize) and needs a Ctrl-C to kill it. For example, in the build area:

$ ./src/pm/hydra/mpiexec.hydra -n 2 ./examples/cpi
Process 0 of 2 is on uk-eellis-l
Process 1 of 2 is on uk-eellis-l
pi is approximately 3.1415926544231318, Error is 0.0000000008333387
wall clock time = 0.000019
^C[mpiexec@uk-eellis-l] Sending Ctrl-C to processes as requested
[mpiexec@uk-eellis-l] Press Ctrl-C again to force abort

===================================================================================
=   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
=   PID 829015 RUNNING AT uk-eellis-l
=   EXIT CODE: 2
=   CLEANING UP REMAINING PROCESSES
=   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
===================================================================================
YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Interrupt (signal 2)
This typically refers to a problem with your application.
Please see the FAQ page for debugging suggestions

Things work fine if I force FI_PROVIDER=tcp. Am I missing something?
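
A minimal sketch of that workaround, assuming it is run from the same build area as the example above. Exporting the variable is just one way to set it, and the guard around the launcher is added here for illustration so the snippet does nothing outside an MPICH build tree:

```shell
# Restrict libfabric to its tcp provider for this shell session.
export FI_PROVIDER=tcp

# Launcher and example paths are the ones used in the run above.
# Skip the launch if we are not inside the build tree.
if [ -x ./src/pm/hydra/mpiexec.hydra ]; then
    ./src/pm/hydra/mpiexec.hydra -n 2 ./examples/cpi
fi
```

Unsetting FI_PROVIDER afterwards returns to the default provider selection, which is the configuration that hangs here.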

Here's the configure line I'm using:

$ ./configure --prefix <prefix> --with-device=ch4:ofi:tcp,shm --enable-shared --with-libfabric=embedded --enable-fortran --enable-efa=no

This is running on a Debian 11 system, gcc 10.3.0.

Cheers,
Edric.
