[mpich-discuss] Hang during MPI_Finalize using ch4:ofi:shm in mpich-4.1.2

Edric Ellis eellis at mathworks.com
Thu Dec 14 02:30:26 CST 2023


I don't think there's any uncompleted communication. The example I show below is one of the very simple examples shipping with MPICH itself (sorry if that wasn't completely clear).
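For reference, examples/cpi boils down to roughly the following (a from-memory sketch, not the exact shipped source); every MPI call in it is a blocking collective that has long since completed by the time MPI_Finalize is reached:

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int n = 10000, myid, numprocs, i;
    double mypi, pi, h, sum, x;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &numprocs);
    MPI_Comm_rank(MPI_COMM_WORLD, &myid);

    /* All ranks agree on the interval count, then integrate their share. */
    MPI_Bcast(&n, 1, MPI_INT, 0, MPI_COMM_WORLD);
    h = 1.0 / (double) n;
    sum = 0.0;
    for (i = myid + 1; i <= n; i += numprocs) {
        x = h * ((double) i - 0.5);
        sum += 4.0 / (1.0 + x * x);
    }
    mypi = h * sum;

    /* Blocking collective: complete by the time it returns. */
    MPI_Reduce(&mypi, &pi, 1, MPI_DOUBLE, MPI_SUM, 0, MPI_COMM_WORLD);
    if (myid == 0)
        printf("pi is approximately %.16f\n", pi);

    MPI_Finalize();   /* this is where the ofi:shm build appears to hang */
    return 0;
}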

Cheers,
Edric.
________________________________
From: Joachim Jenke via discuss <discuss at mpich.org>
Sent: 13 December 2023 17:16
To: discuss at mpich.org <discuss at mpich.org>; Zhou, Hui <zhouh at anl.gov>
Cc: Joachim Jenke <jenke at itc.rwth-aachen.de>
Subject: Re: [mpich-discuss] Hang during MPI_Finalize using ch4:ofi:shm in mpich-4.1.2

If your code hangs in MPI_Finalize only for a particular communication
implementation, that sounds like uncompleted communication. Are you sure
no MPI communication is still outstanding when you call MPI_Finalize?
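To illustrate what I mean, here is a deliberately broken minimal sketch
(not taken from your code): a nonblocking operation left pending when
MPI_Finalize is called is the kind of thing that can leave it stuck.

#include <mpi.h>

int main(int argc, char **argv)
{
    int buf = 0, rank;
    MPI_Request req;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    if (rank == 0) {
        /* Posted, but never matched by a send and never completed. */
        MPI_Irecv(&buf, 1, MPI_INT, 1, 0, MPI_COMM_WORLD, &req);
        /* Correct code would complete the request first, e.g.
         *   MPI_Wait(&req, MPI_STATUS_IGNORE);
         * before finalizing. */
    }

    /* Erroneous: the standard requires all pending communication to be
     * completed before MPI_Finalize; an implementation may hang here. */
    MPI_Finalize();
    return 0;
}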

- Joachim

On 13.12.23 at 17:45, Edric Ellis via discuss wrote:
> Ok, that's good to know, I'll stick with simply "ofi:tcp" for now.
>
> Thanks,
> Edric.
> ------------------------------------------------------------------------
> *From:* Zhou, Hui <zhouh at anl.gov>
> *Sent:* 13 December 2023 15:39
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* Edric Ellis <eellis at mathworks.com>
> *Subject:* Re: Hang during MPI_Finalize using ch4:ofi:shm in mpich-4.1.2
> Hi Edric,
>
> I am not sure which part is hanging, but you don't need to enable
> ofi:shm (libfabric shm provider). The ch4 device comes with its own
> shared memory functionality.
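> For example (just taking the configure line from your original mail below
> and dropping the shm provider), something like this should be enough:
>
> $ ./configure --prefix <prefix> --with-device=ch4:ofi:tcp \
>     --enable-shared --with-libfabric=embedded --enable-fortran --enable-efa=no
>
> Intra-node traffic then goes through ch4's built-in shared-memory path.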
>
> --
> Hui
> ------------------------------------------------------------------------
> *From:* Edric Ellis via discuss <discuss at mpich.org>
> *Sent:* Wednesday, December 13, 2023 7:05 AM
> *To:* discuss at mpich.org <discuss at mpich.org>
> *Cc:* Edric Ellis <eellis at mathworks.com>
> *Subject:* [mpich-discuss] Hang during MPI_Finalize using ch4:ofi:shm in
> mpich-4.1.2
> I'm working on getting a build of mpich-4.1.2 ready to replace our old
> build of mpich-3.3.2. With older MPICH releases, we used the "nemesis"
> channel via ch3 to provide support for shared-memory configurations as
> well as TCP/IP. In ch4, I thought the nearest equivalent would be:
>
> --with-device=ch4:ofi:tcp,shm
>
> The "tcp" portion of this seems to work just fine, but "shm" hangs
> during (I think) MPI_Finalize, requiring a CTRL-C to kill it. For
> example, in the build area,
>
> $ ./src/pm/hydra/mpiexec.hydra -n 2 ./examples/cpi
> Process 0 of 2 is on uk-eellis-l
> Process 1 of 2 is on uk-eellis-l
> pi is approximately 3.1415926544231318, Error is 0.0000000008333387
> wall clock time = 0.000019
> ^C[mpiexec at uk-eellis-l] Sending Ctrl-C to processes as requested
> [mpiexec at uk-eellis-l] Press Ctrl-C again to force abort
>
> ===================================================================================
> =   BAD TERMINATION OF ONE OF YOUR APPLICATION PROCESSES
> =   PID 829015 RUNNING AT uk-eellis-l
> =   EXIT CODE: 2
> =   CLEANING UP REMAINING PROCESSES
> =   YOU CAN IGNORE THE BELOW CLEANUP MESSAGES
> ===================================================================================
> YOUR APPLICATION TERMINATED WITH THE EXIT STRING: Interrupt (signal 2)
> This typically refers to a problem with your application.
> Please see the FAQ page for debugging suggestions
>
> Things work fine if I force FI_PROVIDER=tcp. Am I missing something?
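> (Concretely, a run along the lines of
>
> $ FI_PROVIDER=tcp ./src/pm/hydra/mpiexec.hydra -n 2 ./examples/cpi
>
> prints the same output as above and exits cleanly.)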
>
> Here's the configure line I'm using:
>
> $ ./configure --prefix <prefix> --with-device=ch4:ofi:tcp,shm
> --enable-shared --with-libfabric=embedded --enable-fortran --enable-efa=no
>
> This is running on a Debian 11 system, gcc 10.3.0.
>
> Cheers,
> Edric.
>
>
> _______________________________________________
> discuss mailing list discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss

--
Dr. rer. nat. Joachim Jenke

IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
D 52074 Aachen (Germany)
Tel: +49 241 80-24765
Fax: +49 241 80-624765
jenke at itc.rwth-aachen.de
www.itc.rwth-aachen.de
