<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=utf-8">
</head>
<body>
<div dir="auto">OpenMP has no explicit finalization call. Instead, the runtime detects library destruction and runs cleanup code, which includes finalizing any offloading devices.</div>
<div dir="auto">If your OpenMP library supports omp_pause_resource_all (https://www.openmp.org/spec-html/5.0/openmpsu153.html#x190-9040003.2.44), you should call it before the actual MPI_Finalize().</div>
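A minimal sketch of that ordering, assuming an OpenMP 5.0 runtime that provides omp_pause_resource_all (built with e.g. `mpicc -fopenmp`); this is an illustration of the suggested call order, not the original poster's actual code:

```c
/* Sketch: shut down OpenMP (including offloading devices) before
 * finalizing MPI, so the OpenMP runtime's destructor no longer needs
 * MPI after MPI_Finalize() has run. */
#include <mpi.h>
#include <omp.h>

int main(int argc, char **argv) {
    int provided;
    MPI_Init_thread(&argc, &argv, MPI_THREAD_MULTIPLE, &provided);

    /* ... application work, target regions, etc. ... */

    /* omp_pause_hard releases all OpenMP runtime resources, including
     * offloading device state (OpenMP 5.0, section 3.2.44). */
    omp_pause_resource_all(omp_pause_hard);

    MPI_Finalize();
    return 0;
}
```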
<div dir="auto"><br>
</div>
<div dir="auto">-Joachim</div>
<hr style="display:inline-block;width:98%" tabindex="-1">
<div id="divRplyFwdMsg" dir="ltr"><font face="Calibri, sans-serif" style="font-size:11pt" color="#000000"><b>From:</b> Raffenetti, Ken via discuss <discuss@mpich.org><br>
<b>Sent:</b> Wednesday, December 8, 2021 7:02:54 PM<br>
<b>To:</b> discuss@mpich.org <discuss@mpich.org><br>
<b>Cc:</b> Raffenetti, Ken <raffenet@anl.gov><br>
<b>Subject:</b> Re: [mpich-discuss] Error in mutex destruction at the end of MPI Program</font>
<div> </div>
</div>
<div class="BodyFragment"><font size="2"><span style="font-size:11pt;">
<div class="PlainText">We do not see this issue in our regular multithreaded testing. Is it possible the other thread is using MPI while it is being finalized? I imagine that could lead to an error when destroying the mutex. It would also explain the nondeterministic
nature of the error.<br>
<br>
Ken<br>
<br>
On 12/8/21, 7:26 AM, "Pedro Henrique Di Francia Rosso via discuss" <discuss@mpich.org> wrote:<br>
<br>
Hello there,<br>
I'm Pedro, and I work in a research group that studies the use of OpenMP in distributed systems, with MPI as the communication layer.
<br>
<br>
In particular, we are working with multithreaded MPICH. There are two main "users" of MPI in our system, an Event System and a Fault Tolerance (FT) system, which work together in separate threads. These systems mainly exchange asynchronous MPI messages, whose requests are either freed or tested until completion. Everything normally works correctly.<br>
<br>
Recently, we sometimes get an assertion error when calling MPI_Finalize() at the end of the program. Here is the error with part of the call stack. (An important note: this error does not always happen; in fact, it is much more common for the application to finish correctly rather than assert like this.)<br>
<br>
<br>
Error in system call pthread_mutex_destroy: Device or resource busy<br>
src/mpi/init/mutex.c:90<br>
Assertion failed in file src/mpi/init/mutex.c at line 91: err == 0<br>
/usr/local/mpi/lib/libmpi.so.12(MPL_backtrace_show+0x35) [0x7f3429505673]<br>
/usr/local/mpi/lib/libmpi.so.12(+0x3248b4) [0x7f34294a38b4]<br>
/usr/local/mpi/lib/libmpi.so.12(MPI_Finalize+0xb8) [0x7f34293a4b28]<br>
/builds/ompcluster/llvm-project/build/projects/openmp/libomptarget/libomptarget.rtl.mpi.so(+0xafa5) [0x7f342b800fa5]<br>
/lib/x86_64-linux-gnu/libc.so.6(+0x43161) [0x7f342a849161]<br>
/lib/x86_64-linux-gnu/libc.so.6(+0x4325a) [0x7f342a84925a]<br>
/builds/ompcluster/llvm-project/build/projects/openmp/libomptarget/libomptarget.rtl.mpi.so(+0xf3d9) [0x7f342b8053d9]<br>
/builds/ompcluster/llvm-project/build/projects/openmp/libomptarget/libomptarget.so.12(__tgt_register_lib+0xf9) [0x7f342b8673c9]<br>
./ompcluster/main() [0x40ef3d]<br>
/lib/x86_64-linux-gnu/libc.so.6(__libc_start_main+0x78) [0x7f342a827b88]<br>
./ompcluster/main() [0x402a5a]<br>
Abort(1) on node 2: Internal error<br>
<br>
I've looked at the mutex.c file and saw that the failure happens when destroying the global mutex used by multithreaded MPI. I would like to ask if there are any known scenarios or common causes of this problem, to help me find what could be happening at the end of the execution.<br>
<br>
Here is the MPICH configuration in our container:<br>
<br>
<br>
$ mpichversion<br>
MPICH Version: 3.4.2<br>
MPICH Release date: Wed May 26 15:51:40 CDT 2021<br>
MPICH Device: ch4:ucx<br>
MPICH configure: --prefix=/usr/local/mpi --disable-static --with-device=ch4:ucx --with-ucx=/usr/local/ucx<br>
MPICH CC: gcc -O2<br>
MPICH CXX: g++ -O2<br>
MPICH F77: gfortran -O2<br>
MPICH FC: gfortran -O2<br>
MPICH Custom Information: <br>
<br>
<br>
Regards, Pedro<br>
<br>
_______________________________________________<br>
discuss mailing list discuss@mpich.org<br>
To manage subscription options or unsubscribe:<br>
<a href="https://lists.mpich.org/mailman/listinfo/discuss">https://lists.mpich.org/mailman/listinfo/discuss</a><br>
</div>
</span></font></div>
</body>
</html>