[mpich-discuss] possible thread bug with MPI_Reduce/MPI_Allreduce

Zhou, Hui zhouh at anl.gov
Wed Jul 12 13:55:20 CDT 2023


Hi Burlen,

You are not allowed to issue collectives concurrently on the same communicator. You can use non-blocking collectives, i.e. MPI_Ireduce and MPI_Iallreduce, to achieve the overlap. You still need to serialize the issuing of MPI_Ireduce/MPI_Iallreduce, but you don't need a mutex to wait for the requests to complete in concurrent threads.

--
Hui
________________________________
From: Burlen Loring via discuss <discuss at mpich.org>
Sent: Wednesday, July 12, 2023 1:29 PM
To: <discuss at mpich.org> <discuss at mpich.org>
Cc: Burlen Loring <burlen.loring at gmail.com>
Subject: [mpich-discuss] possible thread bug with MPI_Reduce/MPI_Allreduce

Hi All,

I'm using MPICH 4.0.2 on Fedora 37 from the package manager for development. From an MPI-parallel simulation I'm spawning a thread that does a number of reductions (MPI_Allreduce and MPI_Reduce); the MPI_IN_PLACE option is used. The results are written with POSIX I/O from rank 0. The simulation continues and can launch the next set of reductions before the previous ones have completed. I have called MPI_Init_thread and requested and received MPI_THREAD_MULTIPLE support.

However, when multiple threads overlap (in test runs, 3-4 threads running concurrently), both the MPI_Allreduce and MPI_Reduce calls can produce incorrect results. If instead I serialize the threads by waiting on them before returning to the simulation, the results are correct. Likewise, if I use a mutex around my MPI_Allreduce/MPI_Reduce sections, the results are correct. I think MPI_Reduce/MPI_Allreduce is not thread safe.

I was wondering if this is a known issue? Could it be an MPICH build/configure setting not set correctly by the Fedora package maintainer?

Thanks
Burlen