[mpich-discuss] possible thread bug with MPI_Reduce/MPI_Allreduce
Zhou, Hui
zhouh at anl.gov
Fri Jul 14 15:05:36 CDT 2023
Glad I could help.
Hui
________________________________
From: Burlen Loring <burlen.loring at gmail.com>
Sent: Friday, July 14, 2023 12:31 PM
To: Zhou, Hui <zhouh at anl.gov>; discuss at mpich.org <discuss at mpich.org>
Subject: Re: [mpich-discuss] possible thread bug with MPI_Reduce/MPI_Allreduce
Hi Hui,
The wording in the documentation of MPI_Init_thread, and specifically the part that says with MPI_THREAD_MULTIPLE that "multiple threads may call MPI at once with no restrictions", had me thinking along the wrong lines.
Following your suggestions, I rewrote my code to give each thread exclusive access to one communicator from a small pool of communicators. This is working well and outperformed the Iallreduce/Ireduce approach.
Thanks so much for the help
Burlen
On 7/12/23 11:55, Zhou, Hui wrote:
Hi Burlen,
You are not allowed to issue collectives concurrently on the same communicator. You can use nonblocking collectives, i.e. MPI_Ireduce and MPI_Iallreduce, to achieve the overlapping. You still need to serialize the issuing of MPI_Ireduce/MPI_Iallreduce, but you don't need a mutex to wait for the requests to complete in concurrent threads.
--
Hui
________________________________
From: Burlen Loring via discuss <discuss at mpich.org>
Sent: Wednesday, July 12, 2023 1:29 PM
To: discuss at mpich.org <discuss at mpich.org>
Cc: Burlen Loring <burlen.loring at gmail.com>
Subject: [mpich-discuss] possible thread bug with MPI_Reduce/MPI_Allreduce
Hi All,
I'm using MPICH 4.0.2 on Fedora 37 from the package manager for development. From an MPI-parallel simulation I'm spawning a thread that performs a number of reductions (MPI_Allreduce and MPI_Reduce) with the MPI_IN_PLACE option. The results are written with POSIX I/O from rank 0. The simulation then continues and can launch the next set of reductions before the previous ones have completed. I have called MPI_Init_thread and both requested and received MPI_THREAD_MULTIPLE support.
However, when multiple threads overlap (in test runs, 3-4 threads running concurrently), both MPI_Allreduce and MPI_Reduce can produce incorrect results. If instead I serialize the threads by waiting on them before returning to the simulation, the results are correct. Likewise, if I use a mutex around my MPI_Allreduce/MPI_Reduce sections, the results are correct. I therefore suspect that MPI_Reduce/MPI_Allreduce is not thread safe.
I was wondering if this is a known issue? Could it be an MPICH build/configure setting not set correctly by the Fedora package maintainer?
Thanks
Burlen