[mpich-discuss] Weird performance of locally concurrent MPI_Put

Kun Feng kfeng1 at hawk.iit.edu
Fri Dec 21 10:15:07 CST 2018


Hi all,

I'm working on a project in which half of the processes on each node need to
send data to the other half.
I'm using the passive-target mode of one-sided communication: the receivers
expose memory using MPI_Win_create and wait in MPI_Win_free, while the
senders transfer the data using MPI_Put.
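For concreteness, here is a minimal sketch of this pattern; the buffer size,
rank split, and lock type are illustrative assumptions rather than my exact
code, and it assumes an even number of processes:

/* Sketch: first half of the ranks receive, second half MPI_Put to a peer. */
#include <mpi.h>
#include <stdlib.h>

#define BUF_SIZE (1 << 26)  /* 64 MiB per process (assumed size) */

int main(int argc, char **argv)
{
    int rank, size;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int is_receiver = (rank < size / 2);              /* first half receive */
    int peer = is_receiver ? rank + size / 2 : rank - size / 2;

    char *buf = malloc(BUF_SIZE);
    MPI_Win win;

    /* Receivers expose their buffer; senders attach a zero-byte window. */
    MPI_Win_create(buf, is_receiver ? BUF_SIZE : 0, 1,
                   MPI_INFO_NULL, MPI_COMM_WORLD, &win);

    if (!is_receiver) {
        /* Passive target: lock the peer's window, put, unlock. */
        MPI_Win_lock(MPI_LOCK_SHARED, peer, 0, win);
        MPI_Put(buf, BUF_SIZE, MPI_BYTE, peer, 0, BUF_SIZE, MPI_BYTE, win);
        MPI_Win_unlock(peer, win);   /* completes the transfer */
    }

    /* MPI_Win_free is collective, so the receivers effectively wait here. */
    MPI_Win_free(&win);
    free(buf);
    MPI_Finalize();
    return 0;
}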
The code works, but the performance of this concurrent MPI_Put communication
is puzzling: the peak aggregate bandwidth is only around 5 GB/s, which seems
far too low for an aggregate figure within a single node.
I thought node-local communication was implemented as a local memcpy, but
concurrent memcpy on the same testbed achieves 4x to 5x higher aggregate
bandwidth, and even concurrent memcpy through Linux shared memory across
processes is 3x faster than my code.
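The cross-process baseline I mean is roughly the following (a sketch, not my
exact benchmark; the segment name "/put_bench" and the size are hypothetical,
and each sender would use its own segment in the concurrent run):

/* Sketch: one sender memcpy'ing into a POSIX shared-memory segment. */
#include <fcntl.h>
#include <stdlib.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

#define BUF_SIZE (1 << 26)  /* 64 MiB (assumed size) */

int main(void)
{
    int fd = shm_open("/put_bench", O_CREAT | O_RDWR, 0600);
    ftruncate(fd, BUF_SIZE);
    char *dst = mmap(NULL, BUF_SIZE, PROT_READ | PROT_WRITE,
                     MAP_SHARED, fd, 0);
    char *src = malloc(BUF_SIZE);
    memset(src, 1, BUF_SIZE);        /* touch pages before timing */

    memcpy(dst, src, BUF_SIZE);      /* the timed region in the benchmark */

    munmap(dst, BUF_SIZE);
    close(fd);
    shm_unlink("/put_bench");
    free(src);
    return 0;
}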
I'm using CH3 in MPICH 3.2.1; CH4 in MPICH 3.3 is even 2x slower.
Does this performance make sense? Does MPICH have some queue for all
one-sided communication within one node? Or do I understand it incorrectly?

Thanks
Kun

