[mpich-discuss] Weird performance of locally concurrent MPI_Put

Si, Min msi at anl.gov
Fri Dec 21 11:14:33 CST 2018


Hi Kun,

Can you please try MPI_Win_allocate instead of MPI_Win_create in your program? Single-memory-copy-based MPI_Put is enabled only for windows created with MPI_Win_allocate in the current MPICH.
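
Roughly, the change looks like this (a minimal sketch, not your actual program; the window size and surrounding code are illustrative, and error checks are omitted):

    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);

        MPI_Aint win_size = 1 << 20;   /* illustrative: 1 MiB window */
        void    *buf;
        MPI_Win  win;

        /* Instead of malloc + MPI_Win_create, let MPI allocate the
         * buffer.  MPICH can then place the window in shared memory
         * and service a node-local MPI_Put as a single memory copy. */
        MPI_Win_allocate(win_size, 1, MPI_INFO_NULL, MPI_COMM_WORLD,
                         &buf, &win);

        /* ... passive-target epochs and MPI_Put as before ... */

        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }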

Best regards,
Min

On 2018/12/21 10:15, Kun Feng via discuss wrote:
Hi all,

I'm working on a project in which, within each node, one half of the processes need to send data to the other half.
I'm using the passive-target mode of one-sided communication: the receivers expose memory using MPI_Win_create and wait in MPI_Win_free, while the senders send the data using MPI_Put.
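The pattern is roughly the following (a minimal sketch, not my exact code; the window size and the sender-to-receiver pairing are illustrative, and error checks are omitted):

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    #define WIN_SIZE (1 << 20)   /* illustrative: 1 MiB per receiver */

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Win win;

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        if (rank >= size / 2) {
            /* Receiver: expose a buffer, then wait in the collective
             * MPI_Win_free until the senders are done. */
            void *buf = malloc(WIN_SIZE);
            MPI_Win_create(buf, WIN_SIZE, 1, MPI_INFO_NULL,
                           MPI_COMM_WORLD, &win);
            MPI_Win_free(&win);
            free(buf);
        } else {
            /* Sender: put one buffer into the paired receiver's
             * window under a passive-target lock. */
            int target = rank + size / 2;
            char *src = malloc(WIN_SIZE);
            memset(src, 1, WIN_SIZE);
            MPI_Win_create(NULL, 0, 1, MPI_INFO_NULL,
                           MPI_COMM_WORLD, &win);
            MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
            MPI_Put(src, WIN_SIZE, MPI_CHAR, target, 0, WIN_SIZE,
                    MPI_CHAR, win);
            MPI_Win_unlock(target, win);
            MPI_Win_free(&win);
            free(src);
        }

        MPI_Finalize();
        return 0;
    }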
The code works. However, the performance of these concurrent MPI_Put operations is odd: the peak aggregate bandwidth is only around 5 GB/s, which seems far too low for aggregate bandwidth within a single node.
I thought node-local communication was implemented as a local memcpy.
But concurrent memcpy on the same testbed achieves 4x to 5x higher aggregate bandwidth.
Even concurrent memcpy across processes through Linux shared memory is 3x faster than my code.
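The shared-memory comparison I have in mind is along these lines (a minimal sketch, not my exact benchmark; the process count and buffer size are illustrative, and error checks are omitted; Linux-only):

    #include <stdio.h>
    #include <stdlib.h>
    #include <string.h>
    #include <time.h>
    #include <sys/mman.h>
    #include <sys/wait.h>
    #include <unistd.h>

    #define NPROCS   4                   /* illustrative: concurrent writers */
    #define BUF_SIZE ((size_t)1 << 26)   /* illustrative: 64 MiB per writer */

    int main(void)
    {
        /* One anonymous shared mapping, one slot per writer; the
         * children inherit it across fork(). */
        char *shm = mmap(NULL, NPROCS * BUF_SIZE, PROT_READ | PROT_WRITE,
                         MAP_SHARED | MAP_ANONYMOUS, -1, 0);

        for (int i = 0; i < NPROCS; i++) {
            if (fork() == 0) {
                char *src = malloc(BUF_SIZE);
                memset(src, 1, BUF_SIZE);

                /* Time one large copy into this writer's shared slot. */
                struct timespec t0, t1;
                clock_gettime(CLOCK_MONOTONIC, &t0);
                memcpy(shm + (size_t)i * BUF_SIZE, src, BUF_SIZE);
                clock_gettime(CLOCK_MONOTONIC, &t1);

                double sec = (t1.tv_sec - t0.tv_sec)
                           + (t1.tv_nsec - t0.tv_nsec) / 1e9;
                printf("proc %d: %.2f GB/s\n", i, BUF_SIZE / sec / 1e9);
                _exit(0);
            }
        }
        while (wait(NULL) > 0)
            ;
        return 0;
    }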
I'm using CH3 in MPICH 3.2.1. CH4 in MPICH 3.3 is even 2x slower.
Does this performance make sense? Does MPICH have some queue for all one-sided communication within one node, or do I understand it incorrectly?

Thanks
Kun




