[mpich-discuss] Weird performance of locally concurrent MPI_Put

Jeff Hammond jeff.science at gmail.com
Fri Dec 21 11:15:04 CST 2018


Use MPI_Win_allocate instead of MPI_Win_create. MPI_Win_create has to use
the memory you pass in, which the implementation cannot map into a shared
segment, so node-local MPI_Put cannot be done as a direct copy and you will
not get good performance within a node. MPI_Win_allocate lets MPICH place
the window buffer in shared memory itself.
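
For concreteness, a minimal sketch of that change (the window size and
communicator are placeholders, not taken from the original code):

    /* Sketch only: let MPI allocate the window buffer itself so that
     * MPICH can back it with shared memory for node-local MPI_Put.   */
    MPI_Aint size = nbytes;   /* window size in bytes (placeholder)   */
    void    *base = NULL;     /* set by MPI_Win_allocate              */
    MPI_Win  win;

    /* Before (user memory the implementation cannot remap):
     *   MPI_Win_create(user_buf, size, 1, MPI_INFO_NULL,
     *                  MPI_COMM_WORLD, &win);                         */
    MPI_Win_allocate(size, 1, MPI_INFO_NULL, MPI_COMM_WORLD,
                     &base, &win);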

Jeff

On Fri, Dec 21, 2018 at 8:18 AM Kun Feng via discuss <discuss at mpich.org>
wrote:

> Hi all,
>
> I'm working on a project in which, on each node, one half of the
> processes needs to send data to the other half.
> I'm using the passive-target mode of one-sided communication: the
> receivers expose memory with MPI_Win_create and wait in MPI_Win_free,
> while the senders send the data with MPI_Put.
> The code works. However, I get weird performance from this concurrent
> MPI_Put communication: the peak aggregate bandwidth is only around
> 5 GB/s, which makes no sense as an aggregate figure within a single node.
> I thought node-local communication was implemented as a local memcpy,
> but concurrent memcpy on the same testbed achieves 4x to 5x higher
> aggregate bandwidth.
> Even concurrent memcpy across processes through Linux shared memory is
> 3x faster than my code.
> I'm using CH3 in MPICH 3.2.1; CH4 in MPICH 3.3 is even 2x slower.
> Does this performance make sense? Does MPICH have some queue for all
> one-sided communication within a node? Or am I understanding it
> incorrectly?
>
> Thanks
> Kun
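
For reference, a minimal runnable sketch of the pattern described above,
with MPI_Win_allocate substituted as suggested; the even sender/receiver
split, the pairing scheme, the 1 MiB transfer size, and the final barrier
are illustrative assumptions, not the original poster's code:

    #include <mpi.h>
    #include <stdlib.h>
    #include <string.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank, nprocs;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

        const int count = 1 << 20;              /* 1 MiB per put (assumed) */
        int is_receiver = (rank < nprocs / 2);  /* assumes even nprocs     */
        char *base = NULL;
        MPI_Win win;

        /* Receivers expose a buffer, senders expose an empty window;
         * MPI_Win_allocate lets MPICH place it in shared memory.     */
        MPI_Win_allocate(is_receiver ? count : 0, 1, MPI_INFO_NULL,
                         MPI_COMM_WORLD, &base, &win);

        if (!is_receiver) {
            char *src = malloc(count);
            memset(src, 1, count);
            int target = rank - nprocs / 2;     /* assumed pairing */
            MPI_Win_lock(MPI_LOCK_SHARED, target, 0, win);
            MPI_Put(src, count, MPI_CHAR, target, 0, count, MPI_CHAR, win);
            MPI_Win_unlock(target, win);  /* completes the put at target */
            free(src);
        }

        MPI_Barrier(MPI_COMM_WORLD);  /* ensure puts land before freeing */
        MPI_Win_free(&win);
        MPI_Finalize();
        return 0;
    }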


-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/