[mpich-devel] fastest shared-memory back-end?

Wesley Bland work at wesbland.com
Mon Dec 16 10:29:30 CST 2019

In MPICH, OFI vs. UCX makes no difference for shared memory, since shared-memory communication is handled by a separate module. CH4 vs. CH3 is the only choice that matters in that case.

CH3 vs. CH4 shared-memory performance is pretty close, but in most instances CH3 is still a little faster. There are some changes in progress to improve things, but they’re not ready yet.
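For reference, the device is chosen at configure time with `--with-device`; a minimal sketch of the three options under discussion (netmod names as documented by MPICH's `configure --help`):

```shell
# CH4 device with the libfabric (OFI) netmod
./configure --with-device=ch4:ofi

# CH4 device with the UCX netmod
./configure --with-device=ch4:ucx

# Older CH3 device (nemesis channel)
./configure --with-device=ch3:nemesis
```

Either way, intra-node traffic goes through the device's shared-memory path rather than the netmod, which is why the netmod choice is irrelevant to the original question.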


> On Dec 14, 2019, at 6:51 PM, Jeff Hammond via devel <devel at mpich.org> wrote:
> MPICH configure just asked me to choose ch4:ofi vs ch4:ucx vs ch3.  Does anyone have an informed opinion on which one is faster for shared-memory execution?
> My specific use case is NWChem on very large multi-socket Xeon nodes, where passive target RMA is the primary communication method, hence my primary concern is asynchronous progress and lack of serialization in RMA.  In the past, I have observed significant performance issues due to serialization of RMA accumulate operations acting on non-overlapping memory regions.
> Jeff
> -- 
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
> _______________________________________________
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/devel
