[mpich-devel] fastest shared-memory back-end?

Mon Dec 16 10:29:30 CST 2019

In MPICH, OFI vs. UCX makes no difference for shared memory since that’s a different module. CH4 vs. CH3 is the only choice that matters in that case.

CH3 vs. CH4 shared memory is pretty close, but in most instances, CH3 is still a little faster. There are some changes in progress to improve CH4, but they’re not ready yet.
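
If you want to check this on your own nodes, a rough sketch is to build MPICH twice, once per device, and run the same single-node benchmark against each install. The `--with-device` values are the ones configure offers; the build directories, install prefixes, job counts, and the `./rma_bench` binary below are placeholders I made up for illustration, not anything from the MPICH tree.

```shell
# Run from the top of an MPICH source tree.
# Build directories and prefixes are hypothetical; pick your own.

# Build 1: CH3
mkdir build-ch3 && cd build-ch3
../configure --with-device=ch3 --prefix="$HOME/mpich-ch3"
make -j4 && make install
cd ..

# Build 2: CH4. The netmod (ofi or ucx) should not matter for shared memory.
mkdir build-ch4 && cd build-ch4
../configure --with-device=ch4:ofi --prefix="$HOME/mpich-ch4"
make -j4 && make install
cd ..

# Run the same benchmark on one node with each install.
# ./rma_bench is a placeholder for your own RMA test, compiled
# separately with each install's mpicc.
"$HOME/mpich-ch3/bin/mpiexec" -n 2 ./rma_bench
"$HOME/mpich-ch4/bin/mpiexec" -n 2 ./rma_bench
```

Keeping all ranks on one node means only the shared-memory path is exercised, so any difference you see should come from the CH3 vs. CH4 choice rather than the netmod.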

Thanks,
Wesley

> On Dec 14, 2019, at 6:51 PM, Jeff Hammond via devel <devel at mpich.org> wrote:
> 
> MPICH configure just asked me to choose ch4:ofi vs. ch4:ucx vs. ch3. Does anyone have an informed opinion on which one is fastest for shared-memory execution?
> 
> My specific use case is NWChem on very large multi-socket Xeon nodes, where passive target RMA is the primary communication method, hence my primary concern is asynchronous progress and lack of serialization in RMA.  In the past, I have observed significant performance issues due to serialization of RMA accumulate operations acting on non-overlapping memory regions.
> 
> Jeff
> 
> -- 
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
