[mpich-devel] fastest shared-memory back-end?

Sat Dec 14 18:51:15 CST 2019

MPICH configure just asked me to choose ch4:ofi vs ch4:ucx vs ch3.  Does
anyone have an informed opinion on which one is fastest for
shared-memory execution?

My specific use case is NWChem on very large multi-socket Xeon nodes, where
passive-target RMA is the primary communication method, so my primary
concerns are asynchronous progress and avoiding serialization in RMA.  In
the past, I have observed significant performance issues due to
serialization of RMA accumulate operations acting on non-overlapping memory
regions.

Jeff

-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/