<div dir="ltr">MPICH configure just asked me to choose among ch4:ofi, ch4:ucx, and ch3. Does anyone have an informed opinion on which one is fastest for shared-memory execution?<div><br></div><div>My specific use case is NWChem on very large multi-socket Xeon nodes, where passive-target RMA is the primary communication method, so my main concern is asynchronous progress and the absence of serialization in RMA. In the past, I have observed significant performance issues due to serialization of RMA accumulate operations acting on non-overlapping memory regions.</div><div><br></div><div>Jeff<br clear="all"><div><br></div>-- <br><div dir="ltr" class="gmail_signature" data-smartmail="gmail_signature">Jeff Hammond<br><a href="mailto:jeff.science@gmail.com" target="_blank">jeff.science@gmail.com</a><br><a href="http://jeffhammond.github.io/" target="_blank">http://jeffhammond.github.io/</a></div></div></div>