<div dir="ltr"><div class="gmail_extra"><br><div class="gmail_quote">On Sat, Feb 23, 2013 at 7:01 PM, Jeff Hammond <span dir="ltr">&lt;<a href="mailto:jhammond@alcf.anl.gov" target="_blank">jhammond@alcf.anl.gov</a>&gt;</span> wrote:<br>
<blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex"><div class="im">>> For example, on an 8-core node, I was hoping to get async progress on<br>
>> 7 processes by pinning 7 comm threads to the 8th core.<br>
><br>
> Did this work at all?<br>
<br>
</div>What is your definition of work?</blockquote><div><br></div><div style>Does it make better async progress than the default?</div><div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
<div class="im">
</div>As far as I know, there is no API in MPICH for controlling thread<br>
affinity. The right way to improve this code would be to move it<br>
inside of Nemesis and then add support for hwloc for comm threads. I<br>
assume you can move the parent processes around such that your 7<br>
comm-intensive procs are closer to the NIC though. You should look at<br>
hwloc though.<br></blockquote><div><br></div><div style>I'm thinking about how to interact with other threads _because_ I'm writing hwloc-based affinity management now. The behavior that I think people will want is to use MPI to set process affinity and then tell me to use some affinity policy to _restrict_ the process affinity for each thread. I prefer using MPI to set process affinity because I think it's messier for software at a higher level than MPI to "figure out" what other processes are running on the same node and agree among themselves how to divvy up resources. If there were an important configuration that could not be supported this way, however, then I suppose I could do it.</div>
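<div>To make the "restrict" idea concrete, here is a rough sketch of one possible policy, written in Python for brevity rather than with hwloc. The function names <code>restrict_affinity</code> and <code>pin_current_thread</code> are made up for illustration, and the block-partitioning policy is just one assumption about what users might want. It relies only on <code>os.sched_getaffinity</code> and <code>os.sched_setaffinity</code>, which exist on Linux only.</div>

```python
import os


def restrict_affinity(process_cpus, num_threads):
    """Split the CPUs a process may use into one subset per thread.

    Hypothetical policy: contiguous blocks when there are at least as many
    CPUs as threads, otherwise one CPU per thread assigned round-robin.
    """
    cpus = sorted(process_cpus)
    if num_threads <= 0:
        raise ValueError("num_threads must be positive")
    if not cpus:
        raise ValueError("process affinity set is empty")

    if num_threads >= len(cpus):
        # Oversubscribed (or exactly one CPU per thread): share CPUs round-robin.
        return [{cpus[i % len(cpus)]} for i in range(num_threads)]

    # Contiguous blocks; the first `extra` threads get one additional CPU.
    base, extra = divmod(len(cpus), num_threads)
    subsets = []
    start = 0
    for i in range(num_threads):
        size = base + (1 if i < extra else 0)
        subsets.append(set(cpus[start:start + size]))
        start += size
    return subsets


def pin_current_thread(cpus):
    """Restrict the calling thread to `cpus` (Linux only; no-op elsewhere)."""
    if hasattr(os, "sched_setaffinity"):
        # On Linux, pid 0 refers to the calling thread.
        os.sched_setaffinity(0, cpus)
```

<div>Under these assumptions, each thread would call <code>pin_current_thread</code> with its own entry from <code>restrict_affinity(os.sched_getaffinity(0), nthreads)</code>, so threads never leave the set that MPI gave the process.</div>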
<div> </div><blockquote class="gmail_quote" style="margin:0 0 0 .8ex;border-left:1px #ccc solid;padding-left:1ex">
My diff w.r.t. the SVN trunk (I did all of this before the SVN->Git<br>
conversion) is below. It is clearly a hack and I don't care. It only<br>
works on Linux or other systems that support CPU_SET. It does not<br>
work on my Mac, for example.<br>
<br>
I have not done very much experimenting with this code other than to<br>
verify that it works (as in "does not crash and gives the same result<br>
for cpi"). Eventually, I am going to see how it works with ARMCI-MPI.</blockquote></div><br>Cool, I'll be curious.</div></div>