[mpich-discuss] Affinity with MPICH_ASYNC_PROGRESS

Jeff Hammond jhammond at alcf.anl.gov
Sat Feb 23 19:01:31 CST 2013


>> My understanding, having looked at the code for this, is that no
>> thread binding is set.  The OS will hopefully do the right thing when
>> the cores are undersubscribed, but I see no thread affinity code in
>> the MPICH source for the comm threads.  MVAPICH has some additional
>> (i.e. non-MPICH) affinity code but I think it is mostly for
>> process-binding.
>>
>> I have a modified version of MPICH that allows the user to set the
>> affinity of the comm threads explicitly.  I was interested in pinning
>> all of the comm threads to one core and letting them fight for time.
>> For example, on an 8-core node, I was hoping to get async progress on
>> 7 processes by pinning 7 comm threads to the 8th core.
>
> Did this work at all?

What is your definition of work?  I have done no performance tests yet
because the systems I care about most have their own async mechanisms.

>> My patches for setting comm thread affinity are pretty simple.  I
>> assume you want me to share them?
>
> Sure, but I'm more interested in whether there will be an API for
> determining this information. That is, suppose I have a node running 4 MPI
> ranks consisting of 15 user threads each . Those 15 threads are split
> between two dies and will be running in two groups: one group of 8 doing
> entirely local work and one group of 7 that communicates frequently. If I
> could detect the affinity of the comm thread, I could choose how to set
> affinity for the application threads to get the layout I wanted.

As far as I know, there is no API in MPICH for controlling thread
affinity.  The right way to improve this code would be to move it
inside of Nemesis and then add hwloc support for the comm threads.  I
assume you can move the parent processes around such that your 7
comm-intensive procs are closer to the NIC, though.  You should look
at hwloc for that.

My diff w.r.t. the SVN trunk (I did all of this before the SVN->Git
conversion) is below.  It is clearly a hack and I don't care.  It only
works on Linux or other systems that support CPU_SET.  It does not
work on my Mac, for example.

I have not done very much experimenting with this code other than to
verify that it works (as in "does not crash and gives the same result
for cpi").  Eventually, I am going to see how it works with ARMCI-MPI.

[jhammond at flogin1 init]$ svn info
Path: .
URL: https://svn.mcs.anl.gov/repos/mpi/mpich2/trunk/src/mpi/init
Repository Root: https://svn.mcs.anl.gov/repos/mpi
Repository UUID: a5d90c62-d51d-0410-9f91-bf5351168976
Revision: 10808
Node Kind: directory
Schedule: normal
Last Changed Author: balaji
Last Changed Rev: 10804
Last Changed Date: 2012-12-25 22:48:46 -0600 (Tue, 25 Dec 2012)

[jhammond at flogin1 init]$ svn diff *.c
Index: async.c
===================================================================
--- async.c	(revision 10808)
+++ async.c	(working copy)
@@ -4,6 +4,17 @@
  *      See COPYRIGHT in top-level directory.
  */

+#ifndef REMOVE_HACKING
+/* these have to go before the MPICH headers since they include these without the GNU macro set */
+#  define _GNU_SOURCE
+#  define __USE_GNU
+#  include <unistd.h>
+#  include <sched.h>
+#  if !defined(CPU_ZERO) || !defined(CPU_SET)
+#  error CPU_ZERO and/or CPU_SET not defined
+#  endif
+#endif
+
 #include "mpiimpl.h"
 #include "mpi_init.h"
 #include "mpiu_thread.h"
@@ -19,6 +30,25 @@

 #define WAKE_TAG 100

+/* Jeff: from http://stackoverflow.com/questions/1407786/how-to-set-cpu-affinity-of-a-particular-pthread */
+static void bind_thread_to_core(void) {
+    int num_cores = (int) sysconf(_SC_NPROCESSORS_ONLN);
+    int core_id = num_cores-1;
+
+    cpu_set_t cpuset;
+    CPU_ZERO(&cpuset);
+    CPU_SET(core_id, &cpuset);
+
+    pthread_t current_thread = pthread_self();
+    int rc = pthread_setaffinity_np(current_thread, sizeof(cpu_set_t), &cpuset);
+    if (rc!=0) fprintf(stderr, "pthread_setaffinity_np rc = %d \n", rc);
+    else       fprintf(stderr,"bound async thread to core %d \n", core_id);
+
+    return;
+}
+
+/**************************/
+
 #undef FUNCNAME
 #define FUNCNAME progress_fn
 #undef FCNAME
@@ -29,12 +59,20 @@
     MPID_Request *request_ptr = NULL;
     MPI_Request request;
     MPI_Status status;
+    int same = 0;

     /* Explicitly add CS_ENTER/EXIT since this thread is created from
      * within an internal function and will call NMPI functions
      * directly. */
     MPIU_THREAD_CS_ENTER(ALLFUNC,);

+    /* JEFF: this is where to put the affinity code to pin async CHT
+             to core N-1 on an N-core system                          */
+
+    MPL_env2bool("MPICH_ASYNC_PLACE_SAME", &same);
+    if (same>0)
+        bind_thread_to_core();
+
     /* FIXME: We assume that waiting on some request forces progress
      * on all requests. With fine-grained threads, will this still
      * work as expected? We can imagine an approach where a request on
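With the patch applied, enabling the pinning would presumably look
something like the following (a hypothetical usage sketch, assuming a
build of this patched trunk; `MPICH_ASYNC_PLACE_SAME` is the variable
the diff above reads via `MPL_env2bool`):

```shell
# Enable the async progress thread and pin each comm thread to core N-1
# (e.g. 7 procs on an 8-core node, all comm threads sharing the 8th core).
export MPICH_ASYNC_PROGRESS=1
export MPICH_ASYNC_PLACE_SAME=1
mpiexec -n 7 ./cpi
```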



-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond


