<html>
<head>
<meta http-equiv="content-type" content="text/html; charset=UTF-8">
</head>
<body>
Hi All,<br>
<br>
I'm trying to bind each of several OpenMP threads to its own
physical core. My system has 2 hyper-threads per core; I want to
bind only 1 OpenMP thread per physical core and skip the
hyper-thread "slots".<br>
<br>
My system has 10 cores (20 if you count hyper-threads). I want to
run 2 MPI ranks, each with 5 OpenMP threads, with the threads on
rank 0 bound to cores 0,1,2,3,4 and the threads on rank 1 bound to
cores 5,6,7,8,9.<br>
<br>
I've tried combinations of -bind-to and -map-by, but the OpenMP
threads always end up doubled up on cores: the hyper-thread slots
are being used.<br>
<br>
For example (a.out here prints the thread affinity):<br>
<br>
$ export OMP_PLACES=threads OMP_PROC_BIND=true OMP_DISPLAY_ENV=TRUE
OMP_NUM_THREADS=5<br>
$ mpiexec -np 2 -bind-to core:5 -map-by hwthread:10 ./a.out | sort
-k 4<br>
<br>
OPENMP DISPLAY ENVIRONMENT BEGIN<br>
_OPENMP = '201511'<br>
OMP_DYNAMIC = 'FALSE'<br>
OMP_NESTED = 'FALSE'<br>
OMP_NUM_THREADS = '5'<br>
OMP_SCHEDULE = 'DYNAMIC'<br>
OMP_PROC_BIND = 'TRUE'<br>
OMP_PLACES = '{0},{10},{1},{11},{2},{12},{3},{13},{4},{14}'<br>
OMP_STACKSIZE = '0'<br>
OMP_WAIT_POLICY = 'PASSIVE'<br>
OMP_THREAD_LIMIT = '4294967295'<br>
OMP_MAX_ACTIVE_LEVELS = '2147483647'<br>
OMP_CANCELLATION = 'FALSE'<br>
OMP_DEFAULT_DEVICE = '0'<br>
OMP_MAX_TASK_PRIORITY = '0'<br>
OMP_DISPLAY_AFFINITY = 'FALSE'<br>
OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'<br>
OPENMP DISPLAY ENVIRONMENT END<br>
<br>
OPENMP DISPLAY ENVIRONMENT BEGIN<br>
_OPENMP = '201511'<br>
OMP_DYNAMIC = 'FALSE'<br>
OMP_NESTED = 'FALSE'<br>
OMP_NUM_THREADS = '5'<br>
OMP_SCHEDULE = 'DYNAMIC'<br>
OMP_PROC_BIND = 'TRUE'<br>
OMP_PLACES = '{5},{15},{6},{16},{7},{17},{8},{18},{9},{19}'<br>
OMP_STACKSIZE = '0'<br>
OMP_WAIT_POLICY = 'PASSIVE'<br>
OMP_THREAD_LIMIT = '4294967295'<br>
OMP_MAX_ACTIVE_LEVELS = '2147483647'<br>
OMP_CANCELLATION = 'FALSE'<br>
OMP_DEFAULT_DEVICE = '0'<br>
OMP_MAX_TASK_PRIORITY = '0'<br>
OMP_DISPLAY_AFFINITY = 'FALSE'<br>
OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'<br>
OPENMP DISPLAY ENVIRONMENT END<br>
Hello from rank 0, thread 0, on smic.dhcp. (core affinity = 0)<br>
Hello from rank 0, thread 1, on smic.dhcp. (core affinity = 10)<br>
Hello from rank 0, thread 2, on smic.dhcp. (core affinity = 1)<br>
Hello from rank 0, thread 3, on smic.dhcp. (core affinity = 11)<br>
Hello from rank 0, thread 4, on smic.dhcp. (core affinity = 2)<br>
Hello from rank 1, thread 0, on smic.dhcp. (core affinity = 5)<br>
Hello from rank 1, thread 1, on smic.dhcp. (core affinity = 15)<br>
Hello from rank 1, thread 2, on smic.dhcp. (core affinity = 6)<br>
Hello from rank 1, thread 3, on smic.dhcp. (core affinity = 16)<br>
Hello from rank 1, thread 4, on smic.dhcp. (core affinity = 7)<br>
<br>
<b>How do I get MPICH to skip the hyper-threads (i.e. logical CPUs
numbered above 9)?</b><br>
<br>
Here's an example that did work:<br>
<blockquote><font face="monospace">export OMP_PROC_BIND=true
OMP_DISPLAY_ENV=TRUE OMP_NUM_THREADS=5</font><br>
<font face="monospace">export
OMP_PLACES="{0},{1},{2},{3},{4},{5},{6},{7},{8},{9}"</font><br>
<br>
<font face="monospace">mpiexec -np 2 -bind-to core:5 -map-by
core:5 ./a.out </font><br>
<br>
<font face="monospace">libgomp: Number of places reduced from 10
to 5 because some places didn't contain any usable logical CPUs</font><br>
<br>
<font face="monospace">OPENMP DISPLAY ENVIRONMENT BEGIN</font><br>
<font face="monospace"> _OPENMP = '201511'</font><br>
<font face="monospace"> OMP_DYNAMIC = 'FALSE'</font><br>
<font face="monospace"> OMP_NESTED = 'FALSE'</font><br>
<font face="monospace"> OMP_NUM_THREADS = '5'</font><br>
<font face="monospace"> OMP_SCHEDULE = 'DYNAMIC'</font><br>
<font face="monospace"> OMP_PROC_BIND = 'TRUE'</font><br>
<font face="monospace"> OMP_PLACES = '{0},{1},{2},{3},{4}</font><br>
<font face="monospace">libgomp: '</font><br>
<font face="monospace"> OMP_STACKSIZE = '0'</font><br>
<font face="monospace"> OMP_WAIT_POLICY = 'PASSIVE'</font><br>
<font face="monospace"> OMP_THREAD_LIMIT = '4294967295'</font><br>
<font face="monospace"> OMP_MAX_ACTIVE_LEVELS = '2147483647'</font><br>
<font face="monospace"> OMP_CANCELLATION = 'FALSE'</font><br>
<font face="monospace"> OMP_DEFAULT_DEVICE = '0'</font><br>
<font face="monospace"> OMP_MAX_TASK_PRIORITY = '0'</font><br>
<font face="monospace"> OMP_DISPLAY_AFFINITY = 'FALSE'</font><br>
<font face="monospace"> OMP_AFFINITY_FORMAT = 'level %L thread %i
affinity %A'</font><br>
<font face="monospace">OPENMP DISPLAY ENVIRONMENT END</font><br>
<font face="monospace">Number of places reduced from 10 to 5
because some places didn't contain any usable logical CPUs</font><br>
<br>
<font face="monospace">OPENMP DISPLAY ENVIRONMENT BEGIN</font><br>
<font face="monospace"> _OPENMP = '201511'</font><br>
<font face="monospace"> OMP_DYNAMIC = 'FALSE'</font><br>
<font face="monospace"> OMP_NESTED = 'FALSE'</font><br>
<font face="monospace"> OMP_NUM_THREADS = '5'</font><br>
<font face="monospace"> OMP_SCHEDULE = 'DYNAMIC'</font><br>
<font face="monospace"> OMP_PROC_BIND = 'TRUE'</font><br>
<font face="monospace"> OMP_PLACES = '{5},{6},{7},{8},{9}'</font><br>
<font face="monospace"> OMP_STACKSIZE = '0'</font><br>
<font face="monospace"> OMP_WAIT_POLICY = 'PASSIVE'</font><br>
<font face="monospace"> OMP_THREAD_LIMIT = '4294967295'</font><br>
<font face="monospace"> OMP_MAX_ACTIVE_LEVELS = '2147483647'</font><br>
<font face="monospace"> OMP_CANCELLATION = 'FALSE'</font><br>
<font face="monospace"> OMP_DEFAULT_DEVICE = '0'</font><br>
<font face="monospace"> OMP_MAX_TASK_PRIORITY = '0'</font><br>
<font face="monospace"> OMP_DISPLAY_AFFINITY = 'FALSE'</font><br>
<font face="monospace"> OMP_AFFINITY_FORMAT = 'level %L thread %i
affinity %A'</font><br>
<font face="monospace">OPENMP DISPLAY ENVIRONMENT END</font><br>
<font face="monospace">Hello from rank 0, thread 1, on smic.dhcp.
(core affinity = 1)</font><br>
<font face="monospace">Hello from rank 0, thread 2, on smic.dhcp.
(core affinity = 2)</font><br>
<font face="monospace">Hello from rank 1, thread 0, on smic.dhcp.
(core affinity = 5)</font><br>
<font face="monospace">Hello from rank 0, thread 3, on smic.dhcp.
(core affinity = 3)</font><br>
<font face="monospace">Hello from rank 0, thread 0, on smic.dhcp.
(core affinity = 0)</font><br>
<font face="monospace">Hello from rank 1, thread 4, on smic.dhcp.
(core affinity = 9)</font><br>
<font face="monospace">Hello from rank 0, thread 4, on smic.dhcp.
(core affinity = 4)</font><br>
<font face="monospace">Hello from rank 1, thread 3, on smic.dhcp.
(core affinity = 8)</font><br>
<font face="monospace">Hello from rank 1, thread 1, on smic.dhcp.
(core affinity = 6)</font><br>
<font face="monospace">Hello from rank 1, thread 2, on smic.dhcp.
(core affinity = 7)</font><br>
</blockquote>
That's exactly what I want!<br>
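<br>
For reference, the explicit list above does not have to be typed by
hand. A minimal sketch, assuming (as on this machine) that logical
CPUs 0..N-1 are the first hardware thread of each physical core;
make_places is a hypothetical helper name, not part of MPICH or
libgomp:<br>
<pre>
```shell
# Hypothetical helper: print an OMP_PLACES list with one single-CPU place
# per physical core. Assumption: the first hardware thread of core i is
# logical CPU i (here 0-9 are the cores and 10-19 their hyper-thread siblings).
make_places() {
    ncores=$1
    places=""
    i=0
    while [ "$i" -lt "$ncores" ]; do
        # Append "{i}", preceded by a comma for every entry after the first.
        places="${places}${places:+,}{$i}"
        i=$((i + 1))
    done
    printf '%s\n' "$places"
}

export OMP_PLACES="$(make_places 10)"
echo "$OMP_PLACES"
```
</pre>
This only saves typing: each rank still receives all 10 places, so I
would expect libgomp to print the same "Number of places reduced"
warning.<br>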
<br>
<b>Could I accomplish this without setting the OMP_PLACES list
explicitly, and in the process hopefully avoid the libgomp
warning? Is there some combination of -bind-to and -map-by that
would achieve this result?</b><br>
<br>
<br>
Thanks!<br>
</body>
</html>