[mpich-discuss] some trouble with mpich + openmp thread affinity
Burlen Loring
bloring at lbl.gov
Tue Jan 5 13:10:49 CST 2021
Hi All,
I'm trying to bind each of a number of OpenMP threads to a unique
physical core. My system has 2 hyper-threads per core; I want to bind
only 1 OpenMP thread per physical core and skip the hyper-thread
"slots". The system has 10 cores (20 logical CPUs if you count
hyper-threads). I want to run 2 MPI ranks, each with 5 OpenMP threads,
with the threads on rank 0 bound to cores 0,1,2,3,4 and the threads on
rank 1 bound to cores 5,6,7,8,9.
I've tried combinations of -bind-to and -map-by, but the OpenMP
threads always double up on cores: the hyper-thread slots are being
used.
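For reference, a.out is just a small MPI + OpenMP hello-world that
reports the logical CPU each thread lands on. Roughly the following
sketch (assuming the GNU sched_getcpu() extension; build with
mpicc -fopenmp):

    #define _GNU_SOURCE        /* for sched_getcpu() */
    #include <mpi.h>
    #include <omp.h>
    #include <sched.h>
    #include <stdio.h>
    #include <unistd.h>

    int main(int argc, char **argv)
    {
        int rank;
        char host[64];

        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        gethostname(host, sizeof(host));

        #pragma omp parallel
        {
            /* sched_getcpu() reports the logical CPU the calling
               thread is currently running (and here, bound) on */
            printf("Hello from rank %d, thread %d, on %s. (core affinity = %d)\n",
                   rank, omp_get_thread_num(), host, sched_getcpu());
        }

        MPI_Finalize();
        return 0;
    }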
For example (a.out here prints the thread affinity):
$ export OMP_PLACES=threads OMP_PROC_BIND=true OMP_DISPLAY_ENV=TRUE OMP_NUM_THREADS=5
$ mpiexec -np 2 -bind-to core:5 -map-by hwthread:10 ./a.out | sort -k 4
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP = '201511'
OMP_DYNAMIC = 'FALSE'
OMP_NESTED = 'FALSE'
OMP_NUM_THREADS = '5'
OMP_SCHEDULE = 'DYNAMIC'
OMP_PROC_BIND = 'TRUE'
OMP_PLACES = '{0},{10},{1},{11},{2},{12},{3},{13},{4},{14}'
OMP_STACKSIZE = '0'
OMP_WAIT_POLICY = 'PASSIVE'
OMP_THREAD_LIMIT = '4294967295'
OMP_MAX_ACTIVE_LEVELS = '2147483647'
OMP_CANCELLATION = 'FALSE'
OMP_DEFAULT_DEVICE = '0'
OMP_MAX_TASK_PRIORITY = '0'
OMP_DISPLAY_AFFINITY = 'FALSE'
OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
OPENMP DISPLAY ENVIRONMENT END
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP = '201511'
OMP_DYNAMIC = 'FALSE'
OMP_NESTED = 'FALSE'
OMP_NUM_THREADS = '5'
OMP_SCHEDULE = 'DYNAMIC'
OMP_PROC_BIND = 'TRUE'
OMP_PLACES = '{5},{15},{6},{16},{7},{17},{8},{18},{9},{19}'
OMP_STACKSIZE = '0'
OMP_WAIT_POLICY = 'PASSIVE'
OMP_THREAD_LIMIT = '4294967295'
OMP_MAX_ACTIVE_LEVELS = '2147483647'
OMP_CANCELLATION = 'FALSE'
OMP_DEFAULT_DEVICE = '0'
OMP_MAX_TASK_PRIORITY = '0'
OMP_DISPLAY_AFFINITY = 'FALSE'
OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
OPENMP DISPLAY ENVIRONMENT END
Hello from rank 0, thread 0, on smic.dhcp. (core affinity = 0)
Hello from rank 0, thread 1, on smic.dhcp. (core affinity = 10)
Hello from rank 0, thread 2, on smic.dhcp. (core affinity = 1)
Hello from rank 0, thread 3, on smic.dhcp. (core affinity = 11)
Hello from rank 0, thread 4, on smic.dhcp. (core affinity = 2)
Hello from rank 1, thread 0, on smic.dhcp. (core affinity = 5)
Hello from rank 1, thread 1, on smic.dhcp. (core affinity = 15)
Hello from rank 1, thread 2, on smic.dhcp. (core affinity = 6)
Hello from rank 1, thread 3, on smic.dhcp. (core affinity = 16)
Hello from rank 1, thread 4, on smic.dhcp. (core affinity = 7)
*How do I get mpich to skip the hyper-threads (i.e. the logical CPUs
numbered greater than 9)?*
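For diagnosis: the mask mpiexec hands each process can be printed with
sched_getaffinity(). With -bind-to core:5 I'd expect rank 0's mask to
contain CPUs 0-4 plus their hyper-thread siblings 10-14, which is
exactly why OMP_PLACES=threads enumerates the siblings as places. A
quick sketch of such a check (print_affinity_mask is a hypothetical
helper, not part of a.out above):

    #define _GNU_SOURCE    /* for CPU_* macros and sched_getaffinity() */
    #include <sched.h>
    #include <stdio.h>

    /* Hypothetical helper: print the calling process's affinity mask
       as a list of logical CPUs. */
    static void print_affinity_mask(int rank)
    {
        cpu_set_t set;
        sched_getaffinity(0, sizeof(set), &set);  /* pid 0 = this process */
        printf("rank %d mask:", rank);
        for (int cpu = 0; cpu < CPU_SETSIZE; cpu++)
            if (CPU_ISSET(cpu, &set))
                printf(" %d", cpu);
        printf("\n");
    }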
Here's an example that did work:
export OMP_PROC_BIND=true OMP_DISPLAY_ENV=TRUE OMP_NUM_THREADS=5
export OMP_PLACES="{0},{1},{2},{3},{4},{5},{6},{7},{8},{9}"
mpiexec -np 2 -bind-to core:5 -map-by core:5 ./a.out
libgomp: Number of places reduced from 10 to 5 because some places didn't contain any usable logical CPUs
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP = '201511'
OMP_DYNAMIC = 'FALSE'
OMP_NESTED = 'FALSE'
OMP_NUM_THREADS = '5'
OMP_SCHEDULE = 'DYNAMIC'
OMP_PROC_BIND = 'TRUE'
OMP_PLACES = '{0},{1},{2},{3},{4}'
OMP_STACKSIZE = '0'
OMP_WAIT_POLICY = 'PASSIVE'
OMP_THREAD_LIMIT = '4294967295'
OMP_MAX_ACTIVE_LEVELS = '2147483647'
OMP_CANCELLATION = 'FALSE'
OMP_DEFAULT_DEVICE = '0'
OMP_MAX_TASK_PRIORITY = '0'
OMP_DISPLAY_AFFINITY = 'FALSE'
OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
OPENMP DISPLAY ENVIRONMENT END
libgomp: Number of places reduced from 10 to 5 because some places didn't contain any usable logical CPUs
OPENMP DISPLAY ENVIRONMENT BEGIN
_OPENMP = '201511'
OMP_DYNAMIC = 'FALSE'
OMP_NESTED = 'FALSE'
OMP_NUM_THREADS = '5'
OMP_SCHEDULE = 'DYNAMIC'
OMP_PROC_BIND = 'TRUE'
OMP_PLACES = '{5},{6},{7},{8},{9}'
OMP_STACKSIZE = '0'
OMP_WAIT_POLICY = 'PASSIVE'
OMP_THREAD_LIMIT = '4294967295'
OMP_MAX_ACTIVE_LEVELS = '2147483647'
OMP_CANCELLATION = 'FALSE'
OMP_DEFAULT_DEVICE = '0'
OMP_MAX_TASK_PRIORITY = '0'
OMP_DISPLAY_AFFINITY = 'FALSE'
OMP_AFFINITY_FORMAT = 'level %L thread %i affinity %A'
OPENMP DISPLAY ENVIRONMENT END
Hello from rank 0, thread 1, on smic.dhcp. (core affinity = 1)
Hello from rank 0, thread 2, on smic.dhcp. (core affinity = 2)
Hello from rank 1, thread 0, on smic.dhcp. (core affinity = 5)
Hello from rank 0, thread 3, on smic.dhcp. (core affinity = 3)
Hello from rank 0, thread 0, on smic.dhcp. (core affinity = 0)
Hello from rank 1, thread 4, on smic.dhcp. (core affinity = 9)
Hello from rank 0, thread 4, on smic.dhcp. (core affinity = 4)
Hello from rank 1, thread 3, on smic.dhcp. (core affinity = 8)
Hello from rank 1, thread 1, on smic.dhcp. (core affinity = 6)
Hello from rank 1, thread 2, on smic.dhcp. (core affinity = 7)
That's exactly what I want!
*Could I accomplish this without setting the OMP_PLACES list
explicitly, and in the process avoid the libgomp warning (which
presumably comes from libgomp pruning the explicit 10-place list down
to the 5 places inside each rank's mask)? Is there some combination of
-bind-to and -map-by that would achieve this result?*
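One thing I haven't tried yet: the standard abstract place value
OMP_PLACES=cores, which should build one place per physical core from
whatever mask mpiexec hands each rank; there is no explicit list, so
nothing for libgomp to prune. Untested sketch:

$ export OMP_PROC_BIND=true OMP_NUM_THREADS=5 OMP_PLACES=cores
$ mpiexec -np 2 -bind-to core:5 ./a.out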
Thanks!