[mpich-discuss] Process-core binding doesn't work on newer multi-socket Opteron systems with MPICH 3.0.x
Róbert Špir
spir.robert at gmail.com
Sat Feb 2 03:35:25 CST 2013
Hello,
we have several quad-socket AMD Opteron systems running Linux. Since these
systems have a NUMA architecture, we bind MPI processes to individual NUMA
nodes for optimal local memory usage. On our older systems, which have four
quad-core processors, binding in MPICH 3 works as expected: when we use
"mpiexec ... -bind-to numa ..." the MPI processes are bound to NUMA nodes.
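For reference, a quick way to see where the ranks end up is a small test like
the following. This is only a sketch of ours, not part of the original setup;
it assumes Linux and glibc's sched_getaffinity:

#define _GNU_SOURCE
#include <mpi.h>
#include <sched.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, cpu, off = 0;
    char buf[1024] = "";
    cpu_set_t mask;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    /* query the affinity mask of the calling process (pid 0 = self) */
    CPU_ZERO(&mask);
    sched_getaffinity(0, sizeof(mask), &mask);

    /* list every CPU this rank is allowed to run on */
    for (cpu = 0; cpu < CPU_SETSIZE && off < (int)sizeof(buf) - 8; cpu++)
        if (CPU_ISSET(cpu, &mask))
            off += snprintf(buf + off, sizeof(buf) - off, "%d ", cpu);

    printf("rank %d runs on CPUs: %s\n", rank, buf);

    MPI_Finalize();
    return 0;
}

We compile this with mpicc and run it as "mpiexec -n 16 -bind-to numa
./check_bind" (check_bind is just our name for the test binary) to see the
affinity mask of every rank.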
The output of hwloc lstopo on the older systems is:
Machine (128GB)
  NUMANode L#0 (P#0 32GB)
    Socket L#0 + L3 L#0 (2048KB)
      L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
    HostBridge L#0
      PCI 10de:036e
        Block L#0 "sr0"
      PCI 10de:037f
      PCI 10de:037f
      PCI 10de:037f
      PCIBridge
        PCI 1002:515e
          GPU L#1 "card0"
      PCIBridge
        PCI 1000:0058
          Block L#2 "sda"
  NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (2048KB)
    L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
    L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
  NUMANode L#2 (P#2 32GB) + Socket L#2 + L3 L#2 (2048KB)
    L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
    L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
    L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
    L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
  NUMANode L#3 (P#3 32GB) + Socket L#3 + L3 L#3 (2048KB)
    L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
    L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
    L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
    L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
When we try the same binding on our newer systems, which have four 8-core
AMD Opterons, we get errors; no form of binding works, even binding to
socket or core:
[proxy:0:0@embryo09] get_nbobjs_by_type (../../../../src/pm/hydra/tools/topo/hwloc/topo_hwloc.c:189): assert (nb % x == 0) failed
[proxy:0:0@embryo09] handle_bitmap_binding (../../../../src/pm/hydra/tools/topo/hwloc/topo_hwloc.c:450): unable to get number of objects
[proxy:0:0@embryo09] HYDT_topo_hwloc_init (../../../../src/pm/hydra/tools/topo/hwloc/topo_hwloc.c:527): error binding with bind "numa" and map "(null)"
[proxy:0:0@embryo09] HYDT_topo_init (../../../../src/pm/hydra/tools/topo/topo.c:60): unable to initialize hwloc
[proxy:0:0@embryo09] launch_procs (../../../../src/pm/hydra/pm/pmiserv/pmip_cb.c:520): unable to initialize process topology
[proxy:0:0@embryo09] HYD_pmcd_pmip_control_cmd_cb (../../../../src/pm/hydra/pm/pmiserv/pmip_cb.c:893): launch_procs returned error
[proxy:0:0@embryo09] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0@embryo09] main (../../../../src/pm/hydra/pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec@embryo09] control_cb (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec@embryo09] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[mpiexec@embryo09] HYD_pmci_wait_for_completion (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec@embryo09] main (../../../../src/pm/hydra/ui/mpich/mpiexec.c:330): process manager error waiting for completion
The lstopo output on the new systems is:
Machine (256GB)
  Socket L#0 (64GB)
    NUMANode L#0 (P#0 32GB)
      L3 L#0 (5118KB)
        L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
        L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
        L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
        L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
      HostBridge L#0
        PCIBridge
          PCI 8086:10c9
            Net L#0 "eth0"
          PCI 8086:10c9
            Net L#1 "eth1"
        PCI 1002:4390
          Block L#2 "sda"
        PCI 1002:439c
        PCIBridge
          PCI 102b:0532
    NUMANode L#1 (P#1 32GB) + L3 L#1 (5118KB)
      L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
      L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
  Socket L#1 (64GB)
    NUMANode L#2 (P#2 32GB) + L3 L#2 (5118KB)
      L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
      L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
      L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
      L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
    NUMANode L#3 (P#3 32GB) + L3 L#3 (5118KB)
      L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
      L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
      L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
      L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
  Socket L#2 (64GB)
    NUMANode L#4 (P#4 32GB) + L3 L#4 (5118KB)
      L2 L#16 (512KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16)
      L2 L#17 (512KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17)
      L2 L#18 (512KB) + L1d L#18 (64KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18)
      L2 L#19 (512KB) + L1d L#19 (64KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19)
    NUMANode L#5 (P#5 32GB) + L3 L#5 (5118KB)
      L2 L#20 (512KB) + L1d L#20 (64KB) + L1i L#20 (64KB) + Core L#20 + PU L#20 (P#20)
      L2 L#21 (512KB) + L1d L#21 (64KB) + L1i L#21 (64KB) + Core L#21 + PU L#21 (P#21)
      L2 L#22 (512KB) + L1d L#22 (64KB) + L1i L#22 (64KB) + Core L#22 + PU L#22 (P#22)
      L2 L#23 (512KB) + L1d L#23 (64KB) + L1i L#23 (64KB) + Core L#23 + PU L#23 (P#23)
  Socket L#3 (64GB)
    NUMANode L#6 (P#6 32GB) + L3 L#6 (5118KB)
      L2 L#24 (512KB) + L1d L#24 (64KB) + L1i L#24 (64KB) + Core L#24 + PU L#24 (P#24)
      L2 L#25 (512KB) + L1d L#25 (64KB) + L1i L#25 (64KB) + Core L#25 + PU L#25 (P#25)
      L2 L#26 (512KB) + L1d L#26 (64KB) + L1i L#26 (64KB) + Core L#26 + PU L#26 (P#26)
      L2 L#27 (512KB) + L1d L#27 (64KB) + L1i L#27 (64KB) + Core L#27 + PU L#27 (P#27)
    NUMANode L#7 (P#7 32GB) + L3 L#7 (5118KB)
      L2 L#28 (512KB) + L1d L#28 (64KB) + L1i L#28 (64KB) + Core L#28 + PU L#28 (P#28)
      L2 L#29 (512KB) + L1d L#29 (64KB) + L1i L#29 (64KB) + Core L#29 + PU L#29 (P#29)
      L2 L#30 (512KB) + L1d L#30 (64KB) + L1i L#30 (64KB) + Core L#30 + PU L#30 (P#30)
      L2 L#31 (512KB) + L1d L#31 (64KB) + L1i L#31 (64KB) + Core L#31 + PU L#31 (P#31)
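To compare what hwloc itself reports on the two machines, a small program
like the one below can be used. This is only our own sketch against the
hwloc 1.x API, not Hydra's code; the failing assert in get_nbobjs_by_type
presumably works with counts like these:

/* Sketch (hwloc 1.x API): print how many NUMA nodes, sockets, cores
 * and PUs hwloc sees, to compare the old and new machines. */
#include <hwloc.h>
#include <stdio.h>

int main(void)
{
    hwloc_topology_t topo;

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    printf("NUMA nodes: %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NODE));
    printf("sockets:    %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_SOCKET));
    printf("cores:      %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_CORE));
    printf("PUs:        %d\n", hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_PU));

    hwloc_topology_destroy(topo);
    return 0;
}

As the two lstopo listings show, on the old machines each NUMA node contains
a socket, while on the new ones each socket contains two NUMA nodes, so we
suspect the assert (nb % x == 0) assumption in topo_hwloc.c no longer holds
for this nesting order.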
The last MPICH version with working binding on the new systems was MPICH2
1.4.1.
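A possible temporary workaround might be to do the binding inside the
application with hwloc instead of relying on mpiexec. A rough sketch (again
our own code, hwloc 1.x API, assuming the ranks on one machine map one-to-one
onto its NUMA nodes):

/* Workaround sketch (hwloc 1.x): bind the calling process to a NUMA
 * node chosen from the MPI rank, instead of relying on mpiexec. */
#include <hwloc.h>
#include <mpi.h>

int main(int argc, char **argv)
{
    int rank, nnodes;
    hwloc_topology_t topo;
    hwloc_obj_t node;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);

    hwloc_topology_init(&topo);
    hwloc_topology_load(topo);

    nnodes = hwloc_get_nbobjs_by_type(topo, HWLOC_OBJ_NODE);
    if (nnodes > 0) {
        /* pick a NUMA node by rank and restrict this process to its CPUs */
        node = hwloc_get_obj_by_type(topo, HWLOC_OBJ_NODE, rank % nnodes);
        hwloc_set_cpubind(topo, node->cpuset, HWLOC_CPUBIND_PROCESS);
    }

    /* ... application code ... */

    hwloc_topology_destroy(topo);
    MPI_Finalize();
    return 0;
}

With the CPUs restricted to one NUMA node, memory allocations also stay local
through the kernel's default first-touch policy.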
Robert Spir
--
Department of Mathematics
Faculty of Civil Engineering
Slovak University of Technology Bratislava
Radlinskeho 11
813 68 Bratislava
Slovakia
http://www.math.sk