[mpich-discuss] process-core binding doesn't work on newer multisocket opteron systems with mpich 3.0.x

Róbert Špir spir.robert at gmail.com
Sat Feb 2 03:35:25 CST 2013


Hello,

we have several quad-socket AMD Opteron systems running Linux. Since these
systems have a NUMA architecture, we bind the MPI processes to individual
NUMA nodes for optimal local memory usage. On our older systems, which have
four quad-core processors, the binding in MPICH 3 works as expected: when we
use "mpiexec ... -bind-to numa ..." the MPI processes are bound to NUMA
nodes.
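To give a concrete example, a typical run on these older machines looks
roughly like this (the process count and the executable name are just
placeholders for our actual jobs):

mpiexec -n 16 -bind-to numa ./our_program

With this, each MPI process ends up restricted to the four cores of one NUMA
node.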
The output of hwloc's lstopo on these older systems is:
Machine (128GB)
  NUMANode L#0 (P#0 32GB)
    Socket L#0 + L3 L#0 (2048KB)
      L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
      L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
      L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
      L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
    HostBridge L#0
      PCI 10de:036e
        Block L#0 "sr0"
      PCI 10de:037f
      PCI 10de:037f
      PCI 10de:037f
      PCIBridge
        PCI 1002:515e
          GPU L#1 "card0"
      PCIBridge
        PCI 1000:0058
          Block L#2 "sda"
  NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (2048KB)
    L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
    L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
    L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
    L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
  NUMANode L#2 (P#2 32GB) + Socket L#2 + L3 L#2 (2048KB)
    L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
    L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
    L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
    L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
  NUMANode L#3 (P#3 32GB) + Socket L#3 + L3 L#3 (2048KB)
    L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
    L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
    L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
    L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)

Now when we try this binding on our newer systems, which have four 8-core
AMD Opterons, we get errors; no form of binding works, even if we bind to
socket or core instead of numa.
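For example, none of the following commands work (again, the process count
and executable name are only placeholders):

mpiexec -n 32 -bind-to numa ./our_program
mpiexec -n 32 -bind-to socket ./our_program
mpiexec -n 32 -bind-to core ./our_program

With -bind-to numa the output is: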

[proxy:0:0 at embryo09] get_nbobjs_by_type (../../../../src/pm/hydra/tools/topo/hwloc/topo_hwloc.c:189): assert (nb % x == 0) failed
[proxy:0:0 at embryo09] handle_bitmap_binding (../../../../src/pm/hydra/tools/topo/hwloc/topo_hwloc.c:450): unable to get number of objects
[proxy:0:0 at embryo09] HYDT_topo_hwloc_init (../../../../src/pm/hydra/tools/topo/hwloc/topo_hwloc.c:527): error binding with bind "numa" and map "(null)"
[proxy:0:0 at embryo09] HYDT_topo_init (../../../../src/pm/hydra/tools/topo/topo.c:60): unable to initialize hwloc
[proxy:0:0 at embryo09] launch_procs (../../../../src/pm/hydra/pm/pmiserv/pmip_cb.c:520): unable to initialize process topology
[proxy:0:0 at embryo09] HYD_pmcd_pmip_control_cmd_cb (../../../../src/pm/hydra/pm/pmiserv/pmip_cb.c:893): launch_procs returned error
[proxy:0:0 at embryo09] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[proxy:0:0 at embryo09] main (../../../../src/pm/hydra/pm/pmiserv/pmip.c:206): demux engine error waiting for event
[mpiexec at embryo09] control_cb (../../../../src/pm/hydra/pm/pmiserv/pmiserv_cb.c:202): assert (!closed) failed
[mpiexec at embryo09] HYDT_dmxu_poll_wait_for_event (../../../../src/pm/hydra/tools/demux/demux_poll.c:77): callback returned error status
[mpiexec at embryo09] HYD_pmci_wait_for_completion (../../../../src/pm/hydra/pm/pmiserv/pmiserv_pmci.c:197): error waiting for event
[mpiexec at embryo09] main (../../../../src/pm/hydra/ui/mpich/mpiexec.c:330): process manager error waiting for completion

The lstopo output on the new systems is:
Machine (256GB)
  Socket L#0 (64GB)
    NUMANode L#0 (P#0 32GB)
      L3 L#0 (5118KB)
        L2 L#0 (512KB) + L1d L#0 (64KB) + L1i L#0 (64KB) + Core L#0 + PU L#0 (P#0)
        L2 L#1 (512KB) + L1d L#1 (64KB) + L1i L#1 (64KB) + Core L#1 + PU L#1 (P#1)
        L2 L#2 (512KB) + L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
        L2 L#3 (512KB) + L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
      HostBridge L#0
        PCIBridge
          PCI 8086:10c9
            Net L#0 "eth0"
          PCI 8086:10c9
            Net L#1 "eth1"
        PCI 1002:4390
          Block L#2 "sda"
        PCI 1002:439c
        PCIBridge
          PCI 102b:0532
    NUMANode L#1 (P#1 32GB) + L3 L#1 (5118KB)
      L2 L#4 (512KB) + L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
      L2 L#5 (512KB) + L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
      L2 L#6 (512KB) + L1d L#6 (64KB) + L1i L#6 (64KB) + Core L#6 + PU L#6 (P#6)
      L2 L#7 (512KB) + L1d L#7 (64KB) + L1i L#7 (64KB) + Core L#7 + PU L#7 (P#7)
  Socket L#1 (64GB)
    NUMANode L#2 (P#2 32GB) + L3 L#2 (5118KB)
      L2 L#8 (512KB) + L1d L#8 (64KB) + L1i L#8 (64KB) + Core L#8 + PU L#8 (P#8)
      L2 L#9 (512KB) + L1d L#9 (64KB) + L1i L#9 (64KB) + Core L#9 + PU L#9 (P#9)
      L2 L#10 (512KB) + L1d L#10 (64KB) + L1i L#10 (64KB) + Core L#10 + PU L#10 (P#10)
      L2 L#11 (512KB) + L1d L#11 (64KB) + L1i L#11 (64KB) + Core L#11 + PU L#11 (P#11)
    NUMANode L#3 (P#3 32GB) + L3 L#3 (5118KB)
      L2 L#12 (512KB) + L1d L#12 (64KB) + L1i L#12 (64KB) + Core L#12 + PU L#12 (P#12)
      L2 L#13 (512KB) + L1d L#13 (64KB) + L1i L#13 (64KB) + Core L#13 + PU L#13 (P#13)
      L2 L#14 (512KB) + L1d L#14 (64KB) + L1i L#14 (64KB) + Core L#14 + PU L#14 (P#14)
      L2 L#15 (512KB) + L1d L#15 (64KB) + L1i L#15 (64KB) + Core L#15 + PU L#15 (P#15)
  Socket L#2 (64GB)
    NUMANode L#4 (P#4 32GB) + L3 L#4 (5118KB)
      L2 L#16 (512KB) + L1d L#16 (64KB) + L1i L#16 (64KB) + Core L#16 + PU L#16 (P#16)
      L2 L#17 (512KB) + L1d L#17 (64KB) + L1i L#17 (64KB) + Core L#17 + PU L#17 (P#17)
      L2 L#18 (512KB) + L1d L#18 (64KB) + L1i L#18 (64KB) + Core L#18 + PU L#18 (P#18)
      L2 L#19 (512KB) + L1d L#19 (64KB) + L1i L#19 (64KB) + Core L#19 + PU L#19 (P#19)
    NUMANode L#5 (P#5 32GB) + L3 L#5 (5118KB)
      L2 L#20 (512KB) + L1d L#20 (64KB) + L1i L#20 (64KB) + Core L#20 + PU L#20 (P#20)
      L2 L#21 (512KB) + L1d L#21 (64KB) + L1i L#21 (64KB) + Core L#21 + PU L#21 (P#21)
      L2 L#22 (512KB) + L1d L#22 (64KB) + L1i L#22 (64KB) + Core L#22 + PU L#22 (P#22)
      L2 L#23 (512KB) + L1d L#23 (64KB) + L1i L#23 (64KB) + Core L#23 + PU L#23 (P#23)
  Socket L#3 (64GB)
    NUMANode L#6 (P#6 32GB) + L3 L#6 (5118KB)
      L2 L#24 (512KB) + L1d L#24 (64KB) + L1i L#24 (64KB) + Core L#24 + PU L#24 (P#24)
      L2 L#25 (512KB) + L1d L#25 (64KB) + L1i L#25 (64KB) + Core L#25 + PU L#25 (P#25)
      L2 L#26 (512KB) + L1d L#26 (64KB) + L1i L#26 (64KB) + Core L#26 + PU L#26 (P#26)
      L2 L#27 (512KB) + L1d L#27 (64KB) + L1i L#27 (64KB) + Core L#27 + PU L#27 (P#27)
    NUMANode L#7 (P#7 32GB) + L3 L#7 (5118KB)
      L2 L#28 (512KB) + L1d L#28 (64KB) + L1i L#28 (64KB) + Core L#28 + PU L#28 (P#28)
      L2 L#29 (512KB) + L1d L#29 (64KB) + L1i L#29 (64KB) + Core L#29 + PU L#29 (P#29)
      L2 L#30 (512KB) + L1d L#30 (64KB) + L1i L#30 (64KB) + Core L#30 + PU L#30 (P#30)
      L2 L#31 (512KB) + L1d L#31 (64KB) + L1i L#31 (64KB) + Core L#31 + PU L#31 (P#31)

The main difference we can see between the two lstopo outputs is that on the
new systems hwloc reports the NUMANode objects inside the Socket objects,
while on the old systems each Socket sits inside a NUMANode. The last MPICH
version with working binding on the new systems was MPICH2 1.4.1.
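
In case it helps with debugging, the effective binding of a running MPI
process can be inspected from outside with, for example (with <pid> replaced
by the pid of one of the MPI processes):

hwloc-bind --get --pid <pid>
grep Cpus_allowed_list /proc/<pid>/status

Both report the set of cores the process is allowed to run on.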

Robert Spir
--
Department of Mathematics
Faculty of Civil Engineering
Slovak University of Technology Bratislava
Radlinskeho 11
813 68 Bratislava
Slovakia
http://www.math.sk



