[mpich-discuss] affinity problems with mpiexec.hydra 3.0.4
Bill Ryder
bryder at wetafx.co.nz
Mon Sep 16 18:04:45 CDT 2013
Greetings all,
I'm trying to set affinity for hybrid MPI/OpenMP tasks.
I want to run two processes on a host, and give one socket to one rank, and the other socket to the other rank.
I have two types of hardware - one works perfectly - the other doesn't.
I first saw the problem using mpiexec.hydra with slurm, but I've moved to the ssh launcher to remove some possible variables.
I have a trivial script which just greps /proc/$$/status for the Cpus_allowed mask and Cpus_allowed_list.
It's just: echo "`hostname` $PMI_RANK `grep Cpus_allowed /proc/$$/status`"
On the machine that is doing what I want I get:
mpiexec.hydra -ppn 2 -hosts abrams201a --bind-to socket -launcher ssh ./get_mapping
abrams201a 0 Cpus_allowed: 00555555
Cpus_allowed_list: 0,2,4,6,8,10,12,14,16,18,20,22
abrams201a 1 Cpus_allowed: 00aaaaaa
Cpus_allowed_list: 1,3,5,7,9,11,13,15,17,19,21,23
Rank 0 gets one socket and rank 1 gets the other. This is what I want.
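For anyone checking the masks by hand: Cpus_allowed is a hex bitmask in which bit N set means CPU N is allowed, printed as comma-separated 32-bit words with the most significant word first. Below is a minimal bash sketch of that decoding. The function name mask_to_cpus is made up for illustration and is not part of hydra or hwloc.

```shell
#!/bin/bash
# mask_to_cpus (hypothetical helper): expand a Cpus_allowed hex mask
# into a comma-separated list of CPU numbers.
mask_to_cpus() {
    local hex=${1//,/}          # drop the commas between 32-bit words
    local cpus=() cpu=0 i digit bit
    # Walk the hex digits from least significant (rightmost) to most significant.
    for (( i=${#hex}-1; i>=0; i-- )); do
        digit=$(( 16#${hex:i:1} ))
        for bit in 0 1 2 3; do
            if (( digit & (1 << bit) )); then
                cpus+=( $(( cpu + bit )) )
            fi
        done
        cpu=$(( cpu + 4 ))      # each hex digit covers four CPUs
    done
    local IFS=,
    echo "${cpus[*]}"           # join the CPU numbers with commas
}
```

Called as mask_to_cpus 00555555, it prints the even CPUs 0 through 22, which matches the Cpus_allowed_list the kernel reports for rank 0 above.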
But on my other machine with a different topology I get this:
mpiexec.hydra -ppn 2 -hosts jericho101 --bind-to socket -launcher ssh ./get_mapping
jericho101 0 Cpus_allowed: 00000000,00ff00ff
Cpus_allowed_list: 0-7,16-23
jericho101 1 Cpus_allowed: 00000000,00ff00ff
Cpus_allowed_list: 0-7,16-23
So each rank is trying to use the same socket.
Similarly, if I try to bind to a NUMA node:
mpiexec.hydra -ppn 2 -hosts jericho101 --bind-to numa -launcher ssh ./get_mapping
jericho101 1 Cpus_allowed: 00000000,00ff00ff
Cpus_allowed_list: 0-7,16-23
jericho101 0 Cpus_allowed: 00000000,00ff00ff
Cpus_allowed_list: 0-7,16-23
Or even if I send numa:2
mpiexec.hydra -ppn 2 -hosts jericho101 --bind-to numa:2 -launcher ssh ./get_mapping
jericho101 0 Cpus_allowed: 00000000,00ff00ff
Cpus_allowed_list: 0-7,16-23
jericho101 1 Cpus_allowed: 00000000,00ff00ff
Cpus_allowed_list: 0-7,16-23
So once again instead of handing a numa node to each process - it's handing the same node to both.
How would I start debugging this?
Or am I missing something really obvious?
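A possible stopgap while this is being debugged would be to skip --bind-to and pin each rank from a small wrapper using taskset. This is an untested sketch, not a fix for the hydra behaviour. The script name bind_by_rank is hypothetical, the CPU lists are the physical (P#) indexes from the jericho101 lstopo output further down, and it assumes exactly two ranks per host with consecutive PMI_RANK values.

```shell
#!/bin/bash
# bind_by_rank (hypothetical name): pin each rank to one socket of jericho101.
# The CPU lists are the physical (P#) indexes from that host's lstopo output.
cpus_for_rank() {
    # Assumption: two ranks per host, placed consecutively, so rank parity
    # selects the socket.
    if (( $1 % 2 == 0 )); then
        echo "0-7,16-23"      # Socket L#0
    else
        echo "8-15,24-31"     # Socket L#1
    fi
}

# Only exec when a command was passed, so the file can also be sourced.
if (( $# > 0 )); then
    exec taskset -c "$(cpus_for_rank "${PMI_RANK:-0}")" "$@"
fi
```

It would be launched without any binding option, for example: mpiexec.hydra -ppn 2 -hosts jericho101 -launcher ssh ./bind_by_rank ./get_mapping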
Thanks!
---------
Bill Ryder
Weta Digital
A bit more data:
mpiexec.hydra --info
HYDRA build details:
Version: 3.0.4
Release Date: Wed Apr 24 10:08:10 CDT 2013
CC: cc
CXX:
F77:
F90:
Configure options: '--prefix=/tech/apps/mpich/hydra'
Process Manager: pmi
Launchers available: ssh rsh fork slurm ll lsf sge manual persist
Topology libraries available: hwloc
Resource management kernels available: user slurm ll lsf sge pbs cobalt
Checkpointing libraries available:
Demux engines available: poll select
I have hwloc 1.3.1 installed locally on each machine
abrams201a looks like:
Machine (48GB)
NUMANode L#0 (P#1 24GB) + Socket L#0 + L3 L#0 (12MB)
L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#12)
L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
PU L#2 (P#2)
PU L#3 (P#14)
L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
PU L#4 (P#4)
PU L#5 (P#16)
L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
PU L#6 (P#6)
PU L#7 (P#18)
L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4
PU L#8 (P#8)
PU L#9 (P#20)
L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5
PU L#10 (P#10)
PU L#11 (P#22)
NUMANode L#1 (P#0 24GB) + Socket L#1 + L3 L#1 (12MB)
L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6
PU L#12 (P#1)
PU L#13 (P#13)
L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7
PU L#14 (P#3)
PU L#15 (P#15)
L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8
PU L#16 (P#5)
PU L#17 (P#17)
L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9
PU L#18 (P#7)
PU L#19 (P#19)
L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10
PU L#20 (P#9)
PU L#21 (P#21)
L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11
PU L#22 (P#11)
PU L#23 (P#23)
HostBridge L#0
PCIBridge
PCI 8086:10e7
Net L#0 "eth0"
PCI 8086:10e7
Net L#1 "eth1"
PCIBridge
PCI 15b3:6746
Net L#2 "eth2"
OpenFabrics L#3 "mlx4_0"
PCI 15b3:6746
PCI 15b3:6746
PCI 15b3:6746
PCIBridge
PCI 102b:0533
PCI 8086:3a20
Block L#4 "sda"
And jericho101 looks like:
Machine (96GB)
NUMANode L#0 (P#0 48GB)
Socket L#0 + L3 L#0 (20MB)
L2 L#0 (256KB) + L1 L#0 (32KB) + Core L#0
PU L#0 (P#0)
PU L#1 (P#16)
L2 L#1 (256KB) + L1 L#1 (32KB) + Core L#1
PU L#2 (P#1)
PU L#3 (P#17)
L2 L#2 (256KB) + L1 L#2 (32KB) + Core L#2
PU L#4 (P#2)
PU L#5 (P#18)
L2 L#3 (256KB) + L1 L#3 (32KB) + Core L#3
PU L#6 (P#3)
PU L#7 (P#19)
L2 L#4 (256KB) + L1 L#4 (32KB) + Core L#4
PU L#8 (P#4)
PU L#9 (P#20)
L2 L#5 (256KB) + L1 L#5 (32KB) + Core L#5
PU L#10 (P#5)
PU L#11 (P#21)
L2 L#6 (256KB) + L1 L#6 (32KB) + Core L#6
PU L#12 (P#6)
PU L#13 (P#22)
L2 L#7 (256KB) + L1 L#7 (32KB) + Core L#7
PU L#14 (P#7)
PU L#15 (P#23)
HostBridge L#0
PCIBridge
PCI 14e4:168e
Net L#0 "eth0"
PCI 14e4:168e
Net L#1 "eth1"
PCI 14e4:168e
Net L#2 "eth2"
PCI 14e4:168e
Net L#3 "eth3"
PCI 14e4:168e
Net L#4 "eth4"
PCI 14e4:168e
Net L#5 "eth5"
PCI 14e4:168e
Net L#6 "eth6"
PCI 14e4:168e
Net L#7 "eth7"
PCIBridge
PCI 103c:323b
Block L#8 "sda"
PCIBridge
PCI 102b:0533
NUMANode L#1 (P#1 48GB) + Socket L#1 + L3 L#1 (20MB)
L2 L#8 (256KB) + L1 L#8 (32KB) + Core L#8
PU L#16 (P#8)
PU L#17 (P#24)
L2 L#9 (256KB) + L1 L#9 (32KB) + Core L#9
PU L#18 (P#9)
PU L#19 (P#25)
L2 L#10 (256KB) + L1 L#10 (32KB) + Core L#10
PU L#20 (P#10)
PU L#21 (P#26)
L2 L#11 (256KB) + L1 L#11 (32KB) + Core L#11
PU L#22 (P#11)
PU L#23 (P#27)
L2 L#12 (256KB) + L1 L#12 (32KB) + Core L#12
PU L#24 (P#12)
PU L#25 (P#28)
L2 L#13 (256KB) + L1 L#13 (32KB) + Core L#13
PU L#26 (P#13)
PU L#27 (P#29)
L2 L#14 (256KB) + L1 L#14 (32KB) + Core L#14
PU L#28 (P#14)
PU L#29 (P#30)
L2 L#15 (256KB) + L1 L#15 (32KB) + Core L#15
PU L#30 (P#15)
PU L#31 (P#31)