[mpich-discuss] mpiexec.hydra binding for multiple compute nodes

Kenneth Raffenetti raffenet at mcs.anl.gov
Mon May 11 21:27:13 CDT 2015


Ah, now I see the problem. I misread the first email. Your original line 
should work fine! The user bindings are lists of hardware elements, not 
processes, so your binding will be applied identically on each node.
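
For example, a sketch of the two-node, 32-process case (with #SBATCH -N 2 and
#SBATCH -n 40 as you describe; the -ppn 16 is only my suggestion to make the
16-ranks-per-node split explicit):

mpiexec.hydra -ppn 16 -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 32 ./my_program <args>

Each node then runs 16 ranks bound to cores 0-7 and 10-17, the same layout as
your single-node run.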

Ken

On 05/11/2015 05:03 PM, Justin Chang wrote:
> Ken,
>
> "-bind-to core" gives me the following topology:
>
> process 0 binding: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 1 binding: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 2 binding: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 3 binding: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 4 binding: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 5 binding: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 6 binding: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 7 binding: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
> process 8 binding: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
> process 9 binding: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
> process 10 binding: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> process 11 binding: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
> process 12 binding: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
> process 13 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
> process 14 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
> process 15 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
>
> but I want this:
>
> process 0 binding: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 1 binding: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 2 binding: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 3 binding: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 4 binding: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 5 binding: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 6 binding: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 7 binding: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
> process 8 binding: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> process 9 binding: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
> process 10 binding: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
> process 11 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
> process 12 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
> process 13 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
> process 14 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
> process 15 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
>
> The latter gives me better performance for my application, and I am
> guessing it's because the processes are evenly distributed between the
> two sockets (eight per socket, filled sequentially). That is why I
> resorted to the custom binding I had in my original command.
>
> Thanks,
>
> On Mon, May 11, 2015 at 4:53 PM, Kenneth Raffenetti
> <raffenet at mcs.anl.gov> wrote:
>
>     Justin,
>
>     Try using the "-bind-to core" option instead. It should do exactly
>     what you want. See this page for more details and examples:
>     https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Process-core_Binding
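>
>     For example (just a sketch, reusing the process count and program from
>     your script):
>
>     mpiexec.hydra -bind-to core -n 16 ./my_program <args>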
>
>     Ken
>
>
>     On 05/11/2015 04:48 PM, Justin Chang wrote:
>
>         Hello everyone,
>
>         I am working with an HPC machine that has this configuration for
>         a single compute node:
>
>         Machine (64GB total)
>             NUMANode L#0 (P#0 32GB)
>               Socket L#0 + L3 L#0 (25MB)
>                 L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>                 L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>                 L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>                 L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>                 L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>                 L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>                 L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>                 L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>                 L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>                 L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>               HostBridge L#0
>                 PCIBridge
>                   PCI 1000:0087
>                     Block L#0 "sda"
>                 PCIBridge
>                   PCI 15b3:1003
>                     Net L#1 "eth0"
>                     Net L#2 "ib0"
>                     OpenFabrics L#3 "mlx4_0"
>                 PCIBridge
>                   PCI 8086:1521
>                     Net L#4 "eth1"
>                   PCI 8086:1521
>                     Net L#5 "eth2"
>                 PCIBridge
>                   PCI 102b:0533
>                 PCI 8086:1d02
>             NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (25MB)
>               L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>               L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>               L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>               L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>               L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>               L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>               L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
>               L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
>               L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
>               L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
>
>         If I run my program with 16 processes, I use the following batch
>         script:
>
>         #!/bin/bash
>         #SBATCH -N 1
>         #SBATCH -n 20
>         #SBATCH -t 0-09:00
>         #SBATCH -o output.txt
>
>         mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 16 ./my_program <args>
>
>         This gives me decent speedup. However, what if I want to use 32
>         processes? Since each node only has 20 cores, I would need
>         #SBATCH -N 2 and #SBATCH -n 40, but I want ranks 0-15 and 16-31
>         to have the same mapping as above on their respective compute
>         nodes. How would I do this? Or would the above line work as long
>         as I use a multiple of 16 processes?
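>
>         In other words, something like this (just a sketch of what I have
>         in mind):
>
>         #!/bin/bash
>         #SBATCH -N 2
>         #SBATCH -n 40
>         #SBATCH -t 0-09:00
>         #SBATCH -o output.txt
>
>         mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 32 ./my_program <args>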
>
>         Thanks,
>
>         --
>         Justin Chang
>         PhD Candidate, Civil Engineering - Computational Sciences
>         University of Houston, Department of Civil and Environmental
>         Engineering
>         Houston, TX 77004
>         (512) 963-3262
>
> --
> Justin Chang
> PhD Candidate, Civil Engineering - Computational Sciences
> University of Houston, Department of Civil and Environmental Engineering
> Houston, TX 77004
> (512) 963-3262
>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

