[mpich-discuss] mpiexec.hydra binding for multiple compute nodes
Kenneth Raffenetti
raffenet at mcs.anl.gov
Wed May 13 08:29:29 CDT 2015
Ah, you are right. They will be striped odd/even across the nodes, not
sequentially. I don't see a way to achieve that binding with the current
options. It's something we can look at adding in a future release. I'll
open a ticket and add you to the CC list, if you are interested.
Ken
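
A quick way to see where the ranks actually land is to launch a trivial
shell command through the same mpiexec.hydra line and have each process
report its host and CPU binding. This is only a sketch; it assumes the
hwloc utilities (hwloc-bind) are installed on the compute nodes, and
taskset -cp from util-linux can be substituted if they are not:

mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 32 \
    sh -c 'echo "$(hostname) pid $$: $(hwloc-bind --get)"'

Each output line shows which node the process was placed on and the
cpuset it was bound to, so the odd/even striping described above is easy
to confirm.
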
On 05/12/2015 02:16 PM, Justin Chang wrote:
> So just to be clear, if I have the following script:
>
> #SBATCH -N 2
> #SBATCH -n 32
>
> mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 32 ./my_program <args>
>
> will ranks 0-15 be bound as such on the first node and ranks 16-31 be
> the same for the second node? Or will all the even ranks be on one node
> and the odd ones on the other?
>
> Thanks,
>
> On Mon, May 11, 2015 at 9:27 PM, Kenneth Raffenetti
> <raffenet at mcs.anl.gov> wrote:
>
>     Ah, I see the problem now. I misread the first email. Your original
>     line should work fine! The user bindings are lists of hardware
>     elements, not processes, so your binding will be applied identically
>     on each node.
>
> Ken
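
To make that concrete, a minimal sketch of how such a user list is
consumed on the 20-core, two-socket nodes described further down in this
thread (each entry is a hardware processing-unit index, and the same
list is reused for the local ranks on every node):

# -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17
#   local rank 0  -> PU 0   (socket 0)
#   ...
#   local rank 7  -> PU 7   (socket 0)
#   local rank 8  -> PU 10  (socket 1)
#   ...
#   local rank 15 -> PU 17  (socket 1)
mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 16 ./my_program <args>
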
>
>
> On 05/11/2015 05:03 PM, Justin Chang wrote:
>
> Ken,
>
> "-bind-to core" gives me the following topology:
>
> process 0 binding: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 1 binding: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 2 binding: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 3 binding: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 4 binding: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 5 binding: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 6 binding: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 7 binding: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
> process 8 binding: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
> process 9 binding: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
> process 10 binding: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> process 11 binding: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
> process 12 binding: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
> process 13 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
> process 14 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
> process 15 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
>
> but I want this:
>
> process 0 binding: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 1 binding: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 2 binding: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 3 binding: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 4 binding: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 5 binding: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 6 binding: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
> process 7 binding: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
> process 8 binding: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
> process 9 binding: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
> process 10 binding: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
> process 11 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
> process 12 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
> process 13 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
> process 14 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
> process 15 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
>
>         The latter gives me better performance for my application, and I am
>         guessing it's because I have evenly distributed the processes among
>         the two sockets (sequentially). That is why I resorted to the custom
>         binding I had originally.
>
> Thanks,
>
>         On Mon, May 11, 2015 at 4:53 PM, Kenneth Raffenetti
>         <raffenet at mcs.anl.gov> wrote:
>
> Justin,
>
>             Try using the "-bind-to core" option instead. It should do exactly
>             what you want. See this page with examples for more details:
>             https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Process-core_Binding
>
> Ken
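
For completeness, a sketch of that suggestion on the same 20-core node
(untested here; the resulting map is the one pasted further up on this
page):

mpiexec.hydra -bind-to core -n 16 ./my_program <args>
# rank i is pinned to logical core i, so the 16 ranks fill all ten
# cores of socket 0 and the first six cores of socket 1.
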
>
>
> On 05/11/2015 04:48 PM, Justin Chang wrote:
>
> Hello everyone,
>
>                 I am working with an HPC machine that has this configuration
>                 for a single compute node:
>
>                 Machine (64GB total)
>                   NUMANode L#0 (P#0 32GB)
>                     Socket L#0 + L3 L#0 (25MB)
>                       L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>                       L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>                       L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>                       L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>                       L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>                       L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>                       L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>                       L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>                       L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>                       L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>                     HostBridge L#0
>                       PCIBridge
>                         PCI 1000:0087
>                           Block L#0 "sda"
>                       PCIBridge
>                         PCI 15b3:1003
>                           Net L#1 "eth0"
>                           Net L#2 "ib0"
>                           OpenFabrics L#3 "mlx4_0"
>                       PCIBridge
>                         PCI 8086:1521
>                           Net L#4 "eth1"
>                         PCI 8086:1521
>                           Net L#5 "eth2"
>                       PCIBridge
>                         PCI 102b:0533
>                       PCI 8086:1d02
>                   NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (25MB)
>                     L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>                     L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>                     L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>                     L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>                     L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>                     L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>                     L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
>                     L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
>                     L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
>                     L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
>
>                 If I ran my program with 16 processes, I would have the
>                 following batch script:
>
> #!/bin/bash
> #SBATCH -N 1
> #SBATCH -n 20
> #SBATCH -t 0-09:00
> #SBATCH -o output.txt
>
>                 mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 16 ./my_program <args>
>
>                 This would give me decent speedup. However, what if I want to
>                 use 32 processes? Since each node only has 20 cores, I would
>                 need #SBATCH -N 2 and #SBATCH -n 40. However, I want ranks
>                 0-15 and 16-31 to have the same mapping as above but on
>                 different compute nodes, so how would I do this? Or would the
>                 above line work so long as I have a multiple of 16 processes?
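
For reference, a sketch of the two-node batch script described here
(untested; see the replies further up on this page for how the ranks
actually end up distributed across the nodes):

#!/bin/bash
#SBATCH -N 2
#SBATCH -n 40
#SBATCH -t 0-09:00
#SBATCH -o output.txt

# Same hardware-element list as the single-node case. Hydra applies the
# list identically on each node, but the global ranks end up striped
# odd/even across the two nodes rather than as blocks 0-15 and 16-31.
mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 32 ./my_program <args>
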
>
> Thanks,
>
>                 --
>                 Justin Chang
>                 PhD Candidate, Civil Engineering - Computational Sciences
>                 University of Houston, Department of Civil and Environmental Engineering
>                 Houston, TX 77004
>                 (512) 963-3262
>
>
>
>
>
>
> --
> Justin Chang
> PhD Candidate, Civil Engineering - Computational Sciences
> University of Houston, Department of Civil and Environmental Engineering
> Houston, TX 77004
> (512) 963-3262
>
>
>
>
>
>
> --
> Justin Chang
> PhD Candidate, Civil Engineering - Computational Sciences
> University of Houston, Department of Civil and Environmental Engineering
> Houston, TX 77004
> (512) 963-3262
>
>
>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss
More information about the discuss mailing list