[mpich-discuss] mpiexec.hydra binding for multiple compute nodes

Justin Chang jchang27 at uh.edu
Wed May 13 08:59:32 CDT 2015


Oh okay. I see that one of the binding options within hydra is to bind to
the motherboard. Would that allow one to assign ranks to certain nodes?
(Assuming that each of my compute nodes only has one motherboard.) If I had
#SBATCH -N 2 and #SBATCH -n 32, my guess is that something like '-bind-to
board' would sequentially bind ranks 0-15 to node 0 and 16-31 to node 1, or
would this option still result in striped even/odd assignments?
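
For concreteness, this is roughly what I would try first (just a sketch; it
assumes the same two 20-core nodes as before, and I have not verified that
'-bind-to board' actually places the ranks this way):

#!/bin/bash
#SBATCH -N 2
#SBATCH -n 32
#SBATCH -t 0-09:00
#SBATCH -o output.txt

# Hoping this keeps ranks 0-15 on node 0 and ranks 16-31 on node 1,
# rather than striping them even/odd across the two nodes.
mpiexec.hydra -bind-to board -n 32 ./my_program <args>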

But yes, otherwise I would very much like to be included on that CC list if a
ticket is made.
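
In the meantime, one workaround I might experiment with is an explicit
hostfile with per-node process counts, so hydra fills the first node before
moving on to the second. This is only a sketch: it assumes hydra honors the
listed host order and counts under a SLURM allocation, and that 'scontrol
show hostnames' is available on the cluster.

#!/bin/bash
#SBATCH -N 2
#SBATCH -n 40
#SBATCH -t 0-09:00
#SBATCH -o output.txt

# Write one "hostname:16" line per allocated node, in allocation order.
scontrol show hostnames $SLURM_NODELIST | sed 's/$/:16/' > hosts.txt

# If hydra fills each listed host up to its count before moving on, ranks
# 0-15 should land on the first node and 16-31 on the second, each bound
# with the same per-node list of hardware elements.
mpiexec.hydra -f hosts.txt \
    -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 \
    -n 32 ./my_program <args>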

Thanks

On Wednesday, May 13, 2015, Kenneth Raffenetti <raffenet at mcs.anl.gov> wrote:

> Ah, you are right. They will be striped odd/even across the nodes, not
> sequentially. I don't see a way to achieve that binding with the current
> options. It's something we can look at adding in a future release. I'll
> open a ticket and add you to the CC list, if you are interested.
>
> Ken
>
> On 05/12/2015 02:16 PM, Justin Chang wrote:
>
>> So just to be clear, if I have the following script:
>>
>> #SBATCH -N 2
>> #SBATCH -n 32
>>
>> mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 32 ./my_program <args>
>>
>> ranks 0-15 will be bound that way on the first node and ranks 16-31 the
>> same way on the second node? Or will all the even ranks end up on one
>> node and the odd ranks on the other?
>>
>> Thanks,
>>
>> On Mon, May 11, 2015 at 9:27 PM, Kenneth Raffenetti
>> <raffenet at mcs.anl.gov> wrote:
>>
>>     Ah, now I see the problem. I misread the first email. Your original
>>     line should work fine! The user bindings are listings of hardware
>>     elements, not processes, so your binding will be applied identically
>>     on each node.
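>>
>>     For example (an illustrative sketch, not from your job): with 4
>>     processes spread over 2 nodes,
>>
>>         mpiexec.hydra -bind-to user:0,1 -n 4 ./app
>>
>>     should leave each node with one process bound to hardware element 0
>>     and one bound to hardware element 1 (assuming two processes are
>>     placed per node); the list describes locations on whichever node a
>>     process lands on.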
>>
>>     Ken
>>
>>
>>     On 05/11/2015 05:03 PM, Justin Chang wrote:
>>
>>         Ken,
>>
>>         "-bind-to core" gives me the following topology:
>>
>>         process 0 binding: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 1 binding: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 2 binding: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 3 binding: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 4 binding: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 5 binding: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 6 binding: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 7 binding: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 8 binding: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
>>         process 9 binding: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
>>         process 10 binding: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
>>         process 11 binding: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
>>         process 12 binding: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
>>         process 13 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
>>         process 14 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
>>         process 15 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
>>
>>         but I want this:
>>
>>         process 0 binding: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 1 binding: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 2 binding: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 3 binding: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 4 binding: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 5 binding: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 6 binding: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 7 binding: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
>>         process 8 binding: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
>>         process 9 binding: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
>>         process 10 binding: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
>>         process 11 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
>>         process 12 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
>>         process 13 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
>>         process 14 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
>>         process 15 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
>>
>>         The latter gives me better performance for my application, and I
>>         am guessing it's because the processes are evenly distributed
>>         (sequentially) across the two sockets, which is why I resorted to
>>         the custom binding I had originally.
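>>
>>         (A side note on how I can double-check the actual bindings: one
>>         option, assuming hwloc's hwloc-bind utility is installed on the
>>         compute nodes, is to launch it under the same mpiexec line, e.g.
>>
>>         mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 \
>>             -n 16 hwloc-bind --get
>>
>>         which prints the cpuset mask each process was bound to.)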
>>
>>         Thanks,
>>
>>         On Mon, May 11, 2015 at 4:53 PM, Kenneth Raffenetti
>>         <raffenet at mcs.anl.gov> wrote:
>>
>>              Justin,
>>
>>              Try using the "-bind-to core" option instead. It should do
>>              exactly what you want. See this page for more details and
>>              examples:
>>
>> https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Process-core_Binding
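>>
>>              For example (sketch):
>>
>>              mpiexec.hydra -bind-to core -n 16 ./my_program <args>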
>>
>>              Ken
>>
>>
>>              On 05/11/2015 04:48 PM, Justin Chang wrote:
>>
>>                  Hello everyone,
>>
>>                  I am working with an HPC machine that has the following
>>                  configuration for a single compute node:
>>
>>                  Machine (64GB total)
>>                    NUMANode L#0 (P#0 32GB)
>>                      Socket L#0 + L3 L#0 (25MB)
>>                        L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>                        L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>                        L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>                        L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>                        L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>>                        L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>>                        L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>>                        L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>>                        L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>>                        L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>>                      HostBridge L#0
>>                        PCIBridge
>>                          PCI 1000:0087
>>                            Block L#0 "sda"
>>                        PCIBridge
>>                          PCI 15b3:1003
>>                            Net L#1 "eth0"
>>                            Net L#2 "ib0"
>>                            OpenFabrics L#3 "mlx4_0"
>>                        PCIBridge
>>                          PCI 8086:1521
>>                            Net L#4 "eth1"
>>                          PCI 8086:1521
>>                            Net L#5 "eth2"
>>                        PCIBridge
>>                          PCI 102b:0533
>>                        PCI 8086:1d02
>>                    NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (25MB)
>>                      L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>>                      L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>>                      L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>>                      L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>>                      L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>>                      L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>>                      L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
>>                      L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
>>                      L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
>>                      L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
>>
>>                  If I ran my program with 16 processes, I would use the
>>                  following batch script:
>>
>>                  #!/bin/bash
>>                  #SBATCH -N 1
>>                  #SBATCH -n 20
>>                  #SBATCH -t 0-09:00
>>                  #SBATCH -o output.txt
>>
>>                  mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 -n 16 ./my_program <args>
>>
>>                  This would give me decent speedup. However, what if I
>>                  want to use 32 processes? Since each node only has 20
>>                  cores, I would need #SBATCH -N 2 and #SBATCH -n 40, but
>>                  I want ranks 0-15 and 16-31 to have the same mapping as
>>                  above on their respective compute nodes. How would I do
>>                  this? Or would the above line work so long as I have a
>>                  multiple of 16 processes?
>>
>>                  Thanks,
>>
>>                  --
>>                  Justin Chang
>>                  PhD Candidate, Civil Engineering - Computational Sciences
>>                  University of Houston, Department of Civil and
>>         Environmental
>>                  Engineering
>>                  Houston, TX 77004
>>                  (512) 963-3262
>>
>>
>>         --
>>         Justin Chang
>>         PhD Candidate, Civil Engineering - Computational Sciences
>>         University of Houston, Department of Civil and Environmental
>>         Engineering
>>         Houston, TX 77004
>>         (512) 963-3262
>>
>>
>> --
>> Justin Chang
>> PhD Candidate, Civil Engineering - Computational Sciences
>> University of Houston, Department of Civil and Environmental Engineering
>> Houston, TX 77004
>> (512) 963-3262
>>
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

