[mpich-discuss] mpiexec.hydra binding for multiple compute nodes

Justin Chang jchang27 at uh.edu
Tue May 12 14:16:10 CDT 2015


So just to be clear, if I have the following script:

#SBATCH -N 2
#SBATCH -n 32

mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 \
    -n 32 ./my_program <args>

will ranks 0-15 be bound that way on the first node and ranks 16-31 the same
way on the second node? Or will the even ranks end up on one node and the odd
ranks on the other?
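
If it matters, I could presumably also force the 16-ranks-per-node split
explicitly with Hydra's -ppn option, something like this (untested):

#!/bin/bash
#SBATCH -N 2
#SBATCH -n 32

# -ppn 16 should place 16 consecutive ranks on each node (0-15 on the
# first node, 16-31 on the second); the user binding names hardware
# elements (cores), so the same core list would be applied on every node.
mpiexec.hydra -ppn 16 \
    -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 \
    -n 32 ./my_program <args>

but I would still like to understand what the default placement does.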

Thanks,

On Mon, May 11, 2015 at 9:27 PM, Kenneth Raffenetti <raffenet at mcs.anl.gov>
wrote:

> Ah, now I see the problem. I misread the first email. Your original line
> should work fine! The user bindings are lists of hardware elements, not
> processes, so your binding will be applied identically on each node.
>
> Ken
>
>
> On 05/11/2015 05:03 PM, Justin Chang wrote:
>
>> Ken,
>>
>> "-bind-to core" gives me the following topology:
>>
>> process 0 binding: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 1 binding: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 2 binding: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 3 binding: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 4 binding: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 5 binding: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 6 binding: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 7 binding: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
>> process 8 binding: 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0
>> process 9 binding: 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0
>> process 10 binding: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
>> process 11 binding: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
>> process 12 binding: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
>> process 13 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
>> process 14 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
>> process 15 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
>>
>> but I want this:
>>
>> process 0 binding: 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 1 binding: 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 2 binding: 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 3 binding: 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 4 binding: 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 5 binding: 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 6 binding: 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0
>> process 7 binding: 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0
>> process 8 binding: 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0
>> process 9 binding: 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0
>> process 10 binding: 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0
>> process 11 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0
>> process 12 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0
>> process 13 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0
>> process 14 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
>> process 15 binding: 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0
>>
>> The latter gives me better performance for my application, and I am
>> guessing it's because the processes are distributed evenly (and
>> sequentially) between the two sockets. That is why I resorted to the
>> custom binding I had originally.
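>>
>> For what it's worth, a quick way to double-check what Hydra actually did
>> on each node (assuming Linux nodes, and that this mpiexec supports
>> -prepend-rank for labelling output by rank) is to have each launched
>> process print the CPUs it is allowed to run on:
>>
>> mpiexec.hydra -prepend-rank \
>>     -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 \
>>     -n 16 grep Cpus_allowed_list /proc/self/status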
>>
>> Thanks,
>>
>> On Mon, May 11, 2015 at 4:53 PM, Kenneth Raffenetti
>> <raffenet at mcs.anl.gov> wrote:
>>
>>     Justin,
>>
>>     Try using the "-bind-to core" option instead. It should do exactly
>>     what you want. See this page for examples and more details:
>>
>> https://wiki.mpich.org/mpich/index.php/Using_the_Hydra_Process_Manager#Process-core_Binding
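>>
>>     For example, a couple of the forms documented there (just a sketch;
>>     check the page for the exact spellings in your MPICH version):
>>
>>     # one core per rank, cores filled in order on each node
>>     mpiexec.hydra -bind-to core -n 16 ./my_program <args>
>>
>>     # or bind each rank to a whole socket
>>     mpiexec.hydra -bind-to socket -n 16 ./my_program <args>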
>>
>>     Ken
>>
>>
>>     On 05/11/2015 04:48 PM, Justin Chang wrote:
>>
>>         Hello everyone,
>>
>>         I am working with an HPC machine that has the following
>>         configuration for a single compute node:
>>
>>         Machine (64GB total)
>>           NUMANode L#0 (P#0 32GB)
>>             Socket L#0 + L3 L#0 (25MB)
>>               L2 L#0 (256KB) + L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>>               L2 L#1 (256KB) + L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>>               L2 L#2 (256KB) + L1d L#2 (32KB) + L1i L#2 (32KB) + Core L#2 + PU L#2 (P#2)
>>               L2 L#3 (256KB) + L1d L#3 (32KB) + L1i L#3 (32KB) + Core L#3 + PU L#3 (P#3)
>>               L2 L#4 (256KB) + L1d L#4 (32KB) + L1i L#4 (32KB) + Core L#4 + PU L#4 (P#4)
>>               L2 L#5 (256KB) + L1d L#5 (32KB) + L1i L#5 (32KB) + Core L#5 + PU L#5 (P#5)
>>               L2 L#6 (256KB) + L1d L#6 (32KB) + L1i L#6 (32KB) + Core L#6 + PU L#6 (P#6)
>>               L2 L#7 (256KB) + L1d L#7 (32KB) + L1i L#7 (32KB) + Core L#7 + PU L#7 (P#7)
>>               L2 L#8 (256KB) + L1d L#8 (32KB) + L1i L#8 (32KB) + Core L#8 + PU L#8 (P#8)
>>               L2 L#9 (256KB) + L1d L#9 (32KB) + L1i L#9 (32KB) + Core L#9 + PU L#9 (P#9)
>>             HostBridge L#0
>>               PCIBridge
>>                 PCI 1000:0087
>>                   Block L#0 "sda"
>>               PCIBridge
>>                 PCI 15b3:1003
>>                   Net L#1 "eth0"
>>                   Net L#2 "ib0"
>>                   OpenFabrics L#3 "mlx4_0"
>>               PCIBridge
>>                 PCI 8086:1521
>>                   Net L#4 "eth1"
>>                 PCI 8086:1521
>>                   Net L#5 "eth2"
>>               PCIBridge
>>                 PCI 102b:0533
>>               PCI 8086:1d02
>>           NUMANode L#1 (P#1 32GB) + Socket L#1 + L3 L#1 (25MB)
>>             L2 L#10 (256KB) + L1d L#10 (32KB) + L1i L#10 (32KB) + Core L#10 + PU L#10 (P#10)
>>             L2 L#11 (256KB) + L1d L#11 (32KB) + L1i L#11 (32KB) + Core L#11 + PU L#11 (P#11)
>>             L2 L#12 (256KB) + L1d L#12 (32KB) + L1i L#12 (32KB) + Core L#12 + PU L#12 (P#12)
>>             L2 L#13 (256KB) + L1d L#13 (32KB) + L1i L#13 (32KB) + Core L#13 + PU L#13 (P#13)
>>             L2 L#14 (256KB) + L1d L#14 (32KB) + L1i L#14 (32KB) + Core L#14 + PU L#14 (P#14)
>>             L2 L#15 (256KB) + L1d L#15 (32KB) + L1i L#15 (32KB) + Core L#15 + PU L#15 (P#15)
>>             L2 L#16 (256KB) + L1d L#16 (32KB) + L1i L#16 (32KB) + Core L#16 + PU L#16 (P#16)
>>             L2 L#17 (256KB) + L1d L#17 (32KB) + L1i L#17 (32KB) + Core L#17 + PU L#17 (P#17)
>>             L2 L#18 (256KB) + L1d L#18 (32KB) + L1i L#18 (32KB) + Core L#18 + PU L#18 (P#18)
>>             L2 L#19 (256KB) + L1d L#19 (32KB) + L1i L#19 (32KB) + Core L#19 + PU L#19 (P#19)
>>
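>>         (In short, PUs P#0-P#9 sit on socket 0 and P#10-P#19 on socket 1.
>>         The listing is hwloc-style output, so if hwloc's tools are
>>         installed it should be reproducible on a node with something like:
>>
>>         lstopo-no-graphics
>>
>>         which is handy for checking the core numbering used below.)
>>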
>>         If I ran my program with 16 processes, I would have the following
>>         batch script:
>>
>>         #!/bin/bash
>>         #SBATCH -N 1
>>         #SBATCH -n 20
>>         #SBATCH -t 0-09:00
>>         #SBATCH -o output.txt
>>
>>         mpiexec.hydra -bind-to user:0,1,2,3,4,5,6,7,10,11,12,13,14,15,16,17 \
>>             -n 16 ./my_program <args>
>>
>>         This would give me decent speedup. However, what if I want to use
>>         32 processes? Since each node only has 20 cores, I would need
>>         #SBATCH -N 2 and #SBATCH -n 40. I want ranks 0-15 and 16-31 to
>>         have the same mapping as above but on different compute nodes, so
>>         how would I do this? Or would the above line work so long as I
>>         have a multiple of 16 processes?
>>
>>         Thanks,



-- 
Justin Chang
PhD Candidate, Civil Engineering - Computational Sciences
University of Houston, Department of Civil and Environmental Engineering
Houston, TX 77004
(512) 963-3262