<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">On Fri, May 15, 2020, at 8:52 PM, hritikesh semwal via discuss wrote:<br></div><blockquote type="cite" id="qt" style=""><div dir="ltr"><div>Hello,<br></div><div><br></div><div>I am working on a parallel CFD solver with MPI, and I run my executable through an account on a cluster. The hardware configuration available to my account is as follows:<br></div><div><br></div><div><pre>Architecture:          x86_64
CPU op-mode(s):        32-bit, 64-bit
Byte Order:            Little Endian
CPU(s):                32
On-line CPU(s) list:   0-31
Thread(s) per core:    2
Core(s) per socket:    8
CPU socket(s):         2
NUMA node(s):          2
Vendor ID:             GenuineIntel
CPU family:            6
Model:                 62
Stepping:              4
CPU MHz:               2600.079
BogoMIPS:              5199.25
Virtualization:        VT-x
L1d cache:             32K
L1i cache:             32K
L2 cache:              256K
L3 cache:              20480K
NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
</pre></div><div><br></div><div>Initially, I was running my executable without any binding options. In that case, whenever I went from 2 to 4 processes, both the computation time and the communication time inside an iterative loop increased.<br></div><div><br></div><div>Today I read about MPI binding options, which let me control how processes are placed on the processors. 
First, I used the "-bind-to core" option, and the results changed: the run time decreased up to 16 processes, but then rose again at 24 and 32 processes. The timings were:<br></div><div>2 procs: 160 s, 4 procs: 84 s, 8 procs: 45 s, 16 procs: 28 s, 24 procs: 38 s, 32 procs: 34 s.<br></div><div><br></div><div>After that, I tried several other binding options, but none gave better timings than "-bind-to core". So I switched back to "-bind-to core", but now I get different timings with the same executable:<br></div><div>2 procs: 164 s, 4 procs: 85 s, 8 procs: 45 s, 16 procs: 48 s, 24 procs: 52 s, 32 procs: 98 s.<br></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">Hitesh,<br></div><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">You might find the following online workshop useful:<br></div><div style="font-family:Arial;"><a href="http://www.hlrs.de/training/2020-05-25-VI-HPS/">http://www.hlrs.de/training/2020-05-25-VI-HPS/</a><br></div><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">Regards,<br></div><div style="font-family:Arial;">Benson</div><div style="font-family:Arial;"><br></div></body></html>