[mpich-discuss] discuss Digest, Vol 91, Issue 21

hritikesh semwal hritikesh.semwal at gmail.com
Sat May 16 14:05:44 CDT 2020


On Sat, 16 May 2020, 10:30 PM, <discuss-request at mpich.org> wrote:

> Send discuss mailing list submissions to
>         discuss at mpich.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.mpich.org/mailman/listinfo/discuss
> or, via email, send a message with subject or body 'help' to
>         discuss-request at mpich.org
>
> You can reach the person managing the list at
>         discuss-owner at mpich.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of discuss digest..."
>
>
> Today's Topics:
>
>    1.  Understanding process bindings in MPICH (hritikesh semwal)
>    2. Re:  Understanding process bindings in MPICH (Benson Muite)
>    3. Re:  Understanding process bindings in MPICH (Benson Muite)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Fri, 15 May 2020 23:22:35 +0530
> From: hritikesh semwal <hritikesh.semwal at gmail.com>
> To: Benson Muite via discuss <discuss at mpich.org>
> Subject: [mpich-discuss] Understanding process bindings in MPICH
> Message-ID:
>         <
> CAA+35d2JQ25SYkqST9A5aPN-KuqHSkK7oC6OThsB9KRQvP6d9g at mail.gmail.com>
> Content-Type: text/plain; charset="utf-8"
>
> Hello,
>
> I am working on a parallel CFD solver with MPI and I am using an account on
> a cluster to run my executable. The hardware structure of my account is as
> follows;
>
> Architecture:          x86_64
> CPU op-mode(s):        32-bit, 64-bit
> Byte Order:            Little Endian
> CPU(s):                32
> On-line CPU(s) list:   0-31
> Thread(s) per core:    2
> Core(s) per socket:    8
> CPU socket(s):         2
> NUMA node(s):          2
> Vendor ID:             GenuineIntel
> CPU family:            6
> Model:                 62
> Stepping:              4
> CPU MHz:               2600.079
> BogoMIPS:              5199.25
> Virtualization:        VT-x
> L1d cache:             32K
> L1i cache:             32K
> L2 cache:              256K
> L3 cache:              20480K
> NUMA node0 CPU(s):     0,2,4,6,8,10,12,14,16,18,20,22,24,26,28,30
> NUMA node1 CPU(s):     1,3,5,7,9,11,13,15,17,19,21,23,25,27,29,31
>
> Initially, I was running my executable without any binding options, and in
> that case, whenever I switched from 2 to 4 processors, my computation time
> increased along with the communication time inside an iterative loop.
>
> Today, I read about binding options in MPI, through which I can manage the
> allocation of processors. Initially, I used the "-bind-to core" option and
> the results were different: I got a time reduction up to 16 processors, but
> with 24 and 32 processors the time started increasing again. The timing
> results are as follows:
> 2 procs- 160 seconds, 4 procs- 84 seconds, 8 procs- 45 seconds, 16 procs-
> 28 seconds, 24 procs- 38 seconds, 32 procs- 34 seconds.
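>
> Concretely, the commands I have been trying look roughly like the following
> (a sketch: the option names are what I understand MPICH's Hydra mpiexec
> accepts and may differ between versions, and "./solver" stands in for my
> actual executable):
>
>     mpiexec -n 16 -bind-to core ./solver                 # one rank per physical core
>     mpiexec -n 16 -bind-to core -map-by socket ./solver  # spread ranks across the two sockets
>     mpiexec -n 32 -bind-to hwthread ./solver             # one rank per hardware thread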
>
> After that, I tried some other combinations of binding options but did not
> get better timing results than with the -bind-to core option. So I switched
> the bind-to option back to core, but now I am getting different timing
> results with the same executable, which are as follows:
> 2 procs- 164 seconds, 4 procs- 85 seconds, 8 procs- 45 seconds, 16 procs-
> 48 seconds, 24 procs- 52 seconds, 32 procs- 98 seconds.
>
> I have the following two questions for which I am seeking your help:
>
> 1. Can anyone suggest optimum binding and mapping options based on my
> cluster account's hardware topology? If yes, then please tell me.
> 2. Why am I getting such an irregular pattern of jumps in timing without a
> binding option, and why do my timings vary from run to run with a binding
> option? Is it a problem with my cluster's network or with my MPI code?
>
> If you need further details about my iterative loop, please tell me. As this
> message has already become long, I can share them later if you think the
> above data is not sufficient.
>
> Thank you.
>
> ------------------------------
>
> Message: 2
> Date: Fri, 15 May 2020 21:02:47 +0300
> From: "Benson Muite" <benson_muite at emailplus.org>
> To: "Benson Muite via discuss" <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Understanding process bindings in MPICH
> Message-ID: <52dd313a-dfe5-4e1e-8cb0-c62ffd5cd85c at www.fastmail.com>
> Content-Type: text/plain; charset="us-ascii"
>
>
>
> On Fri, May 15, 2020, at 8:52 PM, hritikesh semwal via discuss wrote:
> > Hello,
> >
> > I am working on a parallel CFD solver with MPI and I am using an account
> on a cluster to run my executable. The hardware structure of my account is
> as follows;
> >
> > [hardware details snipped; identical to the lscpu output in Message 1 above]
> >
> > Initially, I was running my executable without any binding options and in
> > that case, whenever I was switching from 2 to 4 processors my computation
> > time was also increasing along with communication time inside some
> > iterative loop.
> >
> > Today, somewhere I read about binding options in MPI through which I can
> manage the allocation of processors. Initially, I used the "-bind-to core"
> option and the results were different and I got time reduction up to 16
> processors and after that with 24 and 32 processors, it has started
> increasing. Results of timing are as follows;
> > 2 procs- 160 seconds, 4 procs- 84 seconds, 8 procs- 45 seconds, 16
> procs- 28 seconds, 24 procs- 38 seconds, 32 procs- 34 seconds.
>
> This seems reasonable. Are you able to turn off hyperthreading? For most
> numerical codes it is not useful, as they are typically bandwidth limited.
> Thus, with more than 16 processes you will not see much speedup.
>

I understand the speedup should not be much, but the time is actually
increasing. How can I turn off hyperthreading?
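
For what it is worth, this is roughly how I am checking which logical CPUs are
hyperthread siblings on the node, and the binding I assume keeps one rank per
physical core (standard Linux tools; "./solver" again stands in for my
executable):

    lscpu --extended        # CPUs sharing the same CORE value are hyperthread siblings
    cat /sys/devices/system/cpu/cpu0/topology/thread_siblings_list
    mpiexec -n 16 -bind-to core ./solver   # at most one rank per physical core

I understand that actually disabling hyperthreading usually requires BIOS
access or root, so I am instead trying to avoid the sibling threads through
binding.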


>
> ------------------------------
>
> Message: 3
> Date: Sat, 16 May 2020 10:31:30 +0300
> From: "Benson Muite" <benson_muite at emailplus.org>
> To: "Benson Muite via discuss" <discuss at mpich.org>
> Subject: Re: [mpich-discuss] Understanding process bindings in MPICH
> Message-ID: <e93484c1-3604-49f3-9d9a-2a307270ca9c at www.fastmail.com>
> Content-Type: text/plain; charset="us-ascii"
>
>
> On Fri, May 15, 2020, at 8:52 PM, hritikesh semwal via discuss wrote:
> > Hello,
> >
> > I am working on a parallel CFD solver with MPI and I am using an account
> on a cluster to run my executable. The hardware structure of my account is
> as follows;
> >
> > [hardware details snipped; identical to the lscpu output in Message 1 above]
> >
> > Initially, I was running my executable without any binding options and in
> > that case, whenever I was switching from 2 to 4 processors my computation
> > time was also increasing along with communication time inside some
> > iterative loop.
> >
> > Today, somewhere I read about binding options in MPI through which I can
> manage the allocation of processors. Initially, I used the "-bind-to core"
> option and the results were different and I got time reduction up to 16
> processors and after that with 24 and 32 processors, it has started
> increasing. Results of timing are as follows;
> > 2 procs- 160 seconds, 4 procs- 84 seconds, 8 procs- 45 seconds, 16
> procs- 28 seconds, 24 procs- 38 seconds, 32 procs- 34 seconds.
> >
> > After that, I used some other combinations of binding option but did not
> get better timing results compared to -bind-to core option. So, I back
> edited the bind to option to core but now I am getting different timing
> results with the same executable which are as follows,
> > 2 procs- 164 seconds, 4 procs- 85 seconds, 8 procs- 45 seconds, 16
> > procs- 48 seconds, 24 procs- 52 seconds, 32 procs- 98 seconds.
>
> Hritikesh,
>
> You might find the following online workshop useful:
> http://www.hlrs.de/training/2020-05-25-VI-HPS/


Thanks, I will go through the course outline, but can you briefly tell me the
reason here: why does the time vary between different runs of the same code
(sometimes it decreases for 16 processors and sometimes it increases)? Also, I
don't know why, but every time I type discuss at mpich.org, it automatically
fills in your name first. Does my mail go to all members of the group? The two
questions I asked in my first mail are very important for completing my
project, so I request all members to please help me with this matter. I cannot
keep working with this uncertainty in my code timings, which can slow down at
any time.
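
In case it helps with diagnosing the run-to-run variation, this is roughly how
I am now repeating the measurements (a sketch; "./solver" stands in for my
executable, and I am assuming the node may be shared with other jobs):

    # repeat the same case under identical binding to see the spread in wall time
    for i in 1 2 3; do
        time mpiexec -n 16 -bind-to core ./solver
    done

    # check whether other users' processes are running on the node at the same time
    uptime
    top -b -n 1 | head -20

If the node is shared and ranks from different jobs land on the same cores,
that alone could explain the irregular timings.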


>
>
> Regards,
> Benson
> -------------- next part --------------
> An HTML attachment was scrubbed...
> URL: <
> http://lists.mpich.org/pipermail/discuss/attachments/20200516/666681af/attachment-0001.html
> >
>
> ------------------------------
>
> Subject: Digest Footer
>
> _______________________________________________
> discuss mailing list
> discuss at mpich.org
> https://lists.mpich.org/mailman/listinfo/discuss
>
>
> ------------------------------
>
> End of discuss Digest, Vol 91, Issue 21
> ***************************************
>

