[mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs

Douglas Dommermuth dgd at mit.edu
Wed Nov 13 14:06:08 CST 2019


The timings for 4 runs are 163.1s, 159.9s, 160.9s, and 158.3s.   
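That works out to a mean of (163.1 + 159.9 + 160.9 + 158.3) / 4 = 160.55s, with a spread of under 3% between runs.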

The subdomain size is currently 32^3.   I could test how 64^3 and 128^3 subdomains scale, since larger subdomains boost the work relative to the communication (rough numbers below).   The solver is incompressible with multigrid, which makes it a bit tricky.   Also, AMD recommends specific BIOS settings.   However, the machine is currently Top 5 on Geekbench 4 for multicore results.
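
As a rough estimate of how the work-to-communication ratio grows (my back-of-the-envelope numbers, assuming a one-cell halo exchanged on each face of a cubic subdomain with edge length n):

    halo cells per exchange  ~ 6*n^2
    interior cells (work)    ~ n^3
    work per halo cell       ~ n/6

    n = 32   ->  ~5.3
    n = 64   ->  ~10.7
    n = 128  ->  ~21.3

So each doubling of the subdomain edge roughly doubles the compute done per communicated cell.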
________________________________________
From: Congiu, Giuseppe <gcongiu at anl.gov>
Sent: Wednesday, November 13, 2019 11:42 AM
To: discuss at mpich.org
Cc: Douglas Dommermuth
Subject: Re: [mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs

Is this an average runtime over multiple runs?

> On Nov 13, 2019, at 1:40 PM, Douglas Dommermuth via discuss <discuss at mpich.org> wrote:
>
> Hi Giuseppe,
>
> It took 163.1s for this case:
>
> mpirun.mpich -bind-to user:0+1 -n 64 myprog &
> mpirun.mpich -bind-to user:64+1 -n 64 myprog &
> mpirun.mpich -bind-to user:128+1 -n 64 myprog &
> mpirun.mpich -bind-to user:192+1 -n 64 myprog &
>
> Thanks, Doug.
> ________________________________________
> From: Congiu, Giuseppe <gcongiu at anl.gov>
> Sent: Wednesday, November 13, 2019 11:10 AM
> To: discuss at mpich.org
> Cc: Douglas Dommermuth
> Subject: Re: [mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs
>
> Try binding all the ranks of a job to the same NUMA domain. See if something like this works better:
>
> mpirun.mpich -bind-to user:0+1 -n 64 myprog &
> mpirun.mpich -bind-to user:64+1 -n 64 myprog &
> mpirun.mpich -bind-to user:128+1 -n 64 myprog &
> mpirun.mpich -bind-to user:192+1 -n 64 myprog &
>
> However, this might not completely solve the problem, as MPI processes can still move around across different cores within the NUMA domain.
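>
> To check where the ranks actually end up, a quick sketch using standard Linux/hwloc tools (nothing MPICH-specific):
>
>     # show which processor each myprog process last ran on
>     ps -eo pid,psr,comm | grep myprog
>
>     # show the core/NUMA topology of the machine
>     lstopo --no-io
>
> Hydra should also print the bindings it computes if you set HYDRA_TOPO_DEBUG=1 in the environment (if I remember the variable correctly).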
>
> —Giuseppe
>
>> On Nov 13, 2019, at 1:00 PM, Douglas Dommermuth via discuss <discuss at mpich.org> wrote:
>>
>> Hi Giuseppe and Joachim,
>>
>> I will look into turning off hyperthreading and running two jobs with a corresponding change in the sizes of the jobs.  Meanwhile, I ran the following case, which took 159.6s:
>>
>> mpirun.mpich -bind-to user:0+4 -n 64 myprog &
>> mpirun.mpich -bind-to user:1+4 -n 64 myprog &
>> mpirun.mpich -bind-to user:2+4 -n 64 myprog &
>> mpirun.mpich -bind-to user:3+4 -n 64 myprog &
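>>
>> (If I am reading the user:OFFSET+STRIDE form correctly, this interleaves the four jobs across the cores instead of keeping each job together, roughly:
>>
>>     job 1 -> cores 0, 4, 8,  ..., 252
>>     job 2 -> cores 1, 5, 9,  ..., 253
>>     job 3 -> cores 2, 6, 10, ..., 254
>>     job 4 -> cores 3, 7, 11, ..., 255
>>
>> so every job spans all the NUMA domains, which may explain why it is no faster than plain "-bind-to numa".)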
>>
>> Thank you, Doug.
>> ________________________________________
>> From: Joachim Protze <protze at itc.rwth-aachen.de>
>> Sent: Wednesday, November 13, 2019 9:56 AM
>> To: discuss at mpich.org
>> Cc: Douglas Dommermuth
>> Subject: Re: [mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs
>>
>> Hi Doug,
>>
>> in general, using hyperthreads only improves execution time if you do
>> not already saturate the core with a single process/thread. I.e., if you
>> see ~100% cpu utilization per process in "top" for the single-job
>> execution, the doubling of the execution time from 2 to 4 mpi jobs
>> sounds reasonable. If your application is mostly calculating (as an MPI
>> application hopefully is), the two processes/threads running on the
>> same core share the execution time of that core and end up with double
>> the execution time.
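>>
>> As a rough sanity check with your numbers (my arithmetic, assuming the
>> 128 physical cores are the bottleneck):
>>
>>     2 jobs x 64 ranks = 128 ranks -> 1 rank per core  -> 72.23s
>>     4 jobs x 64 ranks = 256 ranks -> 2 ranks per core -> expect ~2 x 72.23s = ~144s
>>
>> which is in the right ballpark of the 159.2s you measured.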
>>
>> Depending on your application, additional processes/threads might also
>> increase the pressure on the memory bus and therefore slow down the
>> other jobs by making them wait for memory accesses. This might also
>> explain the increase in execution time from one to two mpi jobs.
>> All of this depends on the cpu/memory configuration of the machine.
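>>
>> You can inspect the NUMA layout with standard tools (nothing
>> MPICH-specific), e.g.:
>>
>>     numactl --hardware   # NUMA nodes with their cpus and memory
>>     numastat             # per-node allocation hit/miss counters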
>>
>> Best
>> Joachim
>>
>> On 11/13/19 5:39 PM, Douglas Dommermuth via discuss wrote:
>>> I am running Ubuntu 18.04.3 with MPICH 3.3~a2-4, GFortran
>>> 4:7.4.0-1ubuntu2.3, and GCC 4:7.4.0-1ubuntu2.3 on dual AMD EPYC 7742
>>> processors with hyperthreading enabled.  My codes are written in MPI
>>> and Fortran.  The dual AMD processors have 128 cores and 256 threads.
>>> I want to optimize the runtime for 4 mpi jobs running concurrently
>>> with 64 ranks each.  Some timings are provided here:
>>>
>>> 1. One mpi job with mpiexec.hydra -n 64 myprog => 57.32s
>>> 2. One mpi job with mpiexec.hydra -bind-to numa -n 64 myprog => 50.52s
>>> 3. Two mpi jobs with mpiexec.hydra -n 64 myprog => 99.77s
>>> 4. Two mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 72.23s
>>> 5. Four mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 159.2s
>>>
>>> The option "-bind-to numa" helps, but even so, running four mpi
>>> jobs concurrently with 64 ranks each is considerably slower than
>>> running one mpi job with 64 ranks.  Running the four mpi jobs
>>> sequentially would take almost the same total time as running them
>>> concurrently.  How can I improve on the result for running 4 mpi jobs
>>> concurrently?  Thanks, Doug.
>>>
>>
>>
>> --
>> Dipl.-Inf. Joachim Protze
>>
>> IT Center
>> Group: High Performance Computing
>> Division: Computational Science and Engineering
>> RWTH Aachen University
>> Seffenter Weg 23
>> D 52074  Aachen (Germany)
>> Tel: +49 241 80-24765
>> Fax: +49 241 80-624765
>> protze at itc.rwth-aachen.de
>> www.itc.rwth-aachen.de
>>