[mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs

Douglas Dommermuth dgd at mit.edu
Wed Nov 13 13:00:09 CST 2019


Hi Giuseppe and Joachim,

I will look into turning off hyperthreading and running two jobs with a corresponding change in the sizes of the jobs.  Meanwhile, I ran the following case, which took 159.6s:  

mpirun.mpich -bind-to user:0+4 -n 64 myprog &
mpirun.mpich -bind-to user:1+4 -n 64 myprog &
mpirun.mpich -bind-to user:2+4 -n 64 myprog &
mpirun.mpich -bind-to user:3+4 -n 64 myprog &
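
For reference, here is a quick way to verify where the ranks of the
four jobs actually land (just a sketch; it assumes pgrep and taskset
are available on the node and that the binary is really named myprog
as above):

for pid in $(pgrep myprog); do taskset -cp $pid; done   # print each rank's CPU affinity list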

Thank you, Doug.
________________________________________
From: Joachim Protze <protze at itc.rwth-aachen.de>
Sent: Wednesday, November 13, 2019 9:56 AM
To: discuss at mpich.org
Cc: Douglas Dommermuth
Subject: Re: [mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs

Hi Doug,

In general, using hyperthreads only improves execution time if a
single process/thread does not already fully utilize the core. That
is, if you see ~100% CPU utilization per process in "top" for the
single-job run, then the doubling of execution time from two to four
MPI jobs sounds reasonable.
If your application is mostly computing (as an MPI application
hopefully is), two processes/threads running on the same core share
that core's execution time and end up taking roughly twice as long.
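
A quick way to check this on your node (just a sketch, assuming a
Linux system with util-linux and procps installed):

lscpu --extended    # CORE column: which logical CPUs share a core
top                 # press '1' to see per-logical-CPU utilization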

Depending on your application, the additional processes/threads might
also increase the pressure on the memory bus and slow down the other
jobs by making them wait for memory accesses. That would also explain
the increase in execution time from one to two MPI jobs.
All of this depends on the CPU/memory configuration of the machine.
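
For example, the NUMA layout of the machine (how many nodes there are
and which cores and how much memory belong to each node) can be
inspected with something like the following sketch, assuming numactl
and hwloc are installed:

numactl --hardware   # NUMA nodes with their CPUs and memory sizes
lstopo --no-io       # socket/core/cache topology as seen by hwloc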

Best
Joachim

On 11/13/19 5:39 PM, Douglas Dommermuth via discuss wrote:
> I am running Ubuntu 18.04.3 with MPICH 3.3~a2-4, GFortran
> 4:7.4.0-1ubuntu2.3, and GCC 4:7.4.0-1ubuntu2.3 on dual AMD EPYC 7742
> processors with hyperthreading enabled. My codes are written in
> Fortran with MPI. The dual AMD processors have 128 cores and 256
> threads. I want to optimize the runtime for 4 MPI jobs running
> concurrently with 64 ranks each. Some timings are provided here:
>
>  1. One MPI job with mpiexec.hydra -n 64 myprog => 57.32s
>  2. One MPI job with mpiexec.hydra -bind-to numa -n 64 myprog => 50.52s
>  3. Two MPI jobs with mpiexec.hydra -n 64 myprog => 99.77s
>  4. Two MPI jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 72.23s
>  5. Four MPI jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 159.2s
>
> The option "-bind-to numa" helps, but even so, running four MPI jobs
> concurrently with 64 ranks each is considerably slower than running
> one MPI job with 64 ranks. Running the four jobs sequentially takes
> almost the same total time as running them concurrently.
> How can I improve on the result for running 4 MPI jobs concurrently?
> Thanks, Doug.
>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss
>


--
Dipl.-Inf. Joachim Protze

IT Center
Group: High Performance Computing
Division: Computational Science and Engineering
RWTH Aachen University
Seffenter Weg 23
D 52074  Aachen (Germany)
Tel: +49 241 80-24765
Fax: +49 241 80-624765
protze at itc.rwth-aachen.de
www.itc.rwth-aachen.de


