[mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs

Douglas Dommermuth dgd at mit.edu
Wed Nov 13 11:37:48 CST 2019


Hi Hui,

Here is the updated list with your request:

  1.  One mpi job with mpiexec.hydra -n 64 myprog => 57.32s
  2.  One mpi job with mpiexec.hydra -bind-to numa -n 64 myprog => 50.52s
  3.  Two mpi jobs with mpiexec.hydra -n 64 myprog => 99.77s
  4.  Two mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 72.23s
  5.  Four mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 159.2s
  6.  Four mpi jobs with mpiexec.hydra -n 64 myprog => 159.9s

Case 6 runs at essentially the same speed as Case 5. The code is a CFD code based on a subdomain approach, with communication between neighboring subdomains. The machine is a Supermicro server with 512GB of memory. These small test cases use about 3% of the available memory. I want to run 4 larger MPI jobs concurrently.
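
To make the six timings easier to compare, here is a minimal Python sketch that derives two numbers per case: wall time divided by the number of concurrent jobs, and the slowdown relative to the single-job run with the same binding option. The timings are copied from the list above; the names `timings`, `baseline`, and `summarize` are only illustrative and are not part of MPICH or of the CFD code.

```python
# Timings copied from the list above:
# (case number, concurrent jobs, used -bind-to numa?, wall time in seconds)
timings = [
    (1, 1, False, 57.32),
    (2, 1, True, 50.52),
    (3, 2, False, 99.77),
    (4, 2, True, 72.23),
    (5, 4, True, 159.2),
    (6, 4, False, 159.9),
]

# Single-job wall times, keyed by whether -bind-to numa was used.
baseline = {bound: secs for _, jobs, bound, secs in timings if jobs == 1}


def summarize(rows):
    """Return (case, seconds per job, slowdown vs. the matching single-job run)."""
    out = []
    for case, jobs, bound, secs in rows:
        per_job = secs / jobs              # wall time amortized over the concurrent jobs
        slowdown = secs / baseline[bound]  # 1.0 would mean no penalty from sharing the node
        out.append((case, round(per_job, 2), round(slowdown, 2)))
    return out


for case, per_job, slowdown in summarize(timings):
    print(f"case {case}: {per_job:6.2f} s per job, {slowdown:.2f}x single-job wall time")
```

Read this way, four concurrent jobs with -bind-to numa take roughly 3.15 times as long as one such job, so the node does more total work per second but each job sees a large penalty.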


________________________________
From: Zhou, Hui <zhouh at anl.gov>
Sent: Wednesday, November 13, 2019 8:55 AM
To: discuss at mpich.org
Cc: Douglas Dommermuth
Subject: Re: [mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs

Hi Doug,

What is your timing when running 4 MPI jobs without -bind-to numa?

-
Hui Zhou

On Nov 13, 2019, at 10:39 AM, Douglas Dommermuth via discuss <discuss at mpich.org> wrote:

I am running Ubuntu 18.04.3 with MPICH 3.3~a2-4, GFortran 4:7.4.0-1ubuntu2.3, and GCC 4:7.4.0-1ubuntu2.3 on dual AMD EPYC 7742 processors with hyper-threading enabled. My codes are written in Fortran and use MPI. Together, the two processors provide 128 cores and 256 threads. I want to optimize the runtime for 4 MPI jobs running concurrently with 64 threads each. Some timings are provided here:

  1.  One mpi job with mpiexec.hydra -n 64 myprog => 57.32s
  2.  One mpi job with mpiexec.hydra -bind-to numa -n 64 myprog => 50.52s
  3.  Two mpi jobs with mpiexec.hydra -n 64 myprog => 99.77s
  4.  Two mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 72.23s
  5.  Four mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 159.2s

The option "-bind-to numa" helps, but even so, running four MPI jobs concurrently with 64 threads each is considerably slower than running one MPI job with 64 threads. I can almost match the time for running four MPI jobs concurrently by running them sequentially. How can I improve on the result for running 4 MPI jobs concurrently? Thanks, Doug.
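
The comparison between sequential and concurrent runs can be made concrete with a short Python sketch. It assumes that a sequential run of four jobs would simply take four times the single-job time from case 2, which is an estimate and not a measured number; the function name `compare` is illustrative only.

```python
# Measured wall times from the list above, both with -bind-to numa.
single_bound = 50.52      # case 2: one job
four_concurrent = 159.2   # case 5: four jobs at once


def compare(single, concurrent, jobs):
    """Estimate the back-to-back time and the fraction saved by running concurrently."""
    sequential = jobs * single               # assumption: no overhead between jobs
    saving = 1.0 - concurrent / sequential   # 0.0 would mean concurrency gains nothing
    return sequential, saving


sequential, saving = compare(single_bound, four_concurrent, 4)
print(f"estimated sequential: {sequential:.2f} s")
print(f"measured concurrent:  {four_concurrent:.2f} s")
print(f"time saved by running concurrently: {saving:.1%}")
```

Under that assumption the sequential estimate is about 202 s against 159.2 s measured, so running concurrently saves roughly a fifth of the wall time.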
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss
