[mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs
Douglas Dommermuth
dgd at mit.edu
Wed Nov 13 11:37:48 CST 2019
Hi Hui,
Here is the updated list, including the case you asked for:
1. One mpi job with mpiexec.hydra -n 64 myprog => 57.32s
2. One mpi job with mpiexec.hydra -bind-to numa -n 64 myprog => 50.52s
3. Two mpi jobs with mpiexec.hydra -n 64 myprog => 99.77s
4. Two mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 72.23s
5. Four mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 159.2s
6. Four mpi jobs with mpiexec.hydra -n 64 myprog => 159.9s
Case 6 runs at essentially the same speed as Case 5. The code is a CFD code based on a subdomain approach, with communication between neighboring subdomains. The machine is a Supermicro server with 512GB of memory. These small test cases use about 3% of the available memory. I want to run 4 larger MPI jobs concurrently.
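As a quick check on the arithmetic behind these numbers, the short Python sketch below compares each concurrent timing with the estimated cost of running the same number of jobs one after another. The timings are copied from the list above; the function names and the "sequential estimate" (number of jobs times the single-job time with the same binding option) are illustrative assumptions, not measurements.

```python
# Wall-clock timings in seconds, copied from the numbered list above.
# Each entry: case number -> (binding option, concurrent jobs, seconds).
timings = {
    1: ("no binding", 1, 57.32),
    2: ("-bind-to numa", 1, 50.52),
    3: ("no binding", 2, 99.77),
    4: ("-bind-to numa", 2, 72.23),
    5: ("-bind-to numa", 4, 159.2),
    6: ("no binding", 4, 159.9),
}


def sequential_estimate(jobs, single_job_time):
    """Assumed cost of running `jobs` jobs back to back (not measured)."""
    return jobs * single_job_time


def summarize(timings):
    """Return (case, binding, jobs, concurrent, sequential, gain) rows."""
    # Single-job time for each binding option (cases 1 and 2).
    single = {b: t for b, jobs, t in timings.values() if jobs == 1}
    rows = []
    for case, (binding, jobs, t) in sorted(timings.items()):
        seq = sequential_estimate(jobs, single[binding])
        rows.append((case, binding, jobs, t, seq, seq / t))
    return rows


for case, binding, jobs, t, seq, gain in summarize(timings):
    print(f"case {case}: {jobs} job(s), {binding:13s} "
          f"concurrent {t:6.2f}s, sequential est. {seq:6.2f}s, "
          f"gain {gain:.2f}x")
```

For Case 5 the sequential estimate is 4 x 50.52s = 202.08s against 159.2s measured concurrently, so running the four jobs together is only about 1.27 times faster than running them back to back.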
________________________________
From: Zhou, Hui <zhouh at anl.gov>
Sent: Wednesday, November 13, 2019 8:55 AM
To: discuss at mpich.org
Cc: Douglas Dommermuth
Subject: Re: [mpich-discuss] Optimizing runtime for 4 mpiexec.hydra jobs
Hi Doug,
What is your number running 4 mpi jobs without -bind-to numa?
-
Hui Zhou
On Nov 13, 2019, at 10:39 AM, Douglas Dommermuth via discuss <discuss at mpich.org> wrote:
I am running Ubuntu 18.04.3 with MPICH 3.3~a2-4, GFortran 4:7.4.0-1ubuntu2.3, and GCC 4:7.4.0-1ubuntu2.3 on dual AMD EPYC 7742 processors with hyper-threading enabled. My codes are written in Fortran with MPI. Together, the two AMD processors have 128 cores and 256 threads. I want to optimize the runtime for 4 MPI jobs running concurrently with 64 threads each. Some timings are provided here:
1. One mpi job with mpiexec.hydra -n 64 myprog => 57.32s
2. One mpi job with mpiexec.hydra -bind-to numa -n 64 myprog => 50.52s
3. Two mpi jobs with mpiexec.hydra -n 64 myprog => 99.77s
4. Two mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 72.23s
5. Four mpi jobs with mpiexec.hydra -bind-to numa -n 64 myprog => 159.2s
The option "-bind-to numa" helps, but even so, running four mpi jobs concurrently with 64 threads each is considerably slower than running one mpi job with 64 threads. I can almost run four mpi jobs sequentially and match the time for running four mpi jobs concurrently. How can I improve on the result for running 4 mpi jobs concurrently? Thanks, Doug.
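The process counts in this setup can be checked with a few lines of Python. The sketch below uses only the figures stated above (128 cores, 256 hardware threads, 64 processes per job); the function name is hypothetical. It shows how many processes the busiest physical core must host at minimum. Whether core sharing actually explains the slowdown is an assumption that these timings alone do not confirm.

```python
# Figures stated in the message above.
PHYSICAL_CORES = 128     # two AMD EPYC 7742 processors combined
HARDWARE_THREADS = 256   # with hyper-threading enabled
RANKS_PER_JOB = 64       # mpiexec.hydra -n 64


def ranks_per_core_lower_bound(jobs):
    """Return (total ranks, minimum ranks on the busiest physical core)."""
    total = jobs * RANKS_PER_JOB
    # Ceiling division: a pigeonhole bound, independent of the actual binding.
    busiest = -(-total // PHYSICAL_CORES)
    return total, busiest


for jobs in (1, 2, 4):
    total, busiest = ranks_per_core_lower_bound(jobs)
    print(f"{jobs} job(s): {total} ranks on {PHYSICAL_CORES} cores, "
          f"at least {busiest} rank(s) on the busiest core, "
          f"{HARDWARE_THREADS - total} hardware threads left over")
```

With one or two jobs, every process can in principle have a physical core to itself. With four jobs, 256 processes fill all 256 hardware threads, so at least some physical cores must run two processes at once.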