[mpich-discuss] Need help to run hybrid code.
Halim Amer
aamer at anl.gov
Mon Aug 28 09:28:11 CDT 2017
First of all, you are using Open MPI and MVAPICH below; I don't see
MPICH. If you are using MPICH, then please tell us the version (show the
output of mpichversion).
> Starting omp_dotprod_hybrid. Using 4 Cores...
> Core 3 using 16 threads
> Core 0 using 16 threads
> Core 2 using 16 threads
> Core 1 using 16 threads
Second, you are using 16 threads per core; with that much oversubscription, you naturally won't scale.
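A quick way to confirm the oversubscription is to have each rank compare the number of logical CPUs in its affinity mask with the number of OpenMP threads it is about to create. A minimal sketch (Linux-specific; the file name and build line are only illustrative):

    /* check_oversub.c: flag ranks that request more OpenMP threads
     * than the logical CPUs in their affinity mask.
     * Build, for example: mpicc -fopenmp check_oversub.c -o check_oversub */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <omp.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        cpu_set_t mask;
        sched_getaffinity(0, sizeof(mask), &mask);   /* 0 = this process */
        int ncpus    = CPU_COUNT(&mask);             /* logical CPUs we may run on */
        int nthreads = omp_get_max_threads();        /* threads OpenMP will create */

        printf("Rank %d: %d OpenMP threads on %d logical CPUs%s\n",
               rank, nthreads, ncpus,
               nthreads > ncpus ? "  <-- oversubscribed" : "");

        MPI_Finalize();
        return 0;
    }

Launch it with exactly the same mpirun binding options as your application so that the affinity mask it sees matches what your code sees.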
Third, even if you were using 1 OpenMP thread per hardware thread, there
is no guarantee that your program will scale (that is, that the execution
time decreases as the number of threads increases). Your program might
have scalability bottlenecks that have nothing to do with MPI. My first
advice is to forget about MPI and make sure your program scales with
OpenMP alone. If your machine truly has 64 hardware threads, then use
1 MPI process per node and investigate the issue.
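For that OpenMP-only test, something along these lines is enough (a sketch only; no MPI, and the file name and array size are arbitrary). Run it with OMP_NUM_THREADS=1, 2, 4, ... and see whether the reported time actually goes down:

    /* scale.c: pure OpenMP scaling check, no MPI involved.
     * Build, for example: gcc -O2 -fopenmp scale.c -o scale */
    #include <stdio.h>
    #include <stdlib.h>
    #include <omp.h>

    #define N 20000000L

    int main(void)
    {
        double *a = malloc(N * sizeof(double));
        double *b = malloc(N * sizeof(double));
        for (long i = 0; i < N; i++) { a[i] = 1.0; b[i] = 2.0; }

        double sum = 0.0;
        double t0 = omp_get_wtime();
        #pragma omp parallel for reduction(+:sum)
        for (long i = 0; i < N; i++)
            sum += a[i] * b[i];
        double t1 = omp_get_wtime();

        printf("threads = %d  sum = %.1f  time = %.4f s\n",
               omp_get_max_threads(), sum, t1 - t0);
        free(a);
        free(b);
        return 0;
    }

Keep in mind that a dot product like this is memory-bandwidth bound, so it will stop scaling once the memory bandwidth is saturated; that is exactly the kind of non-MPI bottleneck I mean.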
Finally, if you still suspect bad thread bindings, then check how they
are bound to hardware threads instead of measuring execution time. Use
sched_getcpu() to query thread placement on Linux and tools like hwloc
to understand the machine topology.
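For example (a Linux-only sketch; the file name is only illustrative), launch this with the same mpirun options as the real run and look at which CPU numbers come back:

    /* where.c: print the logical CPU each OpenMP thread of each rank runs on.
     * Build, for example: mpicc -fopenmp where.c -o where */
    #define _GNU_SOURCE
    #include <sched.h>
    #include <stdio.h>
    #include <omp.h>
    #include <mpi.h>

    int main(int argc, char **argv)
    {
        MPI_Init(&argc, &argv);
        int rank;
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        #pragma omp parallel
        {
            /* sched_getcpu() reports the logical CPU this thread is on right now */
            printf("rank %d thread %d on CPU %d\n",
                   rank, omp_get_thread_num(), sched_getcpu());
        }

        MPI_Finalize();
        return 0;
    }

hwloc's lstopo (or hwloc-ls) then tells you which of those CPU numbers are hardware threads of the same physical core, so you can see at a glance whether ranks overlap or all of a rank's threads pile up on one core.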
I doubt your problems are related to issues in MPICH, which is the
purpose of this list. Thus, I suggest moving this discussion off the
list and replying to me personally instead of replying to the list.
Halim
www.mcs.anl.gov/~aamer
On 8/28/17 8:39 AM, Pasha Pashaei wrote:
> Dear friends,
> I am trying to run a hybrid MPI+OpenMP code.
> As you can see below, I varied the number of threads in several cases while running my main code with Open MPI, MPICH, and MPICH2.
> With Open MPI and MPICH it seems that OpenMP did not work at all, since the total time did not change considerably. With MPICH2, however, the total computational time increased as I increased the number of threads. This could be because virtual threads are being used instead of physical threads (or, as you said, oversubscription).
>
>
> Hybrid code results (MPI + OpenMP):
> Commands following your suggestions:
> mvapich
> mpirun -np 4 -genv OMP_NUM_THREADS 1 --bind-to hwthread:1 ./pjet.gfortran > output.txt
> Total time = 7.290E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 8 --bind-to hwthread:8 ./pjet.gfortran > output.txt
> Total time = 4.940E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 8 --bind-to hwthread:8 -map-by hwthread:8 ./pjet.gfortran > output.txt
> Total time = 4.960E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 16 --bind-to hwthread:16 ./pjet.gfortran > output.txt
> Total time = 4.502E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to core:16 -map-by core:16 ./pjet.gfortran > output.txt
> Total time = 4.628E+02
>
> Previous commands
> OpenMPI 1.8.1
> mpirun -np 4 -x OMP_NUM_THREADS=1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.475E+02
> mpirun -np 4 -x OMP_NUM_THREADS=8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.525E+02
> mpirun -np 4 -x OMP_NUM_THREADS=16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.611E+02
>
> mvapich
> mpirun -np 4 -genv OMP_NUM_THREADS 1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.441E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 4 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.535E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.552E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.591E+02
>
> mvapich2
> mpirun -np 4 -genv OMP_NUM_THREADS 1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.935E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 4 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 5.562E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 6.392E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 8.170E+02
>
> Then I used a simple "hybrid.f90" code to check whether the computer recognizes the correct number of cores and threads. It showed the correct values with all three libraries in the different cases.
> Here is its result:
>
> Starting omp_dotprod_hybrid. Using 4 Cores...
> Core 3 using 16 threads
> Core 0 using 16 threads
> Core 2 using 16 threads
> Core 1 using 16 threads
> Core 1 thread 0 partial sum = 0.0000000000000000
> Core 3 thread 0 partial sum = 0.0000000000000000
> Core 1 thread 4 partial sum = 0.0000000000000000
> Core 3 thread 7 partial sum = 200.00000000000000
> Core 1 thread 8 partial sum = 200.00000000000000
> Core 3 thread 9 partial sum = 200.00000000000000
> Core 1 thread 11 partial sum = 200.00000000000000
> Core 3 thread 3 partial sum = 200.00000000000000
> Core 1 thread 2 partial sum = 0.0000000000000000
> Core 3 thread 5 partial sum = 0.0000000000000000
> Core 1 thread 3 partial sum = 200.00000000000000
> Core 3 thread 2 partial sum = 0.0000000000000000
> Core 1 thread 13 partial sum = 200.00000000000000
> Core 3 thread 12 partial sum = 200.00000000000000
> Core 1 thread 1 partial sum = 200.00000000000000
> Core 3 thread 1 partial sum = 200.00000000000000
> Core 3 thread 8 partial sum = 0.0000000000000000
> Core 1 thread 7 partial sum = 0.0000000000000000
> Core 3 thread 11 partial sum = 200.00000000000000
> Core 1 thread 15 partial sum = 0.0000000000000000
> Core 3 thread 15 partial sum = 0.0000000000000000
> Core 1 thread 10 partial sum = 200.00000000000000
> Core 1 thread 9 partial sum = 200.00000000000000
> Core 3 thread 13 partial sum = 0.0000000000000000
> Core 1 thread 5 partial sum = 200.00000000000000
> Core 3 thread 6 partial sum = 0.0000000000000000
> Core 3 thread 4 partial sum = 0.0000000000000000
> Core 1 thread 6 partial sum = 0.0000000000000000
> Core 3 thread 14 partial sum = 200.00000000000000
> Core 1 thread 12 partial sum = 0.0000000000000000
> Core 3 thread 10 partial sum = 200.00000000000000
> Core 0 thread 0 partial sum = 0.0000000000000000
> Core 0 thread 14 partial sum = 200.00000000000000
> Core 0 thread 8 partial sum = 200.00000000000000
> Core 0 thread 7 partial sum = 0.0000000000000000
> Core 0 thread 15 partial sum = 200.00000000000000
> Core 0 thread 5 partial sum = 200.00000000000000
> Core 0 thread 9 partial sum = 200.00000000000000
> Core 0 thread 11 partial sum = 0.0000000000000000
> Core 0 thread 10 partial sum = 200.00000000000000
> Core 0 thread 6 partial sum = 200.00000000000000
> Core 0 thread 3 partial sum = 200.00000000000000
> Core 0 thread 4 partial sum = 0.0000000000000000
> Core 0 thread 2 partial sum = 0.0000000000000000
> Core 0 thread 13 partial sum = 0.0000000000000000
> Core 0 thread 12 partial sum = 0.0000000000000000
> Core 0 thread 1 partial sum = 0.0000000000000000
> Core 0 partial sum = 1600.0000000000000
> Core 2 thread 3 partial sum = 0.0000000000000000
> Core 2 thread 15 partial sum = 0.0000000000000000
> Core 2 thread 0 partial sum = 0.0000000000000000
> Core 2 thread 2 partial sum = 200.00000000000000
> Core 2 thread 4 partial sum = 0.0000000000000000
> Core 2 thread 5 partial sum = 0.0000000000000000
> Core 2 thread 9 partial sum = 200.00000000000000
> Core 2 thread 7 partial sum = 0.0000000000000000
> Core 2 thread 14 partial sum = 200.00000000000000
> Core 2 thread 8 partial sum = 200.00000000000000
> Core 2 thread 12 partial sum = 200.00000000000000
> Core 2 thread 10 partial sum = 200.00000000000000
> Core 2 thread 6 partial sum = 200.00000000000000
> Core 2 thread 1 partial sum = 0.0000000000000000
> Core 2 thread 13 partial sum = 0.0000000000000000
> Core 2 thread 11 partial sum = 200.00000000000000
> Core 2 partial sum = 1600.0000000000000
> Core 3 partial sum = 1600.0000000000000
> Core 1 thread 14 partial sum = 0.0000000000000000
> Core 1 partial sum = 1600.0000000000000
> Done. Hybrid version: global sum = 6400.0000000000000
>
>
>
> Please tell me if there is something I should check. I am still getting nowhere.
> Best regards
>
> Pasha Pashaei
>
>
>
>
_______________________________________________
discuss mailing list discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss