[mpich-discuss] Need help to run hybrid code.
Jahanzeb Maqbool Hashmi
jahanzeb.maqbool at gmail.com
Tue Aug 29 10:01:22 CDT 2017
Hi,
You are using -bind-to socket and -map-by socket with mpirun when running with MVAPICH2. These options will not give the correct process mapping for MVAPICH2 + OpenMP. You need to use the options described in the user guide.
(http://mvapich.cse.ohio-state.edu/static/media/mvapich/mvapich2-2.3b-userguide.html#x1-900006.20)
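
As a rough sketch only: the script below composes (but does not run) a launch line for a hybrid job and checks whether ranks x threads exceeds the number of logical CPUs. It assumes that MV2_CPU_BINDING_POLICY=hybrid and MV2_THREADS_PER_PROCESS are the runtime parameters the linked section describes; please verify the names against the guide for your installed version. The helper names hybrid_cmd and check_oversubscription and the CPU count of 32 are made up for illustration.

```shell
#!/bin/sh
# Sketch: build an MVAPICH2 launch line for an MPI+OpenMP run (dry run only).
# ASSUMPTION: MV2_CPU_BINDING_POLICY=hybrid and MV2_THREADS_PER_PROCESS are
# the parameter names from the user guide section linked above; check them
# against your MVAPICH2 version before use.

# hybrid_cmd RANKS THREADS EXE: print the command line instead of running it.
hybrid_cmd() {
    ranks=$1
    threads=$2
    exe=$3
    echo "mpirun -np $ranks -genv OMP_NUM_THREADS $threads -genv MV2_CPU_BINDING_POLICY hybrid -genv MV2_THREADS_PER_PROCESS $threads $exe"
}

# check_oversubscription RANKS THREADS CPUS: report whether the job asks for
# more threads in total than there are logical CPUs on the node.
check_oversubscription() {
    ranks=$1
    threads=$2
    cpus=$3
    if [ $((ranks * threads)) -gt "$cpus" ]; then
        echo "oversubscribed"
    else
        echo "ok"
    fi
}

# Example: 4 ranks with 8 threads each, on a hypothetical 32-CPU node.
hybrid_cmd 4 8 ./pjet.gfortran
check_oversubscription 4 8 32
```

On a real node the third argument could be taken from `nproc` or `lscpu` instead of a hard-coded value.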
regards
Jahanzeb
On Aug 28, 2017, 9:39 AM -0400, Pasha Pashaei <pasha.313 at hotmail.com>, wrote:
> Dear friends
> I am trying to run a hybrid MPI+OpenMP code.
> As you can see below, I varied the number of threads in several cases while running my main code with OpenMPI, MPICH, and MPICH2.
> With OpenMPI and MPICH, OpenMP does not seem to have any effect, since the total time did not change considerably. With MPICH2, however, the total computational time increased as the number of threads increased. This could be caused by using virtual threads instead of physical threads (what you called over-subscribing).
>
> Hybrid code result (MPI + OpenMP) :
> Your suggestions:
> mvapich
> mpirun -np 4 -genv OMP_NUM_THREADS 1 --bind-to hwthread:1 ./pjet.gfortran > output.txt
> Total time = 7.290E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 8 --bind-to hwthread:8 ./pjet.gfortran > output.txt
> Total time = 4.940E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 8 --bind-to hwthread:8 -map-by hwthread:8 ./pjet.gfortran > output.txt
> Total time = 4.960E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 16 --bind-to hwthread:16 ./pjet.gfortran > output.txt
> Total time = 4.502E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to core:16 -map-by core:16 ./pjet.gfortran > output.txt
> Total time = 4.628E+02
>
> Previous commands:
> OpenMPI 1.8.1
> mpirun -np 4 -x OMP_NUM_THREADS=1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.475E+02
> mpirun -np 4 -x OMP_NUM_THREADS=8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.525E+02
> mpirun -np 4 -x OMP_NUM_THREADS=16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.611E+02
>
> mvapich
> mpirun -np 4 -genv OMP_NUM_THREADS 1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.441E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 4 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.535E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.552E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.591E+02
>
> mvapich2
> mpirun -np 4 -genv OMP_NUM_THREADS 1 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 4.935E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 4 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 5.562E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 8 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 6.392E+02
> mpirun -np 4 -genv OMP_NUM_THREADS 16 -bind-to socket -map-by socket ./pjet.gfortran > output.txt
> Total time = 8.170E+02
>
> Then I ran a simple "hybrid.f90" test code to check whether the machine reports the correct number of cores and threads. It showed the correct values with all three implementations in the different cases.
> Here is its result:
> Starting omp_dotprod_hybrid. Using 4 Cores...
> Core 3 using 16 threads
> Core 0 using 16 threads
> Core 2 using 16 threads
> Core 1 using 16 threads
> Core 1 thread 0 partial sum = 0.0000000000000000
> Core 3 thread 0 partial sum = 0.0000000000000000
> Core 1 thread 4 partial sum = 0.0000000000000000
> Core 3 thread 7 partial sum = 200.00000000000000
> Core 1 thread 8 partial sum = 200.00000000000000
> Core 3 thread 9 partial sum = 200.00000000000000
> Core 1 thread 11 partial sum = 200.00000000000000
> Core 3 thread 3 partial sum = 200.00000000000000
> Core 1 thread 2 partial sum = 0.0000000000000000
> Core 3 thread 5 partial sum = 0.0000000000000000
> Core 1 thread 3 partial sum = 200.00000000000000
> Core 3 thread 2 partial sum = 0.0000000000000000
> Core 1 thread 13 partial sum = 200.00000000000000
> Core 3 thread 12 partial sum = 200.00000000000000
> Core 1 thread 1 partial sum = 200.00000000000000
> Core 3 thread 1 partial sum = 200.00000000000000
> Core 3 thread 8 partial sum = 0.0000000000000000
> Core 1 thread 7 partial sum = 0.0000000000000000
> Core 3 thread 11 partial sum = 200.00000000000000
> Core 1 thread 15 partial sum = 0.0000000000000000
> Core 3 thread 15 partial sum = 0.0000000000000000
> Core 1 thread 10 partial sum = 200.00000000000000
> Core 1 thread 9 partial sum = 200.00000000000000
> Core 3 thread 13 partial sum = 0.0000000000000000
> Core 1 thread 5 partial sum = 200.00000000000000
> Core 3 thread 6 partial sum = 0.0000000000000000
> Core 3 thread 4 partial sum = 0.0000000000000000
> Core 1 thread 6 partial sum = 0.0000000000000000
> Core 3 thread 14 partial sum = 200.00000000000000
> Core 1 thread 12 partial sum = 0.0000000000000000
> Core 3 thread 10 partial sum = 200.00000000000000
> Core 0 thread 0 partial sum = 0.0000000000000000
> Core 0 thread 14 partial sum = 200.00000000000000
> Core 0 thread 8 partial sum = 200.00000000000000
> Core 0 thread 7 partial sum = 0.0000000000000000
> Core 0 thread 15 partial sum = 200.00000000000000
> Core 0 thread 5 partial sum = 200.00000000000000
> Core 0 thread 9 partial sum = 200.00000000000000
> Core 0 thread 11 partial sum = 0.0000000000000000
> Core 0 thread 10 partial sum = 200.00000000000000
> Core 0 thread 6 partial sum = 200.00000000000000
> Core 0 thread 3 partial sum = 200.00000000000000
> Core 0 thread 4 partial sum = 0.0000000000000000
> Core 0 thread 2 partial sum = 0.0000000000000000
> Core 0 thread 13 partial sum = 0.0000000000000000
> Core 0 thread 12 partial sum = 0.0000000000000000
> Core 0 thread 1 partial sum = 0.0000000000000000
> Core 0 partial sum = 1600.0000000000000
> Core 2 thread 3 partial sum = 0.0000000000000000
> Core 2 thread 15 partial sum = 0.0000000000000000
> Core 2 thread 0 partial sum = 0.0000000000000000
> Core 2 thread 2 partial sum = 200.00000000000000
> Core 2 thread 4 partial sum = 0.0000000000000000
> Core 2 thread 5 partial sum = 0.0000000000000000
> Core 2 thread 9 partial sum = 200.00000000000000
> Core 2 thread 7 partial sum = 0.0000000000000000
> Core 2 thread 14 partial sum = 200.00000000000000
> Core 2 thread 8 partial sum = 200.00000000000000
> Core 2 thread 12 partial sum = 200.00000000000000
> Core 2 thread 10 partial sum = 200.00000000000000
> Core 2 thread 6 partial sum = 200.00000000000000
> Core 2 thread 1 partial sum = 0.0000000000000000
> Core 2 thread 13 partial sum = 0.0000000000000000
> Core 2 thread 11 partial sum = 200.00000000000000
> Core 2 partial sum = 1600.0000000000000
> Core 3 partial sum = 1600.0000000000000
> Core 1 thread 14 partial sum = 0.0000000000000000
> Core 1 partial sum = 1600.0000000000000
> Done. Hybrid version: global sum = 6400.0000000000000
>
>
>
> Please tell me if there is something I should check. I am still getting nowhere.
> Best regards
> Pasha Pashaei
>