<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div style="font-family:Arial;"><br></div><div style="font-family:Arial;"><br></div><div>On Tue, May 5, 2020, at 10:19 PM, hritikesh semwal via discuss wrote:<br></div><blockquote type="cite" id="qt" style=""><div dir="auto"><div><div><br></div><div><br></div><div class="qt-gmail_quote"><div dir="ltr" class="qt-gmail_attr">On Tue, 5 May, 2020, 10:30 PM , <<a href="mailto:discuss-request@mpich.org" target="_blank" rel="noreferrer">discuss-request@mpich.org</a>> wrote:<br></div><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div> <br></div><div>  > 1. I am using MPI_Neighbor_alltoallw() to exchange data over a distributed graph topology communicator. Most of the time my code works fine, but sometimes it seems to deadlock (it shows no output). MPI_Neighbor_alltoallw uses MPI_Waitall internally, so I do not understand why this is happening.<br></div><div> >> <br></div><div> >>  You may want to check that you are sending and receiving the correct data. Perhaps also try MPI_Neighbor_alltoallw<br></div><div> >> <br></div><div> >>  > 2. Is it possible that the processors' completion times vary each time I run the code? For example, in one run all processors take around 100 seconds, and in another run all processors take 110 seconds. <br></div><div> >> <br></div><div> >>  There is usually some variability. Do you solve the same system each time? What is the method of solution? If your code is available it can sometimes be easier to give suggestions.<br></div><div> >> <br></div><div> > Yes, the system of equations is the same. I am using the finite volume method for solving the Navier-Stokes equations. 
By your first sentence, do you mean that it is possible?<br></div><div> <br></div><div> Is the method implicit or explicit?<br></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto"><div class="qt-gmail_quote"><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><br></blockquote></div></div><div dir="auto">It's an explicit method.<br></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">Ok<br></div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt" style=""><div dir="auto"><div dir="auto"><div class="qt-gmail_quote"><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div><br></div><div>><br></div><div> >> > <br></div><div> >>  > Please help with the above two matters.<br></div><div> >>  > <br></div><div> >>  > On Tue, May 5, 2020 at 4:28 PM hritikesh semwal <<a href="mailto:hritikesh.semwal@gmail.com" rel="noreferrer noreferrer" target="_blank">hritikesh.semwal@gmail.com</a>> wrote:<br></div><div> >>  >> Thanks for your response.<br></div><div> >>  >> <br></div><div> >>  >> Yes, you are right. I have put a barrier just before Allreduce, and out of the total time consumed by Allreduce, 79% is consumed by the barrier. But my computational work is balanced. Right now, I have distributed 97336 cells among 24 processors, and the maximum and minimum cell counts per processor are 4057 and 4055 respectively, which is not too bad. Is there any way to get rid of this?<br></div><div> >> <br></div><div> >>  Try profiling your code, not just looking at the cell distribution. 
Are any profiling tools already installed on your cluster?<br></div><div> > <br></div><div> > gprof and valgrind are there.<br></div><div> <br></div><div> While not ideal, GPROF may be helpful. Perhaps initially try running on 12 processors. With GPROF you will get 12 files to examine. Check if all subroutines take similar times on each processor. You can also time the subroutines individually using MPI_WTIME to get the same information.<br></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto"><div class="qt-gmail_quote"><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><br></blockquote></div></div><div dir="auto">Yes, I have already timed my code before posting this question. I will try with gprof.<br></div><div dir="auto"><br></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">Great. 
Some documentation on Gprof:<br></div><div style="font-family:Arial;"><a href="http://shwina.github.io/2014/11/profiling-parallel">http://shwina.github.io/2014/11/profiling-parallel</a><br></div><div style="font-family:Arial;"><a href="https://cluster.earlham.edu/wiki/index.php/Cluster:Gprof">https://cluster.earlham.edu/wiki/index.php/Cluster:Gprof</a><br></div><div style="font-family:Arial;"><a href="https://portal.tacc.utexas.edu/documents/13601/1041435/29-Overview_of_Profiling.pdf/84359111-d21a-4618-9d90-ca878c1e37ab">https://portal.tacc.utexas.edu/documents/13601/1041435/29-Overview_of_Profiling.pdf/84359111-d21a-4618-9d90-ca878c1e37ab</a><br></div><div style="font-family:Arial;"><a href="https://hpc.llnl.gov/software/development-environment-software/gprof">https://hpc.llnl.gov/software/development-environment-software/gprof</a><br></div><div style="font-family:Arial;"><a href="https://support.pawsey.org.au/documentation/display/US/Profiling+with+gprof">https://support.pawsey.org.au/documentation/display/US/Profiling+with+gprof</a><br></div><div style="font-family:Arial;"><a href="https://stackoverflow.com/questions/39041871/missing-function-from-gprof-output">https://stackoverflow.com/questions/39041871/missing-function-from-gprof-output</a><br></div><div style="font-family:Arial;"><br></div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt" style=""><div dir="auto"><div dir="auto"><div class="qt-gmail_quote"><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div><br></div><div>Also, try not to reply to the digest, or if you do, change the subject of the message. 
This is useful in deciding what to read.<br></div></blockquote></div></div><div dir="auto"><br></div><div dir="auto"><div class="qt-gmail_quote"><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><br></blockquote></div></div><div dir="auto">Is it fine this time? I have changed the subject line. Is that what you want to say?<br></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">That is ok if you cannot reply to the message directly.<br></div><div style="font-family:Arial;"><br></div></body></html>