<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div style="font-family:Arial;"> > 1. I am using MPI_Neighbor_alltoallw() to exchange data over a distributed graph topology communicator. Most of the time my code works fine, but sometimes it seems to go into deadlock (it does not show any output). But MPI_Neighbor_alltoallw uses MPI_Waitall inside it, so I do not understand why exactly this is happening.<br></div><blockquote type="cite" id="qt" style=""><div dir="auto"><div dir="ltr"><div class="qt-gmail_quote"><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div> <br></div><div> You may want to check that you are sending and receiving the correct data. Perhaps also try MPI_Neighbor_alltoallw<br></div><div> <br></div><div> > 2. Is it possible that the processors' completion times vary every time I run the code? For example, for one run all processors take around 100 seconds and for another run all processors take 110 seconds. <br></div><div> <br></div><div> There is usually some variability. Do you solve the same system each time? What is the method of solution? If your code is available, it can sometimes be easier to give suggestions.<br></div><div> <br></div></blockquote><div>Yes, the system of equations is the same. I am using the finite volume method to solve the Navier-Stokes equations. 
By your first sentence, do you mean that it is possible?<br></div></div></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">Is the method implicit or explicit?<br></div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt" style=""><div dir="auto"><div dir="ltr"><div class="qt-gmail_quote"><div> <br></div><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div>> <br></div><div> > Please help with the above two matters.<br></div><div> > <br></div><div> > On Tue, May 5, 2020 at 4:28 PM hritikesh semwal <<a href="mailto:hritikesh.semwal@gmail.com" target="_blank" rel="noreferrer">hritikesh.semwal@gmail.com</a>> wrote:<br></div><div> >> Thanks for your response.<br></div><div> >> <br></div><div> >> Yes, you are right. I put a barrier just before Allreduce, and of the total time consumed by Allreduce, 79% is consumed by the barrier. But my computational work is balanced. Right now, I have distributed 97336 cells among 24 processors, and the maximum and minimum cell counts among all processors are 4057 and 4055 respectively, which is not too bad. Is there any solution to get rid of this?<br></div><div> <br></div><div> Try profiling your code, not just looking at the cell distribution. Are any profiling tools already installed on your cluster?<br></div></blockquote><div><br></div><div>gprof and valgrind are there.<br></div></div></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">While not ideal, GPROF may be helpful. Perhaps initially try running on 12 processors. With GPROF you will get 12 files to examine. Check if all subroutines take similar times on each processor. 
You can also time the subroutines individually using MPI_WTIME to get the same information.</div><div style="font-family:Arial;"><br></div><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">Also, try not to reply to the digest or, if you do, change the subject of the message. This is useful in deciding what to read.<br></div></body></html>