<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div style="font-family:Arial;"><br></div><div style="font-family:Arial;"><br></div><div>On Tue, May 5, 2020, at 2:16 PM, hritikesh semwal via discuss wrote:<br></div><blockquote type="cite" id="qt" style=""><div dir="ltr"><div>I want to add two more questions about my solver,<br></div><div>1. I am using MPI_Neighbor_alltoallw() for exchanging the data by generating a distributed graph topology communicator. My concern is that most of the time my code is working fine but sometimes I guess it is going into deadlock (as it is not showing any output). But MPI_Neighbor_alltoallw uses MPI_Waitall inside it so I am not getting why exactly this is happening.<br></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">You may want to check that each process is sending and receiving the correct data. Perhaps also try MPI_Neighbor_alltoallw.<br></div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt" style=""><div dir="ltr"><div>2. Is it possible that every time I run the code the processors' times for completion of the task may vary? For example, for one run all processors take around 100 seconds and for another run, all processors take 110 seconds. <br></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">There is usually some variability. Do you solve the same system each time? What is the method of solution? 
If your code is available, it is often easier to give suggestions.<br></div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt" style=""><div dir="ltr"><div><br></div><div>Please help with the above two matters.<br></div></div><div><br></div><div class="qt-gmail_quote"><div dir="ltr" class="qt-gmail_attr">On Tue, May 5, 2020 at 4:28 PM hritikesh semwal <<a href="mailto:hritikesh.semwal@gmail.com">hritikesh.semwal@gmail.com</a>> wrote:<br></div><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div dir="ltr"><div>Thanks for your response.<br></div><div><br></div><div>Yes, you are right. I have put a barrier just before the Allreduce, and of the total time consumed by Allreduce, 79% is consumed by the barrier. But my computational work is balanced. Right now, I have distributed 97336 cells among 24 processors, and the maximum and minimum cell counts per processor are 4057 and 4055 respectively, which is not too bad. Is there any solution to get rid of this?<br></div></div></blockquote></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">Try profiling your code rather than just looking at the cell distribution. 
Are any profiling tools already installed on your cluster?<br></div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt" style=""><div class="qt-gmail_quote"><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div class="qt-gmail_quote"><div dir="ltr" class="qt-gmail_attr">On Tue, May 5, 2020 at 12:30 PM Joachim Protze <<a href="mailto:protze@itc.rwth-aachen.de" target="_blank">protze@itc.rwth-aachen.de</a>> wrote:<br></div><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div>Hello,<br></div><div> <br></div><div> it is important to understand, that most of the time you see is not the <br></div><div> cost of the allreduce, but the cost of synchronization (caused by load <br></div><div> imbalance).<br></div><div> <br></div><div> You can do an easy experiment and add a barrier before the allreduce. <br></div><div> Then you will see the actual cost of the allreduce, while the cost of <br></div><div> synchronization will go into the barrier.<br></div><div> <br></div><div> Now, think about dependencies in your algorithm: do you need the output <br></div><div> value immediately? 
Is this the same time, where you have the input value <br></div><div> ready?<br></div><div> -> otherwise use non-blocking communication and perform independent work <br></div><div> in between<br></div><div> <br></div><div> In any case: fix your load imbalance (the root cause of synchronization <br></div><div> cost).<br></div><div> <br></div><div> Best<br></div><div> Joachim<br></div><div> <br></div><div> Am 05.05.20 um 07:38 schrieb hritikesh semwal via discuss:<br></div><div> > Hello all,<br></div><div> > <br></div><div> > I am working on the development of a parallel CFD solver and I am using <br></div><div> > MPI_Allreduce for the global summation of the local errors calculated on <br></div><div> > all processes of a group and the summation is to be used by all the <br></div><div> > processes. My concern is that MPI_Allreduce is taking almost 27-30% of <br></div><div> > the total time used, which is a significant amount. So, I want to ask if <br></div><div> > anyone can suggest me better alternative/s to replace MPI_Allreduce <br></div><div> > which can reduce the time consumption.<br></div><div> > <br></div><div> > Thank you.<br></div><div> > <br></div><div> > _______________________________________________<br></div><div> > discuss mailing list     <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br></div><div> > To manage subscription options or unsubscribe:<br></div><div> > <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br></div><div> > <br></div><div> <br></div><div> <br></div><div> -- <br></div><div> Dipl.-Inf. 
Joachim Protze<br></div><div> <br></div><div> IT Center<br></div><div> Group: High Performance Computing<br></div><div> Division: Computational Science and Engineering<br></div><div> RWTH Aachen University<br></div><div> Seffenter Weg 23<br></div><div> D 52074  Aachen (Germany)<br></div><div> Tel: +49 241 80- 24765<br></div><div> Fax: +49 241 80-624765<br></div><div> <a href="mailto:protze@itc.rwth-aachen.de" target="_blank">protze@itc.rwth-aachen.de</a><br></div><div> <a href="http://www.itc.rwth-aachen.de" rel="noreferrer" target="_blank">www.itc.rwth-aachen.de</a><br></div><div> <br></div></blockquote></div></blockquote></div><div><br></div></blockquote><div style="font-family:Arial;"><br></div></body></html>