<!DOCTYPE html><html><head><title></title><style type="text/css">p.MsoNormal,p.MsoNoSpacing{margin:0}</style></head><body><div style="font-family:Arial;"><br></div><div style="font-family:Arial;"><br></div><div>On Tue, May 5, 2020, at 2:46 PM, hritikesh semwal wrote:<br></div><blockquote type="cite" id="qt" style=""><div dir="ltr"><div dir="ltr"><br></div><div><br></div><div class="qt-gmail_quote"><div dir="ltr" class="qt-gmail_attr">On Tue, May 5, 2020 at 4:51 PM Benson Muite &lt;<a href="mailto:benson_muite@emailplus.org">benson_muite@emailplus.org</a>&gt; wrote:<br></div><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div><u></u><br></div><div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt-gmail-m_8386624168633875755qt"><div dir="ltr"><div><blockquote style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div>> <br></div><div>> Hi Hitesh,<br></div><div>> <br></div><div>> What hardware are you running on and what is the interconnect?<br></div></blockquote><div> <br></div><div>Right now I am using a cluster.<br></div></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">What is the interconnect?<br></div></div></blockquote><div><br></div><div>I don't know about this. Is it relevant?<br></div></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">It can affect performance, but I expect it may not be the most important factor on 24 processors. 
The most common interconnect is InfiniBand (<a href="https://en.wikipedia.org/wiki/InfiniBand">https://en.wikipedia.org/wiki/InfiniBand</a>).<br></div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt" style=""><div dir="ltr"><div class="qt-gmail_quote"><div> <br></div><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div><blockquote type="cite" id="qt-gmail-m_8386624168633875755qt"><div dir="ltr"><div><div> <br></div><blockquote style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;">> Have you tried changing any of the MPI settings?<br></blockquote><div> <br></div><div>What do you mean by MPI settings?<br></div></div></div></blockquote><div style="font-family:Arial;">Given your comment on the barrier, this is probably not so useful at the moment.<br></div><blockquote type="cite" id="qt-gmail-m_8386624168633875755qt"><div dir="ltr"><div><div> <br></div><blockquote style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;">> Can the reduction be done asynchronously?<br></blockquote><div> <br></div><div>I did not get your question.<br></div></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">For example, using a non-blocking all-reduce:<br></div><div style="font-family:Arial;"><a href="https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node135.htm" target="_blank">https://www.mpi-forum.org/docs/mpi-3.1/mpi31-report/node135.htm</a><br></div><div style="font-family:Arial;"><br></div></div></blockquote><div><br></div><div>I tried using a non-blocking call, but after this the code is not working correctly. 
<br></div></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">OK. Change back to the blocking call. It is likely you have poor load balancing.<br></div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt" style=""><div dir="ltr"><div class="qt-gmail_quote"><blockquote class="qt-gmail_quote" style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div><div style="font-family:Arial;"><br></div><blockquote type="cite" id="qt-gmail-m_8386624168633875755qt"><div dir="ltr"><div><div> <br></div><blockquote style="margin-top:0px;margin-right:0px;margin-bottom:0px;margin-left:0.8ex;border-left-width:1px;border-left-style:solid;border-left-color:rgb(204, 204, 204);padding-left:1ex;"><div>> <br></div><div>> Regards,<br></div><div>> Benson<br></div><div><br></div><div>Also, is your work load balanced? One way to check this might be to place a barrier just before the all-reduce call. If the barrier ends up taking most of your time, then it is likely you will need to determine a better way to distribute the computational work.<br></div></blockquote><div><br></div><div> Thanks for your response.<br></div><div><br></div><div>Yes, you are right. I have put a barrier just before Allreduce and, out of the total time consumed by Allreduce, 79% of the time is consumed by the barrier. But my computational work is balanced. Right now, I have distributed 97336 cells among 24 processors, and the maximum and minimum cell counts per processor are 4057 and 4055, respectively, which is not too bad. Is there any solution to get rid of this?<br></div></div></div></blockquote></div></blockquote><div>Please help me in this regard. 
<br></div></div></div></blockquote><div style="font-family:Arial;"><br></div><div style="font-family:Arial;">If you cannot profile your code, time the section before the all-reduce on each processor using MPI_WTIME and check whether the times are even across all 24 processors. If you move to more processors, you will likely want to use a profiling tool, but if you expect to run on about 24 processors, setting up a profiling tool, if one is not already available, may take some time.<br></div><div style="font-family:Arial;"><br></div></body></html>