<div dir="ltr">Thanks for your response.<div><br></div><div>Yes, you are right. I put a barrier just before the Allreduce, and of the time that was previously attributed to Allreduce, 79% is now spent in the barrier. But my computational work is balanced: I have distributed 97336 cells among 24 processors, and the maximum and minimum number of cells on any processor are 4057 and 4055 respectively, which is not too bad. Is there any way to get rid of this waiting time?</div></div><br><div class="gmail_quote"><div dir="ltr" class="gmail_attr">On Tue, May 5, 2020 at 12:30 PM Joachim Protze <<a href="mailto:protze@itc.rwth-aachen.de">protze@itc.rwth-aachen.de</a>> wrote:<br></div><blockquote class="gmail_quote" style="margin:0px 0px 0px 0.8ex;border-left:1px solid rgb(204,204,204);padding-left:1ex">Hello,<br>

<br>
It is important to understand that most of the time you see is not the cost of the allreduce but the cost of synchronization (caused by load imbalance).<br>

<br>
You can do a simple experiment: add a barrier before the allreduce. Then you will see the actual cost of the allreduce, while the cost of synchronization will go into the barrier.<br>

<br>
Now, think about the dependencies in your algorithm: do you need the output value immediately? Is that the same point at which the input value becomes ready?<br>
-> Otherwise, use non-blocking communication (e.g. MPI_Iallreduce) and perform independent work in between.<br>

<br>
In any case: fix your load imbalance (the root cause of synchronization cost).<br>

<br>

Best<br>

Joachim<br>

<br>

On 05.05.20 at 07:38, hritikesh semwal via discuss wrote:<br>

> Hello all,<br>
> <br>
> I am working on the development of a parallel CFD solver, and I am using MPI_Allreduce for the global summation of the local errors computed on all processes of a group; the resulting sum is needed by all of the processes. My concern is that MPI_Allreduce is taking almost 27-30% of the total run time, which is significant. So I would like to ask whether anyone can suggest better alternatives to MPI_Allreduce that would reduce this time.<br>
> <br>
> Thank you.<br>
> <br>

> _______________________________________________<br>

> discuss mailing list     <a href="mailto:discuss@mpich.org" target="_blank">discuss@mpich.org</a><br>

> To manage subscription options or unsubscribe:<br>

> <a href="https://lists.mpich.org/mailman/listinfo/discuss" rel="noreferrer" target="_blank">https://lists.mpich.org/mailman/listinfo/discuss</a><br>

> <br>

<br>

<br>

-- <br>

Dipl.-Inf. Joachim Protze<br>

<br>

IT Center<br>

Group: High Performance Computing<br>

Division: Computational Science and Engineering<br>

RWTH Aachen University<br>

Seffenter Weg 23<br>

D 52074  Aachen (Germany)<br>

Tel: +49 241 80- 24765<br>

Fax: +49 241 80-624765<br>

<a href="mailto:protze@itc.rwth-aachen.de" target="_blank">protze@itc.rwth-aachen.de</a><br>

<a href="http://www.itc.rwth-aachen.de" rel="noreferrer" target="_blank">www.itc.rwth-aachen.de</a><br>

<br>

</blockquote></div>