[mpich-discuss] Better alternative to MPI_Allreduce() and avoiding deadlock with MPI_Neighbor_alltoallw().

hritikesh semwal hritikesh.semwal at gmail.com
Tue May 5 14:19:24 CDT 2020


On Tue, 5 May, 2020, 10:30 PM , <discuss-request at mpich.org> wrote:

> Send discuss mailing list submissions to
>         discuss at mpich.org
>
> To subscribe or unsubscribe via the World Wide Web, visit
>         https://lists.mpich.org/mailman/listinfo/discuss
> or, via email, send a message with subject or body 'help' to
>         discuss-request at mpich.org
>
> You can reach the person managing the list at
>         discuss-owner at mpich.org
>
> When replying, please edit your Subject line so it is more specific
> than "Re: Contents of discuss digest..."
>
>
> Today's Topics:
>
>    1.  Better alternatives of MPI_Allreduce() (Benson Muite)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 05 May 2020 15:17:08 +0300
> From: "Benson Muite" <benson_muite at emailplus.org>
> To: "Benson Muite via discuss" <discuss at mpich.org>
> Subject: [mpich-discuss] Better alternatives of MPI_Allreduce()
> Message-ID: <ef00483f-25a7-48d8-a04c-964e8001def7 at www.fastmail.com>
> Content-Type: text/plain; charset="us-ascii"
>
>  > 1. I am using MPI_Neighbor_alltoallw() for exchanging data over a
> distributed graph topology communicator. My concern is that most of the
> time my code works fine, but sometimes it appears to deadlock (it stops
> producing any output). Since MPI_Neighbor_alltoallw uses MPI_Waitall
> internally, I do not understand why exactly this is happening.
> >>
> >>  You may want to check that you are sending and receiving the correct
> data. Perhaps also try MPI_Neighbor_alltoallw
> >>
> >>  > 2. Is it possible that the time the processors take to complete
> the task varies from run to run? For example, in one run all processors
> take around 100 seconds, and in another run they take 110 seconds.
> >>
> >>  There is usually some variability. Do you solve the same system each
> time? What is the method of solution? If your code is available it can
> sometimes be easier to give suggestions.
> >>
> > Yes, the system of equations is the same. I am using the finite volume
> method to solve the Navier-Stokes equations. By your first sentence, do
> you mean that it is possible?
>
> Is the method implicit or explicit?
>

It's an explicit method.

>
> >
> >> >
> >>  > Please help with the above two matters.
> >>  >
> >>  > On Tue, May 5, 2020 at 4:28 PM hritikesh semwal <
> hritikesh.semwal at gmail.com> wrote:
> >>  >> Thanks for your response.
> >>  >>
> >>  >> Yes, you are right. I have put a barrier just before the Allreduce,
> and of the total time consumed by the Allreduce, 79% is consumed by the
> barrier. But my computational work is balanced. Right now, I have
> distributed 97336 cells among 24 processors; the maximum and minimum cell
> counts per processor are 4057 and 4055 respectively, which is not too bad.
> Is there any way to get rid of this?
> >>
> >>  Try profiling your code rather than just looking at the cell
> distribution. Are any profiling tools already installed on your cluster?
> >
> > gprof and valgrind are there.
>
> While not ideal, gprof may be helpful. Perhaps initially try running on 12
> processors. With gprof you will get 12 files to examine. Check whether all
> subroutines take similar times on each processor. You can also time the
> subroutines individually using MPI_Wtime to get the same information.
>

Yes, I have already timed my code before posting this question. I will try
with gprof.


>
> Also, try not to reply to the digest, or if you do, change the subject
> of the message. This is useful for deciding what to read.
>

Is it fine this time? I have changed the subject line. Is that what you
meant?
