[mpich-discuss] Better alternative to MPI_Allreduce() and avoiding deadlock with MPI_Neighbor_alltoallw().
hritikesh semwal
hritikesh.semwal at gmail.com
Tue May 5 14:19:24 CDT 2020
On Tue, 5 May 2020, 10:30 PM, <discuss-request at mpich.org> wrote:
>
> Today's Topics:
>
> 1. Better alternatives of MPI_Allreduce() (Benson Muite)
>
>
> ----------------------------------------------------------------------
>
> Message: 1
> Date: Tue, 05 May 2020 15:17:08 +0300
> From: "Benson Muite" <benson_muite at emailplus.org>
> To: "Benson Muite via discuss" <discuss at mpich.org>
> Subject: [mpich-discuss] Better alternatives of MPI_Allreduce()
> Message-ID: <ef00483f-25a7-48d8-a04c-964e8001def7 at www.fastmail.com>
> Content-Type: text/plain; charset="us-ascii"
>
> > 1. I am using MPI_Neighbor_alltoallw() to exchange data over a
> > distributed graph topology communicator. Most of the time my code
> > works fine, but sometimes it seems to deadlock (it stops producing
> > any output). Since MPI_Neighbor_alltoallw uses MPI_Waitall
> > internally, I do not understand why this is happening.
> >>
> >> You may want to check that you are sending and receiving the
> >> correct data. Perhaps also try MPI_Neighbor_alltoallw
> >>
> >> > 2. Is it possible that the time the processors take to complete
> >> > the task varies from run to run? For example, in one run all
> >> > processors take around 100 seconds, and in another run they take
> >> > 110 seconds.
> >>
> >> There is usually some variability. Do you solve the same system
> >> each time? What is the method of solution? If your code is
> >> available, it is sometimes easier to give suggestions.
> >>
> > Yes, the system of equations is the same. I am using the finite
> > volume method to solve the Navier-Stokes equations. By your first
> > sentence, do you mean it is possible?
>
> Is the method implicit or explicit?
>
It's an explicit method.
>
> >
> >> >
> >> > Please help in above two matters.
> >> >
> >> > On Tue, May 5, 2020 at 4:28 PM hritikesh semwal <
> hritikesh.semwal at gmail.com> wrote:
> >> >> Thanks for your response.
> >> >>
> >> >> Yes, you are right. I have put a barrier just before the
> >> >> Allreduce, and of the total time consumed by the Allreduce, 79%
> >> >> is spent in the barrier. But my computational work is balanced:
> >> >> I have distributed 97336 cells among 24 processors, and the
> >> >> maximum and minimum cell counts per processor are 4057 and 4055
> >> >> respectively, which is not too bad. Is there any way to get rid
> >> >> of this?
> >>
> >> Try profiling your code, not just looking at the cell
> >> distribution. Are any profiling tools already installed on your
> >> cluster?
> >
> > gprof and valgrind are there.
>
> While not ideal, gprof may be helpful. Perhaps initially try running
> on 12 processors; with gprof you will get 12 files to examine. Check
> whether all subroutines take similar times on each processor. You can
> also time the subroutines individually using MPI_Wtime to get the
> same information.
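One way to get per-rank gprof output along these lines, assuming a glibc-based system (GMON_OUT_PREFIX is a glibc feature, and the source and binary names here are illustrative):

```shell
# Build with profiling instrumentation.
mpicc -pg -O2 -o solver solver.c

# GMON_OUT_PREFIX makes each rank write gmon.out.<pid> instead of all
# ranks overwriting a single gmon.out.
GMON_OUT_PREFIX=gmon.out mpiexec -n 12 ./solver

# One text profile per rank; compare subroutine times across them.
for f in gmon.out.*; do gprof ./solver "$f" > "$f.txt"; done
```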
>
Yes, I have already timed my code before posting this question. I will try
with gprof.
>
> Also, try not to reply to the digest, or if you do, change the
> subject of the message. This helps people decide what to read.
>
Is it fine this time? I have changed the subject line. Is that what you
meant?
>