[mpich-devel] mpich3 error

Brent Morgan brent.taylormorgan at gmail.com
Sun Jan 17 01:14:54 CST 2021


Hi Mpich dev community,

Based on what we're seeing, my guess is that it broadcasts to 23 processes
concurrently, then serially moves on to the next set of processes (4
processes in this case).  This becomes more pronounced as I increase the
number of processes: more and more processes are broadcast to serially
after the first set.  Some processes finish before others even enter, so
there does seem to be a limit in MPI_Bcast() on how many processes are
broadcast to concurrently.  Does this explain why the timing degrades as
the number of processes grows?
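
One way to test that hypothesis is to timestamp each rank's entry into and
exit from the broadcast.  A minimal sketch (mine, not from the attached
code; it assumes a single one-integer broadcast on MPI_COMM_WORLD):

    /* bcast_timing.c: when does each rank enter and leave MPI_Bcast?
       Build: mpicc bcast_timing.c -o bcast_timing
       Run:   mpiexec -n 27 ./bcast_timing                            */
    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank, value = 42;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        MPI_Barrier(MPI_COMM_WORLD);       /* roughly align start times */
        double t0 = MPI_Wtime();
        MPI_Bcast(&value, 1, MPI_INT, 0, MPI_COMM_WORLD);
        double t1 = MPI_Wtime();

        /* If broadcasts past some set size were serialized, later ranks
           would show systematically later exit times.  Note MPI_Wtime
           clocks are not globally synchronized unless MPI_WTIME_IS_GLOBAL
           is set, so treat cross-rank comparisons as approximate.       */
        printf("rank %2d: enter %.6f  exit %.6f  (%.6f s in Bcast)\n",
               rank, t0, t1, t1 - t0);
        MPI_Finalize();
        return 0;
    }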

If so, possible fixes: using multiple communicators simultaneously (I
think), or MPI_Reduce(), though won't MPI_Reduce() eventually run into the
same limitation as well?
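
For the sum itself, the MPI_Reduce() route is a one-call collective.  A
minimal, self-contained sketch (the per-rank value here is a stand-in):

    #include <mpi.h>
    #include <stdio.h>

    int main(int argc, char **argv)
    {
        int rank;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);

        double local = (double)rank;   /* stand-in for each rank's result */
        double total = 0.0;
        /* Tree-based reduction: one value per message, no concatenation
           of all N values at the root.                                   */
        MPI_Reduce(&local, &total, 1, MPI_DOUBLE, MPI_SUM, 0,
                   MPI_COMM_WORLD);
        if (rank == 0)
            printf("sum = %f\n", total);
        MPI_Finalize();
        return 0;
    }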

Am I correct?  Thank you,

Best,
Brent

On Sat, Jan 16, 2021 at 7:05 PM Brent Morgan <brent.taylormorgan at gmail.com>
wrote:

> Hi Rajeev,
>
> For the 27 processes, there are 27 physical cores across 7 devices.
>
> I understand Gather may be slower overall, but that should only reduce the
> slope of a linear speedup.  The observed results show timings 2x what is
> expected for 27 processes.
> [attached image: observed timing results (image.png)]
>
> One thing I notice is that, at line 126 of the code I sent, ranks 20-24 are
> processed *after* ranks 1-19 and 25-27.  Printing things out above line 126,
> ranks 20-24 always appear after ranks 1-19 and 25-27, following a pause;
> it's as if the program broadcasts to 23 (instead of 27) ranks at a time.
> There is nothing faulty about my devices/cores, as this behavior is
> consistent when I switch nodes in the hostfile.
>
> Best,
> Brent
>
> On Sat, Jan 16, 2021 at 6:44 PM Thakur, Rajeev <thakur at anl.gov> wrote:
>
>> There is no limitation. Gather concatenates data at the root so all the
>> data needs to be sent. Reduce calculates intermediate sums in a binomial
>> tree fashion, so if you are summing a single number, only a single number
>> is communicated. There are many papers on collective communication
>> algorithms.
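
(As a rough cost model: gathering one value from each of P ranks delivers
P - 1 values to the root, so the root's work grows linearly in P, while a
binomial-tree reduction completes in ceil(log2 P) rounds with a single
value per message.  For P = 27, that is 26 values arriving at the root
versus 5 reduction rounds.)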
>>
>>
>>
>> When you run 27 processes, are there 27 physical cores for them to run
>> concurrently? If not, the performance measurements will not show speedups.
>>
>>
>>
>> *From: *Brent Morgan <brent.taylormorgan at gmail.com>
>> *Date: *Saturday, January 16, 2021 at 7:37 PM
>> *To: *"Thakur, Rajeev" <thakur at anl.gov>
>> *Cc: *Robert Katona <robert.katona at hotmail.com>, "Zhou, Hui" <
>> zhouh at anl.gov>, "devel at mpich.org" <devel at mpich.org>
>> *Subject: *Re: [mpich-devel] mpich3 error
>>
>>
>>
>> Why isn't the sum computed after doing a gather to the root?  Is there a
>> limitation of 27 processes (7 devices) for gather?  We thought a collective
>> could handle much more.
>>
>>
>>
>> Best,
>>
>> Brent
>>
>>
>>
>> On Sat, Jan 16, 2021 at 2:18 PM Thakur, Rajeev <thakur at anl.gov> wrote:
>>
>> It uses a different algorithm and communicates much less data. The sum is
>> not all computed after doing a gather to the root.
>>
>>
>>
>> *From: *Brent Morgan <brent.taylormorgan at gmail.com>
>> *Date: *Saturday, January 16, 2021 at 2:58 PM
>> *To: *"Thakur, Rajeev" <thakur at anl.gov>
>> *Cc: *Robert Katona <robert.katona at hotmail.com>, "Zhou, Hui" <
>> zhouh at anl.gov>, "devel at mpich.org" <devel at mpich.org>
>> *Subject: *Re: [mpich-devel] mpich3 error
>>
>>
>>
>> We will try MPI_Reduce, which will improve our code, but it will not solve
>> the underlying problem.
>>
>>
>>
>> Best,
>>
>> Brent
>>
>>
>>
>> On Sat, Jan 16, 2021 at 1:31 PM Thakur, Rajeev <thakur at anl.gov> wrote:
>>
>> Your mail all the way below says “We are using MPI_Gather collector for
>> merely calculating the sum of the result of N processes”. Why don’t you use
>> MPI_Reduce instead then?
>>
>>
>>
>> Rajeev
>>
>>
>>
>>
>>
>> *From: *Brent Morgan via devel <devel at mpich.org>
>> *Reply-To: *"devel at mpich.org" <devel at mpich.org>
>> *Date: *Saturday, January 16, 2021 at 1:38 PM
>> *To: *"Zhou, Hui" <zhouh at anl.gov>
>> *Cc: *Brent Morgan <brent.taylormorgan at gmail.com>, "devel at mpich.org" <
>> devel at mpich.org>, Robert Katona <robert.katona at hotmail.com>
>> *Subject: *Re: [mpich-devel] mpich3 error
>>
>>
>>
>> Hi Hui, Mpich community,
>>
>>
>>
>> Thanks for the response.  You're right, I'll provide a toy program that
>> replicates the code structure (and results).  The toy program calculates a
>> sum value from each process; the value itself isn't important here.  The
>> timing, however, is the only thing that matters in our demonstration, and
>> it exactly replicates what we are observing in our actual program.  This
>> points directly at the MPI functionality, and we can't figure out what the
>> issue is.
>>
>>
>>
>> I have attached the code.  Is something wrong with our implementation?
>> It starts with the main() function.  Thank you very much for any help,
>>
>>
>>
>> Best,
>>
>> Brent
>>
>> PS My subscription to discuss at mpich.org is currently pending.
>>
>>
>> On Fri, Jan 15, 2021 at 10:43 PM Zhou, Hui <zhouh at anl.gov> wrote:
>>
>> Your description only mentions MPI_Gather.  If there is indeed a problem
>> with MPI_Gather, then you should be able to reproduce the issue with a
>> sample program.  Share it with us and we can assist you better.  If you
>> can't reproduce the issue with a simple example, then I suspect there are
>> other problems that you are not able to fully describe.  We really can't
>> help much without being able to see the code.
>>
>>
>>
>> That said, I am not even sure what issue you are describing.  A
>> 100-process MPI_Gather will be slower than a 50-process MPI_Gather.  And
>> since it is a collective, if one of your processes is delayed by some
>> computation or anything else, the whole collective will take longer to
>> finish simply because it waits for the late process.  You really need to
>> tell us what your program is doing for us to even offer an intelligent
>> guess.
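
Hui's point about late processes can be checked directly: time the
collective once as-is and once right after a barrier.  A sketch (assuming a
one-integer gather; this is not the actual attached code):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size, local;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);
        local = rank;

        int *buf = (rank == 0) ? malloc(size * sizeof(int)) : NULL;

        /* Time A includes any skew from ranks arriving late. */
        double a0 = MPI_Wtime();
        MPI_Gather(&local, 1, MPI_INT, buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
        double a1 = MPI_Wtime();

        /* Time B follows a barrier, so it is (mostly) pure collective
           cost; a large gap between A and B points at load imbalance
           rather than the collective itself.                          */
        MPI_Barrier(MPI_COMM_WORLD);
        double b0 = MPI_Wtime();
        MPI_Gather(&local, 1, MPI_INT, buf, 1, MPI_INT, 0, MPI_COMM_WORLD);
        double b1 = MPI_Wtime();

        printf("rank %2d: with skew %.6f s, after barrier %.6f s\n",
               rank, a1 - a0, b1 - b0);
        free(buf);                /* free(NULL) is a no-op on non-roots */
        MPI_Finalize();
        return 0;
    }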
>>
>>
>>
>> --
>> Hui Zhou
>>
>>
>>
>>
>>
>> *From: *Brent Morgan <brent.taylormorgan at gmail.com>
>> *Date: *Friday, January 15, 2021 at 10:42 PM
>> *To: *Zhou, Hui <zhouh at anl.gov>, discuss at mpich.org <discuss at mpich.org>
>> *Cc: *Robert Katona <robert.katona at hotmail.com>
>> *Subject: *Re: [mpich-devel] mpich3 error
>>
>> Hi MPICH community,
>>
>>
>>
>> My team downloaded mpich 3.3.2 (which uses ch3 by default) and implemented
>> our MPI code, and for a small number of processes (<50) everything worked
>> fine.  For >=50 processes, a ch3 error crashed the program after a random
>> number of seconds (sometimes 10 seconds, sometimes 100 seconds).  So we
>> compiled mpich 3.3.2 with ch4 (instead of the default ch3) using the
>> `--with-device=ch4:ofi` flag, and this got rid of the error, but for >12
>> processes the program would suddenly become 2x slower.
>>
>>
>>
>> At Hui's suggestion, we upgraded to mpich 3.4 and compiled with the
>> `--with-device=ch4:ofi` flag (ch4 is the default in mpich 3.4).
>> Everything worked fine until we hit 20 processes; at >=20 processes, the
>> 2x slowdown happens again.
>>
>>
>>
>> We have tried a single communicator and multiple communicators in an
>> attempt to make the MPI implementation faster, but there is no significant
>> difference in what we observe.  We are using the MPI_Gather collective
>> merely to calculate the sum of the results of N processes, but we can't
>> seem to maintain stability within MPI as we increase N.  Is there
>> something we are missing that is ultimately causing this error?  We are at
>> a loss here, thank you.
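
The gather-then-sum pattern described here presumably looks something like
the following (a guess at the structure; the actual attached code is not
reproduced in this archive):

    #include <mpi.h>
    #include <stdio.h>
    #include <stdlib.h>

    int main(int argc, char **argv)
    {
        int rank, size;
        MPI_Init(&argc, &argv);
        MPI_Comm_rank(MPI_COMM_WORLD, &rank);
        MPI_Comm_size(MPI_COMM_WORLD, &size);

        double local = (double)rank;       /* stand-in per-rank result */
        double *all = (rank == 0) ? malloc(size * sizeof(double)) : NULL;

        /* Every rank's value is concatenated at the root, which then
           sums all N values itself; MPI_Reduce with MPI_SUM collapses
           this into a single collective.                              */
        MPI_Gather(&local, 1, MPI_DOUBLE, all, 1, MPI_DOUBLE, 0,
                   MPI_COMM_WORLD);
        if (rank == 0) {
            double sum = 0.0;
            for (int i = 0; i < size; i++)
                sum += all[i];
            printf("sum = %f\n", sum);
        }
        free(all);
        MPI_Finalize();
        return 0;
    }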
>>
>>
>>
>> Best,
>>
>> Brent
>>
>>
>>
>>
>>
>>