[mpich-discuss] Poor performance of Waitany / Waitsome
John Grime
jgrime at uchicago.edu
Wed Jan 15 15:39:33 CST 2014
Hi Jeff,
> Given that you're using shared memory on a bloated OS (anything
> driving a GUI Window Manager), software overhead is going to be
> significant.
Very true - I would not expect these times to be indicative of what MPICH can actually achieve, but nonetheless the general trends seem to be reproducible.
I’m hoping that if I can get a good handle on what’s happening, I can write better MPI code in the general case. The major head-scratcher for me is that the MPI_TestX routines seem to be slower than their MPI_WaitX counterparts in most situations, when I would imagine they’re doing something fairly similar behind the scenes.
The wildcard appears to be MPI_Testsome().
I must be doing something dumb here, so I’ll also caffeinate and consider!
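
For reference, the kind of emulation I tried looks roughly like the sketch below (waitall_via_testany is just an illustrative name, not something from the attached code): spin on MPI_Testany until every request in the array has completed.

  #include <mpi.h>

  /* Sketch: complete all 'count' requests by spinning on MPI_Testany.
   * MPI_Testany sets a completed request to MPI_REQUEST_NULL, so counting
   * completions until none remain behaves like MPI_Waitall. */
  static void waitall_via_testany(int count, MPI_Request reqs[])
  {
      int remaining = count;
      while (remaining > 0) {
          int index, flag;
          MPI_Testany(count, reqs, &index, &flag, MPI_STATUS_IGNORE);
          if (flag) {
              if (index == MPI_UNDEFINED)
                  break;        /* no active requests remain */
              remaining--;      /* one more request has completed */
          }
          /* note: every call rescans the whole request array */
      }
  }
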
J.
On Jan 15, 2014, at 3:26 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
> Given that you're using shared memory on a bloated OS (anything
> driving a GUI Window Manager), software overhead is going to be
> significant. You can only do so much about this. You might want to
> compile MPICH yourself using all the optimization flags.
>
> For example, I decided that "--enable-static
> --enable-fast=O3,nochkmsg,notiming,ndebug,nompit
> --disable-weak-symbols --enable-threads=single" were configure options
> that someone in search of speed might use. I have not done any
> systematic testing yet so some MPICH developer might tell me I'm a
> clueless buffoon for bothering to (de)activate some of these options.
>
> If you were to assume that I was going to rerun your test with
> different builds of MPICH on my Mac laptop as soon as I get some
> coffee, you would be correct. Hence, apathy on your part has no
> impact on the experiments regarding MPICH build variants and speed :-)
>
> Jeff
>
> On Wed, Jan 15, 2014 at 3:10 PM, John Grime <jgrime at uchicago.edu> wrote:
>> Cheers for the help, Jeff!
>>
>> I just tried to mimic Waitall() using a variety of the “MPI_Test…” routines
>> (code attached), and the results are not what I would expect:
>>
>> Although Waitsome() consistently gives the worst performance of the Wait
>> variants (Waitall < Waitany < Waitsome), Testsome() *appears* to always be
>> faster than Testany(), and for larger numbers of requests the performance
>> order seems to actually reverse.
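>>
>> For reference, the Waitsome flavour of the emulation is roughly the loop
>> below (a sketch only - the function name is made up and the details may
>> differ from the attached code):
>>
>>   #include <stdlib.h>
>>   #include <mpi.h>
>>
>>   /* Sketch: complete all 'count' requests via repeated MPI_Waitsome.
>>    * Each call reports however many requests completed since the last
>>    * call; we accumulate until nothing active remains. */
>>   static void waitall_via_waitsome(int count, MPI_Request reqs[])
>>   {
>>       int *indices = malloc(count * sizeof(int));
>>       int done = 0;
>>       while (done < count) {
>>           int outcount;
>>           MPI_Waitsome(count, reqs, &outcount, indices, MPI_STATUSES_IGNORE);
>>           if (outcount == MPI_UNDEFINED)
>>               break;              /* no active requests remain */
>>           done += outcount;
>>       }
>>       free(indices);
>>   }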
>>
>>
>> Now, I may have done something spectacularly dumb here (it would be the 5th
>> such example from today alone), but on the assumption I have not: is this
>> result expected given the underlying implementation?
>>
>> J.
>>
>>
>> ./time_routines.sh 4 50
>>
>> nprocs = 4, ntokens = 16, ncycles = 50
>> Method : Time Relative
>> MPI_Waitall : 1.526000e-03 1.000x
>> MPI_Waitany : 1.435000e-03 0.940x
>> MPI_Waitsome : 3.381000e-03 2.216x
>> MPI_Testall : 3.101000e-03 2.032x
>> MPI_Testany : 8.080000e-03 5.295x
>> MPI_Testsome : 3.037000e-03 1.990x
>> PMPI_Waitall : 1.603000e-03 1.050x
>> PMPI_Waitany : 1.404000e-03 0.920x
>> PMPI_Waitsome : 4.666000e-03 3.058x
>>
>>
>>
>> nprocs = 4, ntokens = 64, ncycles = 50
>> Method : Time Relative
>> MPI_Waitall : 3.173000e-03 1.000x
>> MPI_Waitany : 5.362000e-03 1.690x
>> MPI_Waitsome : 1.809100e-02 5.702x
>> MPI_Testall : 1.364200e-02 4.299x
>> MPI_Testany : 2.309300e-02 7.278x
>> MPI_Testsome : 1.469800e-02 4.632x
>> PMPI_Waitall : 2.063000e-03 0.650x
>> PMPI_Waitany : 9.420000e-03 2.969x
>> PMPI_Waitsome : 1.890300e-02 5.957x
>>
>>
>>
>> nprocs = 4, ntokens = 128, ncycles = 50
>> Method : Time Relative
>> MPI_Waitall : 4.730000e-03 1.000x
>> MPI_Waitany : 2.691000e-02 5.689x
>> MPI_Waitsome : 4.519000e-02 9.554x
>> MPI_Testall : 4.696900e-02 9.930x
>> MPI_Testany : 7.285200e-02 15.402x
>> MPI_Testsome : 3.773400e-02 7.978x
>> PMPI_Waitall : 5.158000e-03 1.090x
>> PMPI_Waitany : 2.223200e-02 4.700x
>> PMPI_Waitsome : 4.205000e-02 8.890x
>>
>>
>>
>> nprocs = 4, ntokens = 512, ncycles = 50
>> Method : Time Relative
>> MPI_Waitall : 1.365900e-02 1.000x
>> MPI_Waitany : 3.261610e-01 23.879x
>> MPI_Waitsome : 3.944020e-01 28.875x
>> MPI_Testall : 5.408010e-01 39.593x
>> MPI_Testany : 4.865990e-01 35.625x
>> MPI_Testsome : 3.067470e-01 22.458x
>> PMPI_Waitall : 1.976100e-02 1.447x
>> PMPI_Waitany : 3.011500e-01 22.048x
>> PMPI_Waitsome : 3.791930e-01 27.761x
>>
>>
>>
>> nprocs = 4, ntokens = 1024, ncycles = 50
>> Method : Time Relative
>> MPI_Waitall : 4.087800e-02 1.000x
>> MPI_Waitany : 1.245209e+00 30.462x
>> MPI_Waitsome : 1.704020e+00 41.686x
>> MPI_Testall : 1.940940e+00 47.481x
>> MPI_Testany : 1.618215e+00 39.586x
>> MPI_Testsome : 1.133568e+00 27.731x
>> PMPI_Waitall : 3.970200e-02 0.971x
>> PMPI_Waitany : 1.344188e+00 32.883x
>> PMPI_Waitsome : 1.685816e+00 41.240x
>>
>>
>> nprocs = 4, ntokens = 2048, ncycles = 50
>> Method : Time Relative
>> MPI_Waitall : 1.173840e-01 1.000x
>> MPI_Waitany : 4.600552e+00 39.192x
>> MPI_Waitsome : 6.840568e+00 58.275x
>> MPI_Testall : 6.762144e+00 57.607x
>> MPI_Testany : 5.170525e+00 44.048x
>> MPI_Testsome : 4.260335e+00 36.294x
>> PMPI_Waitall : 1.291590e-01 1.100x
>> PMPI_Waitany : 5.161881e+00 43.974x
>> PMPI_Waitsome : 7.388439e+00 62.942x
>>
>>
>>
>> On Jan 15, 2014, at 2:53 PM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>
>>> On Wed, Jan 15, 2014 at 2:23 PM, John Grime <jgrime at uchicago.edu> wrote:
>>>> Hi Jeff,
>>>>
>>>>> If Waitall wasn't faster than Waitsome or Waitany, then it wouldn't
>>>>> exist since obviously one can implement the former in terms of the
>>>>> latter
>>>>
>>>>
>>>> I see no reason it wouldn’t exist in such a case, given that it’s an
>>>> elegant/convenient way to wait for all requests to complete vs. Waitsome /
>>>> Waitany. It makes sense to me that it would be in the API in any case, much
>>>> as I appreciate the value of the RISC-y approach you imply.
>>>>
>>>>> it shouldn't be surprising that they aren't as efficient.
>>>>
>>>> I wouldn’t expect them to have identical performance - but nor would I
>>>> have expected a performance difference of ~50x for the same number of
>>>> outstanding requests, even given that a naive loop over the request array
>>>> will be O(N). That loop should be pretty cheap after all, even given that
>>>> you can’t use cache well due to the potential for background state changes
>>>> in the request object data or whatever (I’m not sure how it’s actually
>>>> implemented, which is why I’m asking about this issue on the mailing list).
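>>>>
>>>> To make that concrete, the Waitany-based emulation of Waitall is essentially
>>>> the loop below (a sketch with a made-up name, not anything from the attached
>>>> code): even if each individual call is only a linear scan, making N such
>>>> calls over N requests is O(N^2) work in total.
>>>>
>>>>   #include <mpi.h>
>>>>
>>>>   /* Sketch: MPI_Waitall expressed as repeated MPI_Waitany calls.
>>>>    * Each call scans the whole request array, and we make one call per
>>>>    * request, so the emulation does O(N^2) work before any per-call
>>>>    * overhead is even counted. */
>>>>   static void waitall_via_waitany(int n, MPI_Request reqs[])
>>>>   {
>>>>       for (int done = 0; done < n; done++) {
>>>>           int index;
>>>>           MPI_Waitany(n, reqs, &index, MPI_STATUS_IGNORE);
>>>>           if (index == MPI_UNDEFINED)
>>>>               break;  /* no active requests left */
>>>>       }
>>>>   }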
>>>>
>>>>> The appropriate question to ask is whether Waitany is implemented
>>>>> optimally or not.
>>>>
>>>>
>>>> Well, yes. I kinda hoped that question was heavily implied by my original
>>>> email!
>>>>
>>>>
>>>>> If you find that emulating Waitany using Testall followed by a loop is
>>>>> faster, then that's useful information.
>>>>
>>>> I accidentally the whole thing, Jeff! ;)
>>>>
>>>> But that’s a good idea, thanks - I’ll give it a try and report back!
>>>
>>> Testall is the wrong semantic here. I thought it would test them all
>>> individually but it doesn't. I implemented it anyways and it is the
>>> worst of all. I attached your test with my modifications. Because I
>>> am an evil bastard, I made a ton of whitespace changes in addition to
>>> the nontrivial ones.
>>>
>>> Jeff
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> <nb_ring.c>
>>
>>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com