[mpich-discuss] Questions about non-blocking collective calls...

Eric Chamberland Eric.Chamberland at giref.ulaval.ca
Thu Oct 22 12:29:30 CDT 2015


Hi Pavan and Jeff,

Thanks a lot for your answers.

I feel there is something to be said about what the MPI standard expects...

Why should MPI users' knowledge (or ignorance) of the scalability of
MPI communications be a prerequisite for developing high-performance
MPI applications?  Especially when we want to keep things simple by
using high-level MPI functionality, like MPI_Gather, and let the
library do the best work it can...

In other words, why is the complexity (and hence the scalability) of MPI
algorithms not guaranteed by the standard?  If everyone who is familiar
with the performance of MPI communication has the burden of writing MPI
calls in a way that scales and performs well, aren't all of us rewriting
essentially the same "good" code that should be in the standard?

Is the standard deliberately blind to these (crucial) questions?

I may be too naive... tell me! :)

(e.g., the C++ standard guarantees the complexity of its sort algorithm:
https://en.wikipedia.org/wiki/Sort_%28C%2B%2B%29)

On 21/10/15 10:03 PM, Balaji, Pavan wrote:
 > You might want to join the collectives working group and voice your 
opinion over there.

Ok, where exactly do I do this?

Btw, I don't want to blame anybody... I am just learning and discussing 
here!!! :)

Thanks for reading!

Eric

On 21/10/15 11:56 PM, Jeff Hammond wrote:
> Depending on the size of your data, you could pipeline a series of
> MPI_Igather calls and process all of the data associated with the
> partial buffer.  Of course, this will change the layout of the buffer at
> the root unless you do something interesting with datatypes (e.g. struct
> with offset).  This may or may not matter, if you are going to process
> it anyways.
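
The layout change described above can be illustrated without MPI at all. What follows is a minimal sketch in plain Python, not MPI code: `gather` and `pipelined_gather` are hypothetical stand-ins that only mimic how a gather fills the root's buffer in rank order, and the chunk size of 2 is an arbitrary assumption. It shows that a series of smaller gathers delivers the same items, but grouped by pipeline stage rather than by rank.

```python
def gather(contributions):
    """Stand-in for a gather at the root: concatenate per-rank data in rank order."""
    out = []
    for data in contributions:
        out.extend(data)
    return out


def pipelined_gather(contributions, chunk):
    """Simulate a series of gathers, each moving `chunk` items per rank.

    Returns one partial buffer per pipeline stage, so the root could
    process a stage as soon as that stage has completed.
    """
    count = len(contributions[0])
    if any(len(c) != count for c in contributions):
        raise ValueError("every rank must contribute the same number of items")
    stages = []
    for start in range(0, count, chunk):
        stages.append(gather([c[start:start + chunk] for c in contributions]))
    return stages


# Three simulated ranks, four items each; "r1:2" is item 2 of rank 1.
ranks = [[f"r{r}:{i}" for i in range(4)] for r in range(3)]

single = gather(ranks)                      # one big gather
stages = pipelined_gather(ranks, chunk=2)   # two smaller gathers
pipelined = [item for stage in stages for item in stage]

print(single)     # grouped by rank
print(pipelined)  # grouped by stage, then by rank
```

In the single gather each rank's four items are contiguous; in the pipelined version each rank's items are split across the two stage buffers. Whether that matters depends on how the root processes the data, as noted above.
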
>
> In general, I think you may be able to do just fine with rolling your
> own.  It's a myth that using higher-level functionality in MPI is
> _always_ better.
>
> Jeff
>
> On Wed, Oct 21, 2015 at 7:03 PM, Balaji, Pavan <balaji at anl.gov> wrote:
>
>     Eric,
>
>     The concept of partial completion of collectives did come up in the
>     Forum, but the Forum decided that it was rather unnatural to define
>     Iallgather/Igather that way.  So we decided to standardize it the
>     way it is.
>
>     However, there is a separate proposal for streaming collectives,
>     which is more along the lines of what you are thinking of.  That's
>     obviously not in MPI-3, but might be considered for a future MPI.
>     You might want to join the collectives working group and voice your
>     opinion over there.
>
>     With respect to writing your own igather implementation, as long as
>     your implementation is logarithmic, it won't be too bad.  However, a
>     native implementation inside MPI would almost certainly do better
>     because: (1) it can take advantage of platform-specific features to
>     improve performance, and (2) if the platform doesn't give anything
>     special, it'll anyway do exactly what you are doing above MPI.
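
To make "logarithmic" concrete, here is a minimal sketch in plain Python (again a simulation, not MPI code) of a binomial-tree gather to rank 0. The binomial tree is an assumption chosen for illustration; it is one common way to get a logarithmic number of steps, not necessarily what MPICH or the hand-written Isend/Irecv version does. The function names are hypothetical. The sketch counts messages only: intermediate ranks forward growing amounts of data, so the bytes arriving at the root are the same as in a linear gather.

```python
def binomial_gather_schedule(p):
    """Return the rounds of a binomial-tree gather to rank 0.

    Each round is a list of (sender, receiver) pairs. In the round with
    step `mask`, every rank whose lowest set bit is `mask` sends all the
    data it currently holds to rank - mask.
    """
    rounds = []
    mask = 1
    while mask < p:
        pairs = [(rank, rank - mask)
                 for rank in range(p)
                 if rank % (2 * mask) == mask]
        rounds.append(pairs)
        mask *= 2
    return rounds


def simulate_gather(p):
    """Apply the schedule and return which ranks' data each rank ends up holding."""
    held = {rank: {rank} for rank in range(p)}
    for pairs in binomial_gather_schedule(p):
        for sender, receiver in pairs:
            held[receiver] |= held.pop(sender)
    return held


def root_messages_tree(p):
    """Number of messages rank 0 receives in the binomial-tree schedule."""
    return sum(1 for pairs in binomial_gather_schedule(p)
               for _, receiver in pairs if receiver == 0)


def root_messages_linear(p):
    """Number of messages rank 0 receives if every other rank sends to it directly."""
    return p - 1


for p in (8, 1024):
    print(p, root_messages_linear(p), root_messages_tree(p))
```

With 1024 simulated ranks the root handles 1023 incoming messages in the linear scheme and 10 in the tree scheme, which is the kind of difference the "logarithmic" remark above refers to.
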
>
>     So, apart from any performance bugs that the implementation might
>     have, using MPI_Igather would be the recommended mechanism for the
>     best performance.
>
>        -- Pavan
>
>
>
>
>
>     On 10/21/15, 2:45 PM, "Eric Chamberland"
>     <Eric.Chamberland at giref.ulaval.ca> wrote:
>
>      >Hi,
>      >
>      >A long time ago (in 2002) we wrote our own non-blocking MPI_Igather
>      >here, using equivalent calls to MPI_Isend/MPI_Irecv (see the 2
>      >attached files).
>      >
>      >A very convenient advantage of this version is that I can do some
>      >work on the root process as soon as it starts receiving data...  Then
>      >it waits for the next communication to terminate and processes the
>      >received data.
>      >
>      >Now, I am looking at MPI_Igather (and all non-blocking collective MPI
>      >calls), and I am somewhat surprised (or ignorant) that I cannot have
>      >the root rank receive some data, then process it, then wait until the
>      >next "MPI_Irecv" terminates...
>      >
>      >In other words, an MPI_Igather generates only 1 MPI_Request, but I
>      >would like to have either "p" (with p = size of communicator)
>      >MPI_Requests generated or be able to call MPI_Waitany "p" times with
>      >the same MPI_Request...  Am I normal? :)
>      >
>      >So my 3 questions are:
>      >
>      >#1- Is there a way to use MPI_Igather with MPI_Waitany (or something
>      >else?) to process data as it is received?
>      >
>      >#2- Big question: will our implementation with MPI_Isend/MPI_Irecv
>      >scale to a large number of processes?  What are the possible drawbacks
>      >of doing it like we did?
>      >
>      >#3- Why should I replace our implementation with the native
>      >MPI_Igather?
>      >
>      >Thanks!
>      >
>      >Eric
>     _______________________________________________
>     discuss mailing list discuss at mpich.org
>     To manage subscription options or unsubscribe:
>     https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
