[mpich-discuss] Persistent Communication using MPI_SEND_INIT, MPI_RECV_INIT etc.

Timothy Stitt Timothy.Stitt.9 at nd.edu
Tue Mar 26 13:46:13 CDT 2013


Pavan...thanks for the comments. 

Our finite element simulation decomposes the problem domain into a 2D grid of sub-domains, with a one-to-one mapping between compute cores and sub-domains. Each sub-domain maintains a ghost zone along the boundaries it shares with its direct neighbors, and the relevant data is exchanged between neighbors during each timestep.
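
For reference, the exchange in the current code is essentially the textbook persistent pattern. A stripped-down sketch of that pattern (not our actual code; the neighbor ranks, buffers, and ghost-zone size below are just placeholders) looks like this:

  #include <mpi.h>

  #define NGHOST 1024   /* placeholder ghost-zone size */

  /* Set the channels up once, before the timestep loop. */
  void exchange_setup(double *sendbuf[4], double *recvbuf[4],
                      const int nbr[4], MPI_Request reqs[8])
  {
      for (int i = 0; i < 4; i++) {
          MPI_Send_init(sendbuf[i], NGHOST, MPI_DOUBLE, nbr[i], 0,
                        MPI_COMM_WORLD, &reqs[i]);
          MPI_Recv_init(recvbuf[i], NGHOST, MPI_DOUBLE, nbr[i], 0,
                        MPI_COMM_WORLD, &reqs[4 + i]);
      }
  }

  /* Reuse the same requests every timestep; each request is
   * released with MPI_Request_free at shutdown. */
  void exchange_step(MPI_Request reqs[8])
  {
      MPI_Startall(8, reqs);
      MPI_Waitall(8, reqs, MPI_STATUSES_IGNORE);
  }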

In general we use MVAPICH2 on the various systems we have access to. When using a Cray we use their tuned MPI library, which I believe is derived from MPICH2. Given that we are primarily MPICH-based (and given the basic communication pattern described above), would you recommend looking into other MPI(-3) functionality to achieve better scalability and performance?

Cheers,

Tim.
   
On Mar 26, 2013, at 12:59 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:

> Tim,
> 
> It's hard for us to decide which MPI functionality is better for you
> without knowing your algorithmic model.  For some algorithms persistent
> send/recv is great.  For some, RMA is great.  Of course, each one has
> its shortcomings.  Please don't jump to one or the other based on a
> short email discussion :-).
> 
> Also note that some MPI implementations do optimize persistent
> communication, particularly for memory registration.  So even if you
> don't see a benefit on some platforms, it doesn't mean that other MPI
> implementations cannot take advantage of it.
> 
> -- Pavan
> 
> On 03/26/2013 11:36 AM US Central Time, Timothy Stitt wrote:
>> Thanks for the quick reply Jeff. That information is valuable. I'll
>> follow up on your pointers.
>> 
>> Much appreciated,
>> 
>> Tim.
>> 
>> On Mar 26, 2013, at 12:32 PM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>> 
>>> You might want to look at neighborhood collectives, which are
>>> discussed in Chapter 7 of MPI-3.  This is a new feature so it may not
>>> be implemented in all MPI implementations, but MPICH supports it.  I
>>> guess MVAPICH will support it soon enough if not already.
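>>> 
>>> To give a flavor (just a rough sketch, with the dims, counts, and
>>> buffer layout as placeholders rather than tuned code), a 2D halo
>>> exchange with a neighborhood collective looks roughly like this in C:
>>> 
>>>   #include <mpi.h>
>>> 
>>>   /* Rough sketch: exchange 'nghost' doubles with each of the four
>>>    * Cartesian neighbors in a single call. */
>>>   void halo_neighbor(double *sendbuf, double *recvbuf, int nghost)
>>>   {
>>>       int dims[2] = {0, 0}, periods[2] = {0, 0}, nprocs;
>>>       MPI_Comm cart;
>>> 
>>>       MPI_Comm_size(MPI_COMM_WORLD, &nprocs);
>>>       MPI_Dims_create(nprocs, 2, dims);
>>>       MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 0, &cart);
>>> 
>>>       /* sendbuf/recvbuf hold one contiguous block of 'nghost'
>>>        * doubles per neighbor, ordered as the Cartesian topology
>>>        * defines the neighbors. */
>>>       MPI_Neighbor_alltoall(sendbuf, nghost, MPI_DOUBLE,
>>>                             recvbuf, nghost, MPI_DOUBLE, cart);
>>> 
>>>       MPI_Comm_free(&cart);
>>>   }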
>>> 
>>> When persistent MPI send/recv is discussed at the MPI Forum, it is
>>> often described as an inadequate solution because it does not specify
>>> a full channel and thus some important optimizations, e.g. for RDMA,
>>> may not be feasible.
>>> 
>>> If you can use MPI-3 RMA, that is probably going to be a good idea,
>>> although high-quality support for RMA varies.  MPICH-derived
>>> implementations usually do a good job though.
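>>> 
>>> Again just as a rough sketch (the window layout, offsets, and
>>> neighbor array are placeholders, and a real code would choose the
>>> synchronization mode carefully), a fence-synchronized version of the
>>> same exchange might look like:
>>> 
>>>   #include <mpi.h>
>>> 
>>>   void halo_rma(double *boundary, double *ghost, int nghost,
>>>                 const int nbr[4], MPI_Comm comm)
>>>   {
>>>       MPI_Win win;
>>> 
>>>       /* Expose this rank's ghost region; in real code the window
>>>        * would be created once, outside the timestep loop. */
>>>       MPI_Win_create(ghost, 4 * nghost * sizeof(double),
>>>                      sizeof(double), MPI_INFO_NULL, comm, &win);
>>> 
>>>       MPI_Win_fence(0, win);
>>>       for (int i = 0; i < 4; i++)
>>>           /* Write our boundary data for direction i into slot i of
>>>            * the neighbor's ghost window; the slot convention is
>>>            * just an assumption for this sketch. */
>>>           MPI_Put(&boundary[i * nghost], nghost, MPI_DOUBLE, nbr[i],
>>>                   (MPI_Aint)i * nghost, nghost, MPI_DOUBLE, win);
>>>       MPI_Win_fence(0, win);
>>> 
>>>       MPI_Win_free(&win);
>>>   }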
>>> 
>>> Best,
>>> 
>>> Jeff
>>> 
>>> On Tue, Mar 26, 2013 at 11:25 AM, Timothy Stitt
>>> <Timothy.Stitt.9 at nd.edu> wrote:
>>>> Hi all,
>>>> 
>>>> I've been asking this question around various MPI boards to try and
>>>> get a consensus before I decide to rewrite some MPI code. I am
>>>> grateful for any advice that you can give.
>>>> 
>>>> I've inherited an MPI code that was written roughly 8-10 years ago
>>>> and predominantly uses the MPI persistent communication routines
>>>> for its data transfers, e.g. MPI_SEND_INIT, MPI_RECV_INIT,
>>>> MPI_START, etc. (which I am not familiar with and don't normally
>>>> hear much discussion about). I was wondering whether persistent
>>>> communication calls are still regarded as the most
>>>> efficient/scalable way to communicate when the communication
>>>> pattern is known and fixed among neighboring processes. We
>>>> regularly run the code across an InfiniBand network, so would there
>>>> be a benefit to rewriting the code using another approach (e.g. MPI
>>>> one-sided communication), or should I leave it as it is? The code
>>>> currently scales up to 10K cores, and I want to push it even
>>>> further, so I was wondering whether there is any benefit in
>>>> revisiting this persistent communication approach.
>>>> 
>>>> Thanks in advance for any advice.
>>>> 
>>>> Tim.
>>>> 
>>> 
>>> 
>>> 
>>> -- 
>>> Jeff Hammond
>>> Argonne Leadership Computing Facility
>>> University of Chicago Computation Institute
>>> jhammond at alcf.anl.gov / (630) 252-5381
>>> http://www.linkedin.com/in/jeffhammond
>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>> 
>> 
>> 
>> 
> 
> -- 
> Pavan Balaji
> http://www.mcs.anl.gov/~balaji
