[mpich-discuss] Persistent Communication using MPI_SEND_INIT, MPI_RECV_INIT etc.

Pavan Balaji balaji at mcs.anl.gov
Tue Mar 26 14:44:06 CDT 2013


Hi Tim,

I didn't mean for you to give me a one-paragraph summary of the
communication model.  My first guess is a neighborhood collective (new
in MPI-3), which Jeff already suggested, but you'll need to look
through the capabilities of the different MPI operations and decide.
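
Concretely, a ghost-zone exchange phrased as a neighborhood collective
might look roughly like the sketch below.  This is only an illustration:
the function name, buffers, and counts are made up, and 'cart' is
assumed to be a 2D Cartesian communicator created elsewhere with
MPI_Cart_create.

/* Sketch: halo exchange over a 2D Cartesian topology using an MPI-3
 * neighborhood collective.  Names and counts are illustrative. */
#include <mpi.h>

void halo_exchange(MPI_Comm cart, const double *sendbuf,
                   double *recvbuf, int n)
{
    /* On a 2D non-periodic Cartesian communicator each rank has up to
     * four neighbors; missing neighbors are MPI_PROC_NULL and are
     * skipped automatically.  sendbuf/recvbuf each hold 4*n doubles,
     * one block per neighbor in topology order (-x, +x, -y, +y). */
    MPI_Neighbor_alltoall(sendbuf, n, MPI_DOUBLE,
                          recvbuf, n, MPI_DOUBLE, cart);
}

The same call has a nonblocking variant (MPI_Ineighbor_alltoall) if the
exchange needs to be overlapped with computation.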

With respect to MPI implementations, I guess most MPICH derivatives
(including MVAPICH, Cray MPI, IBM MPI, Intel MPI, etc.) should have
MPI-3 capability this year or early next year, though we don't have an
official date from most of them.  I believe the latest release of
MVAPICH already has MPI-3 support.

 -- Pavan

On 03/26/2013 01:46 PM US Central Time, Timothy Stitt wrote:
> Pavan...thanks for the comments. 
> 
> Our finite element simulation decomposes the problem domain into a 2D grid of sub-domains, with a one-to-one mapping between compute cores and sub-domains. Each sub-domain keeps a ghost zone along its boundaries with its direct neighbors, and we communicate the relevant data between neighbors during each timestep.
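
For context, the decomposition Tim describes (a 2D grid of sub-domains,
one per core, exchanging ghost zones with direct neighbors) is usually
expressed in MPI with a Cartesian topology.  A minimal sketch of the
setup follows; all names are illustrative, not from Tim's code:

/* Sketch: mapping a 2D grid of sub-domains onto ranks and finding the
 * four direct neighbors.  Variable names are illustrative. */
#include <mpi.h>

int main(int argc, char **argv)
{
    MPI_Comm cart;
    int dims[2] = {0, 0}, periods[2] = {0, 0};
    int nprocs, north, south, west, east;

    MPI_Init(&argc, &argv);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    /* Factor the process count into a 2D grid and build the topology;
     * reorder = 1 lets the implementation place neighbors close by. */
    MPI_Dims_create(nprocs, 2, dims);
    MPI_Cart_create(MPI_COMM_WORLD, 2, dims, periods, 1, &cart);

    /* Ranks of the direct neighbors; boundary ranks get MPI_PROC_NULL
     * on their missing sides. */
    MPI_Cart_shift(cart, 0, 1, &north, &south);
    MPI_Cart_shift(cart, 1, 1, &west, &east);

    /* ... ghost-zone exchange with north/south/west/east each timestep ... */

    MPI_Comm_free(&cart);
    MPI_Finalize();
    return 0;
}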
> 
> In general we use MVAPICH2 on the various systems we have access to. When using a Cray we use their tuned MPI library, which I believe is derived from MPICH2. Given that we are primarily MPICH-based (and with the basic idea of our communication pattern given above), would you recommend looking into some other MPI(-3) functionality to achieve better scalability and performance?
> 
> Cheers,
> 
> Tim.
>    
> On Mar 26, 2013, at 12:59 PM, Pavan Balaji <balaji at mcs.anl.gov> wrote:
> 
>> Tim,
>>
>> It's hard for us to decide which MPI functionality is better for you
>> without knowing your algorithmic model.  For some algorithms persistent
>> send/recv is great.  For some, RMA is great.  Of course, each one has
>> its shortcomings.  Please don't jump to one or the other based on a
>> short email discussion :-).
>>
>> Also note that some MPI implementations do optimize persistent
>> communication, particularly for memory registration.  So even if you
>> don't see a benefit on some platforms, it doesn't mean that other MPI
>> implementations cannot take advantage of it.
>>
>> -- Pavan
>>
>> On 03/26/2013 11:36 AM US Central Time, Timothy Stitt wrote:
>>> Thanks for the quick reply Jeff. That information is valuable. I'll
>>> follow up on your pointers.
>>>
>>> Much appreciated,
>>>
>>> Tim.
>>>
>>> On Mar 26, 2013, at 12:32 PM, Jeff Hammond <jhammond at alcf.anl.gov> wrote:
>>>
>>>> You might want to look at neighborhood collectives, which are
>>>> discussed in Chapter 7 of MPI-3.  This is a new feature so it may not
>>>> be implemented in all MPI implementations, but MPICH supports it.  I
>>>> guess MVAPICH will support it soon enough if not already.
>>>>
>>>> When persistent MPI send/recv is discussed at the MPI Forum, it is
>>>> often described as an inadequate solution because it does not specify
>>>> a full channel and thus some important optimizations, e.g. for RDMA,
>>>> may not be feasible.
>>>>
>>>> If you can use MPI-3 RMA, that is probably going to be a good idea,
>>>> although high-quality support for RMA varies.  MPICH-derived
>>>> implementations usually do a good job though.
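
For reference, a fence-synchronized MPI-3 RMA version of a ghost-zone
update might look roughly like the following.  The window layout,
offsets, and names (rma_halo_update, halo, east, win) are purely
illustrative:

/* Sketch: push n boundary values into a neighbor's ghost region with
 * MPI-3 RMA.  'win' is assumed to have been created once over each
 * rank's ghost storage, e.g. with MPI_Win_create(...); 'east' is the
 * target rank (MPI_PROC_NULL on a boundary makes the put a no-op). */
#include <mpi.h>

void rma_halo_update(const double *halo, int n, int east, MPI_Win win)
{
    MPI_Win_fence(0, win);                      /* open access epoch   */

    /* Write our n boundary values at element offset 0 of the
     * neighbor's window; the displacement unit was fixed at window
     * creation (e.g. sizeof(double)). */
    MPI_Put(halo, n, MPI_DOUBLE, east, 0, n, MPI_DOUBLE, win);

    MPI_Win_fence(0, win);                      /* complete transfers  */
}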
>>>>
>>>> Best,
>>>>
>>>> Jeff
>>>>
>>>> On Tue, Mar 26, 2013 at 11:25 AM, Timothy Stitt
>>>> <Timothy.Stitt.9 at nd.edu> wrote:
>>>>> Hi all,
>>>>>
>>>>> I've been asking this question around various MPI boards to try and
>>>>> get a consensus before I decide to rewrite some MPI code. I am
>>>>> grateful for any advice that you can give.
>>>>>
>>>>> I've inherited an MPI code that was written ~8-10 years ago, and it
>>>>> predominantly uses the MPI persistent communication routines for data
>>>>> transfers, e.g. MPI_SEND_INIT, MPI_RECV_INIT, MPI_START, etc. (which I
>>>>> am not familiar with and don't normally hear much discussion about).
>>>>> I was just wondering whether using persistent communication calls is
>>>>> still regarded as the most efficient/scalable way to perform
>>>>> communication when the communication pattern is known and fixed among
>>>>> neighboring processes. We regularly run the code across an IB network,
>>>>> so would there be a benefit in rewriting the code using another
>>>>> approach (e.g. MPI one-sided communication), or should I leave it as
>>>>> it is? The code currently scales up to 10K cores, and I want to push
>>>>> it even further, so I was wondering whether there is any benefit in
>>>>> tinkering with this persistent MPI communication approach.
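
For readers who haven't used the persistent API, the pattern Tim
describes typically looks roughly like the following sketch; the
function name, buffers, counts, tags, and neighbor ranks are all
illustrative:

/* Sketch: persistent halo exchange with one left/right neighbor pair.
 * Set the requests up once, then restart them every timestep. */
#include <mpi.h>

void timestep_loop(double *sendbuf, double *recvbuf, int n,
                   int left, int right, int nsteps)
{
    MPI_Request reqs[2];

    /* Create the persistent "channel" endpoints once. */
    MPI_Send_init(sendbuf, n, MPI_DOUBLE, right, 0, MPI_COMM_WORLD, &reqs[0]);
    MPI_Recv_init(recvbuf, n, MPI_DOUBLE, left, 0, MPI_COMM_WORLD, &reqs[1]);

    for (int step = 0; step < nsteps; step++) {
        MPI_Startall(2, reqs);                      /* restart both    */
        MPI_Waitall(2, reqs, MPI_STATUSES_IGNORE);  /* complete both   */
        /* ... compute on recvbuf / refill sendbuf for the next step ... */
    }

    MPI_Request_free(&reqs[0]);
    MPI_Request_free(&reqs[1]);
}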
>>>>>
>>>>> Thanks in advance for any advice.
>>>>>
>>>>> Tim.
>>>>>
>>>>
>>>>
>>>>
>>>> -- 
>>>> Jeff Hammond
>>>> Argonne Leadership Computing Facility
>>>> University of Chicago Computation Institute
>>>> jhammond at alcf.anl.gov / (630) 252-5381
>>>> http://www.linkedin.com/in/jeffhammond
>>>> https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond
>>>
>>>
>>>
>>>
>>
>> -- 
>> Pavan Balaji
>> http://www.mcs.anl.gov/~balaji
> 

-- 
Pavan Balaji
http://www.mcs.anl.gov/~balaji


