[mpich-devel] MPICH memory pools

Jeff Hammond jeff.science at gmail.com
Tue Jan 20 13:05:49 CST 2015


The usage justification for request-based RMA should not be sensitive
to software overhead.  I would expect that users concerned about, e.g.,
message rate would not use Rput but rather Put.

Jeff

On Tue, Jan 20, 2015 at 9:38 AM, Archer, Charles J
<charles.j.archer at intel.com> wrote:
> There are other inflexibilities with the handle encoding that can lead to unintended behavior, too.
> Although it’s not obvious, the pooling disallows mixing different object types behind the same accessor: each handle kind must map 1-1 to a single object type, which can lead to space inefficiencies.
> This is probably best illustrated by example.
>
> Let’s say you want to allocate a different request object for two-sided communication (MPID_REQUEST kind) and one-sided communication (MPID_WREQUEST kind).
> Each object has a different memory footprint, so you want the pools to be distinct.
>
> Now, if you call MPI_Rput, under the covers you allocate the request from your window object pool (an MPID_WREQUEST object).
> Everything is fine until you call MPI_Wait: the common code calls the Request_get_ptr macro, which points you at the wrong pool.
> MPID_REQUEST is hardcoded into that macro name, so your code will faceplant.
>
> The set of (non-ideal) options I see are:
>   1)  In common code, decode both the handle kind and the object kind (more branches) to get the actual pointer to the request: call something like Request_get_actual_ptr, which checks both the object kind and the handle kind.
>   2)  Waste space by allocating a union (the wrequest and the request will likely not be the same size, so the smaller one pads out to the larger).
>   3)  Allocate a window request from the window pool and a signal request from the request pool, and complete the signal request when the window request completes.  This is both space and time inefficient; pamid does this today.
>
> So you get your choice of #1 (extra branches/cycles) or #2 (wasted space) or #3 (wasted space and extra cycles).
> It’s not obvious which of #1 or #2 is better, but #3 should be the worst option; a sketch of what #1 might look like follows below.
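>
> (In that sketch, MPID_WREQUEST and the Wrequest_get_ptr helper are hypothetical names, and I'm assuming the usual HANDLE_GET_MPI_KIND-style decode of the 4-bit object-kind field; the real thing would also have to deal with builtin and invalid handle kinds:)
>
>     /* Decode the object kind from the handle before choosing a pool to dereference. */
>     static inline void *Request_get_actual_ptr(MPI_Request req)
>     {
>         void *ptr = NULL;
>         switch (HANDLE_GET_MPI_KIND(req)) {     /* 4-bit object-kind field */
>             case MPID_REQUEST:                  /* two-sided request pool */
>                 MPID_Request_get_ptr(req, ptr);
>                 break;
>             case MPID_WREQUEST:                 /* hypothetical one-sided request pool */
>                 MPID_Wrequest_get_ptr(req, ptr);
>                 break;
>             default:                            /* unknown kind: let the caller raise an error */
>                 break;
>         }
>         return ptr;
>     }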
>
> It seems there’s no good/easy way to hack in your own fancy allocator (like tcmalloc/jemalloc, whatever) and have the handle encoding point to this allocation directly, so we may be stuck with choice #3 for external MPI objects.
>
>
>
>> On Jan 20, 2015, at 8:47 AM, Dave Goodell (dgoodell) <dgoodell at cisco.com> wrote:
>>
>> The existing mempool stuff isn't particularly time efficient, as I recall.  You might want to benchmark it for your use case against a proper memory allocator like Hoard or tcmalloc and make sure it meets your needs.  It's also not overly space efficient, since IIRC it won't ever return memory to the OS or even to other memory users in the same process.
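>>
>> (If it helps, the comparison can be as blunt as timing N alloc/free pairs; swap the malloc/free pair below for the pool allocator under test and then for tcmalloc/Hoard.  Just a rough sketch:)
>>
>>     #include <stdio.h>
>>     #include <stdlib.h>
>>     #include <time.h>
>>
>>     int main(void)
>>     {
>>         enum { N = 1 << 20 };
>>         struct timespec t0, t1;
>>         clock_gettime(CLOCK_MONOTONIC, &t0);
>>         for (int i = 0; i < N; i++) {
>>             void *p = malloc(256);           /* swap in the allocator under test */
>>             if (!p) return 1;
>>             ((volatile char *) p)[0] = 1;    /* touch so the pair isn't optimized away */
>>             free(p);
>>         }
>>         clock_gettime(CLOCK_MONOTONIC, &t1);
>>         double s = (t1.tv_sec - t0.tv_sec) + (t1.tv_nsec - t0.tv_nsec) * 1e-9;
>>         printf("%.1f ns per alloc/free pair\n", s * 1e9 / N);
>>         return 0;
>>     }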
>>
>> The mempool stuff only really exists for two reasons:
>>
>> 1. So that the "encode the predefined type width in the handle value" optimization can be implemented in MPICH.  IMO this is a pretty questionable optimization on modern processors, but if we were to argue about that we should probably do some benchmarking rather than waving our hands.  (A sketch of what that encoding buys is below.)
>>
>> 2. So that one can implement all handles as integers, which simplifies the implementation of the Fortran bindings and avoids penalizing Fortran codes with one or more handle-translation lookups on every MPI call.  The "kind" field of the handle value helps with type checking, which you would otherwise get from the compiler if actual pointers were used as the handle type instead of integers.  (Also sketched below.)
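>>
>> (To make both points concrete, two tiny sketches.  The names and bit positions are my assumptions, not the literal MPICH macros.  For #1, the width encoded in a builtin datatype handle means the element size comes from a mask and shift instead of a table lookup; for #2, integer handles let the Fortran/C handle conversions collapse to casts:)
>>
>>     #include <mpi.h>
>>
>>     /* #1: assume builtin datatype handles carry the element size in a one-byte field */
>>     #define MY_DTYPE_BASIC_SIZE(h)  (((unsigned) (h) & 0x0000ff00u) >> 8)
>>     /* e.g. MY_DTYPE_BASIC_SIZE(MPI_DOUBLE) would yield 8 under that encoding, with no memory access */
>>
>>     /* #2: with integer handles (as in MPICH), f2c/c2f are effectively identity conversions */
>>     static MPI_Comm my_comm_f2c(MPI_Fint f) { return (MPI_Comm) f; }
>>     static MPI_Fint my_comm_c2f(MPI_Comm c) { return (MPI_Fint) c; }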
>>
>> It seems unlikely that you need either of these features in some subsystem.
>>
>> If I needed some new allocation logic in my netmod/device/whatever, I'd look for something off the shelf first, then roll my own second.  I'd probably stay away from the existing mempool stuff unless there was a killer feature there I'm forgetting about.
>>
>> -Dave
>>
>> On Jan 20, 2015, at 10:06 AM, Archer, Charles J <charles.j.archer at intel.com> wrote:
>>
>>> Hi.
>>>
>>> MPICH has some pretty nice functionality for memory pools implemented, but as far as I can tell, it’s a bit limited for internal device use because each pool you implement needs to consume an entry in the handle space.
>>>
>>> Looking at the available “kinds” of memory pools already implemented:
>>>
>>> typedef enum MPID_Object_kind {
>>>     MPID_COMM       = 0x1,
>>>     MPID_GROUP      = 0x2,
>>>     MPID_DATATYPE   = 0x3,
>>>     MPID_FILE       = 0x4,  /* only used obliquely inside MPID_Errhandler objs */
>>>     MPID_ERRHANDLER = 0x5,
>>>     MPID_OP         = 0x6,
>>>     MPID_INFO       = 0x7,
>>>     MPID_WIN        = 0x8,
>>>     MPID_KEYVAL     = 0x9,
>>>     MPID_ATTR       = 0xa,
>>>     MPID_REQUEST    = 0xb,
>>>     MPID_PROCGROUP  = 0xc,  /* These are internal device objects */
>>>     MPID_VCONN      = 0xd,
>>>     MPID_GREQ_CLASS = 0xf
>>> } MPID_Object_kind;
>>>
>>> It looks like only 0xe is available for implementing a new type of memory pool, limiting me to one additional pool.
>>> Furthermore, the internal device objects don’t need publishable handles, right?
>>> It looks like the handle contains 2 bits for the handle kind (invalid, builtin, direct, indirect), and 4 bits for the object kind.
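>>>
>>> (My reading of that layout, as a sketch; the masks and shifts are my own guess at the bit positions rather than something copied out of the headers:)
>>>
>>>     /* 32-bit handle: [2 bits handle kind][4 bits object kind][26 bits pool index] */
>>>     #define MY_HANDLE_KIND(h)      (((unsigned) (h) & 0xc0000000u) >> 30)  /* invalid/builtin/direct/indirect */
>>>     #define MY_HANDLE_OBJ_KIND(h)  (((unsigned) (h) & 0x3c000000u) >> 26)  /* MPID_Object_kind value */
>>>     #define MY_HANDLE_INDEX(h)     ((unsigned) (h) & 0x03ffffffu)          /* slot within that kind's pool */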
>>>
>>> Are there any memory pool routines that I’m missing somewhere that aren’t restricted to the limits of what we can publish in a handle?
>>> Since my object pools are internal, I don’t need to encode anything into a handle.
>>>
>>> Furthermore, if we had a set of non-handle pool routines, internal pools like procgroup and vconn wouldn’t consume entries in the handle space that could be used for future versions of MPI.
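>>>
>>> (To make the ask concrete: by "non-handle pool routines" I just mean a device-private freelist that never touches MPID_Object_kind or the handle bits, something along these lines.  The names are hypothetical and, like the existing mempool, this sketch never returns slabs to the OS:)
>>>
>>>     #include <stdlib.h>
>>>
>>>     typedef struct pool_elt { struct pool_elt *next; } pool_elt_t;
>>>
>>>     typedef struct obj_pool {
>>>         pool_elt_t *free_list;   /* singly linked list of free slots */
>>>         size_t      elt_size;    /* object size, >= sizeof(pool_elt_t) */
>>>         size_t      grow_count;  /* slots to carve out per refill */
>>>     } obj_pool_t;
>>>
>>>     static void *pool_alloc(obj_pool_t *p)
>>>     {
>>>         if (!p->free_list) {
>>>             /* refill: carve one malloc'd slab into grow_count free slots */
>>>             char *slab = malloc(p->elt_size * p->grow_count);
>>>             if (!slab) return NULL;
>>>             for (size_t i = 0; i < p->grow_count; i++) {
>>>                 pool_elt_t *e = (pool_elt_t *) (slab + i * p->elt_size);
>>>                 e->next = p->free_list;
>>>                 p->free_list = e;
>>>             }
>>>         }
>>>         pool_elt_t *e = p->free_list;
>>>         p->free_list = e->next;
>>>         return e;
>>>     }
>>>
>>>     static void pool_free(obj_pool_t *p, void *obj)
>>>     {
>>>         pool_elt_t *e = obj;
>>>         e->next = p->free_list;
>>>         p->free_list = e;
>>>     }
>>>
>>>     /* usage, e.g.: obj_pool_t wreq_pool = { NULL, sizeof(my_wreq_t), 256 };  (my_wreq_t hypothetical) */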
>>>
>>> Looking for some guidance here: I don’t want to publish any internal device gorp into the object_kind space…but I want to use memory pools.
>>> I’ve used the MPICH pools on internal objects with a garbage kind value (unintentionally set to the wrong enum value), and it appears I get a new pool and everything works, but just because it works doesn’t mean it’s correct.
>>>
>>> What should I do?  Bracing for "implement your own memory pools, lazy”.



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/

