[mpich-discuss] Memory alignment with MPI_Alloc_mem

Marcin Zalewski marcin.zalewski at gmail.com
Thu Feb 19 01:41:48 CST 2015


On Wed, Feb 18, 2015 at 10:01 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
> On Mon, Feb 16, 2015 at 6:23 AM, Marcin Zalewski
> <marcin.zalewski at gmail.com> wrote:
>> Thanks for your answer Jeff.
>>
>> On Mon, Feb 16, 2015 at 1:42 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
>>> If you are going to suballocate from a slab yourself, you can handle
>>> alignment yourself easily enough, no?  Or do I not understand what you mean
>>> here?  And what sort of alignment do you want?  Are you trying to
>>> align to 32/64 bytes because of AVX or some other x86 feature on Cray
>>> XC30 or do you want page alignment?
>>
>> We have an implementation of a global heap using a custom memory
>> allocator that depends on the heap being aligned to the maximum block
>> size, which we want to be 32 bits where possible. We are able to
>> achieve this with mmap by requesting addresses that suit our heap, so
>> allocating arbitrary memory and then aligning it to 32 bits may be
>> difficult.
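For reference, a minimal sketch of the over-map-and-trim way to get an
aligned heap start with mmap. The function name, size, and alignment are
illustrative only (not the values our allocator actually uses); size should
be a multiple of the page size and align a power of two no smaller than a
page.

#define _GNU_SOURCE
#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Map more than needed, then trim so the surviving region starts on an
 * 'align' boundary. */
void *map_aligned(size_t size, size_t align)
{
    size_t span = size + align;
    uint8_t *raw = mmap(NULL, span, PROT_READ | PROT_WRITE,
                        MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;

    uint8_t *heap = (uint8_t *)(((uintptr_t)raw + align - 1)
                                & ~(uintptr_t)(align - 1));

    /* Give back the unused head and tail of the over-sized mapping. */
    if (heap > raw)
        munmap(raw, (size_t)(heap - raw));
    if (heap + size < raw + span)
        munmap(heap + size, (size_t)((raw + span) - (heap + size)));
    return heap;
}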
>
> I would recommend posix_memalign and then let the CrayMPI registration
> cache do the rest.

Yes, I think I will go with that. I just wanted to experiment with
MPI_Alloc_mem to verify empirically that it does not make (much)
difference, but getting the address I need may just be too much trouble.
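For the archives, a minimal sketch of the posix_memalign route Jeff
suggests. The alignment and heap size are illustrative, and the buffer
carved out of the heap is just a placeholder for whatever the allocator
would hand out.

#include <stdlib.h>
#include <mpi.h>

/* Illustrative sizes; the real heap and block sizes come from the
 * allocator, not from this example. */
#define HEAP_ALIGN (2UL * 1024 * 1024)   /* e.g. a 2 MiB boundary */
#define HEAP_SIZE  (64UL * 1024 * 1024)

int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    void *heap = NULL;
    if (posix_memalign(&heap, HEAP_ALIGN, HEAP_SIZE) != 0)
        MPI_Abort(MPI_COMM_WORLD, 1);

    /* Suballocate MPI buffers from 'heap'; once they have been used for
     * communication, the backing pages should sit in the registration
     * cache and later transfers avoid re-registration. */
    char *buf = (char *)heap;            /* first suballocation, say */
    if (nprocs >= 2) {
        if (rank == 0)
            MPI_Send(buf, 1024, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
        else if (rank == 1)
            MPI_Recv(buf, 1024, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                     MPI_STATUS_IGNORE);
    }

    free(heap);
    MPI_Finalize();
    return 0;
}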

> Best,
>
> Jeff
>
>>> But what do you really want to achieve?  While it is usually
>>> beneficial to use pre-registered buffers on RDMA networks, good MPI
>>> implementations have a page-registration cache.  If, as you say, you
>>> are suballocating from a slab, Cray MPI should have the backing pages
>>> in the registration cache after you use them as MPI buffers.
>>
>> That's true, I had not thought of the registration cache. As you say below,
>> if I use huge pages for our heap, there should not be a problem with
>> the whole heap being cached. So probably the complexity of dealing
>> with MPI_Alloc_mem is not worth it at all.
>>
>>> You can maximize the efficiency of the page registration cache by
>>> using large pages.  Search for intro_hugepages using 'man' or on the
>>> Internet to learn the specifics.  I suspect that using large
>>> pages will deliver much of the benefit you hoped to achieve with an
>>> explicitly-registering MPI_Alloc_mem.
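A quick sketch of what that looks like at the plain Linux level:
MAP_HUGETLB backs the slab with explicit huge pages, so far fewer pages
(and registration-cache entries) cover the whole heap. This assumes huge
pages are configured on the node and the size is illustrative; on the Cray
itself, the intro_hugepages man page Jeff mentions describes the supported
route.

#define _GNU_SOURCE
#include <stdio.h>
#include <sys/mman.h>

int main(void)
{
    /* Should be a multiple of the huge-page size. */
    size_t size = 64UL * 1024 * 1024;
    void *heap = mmap(NULL, size, PROT_READ | PROT_WRITE,
                      MAP_PRIVATE | MAP_ANONYMOUS | MAP_HUGETLB, -1, 0);
    if (heap == MAP_FAILED) {
        perror("mmap(MAP_HUGETLB)");
        return 1;
    }
    /* ... carve MPI buffers out of 'heap' here ... */
    munmap(heap, size);
    return 0;
}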
>>>
>>> If you really want to max out RDMA on Cray networks, you need to use
>>> DMAPP.  I have some simple examples and pointers to docs here:
>>> https://github.com/jeffhammond/HPCInfo/tree/master/dmapp.  I have more
>>> examples in other places that I'll migrate to that location if requested.
>>>
>>> If you're interested in portability, MPI-3 RMA is a good abstraction
>>> for RDMA networks.  Some implementations do a better job than others
>>> at exposing this relationship.  Cray MPI has a DMAPP back-end for RMA
>>> now, although it is not active by default.  You could also try Torsten
>>> Hoefler's foMPI
>>> [http://spcl.inf.ethz.ch/Research/Parallel_Programming/foMPI/].
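For completeness, a minimal MPI-3 RMA sketch: MPI_Win_allocate lets the
implementation hand back memory it has chosen (and can register) itself,
which is the portable way to get RDMA-friendly buffers. The window size,
counts, and datatypes here are illustrative.

#include <mpi.h>

/* Each non-zero rank writes its rank number into rank 0's window. */
int main(int argc, char **argv)
{
    int rank, nprocs;
    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &nprocs);

    double *base;
    MPI_Win win;
    MPI_Win_allocate(nprocs * sizeof(double), sizeof(double),
                     MPI_INFO_NULL, MPI_COMM_WORLD, &base, &win);

    MPI_Win_lock_all(0, win);
    if (rank != 0) {
        double x = (double)rank;
        MPI_Put(&x, 1, MPI_DOUBLE, 0, rank, 1, MPI_DOUBLE, win);
    }
    MPI_Win_unlock_all(win);

    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}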
>>>
>>> Best,
>>>
>>> Jeff
>>>
>>> On Sat, Feb 14, 2015 at 2:37 PM, Marcin Zalewski
>>> <marcin.zalewski at gmail.com> wrote:
>>>> I am using Cray MPT, and I would like to allocate a large region of
>>>> memory from which, in turn, I will allocate buffers to be used with
>>>> MPI. I am wondering if there is any benefit from allocating that heap
>>>> with MPI_Alloc_mem. I would hope that it could be pre-registered for
>>>> RDMA, speeding things up. However, I need this memory to have a
>>>> specific alignment. Is there a general way in MPICH or maybe a
>>>> specific way for MPT to request alignment with MPI_Alloc_mem?
>>>>
>>>> Thanks,
>>>> Marcin
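To answer the original question for the archives: as far as I know, MPI-3
does not define an alignment info key for MPI_Alloc_mem, so short of an
implementation-specific hint, the usual fallback is to over-allocate and
align inside the returned region. A sketch, with an illustrative 4 KiB
alignment and buffer size and a helper name of my own choosing:

#include <stddef.h>
#include <stdint.h>
#include <mpi.h>

/* Over-allocate with MPI_Alloc_mem and align inside the returned region,
 * keeping the original pointer for MPI_Free_mem. */
static void *alloc_mem_aligned(MPI_Aint size, size_t align, void **to_free)
{
    void *raw;
    MPI_Alloc_mem(size + (MPI_Aint)align, MPI_INFO_NULL, &raw);
    *to_free = raw;
    return (void *)(((uintptr_t)raw + align - 1) & ~(uintptr_t)(align - 1));
}

int main(int argc, char **argv)
{
    MPI_Init(&argc, &argv);
    void *raw;
    void *buf = alloc_mem_aligned(1 << 20, 4096, &raw);
    (void)buf;  /* ... use 'buf' as a (possibly pre-registered) MPI buffer ... */
    MPI_Free_mem(raw);
    MPI_Finalize();
    return 0;
}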
>>>
>>>
>>>
>>> --
>>> Jeff Hammond
>>> jeff.science at gmail.com
>>> http://jeffhammond.github.io/
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/