[mpich-discuss] Memory alignment with MPI_Alloc_mem

Jeff Hammond jeff.science at gmail.com
Wed Feb 18 09:01:23 CST 2015


On Mon, Feb 16, 2015 at 6:23 AM, Marcin Zalewski
<marcin.zalewski at gmail.com> wrote:
> Thanks for your answer Jeff.
>
> On Mon, Feb 16, 2015 at 1:42 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
>> If you are going to suballocate from a slab yourself, you can handle
>> alignment yourself easily enough, no?  Or do I not understand what
>> you mean here?  And what sort of alignment do you want?  Are you
>> trying to align to 32/64 bytes because of AVX or some other x86
>> feature on the Cray XC30, or do you want page alignment?
>
> We have an implementation of a global heap using a custom memory
> allocator that depends on the heap being aligned to the maximum size
> of a block, which we want to be 32 bits where possible. We are able
> to achieve this with mmap by requesting allocations at addresses that
> suit our heap, so taking arbitrary memory and then aligning it to
> 32 bits may be difficult.

I would recommend posix_memalign and then let the Cray MPI registration
cache do the rest.
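
Something like this minimal sketch is what I have in mind (untested;
the 2 MiB alignment and 64 MiB slab size are only placeholder values,
and the send/receive at the end is just there to show the slab being
used as an MPI buffer):

  #define _POSIX_C_SOURCE 200112L   /* for posix_memalign */
  #include <stdio.h>
  #include <stdlib.h>
  #include <string.h>
  #include <mpi.h>

  int main(int argc, char **argv)
  {
      MPI_Init(&argc, &argv);

      int rank, size;
      MPI_Comm_rank(MPI_COMM_WORLD, &rank);
      MPI_Comm_size(MPI_COMM_WORLD, &size);

      /* Placeholder values: the alignment must be a power of two and
         a multiple of sizeof(void *); pick whatever your allocator
         actually needs. */
      const size_t alignment = 2UL * 1024 * 1024;
      const size_t slab_size = 64UL * 1024 * 1024;

      void *slab = NULL;
      if (posix_memalign(&slab, alignment, slab_size) != 0) {
          fprintf(stderr, "posix_memalign failed\n");
          MPI_Abort(MPI_COMM_WORLD, 1);
      }
      memset(slab, 0, slab_size);  /* touch the pages so they are backed */

      /* Suballocate MPI buffers from the slab; after their first use
         the backing pages should sit in the registration cache. */
      char *buf = (char *)slab;
      if (size >= 2) {
          if (rank == 0)
              MPI_Send(buf, 1024, MPI_CHAR, 1, 0, MPI_COMM_WORLD);
          else if (rank == 1)
              MPI_Recv(buf, 1024, MPI_CHAR, 0, 0, MPI_COMM_WORLD,
                       MPI_STATUS_IGNORE);
      }

      free(slab);
      MPI_Finalize();
      return 0;
  }

Combine that with huge pages for the heap, as discussed below, and the
whole slab should stay in the registration cache.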

Best,

Jeff

>> But what do you really want to achieve?  While it is usually
>> beneficial to use pre-registered buffers on RDMA networks, good MPI
>> implementations have a page-registration cache.  If, as you say, you
>> are suballocating from a slab, Cray MPI should have the backing pages
>> in the registration cache after you use them as MPI buffers.
>
> That's true, I did not think of the registration cache. As you say
> below, if I use huge pages for our heap, there should not be a problem
> with the whole heap being cached, so the complexity of dealing with
> MPI_Alloc_mem is probably not worth it at all.
>
>> You can maximize the efficiency of the page registration cache by
>> using large pages.  Search for intro_hugepages using 'man' or on the
>> Internet to learn the specifics.  I suspect that using large pages
>> will deliver much of the benefit you hoped to achieve with an
>> explicitly-registering MPI_Alloc_mem.
>>
>> If you really want to max out RDMA on Cray networks, you need to use
>> DMAPP.  I have some simple examples and pointers to docs here:
>> https://github.com/jeffhammond/HPCInfo/tree/master/dmapp.  I have
>> more examples in other places that I'll migrate to that location if
>> requested.
>>
>> If you're interested in portability, MPI-3 RMA is a good abstraction
>> for RDMA networks.  Some implementations do a better job than others
>> at exposing this relationship.  Cray MPI has a DMAPP back-end for RMA
>> now, although it is not active by default.  You could also try Torsten
>> Hoefler's foMPI
>> [http://spcl.inf.ethz.ch/Research/Parallel_Programming/foMPI/].
>>
>> Best,
>>
>> Jeff
>>
>> On Sat, Feb 14, 2015 at 2:37 PM, Marcin Zalewski
>> <marcin.zalewski at gmail.com> wrote:
>>> I am using Cray MPT, and I would like to allocate a large region of
>>> memory from which, in turn, I will allocate buffers to be used with
>>> MPI. I am wondering if there is any benefit from allocating that
>>> heap with MPI_Alloc_mem. I would hope that it could be
>>> pre-registered for RDMA, speeding things up. However, I need this
>>> memory to have a specific alignment. Is there a general way in
>>> MPICH, or perhaps an MPT-specific way, to request alignment with
>>> MPI_Alloc_mem?
>>>
>>> Thanks,
>>> Marcin



-- 
Jeff Hammond
jeff.science at gmail.com
http://jeffhammond.github.io/
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

