[mpich-discuss] Memory alignment with MPI_Alloc_mem

Marcin Zalewski marcin.zalewski at gmail.com
Mon Feb 16 08:23:34 CST 2015


Thanks for your answer Jeff.

On Mon, Feb 16, 2015 at 1:42 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
> If you are going to suballocate from a slab yourself, you can handle
> alignment yourself easily enough, no?  Or do I not understand what you
> mean here?  And what sort of alignment do you want?  Are you trying to
> align to 32/64 bytes because of AVX or some other x86 feature on Cray
> XC30, or do you want page alignment?

We have an implementation of a global heap using a custom memory
allocator that depends on the heap being aligned to the maximum block
size, which we want to be 32 bits where possible. We achieve this with
mmap by requesting addresses that suit our heap, so allocating
arbitrary memory and then aligning it to 32 bits may be difficult.

> But what do you really want to achieve?  While it is usually
> beneficial to use pre-registered buffers on RDMA networks, good MPI
> implementations have a page-registration cache.  If, as you say, you
> are suballocating from a slab, Cray MPI should have the backing pages
> in the registration cache after you use them as MPI buffers.

That's true, I did not think of the registration cache. As you say
below, if I use huge pages for our heap, the whole heap should stay
cached without a problem. So the complexity of dealing with
MPI_Alloc_mem is probably not worth it at all.

> You can maximize the efficiency of the page registration cache by
> using large pages.  Search for intro_hugepages using 'man' or on the
> Internet to learn the specifics of this.  I suspect that using large
> pages will induce much of the benefit you hoped to achieve with an
> explicitly-registering MPI_Alloc_mem.
>
> If you really want to max out RDMA on Cray networks, you need to use
> DMAPP.  I have some simple examples and pointers to docs here:
> https://github.com/jeffhammond/HPCInfo/tree/master/dmapp.  I have more
> examples in other places that I'll migrate to that location if requested.
>
> If you're interested in portability, MPI-3 RMA is a good abstraction
> for RDMA networks.  Some implementations do a better job than others
> at exposing this relationship.  Cray MPI has a DMAPP back-end for RMA
> now, although it is not active by default.  You could also try Torsten
> Hoefler's foMPI
> [http://spcl.inf.ethz.ch/Research/Parallel_Programming/foMPI/].
>
> Best,
>
> Jeff
>
> On Sat, Feb 14, 2015 at 2:37 PM, Marcin Zalewski
> <marcin.zalewski at gmail.com> wrote:
>> I am using Cray MPT, and I would like to allocate a large region of
>> memory from which, in turn, I will allocate buffers to be used with
>> MPI. I am wondering if there is any benefit from allocating that heap
>> with MPI_Alloc_mem. I would hope that it could be pre-registered for
>> RDMA, speeding things up. However, I need this memory to have a
>> specific alignment. Is there a general way in MPICH or maybe a
>> specific way for MPT to request alignment with MPI_Alloc_mem?
>>
>> Thanks,
>> Marcin
>> _______________________________________________
>> discuss mailing list     discuss at mpich.org
>> To manage subscription options or unsubscribe:
>> https://lists.mpich.org/mailman/listinfo/discuss
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/