[mpich-discuss] Memory alignment with MPI_Alloc_mem

Marcin Zalewski marcin.zalewski at gmail.com
Thu Feb 19 01:40:34 CST 2015


On Wed, Feb 18, 2015 at 9:48 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
>
> I have a hard time imagining that Cray doesn't do what's necessary to
> ensure proper utilization of their network with the system software
> they package.

Oh, I am sure that they do something. However, I know that my large
heap will be the source of buffers used with MPI, so, in theory, I
could cut down on overhead by just preregistering it once and for all
rather than in small chunks. With hugepages, though, the overhead
probably gets close to zero.

> The only place MPI does atomics on user buffers is in MPI-3 RMA, and
> CrayMPI uses the software implementation in Ch3 by default (they have
> a DMAPP implementation as an option, but I don't know the details).
>
> The easiest way to deal with the NIC alignment requirements is for
> Cray to patch glibc so that malloc always returns an 8-byte aligned
> address if it doesn't already.  And stack alignment is always to basic
> datatype granularity, so the only case where it could be an issue is
> for atomics on 32b types on the stack, no?

Actually, I need a huge alignment, as in an address divisible by 2^32.
This is due to some implementation requirements, but I never have a
problem getting such an address with mmap.
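
Concretely, the usual trick is to over-reserve by the alignment with
mmap and then unmap the unaligned head and tail. A minimal sketch
(mmap_aligned is just an illustrative name; size is assumed to be a
multiple of the page size so the munmap calls stay page-aligned):

#include <stddef.h>
#include <stdint.h>
#include <sys/mman.h>

/* Return `size` bytes whose base address is aligned to `align`,
 * a power of two, e.g. (size_t)1 << 32. */
static void *mmap_aligned(size_t size, size_t align)
{
    size_t total = size + align;
    char *raw = mmap(NULL, total, PROT_READ | PROT_WRITE,
                     MAP_PRIVATE | MAP_ANONYMOUS, -1, 0);
    if (raw == MAP_FAILED)
        return NULL;
    char *aligned = (char *)(((uintptr_t)raw + align - 1)
                             & ~(uintptr_t)(align - 1));
    if (aligned != raw)
        munmap(raw, (size_t)(aligned - raw));   /* trim unaligned head */
    size_t tail = (size_t)((raw + total) - (aligned + size));
    if (tail != 0)
        munmap(aligned + size, tail);           /* trim excess tail */
    return aligned;
}

Only virtual address space is reserved for the head and tail before
they are unmapped, so the cost of the extra 2^32 bytes is negligible.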

Thank you,
Marcin

> Best,
>
> Jeff
>
> On Mon, Feb 16, 2015 at 4:53 AM, Atchley, Scott <atchleyes at ornl.gov> wrote:
> > Jeff,
> >
> > I expect that he is concerned with GNI's 4-byte alignment requirement for both the address and length of RDMA Reads, and with its 8-byte alignment requirement for atomics.
> >
> > Scott
> >
> > On Feb 16, 2015, at 1:42 AM, Jeff Hammond <jeff.science at gmail.com> wrote:
> >
> >> If you are going to suballocate from a slab yourself, you can handle
> >> alignment yourself easily enough, no?  Or do I misunderstand what you
> >> mean here?  And what sort of alignment do you want?  Are you trying to
> >> align to 32/64 bytes because of AVX or some other x86 feature on Cray
> >> XC30 or do you want page alignment?
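
(For reference, a minimal sketch of aligned suballocation from a slab,
just the usual align-up on a bump pointer; `slab` and `slab_alloc` are
made-up names, not from any library:)

#include <stddef.h>
#include <stdint.h>

struct slab { char *base; size_t size; size_t used; };

/* Hand out `bytes` from the slab at the next boundary that is a
 * multiple of `align` (a power of two, e.g. 64 for cache lines/AVX). */
static void *slab_alloc(struct slab *s, size_t bytes, size_t align)
{
    uintptr_t cur = (uintptr_t)s->base + s->used;
    uintptr_t ptr = (cur + align - 1) & ~(uintptr_t)(align - 1);
    size_t new_used = (size_t)(ptr - (uintptr_t)s->base) + bytes;
    if (new_used > s->size)
        return NULL;                            /* slab exhausted */
    s->used = new_used;
    return (void *)ptr;
}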
> >>
> >> But what do you really want to achieve?  While it is usually
> >> beneficial to use pre-registered buffers on RDMA networks, good MPI
> >> implementations have a page-registration cache.  If, as you say, you
> >> are suballocating from a slab, Cray MPI should have the backing pages
> >> in the registration cache after you use them as MPI buffers.
> >>
> >> You can maximize the efficiency of the page registration cache by
> >> using large pages.  Search for intro_hugepages using 'man' or on the
> >> Internet to learn the specifics.  I suspect that using large
> >> pages will provide much of the benefit you hoped to get from an
> >> MPI_Alloc_mem that registers memory explicitly.
> >>
> >> If you really want to max out RDMA on Cray networks, you need to use
> >> DMAPP.  I have some simple examples and pointers to docs here:
> >> https://github.com/jeffhammond/HPCInfo/tree/master/dmapp.  I have more
> >> examples in other places that I can migrate there on request.
> >>
> >> If you're interested in portability, MPI-3 RMA is a good abstraction
> >> for RDMA networks.  Some implementations map RMA onto the underlying
> >> hardware better than others.  Cray MPI has a DMAPP back-end for RMA
> >> now, although it is not active by default.  You could also try Torsten
> >> Hoefler's foMPI
> >> [http://spcl.inf.ethz.ch/Research/Parallel_Programming/foMPI/].
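
(A minimal, portable MPI-3 RMA sketch along these lines: MPI_Win_allocate
lets the implementation place the window in memory it has already
registered, and fence synchronization keeps the example short:)

#include <mpi.h>
#include <stdio.h>

int main(int argc, char **argv)
{
    int rank, size;
    double *base, peer;
    MPI_Win win;

    MPI_Init(&argc, &argv);
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    /* One double per rank, allocated by the MPI library. */
    MPI_Win_allocate(sizeof(double), sizeof(double), MPI_INFO_NULL,
                     MPI_COMM_WORLD, &base, &win);
    *base = (double)rank;

    MPI_Win_fence(0, win);
    /* Read one double from the next rank's window. */
    MPI_Get(&peer, 1, MPI_DOUBLE, (rank + 1) % size, 0, 1, MPI_DOUBLE, win);
    MPI_Win_fence(0, win);

    printf("rank %d read %.0f from rank %d\n", rank, peer, (rank + 1) % size);
    MPI_Win_free(&win);
    MPI_Finalize();
    return 0;
}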
> >>
> >> Best,
> >>
> >> Jeff
> >>
> >> On Sat, Feb 14, 2015 at 2:37 PM, Marcin Zalewski
> >> <marcin.zalewski at gmail.com> wrote:
> >>> I am using Cray MPT, and I would like to allocate a large region of
> >>> memory from which, in turn, I will allocate buffers to be used with
> >>> MPI. I am wondering if there is any benefit from allocating that heap
> >>> with MPI_Alloc_mem. I would hope that it could be pre-registered for
> >>> RDMA, speeding things up. However, I need this memory to have a
> >>> specific alignment. Is there a general way in MPICH or maybe a
> >>> specific way for MPT to request alignment with MPI_Alloc_mem?
> >>>
> >>> Thanks,
> >>> Marcin
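
As far as I know, MPI-3 defines no standard info key for requesting an
alignment from MPI_Alloc_mem, so the portable fallback is to
over-allocate, align inside the allocation, and keep the original base
around for MPI_Free_mem. A sketch (the helper names are made up):

#include <mpi.h>
#include <stdint.h>

/* `align` must be a power of two and at least sizeof(void *). */
static void *alloc_mem_aligned(MPI_Aint bytes, size_t align)
{
    void *raw;
    if (MPI_Alloc_mem(bytes + (MPI_Aint)align + sizeof(void *),
                      MPI_INFO_NULL, &raw) != MPI_SUCCESS)
        return NULL;
    uintptr_t p = ((uintptr_t)raw + sizeof(void *) + align - 1)
                  & ~(uintptr_t)(align - 1);
    ((void **)p)[-1] = raw;              /* stash the real base */
    return (void *)p;
}

static void free_mem_aligned(void *ptr)
{
    MPI_Free_mem(((void **)ptr)[-1]);    /* recover and free the base */
}

Unlike the mmap trick above, the padding here stays allocated, so for an
alignment as large as 2^32 this wastes the full 4 GiB.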
> >>
> >>
> >>
> >> --
> >> Jeff Hammond
> >> jeff.science at gmail.com
> >> http://jeffhammond.github.io/
> >
>
>
>
> --
> Jeff Hammond
> jeff.science at gmail.com
> http://jeffhammond.github.io/
_______________________________________________
discuss mailing list     discuss at mpich.org
To manage subscription options or unsubscribe:
https://lists.mpich.org/mailman/listinfo/discuss

