[mpich-devel] is sublinear memory usage in MPIR_Group_create (and related ops) possible?

Dave Goodell goodell at mcs.anl.gov
Sat Feb 23 10:31:57 CST 2013


On Feb 22, 2013, at 5:47 PM CST, Jeff Hammond wrote:

> I see this error on BGQ at scale (nproc=524288).
[…]
> MPIR_Group_create(83)...: Unable to allocate 8388608 bytes of memory
> for newgroup->lrank_to_lpid (probably out of memory)
[…]
> I haven't thought about it very deeply yet, but is there a way to
> implement group operations without using O(p) memory?

That depends on the creation/access patterns you want to support and the number of hoops that you want to jump through.  Jesper Larsson Träff published a very reasonable scheme a few years ago (EuroMPI 2010) for implementing group operations efficiently in both time and space:

Google Scholar page: http://scholar.google.com/scholar?cluster=13915175648622039304&hl=en&as_sdt=1,14

Slides: https://fs.hlrs.de/projects/eurompi2010/TALKS/TUESDAY_MORNING_TRACK1/CompactMPI2010_jesper_larsson_traeff.pdf

I can't quickly find an electronic copy to which I actually have access, though I think I have one in dead-tree form in my office.
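
For scale: the failed allocation above is 8388608 bytes for 524288 processes, i.e. 16 bytes per process in a flat rank-to-process table. As a rough illustration of the kind of trade-off involved, here is a minimal sketch of one compact representation. It is not the scheme from the paper above and not MPICH code; all type and function names (range_t, compact_group_t, compact_group_*) are hypothetical. It assumes the group is a union of strided ranges of process ids, so memory is proportional to the number of ranges rather than O(p), at the cost of a linear scan over the ranges on each lookup.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical types for illustration only; these are not MPICH structures. */
typedef struct {
    int first;   /* process id of the first member of the range */
    int stride;  /* distance between consecutive members (must be nonzero) */
    int count;   /* number of members in the range */
} range_t;

typedef struct {
    const range_t *ranges;  /* ranges in group-rank order */
    size_t nranges;
} compact_group_t;

/* Number of processes in the group: sum of the range lengths. */
static int compact_group_size(const compact_group_t *g)
{
    int n = 0;
    for (size_t i = 0; i < g->nranges; i++)
        n += g->ranges[i].count;
    return n;
}

/* Translate a group rank to a process id; returns -1 if out of range. */
static int compact_group_rank_to_pid(const compact_group_t *g, int rank)
{
    if (rank < 0)
        return -1;
    for (size_t i = 0; i < g->nranges; i++) {
        const range_t *r = &g->ranges[i];
        if (rank < r->count)
            return r->first + rank * r->stride;
        rank -= r->count;  /* skip past this range */
    }
    return -1;
}

/* Translate a process id to a group rank; returns -1 if not a member. */
static int compact_group_pid_to_rank(const compact_group_t *g, int pid)
{
    int base = 0;  /* group rank of the first member of the current range */
    for (size_t i = 0; i < g->nranges; i++) {
        const range_t *r = &g->ranges[i];
        assert(r->stride != 0);
        int off = pid - r->first;
        if (off % r->stride == 0) {
            int k = off / r->stride;
            if (k >= 0 && k < r->count)
                return base + k;
        }
        base += r->count;
    }
    return -1;
}
```

With the two ranges {0, 2, 4} and {100, 1, 3}, the group holds process ids 0, 2, 4, 6, 100, 101, 102 in 24 bytes of range data. A group built from an arbitrary permutation of ranks would not compress this way, which is why the answer depends on the creation/access patterns you need to support.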


The MPICH group code is pretty horrible right now.  But it's not a heavily used piece of the implementation, so it's usually been hard to justify improving/fixing it.

-Dave


