[mpich-devel] is sublinear memory usage in MPIR_Group_create (and related ops) possible?

Jeff Hammond jhammond at alcf.anl.gov
Fri Feb 22 17:47:29 CST 2013


I see this error on BGQ at scale (nproc=524288).

Abort(1) on node 492082 (rank 492082 in comm 1140850688): Fatal error
in PMPI_Comm_group: Other MPI error, error stack:
PMPI_Comm_group(174)....: MPI_Comm_group(MPI_COMM_WORLD,
group=0x1fffffb9b4) failed
MPIR_Comm_group_impl(45):
MPIR_Group_create(83)...: Unable to allocate 8388608 bytes of memory
for newgroup->lrank_to_lpid (probably out of memory)

The culprit appears to be

    (*new_group_ptr)->lrank_to_lpid =
    (MPID_Group_pmap_t *)MPIU_Malloc( nproc * sizeof(MPID_Group_pmap_t) );

and

typedef struct MPID_Group_pmap_t {
    int          lrank;     /* Local rank in group (between 0 and size-1) */
    int          lpid;      /* local process id, from VCONN */
    int          next_lpid; /* Index of next lpid (in lpid order) */
    int          flag;      /* marker, used to implement group operations */
} MPID_Group_pmap_t;

I haven't thought about it very deeply yet, but is there a way to
implement group operations without using O(p) memory?

Thanks,

Jeff

-- 
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond

