[mpich-devel] is sublinear memory usage in MPIR_Group_create (and related ops) possible?
Jeff Hammond
jhammond at alcf.anl.gov
Fri Feb 22 17:47:29 CST 2013
I see this error on BGQ at scale (nproc=524288).
Abort(1) on node 492082 (rank 492082 in comm 1140850688): Fatal error in PMPI_Comm_group: Other MPI error, error stack:
PMPI_Comm_group(174)....: MPI_Comm_group(MPI_COMM_WORLD, group=0x1fffffb9b4) failed
MPIR_Comm_group_impl(45):
MPIR_Group_create(83)...: Unable to allocate 8388608 bytes of memory for newgroup->lrank_to_lpid (probably out of memory)
The culprit appears to be
    (*new_group_ptr)->lrank_to_lpid =
        (MPID_Group_pmap_t *)MPIU_Malloc( nproc * sizeof(MPID_Group_pmap_t) );
and
    typedef struct MPID_Group_pmap_t {
        int lrank;     /* Local rank in group (between 0 and size-1) */
        int lpid;      /* local process id, from VCONN */
        int next_lpid; /* Index of next lpid (in lpid order) */
        int flag;      /* marker, used to implement group operations */
    } MPID_Group_pmap_t;
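The allocation size in the error stack matches this struct: four ints are 16 bytes where int is 4 bytes, and 524288 * 16 = 8388608 bytes, i.e. 8 MiB per group on every rank. A minimal sketch of that arithmetic follows; pmap_copy_t and pmap_bytes are hypothetical names used only for illustration, not MPICH symbols.

```c
#include <stddef.h>

/* Illustrative copy of the struct quoted above (hypothetical name,
 * not the MPICH definition itself). */
typedef struct pmap_copy {
    int lrank;     /* local rank in group */
    int lpid;      /* local process id */
    int next_lpid; /* index of next lpid, in lpid order */
    int flag;      /* marker for group operations */
} pmap_copy_t;

/* Bytes needed for an lrank_to_lpid-style array with one entry per process. */
static size_t pmap_bytes(size_t nproc)
{
    return nproc * sizeof(pmap_copy_t);
}
```

With 4-byte ints, pmap_bytes(524288) gives the 8388608 bytes reported by MPIR_Group_create.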
I haven't thought about it very deeply yet, but is there a way to
implement group operations without using O(p) memory?
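One possible direction, sketched here only as an assumption and not as anything MPICH does: a group whose members form a regular progression (the group of MPI_COMM_WORLD is the simplest case) could be described by a start, a stride, and a size, so the rank-to-lpid mapping is computed rather than stored. The type and function names below (strided_group_t, sg_rank_to_lpid, sg_lpid_to_rank) are hypothetical. An arbitrary group would still need an explicit list, so this would only cover the regular cases.

```c
#include <stddef.h>

/* Hypothetical compact group: member with local rank r has
 * lpid = first + r * stride. Storage is O(1) regardless of size.
 * This is an illustration, not an MPICH data structure. */
typedef struct strided_group {
    int size;   /* number of processes in the group */
    int first;  /* lpid of local rank 0 */
    int stride; /* lpid step between consecutive ranks (assumed nonzero) */
} strided_group_t;

/* Map a local rank to its lpid, or return -1 if the rank is out of range. */
static int sg_rank_to_lpid(const strided_group_t *g, int rank)
{
    if (rank < 0 || rank >= g->size)
        return -1;
    return g->first + rank * g->stride;
}

/* Map an lpid back to its local rank, or return -1 if it is not a member. */
static int sg_lpid_to_rank(const strided_group_t *g, int lpid)
{
    int off = lpid - g->first;
    int rank;

    if (g->stride == 0) /* degenerate case: only a one-member group is valid */
        return (g->size == 1 && off == 0) ? 0 : -1;
    if (off % g->stride != 0)
        return -1;
    rank = off / g->stride;
    return (rank >= 0 && rank < g->size) ? rank : -1;
}
```

For the world group at nproc=524288 this would be { 524288, 0, 1 }, three ints instead of an 8 MiB table. Group operations such as union or difference would need to detect when the result is no longer a regular progression and fall back to an explicit representation.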
Thanks,
Jeff
--
Jeff Hammond
Argonne Leadership Computing Facility
University of Chicago Computation Institute
jhammond at alcf.anl.gov / (630) 252-5381
http://www.linkedin.com/in/jeffhammond
https://wiki.alcf.anl.gov/parts/index.php/User:Jhammond