[mpich-devel] ROMIO collective i/o memory use

Bob Cernohous bobc at us.ibm.com
Mon May 6 13:41:07 CDT 2013


I agree and suggested:
---------------------
It appears they don't have enough memory for an MPI_Alltoallv exchange.
Try BGMPIO_COMM=1...

 * - BGMPIO_COMM - Define how data is exchanged on collective
 *   reads and writes.  Possible values:
 *   - 0 - Use MPI_Alltoallv.
 *   - 1 - Use MPI_Isend/MPI_Irecv.
 *   - Default is 0.

---------------------
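BGMPIO_COMM is read from the job's environment. The following is a minimal sketch of how such a switch could be parsed, falling back to the documented default of 0 (MPI_Alltoallv). The helper name parse_bgmpio_comm, the enum, and the treatment of unrecognized values are assumptions for illustration, not ROMIO's actual BG tuning code.

```c
#include <stdlib.h>

/* Exchange methods named in the BGMPIO_COMM documentation above. */
enum exchange_method {
    EXCHANGE_ALLTOALLV   = 0,  /* MPI_Alltoallv */
    EXCHANGE_ISEND_IRECV = 1   /* MPI_Isend/MPI_Irecv */
};

/* Hypothetical helper: map a BGMPIO_COMM string to an exchange method.
 * NULL, empty, or unrecognized values fall back to the documented
 * default of 0 (MPI_Alltoallv). */
enum exchange_method parse_bgmpio_comm(const char *value)
{
    if (value == NULL || value[0] == '\0')
        return EXCHANGE_ALLTOALLV;

    char *end;
    long v = strtol(value, &end, 10);
    if (*end != '\0')              /* trailing garbage: use the default */
        return EXCHANGE_ALLTOALLV;

    return (v == 1) ? EXCHANGE_ISEND_IRECV : EXCHANGE_ALLTOALLV;
}
```

Typical use would be parse_bgmpio_comm(getenv("BGMPIO_COMM")), so exporting BGMPIO_COMM=1 before launch selects the point-to-point path.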

But they didn't want a workaround; they wanted a 'fix for O(p)
allocations'.  From a quick glance, there are O(p) allocations all over
collective I/O.  I just wanted some input from the experts about scaling
ROMIO.  I haven't heard whether the suggestion worked.
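One direction a 'fix for O(p) allocations' could take, sketched here only as an illustration and not taken from ROMIO, is to keep per-peer state only for ranks that actually exchange data with this process. The struct peer type and the compact_peers helper are hypothetical. The input is still a dense nprocs-length size array, so this shows how downstream per-peer structures (buffer pointers, indices, requests) could shrink to the number of active peers, not how to eliminate the dense arrays themselves.

```c
#include <stdlib.h>

/* Hypothetical sparse record for one communicating peer. */
struct peer {
    int rank;   /* peer's rank in the communicator */
    int size;   /* bytes exchanged with that peer */
};

/* Collect the nonzero entries of a dense per-rank size array.
 * Returns a malloc'd array of *npeers entries, or NULL with *npeers == 0
 * when there are no active peers or the allocation fails. */
struct peer *compact_peers(const int *sizes, int nprocs, int *npeers)
{
    int n = 0;
    for (int i = 0; i < nprocs; i++)
        if (sizes[i] > 0)
            n++;

    *npeers = n;
    if (n == 0)
        return NULL;

    struct peer *peers = malloc((size_t) n * sizeof *peers);
    if (peers == NULL) {
        *npeers = 0;
        return NULL;
    }

    int j = 0;
    for (int i = 0; i < nprocs; i++) {
        if (sizes[i] > 0) {
            peers[j].rank = i;
            peers[j].size = sizes[i];
            j++;
        }
    }
    return peers;
}
```

Per-peer arrays built from this list cost O(active peers) rather than O(p); for a rank that only talks to a few aggregators, that can be far smaller than p.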

Bob Cernohous:  (T/L 553) 507-253-6093

BobC at us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester,  MN 55901-7829

> Chaos reigns within.
> Reflect, repent, and reboot.
> Order shall return.


devel-bounces at mpich.org wrote on 05/04/2013 09:43:10 PM:

> From: "Rob Latham" <robl at mcs.anl.gov>
> To: devel at mpich.org, 
> Cc: mpich2-dev at mcs.anl.gov
> Date: 05/04/2013 09:48 PM
> Subject: Re: [mpich-devel] ROMIO collective i/o memory use
> Sent by: devel-bounces at mpich.org
> 
> On Mon, Apr 29, 2013 at 10:28:01AM -0500, Bob Cernohous wrote:
> > A customer (Argonne ;) is complaining about O(p) allocations in
> > collective I/O.  A collective read is failing at larger scale.
> > 
> > Any thoughts, comments, or advice?  There appear to be lots of O(p)
> > allocations in ROMIO collective I/O, plus a lot of (possibly large)
> > aggregated data buffers.  A quick search shows:
> 
> The O(p) allocations are a concern, sure.  For two-phase, though, the
> real problem lies in ADIOI_R_Exchange_data_alltoallv and
> ADIOI_W_Exchange_data_alltoallv .  The O(p) allocations are the least
> of our worries! 
> 
> Around line 1063 of ad_bg_rdcoll.c:
> 
> all_recv_buf = (char *) ADIOI_Malloc( rtail );
> 
> all_send_buf = (char *) ADIOI_Malloc( stail );
> 
> (rtail and stail are the sums of the receive-size and send-size arrays)
> 
> ==rob
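To illustrate Rob's point, an alltoallv-style exchange needs a displacement array plus one contiguous buffer whose size is the sum of all per-process counts. The sketch below assumes rtail and stail are formed as running sums of that kind; exclusive_prefix_sum is a hypothetical name and the code is not copied from ad_bg_rdcoll.c.

```c
#include <limits.h>

/* Hypothetical sketch: turn per-process byte counts (such as recv_size[])
 * into MPI_Alltoallv-style displacements and return the total, i.e. the
 * size a single contiguous buffer like all_recv_buf would need.
 * Returns -1 if a displacement would not fit in an int, since
 * MPI_Alltoallv takes int displacements. */
long long exclusive_prefix_sum(const int *sizes, int *displs, int nprocs)
{
    long long tail = 0;
    for (int i = 0; i < nprocs; i++) {
        if (tail > INT_MAX)
            return -1;
        displs[i] = (int) tail;   /* start of process i's data */
        tail += sizes[i];
    }
    return tail;
}
```

If both all_recv_buf and all_send_buf are sized by such totals, each rank holds contiguous staging copies of everything it exchanges in a round, and that cost grows with data volume rather than with the O(p) bookkeeping.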
> 
> > The common ROMIO read collective code:
> > 
> > Find all "ADIOI_Malloc", Match case, Regular expression (UNIX)
> > 
> > File Z:\bgq\comm\lib\dev\mpich2\src\mpi\romio\adio\common\ad_read_coll.c
> > 
> >   124 38:       st_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> >   125 39:       end_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> >   317 44:       *offset_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> >   318 41:       *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> >   334 44:       *offset_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> >   335 41:       *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> >   436 18: ADIOI_Malloc((contig_access_count+1)*sizeof(ADIO_Offset));
> >   437 41:       *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc((contig_access_count+1)*sizeof(ADIO_Offset));
> >   573 37:    if (ntimes) read_buf = (char *) ADIOI_Malloc(coll_bufsize);
> >   578 21:    count = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   587 25:    send_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   590 25:    recv_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   598 25:    start_pos = (int *) ADIOI_Malloc(nprocs*sizeof(int));
> >   739 32:           tmp_buf = (char *) ADIOI_Malloc(for_next_iter);
> >   744 33:           read_buf = (char *) ADIOI_Malloc(for_next_iter+coll_bufsize);
> >   805 9: ADIOI_Malloc((nprocs_send+nprocs_recv+1)*sizeof(MPI_Request));
> >   827 30:       recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char*));
> >   830 44:                                  (char *) ADIOI_Malloc(recv_size[i]);
> >   870 31:    statuses = (MPI_Status *) ADIOI_Malloc((nprocs_send+nprocs_recv+1) * \
> >   988 35:    curr_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   989 35:    done_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   990 35:    recv_buf_idx   = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> > 
> > Total found: 22
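For scale, here is back-of-the-envelope arithmetic for the allocations in the common list that grow directly with nprocs: st_offsets and end_offsets (ADIO_Offset), count, send_size, recv_size and start_pos (int), recv_buf (char *), and curr_from_proc, done_from_proc and recv_buf_idx (unsigned). The sketch assumes ADIO_Offset is a 64-bit integer (modeled as long long) and counts every array as live at once, which overstates the real peak; the function names are hypothetical.

```c
#include <stddef.h>

/* Per-peer bytes for the nprocs-sized arrays in the ad_read_coll.c list.
 * ADIO_Offset is modeled as long long (assumed 64-bit). */
size_t read_coll_bytes_per_proc(void)
{
    return 2 * sizeof(long long)   /* st_offsets, end_offsets */
         + 4 * sizeof(int)         /* count, send_size, recv_size, start_pos */
         + 1 * sizeof(char *)      /* recv_buf */
         + 3 * sizeof(unsigned);   /* curr_from_proc, done_from_proc, recv_buf_idx */
}

/* Upper bound on this metadata held by one rank in a job of nprocs ranks. */
size_t read_coll_metadata_bytes(size_t nprocs)
{
    return nprocs * read_coll_bytes_per_proc();
}
```

On a typical LP64 system this comes to 52 bytes per peer, so a hypothetical 2^20-rank job would need about 52 MiB of this bookkeeping on every rank.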
> > 
> > 
> > Our BG version of the read collective:
> > 
> > File Z:\bgq\comm\lib\dev\mpich2\src\mpi\romio\adio\ad_bg\ad_bg_rdcoll.c
> > 
> >   179 40:       st_offsets   = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> >   180 40:       end_offsets  = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> >   183 43:           bg_offsets0 = (ADIO_Offset *) ADIOI_Malloc(2*nprocs*sizeof(ADIO_Offset));
> >   184 43:           bg_offsets  = (ADIO_Offset *) ADIOI_Malloc(2*nprocs*sizeof(ADIO_Offset));
> >   475 37:    if (ntimes) read_buf = (char *) ADIOI_Malloc(coll_bufsize);
> >   480 21:    count = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   489 25:    send_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   492 25:    recv_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   500 25:    start_pos = (int *) ADIOI_Malloc(nprocs*sizeof(int));
> >   676 32:           tmp_buf = (char *) ADIOI_Malloc(for_next_iter);
> >   681 33:           read_buf = (char *) ADIOI_Malloc(for_next_iter+coll_bufsize);
> >   761 9: ADIOI_Malloc((nprocs_send+nprocs_recv+1)*sizeof(MPI_Request));
> >   783 30:       recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char*));
> >   786 44:                                  (char *) ADIOI_Malloc(recv_size[i]);
> >   826 31:    statuses = (MPI_Status *) ADIOI_Malloc((nprocs_send+nprocs_recv+1) * \
> >   944 35:    curr_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   945 35:    done_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   946 35:    recv_buf_idx   = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   1058 23:    rdispls = (int *) ADIOI_Malloc( nprocs * sizeof(int) );
> >   1063 29:    all_recv_buf = (char *) ADIOI_Malloc( rtail );
> >   1064 26:    recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char *));
> >   1068 23:    sdispls = (int *) ADIOI_Malloc( nprocs * sizeof(int) );
> >   1073 29:    all_send_buf = (char *) ADIOI_Malloc( stail );
> > 
> > Total found: 23
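The BG list also contains nprocs-scaled arrays that do not appear in the common list: bg_offsets0 and bg_offsets (2*nprocs ADIO_Offset entries each) and the rdispls/sdispls displacement arrays used by the alltoallv exchange. A rough per-peer estimate follows, assuming each of these is actually allocated (bg_offsets0/bg_offsets look conditional from their indentation) and modeling ADIO_Offset as a 64-bit long long; the function name is hypothetical.

```c
#include <stddef.h>

/* Hypothetical estimate of per-peer bytes for the BG-only nprocs-scaled
 * arrays in the ad_bg_rdcoll.c list. ADIO_Offset is modeled as long long
 * (assumed 64-bit). */
size_t bg_extra_bytes_per_proc(void)
{
    return 2 * 2 * sizeof(long long)  /* bg_offsets0, bg_offsets: 2*nprocs each */
         + 2 * sizeof(int);           /* rdispls, sdispls */
}
```

On a typical LP64 system this adds 40 bytes per peer on top of the arrays shared with the common code.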
> > 
> 
> -- 
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
> 