[mpich-devel] ROMIO collective i/o memory use

Bob Cernohous bobc at us.ibm.com
Mon Apr 29 10:28:01 CDT 2013


A customer (Argonne ;) is complaining about O(p) allocations in collective 
I/O, where p is the number of processes.  A collective read is failing at 
larger scale. 

Any thoughts, comments, or advice?  There appear to be a lot of O(p) 
allocations in ROMIO collective I/O, plus a lot of (possibly large) 
aggregated data buffers.  A quick search for ADIOI_Malloc shows the following.

The common ROMIO read collective code:

Find all "ADIOI_Malloc", Match case, Regular expression (UNIX)

File Z:\bgq\comm\lib\dev\mpich2\src\mpi\romio\adio\common\ad_read_coll.c

  124 38:       st_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));

  125 39:       end_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));

  317 44:       *offset_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));

  318 41:       *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));

  334 44:       *offset_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));

  335 41:       *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));

  436 18: ADIOI_Malloc((contig_access_count+1)*sizeof(ADIO_Offset));

  437 41:       *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc((contig_access_count+1)*sizeof(ADIO_Offset));

  573 37:    if (ntimes) read_buf = (char *) ADIOI_Malloc(coll_bufsize);

  578 21:    count = (int *) ADIOI_Malloc(nprocs * sizeof(int));

  587 25:    send_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));

  590 25:    recv_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));

  598 25:    start_pos = (int *) ADIOI_Malloc(nprocs*sizeof(int));

  739 32:           tmp_buf = (char *) ADIOI_Malloc(for_next_iter);

  744 33:           read_buf = (char *) ADIOI_Malloc(for_next_iter+coll_bufsize);

  805 9: ADIOI_Malloc((nprocs_send+nprocs_recv+1)*sizeof(MPI_Request));

  827 30:       recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char*));

  830 44:                                  (char *) ADIOI_Malloc(recv_size[i]);

  870 31:    statuses = (MPI_Status *) ADIOI_Malloc((nprocs_send+nprocs_recv+1) * \

  988 35:    curr_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));

  989 35:    done_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));

  990 35:    recv_buf_idx   = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));

Total found: 22


Our BG version of read collective:

File Z:\bgq\comm\lib\dev\mpich2\src\mpi\romio\adio\ad_bg\ad_bg_rdcoll.c

  179 40:       st_offsets   = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));

  180 40:       end_offsets  = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));

  183 43:           bg_offsets0 = (ADIO_Offset *) ADIOI_Malloc(2*nprocs*sizeof(ADIO_Offset));

  184 43:           bg_offsets  = (ADIO_Offset *) ADIOI_Malloc(2*nprocs*sizeof(ADIO_Offset));

  475 37:    if (ntimes) read_buf = (char *) ADIOI_Malloc(coll_bufsize);

  480 21:    count = (int *) ADIOI_Malloc(nprocs * sizeof(int));

  489 25:    send_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));

  492 25:    recv_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));

  500 25:    start_pos = (int *) ADIOI_Malloc(nprocs*sizeof(int));

  676 32:           tmp_buf = (char *) ADIOI_Malloc(for_next_iter);

  681 33:           read_buf = (char *) ADIOI_Malloc(for_next_iter+coll_bufsize);

  761 9: ADIOI_Malloc((nprocs_send+nprocs_recv+1)*sizeof(MPI_Request));

  783 30:       recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char*));

  786 44:                                  (char *) ADIOI_Malloc(recv_size[i]);

  826 31:    statuses = (MPI_Status *) ADIOI_Malloc((nprocs_send+nprocs_recv+1) * \

  944 35:    curr_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));

  945 35:    done_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));

  946 35:    recv_buf_idx   = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));

  1058 23:    rdispls = (int *) ADIOI_Malloc( nprocs * sizeof(int) );

  1063 29:    all_recv_buf = (char *) ADIOI_Malloc( rtail );

  1064 26:    recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char *));

  1068 23:    sdispls = (int *) ADIOI_Malloc( nprocs * sizeof(int) );

  1073 29:    all_send_buf = (char *) ADIOI_Malloc( stail );

Total found: 23


Bob Cernohous:  (T/L 553) 507-253-6093

BobC at us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester,  MN 55901-7829

> Chaos reigns within.
> Reflect, repent, and reboot.
> Order shall return.