[mpich-devel] ROMIO collective i/o memory use
Kevin Harms
harms at alcf.anl.gov
Mon Apr 29 11:26:18 CDT 2013
Just as a heads up, both Rob L. and Dave G. are out for some or all of this week.
kevin
On Apr 29, 2013, at 10:28 AM, Bob Cernohous <bobc at us.ibm.com> wrote:
> A customer (Argonne ;) is complaining about O(p) allocations in collective i/o. A collective read is failing at larger scale.
>
> Any thoughts, comments, or advice? There appear to be lots of O(p) allocations in ROMIO collective I/O, plus a lot of (possibly large) aggregated data buffers. A quick search shows:
>
> The common ROMIO read collective code:
>
> Find all "ADIOI_Malloc", Match case, Regular expression (UNIX)
> File Z:\bgq\comm\lib\dev\mpich2\src\mpi\romio\adio\common\ad_read_coll.c
> 124 38: st_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> 125 39: end_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> 317 44: *offset_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> 318 41: *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> 334 44: *offset_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> 335 41: *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> 436 18: ADIOI_Malloc((contig_access_count+1)*sizeof(ADIO_Offset));
> 437 41: *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc((contig_access_count+1)*sizeof(ADIO_Offset));
> 573 37: if (ntimes) read_buf = (char *) ADIOI_Malloc(coll_bufsize);
> 578 21: count = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> 587 25: send_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> 590 25: recv_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> 598 25: start_pos = (int *) ADIOI_Malloc(nprocs*sizeof(int));
> 739 32: tmp_buf = (char *) ADIOI_Malloc(for_next_iter);
> 744 33: read_buf = (char *) ADIOI_Malloc(for_next_iter+coll_bufsize);
> 805 9: ADIOI_Malloc((nprocs_send+nprocs_recv+1)*sizeof(MPI_Request));
> 827 30: recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char*));
> 830 44: (char *) ADIOI_Malloc(recv_size[i]);
> 870 31: statuses = (MPI_Status *) ADIOI_Malloc((nprocs_send+nprocs_recv+1) * \
> 988 35: curr_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> 989 35: done_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> 990 35: recv_buf_idx = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> Total found: 22
>
> Our BG version of the read collective:
>
> File Z:\bgq\comm\lib\dev\mpich2\src\mpi\romio\adio\ad_bg\ad_bg_rdcoll.c
> 179 40: st_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> 180 40: end_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> 183 43: bg_offsets0 = (ADIO_Offset *) ADIOI_Malloc(2*nprocs*sizeof(ADIO_Offset));
> 184 43: bg_offsets = (ADIO_Offset *) ADIOI_Malloc(2*nprocs*sizeof(ADIO_Offset));
> 475 37: if (ntimes) read_buf = (char *) ADIOI_Malloc(coll_bufsize);
> 480 21: count = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> 489 25: send_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> 492 25: recv_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> 500 25: start_pos = (int *) ADIOI_Malloc(nprocs*sizeof(int));
> 676 32: tmp_buf = (char *) ADIOI_Malloc(for_next_iter);
> 681 33: read_buf = (char *) ADIOI_Malloc(for_next_iter+coll_bufsize);
> 761 9: ADIOI_Malloc((nprocs_send+nprocs_recv+1)*sizeof(MPI_Request));
> 783 30: recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char*));
> 786 44: (char *) ADIOI_Malloc(recv_size[i]);
> 826 31: statuses = (MPI_Status *) ADIOI_Malloc((nprocs_send+nprocs_recv+1) * \
> 944 35: curr_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> 945 35: done_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> 946 35: recv_buf_idx = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> 1058 23: rdispls = (int *) ADIOI_Malloc( nprocs * sizeof(int) );
> 1063 29: all_recv_buf = (char *) ADIOI_Malloc( rtail );
> 1064 26: recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char *));
> 1068 23: sdispls = (int *) ADIOI_Malloc( nprocs * sizeof(int) );
> 1073 29: all_send_buf = (char *) ADIOI_Malloc( stail );
> Total found: 23
>
>
> Bob Cernohous: (T/L 553) 507-253-6093
>
> BobC at us.ibm.com
> IBM Rochester, Building 030-2(C335), Department 61L
> 3605 Hwy 52 North, Rochester, MN 55901-7829
>
> > Chaos reigns within.
> > Reflect, repent, and reboot.
> > Order shall return.