I agree and suggested:
---------------------
It appears they don't have enough memory for an alltoallv exchange. Try '1'...

 * - BGMPIO_COMM - Define how data is exchanged on collective
 *   reads and writes.  Possible values:
 *   - 0 - Use MPI_Alltoallv.
 *   - 1 - Use MPI_Isend/MPI_Irecv.
 *   - Default is 0.
---------------------
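
In case it helps to see what that setting actually chooses between, here is a minimal sketch -- not the ROMIO code, and every name in it is made up:

#include <mpi.h>
#include <stdlib.h>

/* Illustrative sketch only -- not the ROMIO code; all names are made up.
 * BGMPIO_COMM=0 behaves like exchange_alltoallv(), BGMPIO_COMM=1 like
 * exchange_p2p(). */

/* Style 0: one collective call, but it requires single contiguous send
 * and receive buffers sized to the sum of every partner's count. */
static void exchange_alltoallv(MPI_Comm comm,
                               char *all_send_buf, int *send_size, int *sdispls,
                               char *all_recv_buf, int *recv_size, int *rdispls)
{
    MPI_Alltoallv(all_send_buf, send_size, sdispls, MPI_BYTE,
                  all_recv_buf, recv_size, rdispls, MPI_BYTE, comm);
}

/* Style 1: nonblocking point-to-point.  Partners with nothing to exchange
 * are skipped, and each message can use its own (typically much smaller)
 * per-partner buffer instead of one aggregated allocation per direction. */
static void exchange_p2p(MPI_Comm comm, int nprocs,
                         char **send_buf, int *send_size,
                         char **recv_buf, int *recv_size)
{
    /* still O(p) requests, but they are small compared to data buffers */
    MPI_Request *req = malloc(2 * (size_t)nprocs * sizeof(MPI_Request));
    int nreq = 0, i;

    for (i = 0; i < nprocs; i++)
        if (recv_size[i] > 0)
            MPI_Irecv(recv_buf[i], recv_size[i], MPI_BYTE, i, 0, comm, &req[nreq++]);
    for (i = 0; i < nprocs; i++)
        if (send_size[i] > 0)
            MPI_Isend(send_buf[i], send_size[i], MPI_BYTE, i, 0, comm, &req[nreq++]);

    MPI_Waitall(nreq, req, MPI_STATUSES_IGNORE);
    free(req);
}

The point-to-point style matters at scale because ranks that exchange nothing cost nothing, and nothing has to be staged in one contiguous buffer per direction.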

but they didn't want a workaround; they wanted a "fix for O(p) allocations". From a quick glance, there are O(p) allocations all over collective I/O. I just wanted some input from the experts about scaling ROMIO. I haven't heard whether the suggestion worked.
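
(For rough scale -- illustrative numbers only, nothing from the failing job:

  nprocs * sizeof(int)          at 128K ranks =  512 KB per array
  nprocs * sizeof(ADIO_Offset)  at 128K ranks =    1 MB per array

so the nprocs-sized allocations in the listings below might add up to a few tens of MB per aggregator, while rtail/stail in the alltoallv exchange are the sums of every partner's contribution for the round and scale with data volume, not with 4 or 8 bytes per rank.)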

Bob Cernohous: (T/L 553) 507-253-6093

BobC@us.ibm.com
IBM Rochester, Building 030-2(C335), Department 61L
3605 Hwy 52 North, Rochester, MN 55901-7829

> Chaos reigns within.
> Reflect, repent, and reboot.
> Order shall return.

devel-bounces@mpich.org wrote on 05/04/2013 09:43:10 PM:

> From: "Rob Latham" <robl@mcs.anl.gov>
> To: devel@mpich.org
> Cc: mpich2-dev@mcs.anl.gov
> Date: 05/04/2013 09:48 PM
> Subject: Re: [mpich-devel] ROMIO collective i/o memory use
> Sent by: devel-bounces@mpich.org
>
> On Mon, Apr 29, 2013 at 10:28:01AM -0500, Bob Cernohous wrote:
> > A customer (Argonne ;) is complaining about O(p) allocations in collective
> > i/o.  A collective read is failing at larger scale.
> >
> > Any thoughts or comments or advice?  There appears to be lots of O(p) in
> > ROMIO collective I/O.  Plus a lot of (possibly large) aggregated data
> > buffers.  A quick search shows
> >
> The O(p) allocations are a concern, sure.  For two-phase, though, the
> real problem lies in ADIOI_R_Exchange_data_alltoallv and
> ADIOI_W_Exchange_data_alltoallv.  The O(p) allocations are the least
> of our worries!
>
> around line 1063 of ad_bg_rdcoll.c
>
> all_recv_buf = (char *) ADIOI_Malloc( rtail );
>
> all_send_buf = (char *) ADIOI_Malloc( stail );
>
> (rtail and stail are the sum of the receive and send arrays)
>
> ==rob
>
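
To make Rob's point concrete, here is roughly what that exchange sets up -- a sketch, not the actual ad_bg_rdcoll.c code; rtail, stail, and the buffer names follow the quoted lines, the rest is made up:

#include <stdlib.h>

/* Illustrative only -- not the ROMIO code.  Besides the O(p) integer
 * arrays, the alltoallv exchange stages one contiguous buffer per
 * direction, sized to the sum of every partner's piece for the round. */
static void size_alltoallv_buffers(int nprocs,
                                   const int *send_size, const int *recv_size,
                                   int *sdispls, int *rdispls,
                                   char **all_send_buf, char **all_recv_buf)
{
    size_t stail = 0, rtail = 0;
    int i;

    for (i = 0; i < nprocs; i++) {
        sdispls[i] = (int) stail;   /* O(p) bookkeeping: cheap */
        rdispls[i] = (int) rtail;
        stail += send_size[i];      /* totals grow with data volume, not with p */
        rtail += recv_size[i];
    }

    *all_send_buf = malloc(stail);  /* full staging copy of everything to send */
    *all_recv_buf = malloc(rtail);  /* one contiguous block for everything received */
}
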
> > The common ROMIO read collective code:
> >
> > Find all "ADIOI_Malloc", Match case, Regular expression (UNIX)
> >
> > File Z:\bgq\comm\lib\dev\mpich2\src\mpi\romio\adio\common\ad_read_coll.c
> >
> >   124 38: st_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> >   125 39: end_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> >   317 44: *offset_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> >   318 41: *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> >   334 44: *offset_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> >   335 41: *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc(2*sizeof(ADIO_Offset));
> >   436 18: ADIOI_Malloc((contig_access_count+1)*sizeof(ADIO_Offset));
> >   437 41: *len_list_ptr = (ADIO_Offset *) ADIOI_Malloc((contig_access_count+1)*sizeof(ADIO_Offset));
> >   573 37: if (ntimes) read_buf = (char *) ADIOI_Malloc(coll_bufsize);
> >   578 21: count = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   587 25: send_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   590 25: recv_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   598 25: start_pos = (int *) ADIOI_Malloc(nprocs*sizeof(int));
> >   739 32: tmp_buf = (char *) ADIOI_Malloc(for_next_iter);
> >   744 33: read_buf = (char *) ADIOI_Malloc(for_next_iter+coll_bufsize);
> >   805  9: ADIOI_Malloc((nprocs_send+nprocs_recv+1)*sizeof(MPI_Request));
> >   827 30: recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char*));
> >   830 44: (char *) ADIOI_Malloc(recv_size[i]);
> >   870 31: statuses = (MPI_Status *) ADIOI_Malloc((nprocs_send+nprocs_recv+1) * \
> >   988 35: curr_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   989 35: done_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   990 35: recv_buf_idx = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >
> > Total found: 22
> >
> >
> > Our BG version of read collective:
> >
> > File Z:\bgq\comm\lib\dev\mpich2\src\mpi\romio\adio\ad_bg\ad_bg_rdcoll.c
> >
> >   179 40: st_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> >   180 40: end_offsets = (ADIO_Offset *) ADIOI_Malloc(nprocs*sizeof(ADIO_Offset));
> >   183 43: bg_offsets0 = (ADIO_Offset *) ADIOI_Malloc(2*nprocs*sizeof(ADIO_Offset));
> >   184 43: bg_offsets = (ADIO_Offset *) ADIOI_Malloc(2*nprocs*sizeof(ADIO_Offset));
> >   475 37: if (ntimes) read_buf = (char *) ADIOI_Malloc(coll_bufsize);
> >   480 21: count = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   489 25: send_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   492 25: recv_size = (int *) ADIOI_Malloc(nprocs * sizeof(int));
> >   500 25: start_pos = (int *) ADIOI_Malloc(nprocs*sizeof(int));
> >   676 32: tmp_buf = (char *) ADIOI_Malloc(for_next_iter);
> >   681 33: read_buf = (char *) ADIOI_Malloc(for_next_iter+coll_bufsize);
> >   761  9: ADIOI_Malloc((nprocs_send+nprocs_recv+1)*sizeof(MPI_Request));
> >   783 30: recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char*));
> >   786 44: (char *) ADIOI_Malloc(recv_size[i]);
> >   826 31: statuses = (MPI_Status *) ADIOI_Malloc((nprocs_send+nprocs_recv+1) * \
> >   944 35: curr_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   945 35: done_from_proc = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   946 35: recv_buf_idx = (unsigned *) ADIOI_Malloc(nprocs * sizeof(unsigned));
> >   1058 23: rdispls = (int *) ADIOI_Malloc( nprocs * sizeof(int) );
> >   1063 29: all_recv_buf = (char *) ADIOI_Malloc( rtail );
> >   1064 26: recv_buf = (char **) ADIOI_Malloc(nprocs * sizeof(char *));
> >   1068 23: sdispls = (int *) ADIOI_Malloc( nprocs * sizeof(int) );
> >   1073 29: all_send_buf = (char *) ADIOI_Malloc( stail );
> >
> > Total found: 23
> >
> >
> > Bob Cernohous: (T/L 553) 507-253-6093
> >
> > BobC@us.ibm.com
> > IBM Rochester, Building 030-2(C335), Department 61L
> > 3605 Hwy 52 North, Rochester, MN 55901-7829
> >
> > > Chaos reigns within.
> > > Reflect, repent, and reboot.
> > > Order shall return.
>
> --
> Rob Latham
> Mathematics and Computer Science Division
> Argonne National Lab, IL USA
>