[mpich-discuss] About mpi i/o, mpi hints and collective i/o memory usage

pramod kumbhar pramod.s.kumbhar at gmail.com
Fri Apr 21 13:02:29 CDT 2017


Hi Rob,

> Are you the same Pramod Kumbhar that works at EPFL?


Yes. After seeing the failure on 8 racks, I started debugging/profiling on our
local 4-rack BG/Q system. I plan to send an email to the support team with more
detailed information (the job for that is currently in the queue).

> 1: collective I/O does consume some memory.  not only is there an
> internal "collective buffer" maintained by MPI-IO itself, but the data
> exchange copies data as well before calling ALLTOALL.
>

Just wondering if there is any way to print or query some internal statistics
about this.
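(For reference, the hints that ROMIO reports on an open file handle can at
least be printed with plain MPI calls, as in the minimal sketch below, but as
far as I can tell this only shows the hints, not the internal buffer usage.
fh and rank are placeholders from the surrounding application.)

#include <mpi.h>
#include <stdio.h>

/* Print every hint ROMIO reports on an already-open file handle.
 * Only standard MPI-IO / MPI_Info routines are used. */
static void dump_file_hints(MPI_File fh, int rank)
{
    MPI_Info info;
    int nkeys;

    if (rank != 0)                      /* print from one rank only */
        return;

    MPI_File_get_info(fh, &info);
    MPI_Info_get_nkeys(info, &nkeys);

    for (int i = 0; i < nkeys; i++) {
        char key[MPI_MAX_INFO_KEY + 1];
        char value[MPI_MAX_INFO_VAL + 1];
        int flag;

        MPI_Info_get_nthkey(info, i, key);
        MPI_Info_get(info, key, MPI_MAX_INFO_VAL, value, &flag);
        if (flag)
            printf("%s, value = %s\n", key, value);
    }
    MPI_Info_free(&info);
}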


> Paul Coffman has done a one-sided based two-phase implementation that
> should be lower memory overhead.  But here we should take the
> discussion off-list.
>

Perfect! Thanks!

Regards,
Pramod


==rob
>
> >
> > Quick Summary:
> >
> > 1. On BG/Q I see cb_buffer_size as 16 MB when we query the file handle
> > using MPI_File_get_info.
> > An application that we are looking at has a code section like:
> >
> > ….
> > MPI_File_set_view( fh, position_to_write, MPI_FLOAT, mappingType,
> > "native", MPI_INFO_NULL );
> > max_mb_on_any_rank_using_Kernel_GetMemorySize () => 275 MB
> > MPI_File_write_all( fh, mappingBuffer, ....................
> > MPI_FLOAT, &status);
> > max_mb_on_any_rank_using_Kernel_GetMemorySize () => 373 MB
> > ……
> >
> > Why do we see that spike in memory usage? (See the More Details section
> > below for size information.)
> >
> > I have seen “Kernel_GetMemorySize(KERNEL_MEMSIZE_HEAP….)” not return an
> > accurate memory footprint, but I am not sure whether that is the case here.
> > The attached Darshan screenshot shows the access sizes while running on 4
> > racks.
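(For context, the heap measurement around the collective write is taken
roughly as in the sketch below. Treat it only as an illustration: the SPI
header path and the Kernel_GetMemorySize signature are quoted from memory,
and fh, mappingBuffer and count stand in for the application's real
arguments.)

#include <stdio.h>
#include <stdint.h>
#include <mpi.h>
#include <spi/include/kernel/memory.h>   /* BG/Q SPI header; assumed path */

/* Current heap usage of this rank (BG/Q only; signature assumed). */
static uint64_t heap_bytes(void)
{
    uint64_t heap = 0;
    Kernel_GetMemorySize(KERNEL_MEMSIZE_HEAP, &heap);
    return heap;
}

/* Bracket the collective write with heap measurements, as done in the
 * application; fh, mappingBuffer and count are placeholders. */
static void write_all_with_heap_report(MPI_File fh, const float *mappingBuffer,
                                       int count)
{
    MPI_Status status;

    printf("heap before write_all: %llu MB\n",
           (unsigned long long)(heap_bytes() >> 20));

    MPI_File_write_all(fh, mappingBuffer, count, MPI_FLOAT, &status);

    printf("heap after  write_all: %llu MB\n",
           (unsigned long long)(heap_bytes() >> 20));
}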
> >
> > 2. Is romio_cb_alltoall ignored on BG/Q? Even when I disable it, I still
> > see “automatic” in the output.
> >
> > (I am looking at
> > srcV1R2M4/comm/lib/dev/mpich2/src/mpi/romio/adio/ad_bg/ad_bg_hints.c
> > and see that the corresponding code section is commented out.)
> >
> > More Details:
> >
> > We are debugging an application on Mira which runs on 1, 2, and 4 racks but
> > fails at 8 racks while dumping a custom checkpoint. These are strong-scaling
> > runs, so the size of the checkpoint stays the same (~172 GB), with 32 ranks
> > per node. Max memory usage before the start of the checkpoint (i.e. before
> > the single write_all call) on 8 racks is ~300 MB. The checkpoint size from
> > each rank ranges from a few KB to a few MB (as shown by Darshan). Once the
> > application writes the checkpoint, we see the error below:
> >
> >   Out of memory in file
> > /bgsys/source/srcV1R2M2.15270/comm/lib/dev/mpich2/src/mpi/romio/adio/ad_bg/ad_bg_wrcoll.c, line 500
> >
> > Hence I am confused about the behaviour mentioned in question 1.
> > If anyone has any insight, it would be a great help!
> >
> > Regards,
> > Pramod
> >
> > p.s.
> >
> > Default values of all hints:
> >
> > cb_buffer_size, value = 16777216
> > romio_cb_read, value = enable
> > romio_cb_write, value = enable
> > cb_nodes, value = 8320             (changes based on partition size)
> > romio_no_indep_rw, value = false
> > romio_cb_pfr, value = disable
> > romio_cb_fr_types, value = aar
> > romio_cb_fr_alignment, value = 1
> > romio_cb_ds_threshold, value = 0
> > romio_cb_alltoall, value = automatic
> > ind_rd_buffer_size, value = 4194304
> > romio_ds_read, value = automatic
> > romio_ds_write, value = disable
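
For completeness, a common way to experiment with these values is to pass an
explicit MPI_Info object at open time instead of MPI_INFO_NULL. A minimal
sketch follows; the hint values shown are only examples to try, not
recommendations from this thread, and filename/comm are placeholders.

#include <mpi.h>

/* Open the checkpoint file with explicit ROMIO hints instead of
 * MPI_INFO_NULL.  The particular values are illustrative only. */
static MPI_File open_with_hints(MPI_Comm comm, const char *filename)
{
    MPI_Info info;
    MPI_File fh;

    MPI_Info_create(&info);
    MPI_Info_set(info, "cb_buffer_size", "4194304");   /* e.g. 4 MB instead of the 16 MB default */
    MPI_Info_set(info, "romio_cb_write", "enable");    /* keep collective buffering on */

    MPI_File_open(comm, filename, MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
    MPI_Info_free(&info);
    return fh;
}

Whether ROMIO actually accepted a hint can then be checked by reading it back
with MPI_File_get_info on the returned handle, as in the earlier sketch.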