Hi Rob,

> Are you the same Pramod Kumbhar that works at EPFL?

Yes. After seeing the failure at 8 racks, I started debugging/profiling on our local
4-rack BG/Q system. I planned to send an email to the support team with more detailed
information (the job for that is currently in the queue).

> 1: collective I/O does consume some memory. not only is there an
> internal "collective buffer" maintained by MPI-IO itself, but the data
> exchange copies data as well before calling ALLTOALL.

Just wondering if there is any way to print or query some internal statistics about this.
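
So far the only thing I can do from the application side is to dump whatever hints end up
on the file handle after the open. A minimal sketch, assuming an already-opened handle fh
(the helper name is just illustrative):

    #include <mpi.h>
    #include <stdio.h>

    /* Print every hint currently attached to an open MPI-IO file handle.
     * This only shows the ROMIO hints in effect (cb_buffer_size, cb_nodes, ...),
     * not the transient memory used inside the two-phase exchange itself. */
    static void dump_mpiio_hints(MPI_File fh)
    {
        MPI_Info info;
        int nkeys, i, flag;
        char key[MPI_MAX_INFO_KEY + 1], value[MPI_MAX_INFO_VAL + 1];

        MPI_File_get_info(fh, &info);
        MPI_Info_get_nkeys(info, &nkeys);
        for (i = 0; i < nkeys; i++) {
            MPI_Info_get_nthkey(info, i, key);
            MPI_Info_get(info, key, MPI_MAX_INFO_VAL, value, &flag);
            if (flag)
                printf("%s, value = %s\n", key, value);
        }
        MPI_Info_free(&info);
    }

But that says nothing about the peak memory actually consumed during the exchange, which
is what I am really after.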

> Paul Coffman has done a one-sided based two-phase implementation that
> should be lower memory overhead. But here we should take the
> discussion off-list.

Perfect! Thanks!

Regards,
Pramod
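
p.s. For the record, the way I am adjusting/disabling hints in these experiments is via an
MPI_Info object at open time, roughly as sketched below (the helper name and the hint
values are only illustrative, not what the application actually uses):

    #include <mpi.h>

    /* Sketch: open a checkpoint file with explicit ROMIO hints so the
     * collective-buffering memory use can be varied per run. */
    static MPI_File open_checkpoint_with_hints(const char *path)
    {
        MPI_Info info;
        MPI_File fh;

        MPI_Info_create(&info);
        /* e.g. try a 4 MB collective buffer instead of the 16 MB default */
        MPI_Info_set(info, "cb_buffer_size", "4194304");
        /* question 2 below: is this actually honoured on BG/Q? */
        MPI_Info_set(info, "romio_cb_alltoall", "disable");

        MPI_File_open(MPI_COMM_WORLD, path,
                      MPI_MODE_CREATE | MPI_MODE_WRONLY, info, &fh);
        MPI_Info_free(&info);
        return fh;
    }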

> ==rob
>
> >
> > Quick summary:
> >
> > 1. On BG/Q I see cb_buffer_size as 16 MB when we query the file handle
> > using MPI_File_get_info.
> > The application we are looking at has a code section like:
> >
> > ….
> > MPI_File_set_view(fh, position_to_write, MPI_FLOAT, mappingType,
> >                   "native", MPI_INFO_NULL);
> > max_mb_on_any_rank_using_Kernel_GetMemorySize()  => 275 MB
> > MPI_File_write_all(fh, mappingBuffer, .................... MPI_FLOAT, &status);
> > max_mb_on_any_rank_using_Kernel_GetMemorySize()  => 373 MB
> > ……
> >
> > Why do we see that spike in memory usage? (See the "More details" section
> > below for size information.)
> >
> > I have seen “Kernel_GetMemorySize(KERNEL_MEMSIZE_HEAP….)” not
> > returning an accurate memory footprint, but I am not sure if that is the
> > case here.
> > The attached Darshan screenshot shows the access sizes while running on
> > 4 racks.
> >
> > 2. Is romio_cb_alltoall ignored on BG/Q? Even if I disable it, I still see
> > “automatic” in the output.
> >
> > (I am looking at
> > srcV1R2M4/comm/lib/dev/mpich2/src/mpi/romio/adio/ad_bg/ad_bg_hints.c
> > and see that the corresponding code section is commented out.)
> >
> > More details:
> >
> > We are debugging an application on Mira which runs on 1, 2, and 4 racks but
> > fails at 8 racks while dumping a custom checkpoint. These are strong-scaling
> > runs, so the size of the checkpoint stays the same (~172 GB), with 32 ranks
> > per node. The maximum memory usage before the start of the checkpoint
> > (i.e. before the single write_all call) at 8 racks is ~300 MB. The checkpoint
> > size from each rank ranges from a few KBs to a few MBs (as shown by
> > Darshan). Once the application starts the checkpoint, we see the error below:
> >
> > Out of memory in file
> > /bgsys/source/srcV1R2M2.15270/comm/lib/dev/mpich2/src/mpi/romio/adio/
> > ad_bg/ad_bg_wrcoll.c, line 500
> >
> > Hence I am confused about the behaviour mentioned in question 1.
> > If anyone has any insight, it would be a great help!
> >
> > Regards,
> > Pramod
> >
> > p.s.
> >
> > Default values of all hints:
> >
> > cb_buffer_size, value = 16777216
> > romio_cb_read, value = enable
> > romio_cb_write, value = enable
> > cb_nodes, value = 8320 (change based on partition size)
> > romio_no_indep_rw, value = false
> > romio_cb_pfr, value = disable
> > romio_cb_fr_types, value = aar
> > romio_cb_fr_alignment, value = 1
> > romio_cb_ds_threshold, value = 0
> > romio_cb_alltoall, value = automatic
> > ind_rd_buffer_size, value = 4194304
> > romio_ds_read, value = automatic
> > romio_ds_write, value = disable