[mpich-discuss] very slow file writes independent of file size

Wei-keng Liao wkliao at eecs.northwestern.edu
Mon Mar 3 12:43:31 CST 2014


Hi, Geoffrey

Edison has a Lustre file system, and I assume your program writes its files
there. If so, please check the striping settings of your output files
with the command "lfs getstripe filename".

Usually, high I/O performance requires configuring the Lustre striping
with a higher stripe_count (96 or 144 max on Edison).
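As a sketch of what this looks like on the command line (the file and
directory names here are placeholders, not taken from your job; check
"man lfs" on Edison for the exact options):

```shell
# Inspect the current striping of an existing output file.
lfs getstripe output.dat

# Striping is inherited from the directory at file-creation time, so
# set it on the output directory before the job creates its files.
# -c sets stripe_count (number of OSTs), -S sets the stripe size.
lfs setstripe -c 96 -S 4m output_dir/
```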

In addition, MPI_File_write_ordered uses a shared file pointer, which
can hurt performance. Independent file pointers often perform better,
so you might want to change your program to use those functions
(e.g. MPI_File_write_all).


Wei-keng

On Mar 3, 2014, at 12:21 AM, Geoffrey Irving wrote:

> I'm doing postmortem on a 2048 node (16384 rank) job on Edison, and
> trying to understand why my I/O performance might have been slow.
> 
> Here's the data:
> 
> Measured I/O bandwidth:
> slice 35 write sparse bandwidth = 6082640 / (3.52519e+06 s / 16384) =
> 2.63287e-05 GB/s
> slice 34 write sparse bandwidth = 13824080 / (3.66608e+06 s / 16384) =
> 5.75379e-05 GB/s
> slice 33 write sparse bandwidth = 24754256 / (2.83647e+06 s / 16384) =
> 0.000133166 GB/s
> slice 32 write sparse bandwidth = 39370832 / (3.47016e+06 s / 16384) =
> 0.000173119 GB/s
> slice 31 write sparse bandwidth = 55812176 / (2.53623e+06 s / 16384) =
> 0.000335785 GB/s
> slice 30 write sparse bandwidth = 74741840 / (2.5714e+06 s / 16384) =
> 0.00044352 GB/s
> slice 29 write sparse bandwidth = 93560912 / (2.67336e+06 s / 16384) =
> 0.000534019 GB/s
> slice 28 write sparse bandwidth = 112803920 / (2.74639e+06 s / 16384)
> = 0.000626733 GB/s
> slice 27 write sparse bandwidth = 128194640 / (3.1603e+06 s / 16384) =
> 0.000618958 GB/s
> slice 26 write sparse bandwidth = 141281360 / (3.12754e+06 s / 16384)
> = 0.00068929 GB/s
> slice 25 write sparse bandwidth = 148193360 / (2.62376e+06 s / 16384)
> = 0.000861835 GB/s
> slice 24 write sparse bandwidth = 151861328 / (3.2145e+06 s / 16384) =
> 0.000720865 GB/s
> slice 23 write sparse bandwidth = 148193360 / (2.44736e+06 s / 16384)
> = 0.000923956 GB/s
> slice 22 write sparse bandwidth = 142055504 / (3.15962e+06 s / 16384)
> = 0.000686031 GB/s
> slice 21 write sparse bandwidth = 130388048 / (3.09774e+06 s / 16384)
> = 0.000642263 GB/s
> slice 20 write sparse bandwidth = 117964880 / (3.02676e+06 s / 16384)
> = 0.000594696 GB/s
> slice 19 write sparse bandwidth = 101560400 / (2.97198e+06 s / 16384)
> = 0.000521434 GB/s
> slice 18 write sparse bandwidth = 86372432 / (2.96247e+06 s / 16384) =
> 0.000444878 GB/s
> slice 18 write sections bandwidth = 1954957518434 / (1.83937e+07 s /
> 16384) = 1.62177 GB/s
> slice 17 write sparse bandwidth = 70170704 / (2.88973e+06 s / 16384) =
> 0.000370526 GB/s
> slice 17 write sections bandwidth = 1475380615039 / (1.36018e+07 s /
> 16384) = 1.65511 GB/s
> slice 17 read bandwidth (192 nodes) = 1475380615039 / 3383.36 s = 0.406122 GB/s
>  per node: measured = 2.16598 MB/s, theoretical peak = 33.1341 MB/s
> 
> Focusing on the "sparse" lines, the main point is that the time seems
> to be roughly independent of file size (plot attached).  Each timing
> sample consists of (1) setup which I believe is negligible, (2)
> MPI_File_open, (3) MPI_File_write_ordered, (4) MPI_File_close.
> 
> What might have caused these file writes to take so long?
> 
> Geoffrey
> <2014-03-02-221950_1220x500.png>
> _______________________________________________
> discuss mailing list     discuss at mpich.org
> To manage subscription options or unsubscribe:
> https://lists.mpich.org/mailman/listinfo/discuss