Message ID | cover.1744822090.git.josef@toxicpanda.com (mailing list archive) |
---|---|
Series | btrfs: simplify extent buffer writeback |
On Wed, Apr 16, 2025 at 12:51:05PM -0400, Josef Bacik wrote:
> Hello,
>
> We currently have two different paths for writing out extent buffers, a subpage
> path and a normal path. This has resulted in subtle bugs with subpage code that
> took us a while to figure out. Additionally, we have this complex interaction of
> get folio, find eb, see if we already started writing that eb out, write out the
> eb.
>
> We already have a radix tree for our extent buffers, so we can use that
> similarly to how pagecache uses the radix tree. Tag the buffers with DIRTY when
> they're dirty, and WRITEBACK when we start writing them out.
>
> The unfortunate part is we have to re-implement folio_batch for extent buffers,
> so that's where most of the new code comes from. The good part is we are now
> down to a single path for writing out extent buffers, it's way simpler, and in
> fact quite a bit faster now that we don't have all of these folio->eb
> transitions to deal with.
>
> I ran this through fsperf on a VM with 8 CPUs and 16 GiB of RAM. I used
> smallfiles100k, but reduced the files to 1k to make it run faster. The
> results are as follows, with the statistically significant improvements
> marked with *; there were no regressions. fsperf was run with -n 10 for
> both runs, so the baseline is the average of 10 runs and the test is the
> average of 10 runs.
>
> smallfiles100k results
> metric                   baseline      current        stdev     diff
> ======================================================================
> avg_commit_ms               68.58        58.44         3.35  -14.79% *
> commits                    270.60       254.70        16.24   -5.88%
> dev_read_iops                  48           48            0    0.00%
> dev_read_kbytes              1044         1044            0    0.00%
> dev_write_iops          866117.90    850028.10     14292.20   -1.86%
> dev_write_kbytes      10939976.40  10605701.20    351330.32   -3.06%
> elapsed                     49.30           33         1.64  -33.06% *
> end_state_mount_ns    41251498.80  35773220.70   2531205.32  -13.28% *
> end_state_umount_ns      1.90e+09     1.50e+09  14186226.85  -21.38% *
> max_commit_ms                 139       111.60         9.72  -19.71% *
> sys_cpu                      4.90         3.86         0.88  -21.29%
> write_bw_bytes        42935768.20  64318451.10   1609415.05   49.80% *
> write_clat_ns_mean      366431.69    243202.60     14161.98  -33.63% *
> write_clat_ns_p50        49203.20        20992       264.40  -57.34% *
> write_clat_ns_p99          827392    653721.60     65904.74  -20.99% *
> write_io_kbytes           2035940      2035940            0    0.00%
> write_iops               10482.37     15702.75       392.92   49.80% *
> write_lat_ns_max         1.01e+08     90516129   3910102.06  -10.29% *
> write_lat_ns_mean       366556.19    243308.48     14154.51  -33.62% *
>
> As you can see we get about a 33% decrease in runtime, with a 50%
> throughput increase, which is pretty significant. Thanks,

Ignore this for now; the xarray<->radix thing isn't quite one to one, so
I've got to convert the buffer radix to a proper xarray first. Thanks,

Josef
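
For anyone following the thread, the tagging scheme described in the cover letter can be sketched roughly as below. This is only a toy userspace model, not the patch series: it uses flat arrays rather than a radix tree or xarray, and every name in it (toy_eb, eb_index, eb_batch, eb_batch_lookup_tag, and so on) is hypothetical. It is meant to show the shape of the idea: buffers carry DIRTY and WRITEBACK tags in the index, a folio_batch-like helper gathers tagged buffers a few at a time, and a single loop moves them from DIRTY to WRITEBACK.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical names throughout; this is not the btrfs or xarray API. */
#define NR_SLOTS   64	/* toy index space, one slot per extent buffer */
#define BATCH_SIZE 4	/* max buffers gathered per lookup */

enum eb_tag { EB_TAG_DIRTY, EB_TAG_WRITEBACK, EB_NR_TAGS };

struct toy_eb {
	size_t index;	/* slot this buffer occupies in the index */
};

struct eb_index {
	struct toy_eb *slots[NR_SLOTS];
	bool tags[EB_NR_TAGS][NR_SLOTS];	/* per-slot tag bits */
};

/* Stand-in for a folio_batch-like container of extent buffers. */
struct eb_batch {
	size_t nr;
	struct toy_eb *ebs[BATCH_SIZE];
};

static void eb_index_insert(struct eb_index *idx, size_t i, struct toy_eb *eb)
{
	assert(i < NR_SLOTS && !idx->slots[i]);
	eb->index = i;
	idx->slots[i] = eb;
}

static void eb_index_set_tag(struct eb_index *idx, size_t i, enum eb_tag tag)
{
	assert(i < NR_SLOTS && idx->slots[i]);
	idx->tags[tag][i] = true;
}

static void eb_index_clear_tag(struct eb_index *idx, size_t i, enum eb_tag tag)
{
	assert(i < NR_SLOTS);
	idx->tags[tag][i] = false;
}

/*
 * Gather up to BATCH_SIZE buffers carrying @tag, scanning from *next.
 * *next is advanced past the last slot examined so the caller can resume.
 * Returns the number of buffers placed in @batch (0 once the scan is done).
 */
static size_t eb_batch_lookup_tag(struct eb_index *idx, size_t *next,
				  enum eb_tag tag, struct eb_batch *batch)
{
	batch->nr = 0;
	while (*next < NR_SLOTS && batch->nr < BATCH_SIZE) {
		size_t i = (*next)++;

		if (idx->slots[i] && idx->tags[tag][i])
			batch->ebs[batch->nr++] = idx->slots[i];
	}
	return batch->nr;
}

/* Single writeback path: move every DIRTY buffer to WRITEBACK. */
static size_t eb_index_start_writeback(struct eb_index *idx)
{
	struct eb_batch batch;
	size_t next = 0, started = 0;

	while (eb_batch_lookup_tag(idx, &next, EB_TAG_DIRTY, &batch) > 0) {
		for (size_t k = 0; k < batch.nr; k++) {
			size_t i = batch.ebs[k]->index;

			eb_index_clear_tag(idx, i, EB_TAG_DIRTY);
			eb_index_set_tag(idx, i, EB_TAG_WRITEBACK);
			started++;	/* real code would submit I/O here */
		}
	}
	return started;
}
```

A caller would insert buffers with eb_index_insert(), mark some with eb_index_set_tag(..., EB_TAG_DIRTY), and then call eb_index_start_writeback() once. Because the tags live in the index, no per-folio lookup is needed to find which buffers to write.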