Message ID | 2b4271752514c9f376b1fc6a988336ed9238aa0d.1608608848.git.naohiro.aota@wdc.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: zoned block device support |
On 12/21/20 10:49 PM, Naohiro Aota wrote:
> If more than one IO is issued for one file extent, these IO can be written
> to separate regions on a device. Since we cannot map one file extent to
> such a separate area, we need to follow the "one IO == one ordered extent"
> rule.
>
> The Normal buffered, uncompressed, not pre-allocated write path (used by
> cow_file_range()) sometimes does not follow this rule. It can write a part
> of an ordered extent when specified a region to write e.g., when its
> called from fdatasync().
>
> Introduces a dedicated (uncompressed buffered) data write path for ZONED
> mode. This write path will CoW the region and write it at once.
>
> Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>

This means we'll write one page at a time, no coalescing of data pages. I'm not
the one with zoned devices in production, but it might be worth fixing this in
the future so you're not generating a billion bio's for large sequential data
areas.

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
On Tue, Jan 12, 2021 at 02:24:09PM -0500, Josef Bacik wrote:
>On 12/21/20 10:49 PM, Naohiro Aota wrote:
>>If more than one IO is issued for one file extent, these IO can be written
>>to separate regions on a device. Since we cannot map one file extent to
>>such a separate area, we need to follow the "one IO == one ordered extent"
>>rule.
>>
>>The Normal buffered, uncompressed, not pre-allocated write path (used by
>>cow_file_range()) sometimes does not follow this rule. It can write a part
>>of an ordered extent when specified a region to write e.g., when its
>>called from fdatasync().
>>
>>Introduces a dedicated (uncompressed buffered) data write path for ZONED
>>mode. This write path will CoW the region and write it at once.
>>
>>Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
>
>This means we'll write one page at a time, no coalescing of data
>pages. I'm not the one with zoned devices in production, but it might
>be worth fixing this in the future so you're not generating a billion
>bio's for large sequential data areas.

Actually, it is already writing multiple pages in one bio. We get a
delalloced range that spans multiple pages from btrfs_run_delalloc_range()
and write all the pages with one bio in extent_write_locked_range().

>
>Reviewed-by: Josef Bacik <josef@toxicpanda.com>
>
>Thanks,
>
>Josef
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c
index 5e96d9631038..5f4de6ebebbd 100644
--- a/fs/btrfs/inode.c
+++ b/fs/btrfs/inode.c
@@ -1400,6 +1400,29 @@ static int cow_file_range_async(struct btrfs_inode *inode,
 	return 0;
 }
 
+static noinline int run_delalloc_zoned(struct btrfs_inode *inode,
+				       struct page *locked_page, u64 start,
+				       u64 end, int *page_started,
+				       unsigned long *nr_written)
+{
+	int ret;
+
+	ret = cow_file_range(inode, locked_page, start, end,
+			     page_started, nr_written, 0);
+	if (ret)
+		return ret;
+
+	if (*page_started)
+		return 0;
+
+	__set_page_dirty_nobuffers(locked_page);
+	account_page_redirty(locked_page);
+	extent_write_locked_range(&inode->vfs_inode, start, end, WB_SYNC_ALL);
+	*page_started = 1;
+
+	return 0;
+}
+
 static noinline int csum_exist_in_range(struct btrfs_fs_info *fs_info,
 					u64 bytenr, u64 num_bytes)
 {
@@ -1879,17 +1902,24 @@ int btrfs_run_delalloc_range(struct btrfs_inode *inode, struct page *locked_page
 {
 	int ret;
 	int force_cow = need_force_cow(inode, start, end);
+	const bool do_compress = inode_can_compress(inode) &&
+		inode_need_compress(inode, start, end);
+	const bool zoned = btrfs_is_zoned(inode->root->fs_info);
 
 	if (inode->flags & BTRFS_INODE_NODATACOW && !force_cow) {
+		ASSERT(!zoned);
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 1, nr_written);
 	} else if (inode->flags & BTRFS_INODE_PREALLOC && !force_cow) {
+		ASSERT(!zoned);
 		ret = run_delalloc_nocow(inode, locked_page, start, end,
 					 page_started, 0, nr_written);
-	} else if (!inode_can_compress(inode) ||
-		   !inode_need_compress(inode, start, end)) {
+	} else if (!do_compress && !zoned) {
 		ret = cow_file_range(inode, locked_page, start, end,
 				     page_started, nr_written, 1);
+	} else if (!do_compress && zoned) {
+		ret = run_delalloc_zoned(inode, locked_page, start, end,
+					 page_started, nr_written);
 	} else {
 		set_bit(BTRFS_INODE_HAS_ASYNC_EXTENT, &inode->runtime_flags);
 		ret = cow_file_range_async(inode, wbc, locked_page, start, end,
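The path selection in the second hunk can be read as a small decision function. The following is only a userspace sketch of that branching, not kernel code: the enum `delalloc_path`, the function `choose_delalloc_path()`, and its boolean parameters are hypothetical names introduced here for illustration, and the kernel's `ASSERT()` is approximated with the standard `assert()`.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical labels for the four write paths the patched function can take. */
enum delalloc_path {
	PATH_NOCOW,          /* run_delalloc_nocow() */
	PATH_COW,            /* cow_file_range() */
	PATH_ZONED,          /* run_delalloc_zoned(), added by this patch */
	PATH_ASYNC_COMPRESS, /* cow_file_range_async() */
};

/*
 * Userspace model of the branch order in btrfs_run_delalloc_range() after
 * the patch. The booleans stand in for the inode flag tests, force_cow,
 * do_compress and zoned from the diff.
 */
static enum delalloc_path choose_delalloc_path(bool nodatacow, bool prealloc,
					       bool force_cow, bool do_compress,
					       bool zoned)
{
	if ((nodatacow || prealloc) && !force_cow) {
		/* The patch asserts that NOCOW paths are never taken in zoned mode. */
		assert(!zoned);
		return PATH_NOCOW;
	}
	if (!do_compress)
		return zoned ? PATH_ZONED : PATH_COW;
	/* Compressed writes keep using the async path, zoned or not. */
	return PATH_ASYNC_COMPRESS;
}
```

In this model, an uncompressed write on a zoned filesystem always resolves to `PATH_ZONED`, while the compressed case is unchanged by the patch.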
If more than one IO is issued for one file extent, these IOs can be written
to separate regions on a device. Since we cannot map one file extent to
such separate areas, we need to follow the "one IO == one ordered extent"
rule.

The normal buffered, uncompressed, not pre-allocated write path (which goes
through cow_file_range()) sometimes does not follow this rule. It can write a
part of an ordered extent when given a region to write, e.g., when it is
called from fdatasync().

Introduce a dedicated (uncompressed buffered) data write path for ZONED
mode. This write path will CoW the region and write it at once.

Signed-off-by: Naohiro Aota <naohiro.aota@wdc.com>
---
 fs/btrfs/inode.c | 34 ++++++++++++++++++++++++++++++++--
 1 file changed, 32 insertions(+), 2 deletions(-)
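To illustrate why splitting one extent across several IOs is a problem, here is a toy model. It assumes that the device, not the filesystem, decides where each write lands, by placing it at the zone's current write pointer; that assumption is not stated in this patch. `struct zone_model`, `zone_append()`, and `ios_contiguous()` are hypothetical names for this sketch only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical model of a sequential zone: every write lands at the write pointer. */
struct zone_model {
	uint64_t write_pointer;
};

/* Append len bytes and return the position the data actually landed at. */
static uint64_t zone_append(struct zone_model *z, uint64_t len)
{
	uint64_t pos;

	assert(len > 0);
	pos = z->write_pointer;
	z->write_pointer += len;
	return pos;
}

/* True if the second IO starts exactly where the first one ended. */
static bool ios_contiguous(uint64_t first_pos, uint64_t first_len,
			   uint64_t second_pos)
{
	return first_pos + first_len == second_pos;
}
```

Under this model, if an unrelated write slips in between two IOs that belong to the same file extent, the two halves are no longer physically contiguous and cannot be described by a single extent mapping. Submitting the whole range as one IO avoids that situation.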