mbox series

[v2,0/7] btrfs-progs: use direct-IO for zoned device

Message ID 20211005062305.549871-1-naohiro.aota@wdc.com (mailing list archive)
Headers show
Series btrfs-progs: use direct-IO for zoned device | expand

Message

Naohiro Aota Oct. 5, 2021, 6:22 a.m. UTC
As discussed in the Zoned Storage page [1],  the kernel page cache does not
guarantee that cached dirty pages will be flushed to a block device in
sequential sector order. Thus, we must use O_DIRECT for writing to a zoned
device to ensure the write ordering.

[1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions

As a writng buffer is embedded in some other struct (e.g., "char data[]" in
struct extent_buffer), it is difficult to allocate the struct so that the
writng buffer is aligned.

This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite,
which allocates an aligned bounce buffer, copy the buffer contents, and
proceeds the IO. And, it now opens a zoned device with O_DIRECT.

Since the allocation and copying are costly, it is better to do them only
when necessary. But, it is cumbersome to call fcntl(F_GETFL) to determine
the file is opened with O_DIRECT or not every time doing an IO.

As zoned device forces to use zoned btrfs, I decided to use the zoned flag
to determine if it is direct-IO or not. This can cause a false-positive (to
use the bounce buffer when a file is *not* opened with O_DIRECT) in case of
emulated zoned mode on a non-zoned device or a regular file. Considering
the emulated zoned mode is mostly for debugging or testing, I believe this
is acceptable.

* Changes
v2
  - Rebased on the latest "devel" branch
  - Add patch to fix segfault in several cases
  - drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES

Patches 1 to 3 are preparation to fix some issues in the current code.

Patches 4 and 5 wraps pread/pwrite with newly introduced function
btrfs_pread/btrfs_pwrite.

Patch 6 deals with the zoned flag while reading the initial trees.

Patch 7 finally opens a zoned device with O_DIRECT.

Naohiro Aota (7):
  btrfs-progs: mkfs: do not set zone size on non-zoned mode
  btrfs-progs: set eb->fs_info properly
  btrfs-progs: drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES
  btrfs-progs: introduce btrfs_pwrite wrapper for pwrite
  btrfs-progs: introduce btrfs_pread wrapper for pread
  btrfs-progs: temporally set zoned flag for initial tree reading
  btrfs-progs: use direct-io for zoned device

 common/device-utils.c     | 76 ++++++++++++++++++++++++++++++++++++---
 common/device-utils.h     | 29 ++++++++++++++-
 common/fsfeatures.h       |  3 +-
 convert/main.c            |  1 +
 kernel-shared/disk-io.c   | 19 +++++++++-
 kernel-shared/extent_io.c | 14 +++++---
 kernel-shared/volumes.c   |  6 ++++
 kernel-shared/zoned.c     |  6 ++--
 mkfs/common.c             | 14 +++++---
 mkfs/main.c               | 12 +++++--
 mkfs/rootdir.c            |  1 +
 11 files changed, 158 insertions(+), 23 deletions(-)

Comments

David Sterba Oct. 6, 2021, 2:28 p.m. UTC | #1
On Tue, Oct 05, 2021 at 03:22:58PM +0900, Naohiro Aota wrote:
> As discussed in the Zoned Storage page [1],  the kernel page cache does not
> guarantee that cached dirty pages will be flushed to a block device in
> sequential sector order. Thus, we must use O_DIRECT for writing to a zoned
> device to ensure the write ordering.
> 
> [1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions
> 
> As a writng buffer is embedded in some other struct (e.g., "char data[]" in
> struct extent_buffer), it is difficult to allocate the struct so that the
> writng buffer is aligned.
> 
> This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite,
> which allocates an aligned bounce buffer, copy the buffer contents, and
> proceeds the IO. And, it now opens a zoned device with O_DIRECT.
> 
> Since the allocation and copying are costly, it is better to do them only
> when necessary. But, it is cumbersome to call fcntl(F_GETFL) to determine
> the file is opened with O_DIRECT or not every time doing an IO.
> 
> As zoned device forces to use zoned btrfs, I decided to use the zoned flag
> to determine if it is direct-IO or not. This can cause a false-positive (to
> use the bounce buffer when a file is *not* opened with O_DIRECT) in case of
> emulated zoned mode on a non-zoned device or a regular file. Considering
> the emulated zoned mode is mostly for debugging or testing, I believe this
> is acceptable.
> 
> * Changes
> v2
>   - Rebased on the latest "devel" branch
>   - Add patch to fix segfault in several cases
>   - drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES
> 
> Patches 1 to 3 are preparation to fix some issues in the current code.
> 
> Patches 4 and 5 wraps pread/pwrite with newly introduced function
> btrfs_pread/btrfs_pwrite.
> 
> Patch 6 deals with the zoned flag while reading the initial trees.
> 
> Patch 7 finally opens a zoned device with O_DIRECT.
> 
> Naohiro Aota (7):
>   btrfs-progs: mkfs: do not set zone size on non-zoned mode
>   btrfs-progs: set eb->fs_info properly
>   btrfs-progs: drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES
>   btrfs-progs: introduce btrfs_pwrite wrapper for pwrite
>   btrfs-progs: introduce btrfs_pread wrapper for pread
>   btrfs-progs: temporally set zoned flag for initial tree reading
>   btrfs-progs: use direct-io for zoned device

Added to devel with some minor fixups, thanks.
David Sterba Oct. 6, 2021, 9:02 p.m. UTC | #2
On Tue, Oct 05, 2021 at 03:22:58PM +0900, Naohiro Aota wrote:
> As discussed in the Zoned Storage page [1],  the kernel page cache does not
> guarantee that cached dirty pages will be flushed to a block device in
> sequential sector order. Thus, we must use O_DIRECT for writing to a zoned
> device to ensure the write ordering.
> 
> [1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions
> 
> As a writng buffer is embedded in some other struct (e.g., "char data[]" in
> struct extent_buffer), it is difficult to allocate the struct so that the
> writng buffer is aligned.
> 
> This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite,
> which allocates an aligned bounce buffer, copy the buffer contents, and
> proceeds the IO. And, it now opens a zoned device with O_DIRECT.
> 
> Since the allocation and copying are costly, it is better to do them only
> when necessary. But, it is cumbersome to call fcntl(F_GETFL) to determine
> the file is opened with O_DIRECT or not every time doing an IO.
> 
> As zoned device forces to use zoned btrfs, I decided to use the zoned flag
> to determine if it is direct-IO or not. This can cause a false-positive (to
> use the bounce buffer when a file is *not* opened with O_DIRECT) in case of
> emulated zoned mode on a non-zoned device or a regular file. Considering
> the emulated zoned mode is mostly for debugging or testing, I believe this
> is acceptable.
> 
> * Changes
> v2
>   - Rebased on the latest "devel" branch
>   - Add patch to fix segfault in several cases
>   - drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES
> 
> Patches 1 to 3 are preparation to fix some issues in the current code.
> 
> Patches 4 and 5 wraps pread/pwrite with newly introduced function
> btrfs_pread/btrfs_pwrite.
> 
> Patch 6 deals with the zoned flag while reading the initial trees.
> 
> Patch 7 finally opens a zoned device with O_DIRECT.
> 
> Naohiro Aota (7):
>   btrfs-progs: mkfs: do not set zone size on non-zoned mode
>   btrfs-progs: set eb->fs_info properly
>   btrfs-progs: drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES
>   btrfs-progs: introduce btrfs_pwrite wrapper for pwrite
>   btrfs-progs: introduce btrfs_pread wrapper for pread
>   btrfs-progs: temporally set zoned flag for initial tree reading
>   btrfs-progs: use direct-io for zoned device

Is this still supposed to work?

  $ ./mkfs.btrfs -f -O zoned -d single -m single img
  ...
  ERROR: 16384 is not aligned to 1048576
  ERROR: error during mkfs: Input/output error

On commit below this patchset it works and creates a filesystem with
zoned mode and zone size 256M.
Naohiro Aota Oct. 20, 2021, 6:53 a.m. UTC | #3
On Wed, Oct 06, 2021 at 11:02:47PM +0200, David Sterba wrote:
> On Tue, Oct 05, 2021 at 03:22:58PM +0900, Naohiro Aota wrote:
> > As discussed in the Zoned Storage page [1],  the kernel page cache does not
> > guarantee that cached dirty pages will be flushed to a block device in
> > sequential sector order. Thus, we must use O_DIRECT for writing to a zoned
> > device to ensure the write ordering.
> > 
> > [1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions
> > 
> > As a writng buffer is embedded in some other struct (e.g., "char data[]" in
> > struct extent_buffer), it is difficult to allocate the struct so that the
> > writng buffer is aligned.
> > 
> > This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite,
> > which allocates an aligned bounce buffer, copy the buffer contents, and
> > proceeds the IO. And, it now opens a zoned device with O_DIRECT.
> > 
> > Since the allocation and copying are costly, it is better to do them only
> > when necessary. But, it is cumbersome to call fcntl(F_GETFL) to determine
> > the file is opened with O_DIRECT or not every time doing an IO.
> > 
> > As zoned device forces to use zoned btrfs, I decided to use the zoned flag
> > to determine if it is direct-IO or not. This can cause a false-positive (to
> > use the bounce buffer when a file is *not* opened with O_DIRECT) in case of
> > emulated zoned mode on a non-zoned device or a regular file. Considering
> > the emulated zoned mode is mostly for debugging or testing, I believe this
> > is acceptable.
> > 
> > * Changes
> > v2
> >   - Rebased on the latest "devel" branch
> >   - Add patch to fix segfault in several cases
> >   - drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES
> > 
> > Patches 1 to 3 are preparation to fix some issues in the current code.
> > 
> > Patches 4 and 5 wraps pread/pwrite with newly introduced function
> > btrfs_pread/btrfs_pwrite.
> > 
> > Patch 6 deals with the zoned flag while reading the initial trees.
> > 
> > Patch 7 finally opens a zoned device with O_DIRECT.
> > 
> > Naohiro Aota (7):
> >   btrfs-progs: mkfs: do not set zone size on non-zoned mode
> >   btrfs-progs: set eb->fs_info properly
> >   btrfs-progs: drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES
> >   btrfs-progs: introduce btrfs_pwrite wrapper for pwrite
> >   btrfs-progs: introduce btrfs_pread wrapper for pread
> >   btrfs-progs: temporally set zoned flag for initial tree reading
> >   btrfs-progs: use direct-io for zoned device
> 
> Is this still supposed to work?
> 
>   $ ./mkfs.btrfs -f -O zoned -d single -m single img
>   ...
>   ERROR: 16384 is not aligned to 1048576
>   ERROR: error during mkfs: Input/output error
> 
> On commit below this patchset it works and creates a filesystem with
> zoned mode and zone size 256M.

I'm sorry to respond so late. My email fetcher was broken.

The mkfs on an image file should work well. I tested the current
"devel" branch. While it needs a patch to replace pwrite with
btrfs_pwrite in create_free_space_tree(), it worked well besides
that. Is this failing on your side?

I'll send the patch soon.
David Sterba Oct. 20, 2021, 4:57 p.m. UTC | #4
On Wed, Oct 20, 2021 at 06:53:32AM +0000, Naohiro Aota wrote:
> On Wed, Oct 06, 2021 at 11:02:47PM +0200, David Sterba wrote:
> > On Tue, Oct 05, 2021 at 03:22:58PM +0900, Naohiro Aota wrote:
> > Is this still supposed to work?
> > 
> >   $ ./mkfs.btrfs -f -O zoned -d single -m single img
> >   ...
> >   ERROR: 16384 is not aligned to 1048576
> >   ERROR: error during mkfs: Input/output error
> > 
> > On commit below this patchset it works and creates a filesystem with
> > zoned mode and zone size 256M.
> 
> The mkfs on an image file should work well. I tested the current
> "devel" branch. While it needs a patch to replace pwrite with
> btrfs_pwrite in create_free_space_tree(), it worked well besides
> that. Is this failing on your side?

Yes it still fails with the patched pwrite, same error, with or without
enabling free-space-tree.