Message ID | 20211005062305.549871-1-naohiro.aota@wdc.com (mailing list archive) |
---|---|
Headers | show |
Series | btrfs-progs: use direct-IO for zoned device | expand |
On Tue, Oct 05, 2021 at 03:22:58PM +0900, Naohiro Aota wrote: > As discussed in the Zoned Storage page [1], the kernel page cache does not > guarantee that cached dirty pages will be flushed to a block device in > sequential sector order. Thus, we must use O_DIRECT for writing to a zoned > device to ensure the write ordering. > > [1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions > > As a writng buffer is embedded in some other struct (e.g., "char data[]" in > struct extent_buffer), it is difficult to allocate the struct so that the > writng buffer is aligned. > > This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite, > which allocates an aligned bounce buffer, copy the buffer contents, and > proceeds the IO. And, it now opens a zoned device with O_DIRECT. > > Since the allocation and copying are costly, it is better to do them only > when necessary. But, it is cumbersome to call fcntl(F_GETFL) to determine > the file is opened with O_DIRECT or not every time doing an IO. > > As zoned device forces to use zoned btrfs, I decided to use the zoned flag > to determine if it is direct-IO or not. This can cause a false-positive (to > use the bounce buffer when a file is *not* opened with O_DIRECT) in case of > emulated zoned mode on a non-zoned device or a regular file. Considering > the emulated zoned mode is mostly for debugging or testing, I believe this > is acceptable. > > * Changes > v2 > - Rebased on the latest "devel" branch > - Add patch to fix segfault in several cases > - drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES > > Patches 1 to 3 are preparation to fix some issues in the current code. > > Patches 4 and 5 wraps pread/pwrite with newly introduced function > btrfs_pread/btrfs_pwrite. > > Patch 6 deals with the zoned flag while reading the initial trees. > > Patch 7 finally opens a zoned device with O_DIRECT. > > Naohiro Aota (7): > btrfs-progs: mkfs: do not set zone size on non-zoned mode > btrfs-progs: set eb->fs_info properly > btrfs-progs: drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES > btrfs-progs: introduce btrfs_pwrite wrapper for pwrite > btrfs-progs: introduce btrfs_pread wrapper for pread > btrfs-progs: temporally set zoned flag for initial tree reading > btrfs-progs: use direct-io for zoned device Added to devel with some minor fixups, thanks.
On Tue, Oct 05, 2021 at 03:22:58PM +0900, Naohiro Aota wrote: > As discussed in the Zoned Storage page [1], the kernel page cache does not > guarantee that cached dirty pages will be flushed to a block device in > sequential sector order. Thus, we must use O_DIRECT for writing to a zoned > device to ensure the write ordering. > > [1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions > > As a writng buffer is embedded in some other struct (e.g., "char data[]" in > struct extent_buffer), it is difficult to allocate the struct so that the > writng buffer is aligned. > > This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite, > which allocates an aligned bounce buffer, copy the buffer contents, and > proceeds the IO. And, it now opens a zoned device with O_DIRECT. > > Since the allocation and copying are costly, it is better to do them only > when necessary. But, it is cumbersome to call fcntl(F_GETFL) to determine > the file is opened with O_DIRECT or not every time doing an IO. > > As zoned device forces to use zoned btrfs, I decided to use the zoned flag > to determine if it is direct-IO or not. This can cause a false-positive (to > use the bounce buffer when a file is *not* opened with O_DIRECT) in case of > emulated zoned mode on a non-zoned device or a regular file. Considering > the emulated zoned mode is mostly for debugging or testing, I believe this > is acceptable. > > * Changes > v2 > - Rebased on the latest "devel" branch > - Add patch to fix segfault in several cases > - drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES > > Patches 1 to 3 are preparation to fix some issues in the current code. > > Patches 4 and 5 wraps pread/pwrite with newly introduced function > btrfs_pread/btrfs_pwrite. > > Patch 6 deals with the zoned flag while reading the initial trees. > > Patch 7 finally opens a zoned device with O_DIRECT. > > Naohiro Aota (7): > btrfs-progs: mkfs: do not set zone size on non-zoned mode > btrfs-progs: set eb->fs_info properly > btrfs-progs: drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES > btrfs-progs: introduce btrfs_pwrite wrapper for pwrite > btrfs-progs: introduce btrfs_pread wrapper for pread > btrfs-progs: temporally set zoned flag for initial tree reading > btrfs-progs: use direct-io for zoned device Is this still supposed to work? $ ./mkfs.btrfs -f -O zoned -d single -m single img ... ERROR: 16384 is not aligned to 1048576 ERROR: error during mkfs: Input/output error On commit below this patchset it works and creates a filesystem with zoned mode and zone size 256M.
On Wed, Oct 06, 2021 at 11:02:47PM +0200, David Sterba wrote: > On Tue, Oct 05, 2021 at 03:22:58PM +0900, Naohiro Aota wrote: > > As discussed in the Zoned Storage page [1], the kernel page cache does not > > guarantee that cached dirty pages will be flushed to a block device in > > sequential sector order. Thus, we must use O_DIRECT for writing to a zoned > > device to ensure the write ordering. > > > > [1] https://zonedstorage.io/linux/overview/#zbd-support-restrictions > > > > As a writng buffer is embedded in some other struct (e.g., "char data[]" in > > struct extent_buffer), it is difficult to allocate the struct so that the > > writng buffer is aligned. > > > > This series introduces btrfs_{pread,pwrite} to wrap around pread/pwrite, > > which allocates an aligned bounce buffer, copy the buffer contents, and > > proceeds the IO. And, it now opens a zoned device with O_DIRECT. > > > > Since the allocation and copying are costly, it is better to do them only > > when necessary. But, it is cumbersome to call fcntl(F_GETFL) to determine > > the file is opened with O_DIRECT or not every time doing an IO. > > > > As zoned device forces to use zoned btrfs, I decided to use the zoned flag > > to determine if it is direct-IO or not. This can cause a false-positive (to > > use the bounce buffer when a file is *not* opened with O_DIRECT) in case of > > emulated zoned mode on a non-zoned device or a regular file. Considering > > the emulated zoned mode is mostly for debugging or testing, I believe this > > is acceptable. > > > > * Changes > > v2 > > - Rebased on the latest "devel" branch > > - Add patch to fix segfault in several cases > > - drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES > > > > Patches 1 to 3 are preparation to fix some issues in the current code. > > > > Patches 4 and 5 wraps pread/pwrite with newly introduced function > > btrfs_pread/btrfs_pwrite. > > > > Patch 6 deals with the zoned flag while reading the initial trees. > > > > Patch 7 finally opens a zoned device with O_DIRECT. > > > > Naohiro Aota (7): > > btrfs-progs: mkfs: do not set zone size on non-zoned mode > > btrfs-progs: set eb->fs_info properly > > btrfs-progs: drop ZONED flag from BTRFS_CONVERT_ALLOWED_FEATURES > > btrfs-progs: introduce btrfs_pwrite wrapper for pwrite > > btrfs-progs: introduce btrfs_pread wrapper for pread > > btrfs-progs: temporally set zoned flag for initial tree reading > > btrfs-progs: use direct-io for zoned device > > Is this still supposed to work? > > $ ./mkfs.btrfs -f -O zoned -d single -m single img > ... > ERROR: 16384 is not aligned to 1048576 > ERROR: error during mkfs: Input/output error > > On commit below this patchset it works and creates a filesystem with > zoned mode and zone size 256M. I'm sorry to respond so late. My email fetcher was broken. The mkfs on an image file should work well. I tested the current "devel" branch. While it needs a patch to replace pwrite with btrfs_pwrite in create_free_space_tree(), it worked well besides that. Is this failing on your side? I'll send the patch soon.
On Wed, Oct 20, 2021 at 06:53:32AM +0000, Naohiro Aota wrote: > On Wed, Oct 06, 2021 at 11:02:47PM +0200, David Sterba wrote: > > On Tue, Oct 05, 2021 at 03:22:58PM +0900, Naohiro Aota wrote: > > Is this still supposed to work? > > > > $ ./mkfs.btrfs -f -O zoned -d single -m single img > > ... > > ERROR: 16384 is not aligned to 1048576 > > ERROR: error during mkfs: Input/output error > > > > On commit below this patchset it works and creates a filesystem with > > zoned mode and zone size 256M. > > The mkfs on an image file should work well. I tested the current > "devel" branch. While it needs a patch to replace pwrite with > btrfs_pwrite in create_free_space_tree(), it worked well besides > that. Is this failing on your side? Yes it still fails with the patched pwrite, same error, with or without enabling free-space-tree.