btrfs-progs: docs: add an extra note to btrfs data checksum and directIO

Message ID b7dd4b16ffffa1114177f37bc349d437fc51cc63.1739484084.git.wqu@suse.com (mailing list archive)
State New
Series btrfs-progs: docs: add an extra note to btrfs data checksum and directIO

Commit Message

Qu Wenruo Feb. 13, 2025, 10:01 p.m. UTC
Starting with the v6.14 kernel release, btrfs forces direct IO to fall
back to buffered IO if the inode requires data checksums.

This costs a little performance, but it fixes the false data checksum
mismatches caused by direct IO.

Although the change is minor for most end users, it is a behavior change
for those who require zero-copy direct IO, and it needs a proper
documentation update.

Signed-off-by: Qu Wenruo <wqu@suse.com>
---
 Documentation/ch-checksumming.rst | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
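
For readers who need the zero-copy path described in the note, the
following is a minimal Python sketch of a block-aligned O_DIRECT write.
It is an illustration only and not part of this patch: BLOCK_SIZE (4096),
is_block_aligned(), aligned_buffer() and write_direct() are hypothetical
names, and the sketch assumes the target file already has data checksums
disabled (for example a file created with chattr +C, since nodatacow
implies nodatasum). Whether the kernel then really performs zero-copy IO
still depends on the filesystem and kernel version.

```python
import mmap
import os

# Assumption: the btrfs block (sector) size is 4096 bytes; verify on your filesystem.
BLOCK_SIZE = 4096


def is_block_aligned(offset: int, length: int, block_size: int = BLOCK_SIZE) -> bool:
    """Return True if offset and length are both multiples of block_size."""
    return length > 0 and offset % block_size == 0 and length % block_size == 0


def aligned_buffer(length: int, block_size: int = BLOCK_SIZE) -> mmap.mmap:
    """Allocate an anonymous, page-aligned buffer rounded up to block_size."""
    if length <= 0:
        raise ValueError("length must be positive")
    rounded = -(-length // block_size) * block_size  # ceiling division
    return mmap.mmap(-1, rounded)


def write_direct(path: str, payload: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Write payload, zero-padded to a whole block, at offset 0 using O_DIRECT.

    The buffer must not be modified until os.write() returns; this is the
    requirement the documentation note is about.
    """
    buf = aligned_buffer(len(payload), block_size)
    buf[:len(payload)] = payload
    # os.O_DIRECT is Linux-specific; open() may fail on filesystems without direct IO.
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
    try:
        return os.write(fd, buf)
    finally:
        os.close(fd)
        buf.close()
```

The anonymous mmap is used only because it gives a page-aligned buffer
without extra dependencies; on systems where the page size is smaller
than the btrfs block size, a different allocator would be needed.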

Comments

Johannes Thumshirn Feb. 14, 2025, 10:34 a.m. UTC | #1
On 13.02.25 23:02, Qu Wenruo wrote:

Not a native English speaker either, but I think it should be:

> +.. note::
> +   Since data checksum is calculated just before submitting to the block device,
            ^~the
> +   btrfs has a strong requirement that those data can not be modified until the
               this data/those data blocks ~^
> +   writeback is finished.
> +
> +   This requirement is met for buffered IO as btrfs has full control on the
> +   page cache, but direct IOs (``O_DIRECT``) bypass the page cache, and btrfs
                                           bypasses ~^
> +   can not control the direct IO buffer (can be user space memory), thus it's
           as it can be in user space memory ~^
> +   possible that user space programs modify the buffer before it's fully written
> +   back, and lead to data checksum mismatch.
        this leads ~^   ^~a
> +
> +   To avoid such checksum mismatch, since v6.14 btrfs will force direct IOs to
                 a ~^
> +   fall back to buffered IOs, if the inode requires data checksum.
                                                    a ~^
> +   This will bring a small performance penalty, if the end user requires true
> +   zero-copy direct IOs, they should set the ``NODATASUM`` flag for the inode
> +   and make sure the direct IO buffer is fully aligned to btrfs block size.
> +
> +

Byte,
	Johannes
Patch

diff --git a/Documentation/ch-checksumming.rst b/Documentation/ch-checksumming.rst
index 5e47a6bfb492..782191692746 100644
--- a/Documentation/ch-checksumming.rst
+++ b/Documentation/ch-checksumming.rst
@@ -3,6 +3,24 @@  writing and verified after reading the blocks from devices. The whole metadata
 block has an inline checksum stored in the b-tree node header. Each data block
 has a detached checksum stored in the checksum tree.
 
+.. note::
+   Since data checksum is calculated just before submitting to the block device,
+   btrfs has a strong requirement that those data can not be modified until the
+   writeback is finished.
+
+   This requirement is met for buffered IO as btrfs has full control on the
+   page cache, but direct IOs (``O_DIRECT``) bypass the page cache, and btrfs
+   can not control the direct IO buffer (can be user space memory), thus it's
+   possible that user space programs modify the buffer before it's fully written
+   back, and lead to data checksum mismatch.
+
+   To avoid such checksum mismatch, since v6.14 btrfs will force direct IOs to
+   fall back to buffered IOs, if the inode requires data checksum.
+   This will bring a small performance penalty, if the end user requires true
+   zero-copy direct IOs, they should set the ``NODATASUM`` flag for the inode
+   and make sure the direct IO buffer is fully aligned to btrfs block size.
+
+
 There are several checksum algorithms supported. The default and backward
 compatible algorithm is *crc32c*. Since kernel 5.5 there are three more with different
 characteristics and trade-offs regarding speed and strength. The following list