btrfs-progs: docs: add an extra note to btrfs data checksum and directIO

Message ID 3ec3396e9b4c02031a5a050763557c82a8852c75.1739780705.git.wqu@suse.com (mailing list archive)
State New
Series btrfs-progs: docs: add an extra note to btrfs data checksum and directIO

Commit Message

Qu Wenruo Feb. 26, 2025, 3:59 a.m. UTC
Starting with the v6.14 kernel release, btrfs forces a direct IO to fall
back to a buffered one if the inode requires a data checksum.

This costs a small amount of performance, but it solves the false data
checksum mismatch problem caused by direct IOs.

Although the change is minor for most end users, it is a behavior change
for those who require true zero-copy direct IO, so it needs a proper
documentation update.

Reviewed-by: Johannes Thumshirn <johannes.thumshirn@wdc.com>
Signed-off-by: Qu Wenruo <wqu@suse.com>
---
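Note for readers, not part of the commit: a minimal sketch of what a
"fully aligned" direct write can look like from user space. Everything
here is illustrative and makes assumptions: the 4096-byte BLOCK_SIZE is
a guess at the btrfs block size and should be checked against the actual
filesystem, and the helper names (is_dio_aligned, round_up, write_direct)
are hypothetical. Setting NODATASUM on the inode is not shown; as far as
I know, one common way is `chattr +C` on an empty file, since NOCOW
implies no data checksums.

```python
import mmap
import os

# Assumption: the btrfs block size is 4096 bytes. Check your filesystem.
BLOCK_SIZE = 4096


def is_dio_aligned(offset: int, length: int, block_size: int = BLOCK_SIZE) -> bool:
    """Return True if offset and length are both multiples of block_size."""
    return length > 0 and offset % block_size == 0 and length % block_size == 0


def round_up(length: int, block_size: int = BLOCK_SIZE) -> int:
    """Round length up to the next multiple of block_size."""
    return (length + block_size - 1) // block_size * block_size


def write_direct(path: str, payload: bytes, block_size: int = BLOCK_SIZE) -> int:
    """Write payload at offset 0 with O_DIRECT, zero-padded to whole blocks.

    Returns the number of bytes written (padding included).
    """
    if not payload:
        raise ValueError("payload must not be empty")
    length = round_up(len(payload), block_size)
    # An anonymous mmap is page-aligned, so the memory address is aligned
    # as long as block_size does not exceed the page size.
    buf = mmap.mmap(-1, length)
    try:
        buf[: len(payload)] = payload
        fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_DIRECT, 0o644)
        try:
            # No other thread may modify buf until this call has returned.
            return os.write(fd, buf)
        finally:
            os.close(fd)
    finally:
        buf.close()
```

A call such as write_direct("/mnt/btrfs/file", b"hello") (path is a
placeholder) may still fail with EINVAL on filesystems that do not
support O_DIRECT, so callers should be prepared to handle OSError.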
 Documentation/ch-checksumming.rst | 18 ++++++++++++++++++
 1 file changed, 18 insertions(+)
Patch

diff --git a/Documentation/ch-checksumming.rst b/Documentation/ch-checksumming.rst
index 5e47a6bfb492..b7fde46fe902 100644
--- a/Documentation/ch-checksumming.rst
+++ b/Documentation/ch-checksumming.rst
@@ -3,6 +3,24 @@  writing and verified after reading the blocks from devices. The whole metadata
 block has an inline checksum stored in the b-tree node header. Each data block
 has a detached checksum stored in the checksum tree.
 
+.. note::
+   Since a data checksum is calculated just before submitting to the block
+   device, btrfs has a strong requirement that the corresponding data block must
+   not be modified until the writeback is finished.
+
+   This requirement is met for a buffered write as btrfs has full control over
+   its page cache, but a direct write (``O_DIRECT``) bypasses the page cache, and
+   btrfs cannot control the direct IO buffer (as it can be in user space memory).
+   It is thus possible for a user space program to modify its direct write buffer
+   before the buffer is fully written back, which can lead to a data checksum mismatch.
+
+   To avoid such a checksum mismatch, since kernel v6.14 btrfs forces a direct
+   write to fall back to a buffered one if the inode requires a data checksum.
+   This brings a small performance penalty. End users who require true
+   zero-copy direct writes should set the ``NODATASUM`` flag for the inode
+   and make sure the direct IO buffer is fully aligned to the btrfs block size.
+
+
 There are several checksum algorithms supported. The default and backward
 compatible algorithm is *crc32c*. Since kernel 5.5 there are three more with different
 characteristics and trade-offs regarding speed and strength. The following list