diff mbox series

[RFC] btrfs : avoid O_DIRECT when the file is protected by CSUM

Message ID 5d52220b-177d-72d4-7825-dbe6cbf8722f@inwind.it (mailing list archive)
State New, archived
Headers show
Series [RFC] btrfs : avoid O_DIRECT when the file is protected by CSUM | expand

Commit Message

Goffredo Baroncelli Jan. 11, 2021, 10:19 p.m. UTC
Dear all,

this is an RFC. I made very few test, so use at your risk.

In the past there were several reports [1][2] about corruption when O_DIRECT is used.
Every time the analysis was the same: it is too difficult to sync the checksum
with the data when O_DIRECT is used.

This aim of this patch is to avoid the corruption disabling the O_DIRECT write when
the file is NOT marked as NODATACSUM. In other words, O_DIRECT is honored only for
the file not protected by CSUM: O_DIRECT ^ DATASUM

The user-space is not informed that O_DIRECT is not honored.

On the best of my knowledge, ZFS does the same thing.

It is not a regression: today an O_DIRECT file update may compromise the checksum
calculation. So may be the file content is correct, but the checksum no.This prevent
to read the file.

In [2] there is a test program which trigger the problem. With this patch the program
doesn't trigger the problem.

Open question:
- does the kernel return an error when a file is opened with O_DIRECT and the file has
   the checksum ?
- does make sense to have a mount option to select different behaviours:
	1) let the kernel to behave as today (O_DIRECT is allowed even at risk of
            having some form of corruption)
	2) let the kernel to ignore O_DIRECT if the file is protected by checksum
	3) let the kernel to return error when O_DIRECT is used with file with
            checksum
    ?

Comments are welcome.

BR
G.Baroncelli

[1] https://lore.kernel.org/linux-btrfs/1ad3962943592e9a60f88aecdb493f368c70bbe1.camel@infradead.org/#r
[2] https://lore.kernel.org/linux-btrfs/cf8a733f-2c9d-7ffe-e865-4c13d99dfb60@libero.it/


-----




-----------
diff mbox series

Patch

diff --git a/fs/btrfs/file.c b/fs/btrfs/file.c
index 0e41459b8de6..af73157e8200 100644
--- a/fs/btrfs/file.c
+++ b/fs/btrfs/file.c
@@ -2018,7 +2018,12 @@  static ssize_t btrfs_file_write_iter(struct kiocb *iocb,
         if (sync)
                 atomic_inc(&BTRFS_I(inode)->sync_writers);
  
-       if (iocb->ki_flags & IOCB_DIRECT)
+       /*
+        * O_DIRECT doesn't play well with CSUM, so allow the O_DIRECT
+        * only if the file is marked BTRFS_INODE_NODATASUM
+        */
+       if (iocb->ki_flags & IOCB_DIRECT &&
+           (BTRFS_I(inode)->flags & BTRFS_INODE_NODATASUM))
                 num_written = btrfs_direct_write(iocb, from);
         else
                 num_written = btrfs_buffered_write(iocb, from);