Message ID | cover.1716053516.git.fdmanana@suse.com (mailing list archive) |
---|---|
Series | btrfs: fix logging unwritten extents after failure in write paths |
On 2024/5/20 19:16, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> There's a bug where a fast fsync can log extent maps that were not written
> due to an error in a write path or during writeback. This affects both
> direct IO writes and buffered writes, and besides the failure depends on
> a race due to the fact that ordered extent completion happens in a work
> queue and a fast fsync doesn't wait for ordered extent completion before
> logging. The details are in the change log of the first patch.
>
> V4: Use a slightly different approach to avoid a deadlock on the inode's
>     spinlock due to it being used both in irq and non-irq context, pointed
>     out by Qu.
>     Added some cleanup patches (patches 3, 4, 5 and 6).

The whole series looks good to me.

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu

>
> V3: Change the approach of patch 1/2 to not drop extent maps at
>     btrfs_finish_ordered_extent() since that runs in irq context and
>     dropping an extent map range triggers NOFS extent map allocations,
>     which can trigger a reclaim and that can't run in irq context.
>     Updated comments and changelog to distinguish differences between
>     failures for direct IO writes and buffered writes.
>
> V2: Rework solution since other error paths caused the same problem, make
>     it more generic.
>     Added more details to change log and comment about what's going on,
>     and why reads aren't affected.
>
>     https://lore.kernel.org/linux-btrfs/cover.1715798440.git.fdmanana@suse.com/
>
> V1: https://lore.kernel.org/linux-btrfs/cover.1715688057.git.fdmanana@suse.com/
>
> Filipe Manana (6):
>   btrfs: ensure fast fsync waits for ordered extents after a write failure
>   btrfs: make btrfs_finish_ordered_extent() return void
>   btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx)
>   btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()
>   btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()
>   btrfs: use a btrfs_inode local variable at btrfs_sync_file()
>
>  fs/btrfs/btrfs_inode.h      | 10 ++++++
>  fs/btrfs/file.c             | 63 ++++++++++++++++++++++---------------
>  fs/btrfs/file.h             |  2 +-
>  fs/btrfs/free-space-cache.c |  4 +--
>  fs/btrfs/inode.c            | 16 +++++-----
>  fs/btrfs/ordered-data.c     | 40 ++++++++++++++++++++---
>  fs/btrfs/ordered-data.h     |  4 +--
>  fs/btrfs/reflink.c          |  8 ++---
>  fs/btrfs/relocation.c       |  2 +-
>  fs/btrfs/tree-log.c         | 10 +++---
>  fs/btrfs/tree-log.h         |  4 +--
>  11 files changed, 108 insertions(+), 55 deletions(-)
From: Filipe Manana <fdmanana@suse.com>

There's a bug where a fast fsync can log extent maps that were not written
due to an error in a write path or during writeback. This affects both
direct IO writes and buffered writes, and besides the failure depends on
a race due to the fact that ordered extent completion happens in a work
queue and a fast fsync doesn't wait for ordered extent completion before
logging. The details are in the change log of the first patch.

V4: Use a slightly different approach to avoid a deadlock on the inode's
    spinlock due to it being used both in irq and non-irq context, pointed
    out by Qu.
    Added some cleanup patches (patches 3, 4, 5 and 6).

V3: Change the approach of patch 1/2 to not drop extent maps at
    btrfs_finish_ordered_extent() since that runs in irq context and
    dropping an extent map range triggers NOFS extent map allocations,
    which can trigger a reclaim and that can't run in irq context.
    Updated comments and changelog to distinguish differences between
    failures for direct IO writes and buffered writes.

V2: Rework solution since other error paths caused the same problem, make
    it more generic.
    Added more details to change log and comment about what's going on,
    and why reads aren't affected.

    https://lore.kernel.org/linux-btrfs/cover.1715798440.git.fdmanana@suse.com/

V1: https://lore.kernel.org/linux-btrfs/cover.1715688057.git.fdmanana@suse.com/

Filipe Manana (6):
  btrfs: ensure fast fsync waits for ordered extents after a write failure
  btrfs: make btrfs_finish_ordered_extent() return void
  btrfs: use a btrfs_inode in the log context (struct btrfs_log_ctx)
  btrfs: pass a btrfs_inode to btrfs_fdatawrite_range()
  btrfs: pass a btrfs_inode to btrfs_wait_ordered_range()
  btrfs: use a btrfs_inode local variable at btrfs_sync_file()

 fs/btrfs/btrfs_inode.h      | 10 ++++++
 fs/btrfs/file.c             | 63 ++++++++++++++++++++++---------------
 fs/btrfs/file.h             |  2 +-
 fs/btrfs/free-space-cache.c |  4 +--
 fs/btrfs/inode.c            | 16 +++++-----
 fs/btrfs/ordered-data.c     | 40 ++++++++++++++++++++---
 fs/btrfs/ordered-data.h     |  4 +--
 fs/btrfs/reflink.c          |  8 ++---
 fs/btrfs/relocation.c       |  2 +-
 fs/btrfs/tree-log.c         | 10 +++---
 fs/btrfs/tree-log.h         |  4 +--
 11 files changed, 108 insertions(+), 55 deletions(-)
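
The race described in the cover letter can be pictured with a small userspace toy model. This is only a sketch and not the code from the series: every `toy_*` name is hypothetical, the per-inode `write_error_seen` marker is an assumption made for illustration, and so is the idea that deferred completion is what drops the extent map of a failed write. It only shows why logging before completion has run can pick up an extent that was never written, and how waiting after a recorded write failure would avoid that.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define TOY_MAX_EXTENTS 4

/* Toy model only: none of these names are btrfs identifiers. */
struct toy_extent {
	bool in_flight;    /* completion has not run yet (still "queued") */
	bool write_failed; /* the write for this range hit an error */
	bool em_cached;    /* an extent map for the range is still cached */
};

struct toy_inode {
	struct toy_extent ext[TOY_MAX_EXTENTS];
	size_t nr;
	bool write_error_seen; /* assumed per-inode "a write failed" marker */
};

/* Start with nr extents that are in flight and have cached extent maps. */
void toy_inode_init(struct toy_inode *inode, size_t nr)
{
	assert(nr <= TOY_MAX_EXTENTS);
	inode->nr = nr;
	inode->write_error_seen = false;
	for (size_t i = 0; i < nr; i++) {
		inode->ext[i].in_flight = true;
		inode->ext[i].write_failed = false;
		inode->ext[i].em_cached = true;
	}
}

/* The error is noticed immediately; cleanup is left to completion. */
void toy_note_write_error(struct toy_inode *inode, size_t i)
{
	assert(i < inode->nr);
	inode->ext[i].write_failed = true;
	inode->write_error_seen = true;
}

/* Deferred completion: a failed extent loses its cached extent map. */
void toy_complete(struct toy_inode *inode, size_t i)
{
	struct toy_extent *e = &inode->ext[i];

	if (e->write_failed)
		e->em_cached = false;
	e->in_flight = false;
}

/*
 * Fast fsync: log every cached extent map. When honor_error_marker is
 * true and a write error was recorded, first wait for (here: run) all
 * pending completions. Returns how many logged extents were never written.
 */
size_t toy_fast_fsync(struct toy_inode *inode, bool honor_error_marker)
{
	size_t bad = 0;

	if (honor_error_marker && inode->write_error_seen) {
		for (size_t i = 0; i < inode->nr; i++)
			if (inode->ext[i].in_flight)
				toy_complete(inode, i);
		inode->write_error_seen = false;
	}

	for (size_t i = 0; i < inode->nr; i++)
		if (inode->ext[i].em_cached && inode->ext[i].write_failed)
			bad++;

	return bad;
}
```

In this model, calling `toy_fast_fsync()` with `honor_error_marker` set to false after a failed write logs the stale extent, while passing true waits for completion first and logs nothing unwritten. Extents whose writes succeeded are unaffected in both cases.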