Message ID | 20241008064849.1814829-1-haisuwang@tencent.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: fix the length of reserved qgroup to free | expand |
在 2024/10/8 17:18, iamhswang@gmail.com 写道: > From: Haisu Wang <haisuwang@tencent.com> > > The dealloc flag may be cleared and the extent won't reach the disk > in cow_file_range when errors path. The reserved qgroup space is > freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in > cow_file_range"). However, the length of untouched region to free > need to be adjusted with the region size. > > Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range") > Signed-off-by: Haisu Wang <haisuwang@tencent.com> Right, just several lines before that, we increased @start by @cur_alloc_size if @extent_reserved is true. So we can not directly use the old range size. You can improve that one step further by not modifying @start just for the error handling path, although that should be another patch. Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks, Qu > --- > fs/btrfs/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index b0ad46b734c3..5eefa2318fa8 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, > clear_bits |= EXTENT_CLEAR_DATA_RESV; > extent_clear_unlock_delalloc(inode, start, end, locked_folio, > &cached, clear_bits, page_ops); > - btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL); > + btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL); > } > return ret; > }
Qu Wenruo <quwenruo.btrfs@gmx.com> 于2024年10月8日周二 15:56写道: > > > > 在 2024/10/8 17:18, iamhswang@gmail.com 写道: > > From: Haisu Wang <haisuwang@tencent.com> > > > > The dealloc flag may be cleared and the extent won't reach the disk > > in cow_file_range when errors path. The reserved qgroup space is > > freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in > > cow_file_range"). However, the length of untouched region to free > > need to be adjusted with the region size. > > > > Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range") > > Signed-off-by: Haisu Wang <haisuwang@tencent.com> > > Right, just several lines before that, we increased @start by > @cur_alloc_size if @extent_reserved is true. > > So we can not directly use the old range size. Thanks for the review. > > You can improve that one step further by not modifying @start just for > the error handling path, although that should be another patch. Indeed, modify the start value based on @extent_reserved in error path only is tricky and ambiguous. I agree to keep the fix as simple as possible (like the previous patch), since commit 30479f31d44d ("btrfs: fix qgroupreserve leaks in cow_file_range") assigned to CVE-2024-46733 already. A simple fix is easier to port to stable branch of different versions. Also the possible change to keep @start is more like an enhancement instead of a fix. > > Reviewed-by: Qu Wenruo <wqu@suse.com> > > Thanks, > Qu To make sure we are on the same page of keeping the @start unchanged. I write a POC below for your opinion. (Anyway, i will think/test again before convert POC to a PATCH.) The @start will advanced in every succeed reservation, the @cur_alloc_size can represent the @extent_reserved state instead of using a standalone @extent_reserved flag. In this case, the @start region no longer need to be modified based on @extent_reserved state in the error path. diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index 5eefa2318fa8..0c35292550bd 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1341,7 +1341,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode, struct extent_map *em; unsigned clear_bits; unsigned long page_ops; - bool extent_reserved = false; int ret = 0; if (btrfs_is_free_space_inode(inode)) { @@ -1395,8 +1394,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, struct btrfs_ordered_extent *ordered; struct btrfs_file_extent file_extent; - cur_alloc_size = num_bytes; - ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size, + ret = btrfs_reserve_extent(root, num_bytes, num_bytes, min_alloc_size, 0, alloc_hint, &ins, 1, 1); if (ret == -EAGAIN) { @@ -1427,7 +1425,6 @@ static noinline int cow_file_range(struct btrfs_inode *inode, if (ret < 0) goto out_unlock; cur_alloc_size = ins.offset; - extent_reserved = true; ram_size = ins.offset; file_extent.disk_bytenr = ins.objectid; @@ -1503,7 +1500,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, num_bytes -= cur_alloc_size; alloc_hint = ins.objectid + ins.offset; start += cur_alloc_size; - extent_reserved = false; + cur_alloc_size = 0; /* * btrfs_reloc_clone_csums() error, since start is increased @@ -1573,13 +1570,12 @@ static noinline int cow_file_range(struct btrfs_inode *inode, * to decrement again the data space_info's bytes_may_use counter, * therefore we do not pass it the flag EXTENT_CLEAR_DATA_RESV. */ - if (extent_reserved) { + if (cur_alloc_size) { extent_clear_unlock_delalloc(inode, start, start + cur_alloc_size - 1, locked_folio, &cached, clear_bits, page_ops); btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL); - start += cur_alloc_size; } /* @@ -1588,11 +1584,13 @@ static noinline int cow_file_range(struct btrfs_inode *inode, * space_info's bytes_may_use counter, reserved in * btrfs_check_data_free_space(). */ - if (start < end) { + if (start + cur_alloc_size < end) { clear_bits |= EXTENT_CLEAR_DATA_RESV; - extent_clear_unlock_delalloc(inode, start, end, locked_folio, + extent_clear_unlock_delalloc(inode, start + cur_alloc_size, + end, locked_folio, &cached, clear_bits, page_ops); - btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL); + btrfs_qgroup_free_data(inode, NULL, start + cur_alloc_size, + end - start - cur_alloc_size + 1, NULL); } return ret; } Thanks, Haisu Wang > > > --- > > fs/btrfs/inode.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index b0ad46b734c3..5eefa2318fa8 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, > > clear_bits |= EXTENT_CLEAR_DATA_RESV; > > extent_clear_unlock_delalloc(inode, start, end, locked_folio, > > &cached, clear_bits, page_ops); > > - btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL); > > + btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL); > > } > > return ret; > > } >
On Tue, Oct 08, 2024 at 02:48:46PM +0800, iamhswang@gmail.com wrote: > From: Haisu Wang <haisuwang@tencent.com> > > The dealloc flag may be cleared and the extent won't reach the disk > in cow_file_range when errors path. The reserved qgroup space is > freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in > cow_file_range"). However, the length of untouched region to free > need to be adjusted with the region size. > > Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range") > Signed-off-by: Haisu Wang <haisuwang@tencent.com> Good catch and fix, thank you! Reviewed-by: Boris Burkov <boris@bur.io> Can you please share more information about how you reproduced and tested this issue for the fix? In one of the other emails in the chain, you also mentioned a CVE, so explaining the specific impact of the bug is helpful too. As far as I can tell, we risk freeing too much space past the real desired range if start gets bumped before this free, which could lead to prematurely freeing some other rsv marked data past end. This naturally leads to incorrect accounting, And I think would allow us to reserve this same range again. Though perhaps delalloc extent range stuff would prevent that. Between that, and the changesets gating most of the qgroup freeing, it's hard to actually see what happens :) Long ramble short: do you have a reproducer? > --- > fs/btrfs/inode.c | 2 +- > 1 file changed, 1 insertion(+), 1 deletion(-) > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index b0ad46b734c3..5eefa2318fa8 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, > clear_bits |= EXTENT_CLEAR_DATA_RESV; > extent_clear_unlock_delalloc(inode, start, end, locked_folio, > &cached, clear_bits, page_ops); > - btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL); > + btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL); > } > return ret; > } > -- > 2.39.3 >
在 2024/10/8 21:48, hs wang 写道: > Qu Wenruo <quwenruo.btrfs@gmx.com> 于2024年10月8日周二 15:56写道: >> >> >> >> 在 2024/10/8 17:18, iamhswang@gmail.com 写道: >>> From: Haisu Wang <haisuwang@tencent.com> >>> >>> The dealloc flag may be cleared and the extent won't reach the disk >>> in cow_file_range when errors path. The reserved qgroup space is >>> freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in >>> cow_file_range"). However, the length of untouched region to free >>> need to be adjusted with the region size. >>> >>> Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range") >>> Signed-off-by: Haisu Wang <haisuwang@tencent.com> >> >> Right, just several lines before that, we increased @start by >> @cur_alloc_size if @extent_reserved is true. >> >> So we can not directly use the old range size. > > Thanks for the review. > >> >> You can improve that one step further by not modifying @start just for >> the error handling path, although that should be another patch. > > Indeed, modify the start value based on @extent_reserved in > error path only is tricky and ambiguous. > > I agree to keep the fix as simple as possible (like the previous patch), > since commit 30479f31d44d ("btrfs: fix qgroupreserve leaks in > cow_file_range") assigned to CVE-2024-46733 already. > A simple fix is easier to port to stable branch of different versions. > Also the possible change to keep @start is more like an > enhancement instead of a fix. > >> >> Reviewed-by: Qu Wenruo <wqu@suse.com> >> >> Thanks, >> Qu > > To make sure we are on the same page of keeping the @start > unchanged. I write a POC below for your opinion. > (Anyway, i will think/test again before convert POC to a PATCH.) > > The @start will advanced in every succeed reservation, the > @cur_alloc_size can represent the @extent_reserved state > instead of using a standalone @extent_reserved flag. > In this case, the @start region no longer need to be modified > based on @extent_reserved state in the error path. This snippet looks good to me. Thanks, Qu > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > index 5eefa2318fa8..0c35292550bd 100644 > --- a/fs/btrfs/inode.c > +++ b/fs/btrfs/inode.c > @@ -1341,7 +1341,6 @@ static noinline int cow_file_range(struct > btrfs_inode *inode, > struct extent_map *em; > unsigned clear_bits; > unsigned long page_ops; > - bool extent_reserved = false; > int ret = 0; > > if (btrfs_is_free_space_inode(inode)) { > @@ -1395,8 +1394,7 @@ static noinline int cow_file_range(struct > btrfs_inode *inode, > struct btrfs_ordered_extent *ordered; > struct btrfs_file_extent file_extent; > > - cur_alloc_size = num_bytes; > - ret = btrfs_reserve_extent(root, cur_alloc_size, cur_alloc_size, > + ret = btrfs_reserve_extent(root, num_bytes, num_bytes, > min_alloc_size, 0, alloc_hint, > &ins, 1, 1); > if (ret == -EAGAIN) { > @@ -1427,7 +1425,6 @@ static noinline int cow_file_range(struct > btrfs_inode *inode, > if (ret < 0) > goto out_unlock; > cur_alloc_size = ins.offset; > - extent_reserved = true; > > ram_size = ins.offset; > file_extent.disk_bytenr = ins.objectid; > @@ -1503,7 +1500,7 @@ static noinline int cow_file_range(struct > btrfs_inode *inode, > num_bytes -= cur_alloc_size; > alloc_hint = ins.objectid + ins.offset; > start += cur_alloc_size; > - extent_reserved = false; > + cur_alloc_size = 0; > > /* > * btrfs_reloc_clone_csums() error, since start is increased > @@ -1573,13 +1570,12 @@ static noinline int cow_file_range(struct > btrfs_inode *inode, > * to decrement again the data space_info's bytes_may_use counter, > * therefore we do not pass it the flag EXTENT_CLEAR_DATA_RESV. > */ > - if (extent_reserved) { > + if (cur_alloc_size) { > extent_clear_unlock_delalloc(inode, start, > start + cur_alloc_size - 1, > locked_folio, &cached, clear_bits, > page_ops); > btrfs_qgroup_free_data(inode, NULL, start, > cur_alloc_size, NULL); > - start += cur_alloc_size; > } > > /* > @@ -1588,11 +1584,13 @@ static noinline int cow_file_range(struct > btrfs_inode *inode, > * space_info's bytes_may_use counter, reserved in > * btrfs_check_data_free_space(). > */ > - if (start < end) { > + if (start + cur_alloc_size < end) { > clear_bits |= EXTENT_CLEAR_DATA_RESV; > - extent_clear_unlock_delalloc(inode, start, end, locked_folio, > + extent_clear_unlock_delalloc(inode, start + cur_alloc_size, > + end, locked_folio, > &cached, clear_bits, page_ops); > - btrfs_qgroup_free_data(inode, NULL, start, end - start > + 1, NULL); > + btrfs_qgroup_free_data(inode, NULL, start + cur_alloc_size, > + end - start - cur_alloc_size + 1, NULL); > } > return ret; > } > > > Thanks, > Haisu Wang > >> >>> --- >>> fs/btrfs/inode.c | 2 +- >>> 1 file changed, 1 insertion(+), 1 deletion(-) >>> >>> diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c >>> index b0ad46b734c3..5eefa2318fa8 100644 >>> --- a/fs/btrfs/inode.c >>> +++ b/fs/btrfs/inode.c >>> @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, >>> clear_bits |= EXTENT_CLEAR_DATA_RESV; >>> extent_clear_unlock_delalloc(inode, start, end, locked_folio, >>> &cached, clear_bits, page_ops); >>> - btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL); >>> + btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL); >>> } >>> return ret; >>> } >>
Boris Burkov <boris@bur.io> 于2024年10月9日周三 00:12写道: > > On Tue, Oct 08, 2024 at 02:48:46PM +0800, iamhswang@gmail.com wrote: > > From: Haisu Wang <haisuwang@tencent.com> > > > > The dealloc flag may be cleared and the extent won't reach the disk > > in cow_file_range when errors path. The reserved qgroup space is > > freed in commit 30479f31d44d ("btrfs: fix qgroup reserve leaks in > > cow_file_range"). However, the length of untouched region to free > > need to be adjusted with the region size. > > > > Fixes: 30479f31d44d ("btrfs: fix qgroup reserve leaks in cow_file_range") > > Signed-off-by: Haisu Wang <haisuwang@tencent.com> > > Good catch and fix, thank you! > Reviewed-by: Boris Burkov <boris@bur.io> > Thanks for the review. > Can you please share more information about how you reproduced and > tested this issue for the fix? In one of the other emails in the chain, > you also mentioned a CVE, so explaining the specific impact of the bug > is helpful too. > Instead of hitting this in the real world, I get this while backporting the CVE-2024-46733 fixes. I need to understand the full story and the extent reservation/clean up context, found the free data region mismatch to the dealloc region and the potential risky. So i write the fix of the inconsistent size. > As far as I can tell, we risk freeing too much space past the real > desired range if start gets bumped before this free, which could lead to > prematurely freeing some other rsv marked data past end. This naturally > leads to incorrect accounting, And I think would allow us to reserve > this same range again. Though perhaps delalloc extent range stuff would > prevent that. Between that, and the changesets gating most of the qgroup > freeing, it's hard to actually see what happens :) > > Long ramble short: do you have a reproducer? > Sadly, i don't have a reproducer yet. In another mail of the chain, Wenruo suggested it is possible to polish the usage of @startand @extent_reserved to make it more clear/safe. I will check more to finish this in another patch, together with generic fstest at least. Thanks, Haisu Wang > > --- > > fs/btrfs/inode.c | 2 +- > > 1 file changed, 1 insertion(+), 1 deletion(-) > > > > diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c > > index b0ad46b734c3..5eefa2318fa8 100644 > > --- a/fs/btrfs/inode.c > > +++ b/fs/btrfs/inode.c > > @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, > > clear_bits |= EXTENT_CLEAR_DATA_RESV; > > extent_clear_unlock_delalloc(inode, start, end, locked_folio, > > &cached, clear_bits, page_ops); > > - btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL); > > + btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL); > > } > > return ret; > > } > > -- > > 2.39.3 > >
diff --git a/fs/btrfs/inode.c b/fs/btrfs/inode.c index b0ad46b734c3..5eefa2318fa8 100644 --- a/fs/btrfs/inode.c +++ b/fs/btrfs/inode.c @@ -1592,7 +1592,7 @@ static noinline int cow_file_range(struct btrfs_inode *inode, clear_bits |= EXTENT_CLEAR_DATA_RESV; extent_clear_unlock_delalloc(inode, start, end, locked_folio, &cached, clear_bits, page_ops); - btrfs_qgroup_free_data(inode, NULL, start, cur_alloc_size, NULL); + btrfs_qgroup_free_data(inode, NULL, start, end - start + 1, NULL); } return ret; }