Message ID | 7e4b248dad75a0f0dd3a41b4a3af138a418b05d8.1727342969.git.fdmanana@suse.com (mailing list archive) |
---|---|
State | New, archived |
Series | btrfs: delayed refs and qgroups, fixes, cleanups, improvements |
On 26-09-2024 at 11:33, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
> if we fail to insert the qgroup record we don't error out, we ignore it.
> In fact we treat it as if there was no error and there was already an
> existing record - we don't distinguish between the cases where
> btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
> existed and we can free the given record, and the case where it returns
> a negative error value, meaning the insertion into the xarray that is
> used to track records failed.
>
> Effectively we end up ignoring that we are lacking a qgroup record in the
> dirty extents xarray, resulting in incorrect qgroup accounting.
>
> Fix this by checking for errors and returning them to the callers.
>
> Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> [...]
> @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>  	 * insert both the head node and the new ref without dropping
>  	 * the spin lock
>  	 */
> -	head_ref = add_delayed_ref_head(trans, head_ref, record,
> -					action, &qrecord_inserted);
> +	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
> +					    action, &qrecord_inserted);
> +	if (IS_ERR(new_head_ref)) {
> +		spin_unlock(&delayed_refs->lock);
> +		ret = PTR_ERR(new_head_ref);
> +		goto free_record;
> +	}
> +	head_ref = new_head_ref;

There is a chance (not sure how big a chance) that head_ref is going to
be freed twice.

In the IS_ERR true path, head_ref still has the old value from before
calling add_delayed_ref_head. However, in add_delayed_ref_head it is
possible that head_ref is freed and replaced. If IS_ERR(new_head_ref) is
true, the code jumps to the end of the function where (the old) head_ref
is freed again.

Is it perhaps possible to assign head_ref to the new value before doing
the IS_ERR check? In other words, do this:

	head_ref = new_head_ref;
	if (IS_ERR(new_head_ref)) {
		spin_unlock(&delayed_refs->lock);
		ret = PTR_ERR(new_head_ref);
		goto free_record;
	}
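The failure mode the review describes is the general free-and-replace hazard: if a callee may free its input pointer and hand back a replacement, a caller error path that still frees the original pointer is a double free. A minimal standalone sketch of that pattern, with hypothetical names (not the btrfs code; the rest of the thread establishes the hazard does not apply here):

```c
#include <stdlib.h>

struct obj { int data; };

/*
 * Hypothetical callee (stand-in name): it may free its input and
 * hand back a replacement object instead.
 */
static struct obj *insert_or_reuse(struct obj *in, struct obj *existing)
{
	if (existing) {
		free(in);          /* input freed here ...           */
		return existing;   /* ... and replaced by 'existing' */
	}
	return in;
}

/*
 * The suspected bug shape: if an error path runs after the callee
 * already freed 'in', freeing 'in' again is a double free.
 */
static int caller(struct obj *in, struct obj *existing, int err)
{
	struct obj *out = insert_or_reuse(in, existing);

	if (err) {
		free(in);	/* double free when existing != NULL */
		return -1;
	}
	in = out;
	in->data = 0;	/* use the (possibly replaced) object */
	return 0;
}

int main(void)
{
	struct obj *a = malloc(sizeof(*a));
	struct obj *b = malloc(sizeof(*b));

	return caller(a, b, 0);	/* err=1 with b != NULL would double free */
}
```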
On Tue, Oct 8, 2024 at 8:39 PM Kees Bakker <kees@ijzerbout.nl> wrote:
>
> On 26-09-2024 at 11:33, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> > [...]
> > @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> > +	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
> > +					    action, &qrecord_inserted);
> > +	if (IS_ERR(new_head_ref)) {
> > +		spin_unlock(&delayed_refs->lock);
> > +		ret = PTR_ERR(new_head_ref);
> > +		goto free_record;
> > +	}
> > +	head_ref = new_head_ref;
> >
> There is a chance (not sure how big a chance) that head_ref is going to
> be freed twice.
>
> In the IS_ERR true path, head_ref still has the old value from before
> calling add_delayed_ref_head. However, in add_delayed_ref_head it is
> possible that head_ref is freed and replaced. If IS_ERR(new_head_ref) is
> true, the code jumps to the end of the function where (the old) head_ref
> is freed again.

No, it's not possible to have a double free.
add_delayed_ref_head() never frees the given head_ref in case it
returns an error - it's the caller's responsibility to free it.

Thanks.

> Is it perhaps possible to assign head_ref to the new value before doing
> the IS_ERR check?
> [...]
>
> --
> Kees
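The ownership rule stated here - on failure the callee leaves the passed-in allocation alone, and freeing it remains the caller's job - is a common kernel convention. A small userspace sketch of that contract, with made-up names:

```c
#include <errno.h>
#include <stdlib.h>

struct head { int id; };

/*
 * Hypothetical callee (made-up name) following the contract described
 * above: on failure it returns NULL WITHOUT freeing 'h', so the
 * allocation still belongs to the caller; on success it consumes 'h'.
 */
static struct head *install_head(struct head *h, int fail)
{
	if (fail)
		return NULL;	/* error: 'h' deliberately not freed */
	return h;		/* success: ownership transferred */
}

int main(void)
{
	struct head *h = malloc(sizeof(*h));

	if (!h)
		return ENOMEM;
	if (!install_head(h, 1)) {
		free(h);	/* caller's responsibility on error */
		return EINVAL;
	}
	return 0;
}
```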
On 08-10-2024 at 22:03, Filipe Manana wrote:
> On Tue, Oct 8, 2024 at 8:39 PM Kees Bakker <kees@ijzerbout.nl> wrote:
> > [...]
> > There is a chance (not sure how big a chance) that head_ref is going to
> > be freed twice.
> > [...]
> No, it's not possible to have a double free.
> add_delayed_ref_head() never frees the given head_ref in case it
> returns an error - it's the caller's responsibility to free it.
>
> Thanks.

OK, but ... in add_delayed_ref_head() I see the following on line 881:

	existing = htree_insert(&delayed_refs->href_root,
				&head_ref->href_node);
	if (existing) {
		update_existing_head_ref(trans, existing, head_ref);
		/*
		 * we've updated the existing ref, free the newly
		 * allocated ref
		 */
		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
		head_ref = existing;

It's calling kmem_cache_free() with (the old) head_ref, and that free
could be repeated in add_delayed_ref().
On Tue, Oct 8, 2024 at 9:21 PM Kees Bakker <kees@ijzerbout.nl> wrote:
>
> On 08-10-2024 at 22:03, Filipe Manana wrote:
> > [...]
> > No, it's not possible to have a double free.
> > add_delayed_ref_head() never frees the given head_ref in case it
> > returns an error - it's the caller's responsibility to free it.
>
> OK, but ... in add_delayed_ref_head() I see the following on line 881:
>
> 	existing = htree_insert(&delayed_refs->href_root,
> 				&head_ref->href_node);
> 	if (existing) {
> 		update_existing_head_ref(trans, existing, head_ref);
> 		/*
> 		 * we've updated the existing ref, free the newly
> 		 * allocated ref
> 		 */
> 		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
> 		head_ref = existing;
>
> It's calling kmem_cache_free() with (the old) head_ref, and that free
> could be repeated in add_delayed_ref().

No, it can't, because after line 881, the one which calls
kmem_cache_free(), we never return an error pointer.

Thanks.
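To make the ordering argument concrete: in the patched add_delayed_ref_head(), the only error-pointer return sits before the htree_insert() path that frees and replaces the input head. A self-contained sketch of that control flow, with stand-in names and a stand-in error encoding (not the kernel code):

```c
#include <stdlib.h>

struct head { int id; };

/* Stand-in for the kernel's error-pointer encoding. */
#define ERR_HEAD ((struct head *)-1)

/*
 * Control-flow sketch of the patched add_delayed_ref_head(): the only
 * error return (step 1) comes BEFORE the free-and-replace block
 * (step 2), so an error return guarantees 'h' was not freed.
 */
static struct head *add_head(struct head *h, int record_err,
			     struct head *existing)
{
	/* Step 1: qgroup record insertion - the only error exit. */
	if (record_err)
		return ERR_HEAD;	/* 'h' untouched, caller frees */

	/* Step 2: an existing head was found: free 'h', replace it. */
	if (existing) {
		free(h);
		h = existing;	/* from here on we only return success */
	}
	return h;
}

int main(void)
{
	struct head *h = malloc(sizeof(*h));
	struct head *ret = add_head(h, 1, NULL);

	if (ret == ERR_HEAD)
		free(h);	/* safe: add_head() did not free it */
	return 0;
}
```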
diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ad9ef8312e41..32f719b9e661 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -840,6 +840,8 @@ static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
  * helper function to actually insert a head node into the rbtree.
  * this does all the dirty work in terms of maintaining the correct
  * overall modification count.
+ *
+ * Returns an error pointer in case of an error.
  */
 static noinline struct btrfs_delayed_ref_head *
 add_delayed_ref_head(struct btrfs_trans_handle *trans,
@@ -862,6 +864,9 @@ add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		if (ret) {
 			/* Clean up if insertion fails or item exists. */
 			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
+			/* Caller responsible for freeing qrecord on error. */
+			if (ret < 0)
+				return ERR_PTR(ret);
 			kfree(qrecord);
 		} else {
 			qrecord_inserted = true;
@@ -1000,27 +1005,35 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_node *node;
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_head *new_head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
 	bool qrecord_inserted;
 	int action = generic_ref->action;
 	bool merged;
+	int ret;
 
 	node = kmem_cache_alloc(btrfs_delayed_ref_node_cachep, GFP_NOFS);
 	if (!node)
 		return -ENOMEM;
 
 	head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
-	if (!head_ref)
+	if (!head_ref) {
+		ret = -ENOMEM;
 		goto free_node;
+	}
 
 	if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
 		record = kzalloc(sizeof(*record), GFP_NOFS);
-		if (!record)
+		if (!record) {
+			ret = -ENOMEM;
 			goto free_head_ref;
+		}
 		if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents,
-			       generic_ref->bytenr, GFP_NOFS))
+			       generic_ref->bytenr, GFP_NOFS)) {
+			ret = -ENOMEM;
 			goto free_record;
+		}
 	}
 
 	init_delayed_ref_common(fs_info, node, generic_ref);
@@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	 * insert both the head node and the new ref without dropping
 	 * the spin lock
 	 */
-	head_ref = add_delayed_ref_head(trans, head_ref, record,
-					action, &qrecord_inserted);
+	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
+					    action, &qrecord_inserted);
+	if (IS_ERR(new_head_ref)) {
+		spin_unlock(&delayed_refs->lock);
+		ret = PTR_ERR(new_head_ref);
+		goto free_record;
+	}
+	head_ref = new_head_ref;
 
 	merged = insert_delayed_ref(trans, head_ref, node);
 	spin_unlock(&delayed_refs->lock);
@@ -1063,7 +1082,7 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
 free_node:
 	kmem_cache_free(btrfs_delayed_ref_node_cachep, node);
-	return -ENOMEM;
+	return ret;
 }
 
 /*
@@ -1094,6 +1113,7 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
 				struct btrfs_delayed_extent_op *extent_op)
 {
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_head *head_ref_ret;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_ref generic_ref = {
 		.type = BTRFS_REF_METADATA,
@@ -1113,11 +1133,15 @@ int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
 	delayed_refs = &trans->transaction->delayed_refs;
 	spin_lock(&delayed_refs->lock);
 
-	add_delayed_ref_head(trans, head_ref, NULL, BTRFS_UPDATE_DELAYED_HEAD,
-			     NULL);
-
+	head_ref_ret = add_delayed_ref_head(trans, head_ref, NULL,
+					    BTRFS_UPDATE_DELAYED_HEAD, NULL);
 	spin_unlock(&delayed_refs->lock);
 
+	if (IS_ERR(head_ref_ret)) {
+		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
+		return PTR_ERR(head_ref_ret);
+	}
+
 	/*
 	 * Need to update the delayed_refs_rsv with any changes we may have
 	 * made.
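The patch leans on the kernel's error-pointer convention from <linux/err.h>: ERR_PTR() encodes a negative errno as a pointer value, IS_ERR() detects such a value, and PTR_ERR() decodes it back. A simplified userspace approximation (the real kernel definitions differ in annotation details):

```c
#include <errno.h>
#include <stdio.h>

/* Simplified userspace approximations of the <linux/err.h> helpers. */
#define MAX_ERRNO	4095
#define IS_ERR_VALUE(x)	((unsigned long)(x) >= (unsigned long)-MAX_ERRNO)

static inline void *ERR_PTR(long error)
{
	return (void *)error;
}

static inline long PTR_ERR(const void *ptr)
{
	return (long)ptr;
}

static inline int IS_ERR(const void *ptr)
{
	return IS_ERR_VALUE((unsigned long)ptr);
}

/* A function with the same return shape as add_delayed_ref_head(). */
static void *might_fail(int fail)
{
	static int value;

	return fail ? ERR_PTR(-ENOMEM) : &value;
}

int main(void)
{
	void *p = might_fail(1);

	if (IS_ERR(p)) {
		printf("error: %ld\n", PTR_ERR(p));	/* prints error: -12 */
		return 1;
	}
	return 0;
}
```

This works because valid kernel pointers never fall in the top 4095 values of the address space, so that range can carry errno codes; it is what lets the patched callers do `ret = PTR_ERR(new_head_ref)` after an `IS_ERR()` check.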