[v2,1/8] btrfs: fix missing error handling when adding delayed ref with qgroups enabled

Message ID 7e4b248dad75a0f0dd3a41b4a3af138a418b05d8.1727342969.git.fdmanana@suse.com (mailing list archive)
State New, archived
Series btrfs: delayed refs and qgroups, fixes, cleanups, improvements

Commit Message

Filipe Manana Sept. 26, 2024, 9:33 a.m. UTC
From: Filipe Manana <fdmanana@suse.com>

When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
if we fail to insert the qgroup record we don't error out, we ignore it.
In fact we treat it as if there was no error and there was already an
existing record - we don't distinguish between the cases where
btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
existed and we can free the given record, and the case where it returns
a negative error value, meaning the insertion into the xarray that is
used to track records failed.

Effectively we end up ignoring that we are lacking a qgroup record in
the dirty extents xarray, resulting in incorrect qgroup accounting.

Fix this by checking for errors and returning them to the callers.

Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/delayed-ref.c | 42 +++++++++++++++++++++++++++++++++---------
 1 file changed, 33 insertions(+), 9 deletions(-)
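
For context, btrfs_qgroup_trace_extent_nolock() returns 0 when the
record was inserted, 1 when a record for the extent already existed,
and a negative errno when the xarray insertion failed. A minimal
sketch of the corrected handling inside add_delayed_ref_head(),
condensed from the patch at the bottom of this page (illustrative,
not the verbatim kernel code):

	ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, qrecord);
	if (ret) {
		/* Drop the xarray slot reserved earlier for this bytenr. */
		xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
		/* Negative means the insertion failed; caller frees qrecord. */
		if (ret < 0)
			return ERR_PTR(ret);
		/* ret == 1: a record already existed, free the given one. */
		kfree(qrecord);
	} else {
		qrecord_inserted = true;
	}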

Comments

Kees Bakker Oct. 8, 2024, 7:39 p.m. UTC | #1
On 26-09-2024 at 11:33, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
>
> When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
> if we fail to insert the qgroup record we don't error out, we ignore it.
> In fact we treat it as if there was no error and there was already an
> existing record - we don't distinguish between the cases where
> btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
> existed and we can free the given record, and the case where it returns
> a negative error value, meaning the insertion into the xarray that is
> used to track records failed.
>
> Effectively we end up ignoring that we are lacking a qgroup record in
> the dirty extents xarray, resulting in incorrect qgroup accounting.
>
> Fix this by checking for errors and returning them to the callers.
>
> Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
> [...
> @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>   	 * insert both the head node and the new ref without dropping
>   	 * the spin lock
>   	 */
> -	head_ref = add_delayed_ref_head(trans, head_ref, record,
> -					action, &qrecord_inserted);
> +	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
> +					    action, &qrecord_inserted);
> +	if (IS_ERR(new_head_ref)) {
> +		spin_unlock(&delayed_refs->lock);
> +		ret = PTR_ERR(new_head_ref);
> +		goto free_record;
> +	}
> +	head_ref = new_head_ref;
>
There is a chance (not sure how big a chance) that head_ref is going to
be freed twice.

In the IS_ERR true path, head_ref still has the old value from before
calling add_delayed_ref_head. However, in add_delayed_ref_head it is
possible that head_ref is freed and replaced. If IS_ERR(new_head_ref)
is true, the code jumps to the end of the function where (the old)
head_ref is freed again.

Is it perhaps possible to assign head_ref to the new value before doing 
the IS_ERR check?
In other words, do this:
         head_ref = new_head_ref;
         if (IS_ERR(new_head_ref)) {
                 spin_unlock(&delayed_refs->lock);
                 ret = PTR_ERR(new_head_ref);
                 goto free_record;
         }
Filipe Manana Oct. 8, 2024, 8:03 p.m. UTC | #2
On Tue, Oct 8, 2024 at 8:39 PM Kees Bakker <kees@ijzerbout.nl> wrote:
>
> On 26-09-2024 at 11:33, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
> > if we fail to insert the qgroup record we don't error out, we ignore it.
> > In fact we treat it as if there was no error and there was already an
> > existing record - we don't distinguish between the cases where
> > btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
> > existed and we can free the given record, and the case where it returns
> > a negative error value, meaning the insertion into the xarray that is
> > used to track records failed.
> >
> > Effectively we end up ignoring that we are lacking a qgroup record in
> > the dirty extents xarray, resulting in incorrect qgroup accounting.
> >
> > Fix this by checking for errors and returning them to the callers.
> >
> > Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> > [...
> > @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> >        * insert both the head node and the new ref without dropping
> >        * the spin lock
> >        */
> > -     head_ref = add_delayed_ref_head(trans, head_ref, record,
> > -                                     action, &qrecord_inserted);
> > +     new_head_ref = add_delayed_ref_head(trans, head_ref, record,
> > +                                         action, &qrecord_inserted);
> > +     if (IS_ERR(new_head_ref)) {
> > +             spin_unlock(&delayed_refs->lock);
> > +             ret = PTR_ERR(new_head_ref);
> > +             goto free_record;
> > +     }
> > +     head_ref = new_head_ref;
> >
> There is a chance (not sure how big a chance) that head_ref is going to
> be freed twice.
>
> In the IS_ERR true path, head_ref still has the old value from before
> calling add_delayed_ref_head. However, in add_delayed_ref_head it is
> possible that head_ref is freed and replaced. If IS_ERR(new_head_ref)
> is true, the code jumps to the end of the function where (the old)
> head_ref is freed again.

No, it's not possible to have a double free.
add_delayed_ref_head() never frees the given head_ref in case it
returns an error - it's the caller's responsibility to free it.

Thanks.

>
> Is it perhaps possible to assign head_ref to the new value before doing
> the IS_ERR check?
> In other words, do this:
>          head_ref = new_head_ref;
>          if (IS_ERR(new_head_ref)) {
>                  spin_unlock(&delayed_refs->lock);
>                  ret = PTR_ERR(new_head_ref);
>                  goto free_record;
>          }
>
> --
> Kees
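
To restate the ownership rule Filipe describes: on failure,
add_delayed_ref_head() returns an ERR_PTR and hands head_ref back
untouched, so the caller's cleanup labels free it exactly once. A
condensed sketch of the caller side in add_delayed_ref() (matching the
patch below, with comments added for illustration):

	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
					    action, &qrecord_inserted);
	if (IS_ERR(new_head_ref)) {
		/*
		 * head_ref was not freed by add_delayed_ref_head() on
		 * error, so the free_record/free_head_ref labels below
		 * are the single place where it gets freed.
		 */
		spin_unlock(&delayed_refs->lock);
		ret = PTR_ERR(new_head_ref);
		goto free_record;
	}
	/* On success, head_ref may now point to an existing head. */
	head_ref = new_head_ref;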
Kees Bakker Oct. 8, 2024, 8:20 p.m. UTC | #3
On 08-10-2024 at 22:03, Filipe Manana wrote:
> On Tue, Oct 8, 2024 at 8:39 PM Kees Bakker <kees@ijzerbout.nl> wrote:
>> On 26-09-2024 at 11:33, fdmanana@kernel.org wrote:
>>> From: Filipe Manana <fdmanana@suse.com>
>>>
>>> When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
>>> if we fail to insert the qgroup record we don't error out, we ignore it.
>>> In fact we treat it as if there was no error and there was already an
>>> existing record - we don't distinguish between the cases where
>>> btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
>>> existed and we can free the given record, and the case where it returns
>>> a negative error value, meaning the insertion into the xarray that is
>>> used to track records failed.
>>>
>>> Effectively we end up ignoring that we are lacking a qgroup record in
>>> the dirty extents xarray, resulting in incorrect qgroup accounting.
>>>
>>> Fix this by checking for errors and returning them to the callers.
>>>
>>> Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>> [...
>>> @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
>>>         * insert both the head node and the new ref without dropping
>>>         * the spin lock
>>>         */
>>> -     head_ref = add_delayed_ref_head(trans, head_ref, record,
>>> -                                     action, &qrecord_inserted);
>>> +     new_head_ref = add_delayed_ref_head(trans, head_ref, record,
>>> +                                         action, &qrecord_inserted);
>>> +     if (IS_ERR(new_head_ref)) {
>>> +             spin_unlock(&delayed_refs->lock);
>>> +             ret = PTR_ERR(new_head_ref);
>>> +             goto free_record;
>>> +     }
>>> +     head_ref = new_head_ref;
>>>
>> There is a chance (not sure how big a chance) that head_ref is going to
>> be freed twice.
>>
>> In the IS_ERR true path, head_ref still has the old value from before
>> calling add_delayed_ref_head. However, in add_delayed_ref_head it is
>> possible that head_ref is freed and replaced. If IS_ERR(new_head_ref)
>> is true, the code jumps to the end of the function where (the old)
>> head_ref is freed again.
> No, it's not possible to have a double free.
> add_delayed_ref_head() never frees the given head_ref in case it
> returns an error - it's the caller's responsibility to free it.
>
> Thanks.
OK, but ... in add_delayed_ref_head() I see the following on line 881

         existing = htree_insert(&delayed_refs->href_root,
                                 &head_ref->href_node);
         if (existing) {
                 update_existing_head_ref(trans, existing, head_ref);
                 /*
                  * we've updated the existing ref, free the newly
                  * allocated ref
                  */
                 kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
                 head_ref = existing;

It's calling kmem_cache_free() with (the old) head_ref, and that free
can be repeated in add_delayed_ref().
>
>> Is it perhaps possible to assign head_ref to the new value before doing
>> the IS_ERR check?
>> In other words, do this:
>>           head_ref = new_head_ref;
>>           if (IS_ERR(new_head_ref)) {
>>                   spin_unlock(&delayed_refs->lock);
>>                   ret = PTR_ERR(new_head_ref);
>>                   goto free_record;
>>           }
>>
>> --
>> Kees
Filipe Manana Oct. 8, 2024, 8:30 p.m. UTC | #4
On Tue, Oct 8, 2024 at 9:21 PM Kees Bakker <kees@ijzerbout.nl> wrote:
>
> On 08-10-2024 at 22:03, Filipe Manana wrote:
> > On Tue, Oct 8, 2024 at 8:39 PM Kees Bakker <kees@ijzerbout.nl> wrote:
> >> On 26-09-2024 at 11:33, fdmanana@kernel.org wrote:
> >>> From: Filipe Manana <fdmanana@suse.com>
> >>>
> >>> When adding a delayed ref head, at delayed-ref.c:add_delayed_ref_head(),
> >>> if we fail to insert the qgroup record we don't error out, we ignore it.
> >>> In fact we treat it as if there was no error and there was already an
> >>> existing record - we don't distinguish between the cases where
> >>> btrfs_qgroup_trace_extent_nolock() returns 1, meaning a record already
> >>> existed and we can free the given record, and the case where it returns
> >>> a negative error value, meaning the insertion into the xarray that is
> >>> used to track records failed.
> >>>
> >>> Effectively we end up ignoring that we are lacking a qgroup record in
> >>> the dirty extents xarray, resulting in incorrect qgroup accounting.
> >>>
> >>> Fix this by checking for errors and returning them to the callers.
> >>>
> >>> Fixes: 3cce39a8ca4e ("btrfs: qgroup: use xarray to track dirty extents in transaction")
> >>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>> ---
> >>> [...
> >>> @@ -1034,8 +1047,14 @@ static int add_delayed_ref(struct btrfs_trans_handle *trans,
> >>>         * insert both the head node and the new ref without dropping
> >>>         * the spin lock
> >>>         */
> >>> -     head_ref = add_delayed_ref_head(trans, head_ref, record,
> >>> -                                     action, &qrecord_inserted);
> >>> +     new_head_ref = add_delayed_ref_head(trans, head_ref, record,
> >>> +                                         action, &qrecord_inserted);
> >>> +     if (IS_ERR(new_head_ref)) {
> >>> +             spin_unlock(&delayed_refs->lock);
> >>> +             ret = PTR_ERR(new_head_ref);
> >>> +             goto free_record;
> >>> +     }
> >>> +     head_ref = new_head_ref;
> >>>
> >> There is a chance (not sure how big a chance) that head_ref is going to
> >> be freed twice.
> >>
> >> In the IS_ERR true path, head_ref still has the old value from before
> >> calling add_delayed_ref_head. However, in add_delayed_ref_head it is
> >> possible that head_ref is freed and replaced. If IS_ERR(new_head_ref)
> >> is true, the code jumps to the end of the function where (the old)
> >> head_ref is freed again.
> > No, it's not possible to have a double free.
> > add_delayed_ref_head() never frees the given head_ref in case it
> > returns an error - it's the caller's responsibility to free it.
> >
> > Thanks.
> OK, but ... in add_delayed_ref_head() I see the following on line 881
>
>          existing = htree_insert(&delayed_refs->href_root,
>                                  &head_ref->href_node);
>          if (existing) {
>                  update_existing_head_ref(trans, existing, head_ref);
>                  /*
>                   * we've updated the existing ref, free the newly
>                   * allocated ref
>                   */
>                  kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
>                  head_ref = existing;
>
> It's calling kmem_cache_free() with (the old) head_ref, and that free
> can be repeated in add_delayed_ref().

No, it can't, because after line 881, the one which calls
kmem_cache_free(), we never return an error pointer.

Thanks.

> >
> >> Is it perhaps possible to assign head_ref to the new value before doing
> >> the IS_ERR check?
> >> In other words, do this:
> >>           head_ref = new_head_ref;
> >>           if (IS_ERR(new_head_ref)) {
> >>                   spin_unlock(&delayed_refs->lock);
> >>                   ret = PTR_ERR(new_head_ref);
> >>                   goto free_record;
> >>           }
> >>
> >> --
> >> Kees
>
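
To make the control-flow argument concrete, here is a simplified
outline of the patched add_delayed_ref_head() (illustrative, with the
surrounding code elided): the only error exit happens in the qgroup
record block, which runs before htree_insert(), so the
kmem_cache_free() on line 881 can never be followed by an ERR_PTR
return.

	/* Qgroup record handling: the only place an error is returned. */
	if (qrecord) {
		ret = btrfs_qgroup_trace_extent_nolock(fs_info, delayed_refs, qrecord);
		if (ret) {
			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
			if (ret < 0)
				return ERR_PTR(ret); /* head_ref untouched */
			kfree(qrecord);
		}
	}
	/* From here on the function always returns a valid pointer. */
	existing = htree_insert(&delayed_refs->href_root, &head_ref->href_node);
	if (existing) {
		update_existing_head_ref(trans, existing, head_ref);
		/* The given head_ref is freed, but no error return follows. */
		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
		head_ref = existing;
	}
	...
	return head_ref;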

Patch

diff --git a/fs/btrfs/delayed-ref.c b/fs/btrfs/delayed-ref.c
index ad9ef8312e41..32f719b9e661 100644
--- a/fs/btrfs/delayed-ref.c
+++ b/fs/btrfs/delayed-ref.c
@@ -840,6 +840,8 @@  static void init_delayed_ref_head(struct btrfs_delayed_ref_head *head_ref,
  * helper function to actually insert a head node into the rbtree.
  * this does all the dirty work in terms of maintaining the correct
  * overall modification count.
+ *
+ * Returns an error pointer in case of an error.
  */
 static noinline struct btrfs_delayed_ref_head *
 add_delayed_ref_head(struct btrfs_trans_handle *trans,
@@ -862,6 +864,9 @@  add_delayed_ref_head(struct btrfs_trans_handle *trans,
 		if (ret) {
 			/* Clean up if insertion fails or item exists. */
 			xa_release(&delayed_refs->dirty_extents, qrecord->bytenr);
+			/* Caller responsible for freeing qrecord on error. */
+			if (ret < 0)
+				return ERR_PTR(ret);
 			kfree(qrecord);
 		} else {
 			qrecord_inserted = true;
@@ -1000,27 +1005,35 @@  static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	struct btrfs_fs_info *fs_info = trans->fs_info;
 	struct btrfs_delayed_ref_node *node;
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_head *new_head_ref;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_qgroup_extent_record *record = NULL;
 	bool qrecord_inserted;
 	int action = generic_ref->action;
 	bool merged;
+	int ret;
 
 	node = kmem_cache_alloc(btrfs_delayed_ref_node_cachep, GFP_NOFS);
 	if (!node)
 		return -ENOMEM;
 
 	head_ref = kmem_cache_alloc(btrfs_delayed_ref_head_cachep, GFP_NOFS);
-	if (!head_ref)
+	if (!head_ref) {
+		ret = -ENOMEM;
 		goto free_node;
+	}
 
 	if (btrfs_qgroup_full_accounting(fs_info) && !generic_ref->skip_qgroup) {
 		record = kzalloc(sizeof(*record), GFP_NOFS);
-		if (!record)
+		if (!record) {
+			ret = -ENOMEM;
 			goto free_head_ref;
+		}
 		if (xa_reserve(&trans->transaction->delayed_refs.dirty_extents,
-			       generic_ref->bytenr, GFP_NOFS))
+			       generic_ref->bytenr, GFP_NOFS)) {
+			ret = -ENOMEM;
 			goto free_record;
+		}
 	}
 
 	init_delayed_ref_common(fs_info, node, generic_ref);
@@ -1034,8 +1047,14 @@  static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	 * insert both the head node and the new ref without dropping
 	 * the spin lock
 	 */
-	head_ref = add_delayed_ref_head(trans, head_ref, record,
-					action, &qrecord_inserted);
+	new_head_ref = add_delayed_ref_head(trans, head_ref, record,
+					    action, &qrecord_inserted);
+	if (IS_ERR(new_head_ref)) {
+		spin_unlock(&delayed_refs->lock);
+		ret = PTR_ERR(new_head_ref);
+		goto free_record;
+	}
+	head_ref = new_head_ref;
 
 	merged = insert_delayed_ref(trans, head_ref, node);
 	spin_unlock(&delayed_refs->lock);
@@ -1063,7 +1082,7 @@  static int add_delayed_ref(struct btrfs_trans_handle *trans,
 	kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
 free_node:
 	kmem_cache_free(btrfs_delayed_ref_node_cachep, node);
-	return -ENOMEM;
+	return ret;
 }
 
 /*
@@ -1094,6 +1113,7 @@  int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
 				struct btrfs_delayed_extent_op *extent_op)
 {
 	struct btrfs_delayed_ref_head *head_ref;
+	struct btrfs_delayed_ref_head *head_ref_ret;
 	struct btrfs_delayed_ref_root *delayed_refs;
 	struct btrfs_ref generic_ref = {
 		.type = BTRFS_REF_METADATA,
@@ -1113,11 +1133,15 @@  int btrfs_add_delayed_extent_op(struct btrfs_trans_handle *trans,
 	delayed_refs = &trans->transaction->delayed_refs;
 	spin_lock(&delayed_refs->lock);
 
-	add_delayed_ref_head(trans, head_ref, NULL, BTRFS_UPDATE_DELAYED_HEAD,
-			     NULL);
-
+	head_ref_ret = add_delayed_ref_head(trans, head_ref, NULL,
+					    BTRFS_UPDATE_DELAYED_HEAD, NULL);
 	spin_unlock(&delayed_refs->lock);
 
+	if (IS_ERR(head_ref_ret)) {
+		kmem_cache_free(btrfs_delayed_ref_head_cachep, head_ref);
+		return PTR_ERR(head_ref_ret);
+	}
+
 	/*
 	 * Need to update the delayed_refs_rsv with any changes we may have
 	 * made.