diff mbox series

btrfs: fix leak of qgroup extent records after transaction abort

Message ID 0a4d66f6922f5219c7c8c37d88a919304abdbb55.1717416325.git.fdmanana@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: fix leak of qgroup extent records after transaction abort | expand

Commit Message

Filipe Manana June 3, 2024, 12:06 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

Qgroup extent records are created when delayed ref heads are created and
then released after accounting extents at btrfs_qgroup_account_extents(),
called during the transaction commit path.

If a transaction is aborted we free the qgroup records by calling
btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(),
unless we don't have delayed references. We are incorrectly assuming
that no delayed references means we don't have qgroup extents records.

We can currently have no delayed references because we ran them all
during a transaction commit and the transaction was aborted after that
due to some error in the commit path.

So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
btrfs_destroy_delayed_refs() even if we don't have any delayed references.

Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/disk-io.c | 10 +---------
 1 file changed, 1 insertion(+), 9 deletions(-)

Comments

Qu Wenruo June 3, 2024, 10:58 p.m. UTC | #1
在 2024/6/3 21:36, fdmanana@kernel.org 写道:
> From: Filipe Manana <fdmanana@suse.com>
>
> Qgroup extent records are created when delayed ref heads are created and
> then released after accounting extents at btrfs_qgroup_account_extents(),
> called during the transaction commit path.
>
> If a transaction is aborted we free the qgroup records by calling
> btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(),
> unless we don't have delayed references. We are incorrectly assuming
> that no delayed references means we don't have qgroup extents records.
>
> We can currently have no delayed references because we ran them all
> during a transaction commit and the transaction was aborted after that
> due to some error in the commit path.
>
> So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
> btrfs_destroy_delayed_refs() even if we don't have any delayed references.

Will it cause some underflow for delayed_refs->num_entries?

As in the rb tree iteration code, we would try to decrease
delayed_refs->num_entries again.

Thanks,
Qu
>
> Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
> Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/disk-io.c | 10 +---------
>   1 file changed, 1 insertion(+), 9 deletions(-)
>
> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> index 8693893744a0..b1daaaec0614 100644
> --- a/fs/btrfs/disk-io.c
> +++ b/fs/btrfs/disk-io.c
> @@ -4522,18 +4522,10 @@ static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
>   				       struct btrfs_fs_info *fs_info)
>   {
>   	struct rb_node *node;
> -	struct btrfs_delayed_ref_root *delayed_refs;
> +	struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs;
>   	struct btrfs_delayed_ref_node *ref;
>
> -	delayed_refs = &trans->delayed_refs;
> -
>   	spin_lock(&delayed_refs->lock);
> -	if (atomic_read(&delayed_refs->num_entries) == 0) {
> -		spin_unlock(&delayed_refs->lock);
> -		btrfs_debug(fs_info, "delayed_refs has NO entry");
> -		return;
> -	}
> -
>   	while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
>   		struct btrfs_delayed_ref_head *head;
>   		struct rb_node *n;
Filipe Manana June 3, 2024, 11:09 p.m. UTC | #2
On Mon, Jun 3, 2024 at 11:58 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> 在 2024/6/3 21:36, fdmanana@kernel.org 写道:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > Qgroup extent records are created when delayed ref heads are created and
> > then released after accounting extents at btrfs_qgroup_account_extents(),
> > called during the transaction commit path.
> >
> > If a transaction is aborted we free the qgroup records by calling
> > btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(),
> > unless we don't have delayed references. We are incorrectly assuming
> > that no delayed references means we don't have qgroup extents records.
> >
> > We can currently have no delayed references because we ran them all
> > during a transaction commit and the transaction was aborted after that
> > due to some error in the commit path.
> >
> > So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
> > btrfs_destroy_delayed_refs() even if we don't have any delayed references.
>
> Will it cause some underflow for delayed_refs->num_entries?
>
> As in the rb tree iteration code, we would try to decrease
> delayed_refs->num_entries again.

What underflow, where?

btrfs_qgroup_destroy_extent_records() doesn't do anything to the
counter (or delayed refs).

Or are you seeing that delayed_refs->num_entries can be 0 while the
delayed_refs->href_root rb tree is not empty?
How is that possible?

>
> Thanks,
> Qu
> >
> > Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
> > Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   fs/btrfs/disk-io.c | 10 +---------
> >   1 file changed, 1 insertion(+), 9 deletions(-)
> >
> > diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> > index 8693893744a0..b1daaaec0614 100644
> > --- a/fs/btrfs/disk-io.c
> > +++ b/fs/btrfs/disk-io.c
> > @@ -4522,18 +4522,10 @@ static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
> >                                      struct btrfs_fs_info *fs_info)
> >   {
> >       struct rb_node *node;
> > -     struct btrfs_delayed_ref_root *delayed_refs;
> > +     struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs;
> >       struct btrfs_delayed_ref_node *ref;
> >
> > -     delayed_refs = &trans->delayed_refs;
> > -
> >       spin_lock(&delayed_refs->lock);
> > -     if (atomic_read(&delayed_refs->num_entries) == 0) {
> > -             spin_unlock(&delayed_refs->lock);
> > -             btrfs_debug(fs_info, "delayed_refs has NO entry");
> > -             return;
> > -     }
> > -
> >       while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
> >               struct btrfs_delayed_ref_head *head;
> >               struct rb_node *n;
Qu Wenruo June 3, 2024, 11:21 p.m. UTC | #3
在 2024/6/4 08:39, Filipe Manana 写道:
> On Mon, Jun 3, 2024 at 11:58 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
[...]
>>> We can currently have no delayed references because we ran them all
>>> during a transaction commit and the transaction was aborted after that
>>> due to some error in the commit path.
>>>
>>> So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
>>> btrfs_destroy_delayed_refs() even if we don't have any delayed references.
>>
>> Will it cause some underflow for delayed_refs->num_entries?
>>
>> As in the rb tree iteration code, we would try to decrease
>> delayed_refs->num_entries again.
>
> What underflow, where?
>
> btrfs_qgroup_destroy_extent_records() doesn't do anything to the
> counter (or delayed refs).
>
> Or are you seeing that delayed_refs->num_entries can be 0 while the
> delayed_refs->href_root rb tree is not empty?
> How is that possible?

Never mind, I was originally referring to the "atomic_dec()" call inside
"while ((n = rb_first_cached())" loop.

But btrfs_run_delayed_refs_for_head() has ensured the entry is properly
removed from ref_tree before decreasing the "delayed_refs->num_entries",
it should be safe.

I'd prefer to call btrfs_qgroup_destory_extent_records() inside the "if
(atomic_read() == 0)" branch to be a little more easier to read.
But it's only a preference.

Thanks,
Qu

>
>>
>> Thanks,
>> Qu
>>>
>>> Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
>>> Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>> ---
>>>    fs/btrfs/disk-io.c | 10 +---------
>>>    1 file changed, 1 insertion(+), 9 deletions(-)
>>>
>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>> index 8693893744a0..b1daaaec0614 100644
>>> --- a/fs/btrfs/disk-io.c
>>> +++ b/fs/btrfs/disk-io.c
>>> @@ -4522,18 +4522,10 @@ static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
>>>                                       struct btrfs_fs_info *fs_info)
>>>    {
>>>        struct rb_node *node;
>>> -     struct btrfs_delayed_ref_root *delayed_refs;
>>> +     struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs;
>>>        struct btrfs_delayed_ref_node *ref;
>>>
>>> -     delayed_refs = &trans->delayed_refs;
>>> -
>>>        spin_lock(&delayed_refs->lock);
>>> -     if (atomic_read(&delayed_refs->num_entries) == 0) {
>>> -             spin_unlock(&delayed_refs->lock);
>>> -             btrfs_debug(fs_info, "delayed_refs has NO entry");
>>> -             return;
>>> -     }
>>> -
>>>        while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
>>>                struct btrfs_delayed_ref_head *head;
>>>                struct rb_node *n;
Filipe Manana June 3, 2024, 11:28 p.m. UTC | #4
On Tue, Jun 4, 2024 at 12:21 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
>
>
>
> 在 2024/6/4 08:39, Filipe Manana 写道:
> > On Mon, Jun 3, 2024 at 11:58 PM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
> [...]
> >>> We can currently have no delayed references because we ran them all
> >>> during a transaction commit and the transaction was aborted after that
> >>> due to some error in the commit path.
> >>>
> >>> So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
> >>> btrfs_destroy_delayed_refs() even if we don't have any delayed references.
> >>
> >> Will it cause some underflow for delayed_refs->num_entries?
> >>
> >> As in the rb tree iteration code, we would try to decrease
> >> delayed_refs->num_entries again.
> >
> > What underflow, where?
> >
> > btrfs_qgroup_destroy_extent_records() doesn't do anything to the
> > counter (or delayed refs).
> >
> > Or are you seeing that delayed_refs->num_entries can be 0 while the
> > delayed_refs->href_root rb tree is not empty?
> > How is that possible?
>
> Never mind, I was originally referring to the "atomic_dec()" call inside
> "while ((n = rb_first_cached())" loop.
>
> But btrfs_run_delayed_refs_for_head() has ensured the entry is properly
> removed from ref_tree before decreasing the "delayed_refs->num_entries",
> it should be safe.
>
> I'd prefer to call btrfs_qgroup_destory_extent_records() inside the "if
> (atomic_read() == 0)" branch to be a little more easier to read.
> But it's only a preference.

Maybe it makes the diff more immediate to understand.
But the final code is certainly less verbose and clear enough:

while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
    (...)
}
btrfs_qgroup_destroy_extent_records(trans);

As opposed to:

if (atomic_read(&delayed_refs->num_entries) == 0) {
    btrfs_qgroup_destroy_extent_records(trans);
    spin_unlock(&delayed_refs->lock);
    return;
}

while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
    (...)
}
btrfs_qgroup_destroy_extent_records(trans);


More code for absolutely nothing, not even better readability.


>
> Thanks,
> Qu
>
> >
> >>
> >> Thanks,
> >> Qu
> >>>
> >>> Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
> >>> Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
> >>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> >>> ---
> >>>    fs/btrfs/disk-io.c | 10 +---------
> >>>    1 file changed, 1 insertion(+), 9 deletions(-)
> >>>
> >>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
> >>> index 8693893744a0..b1daaaec0614 100644
> >>> --- a/fs/btrfs/disk-io.c
> >>> +++ b/fs/btrfs/disk-io.c
> >>> @@ -4522,18 +4522,10 @@ static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
> >>>                                       struct btrfs_fs_info *fs_info)
> >>>    {
> >>>        struct rb_node *node;
> >>> -     struct btrfs_delayed_ref_root *delayed_refs;
> >>> +     struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs;
> >>>        struct btrfs_delayed_ref_node *ref;
> >>>
> >>> -     delayed_refs = &trans->delayed_refs;
> >>> -
> >>>        spin_lock(&delayed_refs->lock);
> >>> -     if (atomic_read(&delayed_refs->num_entries) == 0) {
> >>> -             spin_unlock(&delayed_refs->lock);
> >>> -             btrfs_debug(fs_info, "delayed_refs has NO entry");
> >>> -             return;
> >>> -     }
> >>> -
> >>>        while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
> >>>                struct btrfs_delayed_ref_head *head;
> >>>                struct rb_node *n;
Josef Bacik June 4, 2024, 4:24 p.m. UTC | #5
On Mon, Jun 03, 2024 at 01:06:13PM +0100, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> Qgroup extent records are created when delayed ref heads are created and
> then released after accounting extents at btrfs_qgroup_account_extents(),
> called during the transaction commit path.
> 
> If a transaction is aborted we free the qgroup records by calling
> btrfs_qgroup_destroy_extent_records() at btrfs_destroy_delayed_refs(),
> unless we don't have delayed references. We are incorrectly assuming
> that no delayed references means we don't have qgroup extents records.
> 
> We can currently have no delayed references because we ran them all
> during a transaction commit and the transaction was aborted after that
> due to some error in the commit path.
> 
> So fix this by ensuring we btrfs_qgroup_destroy_extent_records() at
> btrfs_destroy_delayed_refs() even if we don't have any delayed references.
> 
> Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
> Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
> Signed-off-by: Filipe Manana <fdmanana@suse.com>

Reviewed-by: Josef Bacik <josef@toxicpanda.com>

Thanks,

Josef
Qu Wenruo June 4, 2024, 10:38 p.m. UTC | #6
在 2024/6/4 08:58, Filipe Manana 写道:
> On Tue, Jun 4, 2024 at 12:21 AM Qu Wenruo <quwenruo.btrfs@gmx.com> wrote:
[...]
>>
>> I'd prefer to call btrfs_qgroup_destory_extent_records() inside the "if
>> (atomic_read() == 0)" branch to be a little more easier to read.
>> But it's only a preference.
>
> Maybe it makes the diff more immediate to understand.
> But the final code is certainly less verbose and clear enough:
>
> while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
>      (...)
> }
> btrfs_qgroup_destroy_extent_records(trans);
>
> As opposed to:
>
> if (atomic_read(&delayed_refs->num_entries) == 0) {
>      btrfs_qgroup_destroy_extent_records(trans);
>      spin_unlock(&delayed_refs->lock);
>      return;
> }
>
> while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
>      (...)
> }
> btrfs_qgroup_destroy_extent_records(trans);
>
>
> More code for absolutely nothing, not even better readability.

It looks like I'm still tending to do empty check, other than let the
loop to  handle empty cases.

And indeed the extra empty check is only going to bloat the code without
much need, and I should change my tendency.

Forgot to give a reviewed by tag:

Reviewed-by: Qu Wenruo <wqu@suse.com>

Thanks,
Qu
>
>
>>
>> Thanks,
>> Qu
>>
>>>
>>>>
>>>> Thanks,
>>>> Qu
>>>>>
>>>>> Reported-by: syzbot+0fecc032fa134afd49df@syzkaller.appspotmail.com
>>>>> Link: https://lore.kernel.org/linux-btrfs/0000000000004e7f980619f91835@google.com/
>>>>> Signed-off-by: Filipe Manana <fdmanana@suse.com>
>>>>> ---
>>>>>     fs/btrfs/disk-io.c | 10 +---------
>>>>>     1 file changed, 1 insertion(+), 9 deletions(-)
>>>>>
>>>>> diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
>>>>> index 8693893744a0..b1daaaec0614 100644
>>>>> --- a/fs/btrfs/disk-io.c
>>>>> +++ b/fs/btrfs/disk-io.c
>>>>> @@ -4522,18 +4522,10 @@ static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
>>>>>                                        struct btrfs_fs_info *fs_info)
>>>>>     {
>>>>>         struct rb_node *node;
>>>>> -     struct btrfs_delayed_ref_root *delayed_refs;
>>>>> +     struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs;
>>>>>         struct btrfs_delayed_ref_node *ref;
>>>>>
>>>>> -     delayed_refs = &trans->delayed_refs;
>>>>> -
>>>>>         spin_lock(&delayed_refs->lock);
>>>>> -     if (atomic_read(&delayed_refs->num_entries) == 0) {
>>>>> -             spin_unlock(&delayed_refs->lock);
>>>>> -             btrfs_debug(fs_info, "delayed_refs has NO entry");
>>>>> -             return;
>>>>> -     }
>>>>> -
>>>>>         while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
>>>>>                 struct btrfs_delayed_ref_head *head;
>>>>>                 struct rb_node *n;
>
diff mbox series

Patch

diff --git a/fs/btrfs/disk-io.c b/fs/btrfs/disk-io.c
index 8693893744a0..b1daaaec0614 100644
--- a/fs/btrfs/disk-io.c
+++ b/fs/btrfs/disk-io.c
@@ -4522,18 +4522,10 @@  static void btrfs_destroy_delayed_refs(struct btrfs_transaction *trans,
 				       struct btrfs_fs_info *fs_info)
 {
 	struct rb_node *node;
-	struct btrfs_delayed_ref_root *delayed_refs;
+	struct btrfs_delayed_ref_root *delayed_refs = &trans->delayed_refs;
 	struct btrfs_delayed_ref_node *ref;
 
-	delayed_refs = &trans->delayed_refs;
-
 	spin_lock(&delayed_refs->lock);
-	if (atomic_read(&delayed_refs->num_entries) == 0) {
-		spin_unlock(&delayed_refs->lock);
-		btrfs_debug(fs_info, "delayed_refs has NO entry");
-		return;
-	}
-
 	while ((node = rb_first_cached(&delayed_refs->href_root)) != NULL) {
 		struct btrfs_delayed_ref_head *head;
 		struct rb_node *n;