
[v2,00/21] btrfs: updates to delayed refs accounting and space reservation

Message ID cover.1694192469.git.fdmanana@suse.com
Series btrfs: updates to delayed refs accounting and space reservation

Message

Filipe Manana Sept. 8, 2023, 5:20 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

The following are some fixes, improvements and cleanups around delayed refs.
They are mostly about space accounting and reservation, and were motivated by
a case hit by a SLE (SUSE Linux Enterprise) user where a filesystem became
unmountable and unusable because a RW mount failed with -ENOSPC when attempting
to do any orphan cleanup. The problem was that the device had no available space
for allocating new block groups and the available metadata space was about
1.5M. That was enough to start a transaction but too little to commit any
transaction, because during the transaction commit we need to COW more than we
accounted for when starting the transaction (running delayed refs generates more
delayed refs to update the extent tree, for example). Starting any transaction
there, whether to do orphan cleanup, attempt to reclaim data block groups,
unlink, etc, always failed during the transaction commit and resulted in
transaction aborts.
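As a rough illustration of why a transaction could start but never commit, here is a toy model in C. It is not btrfs code. `TOY_NODESIZE` and the `toy_*` helpers are hypothetical. The cascade shape is also an assumption: each delayed ref that is run queues `fanout` new ones for a few rounds. It was chosen only to show the gap between what is checked at transaction start and what the commit actually consumes.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical toy model, not btrfs code: the names, the block size and
 * the cascade shape are illustrative assumptions.
 */
#define TOY_NODESIZE (16u * 1024u)   /* assumed metadata block size */

/* Starting a transaction only checks the bytes the caller asked for. */
static bool toy_start_ok(uint64_t free_bytes, uint64_t reserve)
{
    return reserve <= free_bytes;
}

/*
 * Bytes a commit needs when every delayed ref COWs one block and, as an
 * assumed simplification, each ref that is run queues 'fanout' new refs
 * (e.g. extent tree updates for the COWed blocks) for 'rounds' more rounds.
 */
static uint64_t toy_commit_cost(uint64_t refs, uint64_t fanout, unsigned rounds)
{
    uint64_t total = 0;

    assert(fanout >= 1);
    for (unsigned r = 0; r <= rounds; r++) {
        total += refs * TOY_NODESIZE;
        refs *= fanout;
    }
    return total;
}

/* The commit only finishes if its real cost fits in the free space. */
static bool toy_commit_ok(uint64_t free_bytes, uint64_t refs,
                          uint64_t fanout, unsigned rounds)
{
    return toy_commit_cost(refs, fanout, rounds) <= free_bytes;
}
```

In this model, with 1.5M free, `toy_start_ok(1572864, TOY_NODESIZE)` succeeds. By contrast, `toy_commit_ok(1572864, 1, 2, 6)` fails because the modelled commit needs 127 blocks (about 2M).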

We have some cases where we use and abuse the global block reserve because we
don't reserve enough space when starting a transaction or don't account delayed
refs properly. This can lead to exhaustion of metadata space when there is no
more unallocated space to allocate a new metadata block group.
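Below is a similarly hypothetical sketch of the global block reserve problem. Whatever a transaction's own reserve cannot cover is taken from the global reserve. Once that runs dry and no unallocated space is left for a new metadata block group, the operation fails with -ENOSPC. `struct toy_rsv` and `toy_use_bytes()` are illustrative names, not the real btrfs block reserve code.

```c
#include <errno.h>
#include <stdint.h>

/* Hypothetical toy reserve, not the btrfs block reserve structure. */
struct toy_rsv {
    uint64_t reserved;   /* bytes currently held by this reserve */
};

/*
 * Consume 'needed' bytes on behalf of a transaction. The part the
 * transaction's own reserve cannot cover is taken from the global reserve.
 * The model assumes no unallocated space is left for a new metadata block
 * group (as in the reported case), so if the global reserve cannot cover
 * the shortfall we fail with -ENOSPC and consume nothing.
 */
static int toy_use_bytes(struct toy_rsv *trans, struct toy_rsv *global,
                         uint64_t needed)
{
    uint64_t from_trans = needed < trans->reserved ? needed : trans->reserved;
    uint64_t shortfall = needed - from_trans;

    if (shortfall > global->reserved)
        return -ENOSPC;

    trans->reserved -= from_trans;
    global->reserved -= shortfall;   /* the "abuse" of the global reserve */
    return 0;
}
```

In this model, reserving space for delayed refs when starting the transaction amounts to a larger `trans->reserved`. Usage is then fully covered and the global reserve is left untouched.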

More details are in the individual changelogs.

There are more cases to address that depend on this patchset, but they will be
sent later and separately.

Filipe Manana (21):
  btrfs: fix race when refilling delayed refs block reserve
  btrfs: prevent transaction block reserve underflow when starting transaction
  btrfs: pass a space_info argument to btrfs_reserve_metadata_bytes()
  btrfs: remove unnecessary logic when running new delayed references
  btrfs: remove the refcount warning/check at btrfs_put_delayed_ref()
  btrfs: return -EUCLEAN for delayed tree ref with a ref count not equals to 1
  btrfs: remove redundant BUG_ON() from __btrfs_inc_extent_ref()
  btrfs: remove refs_to_add argument from __btrfs_inc_extent_ref()
  btrfs: remove refs_to_drop argument from __btrfs_free_extent()
  btrfs: initialize key where it's used when running delayed data ref
  btrfs: remove pointless 'ref_root' variable from run_delayed_data_ref()
  btrfs: log message if extent item not found when running delayed extent op
  btrfs: use a single variable for return value at run_delayed_extent_op()
  btrfs: use a single variable for return value at lookup_inline_extent_backref()
  btrfs: return -EUCLEAN if extent item is missing when searching inline backref
  btrfs: simplify check for extent item overrun at lookup_inline_extent_backref()
  btrfs: allow to run delayed refs by bytes to be released instead of count
  btrfs: reserve space for delayed refs on a per ref basis
  btrfs: remove pointless initialization at btrfs_delayed_refs_rsv_release()
  btrfs: stop doing excessive space reservation for csum deletion
  btrfs: always reserve space for delayed refs when starting transaction

 fs/btrfs/block-group.c    |  11 +-
 fs/btrfs/block-rsv.c      |  18 ++--
 fs/btrfs/delalloc-space.c |   3 +-
 fs/btrfs/delayed-ref.c    | 132 +++++++++++++++++-------
 fs/btrfs/delayed-ref.h    |  15 ++-
 fs/btrfs/disk-io.c        |   3 +-
 fs/btrfs/extent-tree.c    | 208 +++++++++++++++++++-------------------
 fs/btrfs/extent-tree.h    |   4 +-
 fs/btrfs/space-info.c     |  29 ++----
 fs/btrfs/space-info.h     |   2 +-
 fs/btrfs/transaction.c    | 143 ++++++++++++++++++++------
 fs/btrfs/transaction.h    |   3 +
 12 files changed, 357 insertions(+), 214 deletions(-)

Comments

David Sterba Sept. 11, 2023, 5:20 p.m. UTC
On Fri, Sep 08, 2023 at 06:20:17PM +0100, fdmanana@kernel.org wrote:

Added to misc-next, thanks.