mbox series

[00/10] btrfs: backref cache cleanups

Message ID cover.1727970062.git.josef@toxicpanda.com (mailing list archive)
Headers show
Series btrfs: backref cache cleanups | expand

Message

Josef Bacik Oct. 3, 2024, 3:43 p.m. UTC
Hello,

This is the followup to the relocation fix that I sent out earlier.  This series
cleans up a lot of the complicated things that exist in backref cache because we
were keeping track of changes to the file system during relocation.  Now that we
do not do this we can simplify a lot of the code and make it easier to
understand.  I've tested this with the horror show of a relocation test I was
using to trigger the original problem.  I'm running fstests now via the CI, but
this seems solid.  Hopefully this makes the relocation code a bit easier to
understand.  Thanks,

Josef

Josef Bacik (10):
  btrfs: convert BUG_ON in btrfs_reloc_cow_block to proper error
    handling
  btrfs: remove the changed list for backref cache
  btrfs: add a comment for new_bytenr in bacref_cache_node
  btrfs: cleanup select_reloc_root
  btrfs: remove clone_backref_node
  btrfs: don't build backref tree for cowonly blocks
  btrfs: do not handle non-shareable roots in backref cache
  btrfs: simplify btrfs_backref_release_cache
  btrfs: remove the ->lowest and ->leaves members from backref cache
  btrfs: remove detached list from btrfs_backref_cache

 fs/btrfs/backref.c    | 105 +++++-------
 fs/btrfs/backref.h    |  16 +-
 fs/btrfs/relocation.c | 362 +++++++++++++++++-------------------------
 3 files changed, 192 insertions(+), 291 deletions(-)

Comments

Boris Burkov Oct. 3, 2024, 9:39 p.m. UTC | #1
On Thu, Oct 03, 2024 at 11:43:02AM -0400, Josef Bacik wrote:
> Hello,
> 
> This is the followup to the relocation fix that I sent out earlier.  This series
> cleans up a lot of the complicated things that exist in backref cache because we
> were keeping track of changes to the file system during relocation.  Now that we
> do not do this we can simplify a lot of the code and make it easier to
> understand.  I've tested this with the horror show of a relocation test I was
> using to trigger the original problem.  I'm running fstests now via the CI, but
> this seems solid.  Hopefully this makes the relocation code a bit easier to
> understand.  Thanks,
> 
> Josef

Sent a few minor comments inline, but all the patches look good to me.
The only ones I would say I didn't deeply understand were 6 and 7 for
the cowonly changes, in case you feel you need a closer look at those.

One high level request (also raised inline on one of them) is just to
add more high level documentation of the functions and general
algorithm design, while it's fresh. The doc covering the backref
cache data structure itself that we have in our dev doc repo is great,
but doesn't do enough to explain the actual relocation operations
carried out with the help of the backref cache.

With that said, thanks for following up (!!) and you can add

Reviewed-by: Boris Burkov <boris@bur.io>

> 
> Josef Bacik (10):
>   btrfs: convert BUG_ON in btrfs_reloc_cow_block to proper error
>     handling
>   btrfs: remove the changed list for backref cache
>   btrfs: add a comment for new_bytenr in bacref_cache_node
>   btrfs: cleanup select_reloc_root
>   btrfs: remove clone_backref_node
>   btrfs: don't build backref tree for cowonly blocks
>   btrfs: do not handle non-shareable roots in backref cache
>   btrfs: simplify btrfs_backref_release_cache
>   btrfs: remove the ->lowest and ->leaves members from backref cache
>   btrfs: remove detached list from btrfs_backref_cache
> 
>  fs/btrfs/backref.c    | 105 +++++-------
>  fs/btrfs/backref.h    |  16 +-
>  fs/btrfs/relocation.c | 362 +++++++++++++++++-------------------------
>  3 files changed, 192 insertions(+), 291 deletions(-)
> 
> -- 
> 2.43.0
>
David Sterba Dec. 6, 2024, 7:38 p.m. UTC | #2
On Thu, Oct 03, 2024 at 11:43:02AM -0400, Josef Bacik wrote:
> Hello,
> 
> This is the followup to the relocation fix that I sent out earlier.  This series
> cleans up a lot of the complicated things that exist in backref cache because we
> were keeping track of changes to the file system during relocation.  Now that we
> do not do this we can simplify a lot of the code and make it easier to
> understand.  I've tested this with the horror show of a relocation test I was
> using to trigger the original problem.  I'm running fstests now via the CI, but
> this seems solid.  Hopefully this makes the relocation code a bit easier to
> understand.  Thanks,
> 
> Josef
> 
> Josef Bacik (10):
>   btrfs: convert BUG_ON in btrfs_reloc_cow_block to proper error
>     handling
>   btrfs: remove the changed list for backref cache
>   btrfs: add a comment for new_bytenr in bacref_cache_node
>   btrfs: cleanup select_reloc_root
>   btrfs: remove clone_backref_node
>   btrfs: don't build backref tree for cowonly blocks
>   btrfs: do not handle non-shareable roots in backref cache
>   btrfs: simplify btrfs_backref_release_cache
>   btrfs: remove the ->lowest and ->leaves members from backref cache
>   btrfs: remove detached list from btrfs_backref_cache

The patches have been in misc-next as I've been expecting an update. We
want the cleanups so I've applied the series to for-next. I've removed
th ASSERT(0) callls, we need proper macros/functions in case you really
want to see something fail during development. As the errors are EUCLEAN
they'll bubble up eventually with some noisy message so I hope we're not
losing much.
Josef Bacik Dec. 9, 2024, 2:01 p.m. UTC | #3
On Fri, Dec 06, 2024 at 08:38:54PM +0100, David Sterba wrote:
> On Thu, Oct 03, 2024 at 11:43:02AM -0400, Josef Bacik wrote:
> > Hello,
> > 
> > This is the followup to the relocation fix that I sent out earlier.  This series
> > cleans up a lot of the complicated things that exist in backref cache because we
> > were keeping track of changes to the file system during relocation.  Now that we
> > do not do this we can simplify a lot of the code and make it easier to
> > understand.  I've tested this with the horror show of a relocation test I was
> > using to trigger the original problem.  I'm running fstests now via the CI, but
> > this seems solid.  Hopefully this makes the relocation code a bit easier to
> > understand.  Thanks,
> > 
> > Josef
> > 
> > Josef Bacik (10):
> >   btrfs: convert BUG_ON in btrfs_reloc_cow_block to proper error
> >     handling
> >   btrfs: remove the changed list for backref cache
> >   btrfs: add a comment for new_bytenr in bacref_cache_node
> >   btrfs: cleanup select_reloc_root
> >   btrfs: remove clone_backref_node
> >   btrfs: don't build backref tree for cowonly blocks
> >   btrfs: do not handle non-shareable roots in backref cache
> >   btrfs: simplify btrfs_backref_release_cache
> >   btrfs: remove the ->lowest and ->leaves members from backref cache
> >   btrfs: remove detached list from btrfs_backref_cache
> 
> The patches have been in misc-next as I've been expecting an update. We
> want the cleanups so I've applied the series to for-next. I've removed
> th ASSERT(0) callls, we need proper macros/functions in case you really
> want to see something fail during development. As the errors are EUCLEAN
> they'll bubble up eventually with some noisy message so I hope we're not
> losing much.

Sorry Dave, I let this one fall through the cracks, thanks for picking it up for
me.

As for replacing ASSERT(0) I agree this is a blunt tool.  Maybe we could have a
macro that we put around EUCLEAN, at least in these cases where it also
indicates we might have broken something?  Something like

#ifdef CONFIG_BTRFS_DEBUG
#define BTRRFS_EUCLEAN ({ WARN_ON(1); -EUCLEAN; })
#else
#define BTRFS_EUCLEAN -EUCLEAN
#endif

And then we can just use BTRFS_EUCLEAN to replace the ASSERT(0) calls.  Thanks,

Josef
David Sterba Dec. 10, 2024, 11:24 p.m. UTC | #4
On Mon, Dec 09, 2024 at 09:01:14AM -0500, Josef Bacik wrote:
> On Fri, Dec 06, 2024 at 08:38:54PM +0100, David Sterba wrote:
> > On Thu, Oct 03, 2024 at 11:43:02AM -0400, Josef Bacik wrote:
> > > Hello,
> > > 
> > > This is the followup to the relocation fix that I sent out earlier.  This series
> > > cleans up a lot of the complicated things that exist in backref cache because we
> > > were keeping track of changes to the file system during relocation.  Now that we
> > > do not do this we can simplify a lot of the code and make it easier to
> > > understand.  I've tested this with the horror show of a relocation test I was
> > > using to trigger the original problem.  I'm running fstests now via the CI, but
> > > this seems solid.  Hopefully this makes the relocation code a bit easier to
> > > understand.  Thanks,
> > > 
> > > Josef
> > > 
> > > Josef Bacik (10):
> > >   btrfs: convert BUG_ON in btrfs_reloc_cow_block to proper error
> > >     handling
> > >   btrfs: remove the changed list for backref cache
> > >   btrfs: add a comment for new_bytenr in bacref_cache_node
> > >   btrfs: cleanup select_reloc_root
> > >   btrfs: remove clone_backref_node
> > >   btrfs: don't build backref tree for cowonly blocks
> > >   btrfs: do not handle non-shareable roots in backref cache
> > >   btrfs: simplify btrfs_backref_release_cache
> > >   btrfs: remove the ->lowest and ->leaves members from backref cache
> > >   btrfs: remove detached list from btrfs_backref_cache
> > 
> > The patches have been in misc-next as I've been expecting an update. We
> > want the cleanups so I've applied the series to for-next. I've removed
> > th ASSERT(0) callls, we need proper macros/functions in case you really
> > want to see something fail during development. As the errors are EUCLEAN
> > they'll bubble up eventually with some noisy message so I hope we're not
> > losing much.
> 
> Sorry Dave, I let this one fall through the cracks, thanks for picking it up for
> me.
> 
> As for replacing ASSERT(0) I agree this is a blunt tool.  Maybe we could have a
> macro that we put around EUCLEAN, at least in these cases where it also
> indicates we might have broken something?  Something like
> 
> #ifdef CONFIG_BTRFS_DEBUG
> #define BTRRFS_EUCLEAN ({ WARN_ON(1); -EUCLEAN; })
> #else
> #define BTRFS_EUCLEAN -EUCLEAN
> #endif
> 
> And then we can just use BTRFS_EUCLEAN to replace the ASSERT(0) calls.  Thanks,

Something like that, though I'd prefer a standalone macro and name it
like WARN_DEBUG or WARN_EUCLEAN for this particular error, not to
override an error code. We don't want to print a stack for each instance
of EUCLEAN, this is meant to be selective for new code or something
you'd really like to debug or know about once it happens.