diff mbox series

[1/2] btrfs: fix extent map merging not happening for adjacent extents

Message ID 9243b672972756682e44c7e69a696c9cc08181ff.1730220532.git.fdmanana@suse.com (mailing list archive)
State New, archived
Headers show
Series btrfs: a couple fixes for extent map merging and defrag | expand

Commit Message

Filipe Manana Oct. 29, 2024, 5:22 p.m. UTC
From: Filipe Manana <fdmanana@suse.com>

If we have 3 or more adjacent extents in a file, that is, consecutive file
extent items pointing to adjacent extents, within a contiguous file range
and compatible flags, we end up not merging all the extents into a single
extent map.

For example:

  $ mkfs.btrfs -f /dev/sdc
  $ mount /dev/sdc /mnt/sdc

  $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \
                 -c "pwrite -b 64K 64K 64K" \
                 -c "pwrite -b 64K 128K 64K" \
                 -c "pwrite -b 64K 192K 64K" \
                 /mnt/sdc/foo

After all the ordered extents complete we unpin the extent maps and try
to merge them, but instead of getting a single extent map we get two
because:

1) When the first ordered extent completes (file range [0, 64K)) we
   unpin its extent map and attempt to merge it with the extent map for
   the range [64K, 128K), but we can't because that extent map is still
   pinned;

2) When the second ordered extent completes (file range [64K, 128K)), we
   unpin its extent map and merge it with the previous extent map, for
   file range [0, 64K), but we can't merge with the next extent map, for
   the file range [128K, 192K), because this one is still pinned.

   The merged extent map for the file range [0, 128K) gets the flag
   EXTENT_MAP_MERGED set;

3) When the third ordered extent completes (file range [128K, 192K)), we
   unpin its exent map and attempt to merge it with the previous extent
   map, for file range [0, 128K), but we can't because that extent map
   has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false
   due to different flags) while the extent map for the range [128K, 192K)
   doesn't have that flag set.

   We also can't merge it with the next extent map, for file range
   [192K, 256K), because that one is still pinned.

   At this moment we have 3 extent maps:

   One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
   One for file range [128K, 192K).
   One for file range [192K, 256K) which is still pinned;

4) When the fourth and final extent completes (file range [192K, 256K)),
   we unpin its extent map and attempt to merge it with the previous
   extent map, for file range [128K, 192K), which succeeds since none
   of these extent maps have the EXTENT_MAP_MERGED flag set.

   So we end up with 2 extent maps:

   One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
   One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set.

   Since after merging extent maps we don't attempt to merge again, that
   is, merge the resulting extent map with the one that is now preceding
   it (and the one following it), we end up with those two extent maps,
   when we could have had a single extent map to represent the whole file.

Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag.
While this doesn't present any functional issue, it prevents the merging
of extent maps which allows to save memory, and can make defrag not
merging extents too (that will be addressed in the next patch).

Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check")
Signed-off-by: Filipe Manana <fdmanana@suse.com>
---
 fs/btrfs/extent_map.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Anand Jain Oct. 30, 2024, 12:47 a.m. UTC | #1
On 30/10/24 01:22, fdmanana@kernel.org wrote:
> From: Filipe Manana <fdmanana@suse.com>
> 
> If we have 3 or more adjacent extents in a file, that is, consecutive file
> extent items pointing to adjacent extents, within a contiguous file range
> and compatible flags, we end up not merging all the extents into a single
> extent map.
> 
> For example:
> 
>    $ mkfs.btrfs -f /dev/sdc
>    $ mount /dev/sdc /mnt/sdc
> 
>    $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \
>                   -c "pwrite -b 64K 64K 64K" \
>                   -c "pwrite -b 64K 128K 64K" \
>                   -c "pwrite -b 64K 192K 64K" \
>                   /mnt/sdc/foo
> 
> After all the ordered extents complete we unpin the extent maps and try
> to merge them, but instead of getting a single extent map we get two
> because:
> 
> 1) When the first ordered extent completes (file range [0, 64K)) we
>     unpin its extent map and attempt to merge it with the extent map for
>     the range [64K, 128K), but we can't because that extent map is still
>     pinned;
> 
> 2) When the second ordered extent completes (file range [64K, 128K)), we
>     unpin its extent map and merge it with the previous extent map, for
>     file range [0, 64K), but we can't merge with the next extent map, for
>     the file range [128K, 192K), because this one is still pinned.
> 
>     The merged extent map for the file range [0, 128K) gets the flag
>     EXTENT_MAP_MERGED set;
> 
> 3) When the third ordered extent completes (file range [128K, 192K)), we
>     unpin its exent map and attempt to merge it with the previous extent
>     map, for file range [0, 128K), but we can't because that extent map
>     has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false
>     due to different flags) while the extent map for the range [128K, 192K)
>     doesn't have that flag set.
> 
>     We also can't merge it with the next extent map, for file range
>     [192K, 256K), because that one is still pinned.
> 
>     At this moment we have 3 extent maps:
> 
>     One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
>     One for file range [128K, 192K).
>     One for file range [192K, 256K) which is still pinned;
> 
> 4) When the fourth and final extent completes (file range [192K, 256K)),
>     we unpin its extent map and attempt to merge it with the previous
>     extent map, for file range [128K, 192K), which succeeds since none
>     of these extent maps have the EXTENT_MAP_MERGED flag set.
> 
>     So we end up with 2 extent maps:
> 
>     One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
>     One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set.
> 
>     Since after merging extent maps we don't attempt to merge again, that
>     is, merge the resulting extent map with the one that is now preceding
>     it (and the one following it), we end up with those two extent maps,
>     when we could have had a single extent map to represent the whole file.
> 
> Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag.
> While this doesn't present any functional issue, it prevents the merging
> of extent maps which allows to save memory, and can make defrag not
> merging extents too (that will be addressed in the next patch).
> 

Why don’t the extents merge, even after mount-recycles and multiple 
manual defrag runs, without this fix?

Thanks, Anand


> Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check")
> Signed-off-by: Filipe Manana <fdmanana@suse.com>
> ---
>   fs/btrfs/extent_map.c | 7 ++++++-
>   1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> index 1f85b54c8f0c..67ce85ff0ae2 100644
> --- a/fs/btrfs/extent_map.c
> +++ b/fs/btrfs/extent_map.c
> @@ -233,7 +233,12 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
>   	if (extent_map_end(prev) != next->start)
>   		return false;
>   
> -	if (prev->flags != next->flags)
> +	/*
> +	 * The merged flag is not an on-disk flag, it just indicates we had the
> +	 * extent maps of 2 (or more) adjacent extents merged, so factor it out.
> +	 */
> +	if ((prev->flags & ~EXTENT_FLAG_MERGED) !=
> +	    (next->flags & ~EXTENT_FLAG_MERGED))
>   		return false;
>   
>   	if (next->disk_bytenr < EXTENT_MAP_LAST_BYTE - 1)
Filipe Manana Oct. 30, 2024, 7:41 a.m. UTC | #2
On Wed, Oct 30, 2024 at 12:48 AM Anand Jain <anand.jain@oracle.com> wrote:
>
> On 30/10/24 01:22, fdmanana@kernel.org wrote:
> > From: Filipe Manana <fdmanana@suse.com>
> >
> > If we have 3 or more adjacent extents in a file, that is, consecutive file
> > extent items pointing to adjacent extents, within a contiguous file range
> > and compatible flags, we end up not merging all the extents into a single
> > extent map.
> >
> > For example:
> >
> >    $ mkfs.btrfs -f /dev/sdc
> >    $ mount /dev/sdc /mnt/sdc
> >
> >    $ xfs_io -f -d -c "pwrite -b 64K 0 64K" \
> >                   -c "pwrite -b 64K 64K 64K" \
> >                   -c "pwrite -b 64K 128K 64K" \
> >                   -c "pwrite -b 64K 192K 64K" \
> >                   /mnt/sdc/foo
> >
> > After all the ordered extents complete we unpin the extent maps and try
> > to merge them, but instead of getting a single extent map we get two
> > because:
> >
> > 1) When the first ordered extent completes (file range [0, 64K)) we
> >     unpin its extent map and attempt to merge it with the extent map for
> >     the range [64K, 128K), but we can't because that extent map is still
> >     pinned;
> >
> > 2) When the second ordered extent completes (file range [64K, 128K)), we
> >     unpin its extent map and merge it with the previous extent map, for
> >     file range [0, 64K), but we can't merge with the next extent map, for
> >     the file range [128K, 192K), because this one is still pinned.
> >
> >     The merged extent map for the file range [0, 128K) gets the flag
> >     EXTENT_MAP_MERGED set;
> >
> > 3) When the third ordered extent completes (file range [128K, 192K)), we
> >     unpin its exent map and attempt to merge it with the previous extent
> >     map, for file range [0, 128K), but we can't because that extent map
> >     has the flag EXTENT_MAP_MERGED set (mergeable_maps() returns false
> >     due to different flags) while the extent map for the range [128K, 192K)
> >     doesn't have that flag set.
> >
> >     We also can't merge it with the next extent map, for file range
> >     [192K, 256K), because that one is still pinned.
> >
> >     At this moment we have 3 extent maps:
> >
> >     One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
> >     One for file range [128K, 192K).
> >     One for file range [192K, 256K) which is still pinned;
> >
> > 4) When the fourth and final extent completes (file range [192K, 256K)),
> >     we unpin its extent map and attempt to merge it with the previous
> >     extent map, for file range [128K, 192K), which succeeds since none
> >     of these extent maps have the EXTENT_MAP_MERGED flag set.
> >
> >     So we end up with 2 extent maps:
> >
> >     One for file range [0, 128K), with the flag EXTENT_MAP_MERGED set.
> >     One for file range [128K, 256K), with the flag EXTENT_MAP_MERGED set.
> >
> >     Since after merging extent maps we don't attempt to merge again, that
> >     is, merge the resulting extent map with the one that is now preceding
> >     it (and the one following it), we end up with those two extent maps,
> >     when we could have had a single extent map to represent the whole file.
> >
> > Fix this by making mergeable_maps() ignore the EXTENT_MAP_MERGED flag.
> > While this doesn't present any functional issue, it prevents the merging
> > of extent maps which allows to save memory, and can make defrag not
> > merging extents too (that will be addressed in the next patch).
> >
>
> Why don’t the extents merge, even after mount-recycles and multiple
> manual defrag runs, without this fix?

Why do you think mount recycles would make any difference? Whenever
extent maps are merged they get the merge flag set.
As for defrag and merged extent maps, that's explained in the next patch.

>
> Thanks, Anand
>
>
> > Fixes: 199257a78bb0 ("btrfs: defrag: don't use merged extent map for their generation check")
> > Signed-off-by: Filipe Manana <fdmanana@suse.com>
> > ---
> >   fs/btrfs/extent_map.c | 7 ++++++-
> >   1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
> > index 1f85b54c8f0c..67ce85ff0ae2 100644
> > --- a/fs/btrfs/extent_map.c
> > +++ b/fs/btrfs/extent_map.c
> > @@ -233,7 +233,12 @@ static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
> >       if (extent_map_end(prev) != next->start)
> >               return false;
> >
> > -     if (prev->flags != next->flags)
> > +     /*
> > +      * The merged flag is not an on-disk flag, it just indicates we had the
> > +      * extent maps of 2 (or more) adjacent extents merged, so factor it out.
> > +      */
> > +     if ((prev->flags & ~EXTENT_FLAG_MERGED) !=
> > +         (next->flags & ~EXTENT_FLAG_MERGED))
> >               return false;
> >
> >       if (next->disk_bytenr < EXTENT_MAP_LAST_BYTE - 1)
>
diff mbox series

Patch

diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c
index 1f85b54c8f0c..67ce85ff0ae2 100644
--- a/fs/btrfs/extent_map.c
+++ b/fs/btrfs/extent_map.c
@@ -233,7 +233,12 @@  static bool mergeable_maps(const struct extent_map *prev, const struct extent_ma
 	if (extent_map_end(prev) != next->start)
 		return false;
 
-	if (prev->flags != next->flags)
+	/*
+	 * The merged flag is not an on-disk flag, it just indicates we had the
+	 * extent maps of 2 (or more) adjacent extents merged, so factor it out.
+	 */
+	if ((prev->flags & ~EXTENT_FLAG_MERGED) !=
+	    (next->flags & ~EXTENT_FLAG_MERGED))
 		return false;
 
 	if (next->disk_bytenr < EXTENT_MAP_LAST_BYTE - 1)