Message ID | cb12212b9c599817507f3978c9102767267625b2.1719825714.git.fdmanana@suse.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | btrfs: don't loop again over pinned extent maps when shrinking extent maps | expand |
在 2024/7/1 18:53, fdmanana@kernel.org 写道: > From: Filipe Manana <fdmanana@suse.com> > > During extent map shrinking, while iterating over the extent maps of an > inode, if we happen to find a lot of pinned extent maps and we need to > reschedule, we'll start iterating the extent map tree from its first > extent map. This can result in visiting the same extent maps again, and if > they are not yet unpinned, we are just wasting time and can end up > iterating over them again if we happen to reschedule again before finding > an extent map that is not pinned - this could happen yet more times if the > unpinning doesn't happen soon (at ordered extent completion). > > So improve on this by starting on the next extent map everytime we need > to reschedule. Any previously pinned extent maps we be checked again the > next time the extent map shrinker is run (if needed). > > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Qu Wenruo <wqu@suse.com> Thanks, Qu > --- > > This applies against the "for-next" branch, for a version that > applies cleanly to 6.10-rcX: > > https://gist.githubusercontent.com/fdmanana/5262e608b3eecb9a3b2631f8dad49863/raw/1a82fe8eafbd5f6958dddf34d3c9648d7335018e/btrfs-don-t-loop-again-over-pinned-extent-maps-when-.patch > > fs/btrfs/extent_map.c | 24 ++++++++++++++++++------ > 1 file changed, 18 insertions(+), 6 deletions(-) > > diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c > index b869a0ee24d2..2d75059eedd8 100644 > --- a/fs/btrfs/extent_map.c > +++ b/fs/btrfs/extent_map.c > @@ -1139,8 +1139,10 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t > while (node) { > struct rb_node *next = rb_next(node); > struct extent_map *em; > + u64 next_min_offset; > > em = rb_entry(node, struct extent_map, rb_node); > + next_min_offset = extent_map_end(em); > (*scanned)++; > > if (em->flags & EXTENT_FLAG_PINNED) > @@ -1166,14 +1168,24 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t > break; > > /* > - * Restart if we had to reschedule, and any extent maps that were > - * pinned before may have become unpinned after we released the > - * lock and took it again. > + * If we had to reschedule start from where we were before. We > + * could start from the first extent map in the tree in case we > + * passed through pinned extent maps that may have become > + * unpinned in the meanwhile, but it might be the case that they > + * haven't been unpinned yet, so if we have many still unpinned > + * extent maps, we could be wasting a lot of time and cpu. So > + * don't consider previously pinned extent maps, we'll consider > + * them in future calls of the extent map shrinker. > */ > - if (cond_resched_rwlock_write(&tree->lock)) > - node = rb_first(&tree->root); > - else > + if (cond_resched_rwlock_write(&tree->lock)) { > + em = search_extent_mapping(tree, next_min_offset, 0); > + if (em) > + node = &em->rb_node; > + else > + node = NULL; > + } else { > node = next; > + } > } > write_unlock(&tree->lock); > up_read(&inode->i_mmap_lock);
On Mon, Jul 01, 2024 at 10:23:31AM +0100, fdmanana@kernel.org wrote: > From: Filipe Manana <fdmanana@suse.com> > > During extent map shrinking, while iterating over the extent maps of an > inode, if we happen to find a lot of pinned extent maps and we need to > reschedule, we'll start iterating the extent map tree from its first > extent map. This can result in visiting the same extent maps again, and if > they are not yet unpinned, we are just wasting time and can end up > iterating over them again if we happen to reschedule again before finding > an extent map that is not pinned - this could happen yet more times if the > unpinning doesn't happen soon (at ordered extent completion). > > So improve on this by starting on the next extent map everytime we need > to reschedule. Any previously pinned extent maps we be checked again the > next time the extent map shrinker is run (if needed). > > Signed-off-by: Filipe Manana <fdmanana@suse.com> Reviewed-by: Josef Bacik <josef@toxicpanda.com> Thanks, Josef
diff --git a/fs/btrfs/extent_map.c b/fs/btrfs/extent_map.c index b869a0ee24d2..2d75059eedd8 100644 --- a/fs/btrfs/extent_map.c +++ b/fs/btrfs/extent_map.c @@ -1139,8 +1139,10 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t while (node) { struct rb_node *next = rb_next(node); struct extent_map *em; + u64 next_min_offset; em = rb_entry(node, struct extent_map, rb_node); + next_min_offset = extent_map_end(em); (*scanned)++; if (em->flags & EXTENT_FLAG_PINNED) @@ -1166,14 +1168,24 @@ static long btrfs_scan_inode(struct btrfs_inode *inode, long *scanned, long nr_t break; /* - * Restart if we had to reschedule, and any extent maps that were - * pinned before may have become unpinned after we released the - * lock and took it again. + * If we had to reschedule start from where we were before. We + * could start from the first extent map in the tree in case we + * passed through pinned extent maps that may have become + * unpinned in the meanwhile, but it might be the case that they + * haven't been unpinned yet, so if we have many still unpinned + * extent maps, we could be wasting a lot of time and cpu. So + * don't consider previously pinned extent maps, we'll consider + * them in future calls of the extent map shrinker. */ - if (cond_resched_rwlock_write(&tree->lock)) - node = rb_first(&tree->root); - else + if (cond_resched_rwlock_write(&tree->lock)) { + em = search_extent_mapping(tree, next_min_offset, 0); + if (em) + node = &em->rb_node; + else + node = NULL; + } else { node = next; + } } write_unlock(&tree->lock); up_read(&inode->i_mmap_lock);