Message ID | 20191009032124.10541-1-david@fromorbit.com (mailing list archive)
---|---
Series | mm, xfs: non-blocking inode reclaim
On Wed, Oct 09, 2019 at 02:20:58PM +1100, Dave Chinner wrote:
> Hi folks,
>
> This is the second version of the RFC I originally posted here:

Do you happen to have a git tree to pull from as well?
On Wed, Oct 09, 2019 at 02:20:58PM +1100, Dave Chinner wrote:
> Hi folks,
>
> This is the second version of the RFC I originally posted here:
>
> https://lore.kernel.org/linux-xfs/20190801021752.4986-1-david@fromorbit.com/
>
> The original description of the patchset is below; the issues and
> approach to solving them have not changed. There is some
> restructuring of the patch set - the first few patches are all the
> XFS fixes that can be merged regardless of the rest of the patchset,
> but the non-blocking reclaim is somewhat dependent on them for
> correct behaviour. The second set of patches are the shrinker
> infrastructure changes needed for the shrinkers to feed back
> reclaim progress to the main reclaim infrastructure and act on the
> feedback. The last set of patches are the XFS changes needed to
> convert inode reclaim over to a non-blocking, IO-less algorithm.

I looked through the MM patches and other than the congestion thing
they look reasonable. I think I can probably use this stuff to drop
the use of the btree inode. However, I'm wondering if it would be a
good idea to add an explicit backoff mechanism for heavy
metadata-dirtying operations. Btrfs generates a lot more dirty
metadata than most filesystems, which is partly why my attempt to
deal with this was tied to balance_dirty_pages(), since it already
has all of the backoff logic.

Perhaps an explicit balance_dirty_metadata() that we put after all
metadata operations would give us a good way to throttle dirtiers
when we aren't able to keep up? Just a thought, based on my previous
experiences trying to tackle this issue for btrfs; what you've done
already may be enough to address these concerns.

Thanks,

Josef
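[Editor's note: the balance_dirty_metadata() interface suggested above
does not exist; the sketch below is a hypothetical toy model of the
idea, mimicking how balance_dirty_pages() ramps up throttling of
dirtiers as dirty state approaches a limit. All names and thresholds
are invented for illustration.]

```c
/*
 * Toy model of a balance_dirty_metadata() throttle: callers invoke it
 * after dirtying metadata, and it returns how long they should back
 * off.  Zero below a soft threshold, ramping linearly to a maximum
 * pause at the hard limit -- analogous to the dirty-page throttle.
 */
#include <assert.h>
#include <stdint.h>

#define MD_LIMIT	(256u << 20)		/* hard cap on dirty metadata bytes */
#define MD_SOFT		(MD_LIMIT / 2)		/* start throttling here */

static uint64_t dirty_md_bytes;			/* global dirty-metadata count */

/* Returns the backoff (model units: 0..100) a dirtier should take. */
static unsigned int balance_dirty_metadata(uint64_t just_dirtied)
{
	dirty_md_bytes += just_dirtied;

	if (dirty_md_bytes <= MD_SOFT)
		return 0;
	if (dirty_md_bytes >= MD_LIMIT)
		return 100;			/* maximum backoff */
	/* linear ramp between the soft threshold and the hard limit */
	return (unsigned int)(100 * (dirty_md_bytes - MD_SOFT) /
			      (MD_LIMIT - MD_SOFT));
}
```

The point of calling this after every metadata operation, as suggested,
is that dirtiers slow down smoothly before the hard limit is hit,
rather than all stalling at once when it is.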
On Fri, Oct 11, 2019 at 03:03:08PM -0400, Josef Bacik wrote:
> On Wed, Oct 09, 2019 at 02:20:58PM +1100, Dave Chinner wrote:
> > Hi folks,
> >
> > This is the second version of the RFC I originally posted here:
> >
> > https://lore.kernel.org/linux-xfs/20190801021752.4986-1-david@fromorbit.com/
> >
> > The original description of the patchset is below; the issues and
> > approach to solving them have not changed. There is some
> > restructuring of the patch set - the first few patches are all the
> > XFS fixes that can be merged regardless of the rest of the patchset,
> > but the non-blocking reclaim is somewhat dependent on them for
> > correct behaviour. The second set of patches are the shrinker
> > infrastructure changes needed for the shrinkers to feed back
> > reclaim progress to the main reclaim infrastructure and act on the
> > feedback. The last set of patches are the XFS changes needed to
> > convert inode reclaim over to a non-blocking, IO-less algorithm.
>
> I looked through the MM patches and other than the congestion thing
> they look reasonable. I think I can probably use this stuff to drop
> the use of the btree inode. However, I'm wondering if it would be a
> good idea to add an explicit backoff mechanism for heavy
> metadata-dirtying operations. Btrfs generates a lot more dirty
> metadata than most filesystems, which is partly why my attempt to
> deal with this was tied to balance_dirty_pages(), since it already
> has all of the backoff logic.

That's an orthogonal problem, I think. We still need the IO-less
reclaim in XFS regardless of how we throttle build-up of dirty
metadata...

> Perhaps an explicit balance_dirty_metadata() that we put after all
> metadata operations would give us a good way to throttle dirtiers
> when we aren't able to keep up? Just a thought, based on my
> previous experiences trying to tackle this issue for btrfs; what
> you've done already may be enough to address these concerns.
The biggest issue is that different filesystems need different
mechanisms for throttling dirty metadata build-up. In ext4/XFS, the
amount of dirty metadata is bound by the log size, but that can
still be massively more metadata than the disk subsystem can handle
in a finite time.

IOWs, for XFS, the way to throttle dirty metadata buildup is to
limit the amount of log space we allow the filesystem to use by
throttling incoming transaction reservations. Nothing in the VFS/mm
subsystem can see any of this inside XFS, so I'm not really sure how
generic we could make a metadata dirtying throttle
implementation....

Cheers,

Dave.
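[Editor's note: the mechanism Dave describes - holding back incoming
transaction reservations once the filesystem's allowed share of log
space is consumed - can be modelled in a few lines. This is a toy
sketch only; the real XFS grant-head machinery behind
xfs_trans_reserve() is far more involved, and all names and numbers
here are invented.]

```c
/*
 * Model: each transaction must reserve log space up front.  Once the
 * reserved total would exceed a throttle fraction of the log, new
 * reservations are refused and the dirtier must wait for metadata
 * writeback to free log space.
 */
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

struct log_state {
	uint64_t size;		/* total log space */
	uint64_t reserved;	/* space handed out to running transactions */
	uint64_t throttle_pct;	/* % of log usable before dirtiers block */
};

/*
 * Try to reserve log space for a new transaction.  Returns true on
 * success; false means the caller must wait for log space to be freed.
 */
static bool log_reserve(struct log_state *log, uint64_t bytes)
{
	uint64_t limit = log->size * log->throttle_pct / 100;

	if (log->reserved + bytes > limit)
		return false;		/* throttle: dirtier must wait */
	log->reserved += bytes;
	return true;
}

/* Transaction completion returns its reservation to the log. */
static void log_release(struct log_state *log, uint64_t bytes)
{
	log->reserved -= bytes;
}
```

This also shows why the throttle is filesystem-specific, as the mail
argues: the limit is expressed entirely in terms of journal geometry
that the VFS/mm layer cannot see.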
On Sat, Oct 12, 2019 at 10:48:42AM +1100, Dave Chinner wrote:
> On Fri, Oct 11, 2019 at 03:03:08PM -0400, Josef Bacik wrote:
> > On Wed, Oct 09, 2019 at 02:20:58PM +1100, Dave Chinner wrote:
> > > Hi folks,
> > >
> > > This is the second version of the RFC I originally posted here:
> > >
> > > https://lore.kernel.org/linux-xfs/20190801021752.4986-1-david@fromorbit.com/
> > >
> > > [...]
> >
> > I looked through the MM patches and other than the congestion thing
> > they look reasonable. I think I can probably use this stuff to drop
> > the use of the btree inode. However, I'm wondering if it would be a
> > good idea to add an explicit backoff mechanism for heavy
> > metadata-dirtying operations. Btrfs generates a lot more dirty
> > metadata than most filesystems, which is partly why my attempt to
> > deal with this was tied to balance_dirty_pages(), since it already
> > has all of the backoff logic.
>
> That's an orthogonal problem, I think. We still need the IO-less
> reclaim in XFS regardless of how we throttle build-up of dirty
> metadata...
>
> > Perhaps an explicit balance_dirty_metadata() that we put after all
> > metadata operations would give us a good way to throttle dirtiers
> > when we aren't able to keep up? Just a thought, based on my
> > previous experiences trying to tackle this issue for btrfs; what
> > you've done already may be enough to address these concerns.
>
> The biggest issue is that different filesystems need different
> mechanisms for throttling dirty metadata build-up. In ext4/XFS, the
> amount of dirty metadata is bound by the log size, but that can
> still be massively more metadata than the disk subsystem can handle
> in a finite time.
>
> IOWs, for XFS, the way to throttle dirty metadata buildup is to
> limit the amount of log space we allow the filesystem to use by
> throttling incoming transaction reservations. Nothing in the VFS/mm
> subsystem can see any of this inside XFS, so I'm not really sure how
> generic we could make a metadata dirtying throttle
> implementation....

Ok, I just read the mm patches and made assumptions about what you
were trying to accomplish. I suppose I should probably dig my stuff
back out.

Thanks,

Josef
On Fri, Oct 11, 2019 at 08:19:21PM -0400, Josef Bacik wrote:
> Ok, I just read the mm patches and made assumptions about what you
> were trying to accomplish. I suppose I should probably dig my stuff
> back out.

Fair enough. The mm bits basically provide backoffs when shrinkers
can't make progress for whatever reason (e.g. GFP_NOFS context,
requires IO, etc.) so that other reclaim scanning can be done while
we wait for other progress (like cleaning inodes) to be made before
trying to reclaim inodes again. The back-offs are required to
prevent priority wind-up and OOM if reclaim progress is extremely
slow.

These patches aggregate the backoffs into bound global reclaim
delays between page reclaim and slab shrinking, rather than lots of
unbound individual delays inside specific shrinkers that end up
slowing down the entire slab shrinking scan.

Cheers,

Dave.
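[Editor's note: the aggregation Dave describes - shrinkers reporting a
desired backoff instead of sleeping themselves, with the main reclaim
loop taking one bounded delay afterwards - can be sketched as below.
The function and field names and the delay cap are invented for
illustration and do not match the actual patchset.]

```c
/*
 * Each shrinker reports what it freed and any backoff it wants (e.g.
 * because it needed IO or was in GFP_NOFS context).  The reclaim loop
 * then takes a single, bounded delay rather than letting individual
 * shrinkers sleep for unbounded times mid-scan.
 */
#include <assert.h>

#define MAX_RECLAIM_DELAY_MS	100	/* bound on the global backoff */

struct shrink_result {
	long freed;			/* objects reclaimed */
	unsigned int backoff_ms;	/* delay this shrinker wants, 0 = none */
};

/*
 * Scan all shrinker results, take the largest requested backoff, and
 * clamp it so one slow shrinker cannot stall the whole slab scan.
 * Returns the single delay the reclaim loop should sleep for.
 */
static unsigned int reclaim_backoff(const struct shrink_result *res, int nr)
{
	unsigned int delay = 0;

	for (int i = 0; i < nr; i++)
		if (res[i].backoff_ms > delay)
			delay = res[i].backoff_ms;

	return delay > MAX_RECLAIM_DELAY_MS ? MAX_RECLAIM_DELAY_MS : delay;
}
```

Because the delay is bounded and taken once per reclaim pass, reclaim
priority cannot wind up to OOM just because one cache is slow to make
progress, which is the failure mode the mail describes.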