| Message ID | cover.1604444952.git.asml.silence@gmail.com |
|---|---|
| Series | fixes for btrfs async discards |
On Wed, Nov 04, 2020 at 09:45:50AM +0000, Pavel Begunkov wrote:
> Several fixes for async discards. The first patch might increase discard
> rate, drastically in some cases. That may be a surprise for those
> assuming that hitting iops_limit is rare and rather an outlier. Though,
> it still stays in the allowed range, so should be fine.

I think this highly depends on the workload, i.e. whether you really need
to issue the discards fast because of the rate of change in the regular
data. That was the point of the async discard and the knobs: the defaults
should be ok for most users and allow adjusting for specific loads.

My testing of the original discard patchset was tailored towards the
default usecase and was likely less intense than yours, so I did not
observe the corner cases where the work queue scheduling was off, or
where changing the sysfs values did not poke the right kthreads.

Patches look ok to me and I've added them to a topic branch for testing,
going to misc-next later. Thanks.
On 05/11/2020 22:23, David Sterba wrote:
> On Wed, Nov 04, 2020 at 09:45:50AM +0000, Pavel Begunkov wrote:
>> Several fixes for async discards. The first patch might increase discard
>> rate, drastically in some cases. That may be a surprise for those
>> assuming that hitting iops_limit is rare and rather an outlier. Though,
>> it still stays in the allowed range, so should be fine.
>
> I think this highly depends on the workload, i.e. whether you really need
> to issue the discards fast because of the rate of change in the regular
> data. That was the point of the async discard and the knobs: the defaults
> should be ok for most users and allow adjusting for specific loads.

Chris mentioned that _there are_ problems with faster drives though.
The problem is that this iops_limit knob just clamps the chosen delay.
Ultimately, I want to find a better delay function later than

    delay = CONSTANT_INTERVAL_MS / nr_extents

but that will take some thinking.

For instance, one of the cases I've seen is recycling large extents,
like deletion of subvolumes. There we have a small number of extents,
but each takes a lot of space, so there are, say, 100+GB queued to be
discarded. Because there are few extents, the delay is calculated to
be >10s, which is then clamped to the constant max limit, and that was
taking a long time to recycle. Not sure though how many bytes/extents
are discarded on each iteration of btrfs_discard_workfn().

> My testing of the original discard patchset was tailored towards the
> default usecase and was likely less intense than yours, so I did not
> observe the corner cases where the work queue scheduling was off, or
> where changing the sysfs values did not poke the right kthreads.
>
> Patches look ok to me and I've added them to a topic branch for testing,
> going to misc-next later. Thanks.
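For readers following along, here is a minimal sketch of the scheduling
scheme being discussed: a per-extent delay derived from a fixed interval
and then clamped. The constants and the helper name are illustrative
assumptions, not the actual btrfs code, but they show why a handful of
huge extents ends up pinned at the maximum delay while many small extents
get an almost-zero one.

```c
/*
 * Illustrative sketch only -- not the btrfs implementation.  It models a
 * delay chosen as "spread the discards over a fixed interval", clamped to
 * a min/max, which is the scheme described in the message above.
 */
#include <stdio.h>

#define INTERVAL_MSEC   10000ULL  /* spread work over ~10s (assumed value) */
#define MIN_DELAY_MSEC  1ULL      /* lower clamp (assumed value) */
#define MAX_DELAY_MSEC  1000ULL   /* upper clamp (assumed value) */

static unsigned long long pick_delay(unsigned long long nr_extents)
{
	unsigned long long delay;

	if (nr_extents == 0)
		return MAX_DELAY_MSEC;

	/* few extents -> huge per-extent delay, many extents -> tiny one */
	delay = INTERVAL_MSEC / nr_extents;

	if (delay < MIN_DELAY_MSEC)
		delay = MIN_DELAY_MSEC;
	if (delay > MAX_DELAY_MSEC)
		delay = MAX_DELAY_MSEC;
	return delay;
}

int main(void)
{
	/* e.g. a few huge extents from a deleted subvolume */
	printf("3 extents    -> %llu ms between discards\n", pick_delay(3));
	/* vs. thousands of small freed extents */
	printf("5000 extents -> %llu ms between discards\n", pick_delay(5000));
	return 0;
}
```

With these assumed constants the 3-extent case computes 3333 ms and gets
clamped to 1000 ms per discard regardless of how much space each extent
covers, which is the behaviour the subvolume-deletion example complains
about.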
On Fri, Nov 06, 2020 at 01:20:25PM +0000, Pavel Begunkov wrote:
> On 05/11/2020 22:23, David Sterba wrote:
> > On Wed, Nov 04, 2020 at 09:45:50AM +0000, Pavel Begunkov wrote:
> >> Several fixes for async discards. The first patch might increase discard
> >> rate, drastically in some cases. That may be a surprise for those
> >> assuming that hitting iops_limit is rare and rather an outlier. Though,
> >> it still stays in the allowed range, so should be fine.
> >
> > I think this highly depends on the workload, i.e. whether you really need
> > to issue the discards fast because of the rate of change in the regular
> > data. That was the point of the async discard and the knobs: the defaults
> > should be ok for most users and allow adjusting for specific loads.
>
> Chris mentioned that _there are_ problems with faster drives though.
> The problem is that this iops_limit knob just clamps the chosen delay.
> Ultimately, I want to find a better delay function later than
>
>     delay = CONSTANT_INTERVAL_MS / nr_extents
>
> but that will take some thinking.
>
> For instance, one of the cases I've seen is recycling large extents,
> like deletion of subvolumes. There we have a small number of extents,
> but each takes a lot of space, so there are, say, 100+GB queued to be
> discarded. Because there are few extents, the delay is calculated
> to be >10s, which is then clamped to the constant max limit, and that
> was taking a long time to recycle. Not sure though how many bytes/extents
> are discarded on each iteration of btrfs_discard_workfn().

BTRFS_ASYNC_DISCARD_DEFAULT_MAX_SIZE is 64M, so the few large extents do
not fit the current scheme. I thought some translation to "logical" units
could do the same discard io scheduling: 100G of size would become N times
the maximum discard extent unit (N = 100G / max) and be submitted for
discard until N reaches 0 for a given input range.

But if you know you'll have ranges that are orders of magnitude larger
than the extent size, then setting the sysfs value accordingly seems like
the right approach. I'm not sure if the freed ranges are coalesced before
adding them to the discard list. If not, then increasing the max size
should work, otherwise the coalescing could be adjusted.
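A minimal sketch of the "logical units" translation suggested above, under
the assumption that a freed range is simply chopped into max-size chunks
and each chunk is scheduled like an ordinary extent; the names and
constants are illustrative, not taken from the btrfs implementation.

```c
/*
 * Illustrative sketch of the "logical units" idea: a large range counts
 * as N = size / max_discard_size scheduling units, each issued (and
 * delayed) as if it were a separate small extent.  Not the btrfs code;
 * names and constants are assumptions.
 */
#include <stdio.h>

#define MAX_DISCARD_SIZE (64ULL * 1024 * 1024)   /* 64M, as the default max */

static void discard_range_in_units(unsigned long long start,
				   unsigned long long size)
{
	unsigned long long units = 0;

	while (size) {
		unsigned long long len =
			size > MAX_DISCARD_SIZE ? MAX_DISCARD_SIZE : size;

		/* a real implementation would issue a discard for
		 * [start, start + len) here and then apply the per-unit
		 * delay, exactly as for an ordinary small extent */
		start += len;
		size -= len;
		units++;
	}
	printf("range split into %llu discard units\n", units);
}

int main(void)
{
	/* a 100G freed range becomes 1600 units of 64M each */
	discard_range_in_units(0, 100ULL * 1024 * 1024 * 1024);
	return 0;
}
```

Under this assumed scheme the delay would track the amount of queued space
rather than the raw extent count, which is the gap the subvolume-deletion
case exposes.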
On 6 Nov 2020, at 8:20, Pavel Begunkov wrote:
> On 05/11/2020 22:23, David Sterba wrote:
>> On Wed, Nov 04, 2020 at 09:45:50AM +0000, Pavel Begunkov wrote:
>>> Several fixes for async discards. The first patch might increase discard
>>> rate, drastically in some cases. That may be a surprise for those
>>> assuming that hitting iops_limit is rare and rather an outlier. Though,
>>> it still stays in the allowed range, so should be fine.
>>
>> I think this highly depends on the workload, i.e. whether you really need
>> to issue the discards fast because of the rate of change in the regular
>> data. That was the point of the async discard and the knobs: the defaults
>> should be ok for most users and allow adjusting for specific loads.
>
> Chris mentioned that _there are_ problems with faster drives though.
> The problem is that this iops_limit knob just clamps the chosen delay.
> Ultimately, I want to find a better delay function later than
>
>     delay = CONSTANT_INTERVAL_MS / nr_extents
>
> but that will take some thinking.

Yeah, we have drives where latencies increase as the drive fills up.
Async discard moves the knee on the utilization percentage before the
latencies increase, but without patches like these we do still have
garbage collection related stalls. We still need to measure if the GC
related stalls turn into discard-saturation stalls, but at least now we
can try.

-chris