
[0/4] fixes for btrfs async discards

Message ID: cover.1604444952.git.asml.silence@gmail.com

Message

Pavel Begunkov Nov. 4, 2020, 9:45 a.m. UTC
Several fixes for async discards. The first patch might increase the
discard rate, drastically in some cases. That may be a surprise for those
assuming that hitting iops_limit is rare and rather an outlier. It still
stays within the allowed range, though, so it should be fine.

Pavel Begunkov (4):
  btrfs: discard: speed up discard up to iops_limit
  btrfs: discard: save discard delay as ns not jiffy
  btrfs: don't miss discards after override-schedule
  btrfs: discard: reschedule work after param update

 fs/btrfs/ctree.h   |  3 ++-
 fs/btrfs/discard.c | 35 +++++++++++++++++++++++------------
 fs/btrfs/sysfs.c   |  5 +++--
 3 files changed, 28 insertions(+), 15 deletions(-)

Comments

David Sterba Nov. 5, 2020, 10:23 p.m. UTC | #1
On Wed, Nov 04, 2020 at 09:45:50AM +0000, Pavel Begunkov wrote:
> Several fixes for async discards. The first patch might increase the
> discard rate, drastically in some cases. That may be a surprise for those
> assuming that hitting iops_limit is rare and rather an outlier. It still
> stays within the allowed range, though, so it should be fine.

I think this highly depends on the workload, i.e. whether you really need
to issue the discards fast because of the rate of change in the regular
data. That was the point of the async discard and the knobs: the defaults
should be ok for most users while allowing adjustments for specific loads.

My testing of the original discard patchset was tailored towards the
default usecase and was likely less intense than yours, so I did not
hit the corner cases where the work queue scheduling was off, or where
changing the sysfs values did not poke the right kthreads.

Patches look ok to me and I've added them to a topic branch for testing;
they'll go to misc-next later. Thanks.
Pavel Begunkov Nov. 6, 2020, 1:20 p.m. UTC | #2
On 05/11/2020 22:23, David Sterba wrote:
> On Wed, Nov 04, 2020 at 09:45:50AM +0000, Pavel Begunkov wrote:
>> Several fixes for async discards. The first patch might increase the
>> discard rate, drastically in some cases. That may be a surprise for those
>> assuming that hitting iops_limit is rare and rather an outlier. It still
>> stays within the allowed range, though, so it should be fine.
> 
> I think this highly depends on the workload, i.e. whether you really need
> to issue the discards fast because of the rate of change in the regular
> data. That was the point of the async discard and the knobs: the defaults
> should be ok for most users while allowing adjustments for specific loads.

Chris mentioned that _there are_ problems with faster drives though.
The problem is that the iops_limit knob just clamps the chosen delay.
Ultimately, I want to eventually find a better delay function than

delay = CONSTANT_INTERVAL_MS / nr_extents

but that will take some thinking.
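
To illustrate the current behaviour (a minimal sketch of the scheduling
idea only, not the actual btrfs code; the constants and names below are
made up):

	#include <stdio.h>

	#define TARGET_INTERVAL_MS 6000UL   /* hypothetical per-pass budget */
	#define MIN_DELAY_MS       1UL      /* hypothetical clamp floor */
	#define MAX_DELAY_MS       10000UL  /* hypothetical clamp ceiling */

	/* Per-extent delay from a fixed interval, clamped afterwards. */
	static unsigned long calc_delay_ms(unsigned long nr_extents,
					   unsigned long iops_limit)
	{
		unsigned long delay =
			TARGET_INTERVAL_MS / (nr_extents ? nr_extents : 1);
		/* iops_limit only tightens the floor of the clamp */
		unsigned long lo =
			iops_limit ? 1000UL / iops_limit : MIN_DELAY_MS;

		if (delay < lo)
			delay = lo;
		if (delay > MAX_DELAY_MS)
			delay = MAX_DELAY_MS;
		return delay;
	}

	int main(void)
	{
		/* few extents -> huge computed delay, ceiling takes over */
		printf("%lu ms\n", calc_delay_ms(2, 10));      /* 3000 ms */
		/* many extents -> iops_limit floor decides the rate */
		printf("%lu ms\n", calc_delay_ms(100000, 10)); /* 100 ms */
		return 0;
	}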

For instance, one of the cases I've seen is recycling large extents,
e.g. after deletion of subvolumes. There we have a small number of
extents, but each takes a lot of space, so there can be, say, 100+GB
queued to be discarded. Because there are few extents, the delay is
calculated to >10s and is then clamped to a constant max limit.
That was taking a long time to recycle. I'm not sure, though, how many
bytes/extents are discarded on each iteration of btrfs_discard_workfn().
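
Back-of-the-envelope, assuming the max delay is on the order of 10s and
each workfn iteration discards a single max-sized (64M) chunk:

	100G / 64M = 1600 iterations
	1600 * 10s = ~4.4 hours to drain the queue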

> 
> My testing of the original discard patchset was tailored towards the
> default usecase and was likely less intense than yours, so I did not
> hit the corner cases where the work queue scheduling was off, or where
> changing the sysfs values did not poke the right kthreads.
> 
> Patches look ok to me and I've added them to a topic branch for testing;
> they'll go to misc-next later. Thanks.
>
David Sterba Nov. 6, 2020, 1:56 p.m. UTC | #3
On Fri, Nov 06, 2020 at 01:20:25PM +0000, Pavel Begunkov wrote:
> On 05/11/2020 22:23, David Sterba wrote:
> > On Wed, Nov 04, 2020 at 09:45:50AM +0000, Pavel Begunkov wrote:
> >> Several fixes for async discards. The first patch might increase the
> >> discard rate, drastically in some cases. That may be a surprise for those
> >> assuming that hitting iops_limit is rare and rather an outlier. It still
> >> stays within the allowed range, though, so it should be fine.
> > 
> > I think this highly depends on the workload, i.e. whether you really need
> > to issue the discards fast because of the rate of change in the regular
> > data. That was the point of the async discard and the knobs: the defaults
> > should be ok for most users while allowing adjustments for specific loads.
> 
> Chris mentioned that _there are_ problems with faster drives though.
> The problem is that the iops_limit knob just clamps the chosen delay.
> Ultimately, I want to eventually find a better delay function than
> 
> delay = CONSTANT_INTERVAL_MS / nr_extents
> 
> but that will take some thinking.
> 
> For instance, one of the cases I've seen is recycling large extents,
> e.g. after deletion of subvolumes. There we have a small number of
> extents, but each takes a lot of space, so there can be, say, 100+GB
> queued to be discarded. Because there are few extents, the delay is
> calculated to >10s and is then clamped to a constant max limit.
> That was taking a long time to recycle. I'm not sure, though, how many
> bytes/extents are discarded on each iteration of btrfs_discard_workfn().

BTRFS_ASYNC_DISCARD_DEFAULT_MAX_SIZE 64M

So the few large extents do not fit the current scheme. I thought some
translation to "logical" units could keep the same discard io scheduling:
100G would become N maximum-discard-extent units (N = 100G / max),
submitted for discard until N reaches 0 for the given input range.
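
Something like this, roughly (a sketch of the idea only, not real btrfs
code; issue_discard() is a hypothetical stand-in for whatever submits
the discard bio):

	#include <stdio.h>

	typedef unsigned long long u64;

	/* hypothetical stand-in for the actual discard submission */
	static void issue_discard(u64 start, u64 len)
	{
		printf("discard [%llu, +%llu)\n", start, len);
	}

	/*
	 * Split one big range into N = len / max_size logical units,
	 * so each unit can be paced like a regular extent by the
	 * existing delay logic.
	 */
	static void discard_range_chunked(u64 start, u64 len, u64 max_size)
	{
		while (len) {
			u64 chunk = len < max_size ? len : max_size;

			issue_discard(start, chunk);
			start += chunk;
			len -= chunk;
		}
	}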

But if you know you'll have ranges that are orders of magnitude larger
than the extent size, then setting the sysfs value accordingly seems
like the right approach. I'm not sure if the freed ranges are coalesced
before being added to the discard list. If not, then increasing the max
size should work; otherwise the coalescing could be adjusted.
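
For reference, that would be something along the lines of (assuming the
discard sysfs directory from the async discard series; substitute the
filesystem's UUID):

	echo $((256 * 1024 * 1024)) > /sys/fs/btrfs/<UUID>/discard/max_discard_size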
Chris Mason Nov. 6, 2020, 2:19 p.m. UTC | #4
On 6 Nov 2020, at 8:20, Pavel Begunkov wrote:

> On 05/11/2020 22:23, David Sterba wrote:
>> On Wed, Nov 04, 2020 at 09:45:50AM +0000, Pavel Begunkov wrote:
>>> Several fixes for async discards. The first patch might increase the
>>> discard rate, drastically in some cases. That may be a surprise for
>>> those assuming that hitting iops_limit is rare and rather an outlier.
>>> It still stays within the allowed range, though, so it should be fine.
>>
>> I think this highly depends on the workload, i.e. whether you really
>> need to issue the discards fast because of the rate of change in the
>> regular data. That was the point of the async discard and the knobs:
>> the defaults should be ok for most users while allowing adjustments
>> for specific loads.
>
> Chris mentioned that _there are_ problems with faster drives though.
> The problem is that the iops_limit knob just clamps the chosen delay.
> Ultimately, I want to eventually find a better delay function than
>
> delay = CONSTANT_INTERVAL_MS / nr_extents
>
> but that will take some thinking.

Yeah, we have drives where latencies increase as the drive fills up.
Async discard moves the knee in the utilization percentage at which the
latencies increase, but without patches like these, we do still have
garbage collection related stalls. We still need to measure whether the
GC related stalls turn into discard-saturation stalls, but at least now
we can try.

-chris