From patchwork Tue Nov 14 23:10:22 2017
X-Patchwork-Submitter: Khazhy Kumykov
X-Patchwork-Id: 10058549
From: Khazhismel Kumykov
To: axobe@kernel.dk, shli@fb.com, vgoyal@redhat.com, tj@kernel.org
Cc: linux-kernel@vger.kernel.org, linux-block@vger.kernel.org, Khazhismel Kumykov
Subject: [RFC PATCH] blk-throttle: add burst allowance.
Date: Tue, 14 Nov 2017 15:10:22 -0800
Message-Id: <20171114231022.42961-1-khazhy@google.com>
X-Mailer: git-send-email 2.15.0.448.gf294e3d99a-goog

Allows configuration of additional bytes or ios before a throttle is triggered.
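The burst allowance behaves much like a token bucket layered on top of the existing bps limit. The sketch below is a simplified userspace model written for illustration, not code from this patch: `struct burst_group`, `may_dispatch()`, `earn_back()` and `TICKS_PER_SEC` are hypothetical names, and the model ignores throtl_slice rounding, the iops path and the READ/WRITE split. It mirrors the check the patch adds to tg_with_in_bps_limit(): dispatch is unthrottled while the charged bytes are below the burst size, and only the excess over the burst size is compared against the rate limit.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical tick rate for this model (the kernel uses HZ and jiffies). */
#define TICKS_PER_SEC 1000u

/* Hypothetical per-group state, loosely modelled on struct throtl_grp. */
struct burst_group {
	uint64_t bps;        /* sustained limit in bytes per second */
	uint64_t burst_conf; /* bytes that may go out before throttling */
	uint64_t bytes_disp; /* bytes charged and not yet earned back */
};

/*
 * May a bio of bio_size bytes be dispatched elapsed_ticks after the
 * accounting window started? While bytes_disp is below burst_conf it
 * always may; otherwise only the excess over burst_conf is checked
 * against what bps allows over the elapsed time.
 */
static bool may_dispatch(const struct burst_group *g, uint64_t bio_size,
			 uint64_t elapsed_ticks)
{
	uint64_t excess, allowed;

	if (g->bytes_disp < g->burst_conf)
		return true;

	excess = g->bytes_disp - g->burst_conf;
	allowed = g->bps * elapsed_ticks / TICKS_PER_SEC;
	return excess + bio_size <= allowed;
}

/*
 * Idle time earns allowance back at the bps rate, in the spirit of
 * throtl_trim_slice() decrementing bytes_disp; clamps at zero.
 */
static void earn_back(struct burst_group *g, uint64_t elapsed_ticks)
{
	uint64_t earned = g->bps * elapsed_ticks / TICKS_PER_SEC;

	g->bytes_disp = earned >= g->bytes_disp ? 0 : g->bytes_disp - earned;
}
```

For example, in this model with bps = 1 MiB/s and burst_conf = 4 MiB, an idle group can push 4 MiB immediately; a further 1 MiB bio is then held until one second of rate has accrued, and four idle seconds of earn_back() return bytes_disp to 0, restoring the full burst. As in the patch, the burst check looks only at bytes already charged, so a single large bio issued just below the burst size is let through whole.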

The burst allowance makes it possible to implement a bucket-style
rate limit/throttle on a block device. Previously, bursting to a device
was limited to the allowance granted in a single throtl_slice (similar
to a bucket with limit N and refill rate N/slice).

Add per-tg parameters bytes_burst_conf and io_burst_conf, which define a
number of bytes/ios that must be depleted before throttling happens. A
tg that does not deplete this allowance behaves as though it has no
configured limits. A tg earns allowance back at the rate defined by its
bps/iops limits. Once a tg's *_disp reaches *_burst_conf, throttling
kicks in. If a tg is idle for a while, it regains some burst allowance
before it gets throttled again.

slice_end for a tg is extended until io_disp/bytes_disp would fall to 0,
i.e. until all "used" burst allowance would be earned back. trim_slice
still advances slice_start and decrements *_disp as before, and tgs
continue to receive bytes/ios in throtl_slice intervals.

Signed-off-by: Khazhismel Kumykov
---
 block/Kconfig        |  11 +++
 block/blk-throttle.c | 192 +++++++++++++++++++++++++++++++++++++++++++++++----
 2 files changed, 189 insertions(+), 14 deletions(-)

diff --git a/block/Kconfig b/block/Kconfig
index 28ec55752b68..fbd05b419f93 100644
--- a/block/Kconfig
+++ b/block/Kconfig
@@ -128,6 +128,17 @@ config BLK_DEV_THROTTLING_LOW
 	Note, this is an experimental interface and could be changed
 	someday.
 
+config BLK_DEV_THROTTLING_BURST
+	bool "Block throttling .burst allowance interface"
+	depends on BLK_DEV_THROTTLING
+	default n
+	---help---
+	Add .burst allowance for block throttling. Burst allowance allows for
+	additional unthrottled usage, while still limiting speed for sustained
+	usage.
+
+	If in doubt, say N.
+
 config BLK_CMDLINE_PARSER
 	bool "Block device command line partition parser"
 	default n
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 96ad32623427..27c084312772 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -157,6 +157,11 @@ struct throtl_grp {
 	/* Number of bio's dispatched in current slice */
 	unsigned int io_disp[2];
 
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+	uint64_t bytes_burst_conf[2];
+	unsigned int io_burst_conf[2];
+#endif
+
 	unsigned long last_low_overflow_time[2];
 
 	uint64_t last_bytes_disp[2];
@@ -507,6 +512,12 @@ static struct blkg_policy_data *throtl_pd_alloc(gfp_t gfp, int node)
 	tg->bps_conf[WRITE][LIMIT_MAX] = U64_MAX;
 	tg->iops_conf[READ][LIMIT_MAX] = UINT_MAX;
 	tg->iops_conf[WRITE][LIMIT_MAX] = UINT_MAX;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+	tg->bytes_burst_conf[READ] = 0;
+	tg->bytes_burst_conf[WRITE] = 0;
+	tg->io_burst_conf[READ] = 0;
+	tg->io_burst_conf[WRITE] = 0;
+#endif
 	/* LIMIT_LOW will have default value 0 */
 
 	tg->latency_target = DFL_LATENCY_TARGET;
@@ -800,6 +811,26 @@ static inline void throtl_start_new_slice(struct throtl_grp *tg, bool rw)
 		   tg->slice_end[rw], jiffies);
 }
 
+/*
+ * When current slice should end.
+ *
+ * With CONFIG_BLK_DEV_THROTTLING_BURST, we will wait longer than min_wait
+ * for slice to recover used burst allowance. (*_disp -> 0). Setting slice_end
+ * before this would result in tg receiving additional burst allowance.
+ */
+static inline unsigned long throtl_slice_wait(struct throtl_grp *tg, bool rw,
+						unsigned long min_wait)
+{
+	unsigned long bytes_wait = 0, io_wait = 0;
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+	if (tg->bytes_burst_conf[rw])
+		bytes_wait = div64_u64(tg->bytes_disp[rw] * HZ, tg_bps_limit(tg, rw));
+	if (tg->io_burst_conf[rw])
+		io_wait = (tg->io_disp[rw] * HZ) / tg_iops_limit(tg, rw);
+#endif
+	return max(min_wait, max(bytes_wait, io_wait));
+}
+
 static inline void throtl_set_slice_end(struct throtl_grp *tg, bool rw,
 		unsigned long jiffy_end)
 {
@@ -849,7 +880,8 @@ static inline void throtl_trim_slice(struct throtl_grp *tg, bool rw)
 	 * is bad because it does not allow new slice to start.
 	 */
 
-	throtl_set_slice_end(tg, rw, jiffies + tg->td->throtl_slice);
+	throtl_set_slice_end(tg, rw,
+		jiffies + throtl_slice_wait(tg, rw, tg->td->throtl_slice));
 
 	time_elapsed = jiffies - tg->slice_start[rw];
 
@@ -889,7 +921,7 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 				  unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
-	unsigned int io_allowed;
+	unsigned int io_allowed, io_disp;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	u64 tmp;
 
@@ -908,6 +940,17 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	 * have been trimmed.
 	 */
 
+	io_disp = tg->io_disp[rw];
+
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+	if (tg->io_disp[rw] < tg->io_burst_conf[rw]) {
+		if (wait)
+			*wait = 0;
+		return true;
+	}
+	io_disp -= tg->io_burst_conf[rw];
+#endif
+
 	tmp = (u64)tg_iops_limit(tg, rw) * jiffy_elapsed_rnd;
 	do_div(tmp, HZ);
 
@@ -916,14 +959,14 @@ static bool tg_with_in_iops_limit(struct throtl_grp *tg, struct bio *bio,
 	else
 		io_allowed = tmp;
 
-	if (tg->io_disp[rw] + 1 <= io_allowed) {
+	if (io_disp + 1 <= io_allowed) {
 		if (wait)
 			*wait = 0;
 		return true;
 	}
 
 	/* Calc approx time to dispatch */
-	jiffy_wait = ((tg->io_disp[rw] + 1) * HZ) / tg_iops_limit(tg, rw) + 1;
+	jiffy_wait = ((io_disp + 1) * HZ) / tg_iops_limit(tg, rw) + 1;
 
 	if (jiffy_wait > jiffy_elapsed)
 		jiffy_wait = jiffy_wait - jiffy_elapsed;
@@ -939,7 +982,7 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 				 unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
-	u64 bytes_allowed, extra_bytes, tmp;
+	u64 bytes_allowed, extra_bytes, bytes_disp, tmp;
 	unsigned long jiffy_elapsed, jiffy_wait, jiffy_elapsed_rnd;
 	unsigned int bio_size = throtl_bio_data_size(bio);
 
@@ -951,18 +994,28 @@ static bool tg_with_in_bps_limit(struct throtl_grp *tg, struct bio *bio,
 
 	jiffy_elapsed_rnd = roundup(jiffy_elapsed_rnd, tg->td->throtl_slice);
 
+	bytes_disp = tg->bytes_disp[rw];
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+	if (tg->bytes_disp[rw] < tg->bytes_burst_conf[rw]) {
+		if (wait)
+			*wait = 0;
+		return true;
+	}
+	bytes_disp -= tg->bytes_burst_conf[rw];
+#endif
+
 	tmp = tg_bps_limit(tg, rw) * jiffy_elapsed_rnd;
 	do_div(tmp, HZ);
 	bytes_allowed = tmp;
 
-	if (tg->bytes_disp[rw] + bio_size <= bytes_allowed) {
+	if (bytes_disp + bio_size <= bytes_allowed) {
 		if (wait)
 			*wait = 0;
 		return true;
 	}
 
 	/* Calc approx time to dispatch */
-	extra_bytes = tg->bytes_disp[rw] + bio_size - bytes_allowed;
+	extra_bytes = bytes_disp + bio_size - bytes_allowed;
 	jiffy_wait = div64_u64(extra_bytes * HZ, tg_bps_limit(tg, rw));
 
 	if (!jiffy_wait)
@@ -986,7 +1039,7 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 			    unsigned long *wait)
 {
 	bool rw = bio_data_dir(bio);
-	unsigned long bps_wait = 0, iops_wait = 0, max_wait = 0;
+	unsigned long bps_wait = 0, iops_wait = 0, max_wait = 0, disp_time;
 
 	/*
 	 * Currently whole state machine of group depends on first bio
@@ -1015,10 +1068,10 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	if (throtl_slice_used(tg, rw) && !(tg->service_queue.nr_queued[rw]))
 		throtl_start_new_slice(tg, rw);
 	else {
-		if (time_before(tg->slice_end[rw],
-		    jiffies + tg->td->throtl_slice))
-			throtl_extend_slice(tg, rw,
-				jiffies + tg->td->throtl_slice);
+		disp_time = jiffies + throtl_slice_wait(
+				tg, rw, tg->td->throtl_slice);
+		if (time_before(tg->slice_end[rw], disp_time))
+			throtl_extend_slice(tg, rw, disp_time);
 	}
 
 	if (tg_with_in_bps_limit(tg, bio, &bps_wait) &&
@@ -1033,8 +1086,9 @@ static bool tg_may_dispatch(struct throtl_grp *tg, struct bio *bio,
 	if (wait)
 		*wait = max_wait;
 
-	if (time_before(tg->slice_end[rw], jiffies + max_wait))
-		throtl_extend_slice(tg, rw, jiffies + max_wait);
+	disp_time = jiffies + throtl_slice_wait(tg, rw, max_wait);
+	if (time_before(tg->slice_end[rw], disp_time))
+		throtl_extend_slice(tg, rw, disp_time);
 
 	return 0;
 }
@@ -1705,6 +1759,108 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 	return ret ?: nbytes;
 }
 
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+static u64 tg_prfill_burst(struct seq_file *sf, struct blkg_policy_data *pd,
+			   int data)
+{
+	struct throtl_grp *tg = pd_to_tg(pd);
+	const char *dname = blkg_dev_name(pd->blkg);
+	char bufs[4][21];
+
+	if (!dname)
+		return 0;
+
+	if (tg->bytes_burst_conf[READ] == 0 &&
+	    tg->bytes_burst_conf[WRITE] == 0 &&
+	    tg->io_burst_conf[READ] == 0 &&
+	    tg->io_burst_conf[WRITE] == 0)
+		return 0;
+
+	snprintf(bufs[0], sizeof(bufs[0]), "%llu",
+		tg->bytes_burst_conf[READ]);
+	snprintf(bufs[1], sizeof(bufs[1]), "%llu",
+		tg->bytes_burst_conf[WRITE]);
+	snprintf(bufs[2], sizeof(bufs[2]), "%u",
+		tg->io_burst_conf[READ]);
+	snprintf(bufs[3], sizeof(bufs[3]), "%u",
+		tg->io_burst_conf[WRITE]);
+
+	seq_printf(sf, "%s brbyte=%s bwbyte=%s brio=%s bwio=%s\n",
+		dname, bufs[0], bufs[1], bufs[2], bufs[3]);
+	return 0;
+}
+
+static int tg_print_burst(struct seq_file *sf, void *v)
+{
+	blkcg_print_blkgs(sf, css_to_blkcg(seq_css(sf)), tg_prfill_burst,
+			  &blkcg_policy_throtl, 0, false);
+	return 0;
+}
+
+static ssize_t tg_set_burst(struct kernfs_open_file *of,
+			  char *buf, size_t nbytes, loff_t off)
+{
+	struct blkcg *blkcg = css_to_blkcg(of_css(of));
+	struct blkg_conf_ctx ctx;
+	struct throtl_grp *tg;
+	u64 v[4];
+	int ret;
+
+	ret = blkg_conf_prep(blkcg, &blkcg_policy_throtl, buf, &ctx);
+	if (ret)
+		return ret;
+
+	tg = blkg_to_tg(ctx.blkg);
+
+	v[0] = tg->bytes_burst_conf[READ];
+	v[1] = tg->bytes_burst_conf[WRITE];
+	v[2] = tg->io_burst_conf[READ];
+	v[3] = tg->io_burst_conf[WRITE];
+
+	while (true) {
+		char tok[28];	/* bwbyte=18446744073709551616 */
+		char *p;
+		u64 val = U64_MAX;
+		int len;
+
+		if (sscanf(ctx.body, "%27s%n", tok, &len) != 1)
+			break;
+		if (tok[0] == '\0')
+			break;
+		ctx.body += len;
+
+		ret = -EINVAL;
+		p = tok;
+		strsep(&p, "=");
+		if (!p || (kstrtoull(p, 0, &val) != 0 && strcmp(p, "max")))
+			goto out_finish;
+
+		ret = -EINVAL;
+		if (!strcmp(tok, "brbyte"))
+			v[0] = val;
+		else if (!strcmp(tok, "bwbyte"))
+			v[1] = val;
+		else if (!strcmp(tok, "brio"))
+			v[2] = min_t(u64, val, UINT_MAX);
+		else if (!strcmp(tok, "bwio"))
+			v[3] = min_t(u64, val, UINT_MAX);
+		else
+			goto out_finish;
+	}
+
+	tg->bytes_burst_conf[READ] = v[0];
+	tg->bytes_burst_conf[WRITE] = v[1];
+	tg->io_burst_conf[READ] = v[2];
+	tg->io_burst_conf[WRITE] = v[3];
+
+	tg_conf_updated(tg, false);
+	ret = 0;
+out_finish:
+	blkg_conf_finish(&ctx);
+	return ret ?: nbytes;
+}
+#endif
+
 static struct cftype throtl_files[] = {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	{
@@ -1714,6 +1870,14 @@ static struct cftype throtl_files[] = {
 		.write = tg_set_limit,
 		.private = LIMIT_LOW,
 	},
+#endif
+#ifdef CONFIG_BLK_DEV_THROTTLING_BURST
+	{
+		.name = "burst",
+		.flags = CFTYPE_NOT_ON_ROOT,
+		.seq_show = tg_print_burst,
+		.write = tg_set_burst,
+	},
 #endif
 	{
 		.name = "max",