Message ID | 20240406080059.2248314-6-yukuai1@huaweicloud.com (mailing list archive)
---|---
State | New, archived
Series | blk-throttle: support enable and disable during runtime
Hello,

On Sat, Apr 06, 2024 at 04:00:58PM +0800, Yu Kuai wrote:
> From: Yu Kuai <yukuai3@huawei.com>
>
> Currently once blk-throttle is enabled, it can't be destroyed until disk
> removal, even it's disabled.
>
> Also prepare to support building it as kernel module.

The benefit of doing this whenever the ruleset becomes empty seems marginal.
This isn't necessary to allow unloading blk-throttle, and
blkg_conf_exit_blkg() is also necessary because of this, right?

Thanks.
Hi,

On 2024/04/13 2:05, Tejun Heo wrote:
> Hello,
>
> On Sat, Apr 06, 2024 at 04:00:58PM +0800, Yu Kuai wrote:
>> From: Yu Kuai <yukuai3@huawei.com>
>>
>> Currently once blk-throttle is enabled, it can't be destroyed until disk
>> removal, even it's disabled.
>>
>> Also prepare to support building it as kernel module.
>
> The benefit of doing this whenever the ruleset becomes empty seems marginal.
> This isn't necessary to allow unloading blk-throttle and
> blkg_conf_exit_blkg() is also necessary because of this, right?

Yes, this is why blkg_conf_exit_blkg() is necessary.

I think we need to find an appropriate time to unload blk-throttle other
than deleting the gendisk. I'm also thinking of adding a new user input
like "8:0 free" to do this. These are the solutions that I can think of
for now.

Thanks,
Kuai

> Thanks.
Hello,

On Sat, Apr 13, 2024 at 10:06:00AM +0800, Yu Kuai wrote:
> I think that we need find an appropriate time to unload blk-throttle
> other than deleting the gendisk. I also think of adding a new user input
> like "8:0 free" to do this. These are the solutions that I can think of
> for now.

Probably a better interface is for unloading to force blk-throtl to be
deactivated rather than asking the user to nuke all configs.

Thanks.
Hi,

On 2024/04/17 1:09, Tejun Heo wrote:
> Hello,
>
> On Sat, Apr 13, 2024 at 10:06:00AM +0800, Yu Kuai wrote:
>> I think that we need find an appropriate time to unload blk-throttle
>> other than deleting the gendisk. I also think of adding a new user input
>> like "8:0 free" to do this. These are the solutions that I can think of
>> for now.
>
> Probably a better interface is for unloading to force blk-throtl to be
> deactivated rather than asking the user to nuke all configs.

I was thinking that rmmod in this case should return busy; for example,
if bfq is currently used for some disk, rmmod bfq will return busy.

Is there any example where unloading deactivates resources that users
are still using?

Thanks,
Kuai

> Thanks.
Hello,

On Wed, Apr 17, 2024 at 09:13:34AM +0800, Yu Kuai wrote:
>> Probably a better interface is for unloading to force blk-throtl to be
>> deactivated rather than asking the user to nuke all configs.
>
> I was thinking that rmmod in this case should return busy, for example,
> if bfq is currently used for some disk, rmmod bfq will return busy.
>
> Is there any example that unloading will deactivate resources that users
> are still using?

Hmm... yeah, I'm not sure. Pinning the module while in use is definitely
more conventional, so let's stick with that. It's usually achieved by
inc'ing the module's ref on each usage, so here, the module refs would be
counting the number of active rules, I guess.

I'm not sure about modularization tho, mostly because we've historically had
a lot of lifetime issues around block and blkcg data structures and the
supposed gain here is rather minimal. We only have a handful of these
policies and they aren't that big.

If hot path overhead when not being used is a concern, lazy init solves most
of it, no?

Thanks.
Hi,

On 2024/04/17 9:22, Tejun Heo wrote:
> Hello,
>
> On Wed, Apr 17, 2024 at 09:13:34AM +0800, Yu Kuai wrote:
>>> Probably a better interface is for unloading to force blk-throtl to be
>>> deactivated rather than asking the user to nuke all configs.
>>
>> I was thinking that rmmod in this case should return busy, for example,
>> if bfq is currently used for some disk, rmmod bfq will return busy.
>>
>> Is there any example that unloading will deactivate resources that users
>> are still using?
>
> Hmm... yeah, I'm not sure. Pinning the module while in use is definitely
> more conventional, so let's stick with that. It's usually achieved by
> inc'ing the module's ref on each usage, so here, the module refs would be
> counting the number of active rules, I guess.

Yes, agreed.

>
> I'm not sure about modularization tho mostly because we've historically had
> a lot of lifetime issues around block and blkcg data structures and the
> supposed gain here is rather minimal. We only have a handful of these
> policies and they aren't that big.
>
> If hot path overhead when not being used is concern, lazy init solves most
> of it, no?

For performance, yes. Another point is that we can reduce kernel size this
way, without losing support for blk-cgroup policies.

Yes, it's just that blk-throttle is the biggest pain in the ass because it
exposes lots of implementation details to the block layer. Other rq_qos
based policies should be much easier.

I guess I'll do lazy init first, then modularization for rq_qos, and leave
blk-throtl there for now. Perhaps a new throtl model in iocost can replace
blk-throtl in the future.

BTW, while testing iocost I found that it can already achieve this, for
example with the following configuration:

echo "$dev enable=1 min=100 max=100" > qos
echo "$dev wbps=4096 wseqiops=1 wrandiops=1" > model

In the test, wbps and iops are actually limited to the set values.

Thanks,
Kuai

>
> Thanks.
>
Hello,

On Wed, Apr 17, 2024 at 09:39:43AM +0800, Yu Kuai wrote:
...
> I guess I'll do lazy init first, and then modularization for rq_qos,
> and leave blk-throtl there for now. Perhaps add a new throtl model in
> iocost can replace blk-throtl in the future.

That sounds like a plan.

> BTW, currently during test of iocost, I found that iocost can already
> achieve that, for example, by following configure:
>
> echo "$dev enable=1 min=100 max=100" > qos
> echo "$dev wbps=4096 wseqiops=1 wrandiops=1" > model
>
> In the test, I found that wbps and iops is actually limited to the
> set value.

Yeah, it shouldn't be too difficult to add .max support to iocost so that
you can say something like "this cgroup subtree can't use more than 60% of
available capacity". That'd be really cool to have.

Thanks.
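For reference, the `qos` and `model` files in Kuai's test are the iocost interface files in the root cgroup on cgroup v2. A sketch of the full invocation, assuming cgroup2 is mounted at the standard path and the target device is 8:0 (the values are the ones from the test above; requires root):

```shell
# Assumptions: cgroup2 mounted at /sys/fs/cgroup, device major:minor 8:0.
dev="8:0"
echo "$dev enable=1 min=100 max=100"         > /sys/fs/cgroup/io.cost.qos
echo "$dev wbps=4096 wseqiops=1 wrandiops=1" > /sys/fs/cgroup/io.cost.model
```

With the QoS min and max both pinned to 100, the user-supplied cost model is taken at face value, which is why the configured wbps/iops effectively act as hard limits in the test.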
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index b371442131fe..5c16be07a594 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -28,6 +28,7 @@
 
 /* A workqueue to queue throttle related work */
 static struct workqueue_struct *kthrotld_workqueue;
+static DECLARE_WAIT_QUEUE_HEAD(destroy_wait);
 
 #define rb_entry_tg(node)	rb_entry((node), struct throtl_grp, rb_node)
 
@@ -906,6 +907,11 @@ static void start_parent_slice_with_credit(struct throtl_grp *child_tg,
 
 }
 
+static bool td_has_io(struct throtl_data *td)
+{
+	return td->nr_queued[READ] + td->nr_queued[WRITE] != 0;
+}
+
 static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 {
 	struct throtl_service_queue *sq = &tg->service_queue;
@@ -941,6 +947,8 @@ static void tg_dispatch_one_bio(struct throtl_grp *tg, bool rw)
 			   &parent_sq->queued[rw]);
 		BUG_ON(tg->td->nr_queued[rw] <= 0);
 		tg->td->nr_queued[rw]--;
+		if (!td_has_io(tg->td))
+			wake_up(&destroy_wait);
 	}
 
 	throtl_trim_slice(tg, rw);
@@ -1268,6 +1276,31 @@ static int blk_throtl_init(struct gendisk *disk)
 	return ret;
 }
 
+void blk_throtl_exit(struct gendisk *disk)
+{
+	struct request_queue *q = disk->queue;
+
+	if (!q->td)
+		return;
+
+	del_timer_sync(&q->td->service_queue.pending_timer);
+	cancel_work_sync(&q->td->dispatch_work);
+	blkcg_deactivate_policy(disk, &blkcg_policy_throtl);
+	kfree(q->td);
+	q->td = NULL;
+}
+
+static void blk_throtl_destroy(struct gendisk *disk)
+{
+	struct throtl_data *td = disk->queue->td;
+
+	/*
+	 * There are no rules, all throttled BIO should be dispatched
+	 * immediately.
+	 */
+	wait_event(destroy_wait, !td_has_io(td));
+	blk_throtl_exit(disk);
+}
 
 static ssize_t tg_set_conf(struct kernfs_open_file *of,
 			   char *buf, size_t nbytes, loff_t off, bool is_u64)
@@ -1308,7 +1341,11 @@ static ssize_t tg_set_conf(struct kernfs_open_file *of,
 	else
 		*(unsigned int *)((void *)tg + of_cft(of)->private) = v;
 
-	tg_conf_updated(tg, false);
+	blkg_conf_exit_blkg(&ctx);
+
+	if (!tg_conf_updated(tg, false))
+		blk_throtl_destroy(ctx.bdev->bd_disk);
+
 	ret = 0;
 out_finish:
 	blkg_conf_exit(&ctx);
@@ -1516,7 +1553,11 @@ static ssize_t tg_set_limit(struct kernfs_open_file *of,
 	tg->iops[READ] = v[2];
 	tg->iops[WRITE] = v[3];
 
-	tg_conf_updated(tg, false);
+	blkg_conf_exit_blkg(&ctx);
+
+	if (!tg_conf_updated(tg, false))
+		blk_throtl_destroy(ctx.bdev->bd_disk);
+
 	ret = 0;
 out_finish:
 	blkg_conf_exit(&ctx);
@@ -1533,13 +1574,6 @@ static struct cftype throtl_files[] = {
 	{ }	/* terminate */
 };
 
-static void throtl_shutdown_wq(struct request_queue *q)
-{
-	struct throtl_data *td = q->td;
-
-	cancel_work_sync(&td->dispatch_work);
-}
-
 struct blkcg_policy blkcg_policy_throtl = {
 	.dfl_cftypes		= throtl_files,
 	.legacy_cftypes		= throtl_legacy_files,
@@ -1688,19 +1722,6 @@ bool __blk_throtl_bio(struct bio *bio)
 	return throttled;
 }
 
-void blk_throtl_exit(struct gendisk *disk)
-{
-	struct request_queue *q = disk->queue;
-
-	if (!q->td)
-		return;
-
-	del_timer_sync(&q->td->service_queue.pending_timer);
-	throtl_shutdown_wq(q);
-	blkcg_deactivate_policy(disk, &blkcg_policy_throtl);
-	kfree(q->td);
-}
-
 static int __init throtl_init(void)
 {
 	kthrotld_workqueue = alloc_workqueue("kthrotld", WQ_MEM_RECLAIM, 0);