Message ID | 20220114093000.3323470-1-yukuai3@huawei.com (mailing list archive)
---|---
State | New, archived
Series | [-next] blk-throttle: enable io throttle for root in cgroup v2
On Fri, Jan 14, 2022 at 05:30:00PM +0800, Yu Kuai wrote:
> RFC patch: https://lkml.org/lkml/2021/9/9/1432
>
> There is a performance problem in our environment:
>
> A host can provide a remote device to different clients. If one client
> is under high io pressure, other clients might be affected.
>
> Limiting the overall iops/bps (io.max) from the client can fix the
> problem; however, the config files do not currently exist in the root
> cgroup, which makes that impossible.
>
> This patch enables io throttle for the root cgroup:
> - enable "io.max" and "io.low" in root
> - don't skip the root group in tg_iops_limit() and tg_bps_limit()
> - don't skip the root group in tg_conf_updated()
>
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>

Yeah, I'm kinda split. It's a simple change with some utility, but it's
also something which doesn't fit with the cgroup feature or interface:
it's regulating whole-system behavior. There's no reason for any of the
control "groups" to be involved here, and semantically the interface
would fit a lot better under /proc, /sys or some other system-wide
location. Here are some points to consider:

* As a comparison, it'd be rather absurd to enable memory.max at the
  system root in terms of interface, and doing so would most likely
  break a whole lot of mm operations.

* Resource control knobs of a cgroup belong to the parent, as the
  parent is responsible for divvying up the available resources among
  its children. Here too, the knobs only make sense because there is a
  higher-level parent (whether that's a hypervisor or some network
  server).

Is your use case VMs or network attached storage?

Thanks.
On 2022/01/27 1:29, Tejun Heo wrote:
> On Fri, Jan 14, 2022 at 05:30:00PM +0800, Yu Kuai wrote:
>> RFC patch: https://lkml.org/lkml/2021/9/9/1432
>>
>> There is a performance problem in our environment:
>>
>> A host can provide a remote device to different clients. If one
>> client is under high io pressure, other clients might be affected.
>>
>> Limiting the overall iops/bps (io.max) from the client can fix the
>> problem; however, the config files do not currently exist in the
>> root cgroup, which makes that impossible.
>>
>> This patch enables io throttle for the root cgroup:
>> - enable "io.max" and "io.low" in root
>> - don't skip the root group in tg_iops_limit() and tg_bps_limit()
>> - don't skip the root group in tg_conf_updated()
>>
>> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
>
> Yeah, I'm kinda split. It's a simple change with some utility, but
> it's also something which doesn't fit with the cgroup feature or
> interface: it's regulating whole-system behavior. There's no reason
> for any of the control "groups" to be involved here, and semantically
> the interface would fit a lot better under /proc, /sys or some other
> system-wide location. Here are some points to consider:
>
> * As a comparison, it'd be rather absurd to enable memory.max at the
>   system root in terms of interface, and doing so would most likely
>   break a whole lot of mm operations.
>
> * Resource control knobs of a cgroup belong to the parent, as the
>   parent is responsible for divvying up the available resources among
>   its children. Here too, the knobs only make sense because there is
>   a higher-level parent (whether that's a hypervisor or some network
>   server).
>
> Is your use case VMs or network attached storage?

Hi,

In our case, the disk is provided by a server, and such a disk can be
shared by multiple clients. Thus, from the client's point of view, the
server is the higher-level parent.

Theoretically, limiting the io for each client from the server side is
feasible; however, the main reason we don't want to do this is the
following shortcoming:

the client can still send io to the server unlimited; we can only limit
the amount of io that the server completes, which might cause too much
pressure on the server side.

Thanks,
Kuai
Hello,

On Thu, Jan 27, 2022 at 10:36:38AM +0800, yukuai (C) wrote:
> In our case, the disk is provided by a server, and such a disk can be
> shared by multiple clients. Thus, from the client's point of view,
> the server is the higher-level parent.
>
> Theoretically, limiting the io for each client from the server side
> is feasible; however, the main reason we don't want to do this is the
> following shortcoming:
>
> the client can still send io to the server unlimited; we can only
> limit the amount of io that the server completes, which might cause
> too much pressure on the server side.

I don't quite follow the "send io to the server unlimited" part.
Doesn't that get limited by the available number of requests? ie. if
the server throttles, the in-flight requests will take longer to
complete, which exhausts the available requests and thus slows down the
client. That's how it's supposed to work on the local machine too.

Thanks.
On 2022/02/02 1:20, Tejun Heo wrote:
> Hello,
>
> On Thu, Jan 27, 2022 at 10:36:38AM +0800, yukuai (C) wrote:
>> In our case, the disk is provided by a server, and such a disk can
>> be shared by multiple clients. Thus, from the client's point of
>> view, the server is the higher-level parent.
>>
>> Theoretically, limiting the io for each client from the server side
>> is feasible; however, the main reason we don't want to do this is
>> the following shortcoming:
>>
>> the client can still send io to the server unlimited; we can only
>> limit the amount of io that the server completes, which might cause
>> too much pressure on the server side.
>
> I don't quite follow the "send io to the server unlimited" part.
> Doesn't that get limited by the available number of requests?

Hi, Tejun

Here I mean that the io is not limited through io throttle from the
client. Of course, the io is still limited by the available number of
requests.

> ie. if the server throttles, the in-flight requests will take longer
> to complete, which exhausts the available requests and thus slows
> down the client.

For example, say we have 8 clients and the available number of requests
per client is 64:

1) If we don't limit iops and each client sends 64 io to the server,
some clients might have performance issues.

2) If we limit iops to 8 from the clients, then the server receives at
most 64 io at the same time, and both the clients and the server should
work fine.

3) If we limit iops to 8 for each client from the server, the clients
should work fine; however, the server can still receive at most
64 x 8 = 512 io at the same time, which might cause too much pressure
on the server (maybe bps is more appropriate to explain the high
pressure).

Thus I prefer to limit io from the client.

Thanks,
Kuai

> That's how it's supposed to work on the local machine too.
>
> Thanks.
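(For concreteness: the client-side limit in scenario 2 above is what
the proposed patch enables via the root cgroup's io.max file. A
hypothetical configuration, assuming the shared disk appears on the
client as block device 8:0 — the device numbers and limit values here
are illustrative only:

	echo "8:0 riops=8 wiops=8" > /sys/fs/cgroup/io.max

The "MAJ:MIN riops=... wiops=..." syntax is the standard cgroup v2
io.max format; without the patch, the root cgroup simply has no io.max
file to write to.)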
Hello,

On Tue, Feb 08, 2022 at 09:38:33AM +0800, yukuai (C) wrote:
> 3) If we limit iops to 8 for each client from the server, the clients
> should work fine; however, the server can still receive at most
> 64 x 8 = 512 io at the same time, which might cause too much pressure
> on the server (maybe bps is more appropriate to explain the high
> pressure).

I don't follow this part. Why can't the server control however it wanna
control?

Thanks.
On 2022/02/09 2:49, Tejun Heo wrote:
> Hello,
>
> On Tue, Feb 08, 2022 at 09:38:33AM +0800, yukuai (C) wrote:
>> 3) If we limit iops to 8 for each client from the server, the
>> clients should work fine; however, the server can still receive at
>> most 64 x 8 = 512 io at the same time, which might cause too much
>> pressure on the server (maybe bps is more appropriate to explain the
>> high pressure).
>
> I don't follow this part. Why can't the server control however it
> wanna control?

Hi,

Do you agree that the server can't control how much io it receives from
one client if we limit from the server side? I think the difference is
that limiting from the client side can control that...

Thanks,
Kuai

> Thanks.
Hello Yu Kuai,

On Fri, Jan 14, 2022 at 05:30:00PM +0800, Yu Kuai wrote:
> RFC patch: https://lkml.org/lkml/2021/9/9/1432
>
> There is a performance problem in our environment:
>
> A host can provide a remote device to different clients. If one
> client is under high io pressure, other clients might be affected.

Can you use the Linux kernel storage terms to describe the issue? For
instance, I guess "host" here means the target server (iscsi or nvme
target?), and "client" should be the scsi initiator or nvme host. If
not, can you provide one actual example of your storage use case? With
common terms used, it becomes pretty easy for people to understand and
solve the issue, and it avoids misunderstanding.

> Limiting the overall iops/bps (io.max) from the client can fix the
> problem,

Just curious, how can each client figure out the perfect iops/bps
limit, given that one client doesn't know how many clients are
connected to the target server?

It sounds like the throttling shouldn't be done in the client-side
cgroup, given that the throttling has nothing to do with tasks. Maybe
it should be done on the server side, since the server has enough
information to provide fair iops/bps allocation for each client.

Thanks,
Ming
From: Ofir Gal <ofir@gal.software>
Hello Ming Lei,
I am trying to use cgroups v2 to throttle a media disk that is controlled by an NVME target.
Unfortunately, it cannot be done without setting the limit in the root cgroup.
It can be done via cgroups v1, and Yu Kuai's patch allows it to be
accomplished with cgroups v2 as well.
My setup consists of 3 servers:
Server #1:
a. SSD media disk (needs to be throttled to 100K IOPS)
b. NVME target controlling the SSD (1.a)
Server #2:
a. NVME initiator is connected to Server #1 NVME target (1.b)
Server #3:
a. NVME initiator is connected to Server #1 NVME target (1.b)
My setup accesses this media from multiple servers using NVMe over TCP,
but the initiator servers' workloads are unknown and can change dynamically. I need to limit the media disk to 100K IOPS on the target side.
I have tried to limit the SSD on Server #1, but it seems that the NVME target kworkers are not affected unless I use Yu Kuai's patch.
Can you elaborate on the issues with this patch or how the scenario described above can be done with cgroups v2?
Best regards, Ofir Gal.
On Sun, Feb 05, 2023 at 05:55:41PM +0200, Ofir Gal <ofir.gal@volumez.com> wrote:
> I have tried to limit the SSD on Server #1, but it seems that the
> NVME target kworkers are not affected unless I use Yu Kuai's patch.
>
> Can you elaborate on the issues with this patch or how the scenario
> described above can be done with cgroups v2?

The issue is that if there's a client that doesn't implement
self-throttling, you cannot guarantee anything on the server side.
Hence the mechanism must exist on the server side.

The NVME target should charge IO to the respective blkcg's (just
generic advice, I'm not familiar with that interface; see also the
kthread_associate_blkcg() use in the loop device driver).

HTH,
Michal
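As a rough illustration of the pattern Michal points at, here is a
minimal sketch modeled on loop_process_work() in drivers/block/loop.c,
where a worker kthread temporarily associates itself with the blkcg
that originated a request, so that the IO it submits is accounted (and
throttled) against that cgroup. Everything except
kthread_associate_blkcg() itself — the srv_work structure and the
srv_* helpers — is hypothetical, not taken from the nvmet code:

#include <linux/kthread.h>
#include <linux/cgroup.h>

/* Hypothetical per-request state; blkcg_css would be captured when
 * the request is queued (e.g. from the submitting context). */
struct srv_work {
	struct cgroup_subsys_state *blkcg_css;
	/* ... request payload ... */
};

static void srv_do_backing_io(struct srv_work *work);	/* hypothetical */

/* Worker body: charge backing-store IO to the originating blkcg. */
static void srv_process_work(struct srv_work *work)
{
	/* IO submitted by this kthread is now accounted to blkcg_css,
	 * so blk-throttle limits configured on that cgroup apply. */
	kthread_associate_blkcg(work->blkcg_css);

	srv_do_backing_io(work);

	/* Drop the association before handling unrelated work. */
	kthread_associate_blkcg(NULL);
}

With something like this in the nvmet code, limits could be configured
on a regular (non-root) cgroup on the target side, rather than relying
on the root-level knobs this patch adds.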
Hello Kuai.

On Wed, Feb 09, 2022 at 09:22:30AM +0800, "yukuai (C)" <yukuai3@huawei.com> wrote:
> Do you agree that the server can't control how much io it receives
> from one client if we limit from the server side? I think the
> difference is that limiting from the client side can control that...

(Perhaps it depends on the protocol used for the IO, but) eventually
client requests would be noticeably lost/dropped, and that's how the
server propagates the requested control onto the clients without
relying on the clients throttling themselves.

(Maybe a better place to implement this would be a "testing" device
mapper target akin to the 'delay' one.)

Regards,
Michal
diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 7c462c006b26..ac25bfbbfe7f 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -156,9 +156,6 @@ static uint64_t tg_bps_limit(struct throtl_grp *tg, int rw)
 	struct throtl_data *td;
 	uint64_t ret;
 
-	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent)
-		return U64_MAX;
-
 	td = tg->td;
 	ret = tg->bps[rw][td->limit_index];
 	if (ret == 0 && td->limit_index == LIMIT_LOW) {
@@ -186,9 +183,6 @@ static unsigned int tg_iops_limit(struct throtl_grp *tg, int rw)
 	struct throtl_data *td;
 	unsigned int ret;
 
-	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent)
-		return UINT_MAX;
-
 	td = tg->td;
 	ret = tg->iops[rw][td->limit_index];
 	if (ret == 0 && tg->td->limit_index == LIMIT_LOW) {
@@ -1284,9 +1278,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
 		struct throtl_grp *parent_tg;
 
 		tg_update_has_rules(this_tg);
-		/* ignore root/second level */
-		if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent ||
-		    !blkg->parent->parent)
+		/* ignore root level */
+		if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent)
 			continue;
 		parent_tg = blkg_to_tg(blkg->parent);
 		/*
@@ -1625,7 +1618,6 @@ static struct cftype throtl_files[] = {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	{
 		.name = "low",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = tg_print_limit,
 		.write = tg_set_limit,
 		.private = LIMIT_LOW,
@@ -1633,7 +1625,6 @@ static struct cftype throtl_files[] = {
 #endif
 	{
 		.name = "max",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = tg_print_limit,
 		.write = tg_set_limit,
 		.private = LIMIT_MAX,
RFC patch: https://lkml.org/lkml/2021/9/9/1432

There is a performance problem in our environment:

A host can provide a remote device to different clients. If one client
is under high io pressure, other clients might be affected.

Limiting the overall iops/bps (io.max) from the client can fix the
problem; however, the config files do not currently exist in the root
cgroup, which makes that impossible.

This patch enables io throttle for the root cgroup:
- enable "io.max" and "io.low" in root
- don't skip the root group in tg_iops_limit() and tg_bps_limit()
- don't skip the root group in tg_conf_updated()

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)