
[RFC] blk-throttle: enable io throttle for root in cgroup v2

Message ID 20210909140815.2600858-1-yukuai3@huawei.com (mailing list archive)

Commit Message

Yu Kuai Sept. 9, 2021, 2:08 p.m. UTC
We want to limit the overall iops/bps of the device in cgroup v2;
however, "io.max" and "io.low" do not currently exist in the root
cgroup, which makes this impossible.

This patch enables io throttling for the root cgroup:
 - enable "io.max" and "io.low" in the root cgroup
 - don't skip the root group in tg_iops_limit() and tg_bps_limit()
 - don't skip the root group in tg_conf_updated()

I'm not sure why this feature is disabled in the first place; is
there a problem or design constraint?

Signed-off-by: Yu Kuai <yukuai3@huawei.com>
---
 block/blk-throttle.c | 13 ++-----------
 1 file changed, 2 insertions(+), 11 deletions(-)
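With the patch applied, the standard cgroup v2 "io.max" syntax would become writable at the root as well. A hedged sketch of the intended usage (the 8:0 device number and the limit values are assumed examples, not taken from the patch):

```shell
# Assumed example: cap device 8:0 (see `lsblk`) to 100 MiB/s reads and
# 1000 write IOPS for the whole system, at the cgroup v2 root.
# Without this patch, io.max does not exist in the root cgroup.
echo "8:0 rbps=104857600 wiops=1000" > /sys/fs/cgroup/io.max

# Read back the effective limits; unset keys show as "max".
cat /sys/fs/cgroup/io.max
```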

Comments

Yu Kuai Sept. 15, 2021, 8:20 a.m. UTC | #1
On 2021/09/09 22:08, Yu Kuai wrote:
> We want to limit the overall iops/bps of the device in cgroup v2,
> however "io.max" or "io.min" do not exist in root cgroup currently,
> which makes it impossible.
> 
> This patch enables io throttle for root cgroup:
>   - enable "io.max" and "io.min" in root
>   - don't skip root group in tg_iops_limit() and tg_bps_limit()
>   - don't skip root group in tg_conf_updated()
> 
> I'm not sure why this feature is disabled in the first place, is
> there any problem or design constraint?
> 
Ping...

BTW, sorry about misspelling "io.low" as "io.min".
> Signed-off-by: Yu Kuai <yukuai3@huawei.com>
> ---
>   block/blk-throttle.c | 13 ++-----------
>   1 file changed, 2 insertions(+), 11 deletions(-)
> 
> diff --git a/block/blk-throttle.c b/block/blk-throttle.c
> index 55c49015e533..fffe072de538 100644
> --- a/block/blk-throttle.c
> +++ b/block/blk-throttle.c
> @@ -302,9 +302,6 @@ static uint64_t tg_bps_limit(struct throtl_grp *tg, int rw)
>   	struct throtl_data *td;
>   	uint64_t ret;
>   
> -	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent)
> -		return U64_MAX;
> -
>   	td = tg->td;
>   	ret = tg->bps[rw][td->limit_index];
>   	if (ret == 0 && td->limit_index == LIMIT_LOW) {
> @@ -332,9 +329,6 @@ static unsigned int tg_iops_limit(struct throtl_grp *tg, int rw)
>   	struct throtl_data *td;
>   	unsigned int ret;
>   
> -	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent)
> -		return UINT_MAX;
> -
>   	td = tg->td;
>   	ret = tg->iops[rw][td->limit_index];
>   	if (ret == 0 && tg->td->limit_index == LIMIT_LOW) {
> @@ -1430,9 +1424,8 @@ static void tg_conf_updated(struct throtl_grp *tg, bool global)
>   		struct throtl_grp *parent_tg;
>   
>   		tg_update_has_rules(this_tg);
> -		/* ignore root/second level */
> -		if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent ||
> -		    !blkg->parent->parent)
> +		/* ignore root level */
> +		if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent)
>   			continue;
>   		parent_tg = blkg_to_tg(blkg->parent);
>   		/*
> @@ -1771,7 +1764,6 @@ static struct cftype throtl_files[] = {
>   #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
>   	{
>   		.name = "low",
> -		.flags = CFTYPE_NOT_ON_ROOT,
>   		.seq_show = tg_print_limit,
>   		.write = tg_set_limit,
>   		.private = LIMIT_LOW,
> @@ -1779,7 +1771,6 @@ static struct cftype throtl_files[] = {
>   #endif
>   	{
>   		.name = "max",
> -		.flags = CFTYPE_NOT_ON_ROOT,
>   		.seq_show = tg_print_limit,
>   		.write = tg_set_limit,
>   		.private = LIMIT_MAX,
>
Michal Koutný Sept. 17, 2021, 5:41 p.m. UTC | #2
Hello Yu.

On Thu, Sep 09, 2021 at 10:08:15PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
> I'm not sure why this feature is disabled in the first place, is
> there any problem or design constraint?

The idea for v2 is that only kernel threads providing "global" services
remain in the root cgroup, and any user workload that should be
constrained is put into non-root cgroups. Additionally, if kernel
threads carry out work associated with a cgroup, they can charge it to
the respective cgroup.

[snip]
> We want to limit the overall iops/bps of the device in cgroup v2,

Cui bono? (I mean what is the reason for throttling on the global level
when there's no other entity utilizing the residual?
<joke>Your drives are too fast?</joke>)

Michal
Khazhy Kumykov Sept. 17, 2021, 7:58 p.m. UTC | #3
On Fri, Sep 17, 2021 at 10:41 AM Michal Koutný <mkoutny@suse.com> wrote:
>
> Hello Yu.
>
> On Thu, Sep 09, 2021 at 10:08:15PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
> > I'm not sure why this feature is disabled in the first place, is
> > there any problem or design constraint?
>
> The idea for v2 is that in the root cgroup remain only kernel threads that
> provide "global" services and any user workload that should be
> constrained is put into non-root cgroups. Additionally, if kernel
> threads carry out work associated with a cgroup they can charge it to
> the respective cgroup.
>
> [snip]
> > We want to limit the overall iops/bps of the device in cgroup v2,
>
> Cui bono? (I mean what is the reason for throttling on the global level
> when there's no other entity utilizing the residual?
> <joke>Your drives are too fast?</joke>)

We'd be interested in something like this as well (at least for
io.max). Our use case is providing remote devices which are a shared
resource. A "global" throttle like this (which is set by a local
management daemon) allows for throttling before sending network
traffic. It's also useful since we can put this throttle on a dm, so
we can enforce an aggregate throttle without needing backchannels to
coordinate multiple targets.
(This does also bring up: if this is a useful thing, would it make
sense to tie it to the device rather than requiring a cgroup? We
happen to use cgroups, so that requirement doesn't affect us.)

Khazhy
>
> Michal
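The aggregate bps throttle discussed above can be pictured as a token bucket: the byte budget refills at the configured rate, and an IO that exceeds the remaining budget is delayed until the shortfall refills. A minimal sketch of that accounting (illustrative only, not the kernel's blk-throttle implementation; all names are made up):

```python
# Illustrative token-bucket model of an aggregate bps throttle.
# A byte budget refills at `bps` per second; an IO that exceeds the
# available budget must wait for the shortfall to accumulate.
class BpsThrottle:
    def __init__(self, bps):
        self.bps = bps            # allowed bytes per second
        self.tokens = float(bps)  # start with one second of budget
        self.last = 0.0           # time of the last submission (seconds)

    def submit(self, now, nbytes):
        """Return the delay in seconds before an IO of nbytes may dispatch."""
        # Refill for the elapsed time, capped at one second's worth
        # so a long idle period cannot build an unbounded burst.
        self.tokens = min(float(self.bps),
                          self.tokens + (now - self.last) * self.bps)
        self.last = now
        if nbytes <= self.tokens:
            self.tokens -= nbytes
            return 0.0
        shortfall = nbytes - self.tokens
        self.tokens = 0.0
        return shortfall / self.bps

t = BpsThrottle(bps=1024 * 1024)      # 1 MiB/s aggregate limit
print(t.submit(0.0, 512 * 1024))      # 0.0 -- fits in the budget
print(t.submit(0.0, 1024 * 1024))     # 0.5 -- must wait half a second
```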
Yu Kuai Sept. 19, 2021, 10:31 a.m. UTC | #4
On 2021/09/18 3:58, Khazhy Kumykov wrote:
> On Fri, Sep 17, 2021 at 10:41 AM Michal Koutný <mkoutny@suse.com> wrote:
>>
>> Hello Yu.
>>
>> On Thu, Sep 09, 2021 at 10:08:15PM +0800, Yu Kuai <yukuai3@huawei.com> wrote:
>>> I'm not sure why this feature is disabled in the first place, is
>>> there any problem or design constraint?
>>
>> The idea for v2 is that in the root cgroup remain only kernel threads that
>> provide "global" services and any user workload that should be
>> constrained is put into non-root cgroups. Additionally, if kernel
>> threads carry out work associated with a cgroup they can charge it to
>> the respective cgroup.
>>
>> [snip]
>>> We want to limit the overall iops/bps of the device in cgroup v2,
>>
>> Cui bono? (I mean what is the reason for throttling on the global level
>> when there's no other entity utilizing the residual?
>> <joke>Your drives are too fast?</joke>)
> 
> We'd be interested in something like this as well. (at least for
> io.max). Our use case is providing remote devices which are a shared
> resource. A "global" throttle like this (which is set by a local

Our use case is similar to this: a host can provide several remote
devices to different clients. If one client is under high io pressure,
other clients might be affected. Thus we want to limit the overall
iops/bps from the client.

Thanks,
Kuai

> management daemon) allows for throttling before sending network
> traffic. It's also useful since we can put this throttle on a dm, so
> we can enforce an aggregate throttle without needing backchannels to
> coordinate multiple targets.
> (This does also bring up: if this is a useful thing, would it make
> sense to tie to the device, vs. requiring cgroup. We happen to use
> cgroups so that requirement doesn't affect us).
> 
> Khazhy
>>
>> Michal
Michal Koutný Sept. 21, 2021, 1:44 p.m. UTC | #5
On Sun, Sep 19, 2021 at 06:31:38PM +0800, "yukuai (C)" <yukuai3@huawei.com> wrote:
> Our use case is similar to this: a host can provide several remote
> devices to different clients. If one client is under high io pressure,
> other clients might be affected. Thus we want to limit the overall
> iops/bps from the client.

I see where you are coming from now. (Perhaps I'd suggest
allocating/prioritizing the allowances on the hosting side, if simply
wrapping "everything" into a non-root cgroup is not enough.)


On 2021/09/18 3:58, Khazhy Kumykov wrote:
> (This does also bring up: if this is a useful thing, would it make
> sense to tie to the device, vs. requiring cgroup. We happen to use
> cgroups so that requirement doesn't affect us).

Good point. That's IMO a better idea; it'd be more consistent with other
resources for which there exist global (cgroup-independent) kernel
constraints (e.g. the threads-max sysctl, mem= cmdline, cpu hotplug)
that double as the root cgroup constraint.

OTOH, this also deepens the precedent of init NS root cgroup being
special (more special than a container's root cgroup).

My .02€,
Michal
Tejun Heo Sept. 27, 2021, 5:08 p.m. UTC | #6
Hello,

On Tue, Sep 21, 2021 at 03:44:14PM +0200, Michal Koutný wrote:
> On 2021/09/18 3:58, Khazhy Kumykov wrote:
> > (This does also bring up: if this is a useful thing, would it make
> > sense to tie to the device, vs. requiring cgroup. We happen to use
> > cgroups so that requirement doesn't affect us).
> 
> Good point. That's IMO a better idea; it'd be more consistent with other
> resources for which there exist global (cgroup-independent) kernel
> constraints (e.g. threads-max sysctl, mem= cmdline, cpu hotplug)
> that double as the root cgroup constraint.

This is why I usually try to push root-cgroup-level features outside of
cgroup: they really don't have much to do with cgroups at the root
level. For visibility stuff, we do replicate quite a bit at the root
level because not doing so becomes too painful for users, but for
control I'm more hesitant.

One sideways solution could be using iocost. It doesn't have root-level
control per se, but it does configure per-device attributes which define
what the device can do and is allowed to do, so that it can be used as
the basis for weighted fair distribution. Even if IO control is disabled
at the root level, it'll still modulate the device according to those
parameters.
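For reference, iocost is driven through two root-only, per-device interface files; a hedged sketch of that configuration (the 8:16 device number and all parameter values are illustrative, not recommendations):

```shell
# Illustrative values only. io.cost.qos enables iocost on a device and
# sets the latency targets used to adjust its rate; io.cost.model
# describes what the device can do. Both files exist only in the root.
echo "8:16 enable=1 ctrl=user rpct=95.00 rlat=10000 wpct=95.00 wlat=20000" \
    > /sys/fs/cgroup/io.cost.qos
echo "8:16 ctrl=user model=linear rbps=500000000 rseqiops=100000 rrandiops=80000 wbps=400000000 wseqiops=90000 wrandiops=70000" \
    > /sys/fs/cgroup/io.cost.model
```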

> OTOH, this also deepens the precedent of init NS root cgroup being
> special (more special than a container's root cgroup).

While it does seem like an aesthetic wrinkle, I don't think this is a
practical problem. The system root being different is a given, whether
aesthetically pleasing or not (not the most important reason, but we
have CONFIG_CGROUPS after all). I don't think it'll lead anywhere good
to try to mask the differences.

Thanks.

Patch

diff --git a/block/blk-throttle.c b/block/blk-throttle.c
index 55c49015e533..fffe072de538 100644
--- a/block/blk-throttle.c
+++ b/block/blk-throttle.c
@@ -302,9 +302,6 @@  static uint64_t tg_bps_limit(struct throtl_grp *tg, int rw)
 	struct throtl_data *td;
 	uint64_t ret;
 
-	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent)
-		return U64_MAX;
-
 	td = tg->td;
 	ret = tg->bps[rw][td->limit_index];
 	if (ret == 0 && td->limit_index == LIMIT_LOW) {
@@ -332,9 +329,6 @@  static unsigned int tg_iops_limit(struct throtl_grp *tg, int rw)
 	struct throtl_data *td;
 	unsigned int ret;
 
-	if (cgroup_subsys_on_dfl(io_cgrp_subsys) && !blkg->parent)
-		return UINT_MAX;
-
 	td = tg->td;
 	ret = tg->iops[rw][td->limit_index];
 	if (ret == 0 && tg->td->limit_index == LIMIT_LOW) {
@@ -1430,9 +1424,8 @@  static void tg_conf_updated(struct throtl_grp *tg, bool global)
 		struct throtl_grp *parent_tg;
 
 		tg_update_has_rules(this_tg);
-		/* ignore root/second level */
-		if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent ||
-		    !blkg->parent->parent)
+		/* ignore root level */
+		if (!cgroup_subsys_on_dfl(io_cgrp_subsys) || !blkg->parent)
 			continue;
 		parent_tg = blkg_to_tg(blkg->parent);
 		/*
@@ -1771,7 +1764,6 @@  static struct cftype throtl_files[] = {
 #ifdef CONFIG_BLK_DEV_THROTTLING_LOW
 	{
 		.name = "low",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = tg_print_limit,
 		.write = tg_set_limit,
 		.private = LIMIT_LOW,
@@ -1779,7 +1771,6 @@  static struct cftype throtl_files[] = {
 #endif
 	{
 		.name = "max",
-		.flags = CFTYPE_NOT_ON_ROOT,
 		.seq_show = tg_print_limit,
 		.write = tg_set_limit,
 		.private = LIMIT_MAX,