[v5,2/5] cgroup/pids: Make event counters hierarchical

Message ID	20240521092130.7883-3-mkoutny@suse.com (mailing list archive)
State	Accepted
Commit	385a635cacfe0d96d3b56633640a1ba65b3fddc3
Headers	show Received: from smtp-out2.suse.de (smtp-out2.suse.de [195.135.223.131]) (using TLSv1.2 with cipher ECDHE-RSA-AES128-GCM-SHA256 (128/128 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id DC3FB56770; Tue, 21 May 2024 09:21:34 +0000 (UTC) From: =?utf-8?q?Michal_Koutn=C3=BD?= <mkoutny@suse.com> To: cgroups@vger.kernel.org, linux-doc@vger.kernel.org, linux-kernel@vger.kernel.org, linux-kselftest@vger.kernel.org Cc: Tejun Heo <tj@kernel.org>, Zefan Li <lizefan.x@bytedance.com>, Johannes Weiner <hannes@cmpxchg.org>, Jonathan Corbet <corbet@lwn.net>, Shuah Khan <shuah@kernel.org>, Muhammad Usama Anjum <usama.anjum@collabora.com> Subject: [PATCH v5 2/5] cgroup/pids: Make event counters hierarchical Date: Tue, 21 May 2024 11:21:27 +0200 Message-ID: <20240521092130.7883-3-mkoutny@suse.com> In-Reply-To: <20240521092130.7883-1-mkoutny@suse.com> References: <20240521092130.7883-1-mkoutny@suse.com> Precedence: bulk MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit default: False [-3.51 / 50.00]; BAYES_HAM(-3.00)[100.00%]; NEURAL_HAM_LONG(-1.00)[-1.000]; MID_CONTAINS_FROM(1.00)[]; R_DKIM_ALLOW(-0.20)[suse.com:s=susede1]; NEURAL_HAM_SHORT(-0.20)[-1.000]; MIME_GOOD(-0.10)[text/plain]; MX_GOOD(-0.01)[]; RCPT_COUNT_SEVEN(0.00)[10]; RCVD_VIA_SMTP_AUTH(0.00)[]; FUZZY_BLOCKED(0.00)[rspamd.com]; ARC_NA(0.00)[]; MIME_TRACE(0.00)[0:+]; DWL_DNSWL_BLOCKED(0.00)[suse.com:dkim]; RCVD_TLS_ALL(0.00)[]; DKIM_SIGNED(0.00)[suse.com:s=susede1]; FROM_EQ_ENVFROM(0.00)[]; FROM_HAS_DN(0.00)[]; TO_DN_SOME(0.00)[]; RCVD_COUNT_TWO(0.00)[2]; TO_MATCH_ENVRCPT_ALL(0.00)[]; DBL_BLOCKED_OPENRESOLVER(0.00)[suse.com:dkim,suse.com:email,imap1.dmz-prg2.suse.org:helo,imap1.dmz-prg2.suse.org:rdns]; DKIM_TRACE(0.00)[suse.com:+]
Series	pids controller events rework \| expand [v5,0/5] pids controller events rework [v5,1/5] cgroup/pids: Separate semantics of pids.events related to pids.max [v5,2/5] cgroup/pids: Make event counters hierarchical [v5,3/5] cgroup/pids: Add pids.events.local [v5,4/5] selftests: cgroup: Lexicographic order in Makefile [v5,5/5] selftests: cgroup: Add basic tests for pids controller

Message ID

20240521092130.7883-3-mkoutny@suse.com (mailing list archive)

State

Accepted

Commit

385a635cacfe0d96d3b56633640a1ba65b3fddc3

Headers

From: =?utf-8?q?Michal_Koutn=C3=BD?= <mkoutny@suse.com>
To: cgroups@vger.kernel.org,
	linux-doc@vger.kernel.org,
	linux-kernel@vger.kernel.org,
	linux-kselftest@vger.kernel.org
Cc: Tejun Heo <tj@kernel.org>,
	Zefan Li <lizefan.x@bytedance.com>,
	Johannes Weiner <hannes@cmpxchg.org>,
	Jonathan Corbet <corbet@lwn.net>,
	Shuah Khan <shuah@kernel.org>,
	Muhammad Usama Anjum <usama.anjum@collabora.com>
Subject: [PATCH v5 2/5] cgroup/pids: Make event counters hierarchical
Date: Tue, 21 May 2024 11:21:27 +0200
Message-ID: <20240521092130.7883-3-mkoutny@suse.com>
In-Reply-To: <20240521092130.7883-1-mkoutny@suse.com>
References: <20240521092130.7883-1-mkoutny@suse.com>
Precedence: bulk
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

Series

pids controller events rework | expand

Commit Message

Michal Koutný May 21, 2024, 9:21 a.m. UTC

The pids.events file should honor the hierarchy, so make the events
propagate from their origin up to the root on the unified hierarchy. The
legacy behavior remains non-hierarchical.

Signed-off-by: Michal Koutný <mkoutny@suse.com>
---
 Documentation/admin-guide/cgroup-v2.rst |  9 +++--
 kernel/cgroup/pids.c                    | 46 ++++++++++++++++---------
 2 files changed, 36 insertions(+), 19 deletions(-)

Comments

Xiu Jianfeng July 3, 2024, 6:59 a.m. UTC | #1

On 2024/5/21 17:21, Michal Koutný wrote:
> The pids.events file should honor the hierarchy, so make the events
> propagate from their origin up to the root on the unified hierarchy. The
> legacy behavior remains non-hierarchical.
> 
> Signed-off-by: Michal Koutný <mkoutny@suse.com>
> --
[...]
> diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
> index a557f5c8300b..c09b744d548c 100644
> --- a/kernel/cgroup/pids.c
> +++ b/kernel/cgroup/pids.c
> @@ -238,6 +238,34 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
>  	}
>  }
>  
> +static void pids_event(struct pids_cgroup *pids_forking,
> +		       struct pids_cgroup *pids_over_limit)
> +{
> +	struct pids_cgroup *p = pids_forking;
> +	bool limit = false;
> +
> +	for (; parent_pids(p); p = parent_pids(p)) {
> +		/* Only log the first time limit is hit. */
> +		if (atomic64_inc_return(&p->events[PIDCG_FORKFAIL]) == 1) {
> +			pr_info("cgroup: fork rejected by pids controller in ");
> +			pr_cont_cgroup_path(p->css.cgroup);
> +			pr_cont("\n");
> +		}
> +		cgroup_file_notify(&p->events_file);
> +
> +		if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
> +		    cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
> +			break;
> +
> +		if (p == pids_over_limit)
> +			limit = true;
> +		if (limit)
> +			atomic64_inc(&p->events[PIDCG_MAX]);
> +
> +		cgroup_file_notify(&p->events_file);

Hi Michal,

I have doubts about this code. To better illustrate the problem, I am
posting the final code here.

static void pids_event(struct pids_cgroup *pids_forking,
                       struct pids_cgroup *pids_over_limit)
{
...
        cgroup_file_notify(&p->events_local_file);
        if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
            cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
                return;

        for (; parent_pids(p); p = parent_pids(p)) {
                if (p == pids_over_limit) {
                        limit = true;
                        atomic64_inc(&p->events_local[PIDCG_MAX]);
                        cgroup_file_notify(&p->events_local_file);
                }
                if (limit)
                        atomic64_inc(&p->events[PIDCG_MAX]);

                cgroup_file_notify(&p->events_file);
        }
}

Consider this scenario: there are 4 groups A, B, C,and D. The
relationships are as follows, the latter is the child of the former:

root->A->B->C->D

Then the user is polling on C.pids.events. When a process in D forks and
fails due to B.max restrictions(pids_forking is D, and pids_over_limit
is B), the user is awakened. However, when the user reads C.pids.events,
he will find that the content has not changed. because the 'limit' is
set to true started from B, and C.pids.events shows as below:

seq_printf(sf, "max %lld\n", (s64)atomic64_read(&events[PIDCG_MAX]));

Wouldn't this behavior confuse the user? Should the code to be changed
to this?

if (limit) {
      atomic64_inc(&p->events[PIDCG_MAX]);
      cgroup_file_notify(&p->events_file);
}

Xiu Jianfeng July 16, 2024, 3:27 a.m. UTC | #2

Hi,

Friendly ping, more comment as below.

On 2024/7/3 14:59, xiujianfeng wrote:
> 
> 
> On 2024/5/21 17:21, Michal Koutný wrote:
>> The pids.events file should honor the hierarchy, so make the events
>> propagate from their origin up to the root on the unified hierarchy. The
>> legacy behavior remains non-hierarchical.
>>
>> Signed-off-by: Michal Koutný <mkoutny@suse.com>
>> --
> [...]
>> diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
>> index a557f5c8300b..c09b744d548c 100644
>> --- a/kernel/cgroup/pids.c
>> +++ b/kernel/cgroup/pids.c
>> @@ -238,6 +238,34 @@ static void pids_cancel_attach(struct cgroup_taskset *tset)
>>  	}
>>  }
>>  
>> +static void pids_event(struct pids_cgroup *pids_forking,
>> +		       struct pids_cgroup *pids_over_limit)
>> +{
>> +	struct pids_cgroup *p = pids_forking;
>> +	bool limit = false;
>> +
>> +	for (; parent_pids(p); p = parent_pids(p)) {
>> +		/* Only log the first time limit is hit. */
>> +		if (atomic64_inc_return(&p->events[PIDCG_FORKFAIL]) == 1) {
>> +			pr_info("cgroup: fork rejected by pids controller in ");
>> +			pr_cont_cgroup_path(p->css.cgroup);
>> +			pr_cont("\n");
>> +		}
>> +		cgroup_file_notify(&p->events_file);
>> +
>> +		if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
>> +		    cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
>> +			break;
>> +
>> +		if (p == pids_over_limit)
>> +			limit = true;
>> +		if (limit)
>> +			atomic64_inc(&p->events[PIDCG_MAX]);
>> +
>> +		cgroup_file_notify(&p->events_file);
> 
> Hi Michal,
> 
> I have doubts about this code. To better illustrate the problem, I am
> posting the final code here.
> 
> static void pids_event(struct pids_cgroup *pids_forking,
>                        struct pids_cgroup *pids_over_limit)
> {
> ...
>         cgroup_file_notify(&p->events_local_file);
>         if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
>             cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
>                 return;
> 
>         for (; parent_pids(p); p = parent_pids(p)) {
>                 if (p == pids_over_limit) {
>                         limit = true;
>                         atomic64_inc(&p->events_local[PIDCG_MAX]);
>                         cgroup_file_notify(&p->events_local_file);
>                 }
>                 if (limit)
>                         atomic64_inc(&p->events[PIDCG_MAX]);
> 
>                 cgroup_file_notify(&p->events_file);
>         }
> }
> 
> Consider this scenario: there are 4 groups A, B, C,and D. The
> relationships are as follows, the latter is the child of the former:
> 
> root->A->B->C->D
> 
> Then the user is polling on C.pids.events. When a process in D forks and
> fails due to B.max restrictions(pids_forking is D, and pids_over_limit
> is B), the user is awakened. However, when the user reads C.pids.events,
> he will find that the content has not changed. because the 'limit' is
> set to true started from B, and C.pids.events shows as below:
> 
> seq_printf(sf, "max %lld\n", (s64)atomic64_read(&events[PIDCG_MAX]));
> 
> Wouldn't this behavior confuse the user? Should the code to be changed
> to this?
> 
> if (limit) {
>       atomic64_inc(&p->events[PIDCG_MAX]);
>       cgroup_file_notify(&p->events_file);
> }
>

or should the for loop be changed to the following?

atomic64_inc(&pids_over_limit->events_local[PIDCG_MAX]);
cgroup_file_notify(&pids_over_limit->events_local_file);

for (p = pids_over_limit; parent_pids(p); p = parent_pids(p)) {
    atomic64_inc(&pt->events[PIDCG_MAX]);
    cgroup_file_notify(&p->events_file);
}

The current behaviour is quite different from other subsys, such as
memcg, that make me confused, maybe I am missing something.

it's appreciated if anyone could respond.

Michal Koutný July 25, 2024, 9:38 a.m. UTC | #3

Hello Jianfeng.

On Tue, Jul 16, 2024 at 11:27:39AM GMT, xiujianfeng <xiujianfeng@huawei.com> wrote:
> On 2024/7/3 14:59, xiujianfeng wrote:
...
> >         for (; parent_pids(p); p = parent_pids(p)) {
> >                 if (p == pids_over_limit) {
> >                         limit = true;
> >                         atomic64_inc(&p->events_local[PIDCG_MAX]);
> >                         cgroup_file_notify(&p->events_local_file);
> >                 }
> >                 if (limit)
> >                         atomic64_inc(&p->events[PIDCG_MAX]);
> > 
> >                 cgroup_file_notify(&p->events_file);
> >         }
> > }
> > 
> > Consider this scenario: there are 4 groups A, B, C,and D. The
> > relationships are as follows, the latter is the child of the former:
> > 
> > root->A->B->C->D
> > 
> > Then the user is polling on C.pids.events. When a process in D forks and
> > fails due to B.max restrictions(pids_forking is D, and pids_over_limit
> > is B), the user is awakened. However, when the user reads C.pids.events,
> > he will find that the content has not changed. because the 'limit' is
> > set to true started from B, and C.pids.events shows as below:
> > 
> > seq_printf(sf, "max %lld\n", (s64)atomic64_read(&events[PIDCG_MAX]));
> > 
> > Wouldn't this behavior confuse the user? Should the code to be changed
> > to this?

Two generic notes:
- event notifications can be rate limited, so users won't necessarily
  see every change,
- upon notification it's better to read the event counter/status anyway
  to base a response on it.

But your remark is justified, there is no reason in this case for
"spurious" event notification. It's an omission from v3 version of the
patch when there had been also pids.events:max.imposed (that'd trigger
events from D up to the root, it's only internal PIDCG_FORKFAIL now).

The upwards traversal loop can be simplified and fixed with only
PIDCG_MAX exposed. Can you send it as a separate patch please?

(Apologies for late response, somehow I didn't see your e-mails.)

Michal

Xiu Jianfeng July 30, 2024, 3:21 a.m. UTC | #4

On 2024/7/25 17:38, Michal Koutný wrote:
> Hello Jianfeng.
> 
> On Tue, Jul 16, 2024 at 11:27:39AM GMT, xiujianfeng <xiujianfeng@huawei.com> wrote:
>> On 2024/7/3 14:59, xiujianfeng wrote:
> ...
>>>         for (; parent_pids(p); p = parent_pids(p)) {
>>>                 if (p == pids_over_limit) {
>>>                         limit = true;
>>>                         atomic64_inc(&p->events_local[PIDCG_MAX]);
>>>                         cgroup_file_notify(&p->events_local_file);
>>>                 }
>>>                 if (limit)
>>>                         atomic64_inc(&p->events[PIDCG_MAX]);
>>>
>>>                 cgroup_file_notify(&p->events_file);
>>>         }
>>> }
>>>
>>> Consider this scenario: there are 4 groups A, B, C,and D. The
>>> relationships are as follows, the latter is the child of the former:
>>>
>>> root->A->B->C->D
>>>
>>> Then the user is polling on C.pids.events. When a process in D forks and
>>> fails due to B.max restrictions(pids_forking is D, and pids_over_limit
>>> is B), the user is awakened. However, when the user reads C.pids.events,
>>> he will find that the content has not changed. because the 'limit' is
>>> set to true started from B, and C.pids.events shows as below:
>>>
>>> seq_printf(sf, "max %lld\n", (s64)atomic64_read(&events[PIDCG_MAX]));
>>>
>>> Wouldn't this behavior confuse the user? Should the code to be changed
>>> to this?
> 
> Two generic notes:
> - event notifications can be rate limited, so users won't necessarily
>   see every change,
> - upon notification it's better to read the event counter/status anyway
>   to base a response on it.
> 
> But your remark is justified, there is no reason in this case for
> "spurious" event notification. It's an omission from v3 version of the
> patch when there had been also pids.events:max.imposed (that'd trigger
> events from D up to the root, it's only internal PIDCG_FORKFAIL now).
> 
> The upwards traversal loop can be simplified and fixed with only
> PIDCG_MAX exposed. Can you send it as a separate patch please?

Hi Michal,

Thanks for your feedback. and I'm sorry I forgot to reply this thread
after sending the patch.

> 
> (Apologies for late response, somehow I didn't see your e-mails.)
> 
> Michal

diff --git a/Documentation/admin-guide/cgroup-v2.rst b/Documentation/admin-guide/cgroup-v2.rst
index 945ff743a3c9..0b5f77104e8b 100644
--- a/Documentation/admin-guide/cgroup-v2.rst
+++ b/Documentation/admin-guide/cgroup-v2.rst
@@ -240,8 +240,11 @@  cgroup v2 currently supports the following mount options.
           v2 is remounted later on).
 
   pids_localevents
-        Represent fork failures inside cgroup's pids.events:max (v1 behavior),
-        not its limit being hit (v2 behavior).
+        The option restores v1-like behavior of pids.events:max, that is only
+        local (inside cgroup proper) fork failures are counted. Without this
+        option pids.events.max represents any pids.max enforcemnt across
+        cgroup's subtree.
+
 
 
 Organizing Processes and Threads
@@ -2205,7 +2208,7 @@  PID Interface Files
 	modified event. The following entries are defined.
 
 	  max
-		The number of times the cgroup's number of processes hit the
+		The number of times the cgroup's total number of processes hit the pids.max
 		limit (see also pids_localevents).
 
 Organisational operations are not blocked by cgroup policies, so it is
diff --git a/kernel/cgroup/pids.c b/kernel/cgroup/pids.c
index a557f5c8300b..c09b744d548c 100644
--- a/kernel/cgroup/pids.c
+++ b/kernel/cgroup/pids.c
@@ -238,6 +238,34 @@  static void pids_cancel_attach(struct cgroup_taskset *tset)
 	}
 }
 
+static void pids_event(struct pids_cgroup *pids_forking,
+		       struct pids_cgroup *pids_over_limit)
+{
+	struct pids_cgroup *p = pids_forking;
+	bool limit = false;
+
+	for (; parent_pids(p); p = parent_pids(p)) {
+		/* Only log the first time limit is hit. */
+		if (atomic64_inc_return(&p->events[PIDCG_FORKFAIL]) == 1) {
+			pr_info("cgroup: fork rejected by pids controller in ");
+			pr_cont_cgroup_path(p->css.cgroup);
+			pr_cont("\n");
+		}
+		cgroup_file_notify(&p->events_file);
+
+		if (!cgroup_subsys_on_dfl(pids_cgrp_subsys) ||
+		    cgrp_dfl_root.flags & CGRP_ROOT_PIDS_LOCAL_EVENTS)
+			break;
+
+		if (p == pids_over_limit)
+			limit = true;
+		if (limit)
+			atomic64_inc(&p->events[PIDCG_MAX]);
+
+		cgroup_file_notify(&p->events_file);
+	}
+}
+
 /*
  * task_css_check(true) in pids_can_fork() and pids_cancel_fork() relies
  * on cgroup_threadgroup_change_begin() held by the copy_process().
@@ -254,23 +282,9 @@  static int pids_can_fork(struct task_struct *task, struct css_set *cset)
 		css = task_css_check(current, pids_cgrp_id, true);
 	pids = css_pids(css);
 	err = pids_try_charge(pids, 1, &pids_over_limit);
-	if (err) {
-		/* compatibility on v1 where events were notified in leaves. */
-		if (!cgroup_subsys_on_dfl(pids_cgrp_subsys))
-			pids_over_limit = pids;
-
-		/* Only log the first time limit is hit. */
-		if (atomic64_inc_return(&pids->events[PIDCG_FORKFAIL]) == 1) {
-			pr_info("cgroup: fork rejected by pids controller in ");
-			pr_cont_cgroup_path(pids->css.cgroup);
-			pr_cont("\n");
-		}
-		atomic64_inc(&pids_over_limit->events[PIDCG_MAX]);
+	if (err)
+		pids_event(pids, pids_over_limit);
 
-		cgroup_file_notify(&pids->events_file);
-		if (pids_over_limit != pids)
-			cgroup_file_notify(&pids_over_limit->events_file);
-	}
 	return err;
 }

[v5,2/5] cgroup/pids: Make event counters hierarchical

Commit Message

Comments

Patch