diff mbox series

cgroup/dmem: Don't clobber pool in dmem_cgroup_calculate_protection

Message ID 20250114153912.278909-1-friedrich.vock@gmx.de (mailing list archive)
State New
Headers show
Series cgroup/dmem: Don't clobber pool in dmem_cgroup_calculate_protection | expand

Commit Message

Friedrich Vock Jan. 14, 2025, 3:39 p.m. UTC
If the current css doesn't contain any pool that is a descendant of
the "pool" (i.e. when found_descendant == false), then "pool" will
point to some unrelated pool. If the current css has a child, we'll
overwrite parent_pool with this unrelated pool on the next iteration.

Fix this by overwriting "pool" only if it actually is a descendant of
parent_pool, and setting it to NULL otherwise. Also, skip traversing
subtrees if pool == NULL to avoid overwriting parent_pool (and because
it's pointless).

Fixes: b168ed458 ("kernel/cgroup: Add "dmem" memory accounting cgroup")
Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de>
---
 kernel/cgroup/dmem.c | 21 +++++++++++++--------
 1 file changed, 13 insertions(+), 8 deletions(-)

--
2.48.0

Comments

Michal Koutný Jan. 14, 2025, 3:58 p.m. UTC | #1
On Tue, Jan 14, 2025 at 04:39:12PM +0100, Friedrich Vock <friedrich.vock@gmx.de> wrote:
> If the current css doesn't contain any pool that is a descendant of
> the "pool" (i.e. when found_descendant == false), then "pool" will
> point to some unrelated pool. If the current css has a child, we'll
> overwrite parent_pool with this unrelated pool on the next iteration.

Could this be verified with more idiomatic way with
cgroup_is_descendant()? (The predicate could be used between pools [1]
if they pin respective cgroups).

Thanks,
Michal

[1] https://lore.kernel.org/all/uj6railxyazpu6ocl2ygo6lw4lavbsgg26oq57pxxqe5uzxw42@fhnqvq3tia6n/
Friedrich Vock Jan. 16, 2025, 8:20 a.m. UTC | #2
Hi,

On 14.01.25 16:58, Michal Koutný wrote:
> On Tue, Jan 14, 2025 at 04:39:12PM +0100, Friedrich Vock <friedrich.vock@gmx.de> wrote:
>> If the current css doesn't contain any pool that is a descendant of
>> the "pool" (i.e. when found_descendant == false), then "pool" will
>> point to some unrelated pool. If the current css has a child, we'll
>> overwrite parent_pool with this unrelated pool on the next iteration.
>
> Could this be verified with more idiomatic way with
> cgroup_is_descendant()? (The predicate could be used between pools [1]
> if they pin respective cgroups).

I'm not sure if I'm missing something, but I don't think
cgroup_is_descendant is really related to this issue. Each css can
contain some amount of "pools" representing memory from different
devices (e.g. with the current DRM implementation, one pool corresponds
to VRAM of a specific GPU). These pools are allocated on-demand, so if a
cgroup has not made any allocations for a specific device, there will be
no pool corresponding to that device's memory. Pools have a hierarchy of
their own (that is, for a given cgroup's pool corresponding to some
device, the "parent pool" refers to the parent cgroup's pool
corresponding to the same device).

In dmem_cgroup_calculate_protection, we're trying to update the
protection values for the entire pool hierarchy between
limit_pool/test_pool (with the end goal of having accurate
effective-protection values for test_pool). Since pools only store
parent pointers to establish that hierarchy, to find child pools given
only the parent pool, we iterate over the pools of all child cgroups and
check if the parent pointer matches with our current "parent pool" pointer.

The bug happens when some cgroup doesn't have any pool in the hierarchy
we're iterating over (that is, we iterate over all pools but don't find
any pool whose parent matches our current "parent pool" pointer). The
cgroup itself is part of the (cgroup) hierarchy, so the result of
cgroup_is_descendant is obviously true - but because of how we allocate
pools on-demand, it's still possible there is no pool that is part of
the (pool) hierarchy we're iterating over.

Thanks,
Friedrich

>
> Thanks,
> Michal
>
> [1] https://lore.kernel.org/all/uj6railxyazpu6ocl2ygo6lw4lavbsgg26oq57pxxqe5uzxw42@fhnqvq3tia6n/
Michal Koutný Jan. 17, 2025, 5:29 p.m. UTC | #3
On Thu, Jan 16, 2025 at 09:20:08AM +0100, Friedrich Vock <friedrich.vock@gmx.de> wrote:
> These pools are allocated on-demand, so if a
> cgroup has not made any allocations for a specific device, there will be
> no pool corresponding to that device's memory.

Here I understand.

> Pools have a hierarchy of their own (that is, for a given cgroup's
> pool corresponding to some device, the "parent pool" refers to the
> parent cgroup's pool corresponding to the same device).
> 
> In dmem_cgroup_calculate_protection, we're trying to update the
> protection values for the entire pool hierarchy between
> limit_pool/test_pool (with the end goal of having accurate
> effective-protection values for test_pool).

If you check and bail out at start:
	if (!cgroup_is_descendant(test_pool->cs->css.cgroup, limit_pool->cs->css.cgroup))
		return;
...

> Since pools only store parent pointers to establish that hierarchy, to
> find child pools given only the parent pool, we iterate over the pools
> of all child cgroups and check if the parent pointer matches with our
> current "parent pool" pointer.
 
> The bug happens when some cgroup doesn't have any pool in the hierarchy
> we're iterating over (that is, we iterate over all pools but don't find
> any pool whose parent matches our current "parent pool" pointer).

...then the initial check ensures, you always find a pool that is
a descendant of limit_pool (at least the test_pool).
And there are pools for whole path between limit_pool and test_pool, or
am I mistaken here?

> The cgroup itself is part of the (cgroup) hierarchy, so the result of
> cgroup_is_descendant is obviously true - but because of how we
> allocate pools on-demand, it's still possible there is no pool that is
> part of the (pool) hierarchy we're iterating over.

Can there be a pool without cgroup?

Thanks,
Michal
Friedrich Vock Jan. 17, 2025, 7:02 p.m. UTC | #4
On 17.01.25 18:29, Michal Koutný wrote:
> On Thu, Jan 16, 2025 at 09:20:08AM +0100, Friedrich Vock <friedrich.vock@gmx.de> wrote:
>> These pools are allocated on-demand, so if a
>> cgroup has not made any allocations for a specific device, there will be
>> no pool corresponding to that device's memory.
>
> Here I understand.
>
>> Pools have a hierarchy of their own (that is, for a given cgroup's
>> pool corresponding to some device, the "parent pool" refers to the
>> parent cgroup's pool corresponding to the same device).
>>
>> In dmem_cgroup_calculate_protection, we're trying to update the
>> protection values for the entire pool hierarchy between
>> limit_pool/test_pool (with the end goal of having accurate
>> effective-protection values for test_pool).
>
> If you check and bail out at start:
> 	if (!cgroup_is_descendant(test_pool->cs->css.cgroup, limit_pool->cs->css.cgroup))
> 		return;
> ...
>
>> Since pools only store parent pointers to establish that hierarchy, to
>> find child pools given only the parent pool, we iterate over the pools
>> of all child cgroups and check if the parent pointer matches with our
>> current "parent pool" pointer.
>
>> The bug happens when some cgroup doesn't have any pool in the hierarchy
>> we're iterating over (that is, we iterate over all pools but don't find
>> any pool whose parent matches our current "parent pool" pointer).
>
> ...then the initial check ensures, you always find a pool that is
> a descendant of limit_pool (at least the test_pool).
> And there are pools for whole path between limit_pool and test_pool, or
> am I mistaken here?

Yeah, there are pools for the whole path between limit_pool and
test_pool, but the issue is that we traverse the entire tree of cgroups,
and we don't always stay on the path between limit_pool and test_pool
(because we're iterating from the top down, and we don't know what the
path is in that direction - so we just traverse the whole tree until we
find test_pool).

This means that we'll sometimes end up straying off-path - and there are
no guarantees for which pools are present in the cgroups we visit there.
These cgroups are the potentially problematic ones where the issue can
happen.

Ideally we could always stay on the path between limit_pool and
test_pool, but this is hardly possible because we can only follow parent
links (so bottom-up traversal) but for accurate protection calculation
we need to traverse the path top-down.

>
>> The cgroup itself is part of the (cgroup) hierarchy, so the result of
>> cgroup_is_descendant is obviously true - but because of how we
>> allocate pools on-demand, it's still possible there is no pool that is
>> part of the (pool) hierarchy we're iterating over.
>
> Can there be a pool without cgroup?

No, each pool is always associated with exactly one cgroup. What I was
talking about was the case where a parent cgroup has pools A and B, but
its child cgroup only has a pool for A. In that case, the child cgroup
has no pool that is part of B's pool hierarchy.

Thanks,
Friedrich
diff mbox series

Patch

diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c
index 52736ef0ccf25..10d37df5d50f6 100644
--- a/kernel/cgroup/dmem.c
+++ b/kernel/cgroup/dmem.c
@@ -222,8 +222,7 @@  dmem_cgroup_calculate_protection(struct dmem_cgroup_pool_state *limit_pool,
 	struct page_counter *climit;
 	struct cgroup_subsys_state *css, *next_css;
 	struct dmemcg_state *dmemcg_iter;
-	struct dmem_cgroup_pool_state *pool, *parent_pool;
-	bool found_descendant;
+	struct dmem_cgroup_pool_state *pool, *candidate_pool, *parent_pool;

 	climit = &limit_pool->cnt;

@@ -241,7 +240,13 @@  dmem_cgroup_calculate_protection(struct dmem_cgroup_pool_state *limit_pool,
 	 */
 	while (pool != test_pool) {
 		next_css = css_next_child(NULL, css);
-		if (next_css) {
+		/*
+		 * pool is NULL when the current css does not contain a
+		 * pool of the type we're interested in. In that case, it's
+		 * impossible that any child css contains a relevant pool, so
+		 * skip the subtree entirely and move on to the next sibling.
+		 */
+		if (next_css && pool) {
 			parent_pool = pool;
 		} else {
 			while (css != &limit_pool->cs->css) {
@@ -260,16 +265,16 @@  dmem_cgroup_calculate_protection(struct dmem_cgroup_pool_state *limit_pool,
 		}
 		css = next_css;

-		found_descendant = false;
 		dmemcg_iter = container_of(css, struct dmemcg_state, css);

-		list_for_each_entry_rcu(pool, &dmemcg_iter->pools, css_node) {
-			if (pool_parent(pool) == parent_pool) {
-				found_descendant = true;
+		pool = NULL;
+		list_for_each_entry_rcu(candidate_pool, &dmemcg_iter->pools, css_node) {
+			if (pool_parent(candidate_pool) == parent_pool) {
+				pool = candidate_pool;
 				break;
 			}
 		}
-		if (!found_descendant)
+		if (!pool)
 			continue;

 		page_counter_calculate_protection(