Message ID | 20250114153912.278909-1-friedrich.vock@gmx.de (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | cgroup/dmem: Don't clobber pool in dmem_cgroup_calculate_protection | expand |
On Tue, Jan 14, 2025 at 04:39:12PM +0100, Friedrich Vock <friedrich.vock@gmx.de> wrote: > If the current css doesn't contain any pool that is a descendant of > the "pool" (i.e. when found_descendant == false), then "pool" will > point to some unrelated pool. If the current css has a child, we'll > overwrite parent_pool with this unrelated pool on the next iteration. Could this be verified with more idiomatic way with cgroup_is_descendant()? (The predicate could be used between pools [1] if they pin respective cgroups). Thanks, Michal [1] https://lore.kernel.org/all/uj6railxyazpu6ocl2ygo6lw4lavbsgg26oq57pxxqe5uzxw42@fhnqvq3tia6n/
Hi, On 14.01.25 16:58, Michal Koutný wrote: > On Tue, Jan 14, 2025 at 04:39:12PM +0100, Friedrich Vock <friedrich.vock@gmx.de> wrote: >> If the current css doesn't contain any pool that is a descendant of >> the "pool" (i.e. when found_descendant == false), then "pool" will >> point to some unrelated pool. If the current css has a child, we'll >> overwrite parent_pool with this unrelated pool on the next iteration. > > Could this be verified with more idiomatic way with > cgroup_is_descendant()? (The predicate could be used between pools [1] > if they pin respective cgroups). I'm not sure if I'm missing something, but I don't think cgroup_is_descendant is really related to this issue. Each css can contain some amount of "pools" representing memory from different devices (e.g. with the current DRM implementation, one pool corresponds to VRAM of a specific GPU). These pools are allocated on-demand, so if a cgroup has not made any allocations for a specific device, there will be no pool corresponding to that device's memory. Pools have a hierarchy of their own (that is, for a given cgroup's pool corresponding to some device, the "parent pool" refers to the parent cgroup's pool corresponding to the same device). In dmem_cgroup_calculate_protection, we're trying to update the protection values for the entire pool hierarchy between limit_pool/test_pool (with the end goal of having accurate effective-protection values for test_pool). Since pools only store parent pointers to establish that hierarchy, to find child pools given only the parent pool, we iterate over the pools of all child cgroups and check if the parent pointer matches with our current "parent pool" pointer. The bug happens when some cgroup doesn't have any pool in the hierarchy we're iterating over (that is, we iterate over all pools but don't find any pool whose parent matches our current "parent pool" pointer). The cgroup itself is part of the (cgroup) hierarchy, so the result of cgroup_is_descendant is obviously true - but because of how we allocate pools on-demand, it's still possible there is no pool that is part of the (pool) hierarchy we're iterating over. Thanks, Friedrich > > Thanks, > Michal > > [1] https://lore.kernel.org/all/uj6railxyazpu6ocl2ygo6lw4lavbsgg26oq57pxxqe5uzxw42@fhnqvq3tia6n/
On Thu, Jan 16, 2025 at 09:20:08AM +0100, Friedrich Vock <friedrich.vock@gmx.de> wrote: > These pools are allocated on-demand, so if a > cgroup has not made any allocations for a specific device, there will be > no pool corresponding to that device's memory. Here I understand. > Pools have a hierarchy of their own (that is, for a given cgroup's > pool corresponding to some device, the "parent pool" refers to the > parent cgroup's pool corresponding to the same device). > > In dmem_cgroup_calculate_protection, we're trying to update the > protection values for the entire pool hierarchy between > limit_pool/test_pool (with the end goal of having accurate > effective-protection values for test_pool). If you check and bail out at start: if (!cgroup_is_descendant(test_pool->cs->css.cgroup, limit_pool->cs->css.cgroup)) return; ... > Since pools only store parent pointers to establish that hierarchy, to > find child pools given only the parent pool, we iterate over the pools > of all child cgroups and check if the parent pointer matches with our > current "parent pool" pointer. > The bug happens when some cgroup doesn't have any pool in the hierarchy > we're iterating over (that is, we iterate over all pools but don't find > any pool whose parent matches our current "parent pool" pointer). ...then the initial check ensures, you always find a pool that is a descendant of limit_pool (at least the test_pool). And there are pools for whole path between limit_pool and test_pool, or am I mistaken here? > The cgroup itself is part of the (cgroup) hierarchy, so the result of > cgroup_is_descendant is obviously true - but because of how we > allocate pools on-demand, it's still possible there is no pool that is > part of the (pool) hierarchy we're iterating over. Can there be a pool without cgroup? Thanks, Michal
On 17.01.25 18:29, Michal Koutný wrote: > On Thu, Jan 16, 2025 at 09:20:08AM +0100, Friedrich Vock <friedrich.vock@gmx.de> wrote: >> These pools are allocated on-demand, so if a >> cgroup has not made any allocations for a specific device, there will be >> no pool corresponding to that device's memory. > > Here I understand. > >> Pools have a hierarchy of their own (that is, for a given cgroup's >> pool corresponding to some device, the "parent pool" refers to the >> parent cgroup's pool corresponding to the same device). >> >> In dmem_cgroup_calculate_protection, we're trying to update the >> protection values for the entire pool hierarchy between >> limit_pool/test_pool (with the end goal of having accurate >> effective-protection values for test_pool). > > If you check and bail out at start: > if (!cgroup_is_descendant(test_pool->cs->css.cgroup, limit_pool->cs->css.cgroup)) > return; > ... > >> Since pools only store parent pointers to establish that hierarchy, to >> find child pools given only the parent pool, we iterate over the pools >> of all child cgroups and check if the parent pointer matches with our >> current "parent pool" pointer. > >> The bug happens when some cgroup doesn't have any pool in the hierarchy >> we're iterating over (that is, we iterate over all pools but don't find >> any pool whose parent matches our current "parent pool" pointer). > > ...then the initial check ensures, you always find a pool that is > a descendant of limit_pool (at least the test_pool). > And there are pools for whole path between limit_pool and test_pool, or > am I mistaken here? Yeah, there are pools for the whole path between limit_pool and test_pool, but the issue is that we traverse the entire tree of cgroups, and we don't always stay on the path between limit_pool and test_pool (because we're iterating from the top down, and we don't know what the path is in that direction - so we just traverse the whole tree until we find test_pool). This means that we'll sometimes end up straying off-path - and there are no guarantees for which pools are present in the cgroups we visit there. These cgroups are the potentially problematic ones where the issue can happen. Ideally we could always stay on the path between limit_pool and test_pool, but this is hardly possible because we can only follow parent links (so bottom-up traversal) but for accurate protection calculation we need to traverse the path top-down. > >> The cgroup itself is part of the (cgroup) hierarchy, so the result of >> cgroup_is_descendant is obviously true - but because of how we >> allocate pools on-demand, it's still possible there is no pool that is >> part of the (pool) hierarchy we're iterating over. > > Can there be a pool without cgroup? No, each pool is always associated with exactly one cgroup. What I was talking about was the case where a parent cgroup has pools A and B, but its child cgroup only has a pool for A. In that case, the child cgroup has no pool that is part of B's pool hierarchy. Thanks, Friedrich
diff --git a/kernel/cgroup/dmem.c b/kernel/cgroup/dmem.c index 52736ef0ccf25..10d37df5d50f6 100644 --- a/kernel/cgroup/dmem.c +++ b/kernel/cgroup/dmem.c @@ -222,8 +222,7 @@ dmem_cgroup_calculate_protection(struct dmem_cgroup_pool_state *limit_pool, struct page_counter *climit; struct cgroup_subsys_state *css, *next_css; struct dmemcg_state *dmemcg_iter; - struct dmem_cgroup_pool_state *pool, *parent_pool; - bool found_descendant; + struct dmem_cgroup_pool_state *pool, *candidate_pool, *parent_pool; climit = &limit_pool->cnt; @@ -241,7 +240,13 @@ dmem_cgroup_calculate_protection(struct dmem_cgroup_pool_state *limit_pool, */ while (pool != test_pool) { next_css = css_next_child(NULL, css); - if (next_css) { + /* + * pool is NULL when the current css does not contain a + * pool of the type we're interested in. In that case, it's + * impossible that any child css contains a relevant pool, so + * skip the subtree entirely and move on to the next sibling. + */ + if (next_css && pool) { parent_pool = pool; } else { while (css != &limit_pool->cs->css) { @@ -260,16 +265,16 @@ dmem_cgroup_calculate_protection(struct dmem_cgroup_pool_state *limit_pool, } css = next_css; - found_descendant = false; dmemcg_iter = container_of(css, struct dmemcg_state, css); - list_for_each_entry_rcu(pool, &dmemcg_iter->pools, css_node) { - if (pool_parent(pool) == parent_pool) { - found_descendant = true; + pool = NULL; + list_for_each_entry_rcu(candidate_pool, &dmemcg_iter->pools, css_node) { + if (pool_parent(candidate_pool) == parent_pool) { + pool = candidate_pool; break; } } - if (!found_descendant) + if (!pool) continue; page_counter_calculate_protection(
If the current css doesn't contain any pool that is a descendant of the "pool" (i.e. when found_descendant == false), then "pool" will point to some unrelated pool. If the current css has a child, we'll overwrite parent_pool with this unrelated pool on the next iteration. Fix this by overwriting "pool" only if it actually is a descendant of parent_pool, and setting it to NULL otherwise. Also, skip traversing subtrees if pool == NULL to avoid overwriting parent_pool (and because it's pointless). Fixes: b168ed458 ("kernel/cgroup: Add "dmem" memory accounting cgroup") Signed-off-by: Friedrich Vock <friedrich.vock@gmx.de> --- kernel/cgroup/dmem.c | 21 +++++++++++++-------- 1 file changed, 13 insertions(+), 8 deletions(-) -- 2.48.0