Message ID | 1648113743-32622-1-git-send-email-zhaoyang.huang@unisoc.com |
---|---|
State | New |
Series | [RFC] cgroup: introduce proportional protection on memcg |
I'm confused by the aims of this patch. We already have proportional reclaim for memory.min and memory.low, and memory.high is already "proportional" by its nature to drive memory back down behind the configured threshold.

Could you please be more clear about what you're trying to achieve and in what way the existing proportional reclaim mechanisms are insufficient for you?
It seems like what’s being proposed is an ability to express the protection in % of the current usage rather than an absolute number.
It’s the equivalent of something like a memory (reclaim) priority: e.g. a cgroup with 80% protection is _always_ reclaimed less aggressively than one with a 20% protection.

That said, I’m not a fan of this idea.
It might make sense in some reasonable range of usages, but if your workload is simply leaking memory and growing indefinitely, protecting it seems like a bad idea. And the first part can be easily achieved using a userspace tool.

Thanks!

> On Mar 24, 2022, at 7:33 AM, Chris Down <chris@chrisdown.name> wrote:
>
> I'm confused by the aims of this patch. We already have proportional reclaim for memory.min and memory.low, and memory.high is already "proportional" by its nature to drive memory back down behind the configured threshold.
>
> Could you please be more clear about what you're trying to achieve and in what way the existing proportional reclaim mechanisms are insufficient for you?
On Thu, Mar 24, 2022 at 10:27 PM Chris Down <chris@chrisdown.name> wrote:
>
> I'm confused by the aims of this patch. We already have proportional reclaim
> for memory.min and memory.low, and memory.high is already "proportional" by its
> nature to drive memory back down behind the configured threshold.
>
> Could you please be more clear about what you're trying to achieve and in what
> way the existing proportional reclaim mechanisms are insufficient for you?

What I am trying to solve is that the memcg's protection judgment [1]
is based on a set of fixed values in the current design, while the real
scan and reclaim number [2] is based on the proportional min/low on the
real memory usage, which you mentioned above. Fixed value settings have
some constraints:
1. It is an empirical value based on observation, which could be inaccurate.
2. Workloads vary across scenarios.
3. The fixed value in [1] could be at odds with the dynamic cgroup_size in [2].

    shrink_node_memcgs
        mem_cgroup_calculate_protection(target_memcg, memcg);  \
        if (mem_cgroup_below_min(memcg))                        \
                                                                 ===> [1] check if the memcg is protected based on fixed min/low value
            ...                                                 /
        else if (mem_cgroup_below_low(memcg))                  /
            ...
        shrink_lruvec
            get_scan_count                                      \
                mem_cgroup_protection                            \
                                                                  ===> [2] calculate the scan size proportionally
                scan = lruvec_size - lruvec_size * protection / (cgroup_size + 1);  /
On Fri, Mar 25, 2022 at 11:02 AM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
>
> On Thu, Mar 24, 2022 at 10:27 PM Chris Down <chris@chrisdown.name> wrote:
> >
> > I'm confused by the aims of this patch. We already have proportional reclaim
> > for memory.min and memory.low, and memory.high is already "proportional" by its
> > nature to drive memory back down behind the configured threshold.
> >
> > Could you please be more clear about what you're trying to achieve and in what
> > way the existing proportional reclaim mechanisms are insufficient for you?

Sorry for the bad formatting of my previous reply; resending it in a new format.

What I am trying to solve is that the memcg's protection judgment [1]
is based on a set of fixed values in the current design, while the real
scan and reclaim number [2] is based on the proportional min/low on the
real memory usage, which you mentioned above. Fixed value settings have
some constraints:
1. It is an empirical value based on observation, which could be inaccurate.
2. Workloads vary across scenarios.
3. The fixed value in [1] could be at odds with the dynamic cgroup_size in [2].

    shrink_node_memcgs
        [1] check if the memcg is protected based on fixed min/low value
        mem_cgroup_calculate_protection(target_memcg, memcg);
        if (mem_cgroup_below_min(memcg))
            ...
        else if (mem_cgroup_below_low(memcg))
            ...

        [2] calculate the scan size proportionally
        shrink_lruvec
            get_scan_count
                mem_cgroup_protection
                scan = lruvec_size - lruvec_size * protection / (cgroup_size + 1);
On Fri, Mar 25, 2022 at 12:23 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> It seems like what’s being proposed is an ability to express the protection in % of the current usage rather than an absolute number.
> It’s the equivalent of something like a memory (reclaim) priority: e.g. a cgroup with 80% protection is _always_ reclaimed less aggressively than one with a 20% protection.
>
> That said, I’m not a fan of this idea.
> It might make sense in some reasonable range of usages, but if your workload is simply leaking memory and growing indefinitely, protecting it seems like a bad idea. And the first part can be easily achieved using a userspace tool.
>
> Thanks!
>
> > On Mar 24, 2022, at 7:33 AM, Chris Down <chris@chrisdown.name> wrote:
> >
> > I'm confused by the aims of this patch. We already have proportional reclaim for memory.min and memory.low, and memory.high is already "proportional" by its nature to drive memory back down behind the configured threshold.
> >
> > Could you please be more clear about what you're trying to achieve and in what way the existing proportional reclaim mechanisms are insufficient for you?

OK, I think it could be fixable for memory leak issues. Please refer to my reply to Chris's comment for more explanation.
On Fri 25-03-22 11:08:00, Zhaoyang Huang wrote:
> On Fri, Mar 25, 2022 at 11:02 AM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> >
> > On Thu, Mar 24, 2022 at 10:27 PM Chris Down <chris@chrisdown.name> wrote:
> > >
> > > I'm confused by the aims of this patch. We already have proportional reclaim
> > > for memory.min and memory.low, and memory.high is already "proportional" by its
> > > nature to drive memory back down behind the configured threshold.
> > >
> > > Could you please be more clear about what you're trying to achieve and in what
> > > way the existing proportional reclaim mechanisms are insufficient for you?
>
> Sorry for the bad formatting of my previous reply; resending it in a new format.
>
> What I am trying to solve is that the memcg's protection judgment [1]
> is based on a set of fixed values in the current design, while the real
> scan and reclaim number [2] is based on the proportional min/low on the
> real memory usage, which you mentioned above. Fixed value settings have
> some constraints:
> 1. It is an empirical value based on observation, which could be inaccurate.
> 2. Workloads vary across scenarios.
> 3. The fixed value in [1] could be at odds with the dynamic cgroup_size in [2].

Could you elaborate some more about those points? I guess providing an example of how you are using the new interface instead would be helpful.
diff --git a/include/linux/page_counter.h b/include/linux/page_counter.h
index 6795913..7762629 100644
--- a/include/linux/page_counter.h
+++ b/include/linux/page_counter.h
@@ -27,6 +27,9 @@ struct page_counter {
 	unsigned long watermark;
 	unsigned long failcnt;
 
+	/* proportional protection */
+	unsigned long min_prop;
+	unsigned long low_prop;
 	/*
 	 * 'parent' is placed here to be far from 'usage' to reduce
 	 * cache false sharing, as 'usage' is written mostly while
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 508bcea..937c6ce 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -6616,6 +6616,7 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 {
 	unsigned long usage, parent_usage;
 	struct mem_cgroup *parent;
+	unsigned long memcg_emin, memcg_elow, parent_emin, parent_elow;
 
 	if (mem_cgroup_disabled())
 		return;
@@ -6650,14 +6651,22 @@ void mem_cgroup_calculate_protection(struct mem_cgroup *root,
 
 	parent_usage = page_counter_read(&parent->memory);
 
+	/* use proportional protect first and take 1024 as 100% */
+	memcg_emin = READ_ONCE(memcg->memory.min_prop) ?
+		READ_ONCE(memcg->memory.min_prop) * READ_ONCE(memcg->memory.watermark) / 1024 : READ_ONCE(memcg->memory.min);
+	memcg_elow = READ_ONCE(memcg->memory.low_prop) ?
+		READ_ONCE(memcg->memory.low_prop) * READ_ONCE(memcg->memory.watermark) / 1024 : READ_ONCE(memcg->memory.low);
+	parent_emin = READ_ONCE(parent->memory.min_prop) ?
+		READ_ONCE(parent->memory.min_prop) * READ_ONCE(parent->memory.watermark) / 1024 : READ_ONCE(parent->memory.emin);
+	parent_elow = READ_ONCE(parent->memory.low_prop) ?
+		READ_ONCE(parent->memory.low_prop) * READ_ONCE(parent->memory.watermark) / 1024 : READ_ONCE(parent->memory.elow);
+
 	WRITE_ONCE(memcg->memory.emin, effective_protection(usage, parent_usage,
-			READ_ONCE(memcg->memory.min),
-			READ_ONCE(parent->memory.emin),
+			memcg_emin, parent_emin,
 			atomic_long_read(&parent->memory.children_min_usage)));
 
 	WRITE_ONCE(memcg->memory.elow, effective_protection(usage, parent_usage,
-			READ_ONCE(memcg->memory.low),
-			READ_ONCE(parent->memory.elow),
+			memcg_elow, parent_elow,
 			atomic_long_read(&parent->memory.children_low_usage)));
 }