[v2,0/3] mm: memcontrol: recursive memory protection

Message ID 20191219200718.15696-1-hannes@cmpxchg.org (mailing list archive)

Message

Johannes Weiner Dec. 19, 2019, 8:07 p.m. UTC
Changes since v1:
- improved Changelogs based on the discussion with Roman. Thanks!
- fix div0 when recursive & fixed protection is combined
- fix an unused compiler warning

The current memory.low (and memory.min) semantics require protection
to be assigned to a cgroup in an uninterrupted chain from the
top-level cgroup all the way to the leaf.

In practice, we want to protect entire cgroup subtrees from each other
(system management software vs. workload), but we would like the VM to
balance memory optimally *within* each subtree, without having to make
explicit weight allocations among individual components. The current
semantics make that impossible.

This patch series extends memory.low/min such that the knobs apply
recursively to the entire subtree. Users can still assign explicit
protection to subgroups, but if they don't, the protection set by the
parent cgroup will be distributed dynamically such that children
compete freely - as if no memory control were enabled inside the
subtree - but enjoy protection from neighboring trees.
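
As a rough illustration of the paragraph above, the following Python sketch models how a parent's protection might reach its children with and without the recursive semantics. It is a toy model, not the kernel code: the function names, the integer units, and in particular the rule that unclaimed protection is split in proportion to each child's unprotected usage are assumptions made for illustration; the actual formula is in patch #3, which is not reproduced here.

```python
def effective_protection(usage, parent_usage, setting, parent_effective,
                         siblings_protected, recursive):
    """Toy model of one child's effective memory.low/min (arbitrary units)."""
    # A cgroup can only claim protection for memory it actually uses.
    protected = min(usage, setting)

    # Overcommit: the children claim more than the parent can pass down,
    # so each claim is scaled to fit the parent's effective protection.
    if siblings_protected > parent_effective:
        return protected * parent_effective // siblings_protected

    effective = protected
    if not recursive:
        # Current semantics: no explicit setting means no protection.
        return effective

    # Recursive semantics (assumed rule): distribute the parent's unclaimed
    # protection in proportion to each child's unprotected usage.
    unprotected_total = parent_usage - siblings_protected
    if usage > protected and unprotected_total > 0:  # guard against div0
        unclaimed = parent_effective - siblings_protected
        effective += unclaimed * (usage - protected) // unprotected_total
    return effective


def distribute(parent_effective, children, recursive):
    """children maps a name to (usage, explicit memory.low setting)."""
    parent_usage = sum(usage for usage, _ in children.values())
    siblings_protected = sum(min(usage, setting)
                             for usage, setting in children.values())
    return {
        name: effective_protection(usage, parent_usage, setting,
                                   parent_effective, siblings_protected,
                                   recursive)
        for name, (usage, setting) in children.items()
    }


# Parent protects 100 units; neither child sets memory.low explicitly.
children = {"a": (60, 0), "b": (180, 0)}
print(distribute(100, children, recursive=False))  # {'a': 0, 'b': 0}
print(distribute(100, children, recursive=True))   # {'a': 25, 'b': 75}
```

In this model the children receive nothing under the current semantics because the chain of explicit settings is interrupted, while under the recursive semantics the parent's 100 units follow usage, so the children compete freely but the subtree as a whole stays protected.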

Patch #1 fixes an existing bug that can give a cgroup tree more
protection than it should receive as per ancestor configuration.

Patch #2 simplifies and documents the existing code to make it easier
to reason about the changes in the next patch.

Patch #3 finally implements recursive memory protection semantics.

Because of a risk of regressing legacy setups, the new semantics are
hidden behind a cgroup2 mount option, 'memory_recursiveprot'.
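
Since the new behavior is opt-in, tooling may want to check whether it is active. The sketch below is one possible way to do that in Python, assuming the option is reported in the options field of the cgroup2 entry in /proc/mounts like other cgroup2 mount options; the helper name and the sample line are made up for illustration.

```python
def recursiveprot_enabled(mounts_text):
    """Return True if the first cgroup2 mount lists memory_recursiveprot.

    mounts_text is expected in /proc/mounts format:
    device mountpoint fstype options dump pass
    """
    for line in mounts_text.splitlines():
        fields = line.split()
        if len(fields) >= 4 and fields[2] == "cgroup2":
            return "memory_recursiveprot" in fields[3].split(",")
    return False


# Hypothetical sample entry, not taken from a real system.
sample = "cgroup2 /sys/fs/cgroup cgroup2 rw,nosuid,memory_recursiveprot 0 0"
print(recursiveprot_enabled(sample))  # True
```

On a live system the same function could be fed the contents of /proc/mounts. The option itself would be requested at mount time, for example with `mount -t cgroup2 -o memory_recursiveprot none /sys/fs/cgroup`.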

More details in patch #3.

 Documentation/admin-guide/cgroup-v2.rst |  11 ++
 include/linux/cgroup-defs.h             |   5 +
 kernel/cgroup/cgroup.c                  |  17 ++-
 mm/memcontrol.c                         | 243 +++++++++++++++++++-----------
 mm/page_counter.c                       |  12 +-
 5 files changed, 192 insertions(+), 96 deletions(-)

Comments

Tejun Heo Dec. 19, 2019, 8:22 p.m. UTC | #1
On Thu, Dec 19, 2019 at 03:07:15PM -0500, Johannes Weiner wrote:
> Changes since v1:
> - improved Changelogs based on the discussion with Roman. Thanks!
> - fix div0 when recursive & fixed protection is combined
> - fix an unused compiler warning
> 
> The current memory.low (and memory.min) semantics require protection
> to be assigned to a cgroup in an uninterrupted chain from the
> top-level cgroup all the way to the leaf.
> 
> In practice, we want to protect entire cgroup subtrees from each other
> (system management software vs. workload), but we would like the VM to
> balance memory optimally *within* each subtree, without having to make
> explicit weight allocations among individual components. The current
> semantics make that impossible.

Acked-by: Tejun Heo <tj@kernel.org>

The original behavior turned out to be a significant source of
mistakes, and use cases that would require the older behavior just
weren't there.

Thanks.

Roman Gushchin Dec. 20, 2019, 4:06 a.m. UTC | #2
On Thu, Dec 19, 2019 at 03:07:15PM -0500, Johannes Weiner wrote:
> Changes since v1:
> - improved Changelogs based on the discussion with Roman. Thanks!
> - fix div0 when recursive & fixed protection is combined
> - fix an unused compiler warning
> 
> The current memory.low (and memory.min) semantics require protection
> to be assigned to a cgroup in an uninterrupted chain from the
> top-level cgroup all the way to the leaf.
> 
> In practice, we want to protect entire cgroup subtrees from each other
> (system management software vs. workload), but we would like the VM to
> balance memory optimally *within* each subtree, without having to make
> explicit weight allocations among individual components. The current
> semantics make that impossible.
> 
> This patch series extends memory.low/min such that the knobs apply
> recursively to the entire subtree. Users can still assign explicit
> protection to subgroups, but if they don't, the protection set by the
> parent cgroup will be distributed dynamically such that children
> compete freely - as if no memory control were enabled inside the
> subtree - but enjoy protection from neighboring trees.
> 
> Patch #1 fixes an existing bug that can give a cgroup tree more
> protection than it should receive as per ancestor configuration.
> 
> Patch #2 simplifies and documents the existing code to make it easier
> to reason about the changes in the next patch.
> 
> Patch #3 finally implements recursive memory protection semantics.
> 
> Because of a risk of regressing legacy setups, the new semantics are
> hidden behind a cgroup2 mount option, 'memory_recursiveprot'.

I really like the new semantics: they look clean and don't require
any new magic values such as "bypass", which were discussed previously.
The ability to disable protection for a particular cgroup inside
a protected subtree seems overvalued: I can't think of a practical
example where it makes sense, so it's well worth sacrificing.
Thank you for adding comments to the changelog!

Acked-by: Roman Gushchin <guro@fb.com>
for the series.

Thanks!

Chris Down Dec. 20, 2019, 4:29 a.m. UTC | #3
Johannes Weiner writes:
>Changes since v1:
>- improved Changelogs based on the discussion with Roman. Thanks!
>- fix div0 when recursive & fixed protection is combined
>- fix an unused compiler warning
>
>The current memory.low (and memory.min) semantics require protection
>to be assigned to a cgroup in an uninterrupted chain from the
>top-level cgroup all the way to the leaf.
>
>In practice, we want to protect entire cgroup subtrees from each other
>(system management software vs. workload), but we would like the VM to
>balance memory optimally *within* each subtree, without having to make
>explicit weight allocations among individual components. The current
>semantics make that impossible.
>
>This patch series extends memory.low/min such that the knobs apply
>recursively to the entire subtree. Users can still assign explicit
>protection to subgroups, but if they don't, the protection set by the
>parent cgroup will be distributed dynamically such that children
>compete freely - as if no memory control were enabled inside the
>subtree - but enjoy protection from neighboring trees.

Thanks. From experience working with these semantics in userspace, I agree
that this design makes it easier to configure the protections in a
meaningful way.

For the series:

Acked-by: Chris Down <chris@chrisdown.name>

>Patch #1 fixes an existing bug that can give a cgroup tree more
>protection than it should receive as per ancestor configuration.
>
>Patch #2 simplifies and documents the existing code to make it easier
>to reason about the changes in the next patch.
>
>Patch #3 finally implements recursive memory protection semantics.

Just as an off-topic aside, although I'm sure you already have it in mind, we 
should definitely make sure to clearly point this out to those in the container 
management tooling space who are in the process of moving to support/default to 
v2. For example, I wonder about CoreOS' systemwide strategy around memory 
management and whether it can benefit from this.