Message ID | 20210209163304.77088-5-hannes@cmpxchg.org (mailing list archive)
---|---
State | New, archived
Series | mm: memcontrol: switch to rstat
Hello.

On Tue, Feb 09, 2021 at 11:33:00AM -0500, Johannes Weiner <hannes@cmpxchg.org> wrote:
> @@ -1971,10 +1978,14 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
> 	if (ret)
> 		goto destroy_root;
>
> -	ret = rebind_subsystems(root, ss_mask);
> +	ret = cgroup_rstat_init(root_cgrp);
Would it make sense to do cgroup_rstat_init() only if there's a subsys
in ss_mask that makes use of rstat?
(On legacy systems there could be individual hierarchy for each
controller so the rstat space can be saved.)

> @@ -5159,11 +5170,9 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
> 	if (ret)
> 		goto out_free_cgrp;
>
> -	if (cgroup_on_dfl(parent)) {
> -		ret = cgroup_rstat_init(cgrp);
> -		if (ret)
> -			goto out_cancel_ref;
> -	}
> +	ret = cgroup_rstat_init(cgrp);
And here do cgroup_rstat_init() only when parent has it.

> @@ -285,8 +285,6 @@ void __init cgroup_rstat_boot(void)
>
> 	for_each_possible_cpu(cpu)
> 		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
> -
> -	BUG_ON(cgroup_rstat_init(&cgrp_dfl_root.cgrp));
>  }
Regardless of the suggestion above, this removal obsoletes the comment
in cgroup_rstat_init:

	int cpu;

-	/* the root cgrp has rstat_cpu preallocated */
	if (!cgrp->rstat_cpu) {
		cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);

Regards,
Michal
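A minimal sketch of the conditional-init idea Michal describes: check whether any controller being bound to a cgroup1 root actually implements css_rstat_flush, and let cgroup_setup_root() skip cgroup_rstat_init() otherwise. The helper name root_needs_rstat() and its placement are hypothetical, not part of the posted series.

	/* hypothetical helper, illustrative only */
	static bool root_needs_rstat(u16 ss_mask)
	{
		struct cgroup_subsys *ss;
		int ssid;

		/* any rstat-using controller bound to this hierarchy? */
		for_each_subsys(ss, ssid)
			if ((ss_mask & (1 << ssid)) && ss->css_rstat_flush)
				return true;
		return false;
	}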
On Wed, Feb 17, 2021 at 06:42:32PM +0100, Michal Koutný wrote:
> Hello.
>
> On Tue, Feb 09, 2021 at 11:33:00AM -0500, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > @@ -1971,10 +1978,14 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
> > 	if (ret)
> > 		goto destroy_root;
> >
> > -	ret = rebind_subsystems(root, ss_mask);
> > +	ret = cgroup_rstat_init(root_cgrp);
> Would it make sense to do cgroup_rstat_init() only if there's a subsys
> in ss_mask that makes use of rstat?
> (On legacy systems there could be individual hierarchy for each
> controller so the rstat space can be saved.)

It's possible, but I don't think worth the trouble.

It would have to be done from rebind_subsystems(), as remount can add
more subsystems to an existing cgroup1 root. That in turn means we'd
have to have separate init paths for cgroup1 and cgroup2.

While we split cgroup1 and cgroup2 paths where necessary in the code,
it's a significant maintenance burden and a not unlikely source of
subtle errors (see the recent 'fix swap undercounting in cgroup2').

In this case, we're talking about a relatively small data structure
and the overhead is per mountpoint. Comparatively, we're allocating
the full vmstats structures for cgroup1 groups which barely use them,
and cgroup1 softlimit tree structures for each cgroup2 group.

So I don't think it's a good tradeoff. Subtle bugs that require kernel
patches are more disruptive to the user experience than the amount of
memory in question here.

> > @@ -285,8 +285,6 @@ void __init cgroup_rstat_boot(void)
> >
> > 	for_each_possible_cpu(cpu)
> > 		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
> > -
> > -	BUG_ON(cgroup_rstat_init(&cgrp_dfl_root.cgrp));
> >  }
> Regardless of the suggestion above, this removal obsoletes the comment
> in cgroup_rstat_init:
>
> 	int cpu;
>
> -	/* the root cgrp has rstat_cpu preallocated */
> 	if (!cgrp->rstat_cpu) {
> 		cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);

Oh, I'm not removing the init call, I'm merely moving it from
cgroup_rstat_boot() to cgroup_setup_root().

The default root group has statically preallocated percpu data before
and after this patch. See cgroup.c:

	static DEFINE_PER_CPU(struct cgroup_rstat_cpu, cgrp_dfl_root_rstat_cpu);

	/* the default hierarchy */
	struct cgroup_root cgrp_dfl_root = { .cgrp.rstat_cpu = &cgrp_dfl_root_rstat_cpu };
	EXPORT_SYMBOL_GPL(cgrp_dfl_root);
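For context, the check Michal quoted sits at the top of cgroup_rstat_init(): the default root keeps its statically allocated rstat_cpu, so the allocation is only done for other cgroups. A condensed sketch of that shape, reconstructed from the quoted hunks (per-CPU bookkeeping initialization elided, not a verbatim copy of the function):

	int cgroup_rstat_init(struct cgroup *cgrp)
	{
		/* the root cgrp has rstat_cpu preallocated (statically, see above) */
		if (!cgrp->rstat_cpu) {
			cgrp->rstat_cpu = alloc_percpu(struct cgroup_rstat_cpu);
			if (!cgrp->rstat_cpu)
				return -ENOMEM;
		}

		/* ... per-CPU rstat bookkeeping initialization elided ... */

		return 0;
	}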
On Wed, Feb 17, 2021 at 03:52:59PM -0500, Johannes Weiner <hannes@cmpxchg.org> wrote:
> It's possible, but I don't think worth the trouble.
You're right. I gave it a deeper look and what would be saved on data
would be paid in code complexity.

> In this case, we're talking about a relatively small data structure
> and the overhead is per mountpoint.
IIUC, it is per each mountpoint's number of cgroups. But I still accept
the argument above. Furthermore, this can be changed later.

> The default root group has statically preallocated percpu data before
> and after this patch. See cgroup.c:
I stand corrected, the comment is still valid.

Therefore,
Reviewed-by: Michal Koutný <mkoutny@suse.com>
On Thu, Feb 18, 2021 at 04:45:11PM +0100, Michal Koutný wrote:
> On Wed, Feb 17, 2021 at 03:52:59PM -0500, Johannes Weiner <hannes@cmpxchg.org> wrote:
> > In this case, we're talking about a relatively small data structure
> > and the overhead is per mountpoint.
> IIUC, it is per each mountpoint's number of cgroups. But I still accept
> the argument above. Furthermore, this can be changed later.

Oops, you're right of course.

> > The default root group has statically preallocated percpu data before
> > and after this patch. See cgroup.c:
> I stand corrected, the comment is still valid.
>
> Therefore,
> Reviewed-by: Michal Koutný <mkoutny@suse.com>

Thanks for your reviews, Michal!
diff --git a/kernel/cgroup/cgroup.c b/kernel/cgroup/cgroup.c
index 9153b20e5cc6..e049edd66776 100644
--- a/kernel/cgroup/cgroup.c
+++ b/kernel/cgroup/cgroup.c
@@ -1339,6 +1339,7 @@ static void cgroup_destroy_root(struct cgroup_root *root)
 
 	mutex_unlock(&cgroup_mutex);
 
+	cgroup_rstat_exit(cgrp);
 	kernfs_destroy_root(root->kf_root);
 	cgroup_free_root(root);
 }
@@ -1751,6 +1752,12 @@ int rebind_subsystems(struct cgroup_root *dst_root, u16 ss_mask)
 				       &dcgrp->e_csets[ss->id]);
 		spin_unlock_irq(&css_set_lock);
 
+		if (ss->css_rstat_flush) {
+			list_del_rcu(&css->rstat_css_node);
+			list_add_rcu(&css->rstat_css_node,
+				     &dcgrp->rstat_css_list);
+		}
+
 		/* default hierarchy doesn't enable controllers by default */
 		dst_root->subsys_mask |= 1 << ssid;
 		if (dst_root == &cgrp_dfl_root) {
@@ -1971,10 +1978,14 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	if (ret)
 		goto destroy_root;
 
-	ret = rebind_subsystems(root, ss_mask);
+	ret = cgroup_rstat_init(root_cgrp);
 	if (ret)
 		goto destroy_root;
 
+	ret = rebind_subsystems(root, ss_mask);
+	if (ret)
+		goto exit_stats;
+
 	ret = cgroup_bpf_inherit(root_cgrp);
 	WARN_ON_ONCE(ret);
 
@@ -2006,6 +2017,8 @@ int cgroup_setup_root(struct cgroup_root *root, u16 ss_mask)
 	ret = 0;
 	goto out;
 
+exit_stats:
+	cgroup_rstat_exit(root_cgrp);
 destroy_root:
 	kernfs_destroy_root(root->kf_root);
 	root->kf_root = NULL;
@@ -4934,8 +4947,7 @@ static void css_free_rwork_fn(struct work_struct *work)
 		cgroup_put(cgroup_parent(cgrp));
 		kernfs_put(cgrp->kn);
 		psi_cgroup_free(cgrp);
-		if (cgroup_on_dfl(cgrp))
-			cgroup_rstat_exit(cgrp);
+		cgroup_rstat_exit(cgrp);
 		kfree(cgrp);
 	} else {
 		/*
@@ -4976,8 +4988,7 @@ static void css_release_work_fn(struct work_struct *work)
 		/* cgroup release path */
 		TRACE_CGROUP_PATH(release, cgrp);
 
-		if (cgroup_on_dfl(cgrp))
-			cgroup_rstat_flush(cgrp);
+		cgroup_rstat_flush(cgrp);
 
 		spin_lock_irq(&css_set_lock);
 		for (tcgrp = cgroup_parent(cgrp); tcgrp;
@@ -5034,7 +5045,7 @@ static void init_and_link_css(struct cgroup_subsys_state *css,
 		css_get(css->parent);
 	}
 
-	if (cgroup_on_dfl(cgrp) && ss->css_rstat_flush)
+	if (ss->css_rstat_flush)
 		list_add_rcu(&css->rstat_css_node, &cgrp->rstat_css_list);
 
 	BUG_ON(cgroup_css(cgrp, ss));
@@ -5159,11 +5170,9 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 	if (ret)
 		goto out_free_cgrp;
 
-	if (cgroup_on_dfl(parent)) {
-		ret = cgroup_rstat_init(cgrp);
-		if (ret)
-			goto out_cancel_ref;
-	}
+	ret = cgroup_rstat_init(cgrp);
+	if (ret)
+		goto out_cancel_ref;
 
 	/* create the directory */
 	kn = kernfs_create_dir(parent->kn, name, mode, cgrp);
@@ -5250,8 +5259,7 @@ static struct cgroup *cgroup_create(struct cgroup *parent, const char *name,
 out_kernfs_remove:
 	kernfs_remove(cgrp->kn);
 out_stat_exit:
-	if (cgroup_on_dfl(parent))
-		cgroup_rstat_exit(cgrp);
+	cgroup_rstat_exit(cgrp);
 out_cancel_ref:
 	percpu_ref_exit(&cgrp->self.refcnt);
 out_free_cgrp:
diff --git a/kernel/cgroup/rstat.c b/kernel/cgroup/rstat.c
index d51175cedfca..faa767a870ba 100644
--- a/kernel/cgroup/rstat.c
+++ b/kernel/cgroup/rstat.c
@@ -285,8 +285,6 @@ void __init cgroup_rstat_boot(void)
 
 	for_each_possible_cpu(cpu)
 		raw_spin_lock_init(per_cpu_ptr(&cgroup_rstat_cpu_lock, cpu));
-
-	BUG_ON(cgroup_rstat_init(&cgrp_dfl_root.cgrp));
 }
 
 /*
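In summary, after this patch the rstat setup/teardown calls pair up the same way on both hierarchies, with no cgroup_on_dfl() checks left. The call sites, as established by the hunks above:

	/*
	 * cgroup_setup_root()    -> cgroup_rstat_init(root_cgrp)
	 * cgroup_destroy_root()  -> cgroup_rstat_exit(cgrp)
	 * cgroup_create()        -> cgroup_rstat_init(cgrp)
	 * css_release_work_fn()  -> cgroup_rstat_flush(cgrp)   (cgroup release path)
	 * css_free_rwork_fn()    -> cgroup_rstat_exit(cgrp)    (cgroup free path)
	 */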