Message ID | 20221220182745.1903540-2-roman.gushchin@linux.dev |
---|---|
State | New |
Series | mm: kmem: optimize obj_cgroup pointer retrieval |
On Tue, Dec 20, 2022 at 10:28 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> Manually inline memcg_kmem_bypass() and active_memcg() to speed up
> get_obj_cgroup_from_current() by avoiding duplicate in_task() checks
> and active_memcg() readings.
>
> Also add a likely() macro to __get_obj_cgroup_from_memcg():
> obj_cgroup_tryget() should succeed at almost all times except
> a very unlikely race with the memcg deletion path.
>
> Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>

Can you please add your performance experiment setup and result of
this patch in the commit description of this patch as well?

Acked-by: Shakeel Butt <shakeelb@google.com>
On Tue, Dec 20, 2022 at 11:55:34AM -0800, Shakeel Butt wrote:
> On Tue, Dec 20, 2022 at 10:28 AM Roman Gushchin
> <roman.gushchin@linux.dev> wrote:
> >
> > Manually inline memcg_kmem_bypass() and active_memcg() to speed up
> > get_obj_cgroup_from_current() by avoiding duplicate in_task() checks
> > and active_memcg() readings.
> >
> > Also add a likely() macro to __get_obj_cgroup_from_memcg():
> > obj_cgroup_tryget() should succeed at almost all times except
> > a very unlikely race with the memcg deletion path.
> >
> > Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
>
> Can you please add your performance experiment setup and result of
> this patch in the commit description of this patch as well?

Sure. I used a small hack to just do a bunch of allocations in a row
and measured the time. Will include it in the commit message.
Will also fix the #ifdef thing in the second patch, thanks for
spotting it.

>
> Acked-by: Shakeel Butt <shakeelb@google.com>

Thank you for taking a look!
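The timing hack itself is not included in the thread. For readers who want to reproduce something similar, a minimal sketch along the lines Roman describes might be a throwaway module that does a burst of accounted allocations in a row and reports the elapsed time; the module name, allocation size, and iteration count below are made up for illustration.

/*
 * Hypothetical micro-benchmark in the spirit of the "small hack"
 * described above; the actual harness is not part of this thread.
 * Does a burst of accounted allocations and prints the elapsed time.
 */
#include <linux/module.h>
#include <linux/slab.h>
#include <linux/ktime.h>

#define NR_ALLOCS	1000000

static int __init objcg_bench_init(void)
{
	ktime_t start, end;
	void *p;
	int i;

	start = ktime_get();
	for (i = 0; i < NR_ALLOCS; i++) {
		/* __GFP_ACCOUNT forces the obj_cgroup lookup on each allocation */
		p = kmalloc(64, GFP_KERNEL | __GFP_ACCOUNT);
		kfree(p);
	}
	end = ktime_get();

	pr_info("objcg_bench: %d alloc/free pairs in %lld ns\n",
		NR_ALLOCS, ktime_to_ns(ktime_sub(end, start)));

	return -EAGAIN;	/* fail init on purpose so the module doesn't stay loaded */
}
module_init(objcg_bench_init);
MODULE_LICENSE("GPL");

Run from a cgroup with memory accounting enabled so the allocations actually exercise get_obj_cgroup_from_current().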
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index bafd3cde4507..82828c51d2ea 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -1047,19 +1047,6 @@ struct mem_cgroup *get_mem_cgroup_from_mm(struct mm_struct *mm)
 }
 EXPORT_SYMBOL(get_mem_cgroup_from_mm);
 
-static __always_inline bool memcg_kmem_bypass(void)
-{
-	/* Allow remote memcg charging from any context. */
-	if (unlikely(active_memcg()))
-		return false;
-
-	/* Memcg to charge can't be determined. */
-	if (!in_task() || !current->mm || (current->flags & PF_KTHREAD))
-		return true;
-
-	return false;
-}
-
 /**
  * mem_cgroup_iter - iterate over memory cgroup hierarchy
  * @root: hierarchy root
@@ -3004,7 +2991,7 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 
 	for (; !mem_cgroup_is_root(memcg); memcg = parent_mem_cgroup(memcg)) {
 		objcg = rcu_dereference(memcg->objcg);
-		if (objcg && obj_cgroup_tryget(objcg))
+		if (likely(objcg && obj_cgroup_tryget(objcg)))
 			break;
 		objcg = NULL;
 	}
@@ -3013,16 +3000,23 @@ static struct obj_cgroup *__get_obj_cgroup_from_memcg(struct mem_cgroup *memcg)
 
 __always_inline struct obj_cgroup *get_obj_cgroup_from_current(void)
 {
-	struct obj_cgroup *objcg = NULL;
 	struct mem_cgroup *memcg;
+	struct obj_cgroup *objcg;
 
-	if (memcg_kmem_bypass())
-		return NULL;
+	if (in_task()) {
+		memcg = current->active_memcg;
+
+		/* Memcg to charge can't be determined. */
+		if (likely(!memcg) && (!current->mm || (current->flags & PF_KTHREAD)))
+			return NULL;
+	} else {
+		memcg = this_cpu_read(int_active_memcg);
+		if (likely(!memcg))
+			return NULL;
+	}
 
 	rcu_read_lock();
-	if (unlikely(active_memcg()))
-		memcg = active_memcg();
-	else
+	if (!memcg)
 		memcg = mem_cgroup_from_task(current);
 	objcg = __get_obj_cgroup_from_memcg(memcg);
 	rcu_read_unlock();
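For context on the likely() annotation the diff adds (background, not part of the patch): in the kernel these hints are essentially wrappers around GCC's __builtin_expect(), which lets the compiler lay out the expected branch as the straight-line, fall-through path.

/* Essentially how include/linux/compiler.h defines the hints */
#define likely(x)	__builtin_expect(!!(x), 1)
#define unlikely(x)	__builtin_expect(!!(x), 0)

/*
 * So the annotated check in __get_obj_cgroup_from_memcg() tells the
 * compiler that obj_cgroup_tryget() is expected to succeed, keeping
 * the retry/NULL path out of the hot instruction stream.
 */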
Manually inline memcg_kmem_bypass() and active_memcg() to speed up
get_obj_cgroup_from_current() by avoiding duplicate in_task() checks
and active_memcg() readings.

Also add a likely() macro to __get_obj_cgroup_from_memcg():
obj_cgroup_tryget() should succeed at almost all times except
a very unlikely race with the memcg deletion path.

Signed-off-by: Roman Gushchin <roman.gushchin@linux.dev>
---
 mm/memcontrol.c | 34 ++++++++++++++--------------------
 1 file changed, 14 insertions(+), 20 deletions(-)
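To see why shaving a few branches here is worthwhile: get_obj_cgroup_from_current() sits on the hot path of every accounted kernel allocation. A simplified, hypothetical caller (not the actual slab pre-alloc hook; the function name below is invented for illustration) looks roughly like this:

#include <linux/memcontrol.h>
#include <linux/gfp.h>

/*
 * Simplified illustration of the hot path this patch targets; the real
 * code lives in the slab allocation hooks, this is only a sketch.
 */
static bool charge_accounted_alloc(gfp_t gfp, size_t size)
{
	struct obj_cgroup *objcg;
	bool ret = true;

	/* Called on every accounted allocation, hence the micro-optimization. */
	objcg = get_obj_cgroup_from_current();
	if (!objcg)
		return true;	/* nothing to charge (root memcg, kthread, ...) */

	if (obj_cgroup_charge(objcg, gfp, size))
		ret = false;	/* charge failed, caller should bail out */

	obj_cgroup_put(objcg);
	return ret;
}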