Message ID | 20210814052519.86679-2-songmuchun@bytedance.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | Use obj_cgroup APIs to charge the LRU pages | expand |
Hi Muchun,
Thank you for the patch! Yet something to improve:
[auto build test ERROR on next-20210813]
[also build test ERROR on v5.14-rc5]
[cannot apply to hnaz-linux-mm/master cgroup/for-next linus/master v5.14-rc5 v5.14-rc4 v5.14-rc3]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]
url: https://github.com/0day-ci/linux/commits/Muchun-Song/Use-obj_cgroup-APIs-to-charge-the-LRU-pages/20210814-132844
base: 4b358aabb93a2c654cd1dcab1a25a589f6e2b153
config: x86_64-randconfig-a004-20210815 (attached as .config)
compiler: clang version 14.0.0 (https://github.com/llvm/llvm-project 1f7b25ea76a925aca690da28de9d78db7ca99d0c)
reproduce (this is a W=1 build):
wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
chmod +x ~/bin/make.cross
# https://github.com/0day-ci/linux/commit/8a7c5ff0c21bed8ed69322118d0a01388ee8a6fd
git remote add linux-review https://github.com/0day-ci/linux
git fetch --no-tags linux-review Muchun-Song/Use-obj_cgroup-APIs-to-charge-the-LRU-pages/20210814-132844
git checkout 8a7c5ff0c21bed8ed69322118d0a01388ee8a6fd
# save the attached .config to linux build tree
mkdir build_dir
COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross O=build_dir ARCH=x86_64 SHELL=/bin/bash
If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>
All errors (new ones prefixed by >>):
>> mm/memcontrol.c:266:6: error: redefinition of 'mem_cgroup_kmem_disabled'
bool mem_cgroup_kmem_disabled(void)
^
include/linux/memcontrol.h:1728:20: note: previous definition is here
static inline bool mem_cgroup_kmem_disabled(void)
^
1 error generated.
vim +/mem_cgroup_kmem_disabled +266 mm/memcontrol.c
bf4f059954dcb2 Roman Gushchin 2020-08-06 265
4d5c8aedc8aa6a Roman Gushchin 2021-06-02 @266 bool mem_cgroup_kmem_disabled(void)
4d5c8aedc8aa6a Roman Gushchin 2021-06-02 267 {
4d5c8aedc8aa6a Roman Gushchin 2021-06-02 268 return cgroup_memory_nokmem;
4d5c8aedc8aa6a Roman Gushchin 2021-06-02 269 }
4d5c8aedc8aa6a Roman Gushchin 2021-06-02 270
---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
On Sat, Aug 14, 2021 at 01:25:08PM +0800, Muchun Song wrote: > Pagecache pages are charged at the allocation time and holding a > reference to the original memory cgroup until being reclaimed. > Depending on the memory pressure, specific patterns of the page > sharing between different cgroups and the cgroup creation and > destruction rates, a large number of dying memory cgroups can be > pinned by pagecache pages. It makes the page reclaim less efficient > and wastes memory. > > We can convert LRU pages and most other raw memcg pins to the objcg > direction to fix this problem, and then the page->memcg will always > point to an object cgroup pointer. > > Therefore, the infrastructure of objcg no longer only serves > CONFIG_MEMCG_KMEM. In this patch, we move the infrastructure of the > objcg out of the scope of the CONFIG_MEMCG_KMEM so that the LRU pages > can reuse it to charge pages. > > We know that the LRU pages are not accounted at the root level. But > the page->memcg_data points to the root_mem_cgroup. So the > page->memcg_data of the LRU pages always points to a valid pointer. > But the root_mem_cgroup dose not have an object cgroup. If we use > obj_cgroup APIs to charge the LRU pages, we should set the > page->memcg_data to a root object cgroup. So we also allocate an > object cgroup for the root_mem_cgroup. > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> I like the "move objcg stuff to memcg/css level" part. I'm less convinced about making byte-sized charging kmem-specific (both naming and #ifdef CONFIG_MEMCG_KMEM). Do we really win a lot? I understand why we might wanna compile out some checks from the hot allocation path, but few bytes in struct objcg will not make a big difference, as well as few lines of code in cgroup creation/removal paths. Also it might be useful for byte-sized accounting outside kmem, e.g. zswap. So, I'd remove this dependency and rename to something like obj_cgroup_release_bytes(). In the long run we might wanna to eliminate CONFIG_MEMCG_KMEM completely, so let's at least not add new dependencies. Thanks!
On Wed, Aug 18, 2021 at 11:01 AM Roman Gushchin <guro@fb.com> wrote: > > On Sat, Aug 14, 2021 at 01:25:08PM +0800, Muchun Song wrote: > > Pagecache pages are charged at the allocation time and holding a > > reference to the original memory cgroup until being reclaimed. > > Depending on the memory pressure, specific patterns of the page > > sharing between different cgroups and the cgroup creation and > > destruction rates, a large number of dying memory cgroups can be > > pinned by pagecache pages. It makes the page reclaim less efficient > > and wastes memory. > > > > We can convert LRU pages and most other raw memcg pins to the objcg > > direction to fix this problem, and then the page->memcg will always > > point to an object cgroup pointer. > > > > Therefore, the infrastructure of objcg no longer only serves > > CONFIG_MEMCG_KMEM. In this patch, we move the infrastructure of the > > objcg out of the scope of the CONFIG_MEMCG_KMEM so that the LRU pages > > can reuse it to charge pages. > > > > We know that the LRU pages are not accounted at the root level. But > > the page->memcg_data points to the root_mem_cgroup. So the > > page->memcg_data of the LRU pages always points to a valid pointer. > > But the root_mem_cgroup dose not have an object cgroup. If we use > > obj_cgroup APIs to charge the LRU pages, we should set the > > page->memcg_data to a root object cgroup. So we also allocate an > > object cgroup for the root_mem_cgroup. > > > > Signed-off-by: Muchun Song <songmuchun@bytedance.com> > > I like the "move objcg stuff to memcg/css level" part. > > I'm less convinced about making byte-sized charging kmem-specific > (both naming and #ifdef CONFIG_MEMCG_KMEM). Do we really win a lot? > > I understand why we might wanna compile out some checks from the > hot allocation path, but few bytes in struct objcg will not make a big > difference, as well as few lines of code in cgroup creation/removal paths. > > Also it might be useful for byte-sized accounting outside kmem, e.g. zswap. > So, I'd remove this dependency and rename to something like > obj_cgroup_release_bytes(). Got it. I'll do that in the next version. Thanks for your suggestions. > > In the long run we might wanna to eliminate CONFIG_MEMCG_KMEM completely, > so let's at least not add new dependencies. Got it. > > Thanks!
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h index 0ff146486aed..41a35de93d75 100644 --- a/include/linux/memcontrol.h +++ b/include/linux/memcontrol.h @@ -221,7 +221,9 @@ struct memcg_cgwb_frn { struct obj_cgroup { struct percpu_ref refcnt; struct mem_cgroup *memcg; +#ifdef CONFIG_MEMCG_KMEM atomic_t nr_charged_bytes; +#endif union { struct list_head list; struct rcu_head rcu; @@ -319,9 +321,9 @@ struct mem_cgroup { #ifdef CONFIG_MEMCG_KMEM int kmemcg_id; enum memcg_kmem_state kmem_state; +#endif struct obj_cgroup __rcu *objcg; struct list_head objcg_list; /* list of inherited objcgs */ -#endif MEMCG_PADDING(_pad2_); diff --git a/mm/memcontrol.c b/mm/memcontrol.c index 3e7c205a1852..7df2176e4f02 100644 --- a/mm/memcontrol.c +++ b/mm/memcontrol.c @@ -261,7 +261,6 @@ struct mem_cgroup *vmpressure_to_memcg(struct vmpressure *vmpr) return container_of(vmpr, struct mem_cgroup, vmpressure); } -#ifdef CONFIG_MEMCG_KMEM extern spinlock_t css_set_lock; bool mem_cgroup_kmem_disabled(void) @@ -269,15 +268,14 @@ bool mem_cgroup_kmem_disabled(void) return cgroup_memory_nokmem; } +#ifdef CONFIG_MEMCG_KMEM static void obj_cgroup_uncharge_pages(struct obj_cgroup *objcg, unsigned int nr_pages); -static void obj_cgroup_release(struct percpu_ref *ref) +static void obj_cgroup_release_kmem(struct obj_cgroup *objcg) { - struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt); unsigned int nr_bytes; unsigned int nr_pages; - unsigned long flags; /* * At this point all allocated objects are freed, and @@ -291,9 +289,9 @@ static void obj_cgroup_release(struct percpu_ref *ref) * 3) CPU1: a process from another memcg is allocating something, * the stock if flushed, * objcg->nr_charged_bytes = PAGE_SIZE - 92 - * 5) CPU0: we do release this object, + * 4) CPU0: we do release this object, * 92 bytes are added to stock->nr_bytes - * 6) CPU0: stock is flushed, + * 5) CPU0: stock is flushed, * 92 bytes are added to objcg->nr_charged_bytes * * In the result, nr_charged_bytes == PAGE_SIZE. @@ -305,6 +303,19 @@ static void obj_cgroup_release(struct percpu_ref *ref) if (nr_pages) obj_cgroup_uncharge_pages(objcg, nr_pages); +} +#else +static inline void obj_cgroup_release_kmem(struct obj_cgroup *objcg) +{ +} +#endif + +static void obj_cgroup_release(struct percpu_ref *ref) +{ + struct obj_cgroup *objcg = container_of(ref, struct obj_cgroup, refcnt); + unsigned long flags; + + obj_cgroup_release_kmem(objcg); spin_lock_irqsave(&css_set_lock, flags); list_del(&objcg->list); @@ -333,10 +344,14 @@ static struct obj_cgroup *obj_cgroup_alloc(void) return objcg; } -static void memcg_reparent_objcgs(struct mem_cgroup *memcg, - struct mem_cgroup *parent) +static void memcg_reparent_objcgs(struct mem_cgroup *memcg) { struct obj_cgroup *objcg, *iter; + struct mem_cgroup *parent; + + parent = parent_mem_cgroup(memcg); + if (!parent) + parent = root_mem_cgroup; objcg = rcu_replace_pointer(memcg->objcg, NULL, true); @@ -355,6 +370,7 @@ static void memcg_reparent_objcgs(struct mem_cgroup *memcg, percpu_ref_kill(&objcg->refcnt); } +#ifdef CONFIG_MEMCG_KMEM /* * This will be used as a shrinker list's index. * The main reason for not using cgroup id for this: @@ -3579,7 +3595,6 @@ static u64 mem_cgroup_read_u64(struct cgroup_subsys_state *css, #ifdef CONFIG_MEMCG_KMEM static int memcg_online_kmem(struct mem_cgroup *memcg) { - struct obj_cgroup *objcg; int memcg_id; if (cgroup_memory_nokmem) @@ -3592,14 +3607,6 @@ static int memcg_online_kmem(struct mem_cgroup *memcg) if (memcg_id < 0) return memcg_id; - objcg = obj_cgroup_alloc(); - if (!objcg) { - memcg_free_cache_id(memcg_id); - return -ENOMEM; - } - objcg->memcg = memcg; - rcu_assign_pointer(memcg->objcg, objcg); - static_branch_enable(&memcg_kmem_enabled_key); memcg->kmemcg_id = memcg_id; @@ -3623,8 +3630,6 @@ static void memcg_offline_kmem(struct mem_cgroup *memcg) if (!parent) parent = root_mem_cgroup; - memcg_reparent_objcgs(memcg, parent); - kmemcg_id = memcg->kmemcg_id; BUG_ON(kmemcg_id < 0); @@ -5151,8 +5156,8 @@ static struct mem_cgroup *mem_cgroup_alloc(void) memcg->socket_pressure = jiffies; #ifdef CONFIG_MEMCG_KMEM memcg->kmemcg_id = -1; - INIT_LIST_HEAD(&memcg->objcg_list); #endif + INIT_LIST_HEAD(&memcg->objcg_list); #ifdef CONFIG_CGROUP_WRITEBACK INIT_LIST_HEAD(&memcg->cgwb_list); for (i = 0; i < MEMCG_CGWB_FRN_CNT; i++) @@ -5224,16 +5229,22 @@ mem_cgroup_css_alloc(struct cgroup_subsys_state *parent_css) static int mem_cgroup_css_online(struct cgroup_subsys_state *css) { struct mem_cgroup *memcg = mem_cgroup_from_css(css); + struct obj_cgroup *objcg; /* * A memcg must be visible for expand_shrinker_info() * by the time the maps are allocated. So, we allocate maps * here, when for_each_mem_cgroup() can't skip it. */ - if (alloc_shrinker_info(memcg)) { - mem_cgroup_id_remove(memcg); - return -ENOMEM; - } + if (alloc_shrinker_info(memcg)) + goto remove_id; + + objcg = obj_cgroup_alloc(); + if (!objcg) + goto free_shrinker; + + objcg->memcg = memcg; + rcu_assign_pointer(memcg->objcg, objcg); /* Online state pins memcg ID, memcg ID pins CSS */ refcount_set(&memcg->id.ref, 1); @@ -5243,6 +5254,12 @@ static int mem_cgroup_css_online(struct cgroup_subsys_state *css) queue_delayed_work(system_unbound_wq, &stats_flush_dwork, 2UL*HZ); return 0; + +free_shrinker: + free_shrinker_info(memcg); +remove_id: + mem_cgroup_id_remove(memcg); + return -ENOMEM; } static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) @@ -5266,6 +5283,7 @@ static void mem_cgroup_css_offline(struct cgroup_subsys_state *css) page_counter_set_low(&memcg->memory, 0); memcg_offline_kmem(memcg); + memcg_reparent_objcgs(memcg); reparent_shrinker_deferred(memcg); wb_memcg_offline(memcg);
Pagecache pages are charged at the allocation time and holding a reference to the original memory cgroup until being reclaimed. Depending on the memory pressure, specific patterns of the page sharing between different cgroups and the cgroup creation and destruction rates, a large number of dying memory cgroups can be pinned by pagecache pages. It makes the page reclaim less efficient and wastes memory. We can convert LRU pages and most other raw memcg pins to the objcg direction to fix this problem, and then the page->memcg will always point to an object cgroup pointer. Therefore, the infrastructure of objcg no longer only serves CONFIG_MEMCG_KMEM. In this patch, we move the infrastructure of the objcg out of the scope of the CONFIG_MEMCG_KMEM so that the LRU pages can reuse it to charge pages. We know that the LRU pages are not accounted at the root level. But the page->memcg_data points to the root_mem_cgroup. So the page->memcg_data of the LRU pages always points to a valid pointer. But the root_mem_cgroup dose not have an object cgroup. If we use obj_cgroup APIs to charge the LRU pages, we should set the page->memcg_data to a root object cgroup. So we also allocate an object cgroup for the root_mem_cgroup. Signed-off-by: Muchun Song <songmuchun@bytedance.com> --- include/linux/memcontrol.h | 4 ++- mm/memcontrol.c | 66 +++++++++++++++++++++++++++++----------------- 2 files changed, 45 insertions(+), 25 deletions(-)