Message ID | 20200922203700.2879671-5-guro@fb.com (mailing list archive) |
---|---|
State | New, archived |
Series | mm: allow mapping accounted kernel pages to userspace |
On Tue, Sep 22, 2020 at 1:37 PM Roman Gushchin <guro@fb.com> wrote:
>
> PageKmemcg flag is currently defined as a page type (like buddy,
> offline, table and guard). Semantically it means that the page
> was accounted as kernel memory by the page allocator and has
> to be uncharged on the release.
>
> As a side effect of defining the flag as a page type, the accounted
> page can't be mapped to userspace (look at page_has_type() and
> comments above). In particular, this blocks the accounting of
> vmalloc-backed memory used by some bpf maps, because these maps
> do map the memory to userspace.
>
> One option is to fix it by complicating the access to page->mapcount,
> which provides some free bits for page->page_type.
>
> But it's way better to move this flag into page->memcg_data flags.
> Indeed, the flag makes no sense without enabled memory cgroups
> and memory cgroup pointer set in particular.
>
> This commit replaces PageKmemcg() and __SetPageKmemcg() with
> PageMemcgKmem() and SetPageMemcgKmem(). __ClearPageKmemcg()
> can simply be deleted because clear_page_mem_cgroup() already
> does the job.
>
> As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will
> be compiled out.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

Reviewed-by: Shakeel Butt <shakeelb@google.com>
On Tue, Sep 22, 2020 at 01:37:00PM -0700, Roman Gushchin wrote:
> PageKmemcg flag is currently defined as a page type (like buddy,
> offline, table and guard). Semantically it means that the page
> was accounted as kernel memory by the page allocator and has
> to be uncharged on the release.
>
> As a side effect of defining the flag as a page type, the accounted
> page can't be mapped to userspace (look at page_has_type() and
> comments above). In particular, this blocks the accounting of
> vmalloc-backed memory used by some bpf maps, because these maps
> do map the memory to userspace.
>
> One option is to fix it by complicating the access to page->mapcount,
> which provides some free bits for page->page_type.
>
> But it's way better to move this flag into page->memcg_data flags.
> Indeed, the flag makes no sense without enabled memory cgroups
> and memory cgroup pointer set in particular.
>
> This commit replaces PageKmemcg() and __SetPageKmemcg() with
> PageMemcgKmem() and SetPageMemcgKmem(). __ClearPageKmemcg()
> can simply be deleted because clear_page_mem_cgroup() already
> does the job.
>
> As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will
> be compiled out.
>
> Signed-off-by: Roman Gushchin <guro@fb.com>

That sounds good to me!

> ---
>  include/linux/memcontrol.h | 58 ++++++++++++++++++++++++++++++++++++--
>  include/linux/page-flags.h | 11 ++------
>  mm/memcontrol.c            | 14 +++------
>  mm/page_alloc.c            |  2 +-
>  4 files changed, 62 insertions(+), 23 deletions(-)
>
> diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> index 9a49f1e1c0c7..390db58500d5 100644
> --- a/include/linux/memcontrol.h
> +++ b/include/linux/memcontrol.h
> @@ -346,8 +346,14 @@ extern struct mem_cgroup *root_mem_cgroup;
>  enum page_memcg_flags {
>  	/* page->memcg_data is a pointer to an objcgs vector */
>  	PG_MEMCG_OBJ_CGROUPS,
> +	/* page has been accounted as a non-slab kernel page */
> +	PG_MEMCG_KMEM,
> +	/* the next bit after the last actual flag */
> +	PG_MEMCG_LAST_FLAG,

*_NR_FLAGS would be customary.

>  };
>
> +#define MEMCG_FLAGS_MASK ((1UL << PG_MEMCG_LAST_FLAG) - 1)

Probably best to stick to the same prefix as the enum items.

> + * PageMemcgKmem - check if the page has MemcgKmem flag set
> + * @page: a pointer to the page struct
> + *
> + * Checks if the page has MemcgKmem flag set. The caller must ensure that
> + * the page has an associated memory cgroup. It's not safe to call this function
> + * against some types of pages, e.g. slab pages.
> + */
> +static inline bool PageMemcgKmem(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_OBJ_CGROUPS, &page->memcg_data), page);
> +	return test_bit(PG_MEMCG_KMEM, &page->memcg_data);
> +}
> +
> +/*
> + * SetPageMemcgKmem - set the page's MemcgKmem flag
> + * @page: a pointer to the page struct
> + *
> + * Set the page's MemcgKmem flag. The caller must ensure that the page has
> + * an associated memory cgroup. It's not safe to call this function
> + * against some types of pages, e.g. slab pages.
> + */
> +static inline void SetPageMemcgKmem(struct page *page)
> +{
> +	VM_BUG_ON_PAGE(!page->memcg_data, page);
> +	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_OBJ_CGROUPS, &page->memcg_data), page);
> +	__set_bit(PG_MEMCG_KMEM, &page->memcg_data);

It may be good to keep the __ prefix from __SetPageMemcg as long as
this uses __set_bit, in case we later add atomic bit futzing.
On Thu, Sep 24, 2020 at 04:14:17PM -0400, Johannes Weiner wrote:
> On Tue, Sep 22, 2020 at 01:37:00PM -0700, Roman Gushchin wrote:
> > PageKmemcg flag is currently defined as a page type (like buddy,
> > offline, table and guard). Semantically it means that the page
> > was accounted as kernel memory by the page allocator and has
> > to be uncharged on the release.
> >
> > As a side effect of defining the flag as a page type, the accounted
> > page can't be mapped to userspace (look at page_has_type() and
> > comments above). In particular, this blocks the accounting of
> > vmalloc-backed memory used by some bpf maps, because these maps
> > do map the memory to userspace.
> >
> > One option is to fix it by complicating the access to page->mapcount,
> > which provides some free bits for page->page_type.
> >
> > But it's way better to move this flag into page->memcg_data flags.
> > Indeed, the flag makes no sense without enabled memory cgroups
> > and memory cgroup pointer set in particular.
> >
> > This commit replaces PageKmemcg() and __SetPageKmemcg() with
> > PageMemcgKmem() and SetPageMemcgKmem(). __ClearPageKmemcg()
> > can simply be deleted because clear_page_mem_cgroup() already
> > does the job.
> >
> > As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will
> > be compiled out.
> >
> > Signed-off-by: Roman Gushchin <guro@fb.com>
>
> That sounds good to me!

Great!

> > ---
> >  include/linux/memcontrol.h | 58 ++++++++++++++++++++++++++++++++++++--
> >  include/linux/page-flags.h | 11 ++------
> >  mm/memcontrol.c            | 14 +++------
> >  mm/page_alloc.c            |  2 +-
> >  4 files changed, 62 insertions(+), 23 deletions(-)
> >
> > diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
> > index 9a49f1e1c0c7..390db58500d5 100644
> > --- a/include/linux/memcontrol.h
> > +++ b/include/linux/memcontrol.h
> > @@ -346,8 +346,14 @@ extern struct mem_cgroup *root_mem_cgroup;
> >  enum page_memcg_flags {
> >  	/* page->memcg_data is a pointer to an objcgs vector */
> >  	PG_MEMCG_OBJ_CGROUPS,
> > +	/* page has been accounted as a non-slab kernel page */
> > +	PG_MEMCG_KMEM,
> > +	/* the next bit after the last actual flag */
> > +	PG_MEMCG_LAST_FLAG,
>
> *_NR_FLAGS would be customary.

Ok, __NR_PAGE_MEMCG_FLAGS? Similar to __NR_PAGE_FLAGS.

> >  };
> >
> > +#define MEMCG_FLAGS_MASK ((1UL << PG_MEMCG_LAST_FLAG) - 1)
>
> Probably best to stick to the same prefix as the enum items.

You mean PG_MEMCG_FLAGS_MASK?

> > + * PageMemcgKmem - check if the page has MemcgKmem flag set
> > + * @page: a pointer to the page struct
> > + *
> > + * Checks if the page has MemcgKmem flag set. The caller must ensure that
> > + * the page has an associated memory cgroup. It's not safe to call this function
> > + * against some types of pages, e.g. slab pages.
> > + */
> > +static inline bool PageMemcgKmem(struct page *page)
> > +{
> > +	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_OBJ_CGROUPS, &page->memcg_data), page);
> > +	return test_bit(PG_MEMCG_KMEM, &page->memcg_data);
> > +}
> > +
> > +/*
> > + * SetPageMemcgKmem - set the page's MemcgKmem flag
> > + * @page: a pointer to the page struct
> > + *
> > + * Set the page's MemcgKmem flag. The caller must ensure that the page has
> > + * an associated memory cgroup. It's not safe to call this function
> > + * against some types of pages, e.g. slab pages.
> > + */
> > +static inline void SetPageMemcgKmem(struct page *page)
> > +{
> > +	VM_BUG_ON_PAGE(!page->memcg_data, page);
> > +	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_OBJ_CGROUPS, &page->memcg_data), page);
> > +	__set_bit(PG_MEMCG_KMEM, &page->memcg_data);
>
> It may be good to keep the __ prefix from __SetPageMemcg as long as
> this uses __set_bit, in case we later add atomic bit futzing.

Yeah, I agree. I thought about it. Maybe not so useful now, but more
future-proof.

Thanks!
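
The naming point in this exchange rests on the kernel convention that `__set_bit()` is the non-atomic variant and `set_bit()` the atomic one. The following is a minimal userspace sketch of that distinction, written with C11 atomics rather than kernel bitops. The helper names `set_flag_nonatomic()` and `set_flag_atomic()` are hypothetical and are not kernel APIs.

```c
#include <assert.h>
#include <limits.h>
#include <stdatomic.h>

/* Number of bits in an unsigned long; shifting by this much or more is undefined. */
#define DEMO_WORD_BITS (sizeof(unsigned long) * CHAR_BIT)

/*
 * Non-atomic read-modify-write, analogous in spirit to __set_bit().
 * Only safe while the caller has exclusive access to *word.
 */
static inline void set_flag_nonatomic(unsigned long *word, unsigned int bit)
{
	assert(bit < DEMO_WORD_BITS);
	*word |= 1UL << bit;
}

/*
 * Atomic read-modify-write, analogous in spirit to set_bit().
 * Concurrent updates to other bits of the same word are not lost.
 */
static inline void set_flag_atomic(atomic_ulong *word, unsigned int bit)
{
	assert(bit < DEMO_WORD_BITS);
	atomic_fetch_or(word, 1UL << bit);
}
```

Keeping a distinct name for the non-atomic setter, as suggested in the review, leaves the plain name free for an atomic version should one be needed later.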
diff --git a/include/linux/memcontrol.h b/include/linux/memcontrol.h
index 9a49f1e1c0c7..390db58500d5 100644
--- a/include/linux/memcontrol.h
+++ b/include/linux/memcontrol.h
@@ -346,8 +346,14 @@ extern struct mem_cgroup *root_mem_cgroup;
 enum page_memcg_flags {
 	/* page->memcg_data is a pointer to an objcgs vector */
 	PG_MEMCG_OBJ_CGROUPS,
+	/* page has been accounted as a non-slab kernel page */
+	PG_MEMCG_KMEM,
+	/* the next bit after the last actual flag */
+	PG_MEMCG_LAST_FLAG,
 };
 
+#define MEMCG_FLAGS_MASK ((1UL << PG_MEMCG_LAST_FLAG) - 1)
+
 /*
  * page_mem_cgroup - get the memory cgroup associated with a page
  * @page: a pointer to the page struct
@@ -359,8 +365,12 @@ enum page_memcg_flags {
  */
 static inline struct mem_cgroup *page_mem_cgroup(struct page *page)
 {
+	unsigned long memcg_data = page->memcg_data;
+
 	VM_BUG_ON_PAGE(PageSlab(page), page);
-	return (struct mem_cgroup *)page->memcg_data;
+	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data), page);
+
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_FLAGS_MASK);
 }
 
 /*
@@ -379,7 +389,7 @@ static inline struct mem_cgroup *page_mem_cgroup_check(struct page *page)
 	if (test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data))
 		return NULL;
 
-	return (struct mem_cgroup *)memcg_data;
+	return (struct mem_cgroup *)(memcg_data & ~MEMCG_FLAGS_MASK);
 }
 
 /*
@@ -408,6 +418,36 @@ static inline void clear_page_mem_cgroup(struct page *page)
 	page->memcg_data = 0;
 }
 
+/*
+ * PageMemcgKmem - check if the page has MemcgKmem flag set
+ * @page: a pointer to the page struct
+ *
+ * Checks if the page has MemcgKmem flag set. The caller must ensure that
+ * the page has an associated memory cgroup. It's not safe to call this function
+ * against some types of pages, e.g. slab pages.
+ */
+static inline bool PageMemcgKmem(struct page *page)
+{
+	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_OBJ_CGROUPS, &page->memcg_data), page);
+	return test_bit(PG_MEMCG_KMEM, &page->memcg_data);
+}
+
+/*
+ * SetPageMemcgKmem - set the page's MemcgKmem flag
+ * @page: a pointer to the page struct
+ *
+ * Set the page's MemcgKmem flag. The caller must ensure that the page has
+ * an associated memory cgroup. It's not safe to call this function
+ * against some types of pages, e.g. slab pages.
+ */
+static inline void SetPageMemcgKmem(struct page *page)
+{
+	VM_BUG_ON_PAGE(!page->memcg_data, page);
+	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_OBJ_CGROUPS, &page->memcg_data), page);
+	__set_bit(PG_MEMCG_KMEM, &page->memcg_data);
+}
+
+
 #ifdef CONFIG_MEMCG_KMEM
 /*
  * page_obj_cgroups - get the object cgroups vector associated with a page
@@ -426,6 +466,7 @@ static inline struct obj_cgroup **page_obj_cgroups(struct page *page)
 	VM_BUG_ON_PAGE(memcg_data && !test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data),
 		       page);
 	__clear_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data);
+	VM_BUG_ON_PAGE(test_bit(PG_MEMCG_KMEM, &memcg_data), page);
 
 	return (struct obj_cgroup **)memcg_data;
 }
@@ -442,8 +483,10 @@ static inline struct obj_cgroup **page_obj_cgroups_check(struct page *page)
 {
 	unsigned long memcg_data = page->memcg_data;
 
-	if (memcg_data && test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data))
+	if (memcg_data && test_bit(PG_MEMCG_OBJ_CGROUPS, &memcg_data)) {
+		VM_BUG_ON_PAGE(test_bit(PG_MEMCG_KMEM, &memcg_data), page);
 		return (struct obj_cgroup **)memcg_data;
+	}
 
 	return NULL;
 }
@@ -1115,6 +1158,15 @@ static inline void clear_page_mem_cgroup(struct page *page)
 {
 }
 
+static inline bool PageMemcgKmem(struct page *page)
+{
+	return false;
+}
+
+static inline void SetPageMemcgKmem(struct page *page)
+{
+}
+
 static inline bool mem_cgroup_is_root(struct mem_cgroup *memcg)
 {
 	return true;
diff --git a/include/linux/page-flags.h b/include/linux/page-flags.h
index fbbb841a9346..a7ca01ae78d9 100644
--- a/include/linux/page-flags.h
+++ b/include/linux/page-flags.h
@@ -712,9 +712,8 @@ PAGEFLAG_FALSE(DoubleMap)
 #define PAGE_MAPCOUNT_RESERVE	-128
 #define PG_buddy	0x00000080
 #define PG_offline	0x00000100
-#define PG_kmemcg	0x00000200
-#define PG_table	0x00000400
-#define PG_guard	0x00000800
+#define PG_table	0x00000200
+#define PG_guard	0x00000400
 
 #define PageType(page, flag)						\
 	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)
@@ -765,12 +764,6 @@ PAGE_TYPE_OPS(Buddy, buddy)
  */
 PAGE_TYPE_OPS(Offline, offline)
 
-/*
- * If kmemcg is enabled, the buddy allocator will set PageKmemcg() on
- * pages allocated with __GFP_ACCOUNT. It gets cleared on page free.
- */
-PAGE_TYPE_OPS(Kmemcg, kmemcg)
-
 /*
  * Marks pages in use as page tables.
  */
diff --git a/mm/memcontrol.c b/mm/memcontrol.c
index 69e3dbb3d2cf..1d22fa4c4a88 100644
--- a/mm/memcontrol.c
+++ b/mm/memcontrol.c
@@ -3081,7 +3081,7 @@ int __memcg_kmem_charge_page(struct page *page, gfp_t gfp, int order)
 		ret = __memcg_kmem_charge(memcg, gfp, 1 << order);
 		if (!ret) {
 			set_page_mem_cgroup(page, memcg);
-			__SetPageKmemcg(page);
+			SetPageMemcgKmem(page);
 			return 0;
 		}
 		css_put(&memcg->css);
@@ -3106,10 +3106,6 @@ void __memcg_kmem_uncharge_page(struct page *page, int order)
 	__memcg_kmem_uncharge(memcg, nr_pages);
 	clear_page_mem_cgroup(page);
 	css_put(&memcg->css);
-
-	/* slab pages do not have PageKmemcg flag set */
-	if (PageKmemcg(page))
-		__ClearPageKmemcg(page);
 }
 
 static bool consume_obj_stock(struct obj_cgroup *objcg, unsigned int nr_bytes)
@@ -6890,12 +6886,10 @@ static void uncharge_page(struct page *page, struct uncharge_gather *ug)
 	nr_pages = compound_nr(page);
 	ug->nr_pages += nr_pages;
 
-	if (!PageKmemcg(page)) {
-		ug->pgpgout++;
-	} else {
+	if (PageMemcgKmem(page))
 		ug->nr_kmem += nr_pages;
-		__ClearPageKmemcg(page);
-	}
+	else
+		ug->pgpgout++;
 
 	ug->dummy_page = page;
 	clear_page_mem_cgroup(page);
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d4d181e15e7c..6807e37d78ba 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1197,7 +1197,7 @@ static __always_inline bool free_pages_prepare(struct page *page,
 	}
 	if (PageMappingFlags(page))
 		page->mapping = NULL;
-	if (memcg_kmem_enabled() && PageKmemcg(page))
+	if (memcg_kmem_enabled() && PageMemcgKmem(page))
 		__memcg_kmem_uncharge_page(page, order);
 	if (check_free)
 		bad += check_free_page(page);
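
The core of the diff is that `page->memcg_data` now carries flag bits in the low bits of a pointer, and every reader masks them off before dereferencing. The following is a minimal userspace sketch of that technique, not the kernel code itself. All `demo_` and `DEMO_` names are hypothetical, the count-style last enum entry follows the naming suggested in the review, and `assert()` stands in for `VM_BUG_ON_PAGE()`.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag enum; the trailing entry counts the flag bits. */
enum demo_memcg_flags {
	DEMO_MEMCG_OBJ_CGROUPS,	/* bit 0 */
	DEMO_MEMCG_KMEM,	/* bit 1 */
	DEMO_NR_MEMCG_FLAGS,	/* number of flag bits */
};

/* Mask covering all flag bits in the low end of the word. */
#define DEMO_MEMCG_FLAGS_MASK (((uintptr_t)1 << DEMO_NR_MEMCG_FLAGS) - 1)

/* Stand-in for struct mem_cgroup; its alignment keeps the low bits zero. */
struct demo_memcg {
	long id;
};

_Static_assert(_Alignof(struct demo_memcg) >= (1u << DEMO_NR_MEMCG_FLAGS),
	       "pointer alignment must leave room for the flag bits");

/* Stand-in for struct page, reduced to the one field of interest. */
struct demo_page {
	uintptr_t memcg_data;
};

static inline void demo_set_memcg(struct demo_page *page, struct demo_memcg *memcg)
{
	page->memcg_data = (uintptr_t)memcg;
}

/* Strip the flag bits before using the word as a pointer. */
static inline struct demo_memcg *demo_get_memcg(const struct demo_page *page)
{
	return (struct demo_memcg *)(page->memcg_data & ~DEMO_MEMCG_FLAGS_MASK);
}

/* The flag is only meaningful once a cgroup pointer has been stored. */
static inline void demo_set_kmem(struct demo_page *page)
{
	assert(page->memcg_data != 0);
	page->memcg_data |= (uintptr_t)1 << DEMO_MEMCG_KMEM;
}

static inline bool demo_is_kmem(const struct demo_page *page)
{
	return (page->memcg_data & ((uintptr_t)1 << DEMO_MEMCG_KMEM)) != 0;
}

/* Zeroing the word drops the pointer and every flag in one store. */
static inline void demo_clear_memcg(struct demo_page *page)
{
	page->memcg_data = 0;
}
```

In this model, clearing the whole word is what makes a separate clear helper for the kmem flag unnecessary, which matches the reasoning given for deleting `__ClearPageKmemcg()`.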
PageKmemcg flag is currently defined as a page type (like buddy,
offline, table and guard). Semantically it means that the page
was accounted as kernel memory by the page allocator and has
to be uncharged on the release.

As a side effect of defining the flag as a page type, the accounted
page can't be mapped to userspace (look at page_has_type() and
comments above). In particular, this blocks the accounting of
vmalloc-backed memory used by some bpf maps, because these maps
do map the memory to userspace.

One option is to fix it by complicating the access to page->mapcount,
which provides some free bits for page->page_type.

But it's way better to move this flag into page->memcg_data flags.
Indeed, the flag makes no sense without enabled memory cgroups
and memory cgroup pointer set in particular.

This commit replaces PageKmemcg() and __SetPageKmemcg() with
PageMemcgKmem() and SetPageMemcgKmem(). __ClearPageKmemcg()
can simply be deleted because clear_page_mem_cgroup() already
does the job.

As a bonus, on !CONFIG_MEMCG build the PageMemcgKmem() check will
be compiled out.

Signed-off-by: Roman Gushchin <guro@fb.com>
---
 include/linux/memcontrol.h | 58 ++++++++++++++++++++++++++++++++++++--
 include/linux/page-flags.h | 11 ++------
 mm/memcontrol.c            | 14 +++------
 mm/page_alloc.c            |  2 +-
 4 files changed, 62 insertions(+), 23 deletions(-)
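
The conflict between page types and userspace mapping comes from `page_type` sharing storage with the map count. The following is a simplified userspace model of that overlap, intended as an illustration only. The union layout and the `PAGE_TYPE_BASE` value of `0xf0000000` are assumptions based on kernel headers of that period and do not appear in this patch. `PG_kmemcg` and `PAGE_MAPCOUNT_RESERVE` are the values shown in the diff, and the `demo_` helpers are hypothetical.

```c
#include <stdbool.h>
#include <stdint.h>

#define PAGE_TYPE_BASE		0xf0000000u	/* assumed, not part of this patch */
#define PAGE_MAPCOUNT_RESERVE	(-128)		/* value shown in the diff */
#define PG_kmemcg		0x00000200u	/* old value removed by the diff */

/* Simplified model: the map count and the page type share one 32-bit word. */
struct demo_page {
	union {
		int32_t mapcount;
		uint32_t page_type;
	};
};

/* An unmapped, untyped page starts with mapcount == -1 (all bits set). */
static inline void demo_page_init(struct demo_page *page)
{
	page->mapcount = -1;
}

/* A type is set by clearing its bit, which pushes the value far below -1. */
static inline void demo_set_type(struct demo_page *page, uint32_t flag)
{
	page->page_type &= ~flag;
}

/* Mirrors the PageType() macro visible in the diff context. */
static inline bool demo_type_is(const struct demo_page *page, uint32_t flag)
{
	return (page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE;
}

/* Any value below the reserve is treated as "this page has a type". */
static inline bool demo_has_type(const struct demo_page *page)
{
	return page->mapcount < PAGE_MAPCOUNT_RESERVE;
}
```

In this model, incrementing `mapcount` on a typed page, which is what a userspace mapping would do, rewrites the type bits. That is why a flag stored as a page type cannot coexist with a mapped page, and why moving the flag into `memcg_data` removes the restriction.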