
[rfc,0/5] mm: allow mapping accounted kernel pages to userspace

Message ID 20200910202659.1378404-1-guro@fb.com (mailing list archive)

Message

Roman Gushchin Sept. 10, 2020, 8:26 p.m. UTC
Currently a non-slab kernel page which has been charged to a memory
cgroup can't be mapped to userspace. The underlying reason is simple:
the PageKmemcg flag is defined as a page type (like buddy, offline, etc.),
so it takes a bit from the page->_mapcount counter. Pages with a type set
can't be mapped to userspace.

But in general the kmemcg flag has nothing to do with mapping to
userspace. It only means that the page has been accounted by the page
allocator, so it has to be properly uncharged on release.

Some bpf maps map vmalloc-based memory to userspace, so their
memory can't be accounted because of this implementation detail.

This patchset removes the limitation by moving the PageKmemcg flag
into one of the free low bits of the page->mem_cgroup pointer. It also
formalizes all accesses to page->mem_cgroup and page->obj_cgroups
via new helpers, adds several checks, and removes a couple of obsolete
functions. As a result, the code becomes more robust, with fewer
open-coded bit tricks.

The first patch in the series is a bugfix which I have already sent
separately. It is included in this RFC so that the whole series compiles.


Roman Gushchin (5):
  mm: memcg/slab: fix racy access to page->mem_cgroup in
    mem_cgroup_from_obj()
  mm: memcontrol: use helpers to access page's memcg data
  mm: memcontrol/slab: use helpers to access slab page's memcg_data
  mm: introduce page memcg flags
  mm: convert page kmemcg type to a page memcg flag

 include/linux/memcontrol.h       | 161 +++++++++++++++++++++++++++++--
 include/linux/mm.h               |  22 -----
 include/linux/mm_types.h         |   5 +-
 include/linux/page-flags.h       |  11 +--
 include/trace/events/writeback.h |   2 +-
 mm/debug.c                       |   4 +-
 mm/huge_memory.c                 |   4 +-
 mm/memcontrol.c                  | 116 ++++++++++------------
 mm/migrate.c                     |   2 +-
 mm/page_alloc.c                  |   6 +-
 mm/page_io.c                     |   4 +-
 mm/slab.h                        |  28 +-----
 mm/workingset.c                  |   4 +-
 13 files changed, 221 insertions(+), 148 deletions(-)

Comments

Shakeel Butt Sept. 11, 2020, 5:34 p.m. UTC | #1
On Thu, Sep 10, 2020 at 1:27 PM Roman Gushchin <guro@fb.com> wrote:
>
> Currently a non-slab kernel page which has been charged to a memory
> cgroup can't be mapped to userspace. The underlying reason is simple:
> the PageKmemcg flag is defined as a page type (like buddy, offline, etc.),
> so it takes a bit from the page->_mapcount counter. Pages with a type set
> can't be mapped to userspace.
>
> But in general the kmemcg flag has nothing to do with mapping to
> userspace. It only means that the page has been accounted by the page
> allocator, so it has to be properly uncharged on release.
>
> Some bpf maps map vmalloc-based memory to userspace, so their
> memory can't be accounted because of this implementation detail.
>
> This patchset removes the limitation by moving the PageKmemcg flag
> into one of the free low bits of the page->mem_cgroup pointer. It also
> formalizes all accesses to page->mem_cgroup and page->obj_cgroups
> via new helpers, adds several checks, and removes a couple of obsolete
> functions. As a result, the code becomes more robust, with fewer
> open-coded bit tricks.
>
> The first patch in the series is a bugfix which I have already sent
> separately. It is included in this RFC so that the whole series compiles.
>
>

This would be a really beneficial feature. I tried to fix a similar
issue for kvm_vcpu_mmap [1] using an actual page flag bit, but your
solution should be less controversial.

I think this might also help the accounting of TCP zerocopy receive
mmapped memory. The memory is charged to skbs, but once it is mmapped,
the skbs get uncharged and we can end up with a very large amount of
uncharged memory.

I will take a look at the series.
Shakeel Butt Sept. 11, 2020, 5:34 p.m. UTC | #2
On Fri, Sep 11, 2020 at 10:34 AM Shakeel Butt <shakeelb@google.com> wrote:
>
> On Thu, Sep 10, 2020 at 1:27 PM Roman Gushchin <guro@fb.com> wrote:
> >
> > Currently a non-slab kernel page which has been charged to a memory
> > cgroup can't be mapped to userspace. The underlying reason is simple:
> > the PageKmemcg flag is defined as a page type (like buddy, offline, etc.),
> > so it takes a bit from the page->_mapcount counter. Pages with a type set
> > can't be mapped to userspace.
> >
> > But in general the kmemcg flag has nothing to do with mapping to
> > userspace. It only means that the page has been accounted by the page
> > allocator, so it has to be properly uncharged on release.
> >
> > Some bpf maps map vmalloc-based memory to userspace, so their
> > memory can't be accounted because of this implementation detail.
> >
> > This patchset removes the limitation by moving the PageKmemcg flag
> > into one of the free low bits of the page->mem_cgroup pointer. It also
> > formalizes all accesses to page->mem_cgroup and page->obj_cgroups
> > via new helpers, adds several checks, and removes a couple of obsolete
> > functions. As a result, the code becomes more robust, with fewer
> > open-coded bit tricks.
> >
> > The first patch in the series is a bugfix which I have already sent
> > separately. It is included in this RFC so that the whole series compiles.
> >
> >
>
> This would be a really beneficial feature. I tried to fix a similar
> issue for kvm_vcpu_mmap [1] using an actual page flag bit, but your
> solution should be less controversial.
>
> I think this might also help the accounting of TCP zerocopy receive
> mmapped memory. The memory is charged to skbs, but once it is mmapped,
> the skbs get uncharged and we can end up with a very large amount of
> uncharged memory.
>
> I will take a look at the series.

[1] https://lore.kernel.org/kvm/20190329012836.47013-1-shakeelb@google.com/
Roman Gushchin Sept. 11, 2020, 9:36 p.m. UTC | #3
On Fri, Sep 11, 2020 at 10:34:57AM -0700, Shakeel Butt wrote:
> On Fri, Sep 11, 2020 at 10:34 AM Shakeel Butt <shakeelb@google.com> wrote:
> >
> > On Thu, Sep 10, 2020 at 1:27 PM Roman Gushchin <guro@fb.com> wrote:
> > >
> > > Currently a non-slab kernel page which has been charged to a memory
> > > cgroup can't be mapped to userspace. The underlying reason is simple:
> > > the PageKmemcg flag is defined as a page type (like buddy, offline, etc.),
> > > so it takes a bit from the page->_mapcount counter. Pages with a type set
> > > can't be mapped to userspace.
> > >
> > > But in general the kmemcg flag has nothing to do with mapping to
> > > userspace. It only means that the page has been accounted by the page
> > > allocator, so it has to be properly uncharged on release.
> > >
> > > Some bpf maps map vmalloc-based memory to userspace, so their
> > > memory can't be accounted because of this implementation detail.
> > >
> > > This patchset removes the limitation by moving the PageKmemcg flag
> > > into one of the free low bits of the page->mem_cgroup pointer. It also
> > > formalizes all accesses to page->mem_cgroup and page->obj_cgroups
> > > via new helpers, adds several checks, and removes a couple of obsolete
> > > functions. As a result, the code becomes more robust, with fewer
> > > open-coded bit tricks.
> > >
> > > The first patch in the series is a bugfix which I have already sent
> > > separately. It is included in this RFC so that the whole series compiles.
> > >
> > >
> >
> > This would be a really beneficial feature. I tried to fix a similar
> > issue for kvm_vcpu_mmap [1] using an actual page flag bit, but your
> > solution should be less controversial.
> >
> > I think this might also help the accounting of TCP zerocopy receive
> > mmapped memory. The memory is charged to skbs, but once it is mmapped,
> > the skbs get uncharged and we can end up with a very large amount of
> > uncharged memory.
> >
> > I will take a look at the series.
> 
> [1] https://lore.kernel.org/kvm/20190329012836.47013-1-shakeelb@google.com/

Cool, thank you for the link!

It's very nice that this feature is useful beyond the bpf case.

Thanks!