mbox series

[bpf-next,0/5] bpf, mm: introduce cgroup.memory=nobpf

Message ID 20230205065805.19598-1-laoar.shao@gmail.com (mailing list archive)
Headers show
Series bpf, mm: introduce cgroup.memory=nobpf | expand

Message

Yafang Shao Feb. 5, 2023, 6:58 a.m. UTC
The bpf memory accouting has some known problems in contianer
environment,

- The container memory usage is not consistent if there's pinned bpf
  program
  After the container restart, the leftover bpf programs won't account
  to the new generation, so the memory usage of the container is not
  consistent. This issue can be resolved by introducing selectable
  memcg, but we don't have an agreement on the solution yet. See also
  the discussions at https://lwn.net/Articles/905150/ .

- The leftover non-preallocated bpf map can't be limited
  The leftover bpf map will be reparented, and thus it will be limited by 
  the parent, rather than the container itself. Furthermore, if the
  parent is destroyed, it be will limited by its parent's parent, and so
  on. It can also be resolved by introducing selectable memcg.

- The memory dynamically allocated in bpf prog is charged into root memcg
  only
  Nowdays the bpf prog can dynamically allocate memory, for example via
  bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
  pool, so it will charge into root memcg only. That needs to be
  addressed by a new proposal.

So let's give the user an option to disable bpf memory accouting.

The idea of "cgroup.memory=nobpf" is originally by Tejun[1].

[1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/

Yafang Shao (5):
  mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
  bpf: use bpf_map_kvcalloc in bpf_local_storage
  bpf: introduce bpf_memcg_flags()
  bpf: allow to disable bpf map memory accounting
  bpf: allow to disable bpf prog memory accounting

 Documentation/admin-guide/kernel-parameters.txt |  1 +
 include/linux/bpf.h                             | 16 ++++++++++++++++
 include/linux/memcontrol.h                      | 11 +++++++++++
 kernel/bpf/bpf_local_storage.c                  |  4 ++--
 kernel/bpf/core.c                               | 13 +++++++------
 kernel/bpf/memalloc.c                           |  3 ++-
 kernel/bpf/syscall.c                            | 20 ++++++++++++++++++--
 mm/memcontrol.c                                 | 18 ++++++++++++++++++
 8 files changed, 75 insertions(+), 11 deletions(-)

Comments

Johannes Weiner Feb. 8, 2023, 7:29 p.m. UTC | #1
On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> The bpf memory accouting has some known problems in contianer
> environment,
> 
> - The container memory usage is not consistent if there's pinned bpf
>   program
>   After the container restart, the leftover bpf programs won't account
>   to the new generation, so the memory usage of the container is not
>   consistent. This issue can be resolved by introducing selectable
>   memcg, but we don't have an agreement on the solution yet. See also
>   the discussions at https://lwn.net/Articles/905150/ .
> 
> - The leftover non-preallocated bpf map can't be limited
>   The leftover bpf map will be reparented, and thus it will be limited by 
>   the parent, rather than the container itself. Furthermore, if the
>   parent is destroyed, it be will limited by its parent's parent, and so
>   on. It can also be resolved by introducing selectable memcg.
> 
> - The memory dynamically allocated in bpf prog is charged into root memcg
>   only
>   Nowdays the bpf prog can dynamically allocate memory, for example via
>   bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
>   pool, so it will charge into root memcg only. That needs to be
>   addressed by a new proposal.
> 
> So let's give the user an option to disable bpf memory accouting.
> 
> The idea of "cgroup.memory=nobpf" is originally by Tejun[1].

I'm not the most familiar with bpf internals, but the memcg bits and
adding the boot time flag look good to me:

Acked-by: Johannes Weiner <hannes@cmpxchg.org>
Roman Gushchin Feb. 8, 2023, 8:54 p.m. UTC | #2
On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> The bpf memory accouting has some known problems in contianer
> environment,
> 
> - The container memory usage is not consistent if there's pinned bpf
>   program
>   After the container restart, the leftover bpf programs won't account
>   to the new generation, so the memory usage of the container is not
>   consistent. This issue can be resolved by introducing selectable
>   memcg, but we don't have an agreement on the solution yet. See also
>   the discussions at https://lwn.net/Articles/905150/ .
> 
> - The leftover non-preallocated bpf map can't be limited
>   The leftover bpf map will be reparented, and thus it will be limited by 
>   the parent, rather than the container itself. Furthermore, if the
>   parent is destroyed, it be will limited by its parent's parent, and so
>   on. It can also be resolved by introducing selectable memcg.
> 
> - The memory dynamically allocated in bpf prog is charged into root memcg
>   only
>   Nowdays the bpf prog can dynamically allocate memory, for example via
>   bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
>   pool, so it will charge into root memcg only. That needs to be
>   addressed by a new proposal.
> 
> So let's give the user an option to disable bpf memory accouting.
> 
> The idea of "cgroup.memory=nobpf" is originally by Tejun[1].
> 
> [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/
> 
> Yafang Shao (5):
>   mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
>   bpf: use bpf_map_kvcalloc in bpf_local_storage
>   bpf: introduce bpf_memcg_flags()
>   bpf: allow to disable bpf map memory accounting
>   bpf: allow to disable bpf prog memory accounting

Hello Yafang!

Overall the patch looks good to me, please, feel free to add
Acked-by: Roman Gushchin <roman.gushchin@linux.dev>

I'd squash patch (3) into (4), but up to you.

Thanks!
Yafang Shao Feb. 9, 2023, 11:28 a.m. UTC | #3
On Thu, Feb 9, 2023 at 4:54 AM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Sun, Feb 05, 2023 at 06:58:00AM +0000, Yafang Shao wrote:
> > The bpf memory accouting has some known problems in contianer
> > environment,
> >
> > - The container memory usage is not consistent if there's pinned bpf
> >   program
> >   After the container restart, the leftover bpf programs won't account
> >   to the new generation, so the memory usage of the container is not
> >   consistent. This issue can be resolved by introducing selectable
> >   memcg, but we don't have an agreement on the solution yet. See also
> >   the discussions at https://lwn.net/Articles/905150/ .
> >
> > - The leftover non-preallocated bpf map can't be limited
> >   The leftover bpf map will be reparented, and thus it will be limited by
> >   the parent, rather than the container itself. Furthermore, if the
> >   parent is destroyed, it be will limited by its parent's parent, and so
> >   on. It can also be resolved by introducing selectable memcg.
> >
> > - The memory dynamically allocated in bpf prog is charged into root memcg
> >   only
> >   Nowdays the bpf prog can dynamically allocate memory, for example via
> >   bpf_obj_new(), but it only allocate from the global bpf_mem_alloc
> >   pool, so it will charge into root memcg only. That needs to be
> >   addressed by a new proposal.
> >
> > So let's give the user an option to disable bpf memory accouting.
> >
> > The idea of "cgroup.memory=nobpf" is originally by Tejun[1].
> >
> > [1]. https://lwn.net/ml/linux-mm/YxjOawzlgE458ezL@slm.duckdns.org/
> >
> > Yafang Shao (5):
> >   mm: memcontrol: add new kernel parameter cgroup.memory=nobpf
> >   bpf: use bpf_map_kvcalloc in bpf_local_storage
> >   bpf: introduce bpf_memcg_flags()
> >   bpf: allow to disable bpf map memory accounting
> >   bpf: allow to disable bpf prog memory accounting
>
> Hello Yafang!
>
> Overall the patch looks good to me, please, feel free to add
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev>
>
> I'd squash patch (3) into (4), but up to you.
>

Sure. Will do it.