[RFC,bpf-next,0/8] bpf, cgroup: Add bpf support for cgroup controller

Message ID	20230922112846.4265-1-laoar.shao@gmail.com (mailing list archive)
Headers	show Received: from lindbergh.monkeyblade.net (lindbergh.monkeyblade.net [23.128.96.19]) (using TLSv1.2 with cipher ECDHE-RSA-AES256-GCM-SHA384 (256/256 bits)) (No client certificate requested) by smtp.subspace.kernel.org (Postfix) with ESMTPS id C10001F18A for <bpf@vger.kernel.org>; Fri, 22 Sep 2023 11:29:04 +0000 (UTC) From: Yafang Shao <laoar.shao@gmail.com> To: ast@kernel.org, daniel@iogearbox.net, john.fastabend@gmail.com, andrii@kernel.org, martin.lau@linux.dev, song@kernel.org, yonghong.song@linux.dev, kpsingh@kernel.org, sdf@google.com, haoluo@google.com, jolsa@kernel.org, tj@kernel.org, lizefan.x@bytedance.com, hannes@cmpxchg.org, yosryahmed@google.com, mkoutny@suse.com Cc: cgroups@vger.kernel.org, bpf@vger.kernel.org, Yafang Shao <laoar.shao@gmail.com> Subject: [RFC PATCH bpf-next 0/8] bpf, cgroup: Add bpf support for cgroup controller Date: Fri, 22 Sep 2023 11:28:38 +0000 Message-Id: <20230922112846.4265-1-laoar.shao@gmail.com> Precedence: bulk MIME-Version: 1.0 Content-Transfer-Encoding: 8bit
Series	bpf, cgroup: Add bpf support for cgroup controller \| expand [RFC,bpf-next,0/8] bpf, cgroup: Add bpf support for cgroup controller [RFC,bpf-next,1/8] bpf: Fix missed rcu read lock in bpf_task_under_cgroup() [RFC,bpf-next,2/8] cgroup: Enable task_under_cgroup_hierarchy() on cgroup1 [RFC,bpf-next,3/8] cgroup: Add cgroup_get_from_id_within_subsys() [RFC,bpf-next,4/8] bpf: Add new kfuncs support for cgroup controller [RFC,bpf-next,5/8] selftests/bpf: Fix issues in setup_classid_environment() [RFC,bpf-next,6/8] selftests/bpf: Add parallel support for classid [RFC,bpf-next,7/8] selftests/bpf: Add new cgroup helper get_classid_cgroup_id() [RFC,bpf-next,8/8] selftests/bpf: Add selftests for cgroup controller

Yafang Shao Sept. 22, 2023, 11:28 a.m. UTC

Currently, BPF is primarily confined to cgroup2, with the exception of
cgroup_iter, which supports cgroup1 fds. Unfortunately, this limitation
prevents us from harnessing the full potential of BPF within cgroup1
environments.

In our endeavor to seamlessly integrate BPF within our Kubernetes
environment, which relies on cgroup1, we have been exploring the
possibility of transitioning to cgroup2. While this transition is
forward-looking, it poses challenges due to the necessity for numerous
applications to adapt.

While we acknowledge that cgroup2 represents the future, we also recognize
that such transitions demand time and effort. As a result, we are
considering an alternative approach. Instead of migrating to cgroup2, we
are contemplating modifications to the BPF kernel code to ensure
compatibility with cgroup1. These adjustments appear to be relatively
minor, making this option more feasible.

Given the widespread use of cgroup1 in container environments, this change
would be beneficial to many users.

Based on our investigation, the optimal way to enable BPF on cgroup1 is to
utilize the cgroup controller. The cgroup becomes active only when it has
one or more of its controllers enabled. In production environments, a task
is consistently managed by at least one cgroup controller. Consequently, we
can always establish a direct link between a task and a cgroup controller,
enabling us to perform actions based on this connection. As a consequence,
this patchset introduces the following new kfuncs: 

- bpf_cgroup_id_from_task_within_controller
  Retrieves the cgroup ID from a task within a specific cgroup controller.
- bpf_cgroup_acquire_from_id_within_controller
  Acquires the cgroup from a cgroup ID within a specific cgroup controller.
- bpf_cgroup_ancestor_id_from_task_within_controller
  Retrieves the ancestor cgroup ID from a task within a specific cgroup
  controller.

The advantage of these new BPF kfuncs is their ability to abstract away the
complexities of cgroup hierarchies, irrespective of whether they involve
cgroup1 or cgroup2.

In the future, we may expand controller-based support to other BPF
functionalities, such as bpf_cgrp_storage, the attachment and detachment
of cgroups, skb_under_cgroup, and more.

Changes:
- bpf, cgroup: Enable cgroup_array map on cgroup1
  https://lore.kernel.org/bpf/20230903142800.3870-1-laoar.shao@gmail.com/

Yafang Shao (8):
  bpf: Fix missed rcu read lock in bpf_task_under_cgroup()
  cgroup: Enable task_under_cgroup_hierarchy() on cgroup1
  cgroup: Add cgroup_get_from_id_within_subsys()
  bpf: Add new kfuncs support for cgroup controller
  selftests/bpf: Fix issues in setup_classid_environment()
  selftests/bpf: Add parallel support for classid
  selftests/bpf: Add new cgroup helper get_classid_cgroup_id()
  selftests/bpf: Add selftests for cgroup controller

 include/linux/cgroup-defs.h                   |  20 +++
 include/linux/cgroup.h                        |  31 +++-
 kernel/bpf/helpers.c                          |  77 ++++++++-
 kernel/cgroup/cgroup-internal.h               |  20 ---
 kernel/cgroup/cgroup.c                        |  32 +++-
 tools/testing/selftests/bpf/cgroup_helpers.c  |  65 ++++++--
 tools/testing/selftests/bpf/cgroup_helpers.h  |   3 +-
 .../bpf/prog_tests/cgroup_controller.c        | 149 ++++++++++++++++++
 .../selftests/bpf/prog_tests/cgroup_v1v2.c    |   2 +-
 .../bpf/progs/test_cgroup_controller.c        |  80 ++++++++++
 10 files changed, 430 insertions(+), 49 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_controller.c
 create mode 100644 tools/testing/selftests/bpf/progs/test_cgroup_controller.c

Tejun Heo Sept. 22, 2023, 4:52 p.m. UTC | #1

Hello,

On Fri, Sep 22, 2023 at 11:28:38AM +0000, Yafang Shao wrote:
> - bpf_cgroup_id_from_task_within_controller
>   Retrieves the cgroup ID from a task within a specific cgroup controller.
> - bpf_cgroup_acquire_from_id_within_controller
>   Acquires the cgroup from a cgroup ID within a specific cgroup controller.
> - bpf_cgroup_ancestor_id_from_task_within_controller
>   Retrieves the ancestor cgroup ID from a task within a specific cgroup
>   controller.
> 
> The advantage of these new BPF kfuncs is their ability to abstract away the
> complexities of cgroup hierarchies, irrespective of whether they involve
> cgroup1 or cgroup2.

I'm afraid this is more likely to bring the unnecessary complexities of
cgroup1 into cgroup2.

> In the future, we may expand controller-based support to other BPF
> functionalities, such as bpf_cgrp_storage, the attachment and detachment
> of cgroups, skb_under_cgroup, and more.

I'm okay with minor / trivial quality of life improvements to cgroup1 but
this goes much beyond that and is starting to complications to cgroup2
users, which I think is a pretty bad idea long term. I'm afraid I'm gonna
have to nack this approach.

Thanks.

Yafang Shao Sept. 24, 2023, 6:32 a.m. UTC | #2

On Sat, Sep 23, 2023 at 12:52 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Fri, Sep 22, 2023 at 11:28:38AM +0000, Yafang Shao wrote:
> > - bpf_cgroup_id_from_task_within_controller
> >   Retrieves the cgroup ID from a task within a specific cgroup controller.
> > - bpf_cgroup_acquire_from_id_within_controller
> >   Acquires the cgroup from a cgroup ID within a specific cgroup controller.
> > - bpf_cgroup_ancestor_id_from_task_within_controller
> >   Retrieves the ancestor cgroup ID from a task within a specific cgroup
> >   controller.
> >
> > The advantage of these new BPF kfuncs is their ability to abstract away the
> > complexities of cgroup hierarchies, irrespective of whether they involve
> > cgroup1 or cgroup2.
>
> I'm afraid this is more likely to bring the unnecessary complexities of
> cgroup1 into cgroup2.

I concur with the idea that we should avoid introducing the
complexities of cgroup1 into cgroup2. Which specific change do you
believe might introduce these complexities into cgroup2? Is it the
modification within task_under_cgroup_hierarchy() or
cgroup_get_from_id()?

In fact, we have the option to utilize
bpf_cgroup_ancestor_id_from_task_within_controller() as a substitute
for bpf_task_under_cgroup(), which allows us to sidestep the need for
changes within task_under_cgroup_hierarchy() altogether.

>
> > In the future, we may expand controller-based support to other BPF
> > functionalities, such as bpf_cgrp_storage, the attachment and detachment
> > of cgroups, skb_under_cgroup, and more.
>
> I'm okay with minor / trivial quality of life improvements to cgroup1 but
> this goes much beyond that and is starting to complications to cgroup2
> users, which I think is a pretty bad idea long term. I'm afraid I'm gonna
> have to nack this approach.
>
> Thanks.
>
> --
> tejun

Kui-Feng Lee Sept. 25, 2023, 6:22 p.m. UTC | #3

On 9/22/23 04:28, Yafang Shao wrote:
> Currently, BPF is primarily confined to cgroup2, with the exception of
> cgroup_iter, which supports cgroup1 fds. Unfortunately, this limitation
> prevents us from harnessing the full potential of BPF within cgroup1
> environments.
> 
> In our endeavor to seamlessly integrate BPF within our Kubernetes
> environment, which relies on cgroup1, we have been exploring the
> possibility of transitioning to cgroup2. While this transition is
> forward-looking, it poses challenges due to the necessity for numerous
> applications to adapt.
> 
> While we acknowledge that cgroup2 represents the future, we also recognize
> that such transitions demand time and effort. As a result, we are
> considering an alternative approach. Instead of migrating to cgroup2, we
> are contemplating modifications to the BPF kernel code to ensure
> compatibility with cgroup1. These adjustments appear to be relatively
> minor, making this option more feasible.

Do you mean giving up moving to cgroup2? Or, is it just a tentative
solution?

Tejun Heo Sept. 25, 2023, 6:43 p.m. UTC | #4

Hello,

On Sun, Sep 24, 2023 at 02:32:14PM +0800, Yafang Shao wrote:
> On Sat, Sep 23, 2023 at 12:52 AM Tejun Heo <tj@kernel.org> wrote:
> >
> > Hello,
> >
> > On Fri, Sep 22, 2023 at 11:28:38AM +0000, Yafang Shao wrote:
> > > - bpf_cgroup_id_from_task_within_controller
> > >   Retrieves the cgroup ID from a task within a specific cgroup controller.
> > > - bpf_cgroup_acquire_from_id_within_controller
> > >   Acquires the cgroup from a cgroup ID within a specific cgroup controller.
> > > - bpf_cgroup_ancestor_id_from_task_within_controller
> > >   Retrieves the ancestor cgroup ID from a task within a specific cgroup
> > >   controller.
> > >
> > > The advantage of these new BPF kfuncs is their ability to abstract away the
> > > complexities of cgroup hierarchies, irrespective of whether they involve
> > > cgroup1 or cgroup2.
> >
> > I'm afraid this is more likely to bring the unnecessary complexities of
> > cgroup1 into cgroup2.
> 
> I concur with the idea that we should avoid introducing the
> complexities of cgroup1 into cgroup2. Which specific change do you
> believe might introduce these complexities into cgroup2? Is it the
> modification within task_under_cgroup_hierarchy() or
> cgroup_get_from_id()?

The helpers you are adding only makes sense for cgroup1. e.g.
bpf_cgroup_ancestor_id_from_task_within_controller() makes no sense in
cgroup2. The ancestor ids don't change according to controllers. The only
thing you would ask in cgroup2 is the level at which a given controller is
enabled at along with the straight-forward "where am I in the hierarchy?"
questions. I really don't want to expose interfaces which assume that the
hierarchies change according to the controller in question.

Also, as pointed out before, this doesn't cover cgroup1 named hierarchies
which leaves out a good potion of cgroup1 use cases.

> In fact, we have the option to utilize
> bpf_cgroup_ancestor_id_from_task_within_controller() as a substitute
> for bpf_task_under_cgroup(), which allows us to sidestep the need for
> changes within task_under_cgroup_hierarchy() altogether.

I don't think this is the direction we should take. If you really want,
please tie the interface directly to the hierarchies. Don't hitch hierarchy
identificdation on the controllers. e.g. Introduce cgroup1 only interface
which takes both hierarchy ID and cgroup ID to operate on them.

Thanks.

Yafang Shao Sept. 26, 2023, 3:01 a.m. UTC | #5

On Tue, Sep 26, 2023 at 2:43 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Sun, Sep 24, 2023 at 02:32:14PM +0800, Yafang Shao wrote:
> > On Sat, Sep 23, 2023 at 12:52 AM Tejun Heo <tj@kernel.org> wrote:
> > >
> > > Hello,
> > >
> > > On Fri, Sep 22, 2023 at 11:28:38AM +0000, Yafang Shao wrote:
> > > > - bpf_cgroup_id_from_task_within_controller
> > > >   Retrieves the cgroup ID from a task within a specific cgroup controller.
> > > > - bpf_cgroup_acquire_from_id_within_controller
> > > >   Acquires the cgroup from a cgroup ID within a specific cgroup controller.
> > > > - bpf_cgroup_ancestor_id_from_task_within_controller
> > > >   Retrieves the ancestor cgroup ID from a task within a specific cgroup
> > > >   controller.
> > > >
> > > > The advantage of these new BPF kfuncs is their ability to abstract away the
> > > > complexities of cgroup hierarchies, irrespective of whether they involve
> > > > cgroup1 or cgroup2.
> > >
> > > I'm afraid this is more likely to bring the unnecessary complexities of
> > > cgroup1 into cgroup2.
> >
> > I concur with the idea that we should avoid introducing the
> > complexities of cgroup1 into cgroup2. Which specific change do you
> > believe might introduce these complexities into cgroup2? Is it the
> > modification within task_under_cgroup_hierarchy() or
> > cgroup_get_from_id()?
>
> The helpers you are adding only makes sense for cgroup1. e.g.
> bpf_cgroup_ancestor_id_from_task_within_controller() makes no sense in
> cgroup2. The ancestor ids don't change according to controllers. The only
> thing you would ask in cgroup2 is the level at which a given controller is
> enabled at along with the straight-forward "where am I in the hierarchy?"
> questions. I really don't want to expose interfaces which assume that the
> hierarchies change according to the controller in question.

Makes sense.

>
> Also, as pointed out before, this doesn't cover cgroup1 named hierarchies
> which leaves out a good potion of cgroup1 use cases.
>
> > In fact, we have the option to utilize
> > bpf_cgroup_ancestor_id_from_task_within_controller() as a substitute
> > for bpf_task_under_cgroup(), which allows us to sidestep the need for
> > changes within task_under_cgroup_hierarchy() altogether.
>
> I don't think this is the direction we should take. If you really want,
> please tie the interface directly to the hierarchies. Don't hitch hierarchy
> identificdation on the controllers. e.g. Introduce cgroup1 only interface
> which takes both hierarchy ID and cgroup ID to operate on them.

Thanks for your suggestion. I will think about it.
BTW, I can't find the hierarchy ID of systemd (/sys/fs/cgroup/systemd)
in /proc/cgroups. Is this intentional as part of the design, or might
it be possible that we overlooked it?
In the userspace, where can we find the hierarchy ID of a named hierarchy?

Yafang Shao Sept. 26, 2023, 3:08 a.m. UTC | #6

On Tue, Sep 26, 2023 at 2:22 AM Kui-Feng Lee <sinquersw@gmail.com> wrote:
>
>
>
> On 9/22/23 04:28, Yafang Shao wrote:
> > Currently, BPF is primarily confined to cgroup2, with the exception of
> > cgroup_iter, which supports cgroup1 fds. Unfortunately, this limitation
> > prevents us from harnessing the full potential of BPF within cgroup1
> > environments.
> >
> > In our endeavor to seamlessly integrate BPF within our Kubernetes
> > environment, which relies on cgroup1, we have been exploring the
> > possibility of transitioning to cgroup2. While this transition is
> > forward-looking, it poses challenges due to the necessity for numerous
> > applications to adapt.
> >
> > While we acknowledge that cgroup2 represents the future, we also recognize
> > that such transitions demand time and effort. As a result, we are
> > considering an alternative approach. Instead of migrating to cgroup2, we
> > are contemplating modifications to the BPF kernel code to ensure
> > compatibility with cgroup1. These adjustments appear to be relatively
> > minor, making this option more feasible.
>
> Do you mean giving up moving to cgroup2? Or, is it just a tentative
> solution?

Our transition to cgroup2 won't happen in the near future. It's a
long-term job.

Tejun Heo Sept. 26, 2023, 6:25 p.m. UTC | #7

Hello,

On Tue, Sep 26, 2023 at 11:01:08AM +0800, Yafang Shao wrote:
> Thanks for your suggestion. I will think about it.
> BTW, I can't find the hierarchy ID of systemd (/sys/fs/cgroup/systemd)
> in /proc/cgroups. Is this intentional as part of the design, or might
> it be possible that we overlooked it?
> In the userspace, where can we find the hierarchy ID of a named hierarchy?

Yeah, /proc/cgroups only prints the hierarchies which have controllers
attached to them. The file is pretty sad in general. However,
/proc/PID/cgroup prints all existing hierarchies along with their IDs and
identifiers (controllers or names). Hopefully, that should be enough?

Thanks.

Yafang Shao Sept. 27, 2023, 2:27 a.m. UTC | #8

On Wed, Sep 27, 2023 at 2:25 AM Tejun Heo <tj@kernel.org> wrote:
>
> Hello,
>
> On Tue, Sep 26, 2023 at 11:01:08AM +0800, Yafang Shao wrote:
> > Thanks for your suggestion. I will think about it.
> > BTW, I can't find the hierarchy ID of systemd (/sys/fs/cgroup/systemd)
> > in /proc/cgroups. Is this intentional as part of the design, or might
> > it be possible that we overlooked it?
> > In the userspace, where can we find the hierarchy ID of a named hierarchy?
>
> Yeah, /proc/cgroups only prints the hierarchies which have controllers
> attached to them. The file is pretty sad in general. However,
> /proc/PID/cgroup prints all existing hierarchies along with their IDs and
> identifiers (controllers or names). Hopefully, that should be enough?

Right, /proc/self/cgroup can show the hierarchies. Thanks for your explanation.

[RFC,bpf-next,0/8] bpf, cgroup: Add bpf support for cgroup controller

Message

Comments