[RFC,bpf-next,0/4] bpf: Introduce cgroup_task iter

Message ID 20230716121046.17110-1-laoar.shao@gmail.com (mailing list archive)

Message

Yafang Shao July 16, 2023, 12:10 p.m. UTC
This series introduces the cgroup_task iter, which allows for efficient
iteration of tasks within a specific cgroup. For example, we can efficiently
get the nr_{running,blocked} of a container with this new feature.
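
For comparison, here is a rough userspace sketch of how a host-side
collector might gather the same counts without an iterator, by walking the
cgroup's cgroup.threads file and reading each thread's state from
/proc/<tid>/stat. It assumes cgroup v2 and the usual stat layout; the names
parse_state(), tally_state(), count_cgroup_tasks() and struct task_counts
are hypothetical helpers for illustration and are not part of this series.
Treating state 'D' as "blocked" is also an assumption.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical container for the two counters of interest. */
struct task_counts {
	unsigned long running;
	unsigned long blocked;
};

/*
 * Extract the state character from one line of /proc/<tid>/stat.
 * The comm field is wrapped in parentheses and may itself contain
 * spaces or ')', so look for the LAST ')' and read the field after it.
 * Returns 0 if the line does not look like a stat line.
 */
static char parse_state(const char *stat_line)
{
	const char *p = strrchr(stat_line, ')');

	if (!p || p[1] != ' ' || p[2] == '\0')
		return 0;
	return p[2];
}

/* Count 'R' as running and 'D' (uninterruptible sleep) as blocked. */
static void tally_state(struct task_counts *c, char state)
{
	if (state == 'R')
		c->running++;
	else if (state == 'D')
		c->blocked++;
}

/*
 * Walk <cgroup_dir>/cgroup.threads and tally the state of every thread.
 * Returns 0 on success, -1 if the cgroup file cannot be opened.
 */
static int count_cgroup_tasks(const char *cgroup_dir, struct task_counts *c)
{
	char path[4096], line[1024];
	FILE *threads, *stat;
	int tid;

	snprintf(path, sizeof(path), "%s/cgroup.threads", cgroup_dir);
	threads = fopen(path, "r");
	if (!threads)
		return -1;

	while (fscanf(threads, "%d", &tid) == 1) {
		snprintf(path, sizeof(path), "/proc/%d/stat", tid);
		stat = fopen(path, "r");
		if (!stat)
			continue;	/* thread exited in the meantime */
		if (fgets(line, sizeof(line), stat))
			tally_state(c, parse_state(line));
		fclose(stat);
	}
	fclose(threads);
	return 0;
}
```

A caller would zero a struct task_counts and pass the container's cgroup
directory to count_cgroup_tasks(). This approach costs one open/read/close
per thread and races with task exit, which is the kind of overhead an
in-kernel iterator is meant to avoid.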

The cgroup_task iter is intended as an alternative to task_iter in
container environments, where task_iter has the following limitations:

- Firstly, task_iter only supports the 'current' pidns.
  However, since our data collector operates on the host, we may need to
  collect information from multiple containers simultaneously. Using
  task_iter would require us to fork the collector for each container,
  which is not ideal.

- Additionally, task_iter is unable to collect task information from
  containers running in the host pidns.
  In our container environment, we have containers running in the host
  pidns, and we would like to collect task information from them as well.

- Lastly, task_iter does not support multi-container pods.
  In a Kubernetes environment, a single pod may contain multiple
  containers, all sharing the same pidns. However, we are only interested
  in iterating tasks within the main container, which is not possible with
  task_iter.

To address the first issue, we could potentially extend task_iter to
support specifying a pidns other than the current one. However, for the
other two issues, extending task_iter would not provide a solution.
Therefore, we believe it is preferable to introduce the cgroup_task iter to
handle these scenarios effectively.

Patch #1: Preparation
Patch #2: Add cgroup_task iter
Patch #3: Add support for cgroup_task iter in bpftool
Patch #4: Selftests for cgroup_task iter

Yafang Shao (4):
  bpf: Add __bpf_iter_attach_cgroup()
  bpf: Add cgroup_task iter
  bpftool: Add support for cgroup_task
  selftests/bpf: Add selftest for cgroup_task iter

 include/linux/btf_ids.h                       |  14 ++
 kernel/bpf/cgroup_iter.c                      | 181 ++++++++++++++--
 tools/bpf/bpftool/link.c                      |   3 +-
 .../bpf/prog_tests/cgroup_task_iter.c         | 197 ++++++++++++++++++
 .../selftests/bpf/progs/cgroup_task_iter.c    |  39 ++++
 5 files changed, 419 insertions(+), 15 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/cgroup_task_iter.c
 create mode 100644 tools/testing/selftests/bpf/progs/cgroup_task_iter.c

Comments

Yafang Shao July 27, 2023, 2:29 p.m. UTC | #1

Just a gentle reminder.

Is anyone interested in this idea?