Message ID | 20220824233117.1312810-1-haoluo@google.com (mailing list archive) |
---|---|
Series | bpf: rstat: cgroup hierarchical |
On Wed, Aug 24, 2022 at 4:31 PM Hao Luo <haoluo@google.com> wrote:
>
> This patch series allows for using bpf to collect hierarchical cgroup
> stats efficiently by integrating with the rstat framework. The rstat
> framework provides an efficient way to collect cgroup stats percpu and
> propagate them through the cgroup hierarchy.
>
> The stats are exposed to userspace in textual form by reading files in
> bpffs, similar to cgroupfs stats by using a cgroup_iter program.
> cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
> - walking a cgroup's descendants in pre-order.
> - walking a cgroup's descendants in post-order.
> - walking a cgroup's ancestors.
> - process only a single object.
>
> When attaching cgroup_iter, one needs to set a cgroup to the iter_link
> created from attaching. This cgroup can be passed either as a file
> descriptor or a cgroup id. That cgroup serves as the starting point of
> the walk.
>
> One can also terminate the walk early by returning 1 from the iter
> program.
>
> Note that because walking cgroup hierarchy holds cgroup_mutex, the iter
> program is called with cgroup_mutex held.
>
> ** Background on rstat for stats collection **
> (I am using a subscriber analogy that is not commonly used)
>
> The rstat framework maintains a tree of cgroups that have updates and
> which cpus have updates. A subscriber to the rstat framework maintains
> their own stats. The framework is used to tell the subscriber when
> and what to flush, for the most efficient stats propagation. The
> workflow is as follows:
>
> - When a subscriber updates a cgroup on a cpu, it informs the rstat
>   framework by calling cgroup_rstat_updated(cgrp, cpu).
>
> - When a subscriber wants to read some stats for a cgroup, it asks
>   the rstat framework to initiate a stats flush (propagation) by calling
>   cgroup_rstat_flush(cgrp).
>
> - When the rstat framework initiates a flush, it makes callbacks to
>   subscribers to aggregate stats on cpus that have updates, and
>   propagate updates to their parent.
>
> Currently, the main subscribers to the rstat framework are cgroup
> subsystems (e.g. memory, block). This patch series allows bpf programs to
> become subscribers as well.
>
> Patches in this series are organized as follows:
> * Patches 1-2 introduce cgroup_iter prog, and a selftest.
> * Patches 3-5 allow bpf programs to integrate with rstat by adding the
>   necessary hook points and kfunc. A comprehensive selftest that
>   demonstrates the entire workflow for using bpf and rstat to
>   efficiently collect and output cgroup stats is added.
>
> ---
> Changelog:
> v8 -> v9:
> - Make UNSPEC (an invalid option) the default order for cgroup_iter.
> - Use enum for specifying cgroup_iter order, instead of u32.
> - Add BPF_ITER_RESCHED to cgroup_iter.
> - Add cgroup_hierarchical_stats to s390x denylist.

What is 'RESEND' for?
It seems to confuse patchwork and BPF CI.

The v9 series made it to patchwork...

Please just bump the version to v10 next time.
Don't add things to subject, since automation cannot recognize
that yet.
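
The three-step rstat workflow quoted above (mark a cgroup as updated on a cpu, request a flush, receive callbacks that aggregate and propagate to the parent) can be illustrated with a small userspace toy model. This is only a sketch under stated assumptions and is not kernel code: `Cgroup`, `ToyRstat`, `flush_counter` and `charge` are hypothetical names invented for the illustration, and the callback arguments `(cgrp, parent, cpu)` are this sketch's choice. The only names taken from the cover letter are `cgroup_rstat_updated` and `cgroup_rstat_flush`, which the two `ToyRstat` methods loosely imitate.

```python
from collections import defaultdict


class Cgroup:
    """Hypothetical stand-in for a cgroup plus one subscriber's counters."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)
        self.percpu = defaultdict(int)  # per-cpu deltas, written on the hot path
        self.pending = 0                # deltas pushed up by children, not yet folded in
        self.total = 0                  # flushed hierarchical total (self + descendants)


def post_order(cgrp):
    """Yield children before parents so deltas flow upward in one pass."""
    for child in cgrp.children:
        yield from post_order(child)
    yield cgrp


class ToyRstat:
    """Toy framework: remembers which (cgroup, cpu) pairs have updates."""

    def __init__(self, flush_cb):
        self.flush_cb = flush_cb
        self.updated = defaultdict(set)  # cpu -> cgroups with unflushed updates

    def updated_on(self, cgrp, cpu):
        # Loosely imitates cgroup_rstat_updated(cgrp, cpu): mark the cgroup
        # and its ancestors, stopping at the first one already marked.
        node = cgrp
        while node is not None and node not in self.updated[cpu]:
            self.updated[cpu].add(node)
            node = node.parent

    def flush(self, root):
        # Loosely imitates cgroup_rstat_flush(cgrp): only marked cgroups on
        # cpus that saw updates trigger a subscriber callback.
        for cpu, dirty in self.updated.items():
            for cgrp in post_order(root):
                if cgrp in dirty:
                    dirty.discard(cgrp)
                    self.flush_cb(cgrp, cgrp.parent, cpu)


def flush_counter(cgrp, parent, cpu):
    """Subscriber callback: aggregate this cpu's delta, then propagate up."""
    delta = cgrp.percpu.pop(cpu, 0) + cgrp.pending
    cgrp.pending = 0
    cgrp.total += delta
    if parent is not None:
        parent.pending += delta


rstat = ToyRstat(flush_counter)
root = Cgroup("root")
a = Cgroup("a", root)
b = Cgroup("b", a)
c = Cgroup("c", root)


def charge(cgrp, cpu, n):
    """Hot path: bump a per-cpu counter and tell the framework about it."""
    cgrp.percpu[cpu] += n
    rstat.updated_on(cgrp, cpu)


charge(b, 0, 5)
charge(b, 1, 2)
charge(c, 0, 10)
rstat.flush(root)
print(b.total, a.total, c.total, root.total)  # 7 7 10 17
```

The point of the design is that the hot path only touches per-cpu state and a cheap "has updates" mark, while the cost of walking the hierarchy is paid by readers at flush time, and only for cgroups that actually changed.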
On Wed, Aug 24, 2022 at 5:29 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 24, 2022 at 4:31 PM Hao Luo <haoluo@google.com> wrote:
> >
> > This patch series allows for using bpf to collect hierarchical cgroup
> > stats efficiently by integrating with the rstat framework.
> >
> > [...]
>
> What is 'RESEND' for?
> It seems to confuse patchwork and BPF CI.
>
> The v9 series made it to patchwork...
>
> Please just bump the version to v10 next time.
> Don't add things to subject, since automation cannot recognize
> that yet.

Sorry about that. I thought it was RESEND because no content has
changed. It was just adding an entry in s390 denylist.

Are we good now? Or do I need to send a v10?
On Wed, Aug 24, 2022 at 5:42 PM Hao Luo <haoluo@google.com> wrote:
>
> On Wed, Aug 24, 2022 at 5:29 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > [...]
> >
> > Please just bump the version to v10 next time.
> > Don't add things to subject, since automation cannot recognize
> > that yet.
>
> Sorry about that. I thought it was RESEND because no content has
> changed. It was just adding an entry in s390 denylist.
>
> Are we good now? Or do I need to send a v10?

No need. Assuming that 'RESEND' version will be green in CI.
On Wed, Aug 24, 2022 at 5:47 PM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Aug 24, 2022 at 5:42 PM Hao Luo <haoluo@google.com> wrote:
> >
> > [...]
> >
> > Are we good now? Or do I need to send a v10?
>
> No need. Assuming that 'RESEND' version will be green in CI.

Sounds good. I will monitor the CI. :)
Hello:

This series was applied to bpf/bpf-next.git (master)
by Alexei Starovoitov <ast@kernel.org>:

On Wed, 24 Aug 2022 16:31:12 -0700 you wrote:
> This patch series allows for using bpf to collect hierarchical cgroup
> stats efficiently by integrating with the rstat framework. The rstat
> framework provides an efficient way to collect cgroup stats percpu and
> propagate them through the cgroup hierarchy.
>
> The stats are exposed to userspace in textual form by reading files in
> bpffs, similar to cgroupfs stats by using a cgroup_iter program.
> cgroup_iter is a type of bpf_iter. It walks over cgroups in four modes:
> - walking a cgroup's descendants in pre-order.
> - walking a cgroup's descendants in post-order.
> - walking a cgroup's ancestors.
> - process only a single object.
>
> [...]

Here is the summary with links:
  - [RESEND,bpf-next,v9,1/5] bpf: Introduce cgroup iter
    https://git.kernel.org/bpf/bpf-next/c/d4ccaf58a847
  - [RESEND,bpf-next,v9,2/5] selftests/bpf: Test cgroup_iter.
    https://git.kernel.org/bpf/bpf-next/c/fe0dd9d4b740
  - [RESEND,bpf-next,v9,3/5] cgroup: bpf: enable bpf programs to integrate with rstat
    https://git.kernel.org/bpf/bpf-next/c/a319185be9f5
  - [RESEND,bpf-next,v9,4/5] selftests/bpf: extend cgroup helpers
    https://git.kernel.org/bpf/bpf-next/c/434992bb6037
  - [RESEND,bpf-next,v9,5/5] selftests/bpf: add a selftest for cgroup hierarchical stats collection
    https://git.kernel.org/bpf/bpf-next/c/88886309d2e8

You are awesome, thank you!
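
For readers who want a concrete picture of the four cgroup_iter walk modes and the "return 1 to stop early" rule described in the cover letter, the following is a minimal userspace sketch. It models the traversal orders only and is not the kernel implementation: `Node`, `Order`, `walk` and `run_iter` are hypothetical names, the enum members are illustrative labels for this sketch, and including the starting cgroup itself in the descendant and ancestor walks is an assumption of this model. Treating an unspecified order as an error mirrors the v9 changelog entry that makes UNSPEC an invalid default.

```python
from enum import Enum


class Order(Enum):
    # Illustrative labels for the four modes; UNSPEC is the invalid default.
    UNSPEC = 0
    SELF_ONLY = 1
    DESCENDANTS_PRE = 2
    DESCENDANTS_POST = 3
    ANCESTORS_UP = 4


class Node:
    """Hypothetical stand-in for a cgroup in a hierarchy."""

    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.children = []
        if parent is not None:
            parent.children.append(self)


def walk(start, order):
    """Yield nodes in the requested order, beginning at `start`."""
    if order is Order.SELF_ONLY:
        yield start
    elif order is Order.DESCENDANTS_PRE:
        yield start                      # parent first, then each subtree
        for child in start.children:
            yield from walk(child, order)
    elif order is Order.DESCENDANTS_POST:
        for child in start.children:     # each subtree first, then parent
            yield from walk(child, order)
        yield start
    elif order is Order.ANCESTORS_UP:
        node = start
        while node is not None:          # climb toward the root
            yield node
            node = node.parent
    else:
        raise ValueError("a walk order must be specified")


def run_iter(start, order, prog):
    """Call `prog` on each node; a return value of 1 ends the walk early."""
    visited = []
    for node in walk(start, order):
        visited.append(node.name)
        if prog(node) == 1:
            break
    return visited


root = Node("root")
a = Node("a", root)
b = Node("b", a)
c = Node("c", root)

print(run_iter(root, Order.DESCENDANTS_PRE, lambda n: 0))   # ['root', 'a', 'b', 'c']
print(run_iter(root, Order.DESCENDANTS_POST, lambda n: 0))  # ['b', 'a', 'c', 'root']
print(run_iter(b, Order.ANCESTORS_UP, lambda n: 0))         # ['b', 'a', 'root']
print(run_iter(a, Order.SELF_ONLY, lambda n: 0))            # ['a']
```

Post-order is the natural fit for printing hierarchical totals after a flush, since every child has been visited before its parent; pre-order suits output that should read top-down.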