Message ID | 20241010175552.1895980-1-yonghong.song@linux.dev (mailing list archive) |
---|---|
Series | bpf: Support private stack for bpf progs |
Hello,

On Thu, Oct 10, 2024 at 10:55:52AM -0700, Yonghong Song wrote:
> The main motivation for private stack comes from nested scheduler in
> sched-ext from Tejun. The basic idea is that
>  - each cgroup will have its own associated bpf program,
>  - bpf program with parent cgroup will call bpf programs
>    in immediate child cgroups.
>
> Let us say we have the following cgroup hierarchy:
>   root_cg (prog0):
>     cg1 (prog1):
>       cg11 (prog11):
>         cg111 (prog111)
>         cg112 (prog112)
>       cg12 (prog12):
>         cg121 (prog121)
>         cg122 (prog122)
>     cg2 (prog2):
>       cg21 (prog21)
>       cg22 (prog22)
>       cg23 (prog23)

Thank you so much for working on this. I have some basic and a bit
tangential questions around how stacks are allocated. So, for sched_ext,
each scheduler would be represented by struct_ops and I think the interface
to load them would be attaching a struct_ops to a cgroup.

- I suppose each operation in a struct_ops would count as a separate program
  and would thus allocate 512 * nr_cpus stacks, right?

- If the same scheduler implementation is attached to more than one cgroup,
  would each instance be treated as a separate set of programs or would they
  share the stack?

- Most struct_ops operations won't need to be nested and thus wouldn't need
  to use a private stack. Would it be possible to indicate which one should
  use a private stack?

Thanks.
On Tue, Oct 15, 2024 at 2:28 PM Tejun Heo <tj@kernel.org> wrote:
>
> Thank you so much for working on this. I have some basic and a bit
> tangential questions around how stacks are allocated. So, for sched_ext,
> each scheduler would be represented by struct_ops and I think the interface
> to load them would be attaching a struct_ops to a cgroup.
>
> - I suppose each operation in a struct_ops would count as a separate program
>   and would thus allocate 512 * nr_cpus stacks, right?

It's one stack per program. Its size will be ~512 * nr_cpus *
max_allowed_recursion. We hope max_allowed_recursion == 4 or something
small.

> - If the same scheduler implementation is attached to more than one cgroup,
>   would each instance be treated as a separate set of programs or would they
>   share the stack?

I think there is only one sched_ext struct_ops with its set of progs.
They are global and not "attached to a cgroup".

> - Most struct_ops operations won't need to be nested and thus wouldn't need
>   to use a private stack. Would it be possible to indicate which one should
>   use a private stack?

See my other reply. One of the bpf_verifier_ops callbacks would need to
indicate back to the trampoline which callback is nested with limited
recursion.