Message ID: 20241108025616.17625-1-alexei.starovoitov@gmail.com
Series: bpf: range_tree for bpf arena
On Thu, Nov 7, 2024 at 6:56 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Introduce range_tree (interval tree plus rbtree) to track
> unallocated ranges in bpf arena and replace maple_tree with it.
> This is a step towards making bpf_arena|free_alloc_pages non-sleepable.
> The previous approach of reusing drm_mm to replace maple_tree reached a
> dead end, since sizeof(struct drm_mm_node) = 168 and
> sizeof(struct maple_node) = 256, while
> sizeof(struct range_node) = 64 as introduced in this patch.
> Not only is it smaller, but the algorithm splits and merges
> adjacent ranges. Ultimate performance doesn't matter.
> The main objective of range_tree is to work in contexts
> where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.
>
> Alexei Starovoitov (2):
>   bpf: Introduce range_tree data structure and use it in bpf arena
>   selftests/bpf: Add a test for arena range tree algorithm
>
>  kernel/bpf/Makefile                           |   2 +-
>  kernel/bpf/arena.c                            |  34 ++-
>  kernel/bpf/range_tree.c                       | 262 ++++++++++++++++++
>  kernel/bpf/range_tree.h                       |  21 ++
>  .../bpf/progs/verifier_arena_large.c          | 110 +++++++-
>  5 files changed, 412 insertions(+), 17 deletions(-)
>  create mode 100644 kernel/bpf/range_tree.c
>  create mode 100644 kernel/bpf/range_tree.h
>
> --
> 2.43.5
>

I skimmed through just to familiarize myself; superficially the range
addition logic seems correct.

I'll just bikeshed a bit, take it for what it's worth. I found some of
the naming choices a bit weird.

rn_start and rn_last just don't match in my head. If it's "start",
then it should be "end" (or "finish", but that's weird for this case).
If it's "last", then it should be paired with "first". "start"/"end"
sounds best in my head, fwiw.

As for the API, is_range_tree_set() caught my eye as well. I'd expect
a consistent "range_tree_" prefix for the internal API of this data
structure, so "range_tree_is_set()" was what I expected.

But these are all minor; feel free to follow up if you agree.
Hello:

This series was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Thu, 7 Nov 2024 18:56:14 -0800 you wrote:
> From: Alexei Starovoitov <ast@kernel.org>
>
> Introduce range_tree (interval tree plus rbtree) to track
> unallocated ranges in bpf arena and replace maple_tree with it.
> This is a step towards making bpf_arena|free_alloc_pages non-sleepable.
> The previous approach of reusing drm_mm to replace maple_tree reached a
> dead end, since sizeof(struct drm_mm_node) = 168 and
> sizeof(struct maple_node) = 256, while
> sizeof(struct range_node) = 64 as introduced in this patch.
> Not only is it smaller, but the algorithm splits and merges
> adjacent ranges. Ultimate performance doesn't matter.
> The main objective of range_tree is to work in contexts
> where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.
>
> [...]

Here is the summary with links:
  - [bpf-next,1/2] bpf: Introduce range_tree data structure and use it in bpf arena
    https://git.kernel.org/bpf/bpf-next/c/b795379757eb
  - [bpf-next,2/2] selftests/bpf: Add a test for arena range tree algorithm
    https://git.kernel.org/bpf/bpf-next/c/e58358afa84e

You are awesome, thank you!
On Wed, Nov 13, 2024 at 1:59 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 6:56 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Introduce range_tree (interval tree plus rbtree) to track
> > unallocated ranges in bpf arena and replace maple_tree with it.
> > This is a step towards making bpf_arena|free_alloc_pages non-sleepable.
> > The previous approach of reusing drm_mm to replace maple_tree reached a
> > dead end, since sizeof(struct drm_mm_node) = 168 and
> > sizeof(struct maple_node) = 256, while
> > sizeof(struct range_node) = 64 as introduced in this patch.
> > Not only is it smaller, but the algorithm splits and merges
> > adjacent ranges. Ultimate performance doesn't matter.
> > The main objective of range_tree is to work in contexts
> > where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.
> >
> > Alexei Starovoitov (2):
> >   bpf: Introduce range_tree data structure and use it in bpf arena
> >   selftests/bpf: Add a test for arena range tree algorithm
> >
> >  kernel/bpf/Makefile                           |   2 +-
> >  kernel/bpf/arena.c                            |  34 ++-
> >  kernel/bpf/range_tree.c                       | 262 ++++++++++++++++++
> >  kernel/bpf/range_tree.h                       |  21 ++
> >  .../bpf/progs/verifier_arena_large.c          | 110 +++++++-
> >  5 files changed, 412 insertions(+), 17 deletions(-)
> >  create mode 100644 kernel/bpf/range_tree.c
> >  create mode 100644 kernel/bpf/range_tree.h
> >
> > --
> > 2.43.5
> >
>
> I skimmed through just to familiarize myself; superficially the range
> addition logic seems correct.
>
> I'll just bikeshed a bit, take it for what it's worth. I found some of
> the naming choices a bit weird.
>
> rn_start and rn_last just don't match in my head. If it's "start",
> then it should be "end" (or "finish", but that's weird for this case).
> If it's "last", then it should be paired with "first". "start"/"end"
> sounds best in my head, fwiw.

Agreed. It bothered me a bit too, but I kept it as-is to be consistent
with xbitmap, so I prefer to keep it this way.

> As for the API, is_range_tree_set() caught my eye as well. I'd expect
> a consistent "range_tree_" prefix for the internal API of this data
> structure, so "range_tree_is_set()" was what I expected.

This is what I tried first, but looking at how it can be used, the
"_is_" part in the middle is too easy to misread:

  if (!range_tree_is_set(rt, pgoff, page_cnt))
          range_tree_set(rt, pgoff, page_cnt);   /* not so bad here */

  if (!range_tree_is_set(rt, pgoff, page_cnt))   /* is the above "_set" or "_is_set"? */
          range_tree_clear(rt, pgoff, page_cnt);

Hence I moved "is_" to the beginning to make it more visually
different:

  if (!is_range_tree_set(rt, pgoff, page_cnt))
          range_tree_clear(rt, pgoff, page_cnt);

Not sure whether the consistent "range_tree_" prefix is a better
trade-off. No strong opinion.
From: Alexei Starovoitov <ast@kernel.org>

Introduce range_tree (interval tree plus rbtree) to track
unallocated ranges in bpf arena and replace maple_tree with it.
This is a step towards making bpf_arena|free_alloc_pages non-sleepable.
The previous approach of reusing drm_mm to replace maple_tree reached a
dead end, since sizeof(struct drm_mm_node) = 168 and
sizeof(struct maple_node) = 256, while
sizeof(struct range_node) = 64 as introduced in this patch.
Not only is it smaller, but the algorithm splits and merges
adjacent ranges. Ultimate performance doesn't matter.
The main objective of range_tree is to work in contexts
where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.

Alexei Starovoitov (2):
  bpf: Introduce range_tree data structure and use it in bpf arena
  selftests/bpf: Add a test for arena range tree algorithm

 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/arena.c                            |  34 ++-
 kernel/bpf/range_tree.c                       | 262 ++++++++++++++++++
 kernel/bpf/range_tree.h                       |  21 ++
 .../bpf/progs/verifier_arena_large.c          | 110 +++++++-
 5 files changed, 412 insertions(+), 17 deletions(-)
 create mode 100644 kernel/bpf/range_tree.c
 create mode 100644 kernel/bpf/range_tree.h