[bpf-next,0/2] bpf: range_tree for bpf arena

Message ID: 20241108025616.17625-1-alexei.starovoitov@gmail.com

Alexei Starovoitov Nov. 8, 2024, 2:56 a.m. UTC
From: Alexei Starovoitov <ast@kernel.org>

Introduce range_tree (an interval tree plus an rbtree) to track
unallocated ranges in bpf arena, and replace maple_tree with it.
This is a step towards making bpf_arena_alloc/free_pages non-sleepable.
The previous approach of reusing drm_mm to replace maple_tree reached
a dead end: sizeof(struct drm_mm_node) = 168 and
sizeof(struct maple_node) = 256, while
sizeof(struct range_node) = 64 as introduced in this patch.
Not only is it smaller, but the algorithm splits and merges
adjacent ranges. Raw performance is not critical.
The main objective of range_tree is to work in contexts
where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.

Alexei Starovoitov (2):
  bpf: Introduce range_tree data structure and use it in bpf arena
  selftests/bpf: Add a test for arena range tree algorithm

 kernel/bpf/Makefile                           |   2 +-
 kernel/bpf/arena.c                            |  34 ++-
 kernel/bpf/range_tree.c                       | 262 ++++++++++++++++++
 kernel/bpf/range_tree.h                       |  21 ++
 .../bpf/progs/verifier_arena_large.c          | 110 +++++++-
 5 files changed, 412 insertions(+), 17 deletions(-)
 create mode 100644 kernel/bpf/range_tree.c
 create mode 100644 kernel/bpf/range_tree.h
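
The split/merge behavior described in the cover letter can be sketched in
user space. The sketch below is only a model of the semantics: it mirrors
the rn_start/rn_last fields and the set/clear/is-set operations, but it
uses a sorted singly linked list and malloc() instead of the kernel's
rbtree, interval tree, and bpf_mem_alloc, and the function names here are
illustrative rather than the patch's exact API.

```c
/* Toy user-space model of the range_tree semantics from this series:
 * a set of disjoint [start, last] ranges of free pages. The kernel
 * version uses an rbtree + interval tree backed by bpf_mem_alloc; this
 * sketch uses a sorted singly linked list purely to illustrate how
 * ranges split and merge.
 */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

struct range_node {
	uint32_t rn_start;	/* first page of the free range */
	uint32_t rn_last;	/* last page of the free range (inclusive) */
	struct range_node *next;
};

static struct range_node *head;

/* Mark [start, last] free, merging with overlapping or adjacent ranges. */
static void range_set(uint32_t start, uint32_t last)
{
	struct range_node **pp = &head, *n;

	while ((n = *pp)) {
		if (n->rn_last + 1 < start) {	/* strictly before, not adjacent */
			pp = &n->next;
			continue;
		}
		if (n->rn_start > last + 1)	/* strictly after, not adjacent */
			break;
		/* overlaps or touches [start, last]: absorb it and unlink */
		if (n->rn_start < start)
			start = n->rn_start;
		if (n->rn_last > last)
			last = n->rn_last;
		*pp = n->next;
		free(n);
	}
	n = malloc(sizeof(*n));
	n->rn_start = start;
	n->rn_last = last;
	n->next = *pp;
	*pp = n;
}

/* Mark [start, last] allocated, splitting a covering free range if needed. */
static void range_clear(uint32_t start, uint32_t last)
{
	struct range_node **pp = &head, *n;

	while ((n = *pp)) {
		if (n->rn_last < start || n->rn_start > last) {
			pp = &n->next;
			continue;
		}
		if (n->rn_start < start && n->rn_last > last) {
			/* clearing the middle: split into two nodes */
			struct range_node *tail = malloc(sizeof(*tail));

			tail->rn_start = last + 1;
			tail->rn_last = n->rn_last;
			tail->next = n->next;
			n->rn_last = start - 1;
			n->next = tail;
			return;
		}
		if (n->rn_start >= start && n->rn_last <= last) {
			*pp = n->next;	/* fully covered: drop the node */
			free(n);
			continue;
		}
		if (n->rn_start < start)
			n->rn_last = start - 1;		/* trim the tail */
		else
			n->rn_start = last + 1;		/* trim the head */
		pp = &n->next;
	}
}

/* Is every page in [start, last] free? */
static bool is_range_set(uint32_t start, uint32_t last)
{
	for (struct range_node *n = head; n; n = n->next)
		if (n->rn_start <= start && n->rn_last >= last)
			return true;
	return false;
}
```

Clearing the middle of a free range splits it into two nodes; freeing the
pages back merges the neighbors into a single node again, which is what
keeps the structure at one node per contiguous free region.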

Comments

Andrii Nakryiko Nov. 13, 2024, 9:59 p.m. UTC | #1
On Thu, Nov 7, 2024 at 6:56 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> From: Alexei Starovoitov <ast@kernel.org>
>
> Introduce range_tree (an interval tree plus an rbtree) to track
> unallocated ranges in bpf arena, and replace maple_tree with it.
> This is a step towards making bpf_arena_alloc/free_pages non-sleepable.
> The previous approach of reusing drm_mm to replace maple_tree reached
> a dead end: sizeof(struct drm_mm_node) = 168 and
> sizeof(struct maple_node) = 256, while
> sizeof(struct range_node) = 64 as introduced in this patch.
> Not only is it smaller, but the algorithm splits and merges
> adjacent ranges. Raw performance is not critical.
> The main objective of range_tree is to work in contexts
> where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.
>
> Alexei Starovoitov (2):
>   bpf: Introduce range_tree data structure and use it in bpf arena
>   selftests/bpf: Add a test for arena range tree algorithm
>
>  kernel/bpf/Makefile                           |   2 +-
>  kernel/bpf/arena.c                            |  34 ++-
>  kernel/bpf/range_tree.c                       | 262 ++++++++++++++++++
>  kernel/bpf/range_tree.h                       |  21 ++
>  .../bpf/progs/verifier_arena_large.c          | 110 +++++++-
>  5 files changed, 412 insertions(+), 17 deletions(-)
>  create mode 100644 kernel/bpf/range_tree.c
>  create mode 100644 kernel/bpf/range_tree.h
>
> --
> 2.43.5
>

I skimmed through just to familiarize myself; superficially, the range
addition logic seems correct.

I'll just bikeshed a bit; take it for what it's worth. I found some
of the naming choices a bit odd.

rn_start and rn_last just don't match in my head. If it's "start",
then it should be "end" (or "finish", but that's weird for this case).
If it's "last", then it should be paired with "first". "start"/"end"
sounds best in my head, fwiw.

As for the API, is_range_tree_set() caught my eye as well. I'd expect
to see a consistent "range_tree_" prefix for the internal API of this
data structure. So "range_tree_is_set()" was what I expected.

But these are all minor; feel free to follow up if you agree.
patchwork-bot+netdevbpf@kernel.org Nov. 13, 2024, 10:10 p.m. UTC | #2
Hello:

This series was applied to bpf/bpf-next.git (master)
by Andrii Nakryiko <andrii@kernel.org>:

On Thu,  7 Nov 2024 18:56:14 -0800 you wrote:
> From: Alexei Starovoitov <ast@kernel.org>
> 
> Introduce range_tree (an interval tree plus an rbtree) to track
> unallocated ranges in bpf arena, and replace maple_tree with it.
> This is a step towards making bpf_arena_alloc/free_pages non-sleepable.
> The previous approach of reusing drm_mm to replace maple_tree reached
> a dead end: sizeof(struct drm_mm_node) = 168 and
> sizeof(struct maple_node) = 256, while
> sizeof(struct range_node) = 64 as introduced in this patch.
> Not only is it smaller, but the algorithm splits and merges
> adjacent ranges. Raw performance is not critical.
> The main objective of range_tree is to work in contexts
> where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.
> 
> [...]

Here is the summary with links:
  - [bpf-next,1/2] bpf: Introduce range_tree data structure and use it in bpf arena
    https://git.kernel.org/bpf/bpf-next/c/b795379757eb
  - [bpf-next,2/2] selftests/bpf: Add a test for arena range tree algorithm
    https://git.kernel.org/bpf/bpf-next/c/e58358afa84e

You are awesome, thank you!
Alexei Starovoitov Nov. 14, 2024, 12:48 a.m. UTC | #3
On Wed, Nov 13, 2024 at 1:59 PM Andrii Nakryiko
<andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Nov 7, 2024 at 6:56 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > From: Alexei Starovoitov <ast@kernel.org>
> >
> > Introduce range_tree (an interval tree plus an rbtree) to track
> > unallocated ranges in bpf arena, and replace maple_tree with it.
> > This is a step towards making bpf_arena_alloc/free_pages non-sleepable.
> > The previous approach of reusing drm_mm to replace maple_tree reached
> > a dead end: sizeof(struct drm_mm_node) = 168 and
> > sizeof(struct maple_node) = 256, while
> > sizeof(struct range_node) = 64 as introduced in this patch.
> > Not only is it smaller, but the algorithm splits and merges
> > adjacent ranges. Raw performance is not critical.
> > The main objective of range_tree is to work in contexts
> > where kmalloc/kfree are not safe. It achieves that via bpf_mem_alloc.
> >
> > Alexei Starovoitov (2):
> >   bpf: Introduce range_tree data structure and use it in bpf arena
> >   selftests/bpf: Add a test for arena range tree algorithm
> >
> >  kernel/bpf/Makefile                           |   2 +-
> >  kernel/bpf/arena.c                            |  34 ++-
> >  kernel/bpf/range_tree.c                       | 262 ++++++++++++++++++
> >  kernel/bpf/range_tree.h                       |  21 ++
> >  .../bpf/progs/verifier_arena_large.c          | 110 +++++++-
> >  5 files changed, 412 insertions(+), 17 deletions(-)
> >  create mode 100644 kernel/bpf/range_tree.c
> >  create mode 100644 kernel/bpf/range_tree.h
> >
> > --
> > 2.43.5
> >
>
> I skimmed through just to familiarize myself; superficially, the range
> addition logic seems correct.
>
> I'll just bikeshed a bit; take it for what it's worth. I found some
> of the naming choices a bit odd.
>
> rn_start and rn_last just don't match in my head. If it's "start",
> then it should be "end" (or "finish", but that's weird for this case).
> If it's "last", then it should be paired with "first". "start"/"end"
> sounds best in my head, fwiw.

Agree. It bothered me a bit too, but I kept it as-is to be
consistent with xbitmap, so I prefer to keep it this way.

>
> As for the API, is_range_tree_set() caught my eye as well. I'd expect
> to see a consistent "range_tree_" prefix for the internal API of this
> data structure. So "range_tree_is_set()" was what I expected.

This is what I tried first, but looking at how it can be used,
the "_is_" part in the middle is too easy to misread:

if (!range_tree_is_set(rt, pgoff, page_cnt))
   range_tree_set(rt, pgoff, page_cnt);   // not so bad here

if (!range_tree_is_set(rt, pgoff, page_cnt))
   // is the call above "_set" or "_is_set"?
   range_tree_clear(rt, pgoff, page_cnt);


Hence I moved "is_" to the beginning to make it more visually distinct:

if (!is_range_tree_set(rt, pgoff, page_cnt))
   range_tree_clear(rt, pgoff, page_cnt);

Not sure whether a consistent "range_tree_" prefix is a better trade-off.
No strong opinion.