Message ID | 20211215060102.3793196-1-song@kernel.org (mailing list archive) |
---|---|
Series | bpf_prog_pack allocator |
On Tue, Dec 14, 2021 at 10:01 PM Song Liu <song@kernel.org> wrote:
>
> Changes v1 => v2:
> 1. Use text_poke instead of writing through linear mapping. (Peter)
> 2. Avoid making changes to non-x86_64 code.
>
> Most BPF programs are small, but they consume a page each. For systems
> with busy traffic and many BPF programs, this could also add significant
> pressure to instruction TLB.
>
> This set tries to solve this problem with customized allocator that pack
> multiple programs into a huge page.
>
> Patches 1-5 prepare the work. Patch 6 contains key logic of the allocator.
> Patch 7 uses this allocator in x86_64 jit compiler.
>

There are test failures, please see [0]. But I was also wondering if
there could be an explicit selftest added to validate that all this
huge page machinery is actually activated and working as expected?

  [0] https://github.com/kernel-patches/bpf/runs/4530372387?check_suite_focus=true

> Song Liu (7):
>   x86/Kconfig: select HAVE_ARCH_HUGE_VMALLOC with HAVE_ARCH_HUGE_VMAP
>   bpf: use bytes instead of pages for bpf_jit_[charge|uncharge]_modmem
>   bpf: use size instead of pages in bpf_binary_header
>   bpf: add a pointer of bpf_binary_header to bpf_prog
>   x86/alternative: introduce text_poke_jit
>   bpf: introduce bpf_prog_pack allocator
>   bpf, x86_64: use bpf_prog_pack allocator
>
>  arch/x86/Kconfig                     |   1 +
>  arch/x86/include/asm/text-patching.h |   1 +
>  arch/x86/kernel/alternative.c        |  28 ++++
>  arch/x86/net/bpf_jit_comp.c          |  93 ++++++++++--
>  include/linux/bpf.h                  |   4 +-
>  include/linux/filter.h               |  23 ++-
>  kernel/bpf/core.c                    | 213 ++++++++++++++++++++++++---
>  kernel/bpf/trampoline.c              |   6 +-
>  8 files changed, 328 insertions(+), 41 deletions(-)
>
> --
> 2.30.2
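The cover letter quoted above describes packing many small programs into one huge page instead of giving each program its own page. The following is only a user-space sketch of that general idea, not the code from patch 6: it tracks a single 2 MiB region in fixed-size chunks with a bitmap and hands out byte offsets first-fit. The names `struct pack`, `pack_alloc`, `pack_free`, and the constants `PACK_SIZE` and `CHUNK_SIZE` are all hypothetical and chosen for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* All names and sizes below are illustrative assumptions, not kernel API. */
#define PACK_SIZE  (2u * 1024 * 1024)   /* assumed huge page size: 2 MiB */
#define CHUNK_SIZE 64u                  /* hypothetical allocation granule */
#define NCHUNKS    (PACK_SIZE / CHUNK_SIZE)

struct pack {
    uint8_t used[NCHUNKS / 8];          /* one bit per chunk, 1 = in use */
};

static int chunk_used(const struct pack *p, size_t i)
{
    return (p->used[i / 8] >> (i % 8)) & 1;
}

static void chunk_set(struct pack *p, size_t i, int v)
{
    if (v)
        p->used[i / 8] |= (uint8_t)(1u << (i % 8));
    else
        p->used[i / 8] &= (uint8_t)~(1u << (i % 8));
}

/* Number of chunks needed to hold `size` bytes (rounded up). */
static size_t chunks_for(size_t size)
{
    return (size + CHUNK_SIZE - 1) / CHUNK_SIZE;
}

/*
 * First-fit search for a run of free chunks large enough for `size` bytes.
 * Returns the byte offset inside the pack, or -1 if nothing fits.
 */
long pack_alloc(struct pack *p, size_t size)
{
    size_t need = chunks_for(size);
    size_t run = 0;

    if (need == 0 || need > NCHUNKS)
        return -1;

    for (size_t i = 0; i < NCHUNKS; i++) {
        run = chunk_used(p, i) ? 0 : run + 1;
        if (run == need) {
            size_t start = i + 1 - need;

            for (size_t j = start; j <= i; j++)
                chunk_set(p, j, 1);
            return (long)(start * CHUNK_SIZE);
        }
    }
    return -1;
}

/* Release a region previously returned by pack_alloc() for the same size. */
void pack_free(struct pack *p, long off, size_t size)
{
    size_t start = (size_t)off / CHUNK_SIZE;
    size_t need = chunks_for(size);

    assert(off >= 0 && (size_t)off % CHUNK_SIZE == 0);
    assert(start + need <= NCHUNKS);

    for (size_t j = start; j < start + need; j++)
        chunk_set(p, j, 0);
}
```

With a zeroed `struct pack`, a 100-byte request takes two chunks at offset 0 and a following 64-byte request lands at offset 128. For scale, a 2 MiB region covers the same address range as 512 pages of 4 KiB, which is the source of the TLB saving the cover letter is after.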
> On Dec 16, 2021, at 12:06 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Tue, Dec 14, 2021 at 10:01 PM Song Liu <song@kernel.org> wrote:
>>
>> [...]
>
> There are test failures, please see [0]. But I was also wondering if
> there could be an explicit selftest added to validate that all this
> huge page machinery is actually activated and working as expected?

We can enable some debug option that dumps the page table. Then from the
page table, we can confirm the programs are running on a huge page. This
only works on x86_64 though. WDYT?

Thanks,
Song

>
> [0] https://github.com/kernel-patches/bpf/runs/4530372387?check_suite_focus=true
>
> [...]
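To make the page-table-dump idea above concrete, here is a rough sketch of the check a test could run on one line of such a dump. It assumes a line layout like the x86 debugfs page-table dump (typically `/sys/kernel/debug/page_tables/kernel` on kernels built with the ptdump debugfs option): a `start-end` hex range followed by flag columns, with `PSE` marking a huge mapping. That layout, the helper name `range_is_huge`, and the sample lines in the usage note are assumptions for illustration; the address of the JITed program would have to come from elsewhere.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

/*
 * Hypothetical helper: inspect one line of a page-table dump.
 *
 * Returns  1 if the line's range covers `addr` and carries a PSE flag,
 *          0 if the range covers `addr` but has no PSE flag,
 *         -1 if the line cannot be parsed or does not cover `addr`.
 *
 * The "start-end ... PSE ..." layout is an assumption about the dump format.
 */
int range_is_huge(const char *line, unsigned long long addr)
{
    unsigned long long start, end;

    assert(line != NULL);

    /* %llx accepts an optional "0x" prefix, like strtoull() with base 16. */
    if (sscanf(line, "%llx-%llx", &start, &end) != 2)
        return -1;

    if (addr < start || addr >= end)
        return -1;

    /* Look for PSE as a separate, space-delimited flag column. */
    return strstr(line, " PSE ") != NULL;
}
```

A caller would read the dump line by line and stop at the first result that is not -1. For a made-up line such as `0xffffffffa0000000-0xffffffffa0200000 2M ro PSE x pmd` and an address inside that range, the helper returns 1; for a 4K `pte` line without `PSE` it returns 0.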
On Thu, Dec 16, 2021 at 5:53 PM Song Liu <songliubraving@fb.com> wrote:
>
> > On Dec 16, 2021, at 12:06 PM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > [...]
> >
> > There are test failures, please see [0]. But I was also wondering if
> > there could be an explicit selftest added to validate that all this
> > huge page machinery is actually activated and working as expected?
>
> We can enable some debug option that dumps the page table. Then from the
> page table, we can confirm the programs are running on a huge page. This
> only works on x86_64 though. WDYT?
>

I don't know what exactly is involved, so it's hard to say. Ideally
whatever we do doesn't complicate our CI setup. Can we use BPF tracing
magic to check this from inside the kernel somehow?

> Thanks,
> Song
>
> [...]
On Fri, Dec 17, 2021 at 8:42 AM Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Thu, Dec 16, 2021 at 5:53 PM Song Liu <songliubraving@fb.com> wrote:
> >
> > [...]
> >
> > We can enable some debug option that dumps the page table. Then from the
> > page table, we can confirm the programs are running on a huge page. This
> > only works on x86_64 though. WDYT?
> >
>
> I don't know what exactly is involved, so it's hard to say. Ideally
> whatever we do doesn't complicate our CI setup. Can we use BPF tracing
> magic to check this from inside the kernel somehow?
>

But I don't feel strongly about this. If it's hard to detect, it's
fine to not have a specific test (especially since it's very
architecture-specific).

> [...]
> On Dec 17, 2021, at 8:43 AM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
>
> On Fri, Dec 17, 2021 at 8:42 AM Andrii Nakryiko
> <andrii.nakryiko@gmail.com> wrote:
>>
>> [...]
>>
>> I don't know what exactly is involved, so it's hard to say. Ideally
>> whatever we do doesn't complicate our CI setup. Can we use BPF tracing
>> magic to check this from inside the kernel somehow?
>>
>
> But I don't feel strongly about this. If it's hard to detect, it's
> fine to not have a specific test (especially since it's very
> architecture-specific).

It will be more or less architecture-specific, as we need to somehow walk
the page table (with a debug option or with a BPF iterator). I will try
something.

Thanks,
Song

> [...]
On Fri, Dec 17, 2021 at 9:13 AM Song Liu <songliubraving@fb.com> wrote:
>
> > On Dec 17, 2021, at 8:43 AM, Andrii Nakryiko <andrii.nakryiko@gmail.com> wrote:
> >
> > [...]
> >
> > But I don't feel strongly about this. If it's hard to detect, it's
> > fine to not have a specific test (especially since it's very
> > architecture-specific).
>
> It will be more or less architecture-specific, as we need to somehow walk
> the page table (with a debug option or with a BPF iterator). I will try
> something.

If the BPF iterator approach works, that would be great!

>
> Thanks,
> Song
>
> [...]