diff mbox series

[RESEND,v8,16/16] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of

Message ID 20240505160628.2323363-17-rppt@kernel.org (mailing list archive)
State New
Series mm: jit/text allocator

Commit Message

Mike Rapoport May 5, 2024, 4:06 p.m. UTC
From: "Mike Rapoport (IBM)" <rppt@kernel.org>

BPF just-in-time compiler depended on CONFIG_MODULES because it used
module_alloc() to allocate memory for the generated code.

Since code allocations are now implemented with execmem, drop dependency of
CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.

Suggested-by: Björn Töpel <bjorn@kernel.org>
Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
---
 kernel/bpf/Kconfig | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)

Comments

Klara Modin May 16, 2024, 11 p.m. UTC | #1
Hi,

On 2024-05-05 18:06, Mike Rapoport wrote:
> From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> 
> BPF just-in-time compiler depended on CONFIG_MODULES because it used
> module_alloc() to allocate memory for the generated code.
> 
> Since code allocations are now implemented with execmem, drop dependency of
> CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.
> 
> Suggested-by: Björn Töpel <bjorn@kernel.org>
> Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> ---
>   kernel/bpf/Kconfig | 2 +-
>   1 file changed, 1 insertion(+), 1 deletion(-)
> 
> diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
> index bc25f5098a25..f999e4e0b344 100644
> --- a/kernel/bpf/Kconfig
> +++ b/kernel/bpf/Kconfig
> @@ -43,7 +43,7 @@ config BPF_JIT
>   	bool "Enable BPF Just In Time compiler"
>   	depends on BPF
>   	depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
> -	depends on MODULES
> +	select EXECMEM
>   	help
>   	  BPF programs are normally handled by a BPF interpreter. This option
>   	  allows the kernel to generate native code when a program is loaded

This does not seem to work entirely. If I build with BPF_JIT without
module support for my Raspberry Pi 3 B, I get warnings in my kernel log
(the easiest way to trigger them seems to be trying to ssh into it, which fails).

Kind regards,
Klara Modin
ldrop login: [   43.741638] Internal error: BRK handler: 00000000f2000100 [#1] SMP
[   43.749269] CPU: 3 PID: 2083 Comm: sshd Not tainted 6.9.0-01786-g2c9e5d4a0082 #25
[   43.758216] Hardware name: Raspberry Pi 3 Model B (DT)
[   43.764769] pstate: 80000005 (Nzcv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   43.773199] pc : 0xffff8000814dd8b8
[   43.778084] lr : __seccomp_filter (include/linux/bpf.h:1234 include/linux/filter.h:657 include/linux/filter.h:664 include/linux/filter.h:681 kernel/seccomp.c:426 kernel/seccomp.c:1222) 
[   43.783784] sp : ffff8000855a3d40
[   43.788471] x29: ffff8000855a3d90 x28: 0000000000000000 x27: 0000000000000001
[   43.797082] x26: 00000000000000de x25: 0000000000000000 x24: 000000007fff0000
[   43.805652] x23: 0000000080000000 x22: ffff8000855a3d48 x21: ffff000005446480
[   43.814189] x20: ffff0000046ad300 x19: ffff80008147d000 x18: 0000000000000000
[   43.822694] x17: 0000000000000000 x16: 0000000000000000 x15: 0000000000000000
[   43.831160] x14: 0000000000000000 x13: 0000000000000000 x12: 0000000000000000
[   43.839577] x11: 0000000000000000 x10: 0000000000000000 x9 : 0000000000000000
[   43.847966] x8 : 0000000000000000 x7 : 0000000000001000 x6 : 0000000000000022
[   43.856311] x5 : 0000000000000003 x4 : 0000000000000000 x3 : 0000000000000001
[   43.864636] x2 : ffff8000814dd8b8 x1 : ffff80008147d048 x0 : ffff8000855a3d48
[   43.872958] Call trace:
[   43.876450]  0xffff8000814dd8b8
[   43.880610] __secure_computing (kernel/seccomp.c:1363) 
[   43.885622] syscall_trace_enter (arch/arm64/kernel/ptrace.c:2242 (discriminator 1)) 
[   43.890826] el0_svc_common.constprop.0 (arch/arm64/kernel/syscall.c:128) 
[   43.896593] do_el0_svc (arch/arm64/kernel/syscall.c:153) 
[   43.900909] el0_svc (arch/arm64/include/asm/irqflags.h:56 arch/arm64/include/asm/irqflags.h:77 arch/arm64/kernel/entry-common.c:165 arch/arm64/kernel/entry-common.c:178 arch/arm64/kernel/entry-common.c:713) 
[   43.904922] el0t_64_sync_handler (arch/arm64/kernel/entry-common.c:731) 
[   43.910232] el0t_64_sync (arch/arm64/kernel/entry.S:598) 
[ 43.914795] Code: d4202000 d4202000 d4202000 d4202000 (d4202000)
All code
========
   0:*	00 20                	add    %ah,(%rax)		<-- trapping instruction
   2:	20 d4                	and    %dl,%ah
   4:	00 20                	add    %ah,(%rax)
   6:	20 d4                	and    %dl,%ah
   8:	00 20                	add    %ah,(%rax)
   a:	20 d4                	and    %dl,%ah
   c:	00 20                	add    %ah,(%rax)
   e:	20 d4                	and    %dl,%ah
  10:	00 20                	add    %ah,(%rax)
  12:	20 d4                	and    %dl,%ah

Code starting with the faulting instruction
===========================================
   0:	00 20                	add    %ah,(%rax)
   2:	20 d4                	and    %dl,%ah
[   43.921826] ---[ end trace 0000000000000000 ]---
[   43.927335] note: sshd[2083] exited with irqs disabled
[   43.933417] note: sshd[2083] exited with preempt_count 1
[   43.934685] ------------[ cut here ]------------
[   43.945156] WARNING: CPU: 3 PID: 0 at kernel/context_tracking.c:128 ct_kernel_exit.constprop.0 (kernel/context_tracking.c:128 (discriminator 1)) 
[   43.956500] CPU: 3 PID: 0 Comm: swapper/3 Tainted: G      D            6.9.0-01786-g2c9e5d4a0082 #25
[   43.967570] Hardware name: Raspberry Pi 3 Model B (DT)
[   43.973716] pstate: 200003c5 (nzCv DAIF -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   43.981774] pc : ct_kernel_exit.constprop.0 (kernel/context_tracking.c:128 (discriminator 1)) 
[   43.987920] lr : ct_idle_enter (kernel/context_tracking.c:321) 
[   43.992926] sp : ffff80008144bdd0
[   43.997312] x29: ffff80008144bdd0 x28: ffff000002061100 x27: 0000000000000000
[   44.005623] x26: ffff80008154bde0 x25: ffff000001a590c0 x24: 0000000000000000
[   44.013899] x23: 0000000000000000 x22: ffff000001a590c0 x21: ffff80008118ad28
[   44.022186] x20: ffff80008118ac08 x19: ffff00003a1bd610 x18: ffff8000855a3878
[   44.030483] x17: ffffffffffffffff x16: 0000000000000000 x15: 0000ffffbbbce000
[   44.038794] x14: 04d1d6f476a588c8 x13: 00000000000003bb x12: 0000000000000001
[   44.047107] x11: 0000000000000001 x10: 0000000000000a00 x9 : ffff80008144bd30
[   44.055426] x8 : ffff000001a59b20 x7 : 0000000000000000 x6 : 000000003ad2e995
[   44.063758] x5 : 4000000000000002 x4 : ffff7fffb91c3000 x3 : ffff80008144bdd0
[   44.072105] x2 : 4000000000000000 x1 : ffff800080ffa610 x0 : ffff800080ffa610
[   44.080461] Call trace:
[   44.084011] ct_kernel_exit.constprop.0 (kernel/context_tracking.c:128 (discriminator 1)) 
[   44.089869] ct_idle_enter (kernel/context_tracking.c:321) 
[   44.094563] default_idle_call (kernel/sched/idle.c:117) 
[   44.099622] do_idle (kernel/sched/idle.c:192 kernel/sched/idle.c:332) 
[   44.103959] cpu_startup_entry (kernel/sched/idle.c:429) 
[   44.108970] secondary_start_kernel (arch/arm64/include/asm/atomic_ll_sc.h:95 (discriminator 2) arch/arm64/include/asm/atomic.h:28 (discriminator 2) include/linux/atomic/atomic-arch-fallback.h:546 (discriminator 2) include/linux/atomic/atomic-arch-fallback.h:994 (discriminator 2) include/linux/atomic/atomic-instrumented.h:436 (discriminator 2) include/linux/sched/mm.h:36 (discriminator 2) arch/arm64/kernel/smp.c:214 (discriminator 2)) 
[   44.114569] __secondary_switched (arch/arm64/kernel/head.S:418) 
[   44.119783] ---[ end trace 0000000000000000 ]---
[   44.125645] ------------[ cut here ]------------
[   44.131296] Trying to vfree() bad address (000000004a17c299)
[   44.138024] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3189 remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) 
[   44.146675] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a0082 #25
[   44.158229] Hardware name: Raspberry Pi 3 Model B (DT)
[   44.164433] Workqueue: events bpf_prog_free_deferred
[   44.170492] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   44.178601] pc : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) 
[   44.183705] lr : remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) 
[   44.188772] sp : ffff800082a13c70
[   44.193112] x29: ffff800082a13c70 x28: 0000000000000000 x27: 0000000000000000
[   44.201384] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
[   44.209658] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: ffff8000814dd880
[   44.217924] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
[   44.226206] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
[   44.234460] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
[   44.242710] x11: ffff8000811a6370 x10: 0000000000000144 x9 : ffff8000811fe370
[   44.250959] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
[   44.259206] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   44.267457] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
[   44.275703] Call trace:
[   44.279158] remove_vm_area (mm/vmalloc.c:3189 (discriminator 1)) 
[   44.283858] vfree (mm/vmalloc.c:3322) 
[   44.287835] execmem_free (mm/execmem.c:70) 
[   44.292347] bpf_jit_free_exec+0x10/0x1c 
[   44.297283] bpf_prog_pack_free (kernel/bpf/core.c:1006) 
[   44.302457] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195) 
[   44.307951] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474) 
[   44.312342] bpf_prog_free_deferred (kernel/bpf/core.c:2785) 
[   44.317785] process_one_work (kernel/workqueue.c:3273) 
[   44.322684] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2)) 
[   44.327292] kthread (kernel/kthread.c:388) 
[   44.331342] ret_from_fork (arch/arm64/kernel/entry.S:861) 
[   44.335758] ---[ end trace 0000000000000000 ]---
[   44.341288] ------------[ cut here ]------------
[   44.346777] Trying to vfree() nonexistent vm area (000000004a17c299)
[   44.354077] WARNING: CPU: 1 PID: 193 at mm/vmalloc.c:3324 vfree (mm/vmalloc.c:3324 (discriminator 1)) 
[   44.361988] CPU: 1 PID: 193 Comm: kworker/1:2 Tainted: G      D W          6.9.0-01786-g2c9e5d4a0082 #25
[   44.373301] Hardware name: Raspberry Pi 3 Model B (DT)
[   44.379397] Workqueue: events bpf_prog_free_deferred
[   44.385342] pstate: 60000005 (nZCv daif -PAN -UAO -TCO -DIT -SSBS BTYPE=--)
[   44.393343] pc : vfree (mm/vmalloc.c:3324 (discriminator 1)) 
[   44.397723] lr : vfree (mm/vmalloc.c:3324 (discriminator 1)) 
[   44.402088] sp : ffff800082a13c90
[   44.406326] x29: ffff800082a13c90 x28: 0000000000000000 x27: 0000000000000000
[   44.414509] x26: 0000000000000000 x25: ffff00003a44efa0 x24: 00000000d4202000
[   44.422704] x23: ffff800081223dd0 x22: ffff00003a198a40 x21: 0000000000000000
[   44.430908] x20: 00000000d4202000 x19: ffff8000814dd880 x18: 0000000000000006
[   44.439122] x17: 0000000000000000 x16: 0000000000000020 x15: 0000000000000002
[   44.447338] x14: ffff8000811a6370 x13: 0000000020000000 x12: 0000000000000000
[   44.455553] x11: ffff8000811a6370 x10: 0000000000000166 x9 : ffff8000811fe370
[   44.463771] x8 : 0000000000017fe8 x7 : 00000000fffff000 x6 : ffff8000811fe370
[   44.471989] x5 : 0000000000000000 x4 : 0000000000000000 x3 : 0000000000000000
[   44.480208] x2 : 0000000000000000 x1 : 0000000000000000 x0 : ffff000002203240
[   44.488420] Call trace:
[   44.491847] vfree (mm/vmalloc.c:3324 (discriminator 1)) 
[   44.495900] execmem_free (mm/execmem.c:70) 
[   44.500394] bpf_jit_free_exec+0x10/0x1c 
[   44.505329] bpf_prog_pack_free (kernel/bpf/core.c:1006) 
[   44.510507] bpf_jit_binary_pack_free (kernel/bpf/core.c:1195) 
[   44.516017] bpf_jit_free (include/linux/filter.h:1083 arch/arm64/net/bpf_jit_comp.c:2474) 
[   44.520424] bpf_prog_free_deferred (kernel/bpf/core.c:2785) 
[   44.525864] process_one_work (kernel/workqueue.c:3273) 
[   44.530754] worker_thread (kernel/workqueue.c:3342 (discriminator 2) kernel/workqueue.c:3429 (discriminator 2)) 
[   44.535364] kthread (kernel/kthread.c:388) 
[   44.539417] ret_from_fork (arch/arm64/kernel/entry.S:861) 
[   44.543791] ---[ end trace 0000000000000000 ]---
# bad: [dbd9e2e056d8577375ae4b31ada94f8aa3769e8a] Add linux-next specific files for 20240516
git bisect start 'next/master'
# status: waiting for good commit(s), bad commit known
# good: [8c06da67d0bd3139a97f301b4aa9c482b9d4f29e] Merge tag 'livepatching-for-6.10' of git://git.kernel.org/pub/scm/linux/kernel/git/livepatching/livepatching
git bisect good 8c06da67d0bd3139a97f301b4aa9c482b9d4f29e
# good: [147d3734724040bb0aff1252299e48947a6c8858] Merge branch 'master' of git://linuxtv.org/mchehab/media-next.git
git bisect good 147d3734724040bb0aff1252299e48947a6c8858
# bad: [729cf96da8de5e7ae70fef40a1b864bc00c2dca1] Merge branch 'next' of git://git.kernel.org/pub/scm/virt/kvm/kvm.git
git bisect bad 729cf96da8de5e7ae70fef40a1b864bc00c2dca1
# good: [4364438497c638785b1394aab764a15b6baefaf3] Merge branch 'drm-xe-next' of https://gitlab.freedesktop.org/drm/xe/kernel
git bisect good 4364438497c638785b1394aab764a15b6baefaf3
# bad: [b3ead6c10eccbfa446ce30927f94472c278cd3d7] Merge branch 'for-next' of git://git.kernel.org/pub/scm/linux/kernel/git/robh/linux.git
git bisect bad b3ead6c10eccbfa446ce30927f94472c278cd3d7
# bad: [d83384f475a4cfa0e9bda1cab538d99360fa2c48] Merge branch 'for-mfd-next' of git://git.kernel.org/pub/scm/linux/kernel/git/lee/mfd.git
git bisect bad d83384f475a4cfa0e9bda1cab538d99360fa2c48
# bad: [9564f97e8e3ec6bdbf0c105b45fa2516d64c4685] Merge branch 'for-next' of git://git.kernel.dk/linux-block.git
git bisect bad 9564f97e8e3ec6bdbf0c105b45fa2516d64c4685
# bad: [0e6c77dedcb11f510c0dbdaf6455b918b28f1b62] Merge branch 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/dtor/input.git
git bisect bad 0e6c77dedcb11f510c0dbdaf6455b918b28f1b62
# good: [5852f2afcdd9b7c9dedec4fdf14b8b079349828f] Input: drop explicit initialization of struct i2c_device_id::driver_data to 0
git bisect good 5852f2afcdd9b7c9dedec4fdf14b8b079349828f
# good: [223b5e57d0d50b0c07b933350dbcde92018d3080] mm/execmem, arch: convert remaining overrides of module_alloc to execmem
git bisect good 223b5e57d0d50b0c07b933350dbcde92018d3080
# good: [14e56fb2ed1dbc3c3171d12ab435b0f691f6f215] x86/ftrace: enable dynamic ftrace without CONFIG_MODULES
git bisect good 14e56fb2ed1dbc3c3171d12ab435b0f691f6f215
# good: [7582b7be16d0ba90e3dbd9575a730cabd9eb852a] kprobes: remove dependency on CONFIG_MODULES
git bisect good 7582b7be16d0ba90e3dbd9575a730cabd9eb852a
# bad: [86d899efdd58c98a0d196e31945009fc47a56264] Merge branch 'modules-next' of git://git.kernel.org/pub/scm/linux/kernel/git/mcgrof/linux.git
git bisect bad 86d899efdd58c98a0d196e31945009fc47a56264
# bad: [2c9e5d4a008293407836d29d35dfd4353615bd2f] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of
git bisect bad 2c9e5d4a008293407836d29d35dfd4353615bd2f
# first bad commit: [2c9e5d4a008293407836d29d35dfd4353615bd2f] bpf: remove CONFIG_BPF_JIT dependency on CONFIG_MODULES of
Will Deacon May 17, 2024, 3:46 p.m. UTC | #2
Hi Klara,

On Fri, May 17, 2024 at 01:00:31AM +0200, Klara Modin wrote:
> On 2024-05-05 18:06, Mike Rapoport wrote:
> > From: "Mike Rapoport (IBM)" <rppt@kernel.org>
> > 
> > BPF just-in-time compiler depended on CONFIG_MODULES because it used
> > module_alloc() to allocate memory for the generated code.
> > 
> > Since code allocations are now implemented with execmem, drop dependency of
> > CONFIG_BPF_JIT on CONFIG_MODULES and make it select CONFIG_EXECMEM.
> > 
> > Suggested-by: Björn Töpel <bjorn@kernel.org>
> > Signed-off-by: Mike Rapoport (IBM) <rppt@kernel.org>
> > ---
> >   kernel/bpf/Kconfig | 2 +-
> >   1 file changed, 1 insertion(+), 1 deletion(-)
> > 
> > diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
> > index bc25f5098a25..f999e4e0b344 100644
> > --- a/kernel/bpf/Kconfig
> > +++ b/kernel/bpf/Kconfig
> > @@ -43,7 +43,7 @@ config BPF_JIT
> >   	bool "Enable BPF Just In Time compiler"
> >   	depends on BPF
> >   	depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
> > -	depends on MODULES
> > +	select EXECMEM
> >   	help
> >   	  BPF programs are normally handled by a BPF interpreter. This option
> >   	  allows the kernel to generate native code when a program is loaded
> 
> This does not seem to work entirely. If I build with BPF_JIT without module
> support for my Raspberry Pi 3 B I get warnings in my kernel log (easiest way
> to trigger it seems to be trying to ssh into it, which fails).

Thanks for the report. I was able to reproduce this using QEMU and it
looks like the problem is because bpf_arch_text_copy() silently fails
to write to the read-only area as a result of patch_map() faulting and
the resulting -EFAULT being chucked away.

Please can you try the diff below?

Will

--->8

diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
index 255534930368..94b9fea65aca 100644
--- a/arch/arm64/kernel/patching.c
+++ b/arch/arm64/kernel/patching.c
@@ -36,7 +36,7 @@ static void __kprobes *patch_map(void *addr, int fixmap)
 
        if (image)
                page = phys_to_page(__pa_symbol(addr));
-       else if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
+       else if (IS_ENABLED(CONFIG_EXECMEM))
                page = vmalloc_to_page(addr);
        else
                return addr;
Klara Modin May 17, 2024, 4:09 p.m. UTC | #3
On 2024-05-17 17:46, Will Deacon wrote:
> Hi Klara,
> 
> On Fri, May 17, 2024 at 01:00:31AM +0200, Klara Modin wrote:
>>
>> This does not seem to work entirely. If I build with BPF_JIT without module
>> support for my Raspberry Pi 3 B I get warnings in my kernel log (easiest way
>> to trigger it seems to be trying to ssh into it, which fails).
> 
> Thanks for the report. I was able to reproduce this using QEMU and it
> looks like the problem is because bpf_arch_text_copy() silently fails
> to write to the read-only area as a result of patch_map() faulting and
> the resulting -EFAULT being chucked away.
> 
> Please can you try the diff below?
> 
> Will
> 
> --->8
> 
> diff --git a/arch/arm64/kernel/patching.c b/arch/arm64/kernel/patching.c
> index 255534930368..94b9fea65aca 100644
> --- a/arch/arm64/kernel/patching.c
> +++ b/arch/arm64/kernel/patching.c
> @@ -36,7 +36,7 @@ static void __kprobes *patch_map(void *addr, int fixmap)
>   
>          if (image)
>                  page = phys_to_page(__pa_symbol(addr));
> -       else if (IS_ENABLED(CONFIG_STRICT_MODULE_RWX))
> +       else if (IS_ENABLED(CONFIG_EXECMEM))
>                  page = vmalloc_to_page(addr);
>          else
>                  return addr;
> 

This seems to work from my short testing.

Thanks,
Tested-by: Klara Modin <klarasmodin@gmail.com>

Patch

diff --git a/kernel/bpf/Kconfig b/kernel/bpf/Kconfig
index bc25f5098a25..f999e4e0b344 100644
--- a/kernel/bpf/Kconfig
+++ b/kernel/bpf/Kconfig
@@ -43,7 +43,7 @@  config BPF_JIT
 	bool "Enable BPF Just In Time compiler"
 	depends on BPF
 	depends on HAVE_CBPF_JIT || HAVE_EBPF_JIT
-	depends on MODULES
+	select EXECMEM
 	help
 	  BPF programs are normally handled by a BPF interpreter. This option
 	  allows the kernel to generate native code when a program is loaded