
[bpf-next,v3,2/2] bpf, x86: Skip bounds checking for PROBE_MEM with SMAP

Message ID 20241103193512.4076710-3-memxor@gmail.com (mailing list archive)
State New, archived
Delegated to: BPF
Series: Zero overhead PROBE_MEM

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 9 maintainers not CCed: sdf@fomichev.me kpsingh@kernel.org netdev@vger.kernel.org dsahern@kernel.org john.fastabend@gmail.com martin.lau@linux.dev haoluo@google.com hpa@zytor.com jolsa@kernel.org
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: labels should not be indented; WARNING: line length of 114 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-17 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc

Commit Message

Kumar Kartikeya Dwivedi Nov. 3, 2024, 7:35 p.m. UTC
The previous patch changed the do_user_addr_fault page fault handler to
invoke BPF's fixup routines (by searching exception tables and calling
ex_handler_bpf). This would only occur when SMAP is enabled, such that
any user address access from BPF programs running in kernel mode would
reach this path and invoke the fixup routines.
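
For illustration, the net effect on the fault handler looks roughly
like the following (a sketch only, not the actual hunk from the
previous patch):

	/* kernel-mode fault on a user address, SMAP enabled */
	if (!user_mode(regs) &&
	    fixup_exception(regs, X86_TRAP_PF, error_code, address))
		return;	/* ex_handler_bpf() zeroed dst and skipped the load */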

Relying on this behavior, disable the bounds-checking instrumentation
in the x86 BPF JIT when X86_FEATURE_SMAP is available. All BPF
programs execute with SMAP enabled, so when this feature is available
we can assume SMAP will be active during program execution.

This optimizes PROBE_MEM loads down to a normal unchecked load
instruction. Any page faults for user or kernel addresses will be
handled using the fixup routines, via the exception table entries
generated for such load instructions.

All in all, this ensures that PROBE_MEM loads now incur no runtime
overhead and become practically free.
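
For illustration, the difference in the JITed output is roughly the
following (C-style pseudocode, not the exact emitted instructions):

	/* Without SMAP, every PROBE_MEM load carries a guard: */
	if (src + off <= TASK_SIZE_MAX + PAGE_SIZE ||
	    src + off >= VSYSCALL_ADDR)
		dst = 0;		/* not a plain kernel address */
	else
		dst = *(src + off);	/* extable-protected load */

	/* With SMAP (this patch), only the bare load remains: */
	dst = *(src + off);		/* any fault goes to ex_handler_bpf() */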

Acked-by: Puranjay Mohan <puranjay@kernel.org>
Signed-off-by: Kumar Kartikeya Dwivedi <memxor@gmail.com>
---
 arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
 1 file changed, 9 insertions(+), 2 deletions(-)

Comments

Peter Zijlstra Nov. 4, 2024, 7:53 p.m. UTC | #1
On Sun, Nov 03, 2024 at 11:35:12AM -0800, Kumar Kartikeya Dwivedi wrote:
>  arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
>  1 file changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> index 06b080b61aa5..7e3bd589efc3 100644
> --- a/arch/x86/net/bpf_jit_comp.c
> +++ b/arch/x86/net/bpf_jit_comp.c
> @@ -1954,8 +1954,8 @@ st:			if (is_imm8(insn->off))
>  		case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
>  			insn_off = insn->off;
>  
> -			if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> -			    BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
> +			if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> +			     BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
>  				/* Conservatively check that src_reg + insn->off is a kernel address:
>  				 *   src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
>  				 *   and

Well, I can see why you'd want to get rid of that, that's quite
dreadful code you generate there.

Can't you do something like:

  lea off(%src), %r10
  mov %r10, %r11
  inc %r10
  sar $63, %r11
  and %r11, %r10
  dec %r10

  mov (%r10), %rax

I realize that's not exactly pretty either, but no jumps. Not sure
this'll help much if anything with the TDX thing though.
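
For reference, the branchless sequence above computes the following,
expressed in C (mask_probe_addr is just an illustrative name):

	static unsigned long mask_probe_addr(unsigned long a)
	{
		long mask = (long)a >> 63;	/* sar $63: -1 iff MSB set */

		return ((a + 1) & mask) - 1;	/* MSB set: a, else: ~0UL */
	}

Addresses with the sign bit set pass through unchanged; everything
else maps to ~0UL, so the subsequent load faults deterministically.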
Alexei Starovoitov Nov. 5, 2024, 6:35 p.m. UTC | #2
On Mon, Nov 4, 2024 at 11:54 AM Peter Zijlstra <peterz@infradead.org> wrote:
>
> On Sun, Nov 03, 2024 at 11:35:12AM -0800, Kumar Kartikeya Dwivedi wrote:
> >  arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
> >  1 file changed, 9 insertions(+), 2 deletions(-)
> >
> > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > index 06b080b61aa5..7e3bd589efc3 100644
> > --- a/arch/x86/net/bpf_jit_comp.c
> > +++ b/arch/x86/net/bpf_jit_comp.c
> > @@ -1954,8 +1954,8 @@ st:                     if (is_imm8(insn->off))
> >               case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
> >                       insn_off = insn->off;
> >
> > -                     if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > -                         BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
> > +                     if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > +                          BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
> >                               /* Conservatively check that src_reg + insn->off is a kernel address:
> >                                *   src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
> >                                *   and
>
> Well, I can see why you'd want to get rid of that, that's quite
> dreadful code you generate there.
>
> Can't you do something like:
>
>   lea off(%src), %r10
>   mov %r10, %r11
>   inc %r10
>   sar $63, %r11
>   and %r11, %r10
>   dec %r10
>
>   mov (%r10), %rax

That's Linus's hack from mask_user_address(), and
earlier from valid_user_address().
I don't think it works because of
#define VSYSCALL_ADDR (-10UL << 20)

We had to filter out that range.
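
For reference, that constant works out to a single page high in the
kernel half of the address space:

	/* -10UL << 20 == 0xffffffffff600000: the MSB is set, so a pure
	 * sign-bit mask cannot filter this range out. */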

I don't understand why valid_user_address() is not broken,
since the fault handler considers a vsyscall address to be a user addr
in fault_in_kernel_space().
And user addr faulting doesn't have extable handling logic.

> I realize that's not exactly pretty either, but no jumps. Not sure
> this'll help much if anything with the TDX thing though.

To clarify... this is not bpf specific. This bpf JIT logic is
nothing but an inlined version of copy_from_kernel_nofault().
So if confidential computing has an issue, lots of pieces are affected.

So this patch set is the preferred way to accelerate this
inlined copy_from_kernel_nofault().
If it lands we can follow up and optimize copy_from_kernel_nofault()
with cpu_feature_enabled(X86_FEATURE_SMAP) as well.
Though I'm not sure whether slab's get_freepointer_safe() cares
that much about saving nanoseconds. But bpf does.
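
A follow-up along those lines might look like this (a hypothetical
sketch, not part of this series; probe_range_allowed is an invented
name):

	static bool probe_range_allowed(const void *src, size_t size)
	{
		/* With SMAP, a stray user access faults into the
		 * extable fixup path, so the pre-check can be skipped. */
		if (cpu_feature_enabled(X86_FEATURE_SMAP))
			return true;
		return copy_from_kernel_nofault_allowed(src, size);
	}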
Peter Zijlstra Nov. 6, 2024, 3:21 p.m. UTC | #3
On Tue, Nov 05, 2024 at 10:35:40AM -0800, Alexei Starovoitov wrote:
> On Mon, Nov 4, 2024 at 11:54 AM Peter Zijlstra <peterz@infradead.org> wrote:
> >
> > On Sun, Nov 03, 2024 at 11:35:12AM -0800, Kumar Kartikeya Dwivedi wrote:
> > >  arch/x86/net/bpf_jit_comp.c | 11 +++++++++--
> > >  1 file changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
> > > index 06b080b61aa5..7e3bd589efc3 100644
> > > --- a/arch/x86/net/bpf_jit_comp.c
> > > +++ b/arch/x86/net/bpf_jit_comp.c
> > > @@ -1954,8 +1954,8 @@ st:                     if (is_imm8(insn->off))
> > >               case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
> > >                       insn_off = insn->off;
> > >
> > > -                     if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > > -                         BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
> > > +                     if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
> > > +                          BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
> > >                               /* Conservatively check that src_reg + insn->off is a kernel address:
> > >                                *   src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
> > >                                *   and
> >
> > Well, I can see why you'd want to get rid of that, that's quite
> > dreadful code you generate there.
> >
> > Can't you do something like:
> >
> >   lea off(%src), %r10
> >   mov %r10, %r11
> >   inc %r10
> >   sar $63, %r11
> >   and %r11, %r10
> >   dec %r10
> >
> >   mov (%r10), %rax
> 
> That's a Linus's hack for mask_user_address() and
> earlier in valid_user_address().

Yes, something along those lines. Preserves everything with MSB 1, and
maps the rest to ~0.

> I don't think it works because of
> #define VSYSCALL_ADDR (-10UL << 20)
> 
> We had to filter out that range.

Range of _1_ page. Also, nobody should ever touch that page these days
anyway.

> I don't understand why valid_user_address() is not broken,
> since fault handler considers vsyscall address to be user addr
> in fault_in_kernel_space().
> And user addr faulting doesn't have extable handling logic.

The vsyscall page has its own magical exception handling that does
emulation.
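
For reference, the fault handler's special case looks roughly like
this (a sketch of the do_user_addr_fault() logic):

	if (is_vsyscall_vaddr(address)) {
		if (emulate_vsyscall(error_code, regs, address))
			return;	/* access emulated, no extable needed */
	}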

> > I realize that's not exactly pretty either, but no jumps. Not sure
> > this'll help much if anything with the TDX thing though.
> 
> to clarify... this is not bpf specific. This bpf JIT logic is
> nothing but inlined version of copy_from_kernel_nofault().
> So if confidential computing has an issue lots of pieces are affected.

It's not copy_from_kernel_nofault() that's a problem per se, it's
thinking it's 'safe' for random input that is.

Patch

diff --git a/arch/x86/net/bpf_jit_comp.c b/arch/x86/net/bpf_jit_comp.c
index 06b080b61aa5..7e3bd589efc3 100644
--- a/arch/x86/net/bpf_jit_comp.c
+++ b/arch/x86/net/bpf_jit_comp.c
@@ -1954,8 +1954,8 @@  st:			if (is_imm8(insn->off))
 		case BPF_LDX | BPF_PROBE_MEMSX | BPF_W:
 			insn_off = insn->off;
 
-			if (BPF_MODE(insn->code) == BPF_PROBE_MEM ||
-			    BPF_MODE(insn->code) == BPF_PROBE_MEMSX) {
+			if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
+			     BPF_MODE(insn->code) == BPF_PROBE_MEMSX) && !cpu_feature_enabled(X86_FEATURE_SMAP)) {
 				/* Conservatively check that src_reg + insn->off is a kernel address:
 				 *   src_reg + insn->off > TASK_SIZE_MAX + PAGE_SIZE
 				 *   and
@@ -2002,6 +2002,9 @@  st:			if (is_imm8(insn->off))
 				/* populate jmp_offset for JAE above to jump to start_of_ldx */
 				start_of_ldx = prog;
 				end_of_jmp[-1] = start_of_ldx - end_of_jmp;
+			} else if ((BPF_MODE(insn->code) == BPF_PROBE_MEM ||
+				    BPF_MODE(insn->code) == BPF_PROBE_MEMSX)) {
+				start_of_ldx = prog;
 			}
 			if (BPF_MODE(insn->code) == BPF_PROBE_MEMSX ||
 			    BPF_MODE(insn->code) == BPF_MEMSX)
@@ -2014,9 +2017,13 @@  st:			if (is_imm8(insn->off))
 				u8 *_insn = image + proglen + (start_of_ldx - temp);
 				s64 delta;
 
+				if (cpu_feature_enabled(X86_FEATURE_SMAP))
+					goto extable_fixup;
+
 				/* populate jmp_offset for JMP above */
 				start_of_ldx[-1] = prog - start_of_ldx;
 
+			extable_fixup:
 				if (!bpf_prog->aux->extable)
 					break;