
[bpf-next] bpf: allow readonly direct path access for skfilter

Message ID 20211123205607.452497-1-zenczykowski@gmail.com
State Changes Requested
Delegated to: BPF
Series [bpf-next] bpf: allow readonly direct path access for skfilter

Checks

Context                         Check    Description
netdev/tree_selection           success  Clearly marked for bpf-next
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/subject_prefix           success  Link
netdev/cover_letter             success  Single patches do not need cover letters
netdev/patch_count              success  Link
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 16 this patch: 16
netdev/cc_maintainers           warning  9 maintainers not CCed: kafai@fb.com nathan@kernel.org songliubraving@fb.com john.fastabend@gmail.com ndesaulniers@google.com kpsingh@kernel.org yhs@fb.com llvm@lists.linux.dev andrii@kernel.org
netdev/build_clang              success  Errors and warnings before: 22 this patch: 22
netdev/module_param             success  Was 0 now: 0
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 20 this patch: 20
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 11 lines checked
netdev/kdoc                     success  Errors and warnings before: 0 this patch: 0
netdev/source_inline            success  Was 0 now: 0
bpf/vmtest-bpf-next             fail     VM_Test
bpf/vmtest-bpf-next-PR          fail     PR summary

Commit Message

Maciej Żenczykowski Nov. 23, 2021, 8:56 p.m. UTC
From: Maciej Żenczykowski <maze@google.com>

skfilter bpf programs can read the packet directly via llvm.bpf.load.byte/
half/word, which are 8/16/32-bit primitive bpf instructions and thus
behave basically as well as direct packet access (DPA) reads.  But there
is no 64-bit equivalent, because support for the corresponding 64-bit bpf
opcode was never added (it is unclear why; a patch was posted).
DPA uses a slightly different mechanism, so it does not suffer from this
limitation.

Using 64-bit reads, a 128-bit IPv6 address comparison can be done in just
2 steps, instead of the 4 steps needed with llvm.bpf.load.word.
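To illustrate the step count, here is a userspace C sketch. It is not BPF code and not part of the patch. It compares two 16-byte IPv6 addresses twice: once with four 32-bit loads and once with two 64-bit loads. memcpy() stands in for the packet loads, and the function names are made up for illustration.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

/* Four 32-bit loads per address, like four llvm.bpf.load.word reads:
 * up to four compare-and-branch steps. */
static bool ipv6_eq_32(const uint8_t a[16], const uint8_t b[16])
{
	for (int i = 0; i < 16; i += 4) {
		uint32_t x, y;

		memcpy(&x, a + i, sizeof(x));
		memcpy(&y, b + i, sizeof(y));
		if (x != y)
			return false;
	}
	return true;
}

/* Two 64-bit loads per address, like two 64-bit direct packet reads:
 * at most two compare steps. */
static bool ipv6_eq_64(const uint8_t a[16], const uint8_t b[16])
{
	uint64_t a0, a1, b0, b1;

	memcpy(&a0, a, sizeof(a0));
	memcpy(&a1, a + 8, sizeof(a1));
	memcpy(&b0, b, sizeof(b0));
	memcpy(&b1, b + 8, sizeof(b1));
	return a0 == b0 && a1 == b1;
}
```

Byte order does not matter for an equality test, as long as both sides are loaded the same way.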

This should hopefully allow simpler programs (fewer instructions, possibly
less logic, and maybe even fewer jumps).  Fewer jumps may also mean vastly
faster bpf verifier times (verification time can be exponential in the
number of jumps...).

This can be particularly important when trying to do something like scan
a netlink message for a pattern (a 2000-iteration loop) to decide whether
the message should be dropped or delivered to userspace (thus waking it up).
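A userspace C sketch of such a bounded scan follows. It assumes an 8-byte pattern and does one 64-bit load per byte offset. scan_for_u64() and MAX_SCAN_ITERS are hypothetical names. The explicit length check plays the role of the data_end comparison that a BPF program must pass before the verifier allows a direct packet read.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define MAX_SCAN_ITERS 2000	/* loop bound from the text above */

/* Hypothetical sketch: look for an 8-byte pattern at every byte offset of
 * [data, data_end), using one 64-bit load per iteration.  The length check
 * before each load mirrors the "ptr + 8 > data_end" test a BPF program
 * needs before a direct packet read. */
static bool scan_for_u64(const uint8_t *data, const uint8_t *data_end,
			 const uint8_t pat[8])
{
	size_t len = (size_t)(data_end - data);
	uint64_t want;

	memcpy(&want, pat, sizeof(want));
	for (size_t off = 0; off < MAX_SCAN_ITERS; off++) {
		uint64_t word;

		if (len < sizeof(word) || off > len - sizeof(word))
			return false;	/* ran off the end of the message */
		memcpy(&word, data + off, sizeof(word));
		if (word == want)
			return true;	/* match found */
	}
	return false;		/* iteration bound hit without a match */
}
```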

I'm requiring CAP_NET_ADMIN because I'm not sure of the security
implications...

Tested: only build tested
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 kernel/bpf/verifier.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

Maciej Żenczykowski Nov. 23, 2021, 11:02 p.m. UTC | #1
Note: this is more of an RFC... question in patch format... is this
even a good idea?

Song Liu Nov. 27, 2021, 2:13 a.m. UTC | #2
On Tue, Nov 23, 2021 at 3:02 PM Maciej Żenczykowski <maze@google.com> wrote:
>
> Note: this is more of an RFC... question in patch format... is this
> even a good idea?
>

I don't know BPF_PROG_TYPE_SOCKET_FILTER very well, but the patch
seems reasonable to me. It would be great if we could show the performance
impact with a benchmark or a selftest.

Thanks,
Song
Alexei Starovoitov Nov. 30, 2021, 1:42 a.m. UTC | #3
On Tue, Nov 23, 2021 at 12:56 PM Maciej Żenczykowski
<zenczykowski@gmail.com> wrote:
>
> From: Maciej Żenczykowski <maze@google.com>
>
> skfilter bpf programs can read the packet directly via llvm.bpf.load.byte/
> /half/word which are 8/16/32-bit primitive bpf instructions and thus
> behave basically as well as DPA reads.  But there is no 64-bit equivalent,
> due to the support for the equivalent 64-bit bpf opcode never having been
> added (unclear why, there was a patch posted).
> DPA uses a slightly different mechanism, so doesn't suffer this limitation.
>
> Using 64-bit reads, 128-bit ipv6 address comparisons can be done in just
> 2 steps, instead of the 4 steps needed with llvm.bpf.word.

llvm.bpf.word is a pseudo instruction.
It's actually a function call for classic bpf.
See bpf_gen_ld_abs.
We used to have ugly special cases for them in JITs,
but then got rid of them.
Don't use them if performance is a requirement.
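For comparison, here is a rough userspace model of the work one LD_ABS word load does each time it runs. It is an illustration only, not the code bpf_gen_ld_abs emits, and all names are hypothetical. The load checks that the 4 bytes are in bounds, then assembles them from network byte order. In the real semantics, an out-of-bounds load ends the program with return value 0; the model signals that case by returning false.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Model of one checked 32-bit absolute load: bounds check plus
 * big-endian assembly on every call. */
static bool model_ld_abs_word(const uint8_t *pkt, size_t len, size_t off,
			      uint32_t *out)
{
	if (len < 4 || off > len - 4)
		return false;	/* real LD_ABS: program returns 0 */
	*out = (uint32_t)pkt[off] << 24 | (uint32_t)pkt[off + 1] << 16 |
	       (uint32_t)pkt[off + 2] << 8 | (uint32_t)pkt[off + 3];
	return true;
}

/* Comparing one 128-bit IPv6 address costs four such checked loads. */
static bool model_ipv6_eq_ld_abs(const uint8_t *pkt, size_t len, size_t off,
				 const uint32_t want[4])
{
	for (int i = 0; i < 4; i++) {
		uint32_t w;

		if (!model_ld_abs_word(pkt, len, off + 4 * i, &w) || w != want[i])
			return false;
	}
	return true;
}
```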

> This should hopefully allow simpler (less instructions, and possibly less
> logic and maybe even less jumps) programs.  Less jumps may also mean vastly
> faster bpf verifier times (it can be exponential in the number of jumps...).
>
> This can be particularly important when trying to do something like scan
> a netlink message for a pattern (2000 iteration loop) to decide whether
> a message should be dropped, or delivered to userspace (thus waking it up).
>
> I'm requiring CAP_NET_ADMIN because I'm not sure of the security
> implications...
>
> Tested: only build tested
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> ---
>  kernel/bpf/verifier.c | 5 +++++
>  1 file changed, 5 insertions(+)
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 331b170d9fcc..0c2e25fb9844 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -3258,6 +3258,11 @@ static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
>         enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
>
>         switch (prog_type) {
> +       case BPF_PROG_TYPE_SOCKET_FILTER:
> +               if (meta || !capable(CAP_NET_ADMIN))
> +                       return false;

probably needs CAP_BPF too.

Other than that I think it's fine.

Patch

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 331b170d9fcc..0c2e25fb9844 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -3258,6 +3258,11 @@  static bool may_access_direct_pkt_data(struct bpf_verifier_env *env,
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
 
 	switch (prog_type) {
+	case BPF_PROG_TYPE_SOCKET_FILTER:
+		if (meta || !capable(CAP_NET_ADMIN))
+			return false;
+		fallthrough;
+
 	/* Program types only with direct read access go here! */
 	case BPF_PROG_TYPE_LWT_IN:
 	case BPF_PROG_TYPE_LWT_OUT:
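Below is a hypothetical userspace model of the decision the new case makes; none of these names exist in the kernel. The inputs are modelled as follows:

- helper_arg models a non-NULL meta.
- is_write models t == BPF_WRITE.
- The two capability flags model capable(CAP_NET_ADMIN) and capable(CAP_BPF).

The model assumes that the read-only group the case falls through into rejects writes, as its comment suggests. The require_cap_bpf switch models the extra CAP_BPF check suggested in review.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins for the verifier's inputs. */
struct model_ctx {
	bool helper_arg;	/* models a non-NULL 'meta' */
	bool is_write;		/* models t == BPF_WRITE */
	bool cap_net_admin;	/* models capable(CAP_NET_ADMIN) */
	bool cap_bpf;		/* models capable(CAP_BPF) */
};

static bool model_skfilter_dpa_allowed(const struct model_ctx *c,
				       bool require_cap_bpf)
{
	/* The check added by the patch. */
	if (c->helper_arg || !c->cap_net_admin)
		return false;
	/* Optional extra check suggested in review. */
	if (require_cap_bpf && !c->cap_bpf)
		return false;
	/* Fallthrough into the "direct read access only" program types. */
	if (c->is_write)
		return false;
	return true;
}
```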