diff mbox series

[bpf-next,4/4] bpf: export bpf_jit_current

Message ID 20210924095542.33697-5-lmb@cloudflare.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Fix up bpf_jit_limit some more | expand

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 7 maintainers not CCed: kpsingh@kernel.org masahiroy@kernel.org john.fastabend@gmail.com yhs@fb.com dvyukov@google.com songliubraving@fb.com kafai@fb.com
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 3339 this patch: 3339
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 35 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 3451 this patch: 3451
netdev/header_inline success Link
bpf/vmtest-bpf-next success VM_Test

Commit Message

Lorenz Bauer Sept. 24, 2021, 9:55 a.m. UTC
Expose bpf_jit_current as a read only value via sysctl.

Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
---
 include/linux/filter.h     | 1 +
 kernel/bpf/core.c          | 3 +--
 net/core/sysctl_net_core.c | 7 +++++++
 3 files changed, 9 insertions(+), 2 deletions(-)

Comments

Daniel Borkmann Sept. 27, 2021, 1:34 p.m. UTC | #1
On 9/24/21 11:55 AM, Lorenz Bauer wrote:
> Expose bpf_jit_current as a read only value via sysctl.
> 
> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
> ---
>   include/linux/filter.h     | 1 +
>   kernel/bpf/core.c          | 3 +--
>   net/core/sysctl_net_core.c | 7 +++++++
>   3 files changed, 9 insertions(+), 2 deletions(-)
> 
> diff --git a/include/linux/filter.h b/include/linux/filter.h
> index ef03ff34234d..b2143ad5ce00 100644
> --- a/include/linux/filter.h
> +++ b/include/linux/filter.h
> @@ -1052,6 +1052,7 @@ extern int bpf_jit_harden;
>   extern int bpf_jit_kallsyms;
>   extern long bpf_jit_limit;
>   extern long bpf_jit_limit_max;
> +extern atomic_long_t bpf_jit_current;
>   
>   typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);
>   
> diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
> index e844a2a4c99a..93f95e9ee8be 100644
> --- a/kernel/bpf/core.c
> +++ b/kernel/bpf/core.c
> @@ -525,6 +525,7 @@ int bpf_jit_kallsyms __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_DEFAULT_ON);
>   int bpf_jit_harden   __read_mostly;
>   long bpf_jit_limit   __read_mostly;
>   long bpf_jit_limit_max __read_mostly;
> +atomic_long_t bpf_jit_current __read_mostly;
>   
>   static void
>   bpf_prog_ksym_set_addr(struct bpf_prog *prog)
> @@ -800,8 +801,6 @@ int bpf_jit_add_poke_descriptor(struct bpf_prog *prog,
>   	return slot;
>   }
>   
> -static atomic_long_t bpf_jit_current;
> -
>   /* Can be overridden by an arch's JIT compiler if it has a custom,
>    * dedicated BPF backend memory area, or if neither of the two
>    * below apply.
> diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
> index 5f88526ad61c..674aac163b84 100644
> --- a/net/core/sysctl_net_core.c
> +++ b/net/core/sysctl_net_core.c
> @@ -421,6 +421,13 @@ static struct ctl_table net_core_table[] = {
>   		.extra1		= &long_one,
>   		.extra2		= &bpf_jit_limit_max,
>   	},
> +	{
> +		.procname	= "bpf_jit_current",
> +		.data		= &bpf_jit_current,
> +		.maxlen		= sizeof(long),
> +		.mode		= 0400,
> +		.proc_handler	= proc_dolongvec_minmax_bpf_restricted,

Overall series looks good to me. The only nit I would have is that the above could (in theory)
be subject to atomic_long_t vs long type confusion. I would rather prefer to have a small handler
which properly reads out the atomic_long_t and then passes it onwards as a temporary/plain long
to user space.

Thanks,
Daniel

> +	},
>   #endif
>   	{
>   		.procname	= "netdev_tstamp_prequeue",
>
Jakub Sitnicki Sept. 27, 2021, 2:01 p.m. UTC | #2
On Mon, Sep 27, 2021 at 03:34 PM CEST, Daniel Borkmann wrote:
> On 9/24/21 11:55 AM, Lorenz Bauer wrote:
>> Expose bpf_jit_current as a read only value via sysctl.
>> Signed-off-by: Lorenz Bauer <lmb@cloudflare.com>
>> ---

I find exposing stats via system configuration variables a bit
unexpected. Not sure if there is any example today that we're following.

Maybe an entry under /sys/kernel/debug would be a better fit?

That way we don't have to commit to a sysctl that might go away if we
start charging JIT allocs against memory cgroup quota.

Although that brings up question against which cgroup iptables xt_bpf
allocations should be charged? Root cgroup?
Lorenz Bauer Sept. 28, 2021, 9:02 a.m. UTC | #3
On Mon, 27 Sept 2021 at 15:01, Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> I find exposing stats via system configuration variables a bit
> unexpected. Not sure if there is any example today that we're following.
>
> Maybe an entry under /sys/kernel/debug would be a better fit?
>
> That way we don't have to commit to a sysctl that might go away if we
> start charging JIT allocs against memory cgroup quota.

I had a look around, there are no other obvious places in debugfs or
proc where we already have bpf info exposed. It currently all goes via
sysctl.

There are examples of readonly sysctls:
$ sudo find /proc/sys -perm 0444 | wc -l
90

There are no examples of sysctls with mode 0400 however:
$ sudo find /proc/sys -perm 0400 | wc -l
0

I find it kind of weird that the bpf sysctls are so tightly locked
down (CAP_SYS_ADMIN && root) even for reading. Maybe something I can
change?
Nicolas Dichtel Sept. 29, 2021, 2:56 p.m. UTC | #4
Le 24/09/2021 à 11:55, Lorenz Bauer a écrit :
> Expose bpf_jit_current as a read only value via sysctl.
bpf_jit_current unit is 'pages' and bpf_jit_limit unit is bytes.
Maybe exposing those values with the same unit will ease debugging.


Regards,
Nicolas
diff mbox series

Patch

diff --git a/include/linux/filter.h b/include/linux/filter.h
index ef03ff34234d..b2143ad5ce00 100644
--- a/include/linux/filter.h
+++ b/include/linux/filter.h
@@ -1052,6 +1052,7 @@  extern int bpf_jit_harden;
 extern int bpf_jit_kallsyms;
 extern long bpf_jit_limit;
 extern long bpf_jit_limit_max;
+extern atomic_long_t bpf_jit_current;
 
 typedef void (*bpf_jit_fill_hole_t)(void *area, unsigned int size);
 
diff --git a/kernel/bpf/core.c b/kernel/bpf/core.c
index e844a2a4c99a..93f95e9ee8be 100644
--- a/kernel/bpf/core.c
+++ b/kernel/bpf/core.c
@@ -525,6 +525,7 @@  int bpf_jit_kallsyms __read_mostly = IS_BUILTIN(CONFIG_BPF_JIT_DEFAULT_ON);
 int bpf_jit_harden   __read_mostly;
 long bpf_jit_limit   __read_mostly;
 long bpf_jit_limit_max __read_mostly;
+atomic_long_t bpf_jit_current __read_mostly;
 
 static void
 bpf_prog_ksym_set_addr(struct bpf_prog *prog)
@@ -800,8 +801,6 @@  int bpf_jit_add_poke_descriptor(struct bpf_prog *prog,
 	return slot;
 }
 
-static atomic_long_t bpf_jit_current;
-
 /* Can be overridden by an arch's JIT compiler if it has a custom,
  * dedicated BPF backend memory area, or if neither of the two
  * below apply.
diff --git a/net/core/sysctl_net_core.c b/net/core/sysctl_net_core.c
index 5f88526ad61c..674aac163b84 100644
--- a/net/core/sysctl_net_core.c
+++ b/net/core/sysctl_net_core.c
@@ -421,6 +421,13 @@  static struct ctl_table net_core_table[] = {
 		.extra1		= &long_one,
 		.extra2		= &bpf_jit_limit_max,
 	},
+	{
+		.procname	= "bpf_jit_current",
+		.data		= &bpf_jit_current,
+		.maxlen		= sizeof(long),
+		.mode		= 0400,
+		.proc_handler	= proc_dolongvec_minmax_bpf_restricted,
+	},
 #endif
 	{
 		.procname	= "netdev_tstamp_prequeue",