
[v4,bpf-next,2/3] mm/bpf: Add bpf_get_kmem_cache() kfunc

Message ID 20241002180956.1781008-3-namhyung@kernel.org (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf: Add kmem_cache iterator and kfunc

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 9 this patch: 9
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 22 of 22 maintainers
netdev/build_clang success Errors and warnings before: 7 this patch: 7
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 62 this patch: 62
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 32 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 10 this patch: 10
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-10 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / test (test_verifier, false, 360) / test_verifier on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test (test_maps, false, 360) / test_maps on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test (test_progs, false, 360) / test_progs on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-17 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-16 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-18 success Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / test (test_verifier, false, 360) / test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-35 success Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-41 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test (test_progs, false, 360) / test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-40 success Logs for x86_64-llvm-18 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-36 success Logs for x86_64-llvm-18 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-gcc / test (test_progs_no_alu32_parallel, true, 30) / test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / test (test_progs, false, 360) / test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test (test_maps, false, 360) / test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-25 success Logs for x86_64-gcc / test (test_verifier, false, 360) / test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-gcc / veristat / veristat on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-gcc / test (test_progs_parallel, true, 30) / test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-37 success Logs for x86_64-llvm-18 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-38 success Logs for x86_64-llvm-18 / test (test_progs_cpuv4, false, 360) / test_progs_cpuv4 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-39 success Logs for x86_64-llvm-18 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-30 success Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-31 success Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17

Commit Message

Namhyung Kim Oct. 2, 2024, 6:09 p.m. UTC
The bpf_get_kmem_cache() kfunc returns the slab cache information for a
given virtual address, similar to virt_to_cache().  If the address points
to a slab object, it returns a valid kmem_cache pointer; otherwise it
returns NULL.

It doesn't take a reference count on the kmem_cache, so the caller is
responsible for managing access to it.  The intended use case for now is
to symbolize locks in slab objects from the lock contention tracepoints.

Suggested-by: Vlastimil Babka <vbabka@suse.cz>
Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
Signed-off-by: Namhyung Kim <namhyung@kernel.org>
---
 kernel/bpf/helpers.c |  1 +
 mm/slab_common.c     | 19 +++++++++++++++++++
 2 files changed, 20 insertions(+)
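
For illustration only (not part of the patch): a minimal sketch of how a
tracing program could consume this kfunc from the lock contention
tracepoint, using the returned pointer purely as a map key.  The program,
map and variable names below are made up; only bpf_get_kmem_cache() and
the contention_begin tracepoint arguments come from the kernel.

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;

/* per-cache contention counter, keyed by the kmem_cache pointer value */
struct {
	__uint(type, BPF_MAP_TYPE_HASH);
	__uint(max_entries, 1024);
	__type(key, struct kmem_cache *);
	__type(value, __u64);
} slab_lock_stats SEC(".maps");

SEC("tp_btf/contention_begin")
int BPF_PROG(count_slab_lock, void *lock, unsigned int flags)
{
	struct kmem_cache *s = bpf_get_kmem_cache((__u64)lock);
	__u64 one = 1, *cnt;

	if (!s)		/* the lock is not embedded in a slab object */
		return 0;

	cnt = bpf_map_lookup_elem(&slab_lock_stats, &s);
	if (cnt)
		__sync_fetch_and_add(cnt, 1);
	else
		bpf_map_update_elem(&slab_lock_stats, &s, &one, BPF_ANY);
	return 0;
}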

Comments

Namhyung Kim Oct. 4, 2024, 5:31 a.m. UTC | #1
On Wed, Oct 02, 2024 at 11:09:55AM -0700, Namhyung Kim wrote:
> The bpf_get_kmem_cache() is to get a slab cache information from a
> virtual address like virt_to_cache().  If the address is a pointer
> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> NULL is returned.
> 
> It doesn't grab a reference count of the kmem_cache so the caller is
> responsible to manage the access.  The intended use case for now is to
> symbolize locks in slab objects from the lock contention tracepoints.
> 
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/bpf/helpers.c |  1 +
>  mm/slab_common.c     | 19 +++++++++++++++++++
>  2 files changed, 20 insertions(+)
> 
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 4053f279ed4cc7ab..3709fb14288105c6 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
>  BTF_KFUNCS_END(common_btf_ids)
>  
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 7443244656150325..5484e1cd812f698e 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
>  }
>  EXPORT_SYMBOL(ksize);
>  
> +#ifdef CONFIG_BPF_SYSCALL
> +#include <linux/btf.h>
> +
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> +{
> +	struct slab *slab;
> +
> +	if (!virt_addr_valid(addr))

Hmm.. 32-bit systems don't like this.  Is it ok to change the type of
the parameter (addr) to 'unsigned long'?  Or do you want to keep it as
u64 and add a cast here?

Thanks,
Namhyung
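
For illustration, the second option (keeping u64 and casting) could look
roughly like this; it is only a sketch of the idea, not a tested change:

__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
{
	/* a valid kernel virtual address always fits in unsigned long,
	 * so the explicit cast avoids the 32-bit build warning
	 */
	unsigned long kaddr = (unsigned long)addr;
	struct slab *slab;

	if (!virt_addr_valid(kaddr))
		return NULL;

	slab = virt_to_slab((void *)kaddr);
	return slab ? slab->slab_cache : NULL;
}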


> +		return NULL;
> +
> +	slab = virt_to_slab((void *)(long)addr);
> +	return slab ? slab->slab_cache : NULL;
> +}
> +
> +__bpf_kfunc_end_defs();
> +#endif /* CONFIG_BPF_SYSCALL */
> +
>  /* Tracepoints definitions. */
>  EXPORT_TRACEPOINT_SYMBOL(kmalloc);
>  EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
> -- 
> 2.46.1.824.gd892dcdcdd-goog
>
Song Liu Oct. 4, 2024, 8:10 p.m. UTC | #2
On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> The bpf_get_kmem_cache() is to get a slab cache information from a
> virtual address like virt_to_cache().  If the address is a pointer
> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> NULL is returned.
>
> It doesn't grab a reference count of the kmem_cache so the caller is
> responsible to manage the access.  The intended use case for now is to
> symbolize locks in slab objects from the lock contention tracepoints.
>
> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> ---
>  kernel/bpf/helpers.c |  1 +
>  mm/slab_common.c     | 19 +++++++++++++++++++
>  2 files changed, 20 insertions(+)
>
> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> index 4053f279ed4cc7ab..3709fb14288105c6 100644
> --- a/kernel/bpf/helpers.c
> +++ b/kernel/bpf/helpers.c
> @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
>  BTF_KFUNCS_END(common_btf_ids)
>
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 7443244656150325..5484e1cd812f698e 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
>  }
>  EXPORT_SYMBOL(ksize);
>
> +#ifdef CONFIG_BPF_SYSCALL
> +#include <linux/btf.h>
> +
> +__bpf_kfunc_start_defs();
> +
> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> +{
> +       struct slab *slab;
> +
> +       if (!virt_addr_valid(addr))
> +               return NULL;
> +
> +       slab = virt_to_slab((void *)(long)addr);
> +       return slab ? slab->slab_cache : NULL;
> +}

Do we need to hold a refcount to the slab_cache? Given
we make this kfunc available everywhere, including
sleepable contexts, I think it is necessary.

Thanks
Song

> +
> +__bpf_kfunc_end_defs();
> +#endif /* CONFIG_BPF_SYSCALL */
> +
>  /* Tracepoints definitions. */
>  EXPORT_TRACEPOINT_SYMBOL(kmalloc);
>  EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
> --
> 2.46.1.824.gd892dcdcdd-goog
>
Roman Gushchin Oct. 4, 2024, 9:25 p.m. UTC | #3
On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > The bpf_get_kmem_cache() is to get a slab cache information from a
> > virtual address like virt_to_cache().  If the address is a pointer
> > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > NULL is returned.
> >
> > It doesn't grab a reference count of the kmem_cache so the caller is
> > responsible to manage the access.  The intended use case for now is to
> > symbolize locks in slab objects from the lock contention tracepoints.
> >
> > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > ---
> >  kernel/bpf/helpers.c |  1 +
> >  mm/slab_common.c     | 19 +++++++++++++++++++
> >  2 files changed, 20 insertions(+)
> >
> > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > --- a/kernel/bpf/helpers.c
> > +++ b/kernel/bpf/helpers.c
> > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> >  BTF_KFUNCS_END(common_btf_ids)
> >
> >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 7443244656150325..5484e1cd812f698e 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> >  }
> >  EXPORT_SYMBOL(ksize);
> >
> > +#ifdef CONFIG_BPF_SYSCALL
> > +#include <linux/btf.h>
> > +
> > +__bpf_kfunc_start_defs();
> > +
> > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > +{
> > +       struct slab *slab;
> > +
> > +       if (!virt_addr_valid(addr))
> > +               return NULL;
> > +
> > +       slab = virt_to_slab((void *)(long)addr);
> > +       return slab ? slab->slab_cache : NULL;
> > +}
> 
> Do we need to hold a refcount to the slab_cache? Given
> we make this kfunc available everywhere, including
> sleepable contexts, I think it is necessary.

It's a really good question.

If the callee somehow owns the slab object, as in the example
provided in the series (current task), it's not necessary.

If a user can pass a random address, you're right, we need to
grab the slab_cache's refcnt.  But then we also can't guarantee
that the object still belongs to the same slab_cache, so the
function becomes racy by definition.
Song Liu Oct. 4, 2024, 9:36 p.m. UTC | #4
On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
>
> On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >
> > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > virtual address like virt_to_cache().  If the address is a pointer
> > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > NULL is returned.
> > >
> > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > responsible to manage the access.  The intended use case for now is to
> > > symbolize locks in slab objects from the lock contention tracepoints.
> > >
> > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > ---
> > >  kernel/bpf/helpers.c |  1 +
> > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > >  2 files changed, 20 insertions(+)
> > >
> > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > --- a/kernel/bpf/helpers.c
> > > +++ b/kernel/bpf/helpers.c
> > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > >  BTF_KFUNCS_END(common_btf_ids)
> > >
> > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > index 7443244656150325..5484e1cd812f698e 100644
> > > --- a/mm/slab_common.c
> > > +++ b/mm/slab_common.c
> > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > >  }
> > >  EXPORT_SYMBOL(ksize);
> > >
> > > +#ifdef CONFIG_BPF_SYSCALL
> > > +#include <linux/btf.h>
> > > +
> > > +__bpf_kfunc_start_defs();
> > > +
> > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > +{
> > > +       struct slab *slab;
> > > +
> > > +       if (!virt_addr_valid(addr))
> > > +               return NULL;
> > > +
> > > +       slab = virt_to_slab((void *)(long)addr);
> > > +       return slab ? slab->slab_cache : NULL;
> > > +}
> >
> > Do we need to hold a refcount to the slab_cache? Given
> > we make this kfunc available everywhere, including
> > sleepable contexts, I think it is necessary.
>
> It's a really good question.
>
> If the callee somehow owns the slab object, as in the example
> provided in the series (current task), it's not necessarily.
>
> If a user can pass a random address, you're right, we need to
> grab the slab_cache's refcnt. But then we also can't guarantee
> that the object still belongs to the same slab_cache, the
> function becomes racy by the definition.

To be safe, we can limit the kfunc to sleepable context only. Then
we can lock slab_mutex for virt_to_slab, and hold a refcount
to slab_cache. We will need a KF_RELEASE kfunc to release
the refcount later.
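
As a rough sketch of that shape (the release kfunc name and the flag
combination below are assumptions, not an actual proposal):

/* acquire side: sleepable only, may return NULL */
BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_ACQUIRE | KF_RET_NULL | KF_SLEEPABLE)
/* matching release kfunc that drops the reference taken above */
BTF_ID_FLAGS(func, bpf_put_kmem_cache, KF_RELEASE)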

IIUC, this limitation (sleepable context only) shouldn't be a problem
for perf use case?

Thanks,
Song
Namhyung Kim Oct. 4, 2024, 9:58 p.m. UTC | #5
On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> >
> > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >
> > > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > virtual address like virt_to_cache().  If the address is a pointer
> > > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > NULL is returned.
> > > >
> > > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > > responsible to manage the access.  The intended use case for now is to
> > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > >
> > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > ---
> > > >  kernel/bpf/helpers.c |  1 +
> > > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > > >  2 files changed, 20 insertions(+)
> > > >
> > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > --- a/kernel/bpf/helpers.c
> > > > +++ b/kernel/bpf/helpers.c
> > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > >  BTF_KFUNCS_END(common_btf_ids)
> > > >
> > > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > --- a/mm/slab_common.c
> > > > +++ b/mm/slab_common.c
> > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > >  }
> > > >  EXPORT_SYMBOL(ksize);
> > > >
> > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > +#include <linux/btf.h>
> > > > +
> > > > +__bpf_kfunc_start_defs();
> > > > +
> > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > +{
> > > > +       struct slab *slab;
> > > > +
> > > > +       if (!virt_addr_valid(addr))
> > > > +               return NULL;
> > > > +
> > > > +       slab = virt_to_slab((void *)(long)addr);
> > > > +       return slab ? slab->slab_cache : NULL;
> > > > +}
> > >
> > > Do we need to hold a refcount to the slab_cache? Given
> > > we make this kfunc available everywhere, including
> > > sleepable contexts, I think it is necessary.
> >
> > It's a really good question.
> >
> > If the callee somehow owns the slab object, as in the example
> > provided in the series (current task), it's not necessarily.
> >
> > If a user can pass a random address, you're right, we need to
> > grab the slab_cache's refcnt. But then we also can't guarantee
> > that the object still belongs to the same slab_cache, the
> > function becomes racy by the definition.
> 
> To be safe, we can limit the kfunc to sleepable context only. Then
> we can lock slab_mutex for virt_to_slab, and hold a refcount
> to slab_cache. We will need a KF_RELEASE kfunc to release
> the refcount later.

Then it needs to call kmem_cache_destroy() for the release, which
contains an rcu_barrier(). :(

> 
> IIUC, this limitation (sleepable context only) shouldn't be a problem
> for perf use case?

No, it would be called from the lock contention path including
spinlocks. :(

Can we limit it to non-sleepable ctx and not to pass an arbitrary address
somehow (or not to save the result pointer)?

Thanks,
Namhyung
Song Liu Oct. 4, 2024, 10:57 p.m. UTC | #6
On Fri, Oct 4, 2024 at 2:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> > On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > >
> > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >
> > > > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > > virtual address like virt_to_cache().  If the address is a pointer
> > > > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > > NULL is returned.
> > > > >
> > > > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > > > responsible to manage the access.  The intended use case for now is to
> > > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > > >
> > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > ---
> > > > >  kernel/bpf/helpers.c |  1 +
> > > > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > > > >  2 files changed, 20 insertions(+)
> > > > >
> > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > > --- a/kernel/bpf/helpers.c
> > > > > +++ b/kernel/bpf/helpers.c
> > > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > > >  BTF_KFUNCS_END(common_btf_ids)
> > > > >
> > > > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > > --- a/mm/slab_common.c
> > > > > +++ b/mm/slab_common.c
> > > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > > >  }
> > > > >  EXPORT_SYMBOL(ksize);
> > > > >
> > > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > > +#include <linux/btf.h>
> > > > > +
> > > > > +__bpf_kfunc_start_defs();
> > > > > +
> > > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > > +{
> > > > > +       struct slab *slab;
> > > > > +
> > > > > +       if (!virt_addr_valid(addr))
> > > > > +               return NULL;
> > > > > +
> > > > > +       slab = virt_to_slab((void *)(long)addr);
> > > > > +       return slab ? slab->slab_cache : NULL;
> > > > > +}
> > > >
> > > > Do we need to hold a refcount to the slab_cache? Given
> > > > we make this kfunc available everywhere, including
> > > > sleepable contexts, I think it is necessary.
> > >
> > > It's a really good question.
> > >
> > > If the callee somehow owns the slab object, as in the example
> > > provided in the series (current task), it's not necessarily.
> > >
> > > If a user can pass a random address, you're right, we need to
> > > grab the slab_cache's refcnt. But then we also can't guarantee
> > > that the object still belongs to the same slab_cache, the
> > > function becomes racy by the definition.
> >
> > To be safe, we can limit the kfunc to sleepable context only. Then
> > we can lock slab_mutex for virt_to_slab, and hold a refcount
> > to slab_cache. We will need a KF_RELEASE kfunc to release
> > the refcount later.
>
> Then it needs to call kmem_cache_destroy() for release which contains
> rcu_barrier. :(
>
> >
> > IIUC, this limitation (sleepable context only) shouldn't be a problem
> > for perf use case?
>
> No, it would be called from the lock contention path including
> spinlocks. :(
>
> Can we limit it to non-sleepable ctx and not to pass arbtrary address
> somehow (or not to save the result pointer)?

I hacked something like the following.  It is not ideal, because we are
taking a spinlock_t pointer instead of a void pointer.  To use this with
a void pointer, we will need some verifier changes.

Thanks,
Song


diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
index 3709fb142881..7311a26ecb01 100644
--- i/kernel/bpf/helpers.c
+++ w/kernel/bpf/helpers.c
@@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
-BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
+BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS | KF_RCU_PROTECTED)
 BTF_KFUNCS_END(common_btf_ids)

 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git i/mm/slab_common.c w/mm/slab_common.c
index 5484e1cd812f..3e3e5f172f2e 100644
--- i/mm/slab_common.c
+++ w/mm/slab_common.c
@@ -1327,14 +1327,15 @@ EXPORT_SYMBOL(ksize);

 __bpf_kfunc_start_defs();

-__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
+__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr)
 {
        struct slab *slab;
+       unsigned long a = (unsigned long)addr;

-       if (!virt_addr_valid(addr))
+       if (!virt_addr_valid(a))
                return NULL;

-       slab = virt_to_slab((void *)(long)addr);
+       slab = virt_to_slab(addr);
        return slab ? slab->slab_cache : NULL;
 }

@@ -1346,4 +1347,3 @@ EXPORT_TRACEPOINT_SYMBOL(kmalloc);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
 EXPORT_TRACEPOINT_SYMBOL(kfree);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
-
diff --git i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
index 3f6ec15a1bf6..8238155a5055 100644
--- i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
+++ w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
@@ -16,7 +16,7 @@ struct {
        __uint(max_entries, 1024);
 } slab_hash SEC(".maps");

-extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
+extern struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr) __ksym;

 /* result, will be checked by userspace */
 int found;
@@ -46,21 +46,23 @@ int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
 SEC("raw_tp/bpf_test_finish")
 int BPF_PROG(check_task_struct)
 {
-       __u64 curr = bpf_get_current_task();
+       struct task_struct *curr = bpf_get_current_task_btf();
        struct kmem_cache *s;
        char *name;

-       s = bpf_get_kmem_cache(curr);
+       s = bpf_get_kmem_cache(&curr->alloc_lock);
        if (s == NULL) {
                found = -1;
                return 0;
        }

+       bpf_rcu_read_lock();
        name = bpf_map_lookup_elem(&slab_hash, &s);
        if (name && !bpf_strncmp(name, 11, "task_struct"))
                found = 1;
        else
                found = -2;
+       bpf_rcu_read_unlock();

        return 0;
 }
Namhyung Kim Oct. 4, 2024, 11:28 p.m. UTC | #7
On Fri, Oct 04, 2024 at 03:57:26PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 2:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> > > On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > > > virtual address like virt_to_cache().  If the address is a pointer
> > > > > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > > > NULL is returned.
> > > > > >
> > > > > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > > > > responsible to manage the access.  The intended use case for now is to
> > > > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > > > >
> > > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > >  kernel/bpf/helpers.c |  1 +
> > > > > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > > > > >  2 files changed, 20 insertions(+)
> > > > > >
> > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > > > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > > > >  BTF_KFUNCS_END(common_btf_ids)
> > > > > >
> > > > > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > > > --- a/mm/slab_common.c
> > > > > > +++ b/mm/slab_common.c
> > > > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(ksize);
> > > > > >
> > > > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > > > +#include <linux/btf.h>
> > > > > > +
> > > > > > +__bpf_kfunc_start_defs();
> > > > > > +
> > > > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > > > +{
> > > > > > +       struct slab *slab;
> > > > > > +
> > > > > > +       if (!virt_addr_valid(addr))
> > > > > > +               return NULL;
> > > > > > +
> > > > > > +       slab = virt_to_slab((void *)(long)addr);
> > > > > > +       return slab ? slab->slab_cache : NULL;
> > > > > > +}
> > > > >
> > > > > Do we need to hold a refcount to the slab_cache? Given
> > > > > we make this kfunc available everywhere, including
> > > > > sleepable contexts, I think it is necessary.
> > > >
> > > > It's a really good question.
> > > >
> > > > If the callee somehow owns the slab object, as in the example
> > > > provided in the series (current task), it's not necessarily.
> > > >
> > > > If a user can pass a random address, you're right, we need to
> > > > grab the slab_cache's refcnt. But then we also can't guarantee
> > > > that the object still belongs to the same slab_cache, the
> > > > function becomes racy by the definition.
> > >
> > > To be safe, we can limit the kfunc to sleepable context only. Then
> > > we can lock slab_mutex for virt_to_slab, and hold a refcount
> > > to slab_cache. We will need a KF_RELEASE kfunc to release
> > > the refcount later.
> >
> > Then it needs to call kmem_cache_destroy() for release which contains
> > rcu_barrier. :(
> >
> > >
> > > IIUC, this limitation (sleepable context only) shouldn't be a problem
> > > for perf use case?
> >
> > No, it would be called from the lock contention path including
> > spinlocks. :(
> >
> > Can we limit it to non-sleepable ctx and not to pass arbtrary address
> > somehow (or not to save the result pointer)?
> 
> I hacked something like the following. It is not ideal, because we are
> taking spinlock_t pointer instead of void pointer. To use this with void
> 'pointer, we will need some verifier changes.

Thanks a lot for doing this!!  I'll take a look at the verifier to see
what needs to be done.

Namhyung

> 
> 
> diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> index 3709fb142881..7311a26ecb01 100644
> --- i/kernel/bpf/helpers.c
> +++ w/kernel/bpf/helpers.c
> @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
> | KF_RCU_PROTECTED)
>  BTF_KFUNCS_END(common_btf_ids)
> 
>  static const struct btf_kfunc_id_set common_kfunc_set = {
> diff --git i/mm/slab_common.c w/mm/slab_common.c
> index 5484e1cd812f..3e3e5f172f2e 100644
> --- i/mm/slab_common.c
> +++ w/mm/slab_common.c
> @@ -1327,14 +1327,15 @@ EXPORT_SYMBOL(ksize);
> 
>  __bpf_kfunc_start_defs();
> 
> -__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr)
>  {
>         struct slab *slab;
> +       unsigned long a = (unsigned long)addr;
> 
> -       if (!virt_addr_valid(addr))
> +       if (!virt_addr_valid(a))
>                 return NULL;
> 
> -       slab = virt_to_slab((void *)(long)addr);
> +       slab = virt_to_slab(addr);
>         return slab ? slab->slab_cache : NULL;
>  }
> 
> @@ -1346,4 +1347,3 @@ EXPORT_TRACEPOINT_SYMBOL(kmalloc);
>  EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);
>  EXPORT_TRACEPOINT_SYMBOL(kfree);
>  EXPORT_TRACEPOINT_SYMBOL(kmem_cache_free);
> -
> diff --git i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> index 3f6ec15a1bf6..8238155a5055 100644
> --- i/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> +++ w/tools/testing/selftests/bpf/progs/kmem_cache_iter.c
> @@ -16,7 +16,7 @@ struct {
>         __uint(max_entries, 1024);
>  } slab_hash SEC(".maps");
> 
> -extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;
> +extern struct kmem_cache *bpf_get_kmem_cache(spinlock_t *addr) __ksym;
> 
>  /* result, will be checked by userspace */
>  int found;
> @@ -46,21 +46,23 @@ int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
>  SEC("raw_tp/bpf_test_finish")
>  int BPF_PROG(check_task_struct)
>  {
> -       __u64 curr = bpf_get_current_task();
> +       struct task_struct *curr = bpf_get_current_task_btf();
>         struct kmem_cache *s;
>         char *name;
> 
> -       s = bpf_get_kmem_cache(curr);
> +       s = bpf_get_kmem_cache(&curr->alloc_lock);
>         if (s == NULL) {
>                 found = -1;
>                 return 0;
>         }
> 
> +       bpf_rcu_read_lock();
>         name = bpf_map_lookup_elem(&slab_hash, &s);
>         if (name && !bpf_strncmp(name, 11, "task_struct"))
>                 found = 1;
>         else
>                 found = -2;
> +       bpf_rcu_read_unlock();
> 
>         return 0;
>  }
Alexei Starovoitov Oct. 4, 2024, 11:44 p.m. UTC | #8
On Fri, Oct 4, 2024 at 3:57 PM Song Liu <song@kernel.org> wrote:
>
> On Fri, Oct 4, 2024 at 2:58 PM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Fri, Oct 04, 2024 at 02:36:30PM -0700, Song Liu wrote:
> > > On Fri, Oct 4, 2024 at 2:25 PM Roman Gushchin <roman.gushchin@linux.dev> wrote:
> > > >
> > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > > On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > > >
> > > > > > The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > > > virtual address like virt_to_cache().  If the address is a pointer
> > > > > > to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > > > NULL is returned.
> > > > > >
> > > > > > It doesn't grab a reference count of the kmem_cache so the caller is
> > > > > > responsible to manage the access.  The intended use case for now is to
> > > > > > symbolize locks in slab objects from the lock contention tracepoints.
> > > > > >
> > > > > > Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > > > Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > > > Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > > > Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > > > > ---
> > > > > >  kernel/bpf/helpers.c |  1 +
> > > > > >  mm/slab_common.c     | 19 +++++++++++++++++++
> > > > > >  2 files changed, 20 insertions(+)
> > > > > >
> > > > > > diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> > > > > > index 4053f279ed4cc7ab..3709fb14288105c6 100644
> > > > > > --- a/kernel/bpf/helpers.c
> > > > > > +++ b/kernel/bpf/helpers.c
> > > > > > @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > > > > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > > > > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > > > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > > > >  BTF_KFUNCS_END(common_btf_ids)
> > > > > >
> > > > > >  static const struct btf_kfunc_id_set common_kfunc_set = {
> > > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > > > > > index 7443244656150325..5484e1cd812f698e 100644
> > > > > > --- a/mm/slab_common.c
> > > > > > +++ b/mm/slab_common.c
> > > > > > @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> > > > > >  }
> > > > > >  EXPORT_SYMBOL(ksize);
> > > > > >
> > > > > > +#ifdef CONFIG_BPF_SYSCALL
> > > > > > +#include <linux/btf.h>
> > > > > > +
> > > > > > +__bpf_kfunc_start_defs();
> > > > > > +
> > > > > > +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> > > > > > +{
> > > > > > +       struct slab *slab;
> > > > > > +
> > > > > > +       if (!virt_addr_valid(addr))
> > > > > > +               return NULL;
> > > > > > +
> > > > > > +       slab = virt_to_slab((void *)(long)addr);
> > > > > > +       return slab ? slab->slab_cache : NULL;
> > > > > > +}
> > > > >
> > > > > Do we need to hold a refcount to the slab_cache? Given
> > > > > we make this kfunc available everywhere, including
> > > > > sleepable contexts, I think it is necessary.
> > > >
> > > > It's a really good question.
> > > >
> > > > If the callee somehow owns the slab object, as in the example
> > > > provided in the series (current task), it's not necessarily.
> > > >
> > > > If a user can pass a random address, you're right, we need to
> > > > grab the slab_cache's refcnt. But then we also can't guarantee
> > > > that the object still belongs to the same slab_cache, the
> > > > function becomes racy by the definition.
> > >
> > > To be safe, we can limit the kfunc to sleepable context only. Then
> > > we can lock slab_mutex for virt_to_slab, and hold a refcount
> > > to slab_cache. We will need a KF_RELEASE kfunc to release
> > > the refcount later.
> >
> > Then it needs to call kmem_cache_destroy() for release which contains
> > rcu_barrier. :(
> >
> > >
> > > IIUC, this limitation (sleepable context only) shouldn't be a problem
> > > for perf use case?
> >
> > No, it would be called from the lock contention path including
> > spinlocks. :(
> >
> > Can we limit it to non-sleepable ctx and not to pass arbtrary address
> > somehow (or not to save the result pointer)?
>
> I hacked something like the following. It is not ideal, because we are
> taking spinlock_t pointer instead of void pointer. To use this with void
> 'pointer, we will need some verifier changes.
>
> Thanks,
> Song
>
>
> diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> index 3709fb142881..7311a26ecb01 100644
> --- i/kernel/bpf/helpers.c
> +++ w/kernel/bpf/helpers.c
> @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
> | KF_RCU_PROTECTED)

I don't think the KF_TRUSTED_ARGS approach would fit here.
Namhyung's use case is tracing.  The 'addr' will be some potentially
arbitrary address from somewhere.  The chance of seeing a trusted pointer
is probably very low in such a tracing use case.

The verifier change can mainly be the following:

diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 7d9b38ffd220..e09eb108e956 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -12834,6 +12834,9 @@ static int check_kfunc_call(struct bpf_verifier_env *env, struct bpf_insn *insn,
                        regs[BPF_REG_0].type = PTR_TO_BTF_ID;
                        regs[BPF_REG_0].btf_id = ptr_type_id;

+                       if (meta.func_id == special_kfunc_list[KF_get_kmem_cache])
+                               regs[BPF_REG_0].type |= PTR_UNTRUSTED;
+
                        if (is_iter_next_kfunc(&meta)) {
                                struct bpf_reg_state *cur_iter;

The returned 'struct kmem_cache *' won't be refcnt-ed (acquired).
It will be readonly via ptr_to_btf_id logic.
Access to s->flags, s->size and s->offset will be allowed, but the
verifier will sanitize the loads with an inlined version of
probe_read_kernel.
Even KF_RET_NULL can be dropped.
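
For illustration, a guess at what the BPF side could look like with that
verifier change in place (the program and tracepoint wiring are
assumptions; per the note above, the field loads on the untrusted pointer
are what the verifier would sanitize):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>

char LICENSE[] SEC("license") = "GPL";

extern struct kmem_cache *bpf_get_kmem_cache(__u64 addr) __ksym;

SEC("tp_btf/contention_begin")
int BPF_PROG(dump_cache_info, void *lock, unsigned int flags)
{
	struct kmem_cache *s = bpf_get_kmem_cache((__u64)lock);

	/* a NULL check is still cheap even if KF_RET_NULL ends up dropped */
	if (s)
		bpf_printk("slab lock: size=%u object_size=%u",
			   s->size, s->object_size);
	return 0;
}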
Song Liu Oct. 4, 2024, 11:56 p.m. UTC | #9
On Fri, Oct 4, 2024 at 4:44 PM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
[...]
> > diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> > index 3709fb142881..7311a26ecb01 100644
> > --- i/kernel/bpf/helpers.c
> > +++ w/kernel/bpf/helpers.c
> > @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
> > | KF_RCU_PROTECTED)
>
> I don't think KF_TRUSTED_ARGS approach would fit here.
> Namhyung's use case is tracing. The 'addr' will be some potentially
> arbitrary address from somewhere. The chance to see a trusted pointer
> is probably very low in such a tracing use case.

I thought the primary use case was to trace lock contention, for
example, queued_spin_lock_slowpath(). Of course, a more
general solution is better.

>
> The verifier change can mainly be the following:
>
> diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> index 7d9b38ffd220..e09eb108e956 100644
> --- a/kernel/bpf/verifier.c
> +++ b/kernel/bpf/verifier.c
> @@ -12834,6 +12834,9 @@ static int check_kfunc_call(struct
> bpf_verifier_env *env, struct bpf_insn *insn,
>                         regs[BPF_REG_0].type = PTR_TO_BTF_ID;
>                         regs[BPF_REG_0].btf_id = ptr_type_id;
>
> +                       if (meta.func_id ==
> special_kfunc_list[KF_get_kmem_cache])
> +                               regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> +
>                         if (is_iter_next_kfunc(&meta)) {
>                                 struct bpf_reg_state *cur_iter;

This is easier than I thought.

Thanks,
Song

> The returned 'struct kmem_cache *' won't be refcnt-ed (acquired).
> It will be readonly via ptr_to_btf_id logic.
> s->flags;
> s->size;
> s->offset;
> access will be allowed but the verifier will sanitize them
> with an inlined version of probe_read_kernel.
> Even KF_RET_NULL can be dropped.
Namhyung Kim Oct. 6, 2024, 7 p.m. UTC | #10
Hello,

On Fri, Oct 04, 2024 at 04:56:57PM -0700, Song Liu wrote:
> On Fri, Oct 4, 2024 at 4:44 PM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> [...]
> > > diff --git i/kernel/bpf/helpers.c w/kernel/bpf/helpers.c
> > > index 3709fb142881..7311a26ecb01 100644
> > > --- i/kernel/bpf/helpers.c
> > > +++ w/kernel/bpf/helpers.c
> > > @@ -3090,7 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> > >  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> > >  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> > >  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> > > -BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> > > +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL | KF_TRUSTED_ARGS
> > > | KF_RCU_PROTECTED)
> >
> > I don't think KF_TRUSTED_ARGS approach would fit here.
> > Namhyung's use case is tracing. The 'addr' will be some potentially
> > arbitrary address from somewhere. The chance to see a trusted pointer
> > is probably very low in such a tracing use case.
> 
> I thought the primary use case was to trace lock contention, for
> example, queued_spin_lock_slowpath(). Of course, a more
> general solution is better.

Right, my intended use case is the lock contention profiling, so probably
it's ok to limit it to trusted pointers if it helps.  But as Song said,
a general solution would be better. :)

> 
> >
> > The verifier change can mainly be the following:
> >
> > diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
> > index 7d9b38ffd220..e09eb108e956 100644
> > --- a/kernel/bpf/verifier.c
> > +++ b/kernel/bpf/verifier.c
> > @@ -12834,6 +12834,9 @@ static int check_kfunc_call(struct
> > bpf_verifier_env *env, struct bpf_insn *insn,
> >                         regs[BPF_REG_0].type = PTR_TO_BTF_ID;
> >                         regs[BPF_REG_0].btf_id = ptr_type_id;
> >
> > +                       if (meta.func_id == special_kfunc_list[KF_get_kmem_cache])
> > +                               regs[BPF_REG_0].type |= PTR_UNTRUSTED;
> > +
> >                         if (is_iter_next_kfunc(&meta)) {
> >                                 struct bpf_reg_state *cur_iter;
> 
> This is easier than I thought.

Indeed!  Thanks for providing the code.

> 
> > The returned 'struct kmem_cache *' won't be refcnt-ed (acquired).
> > It will be readonly via ptr_to_btf_id logic.
> > s->flags;
> > s->size;
> > s->offset;
> > access will be allowed but the verifier will sanitize them
> > with an inlined version of probe_read_kernel.
> > Even KF_RET_NULL can be dropped.

Ok, I'll check this out.  By having PTR_UNTRUSTED, are the callers
still required to check NULL or is it handled by probe_read_kernel()?

Thanks,
Namhyung
Vlastimil Babka Oct. 7, 2024, 12:57 p.m. UTC | #11
On 10/4/24 11:25 PM, Roman Gushchin wrote:
> On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
>> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
>>>
>>> The bpf_get_kmem_cache() is to get a slab cache information from a
>>> virtual address like virt_to_cache().  If the address is a pointer
>>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
>>> NULL is returned.
>>>
>>> It doesn't grab a reference count of the kmem_cache so the caller is
>>> responsible to manage the access.  The intended use case for now is to
>>> symbolize locks in slab objects from the lock contention tracepoints.
>>>
>>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
>>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
>>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
>>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>


So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
thought the perf use case was:

- at the beginning it iterates the kmem caches and stores anything of
possible interest in bpf maps or somewhere - hence we have the iterator
- during profiling, it gets from the object to a cache, but doesn't need to
access the cache - it just stores the kmem_cache address in the perf record
- after profiling itself, use the information in the maps from the first
step together with cache pointers from the second step to calculate
whatever is necessary

So at no point should it be necessary to take a refcount on a kmem_cache?

But maybe "bpf_get_kmem_cache()" is implemented here as too generic
given the above use case and it should be implemented in a way that the
pointer it returns cannot be used to access anything (which could be
unsafe), but only as a bpf map key - so it should return e.g. an
unsigned long instead?
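
A minimal sketch of that more restricted variant (the function name is
made up; the address is returned as an opaque cookie that can only serve
as a map key):

__bpf_kfunc unsigned long bpf_get_kmem_cache_addr(u64 addr)
{
	struct slab *slab;

	if (!virt_addr_valid((unsigned long)addr))
		return 0;

	slab = virt_to_slab((void *)(unsigned long)addr);
	return slab ? (unsigned long)slab->slab_cache : 0;
}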

>>> ---
>>>  kernel/bpf/helpers.c |  1 +
>>>  mm/slab_common.c     | 19 +++++++++++++++++++
>>>  2 files changed, 20 insertions(+)
>>>
>>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
>>> index 4053f279ed4cc7ab..3709fb14288105c6 100644
>>> --- a/kernel/bpf/helpers.c
>>> +++ b/kernel/bpf/helpers.c
>>> @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
>>>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
>>>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
>>>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
>>> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
>>>  BTF_KFUNCS_END(common_btf_ids)
>>>
>>>  static const struct btf_kfunc_id_set common_kfunc_set = {
>>> diff --git a/mm/slab_common.c b/mm/slab_common.c
>>> index 7443244656150325..5484e1cd812f698e 100644
>>> --- a/mm/slab_common.c
>>> +++ b/mm/slab_common.c
>>> @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
>>>  }
>>>  EXPORT_SYMBOL(ksize);
>>>
>>> +#ifdef CONFIG_BPF_SYSCALL
>>> +#include <linux/btf.h>
>>> +
>>> +__bpf_kfunc_start_defs();
>>> +
>>> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
>>> +{
>>> +       struct slab *slab;
>>> +
>>> +       if (!virt_addr_valid(addr))
>>> +               return NULL;
>>> +
>>> +       slab = virt_to_slab((void *)(long)addr);
>>> +       return slab ? slab->slab_cache : NULL;
>>> +}
>>
>> Do we need to hold a refcount to the slab_cache? Given
>> we make this kfunc available everywhere, including
>> sleepable contexts, I think it is necessary.
> 
> It's a really good question.
> 
> If the callee somehow owns the slab object, as in the example
> provided in the series (current task), it's not necessarily.
> 
> If a user can pass a random address, you're right, we need to
> grab the slab_cache's refcnt. But then we also can't guarantee
> that the object still belongs to the same slab_cache, the
> function becomes racy by the definition.
Namhyung Kim Oct. 9, 2024, 7:17 a.m. UTC | #12
On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >>>
> >>> The bpf_get_kmem_cache() is to get a slab cache information from a
> >>> virtual address like virt_to_cache().  If the address is a pointer
> >>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> >>> NULL is returned.
> >>>
> >>> It doesn't grab a reference count of the kmem_cache so the caller is
> >>> responsible to manage the access.  The intended use case for now is to
> >>> symbolize locks in slab objects from the lock contention tracepoints.
> >>>
> >>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> >>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> >>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> >>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> 
> 
> So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> thought the perf use case was:
> 
> - at the beginning it iterates the kmem caches and stores anything of
> possible interest in bpf maps or somewhere - hence we have the iterator
> - during profiling, from object it gets to a cache, but doesn't need to
> access the cache - just store the kmem_cache address in the perf record
> - after profiling itself, use the information in the maps from the first
> step together with cache pointers from the second step to calculate
> whatever is necessary

Correct.

> 
> So at no point it should be necessary to take refcount to a kmem_cache?
> 
> But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> given the above use case and it should be implemented in a way that the
> pointer it returns cannot be used to access anything (which could be
> unsafe), but only as a bpf map key - so it should return e.g. an
> unsigned long instead?

Yep, this should work for my use case.  Maybe we don't need the
iterator when the bpf_get_kmem_cache() kfunc returns a valid pointer, as
we can get the necessary info at that moment.  But I think it'd be less
efficient, as more work needs to be done at the event (lock contention).
It'd be better to set up the necessary info in a map before monitoring
(using the iterator), and just look up the map with the kfunc while
monitoring the lock contention.
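
For illustration, the iterator side of that two-step flow might look
roughly like this, modeled on the selftest quoted earlier in the thread
(the section name, the context field and the map value layout are
assumptions):

SEC("iter/kmem_cache")
int slab_info_collector(struct bpf_iter__kmem_cache *ctx)
{
	struct kmem_cache *s = ctx->s;
	char name[32];

	if (!s)
		return 0;

	/* remember per-cache info up front, keyed by the cache pointer,
	 * so the contention probe only has to do a map lookup
	 */
	bpf_probe_read_kernel_str(name, sizeof(name), s->name);
	bpf_map_update_elem(&slab_hash, &s, name, BPF_ANY);
	return 0;
}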

Thanks,
Namhyung

> 
> >>> ---
> >>>  kernel/bpf/helpers.c |  1 +
> >>>  mm/slab_common.c     | 19 +++++++++++++++++++
> >>>  2 files changed, 20 insertions(+)
> >>>
> >>> diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
> >>> index 4053f279ed4cc7ab..3709fb14288105c6 100644
> >>> --- a/kernel/bpf/helpers.c
> >>> +++ b/kernel/bpf/helpers.c
> >>> @@ -3090,6 +3090,7 @@ BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
> >>>  BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
> >>>  BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
> >>>  BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
> >>> +BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
> >>>  BTF_KFUNCS_END(common_btf_ids)
> >>>
> >>>  static const struct btf_kfunc_id_set common_kfunc_set = {
> >>> diff --git a/mm/slab_common.c b/mm/slab_common.c
> >>> index 7443244656150325..5484e1cd812f698e 100644
> >>> --- a/mm/slab_common.c
> >>> +++ b/mm/slab_common.c
> >>> @@ -1322,6 +1322,25 @@ size_t ksize(const void *objp)
> >>>  }
> >>>  EXPORT_SYMBOL(ksize);
> >>>
> >>> +#ifdef CONFIG_BPF_SYSCALL
> >>> +#include <linux/btf.h>
> >>> +
> >>> +__bpf_kfunc_start_defs();
> >>> +
> >>> +__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> >>> +{
> >>> +       struct slab *slab;
> >>> +
> >>> +       if (!virt_addr_valid(addr))
> >>> +               return NULL;
> >>> +
> >>> +       slab = virt_to_slab((void *)(long)addr);
> >>> +       return slab ? slab->slab_cache : NULL;
> >>> +}
> >>
> >> Do we need to hold a refcount to the slab_cache? Given
> >> we make this kfunc available everywhere, including
> >> sleepable contexts, I think it is necessary.
> > 
> > It's a really good question.
> > 
> > If the callee somehow owns the slab object, as in the example
> > provided in the series (current task), it's not necessarily.
> > 
> > If a user can pass a random address, you're right, we need to
> > grab the slab_cache's refcnt. But then we also can't guarantee
> > that the object still belongs to the same slab_cache, the
> > function becomes racy by the definition.
Namhyung Kim Oct. 10, 2024, 4:46 p.m. UTC | #13
On Wed, Oct 09, 2024 at 12:17:12AM -0700, Namhyung Kim wrote:
> On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> > On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > >>>
> > >>> The bpf_get_kmem_cache() is to get a slab cache information from a
> > >>> virtual address like virt_to_cache().  If the address is a pointer
> > >>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > >>> NULL is returned.
> > >>>
> > >>> It doesn't grab a reference count of the kmem_cache so the caller is
> > >>> responsible to manage the access.  The intended use case for now is to
> > >>> symbolize locks in slab objects from the lock contention tracepoints.
> > >>>
> > >>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > >>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > >>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > >>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > 
> > 
> > So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> > thought the perf use case was:
> > 
> > - at the beginning it iterates the kmem caches and stores anything of
> > possible interest in bpf maps or somewhere - hence we have the iterator
> > - during profiling, from object it gets to a cache, but doesn't need to
> > access the cache - just store the kmem_cache address in the perf record
> > - after profiling itself, use the information in the maps from the first
> > step together with cache pointers from the second step to calculate
> > whatever is necessary
> 
> Correct.
> 
> > 
> > So at no point it should be necessary to take refcount to a kmem_cache?
> > 
> > But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> > given the above use case and it should be implemented in a way that the
> > pointer it returns cannot be used to access anything (which could be
> > unsafe), but only as a bpf map key - so it should return e.g. an
> > unsigned long instead?
> 
> Yep, this should work for my use case.  Maybe we wouldn't even need
> the iterator once the bpf_get_kmem_cache() kfunc returns a valid
> pointer, since we could get the necessary info right at the event.
> But I think that'd be less efficient, as more work would need to be
> done at the event (lock contention).  It'd be better to set up the
> necessary info in a map before monitoring (using the iterator), and
> just look up the map with the kfunc while monitoring the lock
> contention.

Maybe it's still better to return a non-refcounted pointer for future
use.  I'll leave it for v5.

Thanks,
Namhyung
Alexei Starovoitov Oct. 10, 2024, 5:04 p.m. UTC | #14
On Thu, Oct 10, 2024 at 9:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
>
> On Wed, Oct 09, 2024 at 12:17:12AM -0700, Namhyung Kim wrote:
> > On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> > > On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > >>>
> > > >>> The bpf_get_kmem_cache() is to get a slab cache information from a
> > > >>> virtual address like virt_to_cache().  If the address is a pointer
> > > >>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > >>> NULL is returned.
> > > >>>
> > > >>> It doesn't grab a reference count of the kmem_cache so the caller is
> > > >>> responsible to manage the access.  The intended use case for now is to
> > > >>> symbolize locks in slab objects from the lock contention tracepoints.
> > > >>>
> > > >>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > >>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > >>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > >>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > >
> > >
> > > So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> > > thought the perf use case was:
> > >
> > > - at the beginning it iterates the kmem caches and stores anything of
> > > possible interest in bpf maps or somewhere - hence we have the iterator
> > > - during profiling, from object it gets to a cache, but doesn't need to
> > > access the cache - just store the kmem_cache address in the perf record
> > > - after profiling itself, use the information in the maps from the first
> > > step together with cache pointers from the second step to calculate
> > > whatever is necessary
> >
> > Correct.
> >
> > >
> > > So at no point it should be necessary to take refcount to a kmem_cache?
> > >
> > > But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> > > given the above use case and it should be implemented in a way that the
> > > pointer it returns cannot be used to access anything (which could be
> > > unsafe), but only as a bpf map key - so it should return e.g. an
> > > unsigned long instead?
> >
> > Yep, this should work for my use case.  Maybe we wouldn't even need
> > the iterator once the bpf_get_kmem_cache() kfunc returns a valid
> > pointer, since we could get the necessary info right at the event.
> > But I think that'd be less efficient, as more work would need to be
> > done at the event (lock contention).  It'd be better to set up the
> > necessary info in a map before monitoring (using the iterator), and
> > just look up the map with the kfunc while monitoring the lock
> > contention.
>
> Maybe it's still better to return a non-refcounted pointer for future
> use.  I'll leave it for v5.

Pls keep it as:
__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)

just make sure it's PTR_UNTRUSTED.
No need to make it return long or void *.
The users can do:
  bpf_core_cast(any_value, struct kmem_cache);
anyway, but it would be an unnecessary step.
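
For reference, the cast route would look roughly like the following
from a BPF program.  This is only a sketch: bpf_get_kmem_cache_addr()
is a hypothetical untyped variant of the kfunc, shown just to
illustrate the extra bpf_core_cast() step (from libbpf's
bpf_core_read.h) that returning struct kmem_cache * directly avoids:

/* Sketch only: what a consumer would have to write if the kfunc
 * returned a bare u64 instead of a typed pointer.
 */
#include "vmlinux.h"
#include <bpf/bpf_helpers.h>
#include <bpf/bpf_tracing.h>
#include <bpf/bpf_core_read.h>

/* hypothetical untyped variant of the kfunc */
extern u64 bpf_get_kmem_cache_addr(u64 addr) __weak __ksym;

SEC("tp_btf/contention_begin")
int BPF_PROG(probe_cache, void *lock, unsigned int flags)
{
	u64 val = bpf_get_kmem_cache_addr((u64)lock);
	struct kmem_cache *s;

	if (!val)
		return 0;

	/* turn the raw value back into a read-only BTF pointer */
	s = bpf_core_cast((void *)val, struct kmem_cache);
	bpf_printk("object_size=%u", s->object_size);
	return 0;
}

char _license[] SEC("license") = "GPL";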
Namhyung Kim Oct. 10, 2024, 10:56 p.m. UTC | #15
On Thu, Oct 10, 2024 at 10:04:24AM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 10, 2024 at 9:46 AM Namhyung Kim <namhyung@kernel.org> wrote:
> >
> > On Wed, Oct 09, 2024 at 12:17:12AM -0700, Namhyung Kim wrote:
> > > On Mon, Oct 07, 2024 at 02:57:08PM +0200, Vlastimil Babka wrote:
> > > > On 10/4/24 11:25 PM, Roman Gushchin wrote:
> > > > > On Fri, Oct 04, 2024 at 01:10:58PM -0700, Song Liu wrote:
> > > > >> On Wed, Oct 2, 2024 at 11:10 AM Namhyung Kim <namhyung@kernel.org> wrote:
> > > > >>>
> > > > >>> The bpf_get_kmem_cache() is to get a slab cache information from a
> > > > >>> virtual address like virt_to_cache().  If the address is a pointer
> > > > >>> to a slab object, it'd return a valid kmem_cache pointer, otherwise
> > > > >>> NULL is returned.
> > > > >>>
> > > > >>> It doesn't grab a reference count of the kmem_cache so the caller is
> > > > >>> responsible to manage the access.  The intended use case for now is to
> > > > >>> symbolize locks in slab objects from the lock contention tracepoints.
> > > > >>>
> > > > >>> Suggested-by: Vlastimil Babka <vbabka@suse.cz>
> > > > >>> Acked-by: Roman Gushchin <roman.gushchin@linux.dev> (mm/*)
> > > > >>> Acked-by: Vlastimil Babka <vbabka@suse.cz> #mm/slab
> > > > >>> Signed-off-by: Namhyung Kim <namhyung@kernel.org>
> > > >
> > > >
> > > > So IIRC from our discussions with Namhyung and Arnaldo at LSF/MM I
> > > > thought the perf use case was:
> > > >
> > > > - at the beginning it iterates the kmem caches and stores anything of
> > > > possible interest in bpf maps or somewhere - hence we have the iterator
> > > > - during profiling, from object it gets to a cache, but doesn't need to
> > > > access the cache - just store the kmem_cache address in the perf record
> > > > - after profiling itself, use the information in the maps from the first
> > > > step together with cache pointers from the second step to calculate
> > > > whatever is necessary
> > >
> > > Correct.
> > >
> > > >
> > > > So at no point it should be necessary to take refcount to a kmem_cache?
> > > >
> > > > But maybe "bpf_get_kmem_cache()" is implemented here as too generic
> > > > given the above use case and it should be implemented in a way that the
> > > > pointer it returns cannot be used to access anything (which could be
> > > > unsafe), but only as a bpf map key - so it should return e.g. an
> > > > unsigned long instead?
> > >
> > > Yep, this should work for my use case.  Maybe we wouldn't even need
> > > the iterator once the bpf_get_kmem_cache() kfunc returns a valid
> > > pointer, since we could get the necessary info right at the event.
> > > But I think that'd be less efficient, as more work would need to be
> > > done at the event (lock contention).  It'd be better to set up the
> > > necessary info in a map before monitoring (using the iterator), and
> > > just look up the map with the kfunc while monitoring the lock
> > > contention.
> >
> > Maybe it's still better to return a non-refcounted pointer for future
> > use.  I'll leave it for v5.
> 
> Pls keep it as:
> __bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
> 
> just make sure it's PTR_UNTRUSTED.

Sure, will do.

> No need to make it return long or void *.
> The users can do:
>   bpf_core_cast(any_value, struct kmem_cache);
> anyway, but it would be an unnecessary step.

Yeah, I thought there would be a way to do that.

Thanks,
Namhyung
diff mbox series

Patch

diff --git a/kernel/bpf/helpers.c b/kernel/bpf/helpers.c
index 4053f279ed4cc7ab..3709fb14288105c6 100644
--- a/kernel/bpf/helpers.c
+++ b/kernel/bpf/helpers.c
@@ -3090,6 +3090,7 @@  BTF_ID_FLAGS(func, bpf_iter_bits_new, KF_ITER_NEW)
 BTF_ID_FLAGS(func, bpf_iter_bits_next, KF_ITER_NEXT | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_iter_bits_destroy, KF_ITER_DESTROY)
 BTF_ID_FLAGS(func, bpf_copy_from_user_str, KF_SLEEPABLE)
+BTF_ID_FLAGS(func, bpf_get_kmem_cache, KF_RET_NULL)
 BTF_KFUNCS_END(common_btf_ids)
 
 static const struct btf_kfunc_id_set common_kfunc_set = {
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 7443244656150325..5484e1cd812f698e 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -1322,6 +1322,25 @@  size_t ksize(const void *objp)
 }
 EXPORT_SYMBOL(ksize);
 
+#ifdef CONFIG_BPF_SYSCALL
+#include <linux/btf.h>
+
+__bpf_kfunc_start_defs();
+
+__bpf_kfunc struct kmem_cache *bpf_get_kmem_cache(u64 addr)
+{
+	struct slab *slab;
+
+	if (!virt_addr_valid(addr))
+		return NULL;
+
+	slab = virt_to_slab((void *)(long)addr);
+	return slab ? slab->slab_cache : NULL;
+}
+
+__bpf_kfunc_end_defs();
+#endif /* CONFIG_BPF_SYSCALL */
+
 /* Tracepoints definitions. */
 EXPORT_TRACEPOINT_SYMBOL(kmalloc);
 EXPORT_TRACEPOINT_SYMBOL(kmem_cache_alloc);