diff mbox series

[bpf-next,v6,1/3] bpf: Allow trusted pointers to be passed to KF_TRUSTED_ARGS kfuncs

Message ID 20221020222416.3415511-2-void@manifault.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series Support storing struct task_struct objects as kptrs | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for bpf-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1362 this patch: 1362
netdev/cc_maintainers warning 15 maintainers not CCed: kuba@kernel.org davem@davemloft.net pablo@netfilter.org netfilter-devel@vger.kernel.org netdev@vger.kernel.org pabeni@redhat.com linux-doc@vger.kernel.org fw@strlen.de corbet@lwn.net linux-kselftest@vger.kernel.org shuah@kernel.org mykolal@fb.com coreteam@netfilter.org edumazet@google.com kadlec@netfilter.org
netdev/build_clang success Errors and warnings before: 159 this patch: 159
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1352 this patch: 1352
netdev/checkpatch warning CHECK: Prefer using the BIT macro WARNING: line length of 81 exceeds 80 columns WARNING: line length of 83 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 94 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 fail Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-21 success Logs for test_progs_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-5 success Logs for llvm-toolchain
bpf/vmtest-bpf-next-VM_Test-6 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-3 success Logs for build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-4 success Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for test_maps on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-14 success Logs for test_progs_no_alu32 on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-17 success Logs for test_progs_no_alu32_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-20 success Logs for test_progs_parallel on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for test_verifier on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-9 success Logs for test_maps on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-11 success Logs for test_progs on x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for test_progs on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-15 success Logs for test_progs_no_alu32 on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-18 success Logs for test_progs_no_alu32_parallel on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-24 success Logs for test_verifier on x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-10 success Logs for test_progs on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-13 success Logs for test_progs_no_alu32 on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-22 success Logs for test_verifier on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-16 success Logs for test_progs_no_alu32_parallel on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for test_maps on s390x with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for test_progs_parallel on s390x with gcc

Commit Message

David Vernet Oct. 20, 2022, 10:24 p.m. UTC
Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
to the verifier that it should enforce that a BPF program passes it a
"safe", trusted pointer. Currently, "safe" means that the pointer is
either PTR_TO_CTX, or is refcounted. There may be cases, however, where
the kernel passes a BPF program a safe / trusted pointer to an object
that the BPF program wishes to use as a kptr, but because the object
does not yet have a ref_obj_id from the perspective of the verifier, the
program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
kfunc.

The solution is to expand the set of pointers that are considered
trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
with these pointers without getting rejected by the verifier.

There is already a PTR_UNTRUSTED flag that is set in some scenarios,
such as when a BPF program reads a kptr directly from a map
without performing a bpf_kptr_xchg() call. These pointers of course can
and should be rejected by the verifier. Unfortunately, however,
PTR_UNTRUSTED does not cover all the cases for safety that need to
be addressed to adequately protect kfuncs. Specifically, pointers
obtained by a BPF program "walking" a struct are _not_ considered
PTR_UNTRUSTED according to BPF. For example, say that we were to add a
kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
that a task was unsafe to pass to a kfunc, the verifier would mistakenly
allow the following unsafe BPF program to be loaded:

SEC("tp_btf/task_newtask")
int BPF_PROG(unsafe_acquire_task,
             struct task_struct *task,
             u64 clone_flags)
{
        struct task_struct *acquired, *nested;

        nested = task->last_wakee;

        /* Would not be rejected by the verifier. */
        acquired = bpf_task_acquire(nested);
        if (!acquired)
                return 0;

        bpf_task_release(acquired);
        return 0;
}

To address this, this patch defines a new type flag called PTR_WALKED
which tracks whether a PTR_TO_BTF_ID pointer was retrieved from walking
a struct. A pointer passed directly from the kernel begins with
(PTR_WALKED & type) == 0, meaning of course that it is not obtained from
walking another struct. Any pointer received from walking that object,
however, would inherit that flag and become a walked pointer.

Additionally, because some kfuncs still only want BPF programs to be
able to send them an arg that they "own" (i.e. which they own a refcount
on) another kfunc arg flag called KF_OWNED_ARGS is added which is
identical to KF_TRUSTED_ARGS, but imposes the stricter requirement that
the arg must also have a refcount.

A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
and then another patch will validate this feature by ensuring that the
verifier rejects a kfunc invocation with a nested pointer.

Signed-off-by: David Vernet <void@manifault.com>
---
 Documentation/bpf/kfuncs.rst                  | 50 ++++++++++----
 include/linux/bpf.h                           |  6 ++
 include/linux/btf.h                           | 57 ++++++++++++++--
 kernel/bpf/btf.c                              | 18 ++++-
 kernel/bpf/verifier.c                         | 66 ++++++++++++++-----
 net/bpf/test_run.c                            |  2 +-
 net/netfilter/nf_conntrack_bpf.c              |  8 +--
 net/netfilter/nf_nat_bpf.c                    |  2 +-
 .../selftests/bpf/prog_tests/map_kptr.c       |  2 +-
 tools/testing/selftests/bpf/verifier/calls.c  |  4 +-
 .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
 11 files changed, 169 insertions(+), 48 deletions(-)

Comments

Alexei Starovoitov Nov. 1, 2022, 12:02 a.m. UTC | #1
On Thu, Oct 20, 2022 at 05:24:14PM -0500, David Vernet wrote:
> Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
> to the verifier that it should enforce that a BPF program passes it a
> "safe", trusted pointer. Currently, "safe" means that the pointer is
> either PTR_TO_CTX, or is refcounted. There may be cases, however, where
> the kernel passes a BPF program a safe / trusted pointer to an object
> that the BPF program wishes to use as a kptr, but because the object
> does not yet have a ref_obj_id from the perspective of the verifier, the
> program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
> kfunc.
> 
> The solution is to expand the set of pointers that are considered
> trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
> with these pointers without getting rejected by the verifier.
> 
> There is already a PTR_UNTRUSTED flag that is set in some scenarios,
> such as when a BPF program reads a kptr directly from a map
> without performing a bpf_kptr_xchg() call. These pointers of course can
> and should be rejected by the verifier. Unfortunately, however,
> PTR_UNTRUSTED does not cover all the cases for safety that need to
> be addressed to adequately protect kfuncs. Specifically, pointers
> obtained by a BPF program "walking" a struct are _not_ considered
> PTR_UNTRUSTED according to BPF. For example, say that we were to add a
> kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
> acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
> that a task was unsafe to pass to a kfunc, the verifier would mistakenly
> allow the following unsafe BPF program to be loaded:
> 
> SEC("tp_btf/task_newtask")
> int BPF_PROG(unsafe_acquire_task,
>              struct task_struct *task,
>              u64 clone_flags)
> {
>         struct task_struct *acquired, *nested;
> 
>         nested = task->last_wakee;
> 
>         /* Would not be rejected by the verifier. */
>         acquired = bpf_task_acquire(nested);
>         if (!acquired)
>                 return 0;
> 
>         bpf_task_release(acquired);
>         return 0;
> }
> 
> To address this, this patch defines a new type flag called PTR_WALKED
> which tracks whether a PTR_TO_BTF_ID pointer was retrieved from walking
> a struct. A pointer passed directly from the kernel begins with
> (PTR_WALKED & type) == 0, meaning of course that it is not obtained from
> walking another struct. Any pointer received from walking that object,
> however, would inherit that flag and become a walked pointer.
> 
> Additionally, because some kfuncs still only want BPF programs to be
> able to send them an arg that they "own" (i.e. which they own a refcount
> on) another kfunc arg flag called KF_OWNED_ARGS is added which is
> identical to KF_TRUSTED_ARGS, but imposes the stricter requirement that
> the arg must also have a refcount.
> 
> A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
> and then another patch will validate this feature by ensuring that the
> verifier rejects a kfunc invocation with a nested pointer.
> 
> Signed-off-by: David Vernet <void@manifault.com>
> ---
>  Documentation/bpf/kfuncs.rst                  | 50 ++++++++++----
>  include/linux/bpf.h                           |  6 ++
>  include/linux/btf.h                           | 57 ++++++++++++++--
>  kernel/bpf/btf.c                              | 18 ++++-
>  kernel/bpf/verifier.c                         | 66 ++++++++++++++-----
>  net/bpf/test_run.c                            |  2 +-
>  net/netfilter/nf_conntrack_bpf.c              |  8 +--
>  net/netfilter/nf_nat_bpf.c                    |  2 +-
>  .../selftests/bpf/prog_tests/map_kptr.c       |  2 +-
>  tools/testing/selftests/bpf/verifier/calls.c  |  4 +-
>  .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
>  11 files changed, 169 insertions(+), 48 deletions(-)
> 
> diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> index 0f858156371d..8e2825150a8d 100644
> --- a/Documentation/bpf/kfuncs.rst
> +++ b/Documentation/bpf/kfuncs.rst
> @@ -137,30 +137,54 @@ KF_ACQUIRE and KF_RET_NULL flags.
>  --------------------------
>  
>  The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> -indicates that the all pointer arguments will always have a guaranteed lifetime,
> -and pointers to kernel objects are always passed to helpers in their unmodified
> -form (as obtained from acquire kfuncs).
> +indicates that the all pointer arguments will always have a guaranteed
> +lifetime, and pointers to kernel objects are always passed to helpers in their
> +unmodified form (either as passed by the main kernel, or as obtained from
> +acquire kfuncs).
>  
> -It can be used to enforce that a pointer to a refcounted object acquired from a
> -kfunc or BPF helper is passed as an argument to this kfunc without any
> -modifications (e.g. pointer arithmetic) such that it is trusted and points to
> -the original object.
> +It can be used to enforce that a safe pointer passed to the program by the
> +kernel, or a refcounted object acquired from a kfunc or BPF helper, is passed
> +as an argument to this kfunc without any modifications (e.g. pointer
> +arithmetic) such that it is trusted and points to the original object.
>  
>  Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
>  but those can have a non-zero offset.
>  
> -This flag is often used for kfuncs that operate (change some property, perform
> -some operation) on an object that was obtained using an acquire kfunc. Such
> -kfuncs need an unchanged pointer to ensure the integrity of the operation being
> -performed on the expected object.
> +This flag is often used for kfuncs that receive a trusted pointer from the
> +kernel, and which do not require a reference to be held by the program. For
> +example, if there's a kernel object that was allocated by the main kernel, and
> +which the BPF program wishes to store in a map as a kptr, KF_TRUSTED_ARGS can
> +be used to ensure that the pointer is actually a trusted kernel pointer before
> +a reference is acquired on it in a KF_ACQUIRE kfunc.
> +
> +2.4.6 KF_OWNED_ARGS flag
> +------------------------
> +
> +The KF_OWNED_ARGS flag is identical to the KF_TRUSTED_ARGS flag, though it is
> +more restrictive in that it also requires the BPF program to hold a reference
> +on the object.
>  
> -2.4.6 KF_SLEEPABLE flag
> +In other words, it can be used to enforce that a pointer to a refcounted object
> +acquired from a kfunc or BPF helper is passed as an argument to this kfunc
> +without any modifications (e.g. pointer arithmetic) such that it is trusted and
> +points to the original object that was allocated or owned by the BPF program.
> +
> +This flag is often used for kfuncs that operate (change some property, perform
> +some operation) on an object that was obtained using an acquire kfunc. For
> +example, if an acquire kfunc allocates an object on behalf of a program,
> +KF_OWNED_ARGS would be an appropriate flag to specify for other kfuncs which
> +allow the program to mutate that object. KF_TRUSTED_ARGS, on the other hand,
> +would likely not be sufficiently restrictive as the kfunc does not want to
> +allow the BPF program to mutate another instance of the same object type which
> +was allocated by the main kernel.
> +
> +2.4.7 KF_SLEEPABLE flag
>  -----------------------
>  
>  The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
>  be called by sleepable BPF programs (BPF_F_SLEEPABLE).
>  
> -2.4.7 KF_DESTRUCTIVE flag
> +2.4.8 KF_DESTRUCTIVE flag
>  --------------------------
>  
>  The KF_DESTRUCTIVE flag is used to indicate functions calling which is
> diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> index 9e7d46d16032..ccdbefd72a95 100644
> --- a/include/linux/bpf.h
> +++ b/include/linux/bpf.h
> @@ -457,6 +457,12 @@ enum bpf_type_flag {
>  	/* Size is known at compile time. */
>  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
>  
> +	/* PTR was obtained from walking a struct. This is used with
> +	 * PTR_TO_BTF_ID to determine whether the pointer is safe to pass to a
> +	 * kfunc with KF_TRUSTED_ARGS.
> +	 */
> +	PTR_WALKED		= BIT(11 + BPF_BASE_TYPE_BITS),
> +
>  	__BPF_TYPE_FLAG_MAX,
>  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
>  };
> diff --git a/include/linux/btf.h b/include/linux/btf.h
> index f9aababc5d78..7f5a438196a2 100644
> --- a/include/linux/btf.h
> +++ b/include/linux/btf.h
> @@ -17,9 +17,48 @@
>  #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
>  #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
>  #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
> -/* Trusted arguments are those which are meant to be referenced arguments with
> - * unchanged offset. It is used to enforce that pointers obtained from acquire
> - * kfuncs remain unmodified when being passed to helpers taking trusted args.
> +/* Trusted arguments are those which are meant to be guaranteed valid
> + * arguments, with an unchanged offset. It is used to enforce that pointers
> + * obtained from either acquire kfuncs or the main kernel remain unmodified
> + * when being passed to helpers taking trusted args.
> + *
> + * Consider, for example, the following task tracepoint:
> + *
> + *	SEC("tp_btf/task_newtask")
> + *	int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
> + *	{
> + *		...
> + *	}
> + *
> + * And the following kfunc:
> + *
> + *	BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
> + *
> + * All invocations to the kfunc must pass the unmodified, unwalked task:
> + *
> + *	bpf_task_acquire(task);		    // Allowed
> + *	bpf_task_acquire(task->last_wakee); // Rejected, walked task
> + *
> + * Users may also pass referenced tasks directly to the kfunc:
> + *
> + *	struct task_struct *acquired;
> + *
> + *	acquired = bpf_task_acquire(task);	// Allowed, same as above
> + *	bpf_task_acquire(acquired);		// Allowed
> + *	bpf_task_acquire(task);			// Allowed
> + *	bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
> + *
> + * If users wish to only allow referenced objects to be passed to a kfunc, they
> + * may instead specify the KF_OWNED_ARGS flag.
> + */
> +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> +#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> +#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> +/* Owned arguments are similar to trusted arguments, but are even more
> + * restrictive.  Owned arguments are arguments which are "owned" by the BPF
> + * program, meaning it has acquired a reference to the object via an acquire
> + * kfunc. Just as with trusted arguments, the verifier enforces that owned
> + * arguments have an unchanged offset when they're passed to kfuncs.

I don't think the kfunc writers will be able to use KF_OWNED_ARGS vs KF_TRUSTED_ARGS properly.
refcnt-ed or not is not a property that they should worry about.
Let's evaluate this patch set without KF_OWNED_ARGS and bpf_ct_* are still KF_TRUSTED_ARGS
and the other side of the verifier is relaxed to accept non-refcnted PTR_TO_BTF_ID
into kfunc.
What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
We've introduced nf_conn___init vs nf_conf specifically to express the relationship
between allocated nf_conn and other nf_conn-s via different types.
Why is this not enough?

I prefer to keep only one flag KF_TRUSTED_ARGS that kfunc-s need to use
and eventually make all kfuncs KF_TRUSTED_ARGS by default and remove that flag.

Separately...
I think there was a plan to strengthen PTR_TO_BTF_ID and apply PTR_UNTRUSTED.
This PTR_WALKED looks like new thing.
If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
as PTR_WALKED is doing.
I mean we can introduce PTR_TRUSTED and add this flag to return value
of bpf_get_current_task_btf() and arguments of tracepoints.
As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
backward compat behavior of PTR_TO_BTF_ID.
PTR_WALKED is sort-of doing the same, but not conservative enough.
Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.

I might have missed earlier discussions on this patch set. Apologies if so.
David Vernet Nov. 1, 2022, 6:11 p.m. UTC | #2
On Mon, Oct 31, 2022 at 05:02:39PM -0700, Alexei Starovoitov wrote:
> On Thu, Oct 20, 2022 at 05:24:14PM -0500, David Vernet wrote:
> > Kfuncs currently support specifying the KF_TRUSTED_ARGS flag to signal
> > to the verifier that it should enforce that a BPF program passes it a
> > "safe", trusted pointer. Currently, "safe" means that the pointer is
> > either PTR_TO_CTX, or is refcounted. There may be cases, however, where
> > the kernel passes a BPF program a safe / trusted pointer to an object
> > that the BPF program wishes to use as a kptr, but because the object
> > does not yet have a ref_obj_id from the perspective of the verifier, the
> > program would be unable to pass it to a KF_ACQUIRE | KF_TRUSTED_ARGS
> > kfunc.
> > 
> > The solution is to expand the set of pointers that are considered
> > trusted according to KF_TRUSTED_ARGS, so that programs can invoke kfuncs
> > with these pointers without getting rejected by the verifier.
> > 
> > There is already a PTR_UNTRUSTED flag that is set in some scenarios,
> > such as when a BPF program reads a kptr directly from a map
> > without performing a bpf_kptr_xchg() call. These pointers of course can
> > and should be rejected by the verifier. Unfortunately, however,
> > PTR_UNTRUSTED does not cover all the cases for safety that need to
> > be addressed to adequately protect kfuncs. Specifically, pointers
> > obtained by a BPF program "walking" a struct are _not_ considered
> > PTR_UNTRUSTED according to BPF. For example, say that we were to add a
> > kfunc called bpf_task_acquire(), with KF_ACQUIRE | KF_TRUSTED_ARGS, to
> > acquire a struct task_struct *. If we only used PTR_UNTRUSTED to signal
> > that a task was unsafe to pass to a kfunc, the verifier would mistakenly
> > allow the following unsafe BPF program to be loaded:
> > 
> > SEC("tp_btf/task_newtask")
> > int BPF_PROG(unsafe_acquire_task,
> >              struct task_struct *task,
> >              u64 clone_flags)
> > {
> >         struct task_struct *acquired, *nested;
> > 
> >         nested = task->last_wakee;
> > 
> >         /* Would not be rejected by the verifier. */
> >         acquired = bpf_task_acquire(nested);
> >         if (!acquired)
> >                 return 0;
> > 
> >         bpf_task_release(acquired);
> >         return 0;
> > }
> > 
> > To address this, this patch defines a new type flag called PTR_WALKED
> > which tracks whether a PTR_TO_BTF_ID pointer was retrieved from walking
> > a struct. A pointer passed directly from the kernel begins with
> > (PTR_WALKED & type) == 0, meaning of course that it is not obtained from
> > walking another struct. Any pointer received from walking that object,
> > however, would inherit that flag and become a walked pointer.
> > 
> > Additionally, because some kfuncs still only want BPF programs to be
> > able to send them an arg that they "own" (i.e. which they own a refcount
> > on) another kfunc arg flag called KF_OWNED_ARGS is added which is
> > identical to KF_TRUSTED_ARGS, but imposes the stricter requirement that
> > the arg must also have a refcount.
> > 
> > A subsequent patch will add kfuncs for storing a task kfunc as a kptr,
> > and then another patch will validate this feature by ensuring that the
> > verifier rejects a kfunc invocation with a nested pointer.
> > 
> > Signed-off-by: David Vernet <void@manifault.com>
> > ---
> >  Documentation/bpf/kfuncs.rst                  | 50 ++++++++++----
> >  include/linux/bpf.h                           |  6 ++
> >  include/linux/btf.h                           | 57 ++++++++++++++--
> >  kernel/bpf/btf.c                              | 18 ++++-
> >  kernel/bpf/verifier.c                         | 66 ++++++++++++++-----
> >  net/bpf/test_run.c                            |  2 +-
> >  net/netfilter/nf_conntrack_bpf.c              |  8 +--
> >  net/netfilter/nf_nat_bpf.c                    |  2 +-
> >  .../selftests/bpf/prog_tests/map_kptr.c       |  2 +-
> >  tools/testing/selftests/bpf/verifier/calls.c  |  4 +-
> >  .../testing/selftests/bpf/verifier/map_kptr.c |  2 +-
> >  11 files changed, 169 insertions(+), 48 deletions(-)
> > 
> > diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
> > index 0f858156371d..8e2825150a8d 100644
> > --- a/Documentation/bpf/kfuncs.rst
> > +++ b/Documentation/bpf/kfuncs.rst
> > @@ -137,30 +137,54 @@ KF_ACQUIRE and KF_RET_NULL flags.
> >  --------------------------
> >  
> >  The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> > -indicates that the all pointer arguments will always have a guaranteed lifetime,
> > -and pointers to kernel objects are always passed to helpers in their unmodified
> > -form (as obtained from acquire kfuncs).
> > +indicates that the all pointer arguments will always have a guaranteed
> > +lifetime, and pointers to kernel objects are always passed to helpers in their
> > +unmodified form (either as passed by the main kernel, or as obtained from
> > +acquire kfuncs).
> >  
> > -It can be used to enforce that a pointer to a refcounted object acquired from a
> > -kfunc or BPF helper is passed as an argument to this kfunc without any
> > -modifications (e.g. pointer arithmetic) such that it is trusted and points to
> > -the original object.
> > +It can be used to enforce that a safe pointer passed to the program by the
> > +kernel, or a refcounted object acquired from a kfunc or BPF helper, is passed
> > +as an argument to this kfunc without any modifications (e.g. pointer
> > +arithmetic) such that it is trusted and points to the original object.
> >  
> >  Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
> >  but those can have a non-zero offset.
> >  
> > -This flag is often used for kfuncs that operate (change some property, perform
> > -some operation) on an object that was obtained using an acquire kfunc. Such
> > -kfuncs need an unchanged pointer to ensure the integrity of the operation being
> > -performed on the expected object.
> > +This flag is often used for kfuncs that receive a trusted pointer from the
> > +kernel, and which do not require a reference to be held by the program. For
> > +example, if there's a kernel object that was allocated by the main kernel, and
> > +which the BPF program wishes to store in a map as a kptr, KF_TRUSTED_ARGS can
> > +be used to ensure that the pointer is actually a trusted kernel pointer before
> > +a reference is acquired on it in a KF_ACQUIRE kfunc.
> > +
> > +2.4.6 KF_OWNED_ARGS flag
> > +------------------------
> > +
> > +The KF_OWNED_ARGS flag is identical to the KF_TRUSTED_ARGS flag, though it is
> > +more restrictive in that it also requires the BPF program to hold a reference
> > +on the object.
> >  
> > -2.4.6 KF_SLEEPABLE flag
> > +In other words, it can be used to enforce that a pointer to a refcounted object
> > +acquired from a kfunc or BPF helper is passed as an argument to this kfunc
> > +without any modifications (e.g. pointer arithmetic) such that it is trusted and
> > +points to the original object that was allocated or owned by the BPF program.
> > +
> > +This flag is often used for kfuncs that operate (change some property, perform
> > +some operation) on an object that was obtained using an acquire kfunc. For
> > +example, if an acquire kfunc allocates an object on behalf of a program,
> > +KF_OWNED_ARGS would be an appropriate flag to specify for other kfuncs which
> > +allow the program to mutate that object. KF_TRUSTED_ARGS, on the other hand,
> > +would likely not be sufficiently restrictive as the kfunc does not want to
> > +allow the BPF program to mutate another instance of the same object type which
> > +was allocated by the main kernel.
> > +
> > +2.4.7 KF_SLEEPABLE flag
> >  -----------------------
> >  
> >  The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
> >  be called by sleepable BPF programs (BPF_F_SLEEPABLE).
> >  
> > -2.4.7 KF_DESTRUCTIVE flag
> > +2.4.8 KF_DESTRUCTIVE flag
> >  --------------------------
> >  
> >  The KF_DESTRUCTIVE flag is used to indicate functions calling which is
> > diff --git a/include/linux/bpf.h b/include/linux/bpf.h
> > index 9e7d46d16032..ccdbefd72a95 100644
> > --- a/include/linux/bpf.h
> > +++ b/include/linux/bpf.h
> > @@ -457,6 +457,12 @@ enum bpf_type_flag {
> >  	/* Size is known at compile time. */
> >  	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
> >  
> > +	/* PTR was obtained from walking a struct. This is used with
> > +	 * PTR_TO_BTF_ID to determine whether the pointer is safe to pass to a
> > +	 * kfunc with KF_TRUSTED_ARGS.
> > +	 */
> > +	PTR_WALKED		= BIT(11 + BPF_BASE_TYPE_BITS),
> > +
> >  	__BPF_TYPE_FLAG_MAX,
> >  	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
> >  };
> > diff --git a/include/linux/btf.h b/include/linux/btf.h
> > index f9aababc5d78..7f5a438196a2 100644
> > --- a/include/linux/btf.h
> > +++ b/include/linux/btf.h
> > @@ -17,9 +17,48 @@
> >  #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
> >  #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
> >  #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
> > -/* Trusted arguments are those which are meant to be referenced arguments with
> > - * unchanged offset. It is used to enforce that pointers obtained from acquire
> > - * kfuncs remain unmodified when being passed to helpers taking trusted args.
> > +/* Trusted arguments are those which are meant to be guaranteed valid
> > + * arguments, with an unchanged offset. It is used to enforce that pointers
> > + * obtained from either acquire kfuncs or the main kernel remain unmodified
> > + * when being passed to helpers taking trusted args.
> > + *
> > + * Consider, for example, the following task tracepoint:
> > + *
> > + *	SEC("tp_btf/task_newtask")
> > + *	int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
> > + *	{
> > + *		...
> > + *	}
> > + *
> > + * And the following kfunc:
> > + *
> > + *	BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
> > + *
> > + * All invocations to the kfunc must pass the unmodified, unwalked task:
> > + *
> > + *	bpf_task_acquire(task);		    // Allowed
> > + *	bpf_task_acquire(task->last_wakee); // Rejected, walked task
> > + *
> > + * Users may also pass referenced tasks directly to the kfunc:
> > + *
> > + *	struct task_struct *acquired;
> > + *
> > + *	acquired = bpf_task_acquire(task);	// Allowed, same as above
> > + *	bpf_task_acquire(acquired);		// Allowed
> > + *	bpf_task_acquire(task);			// Allowed
> > + *	bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
> > + *
> > + * If users wish to only allow referenced objects to be passed to a kfunc, they
> > + * may instead specify the KF_OWNED_ARGS flag.
> > + */
> > +#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
> > +#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
> > +#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
> > +/* Owned arguments are similar to trusted arguments, but are even more
> > + * restrictive.  Owned arguments are arguments which are "owned" by the BPF
> > + * program, meaning it has acquired a reference to the object via an acquire
> > + * kfunc. Just as with trusted arguments, the verifier enforces that owned
> > + * arguments have an unchanged offset when they're passed to kfuncs.
> 
> I don't think the kfunc writers will be able to use KF_OWNED_ARGS vs KF_TRUSTED_ARGS properly.
> refcnt-ed or not is not a property that they should worry about.

Ok, I think I agree with you that making a separate type as we did with
struct nf_conn___init is a better solution. I'll drop this in v7. First
though, we'll have to get aligned on PTR_WALKED vs. PTR_TRUSTED.

> Let's evaluate this patch set without KF_OWNED_ARGS and bpf_ct_* are still KF_TRUSTED_ARGS
> and the other side of the verifier is relaxed to accept non-refcnted PTR_TO_BTF_ID
> into kfunc.

Sounds good, assuming Kumar agrees.

> What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> We've introduced nf_conn___init vs nf_conf specifically to express the relationship
> between allocated nf_conn and other nf_conn-s via different types.
> Why is this not enough?

Kumar should have more context here (he originally suggested this in
[0]), but AFAICT you're correct that this should be sufficient. I added
a negative test case that correctly fails if a BPF program tries to call
these helpers with a struct nf_conn* rather than a struct
nf_conn__init*.

[0]: https://lore.kernel.org/all/CAP01T77PTK+bD2mBrxJShKNPhEypT2+nSHcr3=uuJbrghv_wFg@mail.gmail.com/

> I prefer to keep only one flag KF_TRUSTED_ARGS that kfunc-s need to use
> and eventually make all kfuncs KF_TRUSTED_ARGS by default and remove that flag.

Yes, KF_TRUSTED_ARGS really should be the default. As Kumar describes in
[1], we'll have to figure out how to avoid trace progs with unsafe args
from calling these kfuncs. Maybe the right thing to do is allow-listing
rather than deny-listing, as you pointed out.

[1]: https://lore.kernel.org/bpf/CAP01T77goGbF3GVithEuJ7yMQR9PxHNA9GXFODq_nfA66G=F9g@mail.gmail.com/

> Separately...
> I think there was a plan to strengthen PTR_TO_BTF_ID and apply PTR_UNTRUSTED.

That would be nice if we could do it. I assume that the issue is we're
breaking backwards compat if we do, so I'd be curious to hear what the
plan was if you're aware. The only plan that I've seen so far is what
Kumar spelled out above in [1] above.

> This PTR_WALKED looks like new thing.
> If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> as PTR_WALKED is doing.
> I mean we can introduce PTR_TRUSTED and add this flag to return value
> of bpf_get_current_task_btf() and arguments of tracepoints.
> As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> backward compat behavior of PTR_TO_BTF_ID.
> PTR_WALKED is sort-of doing the same, but not conservative enough.
> Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.

I very much prefer the idea of allowlisting instead of denylisting,
though I wish we'd taken that approach from the start rather than going
with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
(and currently does) imply PTR_TRUSTED.

If we're going to go with an allowlist approach, then I think we should
just get rid of PTR_UNTRUSTED altogether. Is that what you're
suggesting? Otherwise, if we don't get rid of PTR_UNTRUSTED, then
PTR_WALKED seems like a more natural type modifier addition.

> I might have missed earlier discussions on this patch set. Apologies if so.

Just FYI, the main initial thread where this was all discussed was [2].

[2]: https://lore.kernel.org/all/CAP01T76OR3J_P8YMq4ZgKHBpuZyA0zgsPy+tq9htbX=j6AVyOg@mail.gmail.com/
Alexei Starovoitov Nov. 1, 2022, 8:22 p.m. UTC | #3
On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@manifault.com> wrote:
>
> > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > We've introduced / vs nf_conf specifically to express the relationship
> > between allocated nf_conn and other nf_conn-s via different types.
> > Why is this not enough?
>
> Kumar should have more context here (he originally suggested this in
> [0]),

Quoting:
"
Unfortunately a side effect of this change is that now since
PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
functions would begin working with tp_btf args.
"
I couldn't find any tracepoint that has nf_conn___init as an argument.
The whole point of that new type was to return it to bpf prog,
so the verifier type matches it when it's passed into bpf_ct_*
in turn.
So I don't see a need for a new OWNED flag still.
If nf_conn___init is passed into tracepoint it's a bug and
we gotta fix it.

> but AFAICT you're correct that this should be sufficient. I added
> a negative test case that correctly fails if a BPF program tries to call
> these helpers with a struct nf_conn* rather than a struct
> nf_conn__init*.
>
> [0]: https://lore.kernel.org/all/CAP01T77PTK+bD2mBrxJShKNPhEypT2+nSHcr3=uuJbrghv_wFg@mail.gmail.com/
>
> > I prefer to keep only one flag KF_TRUSTED_ARGS that kfunc-s need to use
> > and eventually make all kfuncs KF_TRUSTED_ARGS by default and remove that flag.
>
> Yes, KF_TRUSTED_ARGS really should be the default. As Kumar describes in
> [1], we'll have to figure out how to avoid trace progs with unsafe args
> from calling these kfuncs. Maybe the right thing to do is allow-listing
> rather than deny-listing, as you pointed out.
>
> [1]: https://lore.kernel.org/bpf/CAP01T77goGbF3GVithEuJ7yMQR9PxHNA9GXFODq_nfA66G=F9g@mail.gmail.com/

That is still the plan. more or less.

> > Separately...
> > I think there was a plan to strengthen PTR_TO_BTF_ID and apply PTR_UNTRUSTED.
>
> That would be nice if we could do it. I assume that the issue is we're
> breaking backwards compat if we do, so I'd be curious to hear what the
> plan was if you're aware. The only plan that I've seen so far is what
> Kumar spelled out above in [1] above.

Right. Backward compat with existing usage of PTR_TO_BTF_ID
is the main issue.

>
> > This PTR_WALKED looks like new thing.
> > If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> > as PTR_WALKED is doing.
> > I mean we can introduce PTR_TRUSTED and add this flag to return value
> > of bpf_get_current_task_btf() and arguments of tracepoints.
> > As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> > backward compat behavior of PTR_TO_BTF_ID.
> > PTR_WALKED is sort-of doing the same, but not conservative enough.
> > Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
>
> I very much prefer the idea of allowlisting instead of denylisting,
> though I wish we'd taken that approach from the start rather than going
> with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
> PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
> (and currently does) imply PTR_TRUSTED.

I kind agree, but we gotta have both because of backward compat.
We cannot change PTR_TO_BTF_ID as a whole right now.

Note PTR_TO_BTF_ID appears in kfuncs too.
I'm proposing to use PTR_TO_BTF_ID | PTR_TRUSTED
only in tracepoint args and as return value from
certain helpers like bpf_get_current_task_btf().
afaik it's all safe. There is no uaf here.
uaf is for kfunc. Especially fexit.
Those will stay PTR_TO_BTF_ID. Without PTR_TRUSTED.

>
> If we're going to go with an allowlist approach, then I think we should
> just get rid of PTR_UNTRUSTED altogether. Is that what you're
> suggesting? Otherwise, if we don't get rid of PTR_UNTRUSTED, then
> PTR_WALKED seems like a more natural type modifier addition.

Eventually either PTR_TRUSTED or PTR_UNTRUSTED will be removed.

> > I might have missed earlier discussions on this patch set. Apologies if so.
>
> Just FYI, the main initial thread where this was all discussed was [2].
>
> [2]: https://lore.kernel.org/all/CAP01T76OR3J_P8YMq4ZgKHBpuZyA0zgsPy+tq9htbX=j6AVyOg@mail.gmail.com/
David Vernet Nov. 1, 2022, 9:36 p.m. UTC | #4
On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@manifault.com> wrote:
> >
> > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > We've introduced / vs nf_conf specifically to express the relationship
> > > between allocated nf_conn and other nf_conn-s via different types.
> > > Why is this not enough?
> >
> > Kumar should have more context here (he originally suggested this in
> > [0]),
> 
> Quoting:
> "
> Unfortunately a side effect of this change is that now since
> PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> functions would begin working with tp_btf args.
> "
> I couldn't find any tracepoint that has nf_conn___init as an argument.
> The whole point of that new type was to return it to bpf prog,
> so the verifier type matches it when it's passed into bpf_ct_*
> in turn.
> So I don't see a need for a new OWNED flag still.
> If nf_conn___init is passed into tracepoint it's a bug and
> we gotta fix it.

Yep, this is what I'm seeing as well. I think you're right that
KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
types is the way to enable an ownership model like this.

> > but AFAICT you're correct that this should be sufficient. I added
> > a negative test case that correctly fails if a BPF program tries to call
> > these helpers with a struct nf_conn* rather than a struct
> > nf_conn__init*.
> >
> > [0]: https://lore.kernel.org/all/CAP01T77PTK+bD2mBrxJShKNPhEypT2+nSHcr3=uuJbrghv_wFg@mail.gmail.com/
> >
> > > I prefer to keep only one flag KF_TRUSTED_ARGS that kfunc-s need to use
> > > and eventually make all kfuncs KF_TRUSTED_ARGS by default and remove that flag.
> >
> > Yes, KF_TRUSTED_ARGS really should be the default. As Kumar describes in
> > [1], we'll have to figure out how to avoid trace progs with unsafe args
> > from calling these kfuncs. Maybe the right thing to do is allow-listing
> > rather than deny-listing, as you pointed out.
> >
> > [1]: https://lore.kernel.org/bpf/CAP01T77goGbF3GVithEuJ7yMQR9PxHNA9GXFODq_nfA66G=F9g@mail.gmail.com/
> 
> That is still the plan. more or less.
> 
> > > Separately...
> > > I think there was a plan to strengthen PTR_TO_BTF_ID and apply PTR_UNTRUSTED.
> >
> > That would be nice if we could do it. I assume that the issue is we're
> > breaking backwards compat if we do, so I'd be curious to hear what the
> > plan was if you're aware. The only plan that I've seen so far is what
> > Kumar spelled out above in [1] above.
> 
> Right. Backward compat with existing usage of PTR_TO_BTF_ID
> is the main issue.
> 
> >
> > > This PTR_WALKED looks like new thing.
> > > If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> > > as PTR_WALKED is doing.
> > > I mean we can introduce PTR_TRUSTED and add this flag to return value
> > > of bpf_get_current_task_btf() and arguments of tracepoints.
> > > As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> > > backward compat behavior of PTR_TO_BTF_ID.
> > > PTR_WALKED is sort-of doing the same, but not conservative enough.
> > > Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
> >
> > I very much prefer the idea of allowlisting instead of denylisting,
> > though I wish we'd taken that approach from the start rather than going
> > with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
> > PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
> > (and currently does) imply PTR_TRUSTED.
> 
> I kind agree, but we gotta have both because of backward compat.
> We cannot change PTR_TO_BTF_ID as a whole right now.
> 
> Note PTR_TO_BTF_ID appears in kfuncs too.
> I'm proposing to use PTR_TO_BTF_ID | PTR_TRUSTED
> only in tracepoint args and as return value from
> certain helpers like bpf_get_current_task_btf().
> afaik it's all safe. There is no uaf here.
> uaf is for kfunc. Especially fexit.
> Those will stay PTR_TO_BTF_ID. Without PTR_TRUSTED.

Ok, this feels like the right approach to me. Unless I'm missing
something, modulo doing our due diligence and checking if there are any
existing kfuncs that are relying on different behavior, once this lands
I think we could maybe even make KF_TRUSTED_ARGS the default for all
kfuncs? That should probably be done in a separate patch set though.

> > If we're going to go with an allowlist approach, then I think we should
> > just get rid of PTR_UNTRUSTED altogether. Is that what you're
> > suggesting? Otherwise, if we don't get rid of PTR_UNTRUSTED, then
> > PTR_WALKED seems like a more natural type modifier addition.
> 
> Eventually either PTR_TRUSTED or PTR_UNTRUSTED will be removed.

In that case I'm fine with doing PTR_TRUSTED. Would ideally like to get
Kumar's input before doing it in v7 to avoid too much more churn.
Kumar Kartikeya Dwivedi Nov. 1, 2022, 10:31 p.m. UTC | #5
On Wed, 2 Nov 2022 at 03:06, David Vernet <void@manifault.com> wrote:
>
> On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@manifault.com> wrote:
> > >
> > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > Why is this not enough?
> > >
> > > Kumar should have more context here (he originally suggested this in
> > > [0]),
> >
> > Quoting:
> > "
> > Unfortunately a side effect of this change is that now since
> > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > functions would begin working with tp_btf args.
> > "
> > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > The whole point of that new type was to return it to bpf prog,
> > so the verifier type matches it when it's passed into bpf_ct_*
> > in turn.
> > So I don't see a need for a new OWNED flag still.
> > If nf_conn___init is passed into tracepoint it's a bug and
> > we gotta fix it.
>
> Yep, this is what I'm seeing as well. I think you're right that
> KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> types is the way to enable an ownership model like this.
>

It's not just nf_conn___init. Some CT helpers also take nf_conn.
e.g. bpf_ct_change_timeout, bpf_ct_change_status.
Right now they are only allowed in XDP and TC programs, so the tracing
args part is not a problem _right now_.

So currently it may not be possible to pass such a trusted but
ref_obj_id == 0 nf_conn to those helpers.
But based on changes unrelated to this, it may become possible in the
future to obtain such a trusted nf_conn pointer.
It is hard to then go and audit all possible cases where this can be
passed into helpers/kfuncs.

It is a requirement of those kfuncs that the nf_conn has its refcount
held while they are called.
KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
It seems better to me to keep that restriction instead of relaxing it,
if it is part of the contract.

It is fine to not require people to dive into these details and just
use KF_TRUSTED_ARGS in general, but we need something to cover special
cases like these where the object is only stable while we hold an
active refcount, RCU protection is not enough against reuse.

It could be 'expert only' __ref suffix on the nf_conn arg, or
KF_OWNED_ARGS, or something else.

> > > [...]
> > >
> > > > This PTR_WALKED looks like new thing.
> > > > If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> > > > as PTR_WALKED is doing.
> > > > I mean we can introduce PTR_TRUSTED and add this flag to return value
> > > > of bpf_get_current_task_btf() and arguments of tracepoints.
> > > > As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> > > > backward compat behavior of PTR_TO_BTF_ID.
> > > > PTR_WALKED is sort-of doing the same, but not conservative enough.
> > > > Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
> > >
> > > I very much prefer the idea of allowlisting instead of denylisting,
> > > though I wish we'd taken that approach from the start rather than going
> > > with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
> > > PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
> > > (and currently does) imply PTR_TRUSTED.
> >
> > I kind agree, but we gotta have both because of backward compat.
> > We cannot change PTR_TO_BTF_ID as a whole right now.
> >
> > Note PTR_TO_BTF_ID appears in kfuncs too.
> > I'm proposing to use PTR_TO_BTF_ID | PTR_TRUSTED
> > only in tracepoint args and as return value from
> > certain helpers like bpf_get_current_task_btf().
> > afaik it's all safe. There is no uaf here.
> > uaf is for kfunc. Especially fexit.
> > Those will stay PTR_TO_BTF_ID. Without PTR_TRUSTED.
>
> Ok, this feels like the right approach to me. Unless I'm missing
> something, modulo doing our due diligence and checking if there are any
> existing kfuncs that are relying on different behavior, once this lands
> I think we could maybe even make KF_TRUSTED_ARGS the default for all
> kfuncs? That should probably be done in a separate patch set though.
>

I do like the allowlist vs denylist point from Alexei. It was also
what I originally suggested in [0], but when I went looking, pointer
walking is really the only case that was problematic, which was being
marked by PTR_WALKED. The other case of handling fexit is unrelated to
both.
But it's always better to be safe than sorry.

[0]: https://lore.kernel.org/bpf/CAP01T76zg0kABh36ekC4FTxDsdiYBaP7agErO=YadfFmaJ1LKQ@mail.gmail.com
Alexei Starovoitov Nov. 2, 2022, 12:32 a.m. UTC | #6
On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> On Wed, 2 Nov 2022 at 03:06, David Vernet <void@manifault.com> wrote:
> >
> > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@manifault.com> wrote:
> > > >
> > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > Why is this not enough?
> > > >
> > > > Kumar should have more context here (he originally suggested this in
> > > > [0]),
> > >
> > > Quoting:
> > > "
> > > Unfortunately a side effect of this change is that now since
> > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > functions would begin working with tp_btf args.
> > > "
> > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > The whole point of that new type was to return it to bpf prog,
> > > so the verifier type matches it when it's passed into bpf_ct_*
> > > in turn.
> > > So I don't see a need for a new OWNED flag still.
> > > If nf_conn___init is passed into tracepoint it's a bug and
> > > we gotta fix it.
> >
> > Yep, this is what I'm seeing as well. I think you're right that
> > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > types is the way to enable an ownership model like this.
> >
> 
> It's not just nf_conn___init. Some CT helpers also take nf_conn.
> e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> Right now they are only allowed in XDP and TC programs, so the tracing
> args part is not a problem _right now_.

... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.

> So currently it may not be possible to pass such a trusted but
> ref_obj_id == 0 nf_conn to those helpers.
> But based on changes unrelated to this, it may become possible in the
> future to obtain such a trusted nf_conn pointer.

From kfunc's pov trusted pointer means valid pointer.
It doesn't need to be ref_obj_id refcounted from the verifier pov.
It can be refcounted on the kernel side and it will be trusted.
The code that calls trace_*() passes only trusted pointers into tp-s.
If there is a tracepoint somewhere in the kernel that uses a volatile
pointer to potentially uaf kernel object it's a bug that should be fixed.

> It is a requirement of those kfuncs that the nf_conn has its refcount
> held while they are called.

and it will be. Just not by the verifier.

> KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> It seems better to me to keep that restriction instead of relaxing it,
> if it is part of the contract.

Disagree as explained above.

> It is fine to not require people to dive into these details and just
> use KF_TRUSTED_ARGS in general, but we need something to cover special
> cases like these where the object is only stable while we hold an
> active refcount, RCU protection is not enough against reuse.

This is not related to RCU. Let's not mix RCU concerns in here.
It's a different topic.

> It could be 'expert only' __ref suffix on the nf_conn arg, or
> KF_OWNED_ARGS, or something else.

I'm still against that.

> > > > [...]
> > > >
> > > > > This PTR_WALKED looks like new thing.
> > > > > If we really need it PTR_TO_BTF_ID should be allowlisted instead of denylisted
> > > > > as PTR_WALKED is doing.
> > > > > I mean we can introduce PTR_TRUSTED and add this flag to return value
> > > > > of bpf_get_current_task_btf() and arguments of tracepoints.
> > > > > As soon as any ptr walking is done we can clear PTR_TRUSTED to keep
> > > > > backward compat behavior of PTR_TO_BTF_ID.
> > > > > PTR_WALKED is sort-of doing the same, but not conservative enough.
> > > > > Too many things produce PTR_TO_BTF_ID. Auditing it all is challenging.
> > > >
> > > > I very much prefer the idea of allowlisting instead of denylisting,
> > > > though I wish we'd taken that approach from the start rather than going
> > > > with PTR_UNTRUSTED. It feels wrong to have both PTR_UNTRUSTED and
> > > > PTR_TRUSTED as type modifiers, as the absence of PTR_UNTRUSTED should
> > > > (and currently does) imply PTR_TRUSTED.
> > >
> > > I kind agree, but we gotta have both because of backward compat.
> > > We cannot change PTR_TO_BTF_ID as a whole right now.
> > >
> > > Note PTR_TO_BTF_ID appears in kfuncs too.
> > > I'm proposing to use PTR_TO_BTF_ID | PTR_TRUSTED
> > > only in tracepoint args and as return value from
> > > certain helpers like bpf_get_current_task_btf().
> > > afaik it's all safe. There is no uaf here.
> > > uaf is for kfunc. Especially fexit.
> > > Those will stay PTR_TO_BTF_ID. Without PTR_TRUSTED.
> >
> > Ok, this feels like the right approach to me. Unless I'm missing
> > something, modulo doing our due diligence and checking if there are any
> > existing kfuncs that are relying on different behavior, once this lands
> > I think we could maybe even make KF_TRUSTED_ARGS the default for all
> > kfuncs? That should probably be done in a separate patch set though.
> >
> 
> I do like the allowlist vs denylist point from Alexei. It was also
> what I originally suggested in [0], but when I went looking, pointer
> walking is really the only case that was problematic, which was being
> marked by PTR_WALKED. The other case of handling fexit is unrelated to
> both.

Args of fentry and fexit will not have PTR_TO_BTF_ID | PTR_TRUSTED.
So not an issue.
We can allowlist certain hooks. Like all of bpf-lsm hooks and many others.
But not all of them.
Kumar Kartikeya Dwivedi Nov. 2, 2022, 1 a.m. UTC | #7
On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> > On Wed, 2 Nov 2022 at 03:06, David Vernet <void@manifault.com> wrote:
> > >
> > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@manifault.com> wrote:
> > > > >
> > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > > Why is this not enough?
> > > > >
> > > > > Kumar should have more context here (he originally suggested this in
> > > > > [0]),
> > > >
> > > > Quoting:
> > > > "
> > > > Unfortunately a side effect of this change is that now since
> > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > > functions would begin working with tp_btf args.
> > > > "
> > > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > > The whole point of that new type was to return it to bpf prog,
> > > > so the verifier type matches it when it's passed into bpf_ct_*
> > > > in turn.
> > > > So I don't see a need for a new OWNED flag still.
> > > > If nf_conn___init is passed into tracepoint it's a bug and
> > > > we gotta fix it.
> > >
> > > Yep, this is what I'm seeing as well. I think you're right that
> > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > > types is the way to enable an ownership model like this.
> > >
> >
> > It's not just nf_conn___init. Some CT helpers also take nf_conn.
> > e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> > Right now they are only allowed in XDP and TC programs, so the tracing
> > args part is not a problem _right now_.
>
> ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
>
> > So currently it may not be possible to pass such a trusted but
> > ref_obj_id == 0 nf_conn to those helpers.
> > But based on changes unrelated to this, it may become possible in the
> > future to obtain such a trusted nf_conn pointer.
>
> From kfunc's pov trusted pointer means valid pointer.
> It doesn't need to be ref_obj_id refcounted from the verifier pov.
> It can be refcounted on the kernel side and it will be trusted.
> The code that calls trace_*() passes only trusted pointers into tp-s.
> If there is a tracepoint somewhere in the kernel that uses a volatile
> pointer to potentially uaf kernel object it's a bug that should be fixed.
>

This is all fine. I'm asking you to distinguish between
trusted-not-refcounted and trusted-and-refcounted.
It is necessary for nf_conn, since the object can be reused if the
refcount is not held.
Some other CPU could be reusing the same memory and allocating a new
nf_conn on it while we change its status.
So it's not ok to call bpf_ct_change_timeout/status on trusted
nf_conn, but only on trusted+refcounted nf_conn.

Trusted doesn't capture the difference between 'valid' vs 'valid and
owned by prog' anymore with the new definition
for PTR_TO_BTF_ID.

Yes, in most cases the tracepoints/tracing functions whitelisted will
have the caller ensure that,
but we should then allow trusted nf_conn in those hooks explicitly,
not implicitly by default everywhere.
Until then it should be restricted to ref_obj_id > 0 IMO as it is right now.

> > It is a requirement of those kfuncs that the nf_conn has its refcount
> > held while they are called.
>
> and it will be. Just not by the verifier.
>
> > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> > It seems better to me to keep that restriction instead of relaxing it,
> > if it is part of the contract.
>
> Disagree as explained above.
>
> > It is fine to not require people to dive into these details and just
> > use KF_TRUSTED_ARGS in general, but we need something to cover special
> > cases like these where the object is only stable while we hold an
> > active refcount, RCU protection is not enough against reuse.
>
> This is not related to RCU. Let's not mix RCU concerns in here.
> It's a different topic.
>

What I meant is that in the normal case, usually objects aren't reused
while the RCU read lock is held.
In case of nf_conn, the refcount needs to be held to ensure that,
since it uses SLAB_TYPESAFE_BY_RCU.
This is why bpf_ct_lookup needs to bump the refcount and match the key
after that again, and cannot just return the looked up ct directly.

> > It could be 'expert only' __ref suffix on the nf_conn arg, or
> > KF_OWNED_ARGS, or something else.
>
> I'm still against that.
>

I understand (and agree) that you don't want to complicate things further.
It's fine if you want to deal with this later when the above concern
materializes. But it will be yet another thing to keep in mind for the
future.
Alexei Starovoitov Nov. 2, 2022, 2:34 a.m. UTC | #8
On Tue, Nov 1, 2022 at 6:01 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
>
> On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > On Wed, 2 Nov 2022 at 03:06, David Vernet <void@manifault.com> wrote:
> > > >
> > > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@manifault.com> wrote:
> > > > > >
> > > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > > > Why is this not enough?
> > > > > >
> > > > > > Kumar should have more context here (he originally suggested this in
> > > > > > [0]),
> > > > >
> > > > > Quoting:
> > > > > "
> > > > > Unfortunately a side effect of this change is that now since
> > > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > > > functions would begin working with tp_btf args.
> > > > > "
> > > > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > > > The whole point of that new type was to return it to bpf prog,
> > > > > so the verifier type matches it when it's passed into bpf_ct_*
> > > > > in turn.
> > > > > So I don't see a need for a new OWNED flag still.
> > > > > If nf_conn___init is passed into tracepoint it's a bug and
> > > > > we gotta fix it.
> > > >
> > > > Yep, this is what I'm seeing as well. I think you're right that
> > > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > > > types is the way to enable an ownership model like this.
> > > >
> > >
> > > It's not just nf_conn___init. Some CT helpers also take nf_conn.
> > > e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> > > Right now they are only allowed in XDP and TC programs, so the tracing
> > > args part is not a problem _right now_.
> >
> > ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
> >
> > > So currently it may not be possible to pass such a trusted but
> > > ref_obj_id == 0 nf_conn to those helpers.
> > > But based on changes unrelated to this, it may become possible in the
> > > future to obtain such a trusted nf_conn pointer.
> >
> > From kfunc's pov trusted pointer means valid pointer.
> > It doesn't need to be ref_obj_id refcounted from the verifier pov.
> > It can be refcounted on the kernel side and it will be trusted.
> > The code that calls trace_*() passes only trusted pointers into tp-s.
> > If there is a tracepoint somewhere in the kernel that uses a volatile
> > pointer to potentially uaf kernel object it's a bug that should be fixed.
> >
>
> This is all fine. I'm asking you to distinguish between
> trusted-not-refcounted and trusted-and-refcounted.

That's not what you're asking :)

> It is necessary for nf_conn, since the object can be reused if the
> refcount is not held.

of course. No one argues the opposite.

> Some other CPU could be reusing the same memory and allocating a new
> nf_conn on it while we change its status.
> So it's not ok to call bpf_ct_change_timeout/status on trusted
> nf_conn, but only on trusted+refcounted nf_conn.

and here we start to disagree.

> Trusted doesn't capture the difference between 'valid' vs 'valid and
> owned by prog' anymore with the new definition
> for PTR_TO_BTF_ID.

and here we disagree completely.
You're asking to distinguish refcnt++ done by the program
and recognized by the verifier as ref_obj_id > 0 vs
refcnt++ done by the kernel code before it calls into tracepoint.
That's odd, right?
I don't think people adding kfuncs should care what piece
of code before kfunc did refcnt++.

> Yes, in most cases the tracepoints/tracing functions whitelisted will
> have the caller ensure that,
> but we should then allow trusted nf_conn in those hooks explicitly,
> not implicitly by default everywhere.
> Until then it should be restricted to ref_obj_id > 0 IMO as it is right now.
>
> > > It is a requirement of those kfuncs that the nf_conn has its refcount
> > > held while they are called.
> >
> > and it will be. Just not by the verifier.
> >
> > > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> > > It seems better to me to keep that restriction instead of relaxing it,
> > > if it is part of the contract.
> >
> > Disagree as explained above.
> >
> > > It is fine to not require people to dive into these details and just
> > > use KF_TRUSTED_ARGS in general, but we need something to cover special
> > > cases like these where the object is only stable while we hold an
> > > active refcount, RCU protection is not enough against reuse.
> >
> > This is not related to RCU. Let's not mix RCU concerns in here.
> > It's a different topic.
> >
>
> What I meant is that in the normal case, usually objects aren't reused
> while the RCU read lock is held.
> In case of nf_conn, the refcount needs to be held to ensure that,
> since it uses SLAB_TYPESAFE_BY_RCU.
> This is why bpf_ct_lookup needs to bump the refcount and match the key
> after that again, and cannot just return the looked up ct directly.

bpf_ct_lookup needs to bump a refcnt?!
bpf_skb_ct_lookup calls __bpf_nf_ct_lookup
that calls nf_conntrack_find_get() that does
the search and incs the refcnt in a generic kernel code.
There is nothing bpf specific stuff here. bpf kfunc didn't
add any special refcnt incs.

There are no tracepoints in netfilter, so this discussion
is all theoretical, but if there was then the code
should have made sure that refcnt is held before
passing nf_conn into tracepoint.

> > > It could be 'expert only' __ref suffix on the nf_conn arg, or
> > > KF_OWNED_ARGS, or something else.
> >
> > I'm still against that.
> >
>
> I understand (and agree) that you don't want to complicate things further.
> It's fine if you want to deal with this later when the above concern
> materializes. But it will be yet another thing to keep in mind for the
> future.

I don't share the concern.
With nf_conn there is none, right?
But imagine there is only RCU protected pointer that
is passed into tracepoint somewhere.
The verifier doesn't recognize refcnt++ on it and ref_obj_id == 0
the kernel code doesn't do refcnt++ either.
But it's still safe and this arg should still be
PTR_TO_BTF_ID | PTR_TRUSTED.
The bpf prog can pass it further into kfunc that has KF_TRUSTED_ARGS.
Since RCU is held before calling into tracepoint the bpf prog
has to be non sleepable. Additional rcu_read_lock done by
the prog is redundant, but doesn't hurt.
When prog is calling kfunc the pointer is still valid and
kfunc can safely operate on it assuming that object is not going away.
That is the definition of KF_TRUSTED_ARGS from pov of kfunc.
You documented it yourself :)
"
The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
indicates that the all pointer arguments will always have a guaranteed lifetime,
"
Kumar Kartikeya Dwivedi Nov. 2, 2022, 8:48 p.m. UTC | #9
On Wed, Nov 02, 2022 at 08:04:57AM IST, Alexei Starovoitov wrote:
> On Tue, Nov 1, 2022 at 6:01 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> >
> > On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > On Wed, 2 Nov 2022 at 03:06, David Vernet <void@manifault.com> wrote:
> > > > >
> > > > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@manifault.com> wrote:
> > > > > > >
> > > > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > > > > Why is this not enough?
> > > > > > >
> > > > > > > Kumar should have more context here (he originally suggested this in
> > > > > > > [0]),
> > > > > >
> > > > > > Quoting:
> > > > > > "
> > > > > > Unfortunately a side effect of this change is that now since
> > > > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > > > > functions would begin working with tp_btf args.
> > > > > > "
> > > > > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > > > > The whole point of that new type was to return it to bpf prog,
> > > > > > so the verifier type matches it when it's passed into bpf_ct_*
> > > > > > in turn.
> > > > > > So I don't see a need for a new OWNED flag still.
> > > > > > If nf_conn___init is passed into tracepoint it's a bug and
> > > > > > we gotta fix it.
> > > > >
> > > > > Yep, this is what I'm seeing as well. I think you're right that
> > > > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > > > > types is the way to enable an ownership model like this.
> > > > >
> > > >
> > > > It's not just nf_conn___init. Some CT helpers also take nf_conn.
> > > > e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> > > > Right now they are only allowed in XDP and TC programs, so the tracing
> > > > args part is not a problem _right now_.
> > >
> > > ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
> > >
> > > > So currently it may not be possible to pass such a trusted but
> > > > ref_obj_id == 0 nf_conn to those helpers.
> > > > But based on changes unrelated to this, it may become possible in the
> > > > future to obtain such a trusted nf_conn pointer.
> > >
> > > From kfunc's pov trusted pointer means valid pointer.
> > > It doesn't need to be ref_obj_id refcounted from the verifier pov.
> > > It can be refcounted on the kernel side and it will be trusted.
> > > The code that calls trace_*() passes only trusted pointers into tp-s.
> > > If there is a tracepoint somewhere in the kernel that uses a volatile
> > > pointer to potentially uaf kernel object it's a bug that should be fixed.
> > >
> >
> > This is all fine. I'm asking you to distinguish between
> > trusted-not-refcounted and trusted-and-refcounted.
>
> That's not what you're asking :)
>
> > It is necessary for nf_conn, since the object can be reused if the
> > refcount is not held.
>
> of course. No one argues the opposite.
>
> > Some other CPU could be reusing the same memory and allocating a new
> > nf_conn on it while we change its status.
> > So it's not ok to call bpf_ct_change_timeout/status on trusted
> > nf_conn, but only on trusted+refcounted nf_conn.
>
> and here we start to disagree.
>
> > Trusted doesn't capture the difference between 'valid' vs 'valid and
> > owned by prog' anymore with the new definition
> > for PTR_TO_BTF_ID.
>
> and here we disagree completely.
> You're asking to distinguish refcnt++ done by the program
> and recognized by the verifier as ref_obj_id > 0 vs
> refcnt++ done by the kernel code before it calls into tracepoint.
> That's odd, right?
> I don't think people adding kfuncs should care what piece
> of code before kfunc did refcnt++.
>

I think we're talking past each other. Your point is that whenever PTR_TRUSTED
is set (tracepoint args, etc.) the kernel itself should ensure it holds the
refcount on behalf of the program. There shouldn't be a scenario where
PTR_TRUSTED nf_conn is obtained and the program or the kernel isn't holding
an active refcount during its use with helpers/kfuncs.

I was not making that assumption, but it's not wrong. Which is why I wanted to
limit their use to only when verifier is tracking a reference in the program
for the nf_conn returned from an acquire kfunc.

> > Yes, in most cases the tracepoints/tracing functions whitelisted will
> > have the caller ensure that,
> > but we should then allow trusted nf_conn in those hooks explicitly,
> > not implicitly by default everywhere.
> > Until then it should be restricted to ref_obj_id > 0 IMO as it is right now.
> >
> > > > It is a requirement of those kfuncs that the nf_conn has its refcount
> > > > held while they are called.
> > >
> > > and it will be. Just not by the verifier.
> > >
> > > > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> > > > It seems better to me to keep that restriction instead of relaxing it,
> > > > if it is part of the contract.
> > >
> > > Disagree as explained above.
> > >
> > > > It is fine to not require people to dive into these details and just
> > > > use KF_TRUSTED_ARGS in general, but we need something to cover special
> > > > cases like these where the object is only stable while we hold an
> > > > active refcount, RCU protection is not enough against reuse.
> > >
> > > This is not related to RCU. Let's not mix RCU concerns in here.
> > > It's a different topic.
> > >
> >
> > What I meant is that in the normal case, usually objects aren't reused
> > while the RCU read lock is held.
> > In case of nf_conn, the refcount needs to be held to ensure that,
> > since it uses SLAB_TYPESAFE_BY_RCU.
> > This is why bpf_ct_lookup needs to bump the refcount and match the key
> > after that again, and cannot just return the looked up ct directly.
>
> bpf_ct_lookup needs to bump a refcnt?!
> bpf_skb_ct_lookup calls __bpf_nf_ct_lookup
> that calls nf_conntrack_find_get() that does
> the search and incs the refcnt in a generic kernel code.
> There is nothing bpf specific stuff here. bpf kfunc didn't
> add any special refcnt incs.
>
> There are no tracepoints in netfilter, so this discussion
> is all theoretical, but if there was then the code
> should have made sure that refcnt is held before
> passing nf_conn into tracepoint.
>

Right, if so, it makes sense. Thanks for explaining.

> > > > It could be 'expert only' __ref suffix on the nf_conn arg, or
> > > > KF_OWNED_ARGS, or something else.
> > >
> > > I'm still against that.
> > >
> >
> > I understand (and agree) that you don't want to complicate things further.
> > It's fine if you want to deal with this later when the above concern
> > materializes. But it will be yet another thing to keep in mind for the
> > future.
>
> I don't share the concern.
> With nf_conn there is none, right?
> But imagine there is only RCU protected pointer that
> is passed into tracepoint somewhere.
> The verifier doesn't recognize refcnt++ on it and ref_obj_id == 0
> the kernel code doesn't do refcnt++ either.
> But it's still safe and this arg should still be
> PTR_TO_BTF_ID | PTR_TRUSTED.
> The bpf prog can pass it further into kfunc that has KF_TRUSTED_ARGS.
> Since RCU is held before calling into tracepoint the bpf prog
> has to be non sleepable. Additional rcu_read_lock done by
> the prog is redundant, but doesn't hurt.
> When prog is calling kfunc the pointer is still valid and
> kfunc can safely operate on it assuming that object is not going away.
> That is the definition of KF_TRUSTED_ARGS from pov of kfunc.
> You documented it yourself :)
> "
> The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> indicates that the all pointer arguments will always have a guaranteed lifetime,
> "

So based on the above, verifier should only mark nf_conn as PTR_TRUSTED in cases
where the kernel already holds the refcount on behalf of the program, otherwise
for the non-PTR_TRUSTED case program must have ref_obj_id > 0?

I.e. 'the guaranteed lifetime' for the nf_conn case also means the refcount is
held in context where program can obtain PTR_TRUSTED nf_conn pointer.

It will surely be true in the tracepoint case if there ever is one. Won't work
easily for fentry/fexit but you already said that PTR_TRUSTED shouldn't be set
in that case, atleast by default and without allowlisting those BTF IDs.
Alexei Starovoitov Nov. 3, 2022, 12:51 a.m. UTC | #10
On Thu, Nov 03, 2022 at 02:18:35AM +0530, Kumar Kartikeya Dwivedi wrote:
> On Wed, Nov 02, 2022 at 08:04:57AM IST, Alexei Starovoitov wrote:
> > On Tue, Nov 1, 2022 at 6:01 PM Kumar Kartikeya Dwivedi <memxor@gmail.com> wrote:
> > >
> > > On Wed, 2 Nov 2022 at 06:02, Alexei Starovoitov
> > > <alexei.starovoitov@gmail.com> wrote:
> > > >
> > > > On Wed, Nov 02, 2022 at 04:01:11AM +0530, Kumar Kartikeya Dwivedi wrote:
> > > > > On Wed, 2 Nov 2022 at 03:06, David Vernet <void@manifault.com> wrote:
> > > > > >
> > > > > > On Tue, Nov 01, 2022 at 01:22:39PM -0700, Alexei Starovoitov wrote:
> > > > > > > On Tue, Nov 1, 2022 at 11:11 AM David Vernet <void@manifault.com> wrote:
> > > > > > > >
> > > > > > > > > What kind of bpf prog will be able to pass 'struct nf_conn___init *' into these bpf_ct_* ?
> > > > > > > > > We've introduced / vs nf_conf specifically to express the relationship
> > > > > > > > > between allocated nf_conn and other nf_conn-s via different types.
> > > > > > > > > Why is this not enough?
> > > > > > > >
> > > > > > > > Kumar should have more context here (he originally suggested this in
> > > > > > > > [0]),
> > > > > > >
> > > > > > > Quoting:
> > > > > > > "
> > > > > > > Unfortunately a side effect of this change is that now since
> > > > > > > PTR_TO_BTF_ID without ref_obj_id is considered trusted, the bpf_ct_*
> > > > > > > functions would begin working with tp_btf args.
> > > > > > > "
> > > > > > > I couldn't find any tracepoint that has nf_conn___init as an argument.
> > > > > > > The whole point of that new type was to return it to bpf prog,
> > > > > > > so the verifier type matches it when it's passed into bpf_ct_*
> > > > > > > in turn.
> > > > > > > So I don't see a need for a new OWNED flag still.
> > > > > > > If nf_conn___init is passed into tracepoint it's a bug and
> > > > > > > we gotta fix it.
> > > > > >
> > > > > > Yep, this is what I'm seeing as well. I think you're right that
> > > > > > KF_OWNED_ARGS is just strictly unnecessary and that creating wrapper
> > > > > > types is the way to enable an ownership model like this.
> > > > > >
> > > > >
> > > > > It's not just nf_conn___init. Some CT helpers also take nf_conn.
> > > > > e.g. bpf_ct_change_timeout, bpf_ct_change_status.
> > > > > Right now they are only allowed in XDP and TC programs, so the tracing
> > > > > args part is not a problem _right now_.
> > > >
> > > > ... and it will be fine to use bpf_ct_change_timeout from tp_btf as well.
> > > >
> > > > > So currently it may not be possible to pass such a trusted but
> > > > > ref_obj_id == 0 nf_conn to those helpers.
> > > > > But based on changes unrelated to this, it may become possible in the
> > > > > future to obtain such a trusted nf_conn pointer.
> > > >
> > > > From kfunc's pov trusted pointer means valid pointer.
> > > > It doesn't need to be ref_obj_id refcounted from the verifier pov.
> > > > It can be refcounted on the kernel side and it will be trusted.
> > > > The code that calls trace_*() passes only trusted pointers into tp-s.
> > > > If there is a tracepoint somewhere in the kernel that uses a volatile
> > > > pointer to potentially uaf kernel object it's a bug that should be fixed.
> > > >
> > >
> > > This is all fine. I'm asking you to distinguish between
> > > trusted-not-refcounted and trusted-and-refcounted.
> >
> > That's not what you're asking :)
> >
> > > It is necessary for nf_conn, since the object can be reused if the
> > > refcount is not held.
> >
> > of course. No one argues the opposite.
> >
> > > Some other CPU could be reusing the same memory and allocating a new
> > > nf_conn on it while we change its status.
> > > So it's not ok to call bpf_ct_change_timeout/status on trusted
> > > nf_conn, but only on trusted+refcounted nf_conn.
> >
> > and here we start to disagree.
> >
> > > Trusted doesn't capture the difference between 'valid' vs 'valid and
> > > owned by prog' anymore with the new definition
> > > for PTR_TO_BTF_ID.
> >
> > and here we disagree completely.
> > You're asking to distinguish refcnt++ done by the program
> > and recognized by the verifier as ref_obj_id > 0 vs
> > refcnt++ done by the kernel code before it calls into tracepoint.
> > That's odd, right?
> > I don't think people adding kfuncs should care what piece
> > of code before kfunc did refcnt++.
> >
> 
> I think we're talking past each other. Your point is that whenever PTR_TRUSTED
> is set (tracepoint args, etc.) the kernel itself should ensure it holds the
> refcount on behalf of the program. There shouldn't be a scenario where
> PTR_TRUSTED nf_conn is obtained and the program or the kernel isn't holding
> an active refcount during its use with helpers/kfuncs.

correct.

> I was not making that assumption, but it's not wrong. Which is why I wanted to
> limit their use to only when verifier is tracking a reference in the program
> for the nf_conn returned from an acquire kfunc.

limit the use for PTR_TRUSTED to verifier only?
and introduce PTR_TRUSTED_BECAUSE_IT_CAME_FROM_KERNEL as well?
But the patch we're discussing will be treating them the same way...
Or you're proposing the kfunc writer also suppose to add two flags to a kfunc:
KF_TRUSTED_ARGS | KF_TRUSTED_BECAUSE_KERNEL_ARGS ?

and that's where it comes back to the point I keep stressing in this thread:
The kfunc writer should not be exposed to such details.
The little people need to understand about the verifier when the write bpf
programs or provide kfuncs the better.

> > > Yes, in most cases the tracepoints/tracing functions whitelisted will
> > > have the caller ensure that,
> > > but we should then allow trusted nf_conn in those hooks explicitly,
> > > not implicitly by default everywhere.
> > > Until then it should be restricted to ref_obj_id > 0 IMO as it is right now.
> > >
> > > > > It is a requirement of those kfuncs that the nf_conn has its refcount
> > > > > held while they are called.
> > > >
> > > > and it will be. Just not by the verifier.
> > > >
> > > > > KF_TRUSTED_ARGS was encoding this requirement before, but it wouldn't anymore.
> > > > > It seems better to me to keep that restriction instead of relaxing it,
> > > > > if it is part of the contract.
> > > >
> > > > Disagree as explained above.
> > > >
> > > > > It is fine to not require people to dive into these details and just
> > > > > use KF_TRUSTED_ARGS in general, but we need something to cover special
> > > > > cases like these where the object is only stable while we hold an
> > > > > active refcount, RCU protection is not enough against reuse.
> > > >
> > > > This is not related to RCU. Let's not mix RCU concerns in here.
> > > > It's a different topic.
> > > >
> > >
> > > What I meant is that in the normal case, usually objects aren't reused
> > > while the RCU read lock is held.
> > > In case of nf_conn, the refcount needs to be held to ensure that,
> > > since it uses SLAB_TYPESAFE_BY_RCU.
> > > This is why bpf_ct_lookup needs to bump the refcount and match the key
> > > after that again, and cannot just return the looked up ct directly.
> >
> > bpf_ct_lookup needs to bump a refcnt?!
> > bpf_skb_ct_lookup calls __bpf_nf_ct_lookup
> > that calls nf_conntrack_find_get() that does
> > the search and incs the refcnt in a generic kernel code.
> > There is nothing bpf specific stuff here. bpf kfunc didn't
> > add any special refcnt incs.
> >
> > There are no tracepoints in netfilter, so this discussion
> > is all theoretical, but if there was then the code
> > should have made sure that refcnt is held before
> > passing nf_conn into tracepoint.
> >
> 
> Right, if so, it makes sense. Thanks for explaining.
> 
> > > > > It could be 'expert only' __ref suffix on the nf_conn arg, or
> > > > > KF_OWNED_ARGS, or something else.
> > > >
> > > > I'm still against that.
> > > >
> > >
> > > I understand (and agree) that you don't want to complicate things further.
> > > It's fine if you want to deal with this later when the above concern
> > > materializes. But it will be yet another thing to keep in mind for the
> > > future.
> >
> > I don't share the concern.
> > With nf_conn there is none, right?
> > But imagine there is only RCU protected pointer that
> > is passed into tracepoint somewhere.
> > The verifier doesn't recognize refcnt++ on it and ref_obj_id == 0
> > the kernel code doesn't do refcnt++ either.
> > But it's still safe and this arg should still be
> > PTR_TO_BTF_ID | PTR_TRUSTED.
> > The bpf prog can pass it further into kfunc that has KF_TRUSTED_ARGS.
> > Since RCU is held before calling into tracepoint the bpf prog
> > has to be non sleepable. Additional rcu_read_lock done by
> > the prog is redundant, but doesn't hurt.
> > When prog is calling kfunc the pointer is still valid and
> > kfunc can safely operate on it assuming that object is not going away.
> > That is the definition of KF_TRUSTED_ARGS from pov of kfunc.
> > You documented it yourself :)
> > "
> > The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
> > indicates that the all pointer arguments will always have a guaranteed lifetime,
> > "
> 
> So based on the above, verifier should only mark nf_conn as PTR_TRUSTED in cases
> where the kernel already holds the refcount on behalf of the program, 

yes, but 'holding the reference' is not the only option.
Holding rcu_read_lock is fine too.
Or any other mechanism that ensures that the pointer has guaranteed lifetime.

and back to my earlier point...
using your logic we'd need PTR_TRUSTED_BECAUSE_RCU in the verifier and
corresponding KF_TRUSTED_ARGS_BECAUSE_RCU flag on kfunc.
That just doesn't scale.

PTR_RCU flag might actually be needed for some other reason, but
KF_TRUSTED_ARGS_BECAUSE_RCU is "too much info" for kfunc providers.

> otherwise
> for the non-PTR_TRUSTED case program must have ref_obj_id > 0?
> 
> I.e. 'the guaranteed lifetime' for the nf_conn case also means the refcount is
> held in context where program can obtain PTR_TRUSTED nf_conn pointer.

yes.

> It will surely be true in the tracepoint case if there ever is one. Won't work
> easily for fentry/fexit but you already said that PTR_TRUSTED shouldn't be set
> in that case, atleast by default and without allowlisting those BTF IDs.

Correct. allowlisting kernel func's btf_ids is one approach.
I might argue that fentry hook on any kernel func that has 'struct nf_conn *'
as an argument is also safe (barring recursion into bpf_ct_* kfuncs).
Dual compiler of the kernel (native + bpf) will help this kind of safety analysis.
diff mbox series

Patch

diff --git a/Documentation/bpf/kfuncs.rst b/Documentation/bpf/kfuncs.rst
index 0f858156371d..8e2825150a8d 100644
--- a/Documentation/bpf/kfuncs.rst
+++ b/Documentation/bpf/kfuncs.rst
@@ -137,30 +137,54 @@  KF_ACQUIRE and KF_RET_NULL flags.
 --------------------------
 
 The KF_TRUSTED_ARGS flag is used for kfuncs taking pointer arguments. It
-indicates that the all pointer arguments will always have a guaranteed lifetime,
-and pointers to kernel objects are always passed to helpers in their unmodified
-form (as obtained from acquire kfuncs).
+indicates that the all pointer arguments will always have a guaranteed
+lifetime, and pointers to kernel objects are always passed to helpers in their
+unmodified form (either as passed by the main kernel, or as obtained from
+acquire kfuncs).
 
-It can be used to enforce that a pointer to a refcounted object acquired from a
-kfunc or BPF helper is passed as an argument to this kfunc without any
-modifications (e.g. pointer arithmetic) such that it is trusted and points to
-the original object.
+It can be used to enforce that a safe pointer passed to the program by the
+kernel, or a refcounted object acquired from a kfunc or BPF helper, is passed
+as an argument to this kfunc without any modifications (e.g. pointer
+arithmetic) such that it is trusted and points to the original object.
 
 Meanwhile, it is also allowed pass pointers to normal memory to such kfuncs,
 but those can have a non-zero offset.
 
-This flag is often used for kfuncs that operate (change some property, perform
-some operation) on an object that was obtained using an acquire kfunc. Such
-kfuncs need an unchanged pointer to ensure the integrity of the operation being
-performed on the expected object.
+This flag is often used for kfuncs that receive a trusted pointer from the
+kernel, and which do not require a reference to be held by the program. For
+example, if there's a kernel object that was allocated by the main kernel, and
+which the BPF program wishes to store in a map as a kptr, KF_TRUSTED_ARGS can
+be used to ensure that the pointer is actually a trusted kernel pointer before
+a reference is acquired on it in a KF_ACQUIRE kfunc.
+
+2.4.6 KF_OWNED_ARGS flag
+------------------------
+
+The KF_OWNED_ARGS flag is identical to the KF_TRUSTED_ARGS flag, though it is
+more restrictive in that it also requires the BPF program to hold a reference
+on the object.
 
-2.4.6 KF_SLEEPABLE flag
+In other words, it can be used to enforce that a pointer to a refcounted object
+acquired from a kfunc or BPF helper is passed as an argument to this kfunc
+without any modifications (e.g. pointer arithmetic) such that it is trusted and
+points to the original object that was allocated or owned by the BPF program.
+
+This flag is often used for kfuncs that operate (change some property, perform
+some operation) on an object that was obtained using an acquire kfunc. For
+example, if an acquire kfunc allocates an object on behalf of a program,
+KF_OWNED_ARGS would be an appropriate flag to specify for other kfuncs which
+allow the program to mutate that object. KF_TRUSTED_ARGS, on the other hand,
+would likely not be sufficiently restrictive as the kfunc does not want to
+allow the BPF program to mutate another instance of the same object type which
+was allocated by the main kernel.
+
+2.4.7 KF_SLEEPABLE flag
 -----------------------
 
 The KF_SLEEPABLE flag is used for kfuncs that may sleep. Such kfuncs can only
 be called by sleepable BPF programs (BPF_F_SLEEPABLE).
 
-2.4.7 KF_DESTRUCTIVE flag
+2.4.8 KF_DESTRUCTIVE flag
 --------------------------
 
 The KF_DESTRUCTIVE flag is used to indicate functions calling which is
diff --git a/include/linux/bpf.h b/include/linux/bpf.h
index 9e7d46d16032..ccdbefd72a95 100644
--- a/include/linux/bpf.h
+++ b/include/linux/bpf.h
@@ -457,6 +457,12 @@  enum bpf_type_flag {
 	/* Size is known at compile time. */
 	MEM_FIXED_SIZE		= BIT(10 + BPF_BASE_TYPE_BITS),
 
+	/* PTR was obtained from walking a struct. This is used with
+	 * PTR_TO_BTF_ID to determine whether the pointer is safe to pass to a
+	 * kfunc with KF_TRUSTED_ARGS.
+	 */
+	PTR_WALKED		= BIT(11 + BPF_BASE_TYPE_BITS),
+
 	__BPF_TYPE_FLAG_MAX,
 	__BPF_TYPE_LAST_FLAG	= __BPF_TYPE_FLAG_MAX - 1,
 };
diff --git a/include/linux/btf.h b/include/linux/btf.h
index f9aababc5d78..7f5a438196a2 100644
--- a/include/linux/btf.h
+++ b/include/linux/btf.h
@@ -17,9 +17,48 @@ 
 #define KF_RELEASE	(1 << 1) /* kfunc is a release function */
 #define KF_RET_NULL	(1 << 2) /* kfunc returns a pointer that may be NULL */
 #define KF_KPTR_GET	(1 << 3) /* kfunc returns reference to a kptr */
-/* Trusted arguments are those which are meant to be referenced arguments with
- * unchanged offset. It is used to enforce that pointers obtained from acquire
- * kfuncs remain unmodified when being passed to helpers taking trusted args.
+/* Trusted arguments are those which are meant to be guaranteed valid
+ * arguments, with an unchanged offset. It is used to enforce that pointers
+ * obtained from either acquire kfuncs or the main kernel remain unmodified
+ * when being passed to helpers taking trusted args.
+ *
+ * Consider, for example, the following task tracepoint:
+ *
+ *	SEC("tp_btf/task_newtask")
+ *	int BPF_PROG(new_task_tp, struct task_struct *task, u64 clone_flags)
+ *	{
+ *		...
+ *	}
+ *
+ * And the following kfunc:
+ *
+ *	BTF_ID_FLAGS(func, bpf_task_acquire, KF_ACQUIRE | KF_RET_NULL | KF_TRUSTED_ARGS)
+ *
+ * All invocations to the kfunc must pass the unmodified, unwalked task:
+ *
+ *	bpf_task_acquire(task);		    // Allowed
+ *	bpf_task_acquire(task->last_wakee); // Rejected, walked task
+ *
+ * Users may also pass referenced tasks directly to the kfunc:
+ *
+ *	struct task_struct *acquired;
+ *
+ *	acquired = bpf_task_acquire(task);	// Allowed, same as above
+ *	bpf_task_acquire(acquired);		// Allowed
+ *	bpf_task_acquire(task);			// Allowed
+ *	bpf_task_acquire(acquired->last_wakee); // Rejected, walked task
+ *
+ * If users wish to only allow referenced objects to be passed to a kfunc, they
+ * may instead specify the KF_OWNED_ARGS flag.
+ */
+#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
+#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
+#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
+/* Owned arguments are similar to trusted arguments, but are even more
+ * restrictive.  Owned arguments are arguments which are "owned" by the BPF
+ * program, meaning it has acquired a reference to the object via an acquire
+ * kfunc. Just as with trusted arguments, the verifier enforces that owned
+ * arguments have an unchanged offset when they're passed to kfuncs.
  *
  * Consider
  *	struct foo {
@@ -36,7 +75,7 @@ 
  *	struct bar *b = alloc_bar(); // Acquire kfunc
  *
  * If a kfunc set_foo_data() wants to operate only on the allocated object, it
- * will set the KF_TRUSTED_ARGS flag, which will prevent unsafe usage like:
+ * will set the KR_ARGS_OWNED flag, which will prevent unsafe usage like:
  *
  *	set_foo_data(f, 42);	   // Allowed
  *	set_foo_data(f->next, 42); // Rejected, non-referenced pointer
@@ -47,10 +86,14 @@ 
  * by looking at the type of the member at the offset, but due to the
  * requirement of trusted argument, this deduction will be strict and not done
  * for this case.
+ *
+ * Note as well that if tracepoints existed which took a struct foo *f argument
+ * that was passed from the kernel, the verifier would also reject
+ * set_foo_bar(f, 42) on it, as the BPF program had not acquired a reference on
+ * it. If the kfunc had instead specified KF_TRUSTED_ARGS, this would be
+ * permitted.
  */
-#define KF_TRUSTED_ARGS (1 << 4) /* kfunc only takes trusted pointer arguments */
-#define KF_SLEEPABLE    (1 << 5) /* kfunc may sleep */
-#define KF_DESTRUCTIVE  (1 << 6) /* kfunc performs destructive actions */
+#define KF_OWNED_ARGS   (1 << 7) /* kfunc performs destructive actions */
 
 /*
  * Return the name of the passed struct, if exists, or halt the build if for
diff --git a/kernel/bpf/btf.c b/kernel/bpf/btf.c
index eba603cec2c5..a3712abae108 100644
--- a/kernel/bpf/btf.c
+++ b/kernel/bpf/btf.c
@@ -6227,7 +6227,7 @@  static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 				    bool processing_call)
 {
 	enum bpf_prog_type prog_type = resolve_prog_type(env->prog);
-	bool rel = false, kptr_get = false, trusted_args = false;
+	bool rel = false, kptr_get = false, trusted_args = false, owned_args = false;
 	bool sleepable = false;
 	struct bpf_verifier_log *log = &env->log;
 	u32 i, nargs, ref_id, ref_obj_id = 0;
@@ -6265,7 +6265,8 @@  static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		/* Only kfunc can be release func */
 		rel = kfunc_meta->flags & KF_RELEASE;
 		kptr_get = kfunc_meta->flags & KF_KPTR_GET;
-		trusted_args = kfunc_meta->flags & KF_TRUSTED_ARGS;
+		owned_args = kfunc_meta->flags & KF_OWNED_ARGS;
+		trusted_args = owned_args || (kfunc_meta->flags & KF_TRUSTED_ARGS);
 		sleepable = kfunc_meta->flags & KF_SLEEPABLE;
 	}
 
@@ -6333,8 +6334,19 @@  static int btf_check_func_arg_match(struct bpf_verifier_env *env,
 		/* Check if argument must be a referenced pointer, args + i has
 		 * been verified to be a pointer (after skipping modifiers).
 		 * PTR_TO_CTX is ok without having non-zero ref_obj_id.
+		 *
+		 * All object pointers must be refcounted, other than:
+		 * - PTR_TO_CTX
+		 * - Trusted pointers (i.e. pointers with no type modifiers).
+		 *   Kfuncs that have specified KF_OWNED_ARGS require
+		 *   references even if a pointer is otherwise trusted.
 		 */
-		if (is_kfunc && trusted_args && (obj_ptr && reg->type != PTR_TO_CTX) && !reg->ref_obj_id) {
+		if (is_kfunc &&
+		    trusted_args &&
+		    obj_ptr &&
+		    base_type(reg->type) != PTR_TO_CTX &&
+		    (owned_args || type_flag(reg->type)) &&
+		    !reg->ref_obj_id) {
 			bpf_log(log, "R%d must be referenced\n", regno);
 			return -EINVAL;
 		}
diff --git a/kernel/bpf/verifier.c b/kernel/bpf/verifier.c
index 6f6d2d511c06..7b8b20feeb62 100644
--- a/kernel/bpf/verifier.c
+++ b/kernel/bpf/verifier.c
@@ -541,7 +541,7 @@  static bool is_cmpxchg_insn(const struct bpf_insn *insn)
 static const char *reg_type_str(struct bpf_verifier_env *env,
 				enum bpf_reg_type type)
 {
-	char postfix[16] = {0}, prefix[32] = {0};
+	char postfix[16] = {0}, prefix[64] = {0};
 	static const char * const str[] = {
 		[NOT_INIT]		= "?",
 		[SCALAR_VALUE]		= "scalar",
@@ -573,16 +573,14 @@  static const char *reg_type_str(struct bpf_verifier_env *env,
 			strncpy(postfix, "_or_null", 16);
 	}
 
-	if (type & MEM_RDONLY)
-		strncpy(prefix, "rdonly_", 32);
-	if (type & MEM_ALLOC)
-		strncpy(prefix, "alloc_", 32);
-	if (type & MEM_USER)
-		strncpy(prefix, "user_", 32);
-	if (type & MEM_PERCPU)
-		strncpy(prefix, "percpu_", 32);
-	if (type & PTR_UNTRUSTED)
-		strncpy(prefix, "untrusted_", 32);
+	snprintf(prefix, sizeof(prefix), "%s%s%s%s%s%s",
+		 type & MEM_RDONLY ? "rdonly_" : "",
+		 type & MEM_ALLOC ? "alloc_" : "",
+		 type & MEM_USER ? "user_" : "",
+		 type & MEM_PERCPU ? "percpu_" : "",
+		 type & PTR_UNTRUSTED ? "untrusted_" : "",
+		 type & PTR_WALKED ? "walked_" : ""
+	);
 
 	snprintf(env->type_str_buf, TYPE_STR_BUF_LEN, "%s%s%s",
 		 prefix, str[base_type(type)], postfix);
@@ -4558,6 +4556,9 @@  static int check_ptr_to_btf_access(struct bpf_verifier_env *env,
 	if (type_flag(reg->type) & PTR_UNTRUSTED)
 		flag |= PTR_UNTRUSTED;
 
+	/* Mark this and any future pointers as having been obtained from walking a struct. */
+	flag |= PTR_WALKED;
+
 	if (atype == BPF_READ && value_regno >= 0)
 		mark_btf_ld_reg(env, regs, value_regno, ret, reg->btf, btf_id, flag);
 
@@ -5661,6 +5662,7 @@  static const struct bpf_reg_types btf_id_sock_common_types = {
 		PTR_TO_TCP_SOCK,
 		PTR_TO_XDP_SOCK,
 		PTR_TO_BTF_ID,
+		PTR_TO_BTF_ID | PTR_WALKED,
 	},
 	.btf_id = &btf_sock_ids[BTF_SOCK_TYPE_SOCK_COMMON],
 };
@@ -5694,9 +5696,19 @@  static const struct bpf_reg_types scalar_types = { .types = { SCALAR_VALUE } };
 static const struct bpf_reg_types context_types = { .types = { PTR_TO_CTX } };
 static const struct bpf_reg_types alloc_mem_types = { .types = { PTR_TO_MEM | MEM_ALLOC } };
 static const struct bpf_reg_types const_map_ptr_types = { .types = { CONST_PTR_TO_MAP } };
-static const struct bpf_reg_types btf_ptr_types = { .types = { PTR_TO_BTF_ID } };
+static const struct bpf_reg_types btf_ptr_types = {
+	.types = {
+		PTR_TO_BTF_ID,
+		PTR_TO_BTF_ID | PTR_WALKED
+	},
+};
 static const struct bpf_reg_types spin_lock_types = { .types = { PTR_TO_MAP_VALUE } };
-static const struct bpf_reg_types percpu_btf_ptr_types = { .types = { PTR_TO_BTF_ID | MEM_PERCPU } };
+static const struct bpf_reg_types percpu_btf_ptr_types = {
+	.types = {
+		PTR_TO_BTF_ID | MEM_PERCPU,
+		PTR_TO_BTF_ID | MEM_PERCPU | PTR_WALKED,
+	}
+};
 static const struct bpf_reg_types func_ptr_types = { .types = { PTR_TO_FUNC } };
 static const struct bpf_reg_types stack_ptr_types = { .types = { PTR_TO_STACK } };
 static const struct bpf_reg_types const_str_ptr_types = { .types = { PTR_TO_MAP_VALUE } };
@@ -5860,6 +5872,7 @@  int check_func_arg_reg_off(struct bpf_verifier_env *env,
 	 * fixed offset.
 	 */
 	case PTR_TO_BTF_ID:
+	case PTR_TO_BTF_ID | PTR_WALKED:
 		/* When referenced PTR_TO_BTF_ID is passed to release function,
 		 * it's fixed offset must be 0.	In the other cases, fixed offset
 		 * can be non-zero.
@@ -12136,8 +12149,30 @@  static bool reg_type_mismatch_ok(enum bpf_reg_type type)
  */
 static bool reg_type_mismatch(enum bpf_reg_type src, enum bpf_reg_type prev)
 {
-	return src != prev && (!reg_type_mismatch_ok(src) ||
-			       !reg_type_mismatch_ok(prev));
+	/* Compare only the base types of the registers, to avoid confusing the
+	 * verifier with the following type of code:
+	 *
+	 * struct fib6_nh *fib6_nh;
+	 * struct nexthop *nh;
+	 *
+	 * fib6_nh = &rt->fib6_nh[0];
+	 *
+	 * nh = rt->nh;
+	 * if (nh)
+	 *	fib6_nh = &nh->nh_info->fib6_nh;
+	 *
+	 * If we did not compare base types, the verifier would reject this
+	 * because the register in the former branch will have PTR_TO_BTF_ID,
+	 * whereas the latter branch will have PTR_TO_BTF_ID | PTR_WALKED.
+	 *
+	 * The safety of the memory access is validated in check_mem_access()
+	 * before this function is called. The intention here is rather to
+	 * prevent a program from doing something like using PTR_TO_BTF_ID in
+	 * one path, and PTR_TO_CTX in another, as it would cause the
+	 * convert_ctx_access() handling to be incorrect.
+	 */
+	return base_type(src) != base_type(prev) &&
+	       (!reg_type_mismatch_ok(src) || !reg_type_mismatch_ok(prev));
 }
 
 static int do_check(struct bpf_verifier_env *env)
@@ -13499,6 +13534,7 @@  static int convert_ctx_accesses(struct bpf_verifier_env *env)
 			break;
 		case PTR_TO_BTF_ID:
 		case PTR_TO_BTF_ID | PTR_UNTRUSTED:
+		case PTR_TO_BTF_ID | PTR_WALKED:
 			if (type == BPF_READ) {
 				insn->code = BPF_LDX | BPF_PROBE_MEM |
 					BPF_SIZE((insn)->code);
diff --git a/net/bpf/test_run.c b/net/bpf/test_run.c
index 13d578ce2a09..a298bef13e12 100644
--- a/net/bpf/test_run.c
+++ b/net/bpf/test_run.c
@@ -758,7 +758,7 @@  BTF_ID_FLAGS(func, bpf_kfunc_call_test_fail3)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_pass1)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail1)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_mem_len_fail2)
-BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_kfunc_call_test_ref, KF_OWNED_ARGS)
 BTF_ID_FLAGS(func, bpf_kfunc_call_test_destructive, KF_DESTRUCTIVE)
 BTF_SET8_END(test_sk_check_kfunc_ids)
 
diff --git a/net/netfilter/nf_conntrack_bpf.c b/net/netfilter/nf_conntrack_bpf.c
index 8639e7efd0e2..ef937f4b4fe4 100644
--- a/net/netfilter/nf_conntrack_bpf.c
+++ b/net/netfilter/nf_conntrack_bpf.c
@@ -479,10 +479,10 @@  BTF_ID_FLAGS(func, bpf_skb_ct_alloc, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_skb_ct_lookup, KF_ACQUIRE | KF_RET_NULL)
 BTF_ID_FLAGS(func, bpf_ct_insert_entry, KF_ACQUIRE | KF_RET_NULL | KF_RELEASE)
 BTF_ID_FLAGS(func, bpf_ct_release, KF_RELEASE)
-BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_ct_set_status, KF_TRUSTED_ARGS)
-BTF_ID_FLAGS(func, bpf_ct_change_status, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_set_timeout, KF_OWNED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_change_timeout, KF_OWNED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_set_status, KF_OWNED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_change_status, KF_OWNED_ARGS)
 BTF_SET8_END(nf_ct_kfunc_set)
 
 static const struct btf_kfunc_id_set nf_conntrack_kfunc_set = {
diff --git a/net/netfilter/nf_nat_bpf.c b/net/netfilter/nf_nat_bpf.c
index 0fa5a0bbb0ff..69d6668756c4 100644
--- a/net/netfilter/nf_nat_bpf.c
+++ b/net/netfilter/nf_nat_bpf.c
@@ -57,7 +57,7 @@  int bpf_ct_set_nat_info(struct nf_conn___init *nfct,
 __diag_pop()
 
 BTF_SET8_START(nf_nat_kfunc_set)
-BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_TRUSTED_ARGS)
+BTF_ID_FLAGS(func, bpf_ct_set_nat_info, KF_OWNED_ARGS)
 BTF_SET8_END(nf_nat_kfunc_set)
 
 static const struct btf_kfunc_id_set nf_bpf_nat_kfunc_set = {
diff --git a/tools/testing/selftests/bpf/prog_tests/map_kptr.c b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
index 0d66b1524208..1070ce936d32 100644
--- a/tools/testing/selftests/bpf/prog_tests/map_kptr.c
+++ b/tools/testing/selftests/bpf/prog_tests/map_kptr.c
@@ -20,7 +20,7 @@  struct {
 	{ "reject_bad_type_match", "invalid kptr access, R1 type=untrusted_ptr_prog_test_ref_kfunc" },
 	{ "marked_as_untrusted_or_null", "R1 type=untrusted_ptr_or_null_ expected=percpu_ptr_" },
 	{ "correct_btf_id_check_size", "access beyond struct prog_test_ref_kfunc at off 32 size 4" },
-	{ "inherit_untrusted_on_walk", "R1 type=untrusted_ptr_ expected=percpu_ptr_" },
+	{ "inherit_untrusted_on_walk", "R1 type=untrusted_walked_ptr_ expected=percpu_ptr_" },
 	{ "reject_kptr_xchg_on_unref", "off=8 kptr isn't referenced kptr" },
 	{ "reject_kptr_get_no_map_val", "arg#0 expected pointer to map value" },
 	{ "reject_kptr_get_no_null_map_val", "arg#0 expected pointer to map value" },
diff --git a/tools/testing/selftests/bpf/verifier/calls.c b/tools/testing/selftests/bpf/verifier/calls.c
index e1a937277b54..3327b3e75ce8 100644
--- a/tools/testing/selftests/bpf/verifier/calls.c
+++ b/tools/testing/selftests/bpf/verifier/calls.c
@@ -181,7 +181,7 @@ 
 	},
 	.result_unpriv = REJECT,
 	.result = REJECT,
-	.errstr = "negative offset ptr_ ptr R1 off=-4 disallowed",
+	.errstr = "negative offset walked_ptr_ ptr R1 off=-4 disallowed",
 },
 {
 	"calls: invalid kfunc call: PTR_TO_BTF_ID with variable offset",
@@ -243,7 +243,7 @@ 
 	},
 	.result_unpriv = REJECT,
 	.result = REJECT,
-	.errstr = "R1 must be referenced",
+	.errstr = "arg#0 pointer type STRUCT prog_test_ref_kfunc must point to scalar",
 },
 {
 	"calls: valid kfunc call: referenced arg needs refcounted PTR_TO_BTF_ID",
diff --git a/tools/testing/selftests/bpf/verifier/map_kptr.c b/tools/testing/selftests/bpf/verifier/map_kptr.c
index 6914904344c0..003bae55d79e 100644
--- a/tools/testing/selftests/bpf/verifier/map_kptr.c
+++ b/tools/testing/selftests/bpf/verifier/map_kptr.c
@@ -242,7 +242,7 @@ 
 	.prog_type = BPF_PROG_TYPE_SCHED_CLS,
 	.fixup_map_kptr = { 1 },
 	.result = REJECT,
-	.errstr = "R1 type=untrusted_ptr_ expected=percpu_ptr_",
+	.errstr = "R1 type=untrusted_walked_ptr_ expected=percpu_ptr_",
 },
 {
 	"map_kptr: unref: no reference state created",