
[trace/for-next,1/3] bpf: put bpf_link's program when link is safe to be deallocated

Message ID 20241031210938.1696639-1-andrii@kernel.org (mailing list archive)
State Superseded
Delegated to: BPF
Series [trace/for-next,1/3] bpf: put bpf_link's program when link is safe to be deallocated

Checks

Context Check Description
netdev/series_format warning Series does not have a cover letter; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 9 maintainers not CCed: sdf@fomichev.me kpsingh@kernel.org john.fastabend@gmail.com eddyz87@gmail.com martin.lau@linux.dev song@kernel.org haoluo@google.com yonghong.song@linux.dev jolsa@kernel.org
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 4 this patch: 4
netdev/checkpatch warning WARNING: line length of 96 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-9 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-8 fail Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-11 success Logs for s390x-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-12 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-13 fail Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-21 fail Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-15 success Logs for x86_64-gcc / test
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-llvm-17 / test
bpf/vmtest-bpf-next-VM_Test-17 fail Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-16 success Logs for x86_64-gcc / veristat
bpf/vmtest-bpf-next-VM_Test-22 fail Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-24 success Logs for x86_64-llvm-18 / veristat
bpf/vmtest-bpf-next-VM_Test-14 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck

Commit Message

Andrii Nakryiko Oct. 31, 2024, 9:09 p.m. UTC
In general, a BPF link's underlying BPF program should be considered
reachable through the attach hook -> link -> prog chain, and,
pessimistically, we have to assume that as long as the link's memory is
not safe to free, the attach hook's code might hold a pointer to the BPF
program and use it.

As such, it's not (generally) correct to put the link's program early,
before waiting for RCU GPs to go through. The more eager bpf_prog_put()
that we currently do is mostly correct because the BPF program's release
code does similar RCU GP waiting. But, as will be shown in the following
patches, a BPF program can be non-sleepable (and, thus, reliant only on
a "classic" RCU GP), while the BPF link's attach hook can have sleepable
semantics and needs to be protected by RCU Tasks Trace. In such cases
the BPF link has to go through RCU Tasks Trace + "classic" RCU GPs
before being deallocated. So, if we put the BPF program early, we might
free the BPF program before we free the BPF link, leading to a
use-after-free.
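
The ordering problem described above can be illustrated with a small
userspace model. This is only a sketch, not kernel code: the toy_* names
are hypothetical, and the model assumes the worst case in which the
program's own grace period has already elapsed by the time the last
reference is dropped, so dropping it frees the program immediately.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy stand-ins for the kernel objects; all names are illustrative. */
struct toy_prog {
	int refcnt;
	bool freed;
};

struct toy_link {
	struct toy_prog *prog;
	bool freed;
};

/* Drop one reference; the program is "freed" when the count hits zero. */
static void toy_prog_put(struct toy_prog *prog)
{
	assert(prog->refcnt > 0);
	if (--prog->refcnt == 0)
		prog->freed = true;
}

/* Runs once the grace period(s) protecting the link have elapsed. */
static void toy_link_dealloc(struct toy_link *link, bool put_prog_here)
{
	if (put_prog_here)
		toy_prog_put(link->prog);
	link->freed = true;
}

/*
 * Returns true if the program was freed while the link (and thus the
 * attach hook -> link -> prog chain) was still reachable.
 */
static bool toy_prog_freed_before_link(bool early_put)
{
	struct toy_prog prog = { .refcnt = 1, .freed = false };
	struct toy_link link = { .prog = &prog, .freed = false };
	bool window;

	/* link release: the eager variant puts the program right here */
	if (early_put)
		toy_prog_put(&prog);

	/* an attach hook may still follow link->prog in this window */
	window = prog.freed && !link.freed;

	/* grace periods elapse, link memory is finally released */
	toy_link_dealloc(&link, !early_put);
	return window;
}
```

In this model the eager put opens a window in which the link still
points at a freed program, while putting the program from the link's
deallocation step closes it.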

So, this patch defers bpf_prog_put() until we are ready to perform
bpf_link's deallocation. At worst, this delays BPF program freeing by
one extra RCU GP, but that seems completely acceptable. Alternatively,
we'd need more elaborate ways to determine BPF hook, BPF link, and BPF
program lifetimes, and how they relate to each other, which seems like
an unnecessary complication.

Note, for most BPF links we will still perform eager bpf_prog_put() and
link dealloc, so for those BPF links there are no observable changes
whatsoever. Only BPF links that use deferred dealloc might notice
slightly delayed freeing of BPF programs.
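
Which links see the delay can be read off the link's ops and the
sleepable flag. The sketch below is a hypothetical userspace
restatement of that decision, loosely following the branches visible in
bpf_link_free() in the diff; the toy_* types and the enum are
assumptions made for illustration and do not exist in the kernel.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

struct toy_link;

/* Mirrors only the two callbacks that matter for the decision. */
struct toy_link_ops {
	void (*dealloc)(struct toy_link *link);
	void (*dealloc_deferred)(struct toy_link *link);
};

enum toy_free_path {
	TOY_FREE_NONE,                      /* no dealloc callback at all */
	TOY_FREE_EAGER,                     /* put prog and free link right away */
	TOY_FREE_AFTER_RCU,                 /* after a "classic" RCU GP */
	TOY_FREE_AFTER_TASKS_TRACE_AND_RCU, /* after RCU Tasks Trace + RCU GPs */
};

/* Placeholder callback so the ops tables below have something to point at. */
static void toy_noop_dealloc(struct toy_link *link)
{
	(void)link;
}

/* Pick the path on which the program put and link free would happen. */
static enum toy_free_path toy_pick_free_path(const struct toy_link_ops *ops,
					     bool sleepable)
{
	assert(ops != NULL);

	if (ops->dealloc_deferred)
		return sleepable ? TOY_FREE_AFTER_TASKS_TRACE_AND_RCU
				 : TOY_FREE_AFTER_RCU;
	if (ops->dealloc)
		return TOY_FREE_EAGER;
	return TOY_FREE_NONE;
}
```

Only the two deferred paths move the program put behind a grace period;
the eager path keeps the put and the link free back to back, which
matches the "no observable changes" claim for most links.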

Also, to reduce code and logic duplication, extract the program put +
link dealloc logic into a bpf_link_dealloc() helper.

Signed-off-by: Andrii Nakryiko <andrii@kernel.org>
---
 kernel/bpf/syscall.c | 22 +++++++++++++++++-----
 1 file changed, 17 insertions(+), 5 deletions(-)

Comments

Steven Rostedt Nov. 1, 2024, 10:55 a.m. UTC | #1
On Thu, 31 Oct 2024 14:09:36 -0700
Andrii Nakryiko <andrii@kernel.org> wrote:

> [...]


Hi Andrii,

Do you want me to add this on top of my queue? If so, would it be possible
that I can get a tested-by from someone? As I don't do much to test BPF
patches.

-- Steve

Andrii Nakryiko Nov. 1, 2024, 5:54 p.m. UTC | #2
On Fri, Nov 1, 2024 at 3:55 AM Steven Rostedt <rostedt@goodmis.org> wrote:
>
> On Thu, 31 Oct 2024 14:09:36 -0700
> Andrii Nakryiko <andrii@kernel.org> wrote:
>
> > [...]
>
>
> Hi Andrii,
>
> Do you want me to add this on top of my queue? If so, would it be possible
> that I can get a tested-by from someone? As I don't do much to test BPF
> patches.

Hey Steven,

Yep, this should go on top of Mathieu's patch set. Let me send v2 with
fixed up stub definition. Jordan gave his Tested-By and Alexei seems
fine with this as well, so I think it should be good to go.

>
> -- Steve

Patch

diff --git a/kernel/bpf/syscall.c b/kernel/bpf/syscall.c
index a8f1808a1ca5..aa7246a399f3 100644
--- a/kernel/bpf/syscall.c
+++ b/kernel/bpf/syscall.c
@@ -2976,12 +2976,24 @@  void bpf_link_inc(struct bpf_link *link)
 	atomic64_inc(&link->refcnt);
 }
 
+static void bpf_link_dealloc(struct bpf_link *link)
+{
+	/* now that we know that bpf_link itself can't be reached, put underlying BPF program */
+	if (link->prog)
+		bpf_prog_put(link->prog);
+
+	/* free bpf_link and its containing memory */
+	if (link->ops->dealloc_deferred)
+		link->ops->dealloc_deferred(link);
+	else
+		link->ops->dealloc(link);
+}
+
 static void bpf_link_defer_dealloc_rcu_gp(struct rcu_head *rcu)
 {
 	struct bpf_link *link = container_of(rcu, struct bpf_link, rcu);
 
-	/* free bpf_link and its containing memory */
-	link->ops->dealloc_deferred(link);
+	bpf_link_dealloc(link);
 }
 
 static void bpf_link_defer_dealloc_mult_rcu_gp(struct rcu_head *rcu)
@@ -3003,7 +3015,6 @@  static void bpf_link_free(struct bpf_link *link)
 		sleepable = link->prog->sleepable;
 		/* detach BPF program, clean up used resources */
 		ops->release(link);
-		bpf_prog_put(link->prog);
 	}
 	if (ops->dealloc_deferred) {
 		/* schedule BPF link deallocation; if underlying BPF program
@@ -3014,8 +3025,9 @@  static void bpf_link_free(struct bpf_link *link)
 			call_rcu_tasks_trace(&link->rcu, bpf_link_defer_dealloc_mult_rcu_gp);
 		else
 			call_rcu(&link->rcu, bpf_link_defer_dealloc_rcu_gp);
-	} else if (ops->dealloc)
-		ops->dealloc(link);
+	} else if (ops->dealloc) {
+		bpf_link_dealloc(link);
+	}
 }
 
 static void bpf_link_put_deferred(struct work_struct *work)