
[v7,bpf-next,06/10] bpf: Add bpf_sock_destroy kfunc

Message ID 20230503225351.3700208-7-aditi.ghag@isovalent.com (mailing list archive)
State Superseded
Delegated to: BPF
Series bpf: Add socket destroy capability

Checks

Context Check Description
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-4 fail Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-9 success Logs for veristat
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for bpf-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 39 this patch: 40
netdev/cc_maintainers warning 17 maintainers not CCed: kuba@kernel.org daniel@iogearbox.net dsahern@kernel.org yhs@fb.com kpsingh@kernel.org netdev@vger.kernel.org john.fastabend@gmail.com martin.lau@linux.dev song@kernel.org andrii@kernel.org jolsa@kernel.org davem@davemloft.net pabeni@redhat.com haoluo@google.com ast@kernel.org willemdebruijn.kernel@gmail.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 39 this patch: 40
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns WARNING: line length of 93 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-7 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-2 success Logs for build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-3 fail Logs for build for aarch64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-6 fail Logs for build for x86_64 with llvm-16
bpf/vmtest-bpf-next-VM_Test-5 success Logs for build for x86_64 with gcc

Commit Message

Aditi Ghag May 3, 2023, 10:53 p.m. UTC
The socket destroy kfunc is used to forcefully terminate sockets from
certain BPF contexts. We plan to use this capability in Cilium to force
client sockets to reconnect when their remote load-balancing backends are
deleted. Another use case is on-the-fly policy enforcement, where existing
connections that newly installed policies forbid need to be forcefully
terminated. The kfunc can terminate sockets whether or not they are
actively sending traffic.

The kfunc is currently exposed to certain BPF iterators, where users can
filter and terminate selected sockets. It can only be called from BPF
contexts that hold the socket lock, so that the protocol destroy handlers,
which also acquire the socket lock, can run synchronously. The previous
commit, which batches UDP sockets during iteration, enables such
synchronous invocation from BPF context by skipping the socket locking in
the destroy handler; TCP iterators already supported batching.
Follow-up commits will ensure that the kfunc can only be called from
programs with the `BPF_TRACE_ITER` attach type.
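
As an illustration, a minimal sketch of such an iterator program
(hypothetical, not taken from this series' selftests; the IPv6 filter and
the __ksym extern declaration are assumptions following the usual kfunc
conventions):

#include "vmlinux.h"
#include <bpf/bpf_helpers.h>

#define AF_INET6 10

/* Resolved against vmlinux BTF at program load time. */
extern int bpf_sock_destroy(struct sock_common *sk) __ksym;

SEC("iter/tcp")
int iter_tcp6_destroy(struct bpf_iter__tcp *ctx)
{
	struct sock_common *sk_common = ctx->sk_common;

	/* The iterator ctx pointer may be NULL at the end of the walk. */
	if (!sk_common)
		return 0;

	/* Filter: only terminate IPv6 sockets. */
	if (sk_common->skc_family != AF_INET6)
		return 0;

	bpf_sock_destroy(sk_common);
	return 0;
}

char _license[] SEC("license") = "GPL";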

The kfunc takes a `sock_common` pointer as its argument, even though it
expects, and casts it to, a `sock` pointer. This enables the verifier to
allow the sock_destroy kfunc to be called with a `sock_common` for TCP
iterators and a `sock` for UDP iterators. As a comparison, BPF helpers
enable this behavior with the `ARG_PTR_TO_BTF_ID_SOCK_COMMON` argument
type; no such option exists in the verifier logic that handles kfuncs,
where BTF types are inferred. Furthermore, as `sock_common` only holds a
subset of the fields of `sock`, casting a pointer to the latter type is
not always safe, e.g. for request sockets, but those get special handling
in the diag_destroy handlers.
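
To make the type flexibility concrete, a hedged sketch of the UDP side,
where the iterator hands out a udp_sock rather than a sock_common (the
program body is illustrative and reuses the includes and extern from the
sketch above):

SEC("iter/udp")
int iter_udp_destroy(struct bpf_iter__udp *ctx)
{
	struct udp_sock *udp_sk = ctx->udp_sk;

	if (!udp_sk)
		return 0;

	/* udp_sock embeds sock, which embeds sock_common at offset 0, so
	 * the verifier's BTF matching accepts this cast for the kfunc's
	 * sock_common argument. */
	bpf_sock_destroy((struct sock_common *)udp_sk);
	return 0;
}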

Signed-off-by: Aditi Ghag <aditi.ghag@isovalent.com>
---
 net/core/filter.c | 57 +++++++++++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp.c    | 10 ++++++---
 net/ipv4/udp.c    |  6 +++--
 3 files changed, 68 insertions(+), 5 deletions(-)

Comments

Martin KaFai Lau May 5, 2023, 12:13 a.m. UTC | #1
On 5/3/23 3:53 PM, Aditi Ghag wrote:
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 727c5269867d..97d70b7959a1 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -11715,3 +11715,60 @@ static int __init bpf_kfunc_init(void)
>   	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
>   }
>   late_initcall(bpf_kfunc_init);
> +
> +/* Disables missing prototype warnings */
> +__diag_push();
> +__diag_ignore_all("-Wmissing-prototypes",
> +		  "Global functions as their definitions will be in vmlinux BTF");
> +
> +/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
> + *
> + * The function expects a non-NULL pointer to a socket, and invokes the
> + * protocol specific socket destroy handlers.
> + *
> + * The helper can only be called from BPF contexts that have acquired the socket
> + * locks.
> + *
> + * Parameters:
> + * @sock: Pointer to socket to be destroyed
> + *
> + * Return:
> + * On error, may return EOPNOTSUPP, EINVAL.
> + * EOPNOTSUPP if protocol specific destroy handler is not supported.
> + * 0 otherwise
> + */
> +__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
> +{
> +	struct sock *sk = (struct sock *)sock;
> +
> +	if (!sk)

If the kfunc has the KF_TRUSTED_ARGS flag, this NULL test is no longer needed. 
More details below.

> +		return -EINVAL;
> +
> +	/* The locking semantics that allow for synchronous execution of the
> +	 * destroy handlers are only supported for TCP and UDP.
> +	 * Supporting protocols will need to acquire lock_sock in the BPF context
> +	 * prior to invoking this kfunc.
> +	 */
> +	if (!sk->sk_prot->diag_destroy || (sk->sk_protocol != IPPROTO_TCP &&
> +					   sk->sk_protocol != IPPROTO_UDP))
> +		return -EOPNOTSUPP;
> +
> +	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
> +}
> +
> +__diag_pop()
> +
> +BTF_SET8_START(sock_destroy_kfunc_set)

nit: Rename it to a more generic name for future sk_iter related kfuncs.
Maybe bpf_sk_iter_kfunc_set?

> +BTF_ID_FLAGS(func, bpf_sock_destroy)

Following up on the v6 patch set regarding KF_TRUSTED_ARGS:
KF_TRUSTED_ARGS is needed here to avoid the cases where a PTR_TO_BTF_ID sk is
obtained by following another pointer, e.g. getting a sk pointer (which may even
be NULL) by following another sk pointer. The recent PTR_TRUSTED concept in the
verifier can guard against this. I tried it, and the following should do:

diff --git i/net/core/filter.c w/net/core/filter.c
index 68b228f3eca6..d82e038da0e3 100644
--- i/net/core/filter.c
+++ w/net/core/filter.c
@@ -11767,7 +11767,7 @@ __bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
  __diag_pop()

  BTF_SET8_START(sock_destroy_kfunc_set)
-BTF_ID_FLAGS(func, bpf_sock_destroy)
+BTF_ID_FLAGS(func, bpf_sock_destroy, KF_TRUSTED_ARGS)
  BTF_SET8_END(sock_destroy_kfunc_set)

  static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
diff --git i/net/ipv4/tcp_ipv4.c w/net/ipv4/tcp_ipv4.c
index 887f83a90d85..a769284e8291 100644
--- i/net/ipv4/tcp_ipv4.c
+++ w/net/ipv4/tcp_ipv4.c
@@ -3354,7 +3354,7 @@ static struct bpf_iter_reg tcp_reg_info = {
  	.ctx_arg_info_size	= 1,
  	.ctx_arg_info		= {
  		{ offsetof(struct bpf_iter__tcp, sk_common),
-		  PTR_TO_BTF_ID_OR_NULL },
+		  PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
  	},
  	.get_func_proto		= bpf_iter_tcp_get_func_proto,
  	.seq_info		= &tcp_seq_info,
diff --git i/net/ipv4/udp.c w/net/ipv4/udp.c
index 746c85f2bb03..945b641b363b 100644
--- i/net/ipv4/udp.c
+++ w/net/ipv4/udp.c
@@ -3646,7 +3646,7 @@ static struct bpf_iter_reg udp_reg_info = {
  	.ctx_arg_info_size	= 1,
  	.ctx_arg_info		= {
  		{ offsetof(struct bpf_iter__udp, udp_sk),
-		  PTR_TO_BTF_ID_OR_NULL },
+		  PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
  	},
  	.seq_info		= &udp_seq_info,
  };

Please take a look and run it through test_progs. If you agree with the changes,
they should be included in the same patch 6.
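
To make the failure mode concrete, a sketch of the kind of program
KF_TRUSTED_ARGS is meant to reject (the iter/unix context and the peer
walk are illustrative):

SEC("iter/unix")
int destroy_untrusted_peer(struct bpf_iter__unix *ctx)
{
	struct unix_sock *unix_sk = ctx->unix_sk;

	if (!unix_sk)
		return 0;

	/* unix_sk->peer is reached by walking a pointer, so it is an
	 * untrusted, possibly NULL PTR_TO_BTF_ID. Once bpf_sock_destroy
	 * carries KF_TRUSTED_ARGS, the verifier rejects this call. */
	bpf_sock_destroy((struct sock_common *)unix_sk->peer);
	return 0;
}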

> +BTF_SET8_END(sock_destroy_kfunc_set)
> +
> +static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
> +	.owner = THIS_MODULE,
> +	.set   = &sock_destroy_kfunc_set,
> +};
> +
> +static int init_subsystem(void)
> +{
> +	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);
> +}
> +late_initcall(init_subsystem);
> diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
> index 288693981b00..2259b4facc2f 100644
> --- a/net/ipv4/tcp.c
> +++ b/net/ipv4/tcp.c
> @@ -4679,8 +4679,10 @@ int tcp_abort(struct sock *sk, int err)
>   		return 0;
>   	}
>   
> -	/* Don't race with userspace socket closes such as tcp_close. */
> -	lock_sock(sk);
> +	/* BPF context ensures sock locking. */
> +	if (!has_current_bpf_ctx())
> +		/* Don't race with userspace socket closes such as tcp_close. */
> +		lock_sock(sk);
>   
>   	if (sk->sk_state == TCP_LISTEN) {
>   		tcp_set_state(sk, TCP_CLOSE);
> @@ -4702,9 +4704,11 @@ int tcp_abort(struct sock *sk, int err)
>   	}
>   
>   	bh_unlock_sock(sk);
> +

nit: unnecessary newline change.

>   	local_bh_enable();
>   	tcp_write_queue_purge(sk);
> -	release_sock(sk);
> +	if (!has_current_bpf_ctx())
> +		release_sock(sk);
>   	return 0;
>   }
>   EXPORT_SYMBOL_GPL(tcp_abort);
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 150551acab9d..5f48cdf82a45 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -2925,7 +2925,8 @@ EXPORT_SYMBOL(udp_poll);
>   
>   int udp_abort(struct sock *sk, int err)
>   {
> -	lock_sock(sk);
> +	if (!has_current_bpf_ctx())
> +		lock_sock(sk);
>   
>   	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
>   	 * with close()
> @@ -2938,7 +2939,8 @@ int udp_abort(struct sock *sk, int err)
>   	__udp_disconnect(sk, 0);
>   
>   out:
> -	release_sock(sk);
> +	if (!has_current_bpf_ctx())
> +		release_sock(sk);
>   
>   	return 0;
>   }
Martin KaFai Lau May 5, 2023, 6:49 p.m. UTC | #2
On 5/4/23 5:13 PM, Martin KaFai Lau wrote:
> 
> Following up on the v6 patch set regarding KF_TRUSTED_ARGS:
> KF_TRUSTED_ARGS is needed here to avoid the cases where a PTR_TO_BTF_ID sk is 
> obtained by following another pointer, e.g. getting a sk pointer (which may even 
> be NULL) by following another sk pointer. The recent PTR_TRUSTED concept in the 
> verifier can guard against this. I tried it, and the following should do:
> 
> diff --git i/net/core/filter.c w/net/core/filter.c
> index 68b228f3eca6..d82e038da0e3 100644
> --- i/net/core/filter.c
> +++ w/net/core/filter.c
> @@ -11767,7 +11767,7 @@ __bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
>   __diag_pop()
> 
>   BTF_SET8_START(sock_destroy_kfunc_set)
> -BTF_ID_FLAGS(func, bpf_sock_destroy)
> +BTF_ID_FLAGS(func, bpf_sock_destroy, KF_TRUSTED_ARGS)
>   BTF_SET8_END(sock_destroy_kfunc_set)
> 
>   static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
> diff --git i/net/ipv4/tcp_ipv4.c w/net/ipv4/tcp_ipv4.c
> index 887f83a90d85..a769284e8291 100644
> --- i/net/ipv4/tcp_ipv4.c
> +++ w/net/ipv4/tcp_ipv4.c
> @@ -3354,7 +3354,7 @@ static struct bpf_iter_reg tcp_reg_info = {
>       .ctx_arg_info_size    = 1,
>       .ctx_arg_info        = {
>           { offsetof(struct bpf_iter__tcp, sk_common),
> -          PTR_TO_BTF_ID_OR_NULL },
> +          PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },

Alexei, what do you think about having "PTR_MAYBE_NULL | PTR_TRUSTED" here?
The verifier side looks fine (e.g. is_trusted_reg() takes PTR_MAYBE_NULL into 
consideration). However, it seems this will be the first "PTR_MAYBE_NULL | 
PTR_TRUSTED" use case, and I am not sure whether PTR_MAYBE_NULL conceptually 
conflicts with the PTR_TRUSTED idea (i.e. that a PTR_TRUSTED pointer should not be NULL).

>       },
>       .get_func_proto        = bpf_iter_tcp_get_func_proto,
>       .seq_info        = &tcp_seq_info,
> diff --git i/net/ipv4/udp.c w/net/ipv4/udp.c
> index 746c85f2bb03..945b641b363b 100644
> --- i/net/ipv4/udp.c
> +++ w/net/ipv4/udp.c
> @@ -3646,7 +3646,7 @@ static struct bpf_iter_reg udp_reg_info = {
>       .ctx_arg_info_size    = 1,
>       .ctx_arg_info        = {
>           { offsetof(struct bpf_iter__udp, udp_sk),
> -          PTR_TO_BTF_ID_OR_NULL },
> +          PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
>       },
>       .seq_info        = &udp_seq_info,
>   };
Alexei Starovoitov May 5, 2023, 8:05 p.m. UTC | #3
On Fri, May 5, 2023 at 11:49 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 5/4/23 5:13 PM, Martin KaFai Lau wrote:
> >
> > Following up on the v6 patch set regarding KF_TRUSTED_ARGS:
> > KF_TRUSTED_ARGS is needed here to avoid the cases where a PTR_TO_BTF_ID sk is
> > obtained by following another pointer, e.g. getting a sk pointer (which may even
> > be NULL) by following another sk pointer. The recent PTR_TRUSTED concept in the
> > verifier can guard against this. I tried it, and the following should do:
> >
> > diff --git i/net/core/filter.c w/net/core/filter.c
> > index 68b228f3eca6..d82e038da0e3 100644
> > --- i/net/core/filter.c
> > +++ w/net/core/filter.c
> > @@ -11767,7 +11767,7 @@ __bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
> >   __diag_pop()
> >
> >   BTF_SET8_START(sock_destroy_kfunc_set)
> > -BTF_ID_FLAGS(func, bpf_sock_destroy)
> > +BTF_ID_FLAGS(func, bpf_sock_destroy, KF_TRUSTED_ARGS)
> >   BTF_SET8_END(sock_destroy_kfunc_set)
> >
> >   static int tracing_iter_filter(const struct bpf_prog *prog, u32 kfunc_id)
> > diff --git i/net/ipv4/tcp_ipv4.c w/net/ipv4/tcp_ipv4.c
> > index 887f83a90d85..a769284e8291 100644
> > --- i/net/ipv4/tcp_ipv4.c
> > +++ w/net/ipv4/tcp_ipv4.c
> > @@ -3354,7 +3354,7 @@ static struct bpf_iter_reg tcp_reg_info = {
> >       .ctx_arg_info_size    = 1,
> >       .ctx_arg_info        = {
> >           { offsetof(struct bpf_iter__tcp, sk_common),
> > -          PTR_TO_BTF_ID_OR_NULL },
> > +          PTR_TO_BTF_ID_OR_NULL | PTR_TRUSTED },
>
> Alexei, what do you think about having "PTR_MAYBE_NULL | PTR_TRUSTED" here?
> The verifier side looks fine (e.g. is_trusted_reg() takes PTR_MAYBE_NULL into
> consideration). However, it seems this will be the first "PTR_MAYBE_NULL |
> PTR_TRUSTED" use case, and I am not sure whether PTR_MAYBE_NULL conceptually
> conflicts with the PTR_TRUSTED idea (i.e. that a PTR_TRUSTED pointer should not be NULL).

Conceptually it should be fine. There are no real cases of
PTR_TRUSTED | PTR_MAYBE_NULL now, though check_reg_type() handles it.
Proceed with care, I guess :)
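
For reference, the flag composition behind the proposal (a note based on
the enum bpf_reg_type definitions in include/linux/bpf.h):

/* PTR_TO_BTF_ID_OR_NULL is shorthand for PTR_MAYBE_NULL | PTR_TO_BTF_ID,
 * so the proposed ctx_arg_info amounts to:
 *
 *	PTR_TO_BTF_ID | PTR_MAYBE_NULL | PTR_TRUSTED
 *
 * i.e. the iterator's ctx socket is trusted, but the program must still
 * NULL-check it before the verifier treats it as a plain trusted pointer.
 */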

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 727c5269867d..97d70b7959a1 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -11715,3 +11715,60 @@  static int __init bpf_kfunc_init(void)
 	return ret ?: register_btf_kfunc_id_set(BPF_PROG_TYPE_XDP, &bpf_kfunc_set_xdp);
 }
 late_initcall(bpf_kfunc_init);
+
+/* Disables missing prototype warnings */
+__diag_push();
+__diag_ignore_all("-Wmissing-prototypes",
+		  "Global functions as their definitions will be in vmlinux BTF");
+
+/* bpf_sock_destroy: Destroy the given socket with ECONNABORTED error code.
+ *
+ * The function expects a non-NULL pointer to a socket, and invokes the
+ * protocol specific socket destroy handlers.
+ *
+ * The helper can only be called from BPF contexts that have acquired the socket
+ * locks.
+ *
+ * Parameters:
+ * @sock: Pointer to socket to be destroyed
+ *
+ * Return:
+ * On error, may return EOPNOTSUPP, EINVAL.
+ * EOPNOTSUPP if protocol specific destroy handler is not supported.
+ * 0 otherwise
+ */
+__bpf_kfunc int bpf_sock_destroy(struct sock_common *sock)
+{
+	struct sock *sk = (struct sock *)sock;
+
+	if (!sk)
+		return -EINVAL;
+
+	/* The locking semantics that allow for synchronous execution of the
+	 * destroy handlers are only supported for TCP and UDP.
+	 * Supporting protocols will need to acquire lock_sock in the BPF context
+	 * prior to invoking this kfunc.
+	 */
+	if (!sk->sk_prot->diag_destroy || (sk->sk_protocol != IPPROTO_TCP &&
+					   sk->sk_protocol != IPPROTO_UDP))
+		return -EOPNOTSUPP;
+
+	return sk->sk_prot->diag_destroy(sk, ECONNABORTED);
+}
+
+__diag_pop()
+
+BTF_SET8_START(sock_destroy_kfunc_set)
+BTF_ID_FLAGS(func, bpf_sock_destroy)
+BTF_SET8_END(sock_destroy_kfunc_set)
+
+static const struct btf_kfunc_id_set bpf_sock_destroy_kfunc_set = {
+	.owner = THIS_MODULE,
+	.set   = &sock_destroy_kfunc_set,
+};
+
+static int init_subsystem(void)
+{
+	return register_btf_kfunc_id_set(BPF_PROG_TYPE_TRACING, &bpf_sock_destroy_kfunc_set);
+}
+late_initcall(init_subsystem);
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 288693981b00..2259b4facc2f 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -4679,8 +4679,10 @@  int tcp_abort(struct sock *sk, int err)
 		return 0;
 	}
 
-	/* Don't race with userspace socket closes such as tcp_close. */
-	lock_sock(sk);
+	/* BPF context ensures sock locking. */
+	if (!has_current_bpf_ctx())
+		/* Don't race with userspace socket closes such as tcp_close. */
+		lock_sock(sk);
 
 	if (sk->sk_state == TCP_LISTEN) {
 		tcp_set_state(sk, TCP_CLOSE);
@@ -4702,9 +4704,11 @@  int tcp_abort(struct sock *sk, int err)
 	}
 
 	bh_unlock_sock(sk);
+
 	local_bh_enable();
 	tcp_write_queue_purge(sk);
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 	return 0;
 }
 EXPORT_SYMBOL_GPL(tcp_abort);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 150551acab9d..5f48cdf82a45 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2925,7 +2925,8 @@  EXPORT_SYMBOL(udp_poll);
 
 int udp_abort(struct sock *sk, int err)
 {
-	lock_sock(sk);
+	if (!has_current_bpf_ctx())
+		lock_sock(sk);
 
 	/* udp{v6}_destroy_sock() sets it under the sk lock, avoid racing
 	 * with close()
@@ -2938,7 +2939,8 @@  int udp_abort(struct sock *sk, int err)
 	__udp_disconnect(sk, 0);
 
 out:
-	release_sock(sk);
+	if (!has_current_bpf_ctx())
+		release_sock(sk);
 
 	return 0;
 }