diff mbox series

net: Fix suspicious RCU usage in bpf_sk_reuseport_detach()

Message ID 166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series net: Fix suspicious RCU usage in bpf_sk_reuseport_detach() | expand

Checks

Context Check Description
bpf/vmtest-bpf-VM_Test-4 success Logs for llvm-toolchain
bpf/vmtest-bpf-VM_Test-3 success Logs for Kernel LATEST on z15 with gcc
bpf/vmtest-bpf-PR success PR summary
bpf/vmtest-bpf-VM_Test-1 success Logs for Kernel LATEST on ubuntu-latest with gcc
bpf/vmtest-bpf-VM_Test-2 success Logs for Kernel LATEST on ubuntu-latest with llvm-16
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 238172 this patch: 238172
netdev/cc_maintainers fail 1 blamed authors not CCed: jakub@cloudflare.com; 16 maintainers not CCed: john.fastabend@gmail.com song@kernel.org sdf@google.com ast@kernel.org martin.lau@linux.dev davem@davemloft.net daniel@iogearbox.net edumazet@google.com andrii@kernel.org kpsingh@kernel.org jakub@cloudflare.com bpf@vger.kernel.org jolsa@kernel.org pabeni@redhat.com haoluo@google.com yhs@fb.com
netdev/build_clang success Errors and warnings before: 574 this patch: 574
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 252822 this patch: 252822
netdev/checkpatch warning WARNING: line length of 89 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

David Howells Aug. 16, 2022, 9:34 a.m. UTC
bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags()
to obtain the value of sk->sk_user_data, but that function is only usable
if the RCU read lock is held, and neither that function nor any of its
callers hold it.

Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
that checks to see if sk->sk_callback_lock() is held and use that here
instead.

Alternatively, making __rcu_dereference_sk_user_data_with_flags() use
rcu_dereference_checked() might suffice.

Without this, the following warning can be occasionally observed:

=============================
WARNING: suspicious RCU usage
6.0.0-rc1-build2+ #563 Not tainted
-----------------------------
include/net/sock.h:592 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
5 locks held by locktest/29873:
 #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121
 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70
 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0
 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd
 #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4

stack backtrace:
CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4c/0x5f
 bpf_sk_reuseport_detach+0x6d/0xa4
 reuseport_detach_sock+0x75/0xdd
 inet_unhash+0xa5/0x1c0
 tcp_set_state+0x169/0x20f
 ? lockdep_sock_is_held+0x3a/0x3a
 ? __lock_release.isra.0+0x13e/0x220
 ? reacquire_held_locks+0x1bb/0x1bb
 ? hlock_class+0x31/0x96
 ? mark_lock+0x9e/0x1af
 __tcp_close+0x50/0x4b6
 tcp_close+0x28/0x70
 inet_release+0x8e/0xa7
 __sock_release+0x95/0x121
 sock_close+0x14/0x17
 __fput+0x20f/0x36a
 task_work_run+0xa3/0xcc
 exit_to_user_mode_prepare+0x9c/0x14d
 syscall_exit_to_user_mode+0x18/0x44
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: cf8c1e967224 ("net: refactor bpf_sk_reuseport_detach()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: netdev@vger.kernel.org
---

 include/net/sock.h           |   25 +++++++++++++++++++++++++
 kernel/bpf/reuseport_array.c |    2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)

Comments

Hawkins Jiawei Aug. 16, 2022, 10:34 a.m. UTC | #1
On Tue, 16 Aug 2022 at 17:34, David Howells <dhowells@redhat.com> wrote:
>
> Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
> that checks to see if sk->sk_callback_lock() is held and use that here
> instead.
Hi, I wonder if we make this more geniric, for I think maybe the future
code who use __rcu_dereference_sk_user_data_with_flags() may
also meet this bug.

To be more specific, maybe we can refactor
__rcu_dereference_sk_user_data_with_flags() to
__rcu_dereference_sk_user_data_with_flags_check(), like
rcu_dereference() and rcu_dereference_check(). Maybe:

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a1bbdf5805..cf123954eab9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -578,18 +578,27 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
 
 /**
- * __rcu_dereference_sk_user_data_with_flags - return the pointer
- * only if argument flags all has been set in sk_user_data. Otherwise
- * return NULL
+ * __rcu_dereference_sk_user_data_with_flags_check - return the pointer
+ * only if argument flags all has been set in sk_user_data, with debug
+ * checking. Otherwise return NULL
  *
- * @sk: socket
- * @flags: flag bits
+ * Do __rcu_dereference_sk_user_data_with_flags(), but check that the
+ * conditions under which the rcu dereference will take place are correct,
+ * which is a bit like rcu_dereference_check() and rcu_derefence().
+ *
+ * @sk		: socket
+ * @flags	: flag bits
+ * @condition	: the conditions under which the rcu dereference will
+ * take place
  */
 static inline void *
-__rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
-					  uintptr_t flags)
+__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
+						uintptr_t flags, bool condition)
 {
-	uintptr_t sk_user_data = (uintptr_t)rcu_dereference(__sk_user_data(sk));
+	uintptr_t sk_user_data;
+
+	sk_user_data = (uintptr_t)rcu_dereference_check(__sk_user_data(sk),
+							condition);
 
 	WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);
 
@@ -598,6 +607,8 @@ __rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
 	return NULL;
 }
 
+#define __rcu_dereference_sk_user_data_with_flags(sk, flags) \
+	__rcu_dereference_sk_user_data_with_flags_check(sk, flags, 0)
 #define rcu_dereference_sk_user_data(sk)				\
 	__rcu_dereference_sk_user_data_with_flags(sk, 0)
 #define __rcu_assign_sk_user_data_with_flags(sk, ptr, flags)		\

> +/**
> + * __locked_read_sk_user_data_with_flags - return the pointer
> + * only if argument flags all has been set in sk_user_data. Otherwise
> + * return NULL
> + *
> +               (uintptr_t)rcu_dereference_check(__sk_user_data(sk),
> +                                                lockdep_is_held(&sk->sk_callback_lock));

> diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
> index 85fa9dbfa8bf..82c61612f382 100644
> --- a/kernel/bpf/reuseport_array.c
> +++ b/kernel/bpf/reuseport_array.c
> @@ -24,7 +24,7 @@ void bpf_sk_reuseport_detach(struct sock *sk)
>         struct sock __rcu **socks;
> 
>         write_lock_bh(&sk->sk_callback_lock);
> -       socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
> +       socks = __locked_read_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
>         if (socks) {
>                 WRITE_ONCE(sk->sk_user_data, NULL);
>                 /*
Then, as you point out, we can pass
condition(lockdep_is_held(&sk->sk_callback_lock)) to
__rcu_dereference_sk_user_data_with_flags_check() in order to
make compiler happy as below:

diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 85fa9dbfa8bf..a772610987c5 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -24,7 +24,10 @@ void bpf_sk_reuseport_detach(struct sock *sk)
 	struct sock __rcu **socks;
 
 	write_lock_bh(&sk->sk_callback_lock);
-	socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
+	socks = __rcu_dereference_sk_user_data_with_flags_check(
+			sk, SK_USER_DATA_BPF,
+			lockdep_is_held(&sk->sk_callback_lock));
+
 	if (socks) {
 		WRITE_ONCE(sk->sk_user_data, NULL);
 		/*
David Howells Aug. 16, 2022, 1:09 p.m. UTC | #2
Hawkins Jiawei <yin31149@gmail.com> wrote:

>  	if (socks) {
>  		WRITE_ONCE(sk->sk_user_data, NULL);

Btw, shouldn't this be rcu_assign_pointer() or RCU_INIT_POINTER(), not
WRITE_ONCE()?

David
Jakub Kicinski Aug. 16, 2022, 6:21 p.m. UTC | #3
On Tue, 16 Aug 2022 18:34:52 +0800 Hawkins Jiawei wrote:
> +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,

This name is insanely long now.
David Howells Aug. 16, 2022, 8:01 p.m. UTC | #4
Jakub Kicinski <kuba@kernel.org> wrote:

> > +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
> 
> This name is insanely long now.

I know.  47 chars.  Do you have something you'd prefer?  Maybe
get_sk_user_data_checked()?

It's a shame C doesn't allow default arguments.

David
David Howells Aug. 16, 2022, 9:16 p.m. UTC | #5
Hawkins Jiawei <yin31149@gmail.com> wrote:

> +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
> +						uintptr_t flags, bool condition)

That doesn't work.  RCU_LOCKDEP_WARN() relies on anything passing on a
condition down to it to be a macro so that it can vanish the 'condition'
argument without causing an undefined symbol for 'lockdep_is_held' if lockdep
is disabled:

x86_64-linux-gnu-ld: kernel/bpf/reuseport_array.o: in function `bpf_sk_reuseport_detach':
/data/fs/linux-fs/build3/../kernel/bpf/reuseport_array.c:28: undefined reference to `lockdep_is_held'

So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
or we need to go with something like the first version of my patch where I
don't pass the condition through.  Do you have a preference?

David
Martin KaFai Lau Aug. 16, 2022, 9:49 p.m. UTC | #6
On Tue, Aug 16, 2022 at 02:09:46PM +0100, David Howells wrote:
> Hawkins Jiawei <yin31149@gmail.com> wrote:
> 
> >  	if (socks) {
> >  		WRITE_ONCE(sk->sk_user_data, NULL);
> 
> Btw, shouldn't this be rcu_assign_pointer() or RCU_INIT_POINTER(), not
> WRITE_ONCE()?
It is not necessary.  The sk_user_data usage in reuseport_array
is protected by the sk_callback_lock alone.  The code
before the commit cf8c1e967224 is fine.  If the
__rcu_dereference_sk_user_data_with_flags() could be reused here as is,
an extra rcu_dereference is fine, so I did not mention it.
It seems it is not the case and new function naming is getting long,
so how about reverting the commit cf8c1e967224 and keep it as it was.
Jakub Kicinski Aug. 16, 2022, 11:44 p.m. UTC | #7
On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> or we need to go with something like the first version of my patch where I
> don't pass the condition through.  Do you have a preference?

I like your version because it documents what the lock protecting this
field is. 

In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
while setting this field, yet most places take the callback lock...

One the naming - maybe just drop the _with_flags() ? There's no version
of locked helper which does not take the flags. And not underscores?
Martin KaFai Lau Aug. 17, 2022, 12:43 a.m. UTC | #8
On Tue, Aug 16, 2022 at 04:44:35PM -0700, Jakub Kicinski wrote:
> On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> > So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> > or we need to go with something like the first version of my patch where I
> > don't pass the condition through.  Do you have a preference?
> 
> I like your version because it documents what the lock protecting this
> field is.
> 
> In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
> that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> while setting this field, yet most places take the callback lock...
It needs to take a closer look at where the lock_sock() has already
been acquired and also need to consider the lock ordering with reuseport_lock.
It probably should work but may need a separate patch to discuss those
considerations ?

> 
> One the naming - maybe just drop the _with_flags() ? There's no version
> of locked helper which does not take the flags. And not underscores?
I am also good with a shorter name.

Could a comment be added to bpf_sk_reuseport_detach() mentioning
sk_user_data access is protected by the sk_callback_lock alone (or the lock
sock in the future) while reusing __locked_read_sk_user_data() with
a rcu_dereference().  It will be easier to understand if there is
actually any rcu reader in the future code reading.
Jakub Kicinski Aug. 17, 2022, 1:39 a.m. UTC | #9
On Tue, 16 Aug 2022 17:43:19 -0700 Martin KaFai Lau wrote:
> > I like your version because it documents what the lock protecting this
> > field is.
> > 
> > In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
> > that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> > while setting this field, yet most places take the callback lock...  
>
> It needs to take a closer look at where the lock_sock() has already
> been acquired and also need to consider the lock ordering with reuseport_lock.
> It probably should work but may need a separate patch to discuss those
> considerations ?

Right, the users of the field with a bit allocated protect the writes
with the callback lock, so we can hard code the check against the
callback lock for now and revisit later if needed.
Hawkins Jiawei Aug. 17, 2022, 3 a.m. UTC | #10
On Wed, 17 Aug 2022 at 07:44, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> > So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> > or we need to go with something like the first version of my patch where I
> > don't pass the condition through.  Do you have a preference?
>
> I like your version because it documents what the lock protecting this
> field is.
In my opinion, I still think we'd better refactor it to a more
geniric function, to avoid adding new functions when meeting
the same problem. However, if this is just a standalone problem,
maybe David's version shoule be better.

So maybe apply the David's version, and refactor it later when
meeting the same problem next time if needed.

On Wed, 17 Aug 2022 at 08:43, Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Aug 16, 2022 at 04:44:35PM -0700, Jakub Kicinski wrote:
> >
> > One the naming - maybe just drop the _with_flags() ? There's no version
> > of locked helper which does not take the flags. And not underscores?
> I am also good with a shorter name.
I also agree, the name is really long.
David Howells Aug. 17, 2022, 8:55 p.m. UTC | #11
Jakub Kicinski <kuba@kernel.org> wrote:

> I like your version because it documents what the lock protecting this
> field is. 
> 
> In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
> that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> while setting this field, yet most places take the callback lock...

So how do you want to proceed?  My first version of the patch with
sock_owned_by_user()?

David
Jakub Kicinski Aug. 17, 2022, 11:42 p.m. UTC | #12
On Wed, 17 Aug 2022 21:55:49 +0100 David Howells wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
> 
> > I like your version because it documents what the lock protecting this
> > field is. 
> > 
> > In fact should we also add && sock_owned_by_user(). Martin, WDYT? Would
> > that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> > while setting this field, yet most places take the callback lock...  
> 
> So how do you want to proceed?  My first version of the patch with
> sock_owned_by_user()?

Sorry about the lack of clarity. I was sort of expecting the name 
to still be shortened, but what you have is probably good enough.

Applying v1, then, thanks!
diff mbox series

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a1bbdf5805..d08cfe190a78 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -577,6 +577,31 @@  static inline bool sk_user_data_is_nocopy(const struct sock *sk)
 
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
 
+/**
+ * __locked_read_sk_user_data_with_flags - return the pointer
+ * only if argument flags all has been set in sk_user_data. Otherwise
+ * return NULL
+ *
+ * @sk: socket
+ * @flags: flag bits
+ *
+ * The caller must be holding sk->sk_callback_lock.
+ */
+static inline void *
+__locked_read_sk_user_data_with_flags(const struct sock *sk,
+				      uintptr_t flags)
+{
+	uintptr_t sk_user_data =
+		(uintptr_t)rcu_dereference_check(__sk_user_data(sk),
+						 lockdep_is_held(&sk->sk_callback_lock));
+
+	WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);
+
+	if ((sk_user_data & flags) == flags)
+		return (void *)(sk_user_data & SK_USER_DATA_PTRMASK);
+	return NULL;
+}
+
 /**
  * __rcu_dereference_sk_user_data_with_flags - return the pointer
  * only if argument flags all has been set in sk_user_data. Otherwise
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 85fa9dbfa8bf..82c61612f382 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -24,7 +24,7 @@  void bpf_sk_reuseport_detach(struct sock *sk)
 	struct sock __rcu **socks;
 
 	write_lock_bh(&sk->sk_callback_lock);
-	socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
+	socks = __locked_read_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
 	if (socks) {
 		WRITE_ONCE(sk->sk_user_data, NULL);
 		/*