Message ID | 166064248071.3502205.10036394558814861778.stgit@warthog.procyon.org.uk (mailing list archive)
---|---
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | net: Fix suspicious RCU usage in bpf_sk_reuseport_detach()
On Tue, 16 Aug 2022 at 17:34, David Howells <dhowells@redhat.com> wrote:
>
> Fix this by adding a new helper, __locked_read_sk_user_data_with_flags()
> that checks to see if sk->sk_callback_lock() is held and use that here
> instead.

Hi,

I wonder if we can make this more generic, because I think future code that
uses __rcu_dereference_sk_user_data_with_flags() may also hit this bug.

To be more specific, maybe we can refactor
__rcu_dereference_sk_user_data_with_flags() into
__rcu_dereference_sk_user_data_with_flags_check(), like rcu_dereference()
and rcu_dereference_check(). Maybe:

diff --git a/include/net/sock.h b/include/net/sock.h
index 05a1bbdf5805..cf123954eab9 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -578,18 +578,27 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
 
 /**
- * __rcu_dereference_sk_user_data_with_flags - return the pointer
- * only if argument flags all has been set in sk_user_data. Otherwise
- * return NULL
+ * __rcu_dereference_sk_user_data_with_flags_check - return the pointer
+ * only if argument flags all has been set in sk_user_data, with debug
+ * checking. Otherwise return NULL
  *
- * @sk: socket
- * @flags: flag bits
+ * Do __rcu_dereference_sk_user_data_with_flags(), but check that the
+ * conditions under which the rcu dereference will take place are correct,
+ * which is a bit like rcu_dereference_check() and rcu_dereference().
+ *
+ * @sk : socket
+ * @flags : flag bits
+ * @condition : the conditions under which the rcu dereference will
+ *              take place
  */
 static inline void *
-__rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
-					  uintptr_t flags)
+__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
+						uintptr_t flags, bool condition)
 {
-	uintptr_t sk_user_data = (uintptr_t)rcu_dereference(__sk_user_data(sk));
+	uintptr_t sk_user_data;
+
+	sk_user_data = (uintptr_t)rcu_dereference_check(__sk_user_data(sk),
+							condition);
 
 	WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);
 
@@ -598,6 +607,8 @@ __rcu_dereference_sk_user_data_with_flags(const struct sock *sk,
 	return NULL;
 }
 
+#define __rcu_dereference_sk_user_data_with_flags(sk, flags)		\
+	__rcu_dereference_sk_user_data_with_flags_check(sk, flags, 0)
 #define rcu_dereference_sk_user_data(sk)				\
 	__rcu_dereference_sk_user_data_with_flags(sk, 0)
 #define __rcu_assign_sk_user_data_with_flags(sk, ptr, flags)		\

> +/**
> + * __locked_read_sk_user_data_with_flags - return the pointer
> + * only if argument flags all has been set in sk_user_data. Otherwise
> + * return NULL
> + *
> +		(uintptr_t)rcu_dereference_check(__sk_user_data(sk),
> +						 lockdep_is_held(&sk->sk_callback_lock));
> diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
> index 85fa9dbfa8bf..82c61612f382 100644
> --- a/kernel/bpf/reuseport_array.c
> +++ b/kernel/bpf/reuseport_array.c
> @@ -24,7 +24,7 @@ void bpf_sk_reuseport_detach(struct sock *sk)
>  	struct sock __rcu **socks;
> 
>  	write_lock_bh(&sk->sk_callback_lock);
> -	socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
> +	socks = __locked_read_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
>  	if (socks) {
>  		WRITE_ONCE(sk->sk_user_data, NULL);
>  		/*

Then, as you point out, we can pass the condition
(lockdep_is_held(&sk->sk_callback_lock)) to
__rcu_dereference_sk_user_data_with_flags_check() in order to make the
compiler happy, as below:

diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 85fa9dbfa8bf..a772610987c5 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -24,7 +24,10 @@ void bpf_sk_reuseport_detach(struct sock *sk)
 	struct sock __rcu **socks;
 
 	write_lock_bh(&sk->sk_callback_lock);
-	socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
+	socks = __rcu_dereference_sk_user_data_with_flags_check(
+			sk, SK_USER_DATA_BPF,
+			lockdep_is_held(&sk->sk_callback_lock));
+
 	if (socks) {
 		WRITE_ONCE(sk->sk_user_data, NULL);
 		/*
Hawkins Jiawei <yin31149@gmail.com> wrote:

> 	if (socks) {
> 		WRITE_ONCE(sk->sk_user_data, NULL);

Btw, shouldn't this be rcu_assign_pointer() or RCU_INIT_POINTER(), not
WRITE_ONCE()?

David
On Tue, 16 Aug 2022 18:34:52 +0800 Hawkins Jiawei wrote:
> +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
This name is insanely long now.
Jakub Kicinski <kuba@kernel.org> wrote:

> > +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
>
> This name is insanely long now.

I know. 47 chars. Do you have something you'd prefer? Maybe
get_sk_user_data_checked()?

It's a shame C doesn't allow default arguments.

David
Hawkins Jiawei <yin31149@gmail.com> wrote:

> +__rcu_dereference_sk_user_data_with_flags_check(const struct sock *sk,
> +						uintptr_t flags, bool condition)

That doesn't work. RCU_LOCKDEP_WARN() relies on anything passing a condition
down to it being a macro, so that it can vanish the 'condition' argument
without causing an undefined symbol for 'lockdep_is_held' if lockdep is
disabled:

  x86_64-linux-gnu-ld: kernel/bpf/reuseport_array.o: in function `bpf_sk_reuseport_detach':
  /data/fs/linux-fs/build3/../kernel/bpf/reuseport_array.c:28: undefined reference to `lockdep_is_held'

So either __rcu_dereference_sk_user_data_with_flags_check() has to be a
macro, or we need to go with something like the first version of my patch
where I don't pass the condition through. Do you have a preference?

David
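For illustration, a rough sketch (hypothetical, not taken from any posted
patch) of the macro form being discussed. As a macro, the condition is only
ever consumed inside rcu_dereference_check(), so lockdep_is_held() can be
compiled out when lockdep is disabled:

/* Hypothetical sketch only -- not part of the posted patches. */
#define __rcu_dereference_sk_user_data_with_flags_check(sk, flags, cond)	\
({										\
	uintptr_t __sk_ud = (uintptr_t)rcu_dereference_check(			\
					__sk_user_data(sk), (cond));		\
										\
	WARN_ON_ONCE((flags) & SK_USER_DATA_PTRMASK);				\
	((__sk_ud & (flags)) == (flags)) ?					\
		(void *)(__sk_ud & SK_USER_DATA_PTRMASK) : NULL;		\
})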
On Tue, Aug 16, 2022 at 02:09:46PM +0100, David Howells wrote:
> Hawkins Jiawei <yin31149@gmail.com> wrote:
>
> > 	if (socks) {
> > 		WRITE_ONCE(sk->sk_user_data, NULL);
>
> Btw, shouldn't this be rcu_assign_pointer() or RCU_INIT_POINTER(), not
> WRITE_ONCE()?

It is not necessary. The sk_user_data usage in reuseport_array is protected
by the sk_callback_lock alone. The code before commit cf8c1e967224 is fine.

If __rcu_dereference_sk_user_data_with_flags() could be reused here as is,
an extra rcu_dereference is fine, so I did not mention it. It seems that is
not the case and the new function name is getting long, so how about
reverting commit cf8c1e967224 and keeping it as it was.
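As a sketch of that locking model (the function names below are hypothetical,
purely illustrative): every reader of sk_user_data on this path takes the
same rwlock as the writer, so there is no lockless observer that an
rcu_assign_pointer() publish barrier would protect, and a plain store is
enough:

/* Illustrative sketch only: both sides serialise on sk_callback_lock. */
static void example_detach(struct sock *sk)
{
	write_lock_bh(&sk->sk_callback_lock);
	WRITE_ONCE(sk->sk_user_data, NULL);	/* plain store; no RCU publish needed */
	write_unlock_bh(&sk->sk_callback_lock);
}

static void *example_read(const struct sock *sk)
{
	void *p;

	read_lock_bh(&sk->sk_callback_lock);
	p = sk->sk_user_data;			/* only ever read under the lock here */
	read_unlock_bh(&sk->sk_callback_lock);
	return p;
}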
On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> or we need to go with something like the first version of my patch where I
> don't pass the condition through. Do you have a preference?

I like your version because it documents what the lock protecting this
field is.

In fact, should we also add && sock_owned_by_user()? Martin, WDYT? Would
that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
while setting this field, yet most places take the callback lock...

On the naming - maybe just drop the _with_flags()? There's no version of
the locked helper which does not take the flags. And drop the leading
underscores?
On Tue, Aug 16, 2022 at 04:44:35PM -0700, Jakub Kicinski wrote:
> On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> > So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> > or we need to go with something like the first version of my patch where I
> > don't pass the condition through. Do you have a preference?
>
> I like your version because it documents what the lock protecting this
> field is.
>
> In fact, should we also add && sock_owned_by_user()? Martin, WDYT? Would
> that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> while setting this field, yet most places take the callback lock...

It needs a closer look at where lock_sock() has already been acquired, and
we also need to consider the lock ordering with reuseport_lock. It probably
should work, but maybe that needs a separate patch to discuss those
considerations?

> On the naming - maybe just drop the _with_flags()? There's no version of
> the locked helper which does not take the flags. And drop the leading
> underscores?

I am also good with a shorter name.

Could a comment be added to bpf_sk_reuseport_detach() mentioning that
sk_user_data access is protected by the sk_callback_lock alone (or by the
socket lock in the future), while __locked_read_sk_user_data_with_flags()
reuses a rcu_dereference()? It will be easier to understand if an actual
RCU reader appears in future code.
On Tue, 16 Aug 2022 17:43:19 -0700 Martin KaFai Lau wrote:
> > I like your version because it documents what the lock protecting this
> > field is.
> >
> > In fact, should we also add && sock_owned_by_user()? Martin, WDYT? Would
> > that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> > while setting this field, yet most places take the callback lock...
>
> It needs a closer look at where lock_sock() has already been acquired, and
> we also need to consider the lock ordering with reuseport_lock. It probably
> should work, but maybe that needs a separate patch to discuss those
> considerations?

Right, the users of the field with a bit allocated protect the writes with
the callback lock, so we can hard-code the check against the callback lock
for now and revisit later if needed.
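If the socket-lock idea were pursued later, the helper's dereference
condition might simply grow an extra term, roughly along these lines (a
sketch only, not something agreed on in the thread or merged):

/* Sketch of a possible future variant -- not what was merged. */
static inline void *
__locked_read_sk_user_data_with_flags(const struct sock *sk, uintptr_t flags)
{
	uintptr_t sk_user_data =
		(uintptr_t)rcu_dereference_check(__sk_user_data(sk),
				lockdep_is_held(&sk->sk_callback_lock) ||
				lockdep_sock_is_held(sk));

	WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);

	if ((sk_user_data & flags) == flags)
		return (void *)(sk_user_data & SK_USER_DATA_PTRMASK);
	return NULL;
}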
On Wed, 17 Aug 2022 at 07:44, Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Tue, 16 Aug 2022 22:16:46 +0100 David Howells wrote:
> > So either __rcu_dereference_sk_user_data_with_flags_check() has to be a macro,
> > or we need to go with something like the first version of my patch where I
> > don't pass the condition through. Do you have a preference?
>
> I like your version because it documents what the lock protecting this
> field is.

In my opinion, it would still be better to refactor this into a more generic
function, to avoid adding new helpers whenever the same problem comes up.
However, if this is just a standalone problem, David's version should be
better. So maybe apply David's version, and refactor it later if the same
problem comes up again.

On Wed, 17 Aug 2022 at 08:43, Martin KaFai Lau <kafai@fb.com> wrote:
>
> On Tue, Aug 16, 2022 at 04:44:35PM -0700, Jakub Kicinski wrote:
> >
> > On the naming - maybe just drop the _with_flags()? There's no version of
> > the locked helper which does not take the flags. And drop the leading
> > underscores?
> I am also good with a shorter name.

I also agree, the name is really long.
Jakub Kicinski <kuba@kernel.org> wrote:

> I like your version because it documents what the lock protecting this
> field is.
>
> In fact, should we also add && sock_owned_by_user()? Martin, WDYT? Would
> that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> while setting this field, yet most places take the callback lock...

So how do you want to proceed? My first version of the patch with
sock_owned_by_user()?

David
On Wed, 17 Aug 2022 21:55:49 +0100 David Howells wrote:
> Jakub Kicinski <kuba@kernel.org> wrote:
>
> > I like your version because it documents what the lock protecting this
> > field is.
> >
> > In fact, should we also add && sock_owned_by_user()? Martin, WDYT? Would
> > that work for reuseport? Jakub S is fixing l2tp to hold the socket lock
> > while setting this field, yet most places take the callback lock...
>
> So how do you want to proceed? My first version of the patch with
> sock_owned_by_user()?

Sorry about the lack of clarity. I was sort of expecting the name to still
be shortened, but what you have is probably good enough. Applying v1, then,
thanks!
diff --git a/include/net/sock.h b/include/net/sock.h
index 05a1bbdf5805..d08cfe190a78 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -577,6 +577,31 @@ static inline bool sk_user_data_is_nocopy(const struct sock *sk)
 
 #define __sk_user_data(sk) ((*((void __rcu **)&(sk)->sk_user_data)))
 
+/**
+ * __locked_read_sk_user_data_with_flags - return the pointer
+ * only if argument flags all has been set in sk_user_data. Otherwise
+ * return NULL
+ *
+ * @sk: socket
+ * @flags: flag bits
+ *
+ * The caller must be holding sk->sk_callback_lock.
+ */
+static inline void *
+__locked_read_sk_user_data_with_flags(const struct sock *sk,
+				      uintptr_t flags)
+{
+	uintptr_t sk_user_data =
+		(uintptr_t)rcu_dereference_check(__sk_user_data(sk),
+						 lockdep_is_held(&sk->sk_callback_lock));
+
+	WARN_ON_ONCE(flags & SK_USER_DATA_PTRMASK);
+
+	if ((sk_user_data & flags) == flags)
+		return (void *)(sk_user_data & SK_USER_DATA_PTRMASK);
+	return NULL;
+}
+
 /**
  * __rcu_dereference_sk_user_data_with_flags - return the pointer
  * only if argument flags all has been set in sk_user_data. Otherwise
diff --git a/kernel/bpf/reuseport_array.c b/kernel/bpf/reuseport_array.c
index 85fa9dbfa8bf..82c61612f382 100644
--- a/kernel/bpf/reuseport_array.c
+++ b/kernel/bpf/reuseport_array.c
@@ -24,7 +24,7 @@ void bpf_sk_reuseport_detach(struct sock *sk)
 	struct sock __rcu **socks;
 
 	write_lock_bh(&sk->sk_callback_lock);
-	socks = __rcu_dereference_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
+	socks = __locked_read_sk_user_data_with_flags(sk, SK_USER_DATA_BPF);
 	if (socks) {
 		WRITE_ONCE(sk->sk_user_data, NULL);
 		/*
bpf_sk_reuseport_detach() calls __rcu_dereference_sk_user_data_with_flags()
to obtain the value of sk->sk_user_data, but that function is only usable if
the RCU read lock is held, and neither that function nor any of its callers
hold it.

Fix this by adding a new helper, __locked_read_sk_user_data_with_flags(),
that checks to see if sk->sk_callback_lock() is held, and use that here
instead.

Alternatively, making __rcu_dereference_sk_user_data_with_flags() use
rcu_dereference_check() might suffice.

Without this, the following warning can occasionally be observed:

=============================
WARNING: suspicious RCU usage
6.0.0-rc1-build2+ #563 Not tainted
-----------------------------
include/net/sock.h:592 suspicious rcu_dereference_check() usage!

other info that might help us debug this:

rcu_scheduler_active = 2, debug_locks = 1
5 locks held by locktest/29873:
 #0: ffff88812734b550 (&sb->s_type->i_mutex_key#9){+.+.}-{3:3}, at: __sock_release+0x77/0x121
 #1: ffff88812f5621b0 (sk_lock-AF_INET){+.+.}-{0:0}, at: tcp_close+0x1c/0x70
 #2: ffff88810312f5c8 (&h->lhash2[i].lock){+.+.}-{2:2}, at: inet_unhash+0x76/0x1c0
 #3: ffffffff83768bb8 (reuseport_lock){+...}-{2:2}, at: reuseport_detach_sock+0x18/0xdd
 #4: ffff88812f562438 (clock-AF_INET){++..}-{2:2}, at: bpf_sk_reuseport_detach+0x24/0xa4

stack backtrace:
CPU: 1 PID: 29873 Comm: locktest Not tainted 6.0.0-rc1-build2+ #563
Hardware name: ASUS All Series/H97-PLUS, BIOS 2306 10/09/2014
Call Trace:
 <TASK>
 dump_stack_lvl+0x4c/0x5f
 bpf_sk_reuseport_detach+0x6d/0xa4
 reuseport_detach_sock+0x75/0xdd
 inet_unhash+0xa5/0x1c0
 tcp_set_state+0x169/0x20f
 ? lockdep_sock_is_held+0x3a/0x3a
 ? __lock_release.isra.0+0x13e/0x220
 ? reacquire_held_locks+0x1bb/0x1bb
 ? hlock_class+0x31/0x96
 ? mark_lock+0x9e/0x1af
 __tcp_close+0x50/0x4b6
 tcp_close+0x28/0x70
 inet_release+0x8e/0xa7
 __sock_release+0x95/0x121
 sock_close+0x14/0x17
 __fput+0x20f/0x36a
 task_work_run+0xa3/0xcc
 exit_to_user_mode_prepare+0x9c/0x14d
 syscall_exit_to_user_mode+0x18/0x44
 entry_SYSCALL_64_after_hwframe+0x63/0xcd

Fixes: cf8c1e967224 ("net: refactor bpf_sk_reuseport_detach()")
Signed-off-by: David Howells <dhowells@redhat.com>
cc: Hawkins Jiawei <yin31149@gmail.com>
cc: Jakub Kicinski <kuba@kernel.org>
cc: netdev@vger.kernel.org
---
 include/net/sock.h           | 25 +++++++++++++++++++++++++
 kernel/bpf/reuseport_array.c |  2 +-
 2 files changed, 26 insertions(+), 1 deletion(-)