diff mbox series

[bpf,1/2] bpf, sockmap: fix sk->prot unhash op reset

Message ID 161661956953.28508.2297266338306692603.stgit@john-Precision-5820-Tower (mailing list archive)
State Superseded
Delegated to: BPF
Headers show
Series bpf, sockmap fixes | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for bpf
netdev/subject_prefix success Link
netdev/cc_maintainers fail 2 blamed authors not CCed: jakub@cloudflare.com songliubraving@fb.com; 9 maintainers not CCed: yhs@fb.com kpsingh@kernel.org kafai@fb.com ast@kernel.org songliubraving@fb.com borisp@nvidia.com davem@davemloft.net jakub@cloudflare.com kuba@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 114 this patch: 114
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 114 this patch: 114
netdev/header_inline success Link

Commit Message

John Fastabend March 24, 2021, 8:59 p.m. UTC
In '4da6a196f93b1' we fixed a potential unhash loop caused when
a TLS socket in a sockmap was removed from the sockmap. This
happened because the unhash operation on the TLS ctx continued
to point at the sockmap implementation of unhash even though the
psock has already been removed. The sockmap unhash handler when a
psock is removed does the following,

 void sock_map_unhash(struct sock *sk)
 {
	void (*saved_unhash)(struct sock *sk);
	struct sk_psock *psock;

	rcu_read_lock();
	psock = sk_psock(sk);
	if (unlikely(!psock)) {
		rcu_read_unlock();
		if (sk->sk_prot->unhash)
			sk->sk_prot->unhash(sk);
		return;
	}
        [...]
 }

The unlikely() case is there to handle the case where psock is detached
but the proto ops have not been updated yet. But, in the above case
with TLS and removed psock we never fixed sk_prot->unhash() and unhash()
points back to sock_map_unhash resulting in a loop. To fix this we added
this bit of code,

 static inline void sk_psock_restore_proto(struct sock *sk,
                                          struct sk_psock *psock)
 {
       sk->sk_prot->unhash = psock->saved_unhash;

This will set the sk_prot->unhash back to its saved value. This is the
correct callback for a TLS socket that has been removed from the sock_map.
Unfortunately, this also overwrites the unhash pointer for all psocks.
We effectively break sockmap unhash handling for any future socks.
Omitting the unhash operation will leave stale entries in the map if
a socket transition through unhash, but does not do close() op.

To fix handle similar to write_space and rewrite it in the TLS update
hook. This way the TLS enabled socket will point to the saved unhash()
handler.

Fixes: 4da6a196f93b1 ("bpf: Sockmap/tls, during free we may call tcp_bpf_unhash() in loop")
Reported-by: Cong Wang <xiyou.wangcong@gmail.com>
Reported-by: Lorenz Bauer <lmb@cloudflare.com>
Signed-off-by: John Fastabend <john.fastabend@gmail.com>
---
 include/linux/skmsg.h |    1 -
 net/tls/tls_main.c    |    6 ++++++
 2 files changed, 6 insertions(+), 1 deletion(-)

Comments

Cong Wang March 25, 2021, 12:11 a.m. UTC | #1
On Wed, Mar 24, 2021 at 1:59 PM John Fastabend <john.fastabend@gmail.com> wrote:
> diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> index 47b7c5334c34..ecb5634b4c4a 100644
> --- a/net/tls/tls_main.c
> +++ b/net/tls/tls_main.c
> @@ -754,6 +754,12 @@ static void tls_update(struct sock *sk, struct proto *p,
>
>         ctx = tls_get_ctx(sk);
>         if (likely(ctx)) {
> +               /* TLS does not have an unhash proto in SW cases, but we need
> +                * to ensure we stop using the sock_map unhash routine because
> +                * the associated psock is being removed. So use the original
> +                * unhash handler.
> +                */
> +               WRITE_ONCE(sk->sk_prot->unhash, p->unhash);
>                 ctx->sk_write_space = write_space;
>                 ctx->sk_proto = p;

It looks awkward to update sk->sk_proto inside tls_update(),
at least when ctx!=NULL.

What is wrong with updating it in sk_psock_restore_proto()
when inet_csk_has_ulp() is true? It looks better to me.

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 6c09d94be2e9..da5dc3ef0ee3 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -360,8 +360,8 @@ static inline void sk_psock_update_proto(struct sock *sk,
 static inline void sk_psock_restore_proto(struct sock *sk,
                                          struct sk_psock *psock)
 {
-       sk->sk_prot->unhash = psock->saved_unhash;
        if (inet_csk_has_ulp(sk)) {
+               sk->sk_prot->unhash = psock->sk_proto->unhash;
                tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
        } else {
                sk->sk_write_space = psock->saved_write_space;


sk_psock_restore_proto() is the only caller of tcp_update_ulp()
so should be equivalent.

Thanks.
John Fastabend March 25, 2021, 2:28 a.m. UTC | #2
Cong Wang wrote:
> On Wed, Mar 24, 2021 at 1:59 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > index 47b7c5334c34..ecb5634b4c4a 100644
> > --- a/net/tls/tls_main.c
> > +++ b/net/tls/tls_main.c
> > @@ -754,6 +754,12 @@ static void tls_update(struct sock *sk, struct proto *p,
> >
> >         ctx = tls_get_ctx(sk);
> >         if (likely(ctx)) {
> > +               /* TLS does not have an unhash proto in SW cases, but we need
> > +                * to ensure we stop using the sock_map unhash routine because
> > +                * the associated psock is being removed. So use the original
> > +                * unhash handler.
> > +                */
> > +               WRITE_ONCE(sk->sk_prot->unhash, p->unhash);
> >                 ctx->sk_write_space = write_space;
> >                 ctx->sk_proto = p;
> 
> It looks awkward to update sk->sk_proto inside tls_update(),
> at least when ctx!=NULL.

hmm. It doesn't strike me as paticularly awkward but OK.

> 
> What is wrong with updating it in sk_psock_restore_proto()
> when inet_csk_has_ulp() is true? It looks better to me.

It could be wrong if inet_csk_has_ulp has an unhash callback
already assigned. But, because we know inet_csk_has_ulp()
really means is_tls_attached() it would be fine.

> 
> diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
> index 6c09d94be2e9..da5dc3ef0ee3 100644
> --- a/include/linux/skmsg.h
> +++ b/include/linux/skmsg.h
> @@ -360,8 +360,8 @@ static inline void sk_psock_update_proto(struct sock *sk,
>  static inline void sk_psock_restore_proto(struct sock *sk,
>                                           struct sk_psock *psock)
>  {
> -       sk->sk_prot->unhash = psock->saved_unhash;
>         if (inet_csk_has_ulp(sk)) {
> +               sk->sk_prot->unhash = psock->sk_proto->unhash;
>                 tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
>         } else {
>                 sk->sk_write_space = psock->saved_write_space;
> 
> 
> sk_psock_restore_proto() is the only caller of tcp_update_ulp()
> so should be equivalent.

Agree it is equivalent. I don't mind moving the assignment around
if folks think its nicer.

> 
> Thanks.
Cong Wang March 25, 2021, 6:57 p.m. UTC | #3
On Wed, Mar 24, 2021 at 7:28 PM John Fastabend <john.fastabend@gmail.com> wrote:
>
> Cong Wang wrote:
> > On Wed, Mar 24, 2021 at 1:59 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > > index 47b7c5334c34..ecb5634b4c4a 100644
> > > --- a/net/tls/tls_main.c
> > > +++ b/net/tls/tls_main.c
> > > @@ -754,6 +754,12 @@ static void tls_update(struct sock *sk, struct proto *p,
> > >
> > >         ctx = tls_get_ctx(sk);
> > >         if (likely(ctx)) {
> > > +               /* TLS does not have an unhash proto in SW cases, but we need
> > > +                * to ensure we stop using the sock_map unhash routine because
> > > +                * the associated psock is being removed. So use the original
> > > +                * unhash handler.
> > > +                */
> > > +               WRITE_ONCE(sk->sk_prot->unhash, p->unhash);
> > >                 ctx->sk_write_space = write_space;
> > >                 ctx->sk_proto = p;
> >
> > It looks awkward to update sk->sk_proto inside tls_update(),
> > at least when ctx!=NULL.
>
> hmm. It doesn't strike me as paticularly awkward but OK.

I read tls_update() as "updating ctx when it is initialized", with your
patch, we are updating sk->sk_prot->unhash too when updating ctx,
pretty much like a piggyback, hence it reads odd to me.

Thanks.
John Fastabend March 26, 2021, 12:57 a.m. UTC | #4
Cong Wang wrote:
> On Wed, Mar 24, 2021 at 7:28 PM John Fastabend <john.fastabend@gmail.com> wrote:
> >
> > Cong Wang wrote:
> > > On Wed, Mar 24, 2021 at 1:59 PM John Fastabend <john.fastabend@gmail.com> wrote:
> > > > diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
> > > > index 47b7c5334c34..ecb5634b4c4a 100644
> > > > --- a/net/tls/tls_main.c
> > > > +++ b/net/tls/tls_main.c
> > > > @@ -754,6 +754,12 @@ static void tls_update(struct sock *sk, struct proto *p,
> > > >
> > > >         ctx = tls_get_ctx(sk);
> > > >         if (likely(ctx)) {
> > > > +               /* TLS does not have an unhash proto in SW cases, but we need
> > > > +                * to ensure we stop using the sock_map unhash routine because
> > > > +                * the associated psock is being removed. So use the original
> > > > +                * unhash handler.
> > > > +                */
> > > > +               WRITE_ONCE(sk->sk_prot->unhash, p->unhash);
> > > >                 ctx->sk_write_space = write_space;
> > > >                 ctx->sk_proto = p;
> > >
> > > It looks awkward to update sk->sk_proto inside tls_update(),
> > > at least when ctx!=NULL.
> >
> > hmm. It doesn't strike me as paticularly awkward but OK.
> 
> I read tls_update() as "updating ctx when it is initialized", with your
> patch, we are updating sk->sk_prot->unhash too when updating ctx,
> pretty much like a piggyback, hence it reads odd to me.
> 
> Thanks.


OK convinced.
diff mbox series

Patch

diff --git a/include/linux/skmsg.h b/include/linux/skmsg.h
index 8edbbf5f2f93..f6009fe9c9ac 100644
--- a/include/linux/skmsg.h
+++ b/include/linux/skmsg.h
@@ -349,7 +349,6 @@  static inline void sk_psock_update_proto(struct sock *sk,
 static inline void sk_psock_restore_proto(struct sock *sk,
 					  struct sk_psock *psock)
 {
-	sk->sk_prot->unhash = psock->saved_unhash;
 	if (inet_csk_has_ulp(sk)) {
 		tcp_update_ulp(sk, psock->sk_proto, psock->saved_write_space);
 	} else {
diff --git a/net/tls/tls_main.c b/net/tls/tls_main.c
index 47b7c5334c34..ecb5634b4c4a 100644
--- a/net/tls/tls_main.c
+++ b/net/tls/tls_main.c
@@ -754,6 +754,12 @@  static void tls_update(struct sock *sk, struct proto *p,
 
 	ctx = tls_get_ctx(sk);
 	if (likely(ctx)) {
+		/* TLS does not have an unhash proto in SW cases, but we need
+		 * to ensure we stop using the sock_map unhash routine because
+		 * the associated psock is being removed. So use the original
+		 * unhash handler.
+		 */
+		WRITE_ONCE(sk->sk_prot->unhash, p->unhash);
 		ctx->sk_write_space = write_space;
 		ctx->sk_proto = p;
 	} else {