diff mbox series

[net] net: let skb_orphan_partial wake-up waiters.

Message ID 880d627b79b24c0f92a47203193ed11f48c3031e.1617113947.git.pabeni@redhat.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [net] net: let skb_orphan_partial wake-up waiters. | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/cc_maintainers fail 1 blamed authors not CCed: davem@davemloft.net; 5 maintainers not CCed: linmiaohe@huawei.com xiangxia.m.yue@gmail.com ast@kernel.org davem@davemloft.net kuba@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 5 this patch: 5
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 8 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 5 this patch: 5
netdev/header_inline success Link

Commit Message

Paolo Abeni March 30, 2021, 2:24 p.m. UTC
Currently the mentioned helper can end-up freeing the socket wmem
without waking-up any processes waiting for more write memory.

If the partially orphaned skb is attached to an UDP (or raw) socket,
the lack of wake-up can hang the user-space.

Address the issue invoking the write_space callback after
releasing the memory, if the old skb destructor requires that.

Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/core/sock.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Eric Dumazet March 30, 2021, 2:39 p.m. UTC | #1
On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> Currently the mentioned helper can end-up freeing the socket wmem
> without waking-up any processes waiting for more write memory.
>
> If the partially orphaned skb is attached to an UDP (or raw) socket,
> the lack of wake-up can hang the user-space.
>
> Address the issue invoking the write_space callback after
> releasing the memory, if the old skb destructor requires that.
>
> Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/core/sock.c | 2 ++
>  1 file changed, 2 insertions(+)
>
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 0ed98f20448a2..7a38332d748e7 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
>
>                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
>                         WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> +                       if (skb->destructor == sock_wfree)
> +                               sk->sk_write_space(sk);

Interesting.

Why TCP is not a problem here ?

I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize,
&sk->sk_wmem_alloc)) by :
                        skb_orphan(skb);

This will get rid of this suspect WARN_ON() at the same time ?

>                         skb->destructor = sock_efree;
>                 }
>         } else {
> --
> 2.26.2
>
Eric Dumazet March 30, 2021, 2:40 p.m. UTC | #2
On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > Currently the mentioned helper can end-up freeing the socket wmem
> > without waking-up any processes waiting for more write memory.
> >
> > If the partially orphaned skb is attached to an UDP (or raw) socket,
> > the lack of wake-up can hang the user-space.
> >
> > Address the issue invoking the write_space callback after
> > releasing the memory, if the old skb destructor requires that.
> >
> > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/core/sock.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 0ed98f20448a2..7a38332d748e7 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
> >
> >                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
> >                         WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> > +                       if (skb->destructor == sock_wfree)
> > +                               sk->sk_write_space(sk);
>
> Interesting.
>
> Why TCP is not a problem here ?
>
> I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize,
> &sk->sk_wmem_alloc)) by :
>                         skb_orphan(skb);

And of course re-add
                        skb->sk = sk;

>
> This will get rid of this suspect WARN_ON() at the same time ?
>
> >                         skb->destructor = sock_efree;
> >                 }
> >         } else {
> > --
> > 2.26.2
> >
Paolo Abeni March 30, 2021, 3:18 p.m. UTC | #3
On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote:
> On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote:
> > On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > Currently the mentioned helper can end-up freeing the socket wmem
> > > without waking-up any processes waiting for more write memory.
> > > 
> > > If the partially orphaned skb is attached to an UDP (or raw) socket,
> > > the lack of wake-up can hang the user-space.
> > > 
> > > Address the issue invoking the write_space callback after
> > > releasing the memory, if the old skb destructor requires that.
> > > 
> > > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  net/core/sock.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > index 0ed98f20448a2..7a38332d748e7 100644
> > > --- a/net/core/sock.c
> > > +++ b/net/core/sock.c
> > > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
> > > 
> > >                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
> > >                         WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> > > +                       if (skb->destructor == sock_wfree)
> > > +                               sk->sk_write_space(sk);
> > 
> > Interesting.
> > 
> > Why TCP is not a problem here ?

AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes
waiting for wmem are woken by ack processing.

> > I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize,
> > &sk->sk_wmem_alloc)) by :
> >                         skb_orphan(skb);
> 
> And of course re-add
>                         skb->sk = sk;

Double checking to be sure. The patched slice of skb_orphan_partial()
will then look like:

	if (can_skb_orphan_partial(skb)) {
		struct sock *sk = skb->sk;
		
		if (refcount_inc_not_zero(&sk->sk_refcnt)) {
			skb_orphan(skb);
			skb->sk = sk;
			skb->destructor = sock_efree;
		}
	} // ...

Am I correct?

Thanks!

Paolo
Eric Dumazet March 30, 2021, 3:29 p.m. UTC | #4
On Tue, Mar 30, 2021 at 5:18 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote:
> > On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote:
> > > On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > Currently the mentioned helper can end-up freeing the socket wmem
> > > > without waking-up any processes waiting for more write memory.
> > > >
> > > > If the partially orphaned skb is attached to an UDP (or raw) socket,
> > > > the lack of wake-up can hang the user-space.
> > > >
> > > > Address the issue invoking the write_space callback after
> > > > releasing the memory, if the old skb destructor requires that.
> > > >
> > > > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()")
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > ---
> > > >  net/core/sock.c | 2 ++
> > > >  1 file changed, 2 insertions(+)
> > > >
> > > > diff --git a/net/core/sock.c b/net/core/sock.c
> > > > index 0ed98f20448a2..7a38332d748e7 100644
> > > > --- a/net/core/sock.c
> > > > +++ b/net/core/sock.c
> > > > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb)
> > > >
> > > >                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
> > > >                         WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
> > > > +                       if (skb->destructor == sock_wfree)
> > > > +                               sk->sk_write_space(sk);
> > >
> > > Interesting.
> > >
> > > Why TCP is not a problem here ?
>
> AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes
> waiting for wmem are woken by ack processing.
>
> > > I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize,
> > > &sk->sk_wmem_alloc)) by :
> > >                         skb_orphan(skb);
> >
> > And of course re-add
> >                         skb->sk = sk;
>
> Double checking to be sure. The patched slice of skb_orphan_partial()
> will then look like:
>
>         if (can_skb_orphan_partial(skb)) {
>                 struct sock *sk = skb->sk;
>
>                 if (refcount_inc_not_zero(&sk->sk_refcnt)) {
>                         skb_orphan(skb);
>                         skb->sk = sk;
>                         skb->destructor = sock_efree;
>                 }
>         } // ...
>
> Am I correct?
>

Yes.

We also could add a helper for the whole construct, since many other
paths do almost the same.
(They might use sock_hold(), but it seems safe to use the
refcount_inc_not_zero())
Or they omit the skb_orphan() (see can_skb_set_owner), which seems also risky.

static inline void skb_set_owner_sk_safe(sk, skb)
{
    if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) {
           skb_orphan(skb);
           skb->destructor = sock_efree;
          skb->sk = sk;
    }
}



> Thanks!
>
> Paolo
>
Eric Dumazet March 30, 2021, 3:32 p.m. UTC | #5
On Tue, Mar 30, 2021 at 5:18 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote:

> > >
> > > Why TCP is not a problem here ?
>
> AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes
> waiting for wmem are woken by ack processing.

My concern was TCP Small Queue. If we do not call tcp_wfree() we might miss
an opportunity to queue more packets (and thus fallback to RTO or
incoming ACK is we are lucky)
diff mbox series

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index 0ed98f20448a2..7a38332d748e7 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2137,6 +2137,8 @@  void skb_orphan_partial(struct sk_buff *skb)
 
 		if (refcount_inc_not_zero(&sk->sk_refcnt)) {
 			WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc));
+			if (skb->destructor == sock_wfree)
+				sk->sk_write_space(sk);
 			skb->destructor = sock_efree;
 		}
 	} else {