Message ID | 880d627b79b24c0f92a47203193ed11f48c3031e.1617113947.git.pabeni@redhat.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net] net: let skb_orphan_partial wake-up waiters. | expand |
Context | Check | Description |
---|---|---|
netdev/cover_letter | success | Link |
netdev/fixes_present | success | Link |
netdev/patch_count | success | Link |
netdev/tree_selection | success | Clearly marked for net |
netdev/subject_prefix | success | Link |
netdev/cc_maintainers | fail | 1 blamed authors not CCed: davem@davemloft.net; 5 maintainers not CCed: linmiaohe@huawei.com xiangxia.m.yue@gmail.com ast@kernel.org davem@davemloft.net kuba@kernel.org |
netdev/source_inline | success | Was 0 now: 0 |
netdev/verify_signedoff | success | Link |
netdev/module_param | success | Was 0 now: 0 |
netdev/build_32bit | success | Errors and warnings before: 5 this patch: 5 |
netdev/kdoc | success | Errors and warnings before: 0 this patch: 0 |
netdev/verify_fixes | success | Link |
netdev/checkpatch | success | total: 0 errors, 0 warnings, 0 checks, 8 lines checked |
netdev/build_allmodconfig_warn | success | Errors and warnings before: 5 this patch: 5 |
netdev/header_inline | success | Link |
On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote: > > Currently the mentioned helper can end-up freeing the socket wmem > without waking-up any processes waiting for more write memory. > > If the partially orphaned skb is attached to an UDP (or raw) socket, > the lack of wake-up can hang the user-space. > > Address the issue invoking the write_space callback after > releasing the memory, if the old skb destructor requires that. > > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > --- > net/core/sock.c | 2 ++ > 1 file changed, 2 insertions(+) > > diff --git a/net/core/sock.c b/net/core/sock.c > index 0ed98f20448a2..7a38332d748e7 100644 > --- a/net/core/sock.c > +++ b/net/core/sock.c > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb) > > if (refcount_inc_not_zero(&sk->sk_refcnt)) { > WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc)); > + if (skb->destructor == sock_wfree) > + sk->sk_write_space(sk); Interesting. Why TCP is not a problem here ? I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc)) by : skb_orphan(skb); This will get rid of this suspect WARN_ON() at the same time ? > skb->destructor = sock_efree; > } > } else { > -- > 2.26.2 >
On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > Currently the mentioned helper can end-up freeing the socket wmem > > without waking-up any processes waiting for more write memory. > > > > If the partially orphaned skb is attached to an UDP (or raw) socket, > > the lack of wake-up can hang the user-space. > > > > Address the issue invoking the write_space callback after > > releasing the memory, if the old skb destructor requires that. > > > > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > --- > > net/core/sock.c | 2 ++ > > 1 file changed, 2 insertions(+) > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > index 0ed98f20448a2..7a38332d748e7 100644 > > --- a/net/core/sock.c > > +++ b/net/core/sock.c > > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb) > > > > if (refcount_inc_not_zero(&sk->sk_refcnt)) { > > WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc)); > > + if (skb->destructor == sock_wfree) > > + sk->sk_write_space(sk); > > Interesting. > > Why TCP is not a problem here ? > > I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize, > &sk->sk_wmem_alloc)) by : > skb_orphan(skb); And of course re-add skb->sk = sk; > > This will get rid of this suspect WARN_ON() at the same time ? > > > skb->destructor = sock_efree; > > } > > } else { > > -- > > 2.26.2 > >
On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote: > On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > Currently the mentioned helper can end-up freeing the socket wmem > > > without waking-up any processes waiting for more write memory. > > > > > > If the partially orphaned skb is attached to an UDP (or raw) socket, > > > the lack of wake-up can hang the user-space. > > > > > > Address the issue invoking the write_space callback after > > > releasing the memory, if the old skb destructor requires that. > > > > > > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > > --- > > > net/core/sock.c | 2 ++ > > > 1 file changed, 2 insertions(+) > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > index 0ed98f20448a2..7a38332d748e7 100644 > > > --- a/net/core/sock.c > > > +++ b/net/core/sock.c > > > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb) > > > > > > if (refcount_inc_not_zero(&sk->sk_refcnt)) { > > > WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc)); > > > + if (skb->destructor == sock_wfree) > > > + sk->sk_write_space(sk); > > > > Interesting. > > > > Why TCP is not a problem here ? AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes waiting for wmem are woken by ack processing. > > I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize, > > &sk->sk_wmem_alloc)) by : > > skb_orphan(skb); > > And of course re-add > skb->sk = sk; Double checking to be sure. The patched slice of skb_orphan_partial() will then look like: if (can_skb_orphan_partial(skb)) { struct sock *sk = skb->sk; if (refcount_inc_not_zero(&sk->sk_refcnt)) { skb_orphan(skb); skb->sk = sk; skb->destructor = sock_efree; } } // ... Am I correct? Thanks! Paolo
On Tue, Mar 30, 2021 at 5:18 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote: > > On Tue, Mar 30, 2021 at 4:39 PM Eric Dumazet <edumazet@google.com> wrote: > > > On Tue, Mar 30, 2021 at 4:25 PM Paolo Abeni <pabeni@redhat.com> wrote: > > > > Currently the mentioned helper can end-up freeing the socket wmem > > > > without waking-up any processes waiting for more write memory. > > > > > > > > If the partially orphaned skb is attached to an UDP (or raw) socket, > > > > the lack of wake-up can hang the user-space. > > > > > > > > Address the issue invoking the write_space callback after > > > > releasing the memory, if the old skb destructor requires that. > > > > > > > > Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com> > > > > --- > > > > net/core/sock.c | 2 ++ > > > > 1 file changed, 2 insertions(+) > > > > > > > > diff --git a/net/core/sock.c b/net/core/sock.c > > > > index 0ed98f20448a2..7a38332d748e7 100644 > > > > --- a/net/core/sock.c > > > > +++ b/net/core/sock.c > > > > @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb) > > > > > > > > if (refcount_inc_not_zero(&sk->sk_refcnt)) { > > > > WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc)); > > > > + if (skb->destructor == sock_wfree) > > > > + sk->sk_write_space(sk); > > > > > > Interesting. > > > > > > Why TCP is not a problem here ? > > AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes > waiting for wmem are woken by ack processing. > > > > I would rather replace WARN_ON(refcount_sub_and_test(skb->truesize, > > > &sk->sk_wmem_alloc)) by : > > > skb_orphan(skb); > > > > And of course re-add > > skb->sk = sk; > > Double checking to be sure. The patched slice of skb_orphan_partial() > will then look like: > > if (can_skb_orphan_partial(skb)) { > struct sock *sk = skb->sk; > > if (refcount_inc_not_zero(&sk->sk_refcnt)) { > skb_orphan(skb); > skb->sk = sk; > skb->destructor = sock_efree; > } > } // ... > > Am I correct? > Yes. We also could add a helper for the whole construct, since many other paths do almost the same. (They might use sock_hold(), but it seems safe to use the refcount_inc_not_zero()) Or they omit the skb_orphan() (see can_skb_set_owner), which seems also risky. static inline void skb_set_owner_sk_safe(sk, skb) { if (sk && refcount_inc_not_zero(&sk->sk_refcnt)) { skb_orphan(skb); skb->destructor = sock_efree; skb->sk = sk; } } > Thanks! > > Paolo >
On Tue, Mar 30, 2021 at 5:18 PM Paolo Abeni <pabeni@redhat.com> wrote: > > On Tue, 2021-03-30 at 16:40 +0200, Eric Dumazet wrote: > > > > > > Why TCP is not a problem here ? > > AFAICS, tcp_wfree() does not call sk->sk_write_space(). Processes > waiting for wmem are woken by ack processing. My concern was TCP Small Queue. If we do not call tcp_wfree() we might miss an opportunity to queue more packets (and thus fallback to RTO or incoming ACK is we are lucky)
diff --git a/net/core/sock.c b/net/core/sock.c index 0ed98f20448a2..7a38332d748e7 100644 --- a/net/core/sock.c +++ b/net/core/sock.c @@ -2137,6 +2137,8 @@ void skb_orphan_partial(struct sk_buff *skb) if (refcount_inc_not_zero(&sk->sk_refcnt)) { WARN_ON(refcount_sub_and_test(skb->truesize, &sk->sk_wmem_alloc)); + if (skb->destructor == sock_wfree) + sk->sk_write_space(sk); skb->destructor = sock_efree; } } else {
Currently the mentioned helper can end-up freeing the socket wmem without waking-up any processes waiting for more write memory. If the partially orphaned skb is attached to an UDP (or raw) socket, the lack of wake-up can hang the user-space. Address the issue invoking the write_space callback after releasing the memory, if the old skb destructor requires that. Fixes: f6ba8d33cfbb ("netem: fix skb_orphan_partial()") Signed-off-by: Paolo Abeni <pabeni@redhat.com> --- net/core/sock.c | 2 ++ 1 file changed, 2 insertions(+)