Message ID | 20240328144032.1864988-4-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 612b1c0dec5bc7367f90fc508448b8d0d7c05414 |
Delegated to: | Netdev Maintainers |
Series | udp: small changes on receive path |
Eric Dumazet wrote:
> sock_def_readable() is quite expensive (particularly
> when ep_poll_callback() is in the picture).
>
> We must call sk->sk_data_ready() when :
>
> - receive queue was empty, or
> - SO_PEEK_OFF is enabled on the socket, or
> - sk->sk_data_ready is not sock_def_readable.
>
> We still need to call sk_wake_async().
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>
On Thu, 2024-03-28 at 14:40 +0000, Eric Dumazet wrote:
> sock_def_readable() is quite expensive (particularly
> when ep_poll_callback() is in the picture).
>
> We must call sk->sk_data_ready() when :
>
> - receive queue was empty, or
> - SO_PEEK_OFF is enabled on the socket, or
> - sk->sk_data_ready is not sock_def_readable.
>
> We still need to call sk_wake_async().
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> ---
>  net/ipv4/udp.c | 14 +++++++++++---
>  1 file changed, 11 insertions(+), 3 deletions(-)
>
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index d2fa9755727ce034c2b4bca82bd9e72130d588e6..5dfbe4499c0f89f94af9ee1fb64559dd672c1439 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -1492,6 +1492,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
>  	struct sk_buff_head *list = &sk->sk_receive_queue;
>  	int rmem, err = -ENOMEM;
>  	spinlock_t *busy = NULL;
> +	bool becomes_readable;
>  	int size, rcvbuf;
>
>  	/* Immediately drop when the receive queue is full.
> @@ -1532,12 +1533,19 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
>  	 */
>  	sock_skb_set_dropcount(sk, skb);
>
> +	becomes_readable = skb_queue_empty(list);
>  	__skb_queue_tail(list, skb);
>  	spin_unlock(&list->lock);
>
> -	if (!sock_flag(sk, SOCK_DEAD))
> -		INDIRECT_CALL_1(sk->sk_data_ready, sock_def_readable, sk);
> -
> +	if (!sock_flag(sk, SOCK_DEAD)) {
> +		if (becomes_readable ||
> +		    sk->sk_data_ready != sock_def_readable ||
> +		    READ_ONCE(sk->sk_peek_off) >= 0)
> +			INDIRECT_CALL_1(sk->sk_data_ready,
> +					sock_def_readable, sk);
> +		else
> +			sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
> +	}

I understood this change showed no performance benefit???

I guess the atomic_add_return() MB was hiding some/most of
sock_def_readable() cost?

Thanks!

Paolo
On Fri, Mar 29, 2024 at 11:22 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Thu, 2024-03-28 at 14:40 +0000, Eric Dumazet wrote:
> > sock_def_readable() is quite expensive (particularly
> > when ep_poll_callback() is in the picture).
> >
> > We must call sk->sk_data_ready() when :
> >
> > - receive queue was empty, or
> > - SO_PEEK_OFF is enabled on the socket, or
> > - sk->sk_data_ready is not sock_def_readable.
> >
> > We still need to call sk_wake_async().
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > ---
> >  net/ipv4/udp.c | 14 +++++++++++---
> >  1 file changed, 11 insertions(+), 3 deletions(-)
> >
> > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > index d2fa9755727ce034c2b4bca82bd9e72130d588e6..5dfbe4499c0f89f94af9ee1fb64559dd672c1439 100644
> > --- a/net/ipv4/udp.c
> > +++ b/net/ipv4/udp.c
> > @@ -1492,6 +1492,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
> >  	struct sk_buff_head *list = &sk->sk_receive_queue;
> >  	int rmem, err = -ENOMEM;
> >  	spinlock_t *busy = NULL;
> > +	bool becomes_readable;
> >  	int size, rcvbuf;
> >
> >  	/* Immediately drop when the receive queue is full.
> > @@ -1532,12 +1533,19 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
> >  	 */
> >  	sock_skb_set_dropcount(sk, skb);
> >
> > +	becomes_readable = skb_queue_empty(list);
> >  	__skb_queue_tail(list, skb);
> >  	spin_unlock(&list->lock);
> >
> > -	if (!sock_flag(sk, SOCK_DEAD))
> > -		INDIRECT_CALL_1(sk->sk_data_ready, sock_def_readable, sk);
> > -
> > +	if (!sock_flag(sk, SOCK_DEAD)) {
> > +		if (becomes_readable ||
> > +		    sk->sk_data_ready != sock_def_readable ||
> > +		    READ_ONCE(sk->sk_peek_off) >= 0)
> > +			INDIRECT_CALL_1(sk->sk_data_ready,
> > +					sock_def_readable, sk);
> > +		else
> > +			sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
> > +	}
>
> I understood this change showed no performance benefit???
>
> I guess the atomic_add_return() MB was hiding some/most of
> sock_def_readable() cost?
It did show benefits in the epoll case, because ep_poll_callback() is
very expensive.

I think you are referring to a prior discussion we had while still
using netperf tests, which do not use epoll.

Eliminating sock_def_readable() was avoiding the smp_mb() we have in
wq_has_sleeper(), and this was not a convincing win: the apparent cost
of this smp_mb() was high in moderate traffic, but gradually became
small if the cpu was fully utilized.

The atomic_add_return() cost is orthogonal (I see it mostly on ARM64 platforms)
On Fri, 2024-03-29 at 11:52 +0100, Eric Dumazet wrote:
> On Fri, Mar 29, 2024 at 11:22 AM Paolo Abeni <pabeni@redhat.com> wrote:
> >
> > On Thu, 2024-03-28 at 14:40 +0000, Eric Dumazet wrote:
> > > sock_def_readable() is quite expensive (particularly
> > > when ep_poll_callback() is in the picture).
> > >
> > > We must call sk->sk_data_ready() when :
> > >
> > > - receive queue was empty, or
> > > - SO_PEEK_OFF is enabled on the socket, or
> > > - sk->sk_data_ready is not sock_def_readable.
> > >
> > > We still need to call sk_wake_async().
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> > > ---
> > >  net/ipv4/udp.c | 14 +++++++++++---
> > >  1 file changed, 11 insertions(+), 3 deletions(-)
> > >
> > > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> > > index d2fa9755727ce034c2b4bca82bd9e72130d588e6..5dfbe4499c0f89f94af9ee1fb64559dd672c1439 100644
> > > --- a/net/ipv4/udp.c
> > > +++ b/net/ipv4/udp.c
> > > @@ -1492,6 +1492,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
> > >  	struct sk_buff_head *list = &sk->sk_receive_queue;
> > >  	int rmem, err = -ENOMEM;
> > >  	spinlock_t *busy = NULL;
> > > +	bool becomes_readable;
> > >  	int size, rcvbuf;
> > >
> > >  	/* Immediately drop when the receive queue is full.
> > > @@ -1532,12 +1533,19 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
> > >  	 */
> > >  	sock_skb_set_dropcount(sk, skb);
> > >
> > > +	becomes_readable = skb_queue_empty(list);
> > >  	__skb_queue_tail(list, skb);
> > >  	spin_unlock(&list->lock);
> > >
> > > -	if (!sock_flag(sk, SOCK_DEAD))
> > > -		INDIRECT_CALL_1(sk->sk_data_ready, sock_def_readable, sk);
> > > -
> > > +	if (!sock_flag(sk, SOCK_DEAD)) {
> > > +		if (becomes_readable ||
> > > +		    sk->sk_data_ready != sock_def_readable ||
> > > +		    READ_ONCE(sk->sk_peek_off) >= 0)
> > > +			INDIRECT_CALL_1(sk->sk_data_ready,
> > > +					sock_def_readable, sk);
> > > +		else
> > > +			sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
> > > +	}
> >
> > I understood this change showed no performance benefit???
> >
> > I guess the atomic_add_return() MB was hiding some/most of
> > sock_def_readable() cost?
>
> It did show benefits in the epoll case, because ep_poll_callback() is
> very expensive.
>
> I think you are referring to a prior discussion we had while still
> using netperf tests, which do not use epoll.

Indeed.

> Eliminating sock_def_readable() was avoiding the smp_mb() we have in
> wq_has_sleeper()
> and this was not a convincing win : The apparent cost of this smp_mb()
> was high in moderate traffic,
> but gradually became small if the cpu was fully utilized.
>
> The atomic_add_return() cost is orthogonal (I see it mostly on ARM64 platforms)

Thanks for the additional details.

FTR, I guessed that (part of) the atomic_add_return() cost comes from the
implied additional barrier (compared to plain atomic_add()), and that the
barrier in sock_def_readable() was relatively cheap in the presence of
the previous one and became more visible after moving to atomic_add().

In any case LGTM, thanks!

Acked-by: Paolo Abeni <pabeni@redhat.com>
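
The barrier point in the exchange above can be illustrated with a rough userspace analogy. This is only a sketch using C11 <stdatomic.h>, not kernel code: charge_ordered() and charge_relaxed() are hypothetical names, and the C11 memory orders only approximate the kernel rule that value-returning atomics such as atomic_add_return() are fully ordered while plain atomic_add() implies no ordering.

```c
#include <assert.h>
#include <stdatomic.h>

/*
 * Userspace analogy only. Both helpers are hypothetical and do not
 * exist in the kernel; "rmem" stands in for a receive-memory counter.
 */

/* Sequentially consistent add that returns the new value. This loosely
 * mirrors a fully ordered, value-returning atomic_add_return(). */
int charge_ordered(atomic_int *rmem, int size)
{
	return atomic_fetch_add(rmem, size) + size;
}

/* Relaxed add with no ordering guarantee and no returned value. This
 * loosely mirrors plain atomic_add(). */
void charge_relaxed(atomic_int *rmem, int size)
{
	atomic_fetch_add_explicit(rmem, size, memory_order_relaxed);
}

/* Both variants produce the same arithmetic; only the ordering differs. */
void charge_demo(void)
{
	atomic_int rmem = 0;

	assert(charge_ordered(&rmem, 100) == 100);
	charge_relaxed(&rmem, 50);
	assert(atomic_load(&rmem) == 150);
}
```

On weakly ordered CPUs the ordered variant typically costs more, which is consistent with the cost being reported mostly on ARM64. The sketch shows only the semantic difference and measures nothing.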
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index d2fa9755727ce034c2b4bca82bd9e72130d588e6..5dfbe4499c0f89f94af9ee1fb64559dd672c1439 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -1492,6 +1492,7 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 	struct sk_buff_head *list = &sk->sk_receive_queue;
 	int rmem, err = -ENOMEM;
 	spinlock_t *busy = NULL;
+	bool becomes_readable;
 	int size, rcvbuf;

 	/* Immediately drop when the receive queue is full.
@@ -1532,12 +1533,19 @@ int __udp_enqueue_schedule_skb(struct sock *sk, struct sk_buff *skb)
 	 */
 	sock_skb_set_dropcount(sk, skb);

+	becomes_readable = skb_queue_empty(list);
 	__skb_queue_tail(list, skb);
 	spin_unlock(&list->lock);

-	if (!sock_flag(sk, SOCK_DEAD))
-		INDIRECT_CALL_1(sk->sk_data_ready, sock_def_readable, sk);
-
+	if (!sock_flag(sk, SOCK_DEAD)) {
+		if (becomes_readable ||
+		    sk->sk_data_ready != sock_def_readable ||
+		    READ_ONCE(sk->sk_peek_off) >= 0)
+			INDIRECT_CALL_1(sk->sk_data_ready,
+					sock_def_readable, sk);
+		else
+			sk_wake_async(sk, SOCK_WAKE_WAITD, POLL_IN);
+	}
 	busylock_release(busy);
 	return 0;
sock_def_readable() is quite expensive (particularly
when ep_poll_callback() is in the picture).

We must call sk->sk_data_ready() when :

- receive queue was empty, or
- SO_PEEK_OFF is enabled on the socket, or
- sk->sk_data_ready is not sock_def_readable.

We still need to call sk_wake_async().

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/udp.c | 14 +++++++++++---
 1 file changed, 11 insertions(+), 3 deletions(-)
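
The wakeup rule from the commit message can be modeled in plain userspace C to make the three conditions easier to check by hand. This is a minimal sketch, not the kernel implementation: struct model_sock, enum wake_action and model_enqueue() are hypothetical names, and the model ignores locking, SOCK_DEAD and memory accounting.

```c
#include <assert.h>
#include <stdbool.h>

/* Userspace model only; none of these names exist in the kernel. */
enum wake_action {
	WAKE_DATA_READY,	/* models calling sk->sk_data_ready() */
	WAKE_ASYNC_ONLY		/* models calling only sk_wake_async() */
};

struct model_sock {
	unsigned int queued;		/* datagrams queued before this enqueue */
	int peek_off;			/* -1 models SO_PEEK_OFF disabled */
	bool default_data_ready;	/* models sk_data_ready == sock_def_readable */
};

/* Queue one datagram and report which wakeup path the patch would take. */
enum wake_action model_enqueue(struct model_sock *sk)
{
	/* Sample emptiness before queueing, as the patch does. */
	bool becomes_readable = (sk->queued == 0);

	sk->queued++;

	if (becomes_readable || !sk->default_data_ready || sk->peek_off >= 0)
		return WAKE_DATA_READY;
	return WAKE_ASYNC_ONLY;
}

/* Usage: only the first datagram on an empty queue takes the full path. */
void model_demo(void)
{
	struct model_sock sk = {
		.queued = 0, .peek_off = -1, .default_data_ready = true,
	};

	assert(model_enqueue(&sk) == WAKE_DATA_READY);
	assert(model_enqueue(&sk) == WAKE_ASYNC_ONLY);
}
```

In this model, a burst of datagrams arriving on a non-empty queue takes the cheaper path, unless SO_PEEK_OFF is enabled or a custom data-ready callback is installed.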