[net-next] net: avoid unconditionally touching sk_tsflags on RX

Message ID dbd18c8a1171549f8249ac5a8b30b1b5ec88a425.1739294057.git.pabeni@redhat.com (mailing list archive)
State Accepted
Commit f0e70409b7eb0584d451f74db0c72af67b6170b3
Delegated to: Netdev Maintainers
Series [net-next] net: avoid unconditionally touching sk_tsflags on RX

Checks

Context                         Check    Description
netdev/series_format            success  Single patches do not need cover letters
netdev/tree_selection           success  Clearly marked for net-next, async
netdev/ynl                      success  Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present            success  Fixes tag not required for -next series
netdev/header_inline            success  No static functions without inline keyword in header files
netdev/build_32bit              success  Errors and warnings before: 16 this patch: 16
netdev/build_tools              success  Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers           success  CCed 7 of 7 maintainers
netdev/build_clang              success  Errors and warnings before: 3881 this patch: 3881
netdev/verify_signedoff         success  Signed-off-by tag matches author and committer
netdev/deprecated_api           success  None detected
netdev/check_selftest           success  No net selftest shell script
netdev/verify_fixes             success  No Fixes tag
netdev/build_allmodconfig_warn  success  Errors and warnings before: 2618 this patch: 2618
netdev/checkpatch               success  total: 0 errors, 0 warnings, 0 checks, 31 lines checked
netdev/build_clang_rust         success  No Rust files in patch. Skipping build
netdev/kdoc                     success  Errors and warnings before: 13 this patch: 13
netdev/source_inline            success  Was 0 now: 0
netdev/contest                  success  net-next-2025-02-12--06-00 (tests: 889)

Commit Message

Paolo Abeni Feb. 11, 2025, 5:17 p.m. UTC
After commit 5d4cc87414c5 ("net: reorganize "struct sock" fields"),
the sk_tsflags field shares the same cacheline with sk_forward_alloc.

The UDP protocol does not acquire the sock lock in the RX path;
forward allocations are protected via the receive queue spinlock.
Additionally, udp_recvmsg() calls sock_recv_cmsgs(), which
unconditionally touches sk_tsflags on each packet reception.

As a result, under high packet rate traffic, when the BH and the
user-space process run on different CPUs, UDP packet reception
experiences a cache miss while accessing sk_tsflags, since the BH
keeps dirtying the cacheline it shares with sk_forward_alloc.
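
To make the cacheline argument concrete, here is a minimal userspace sketch. struct demo_sock, DEMO_CACHELINE (an assumed 64-byte line) and the field layout are hypothetical and do NOT reflect the real struct sock; _Alignas is used only to force sk_forward_alloc and sk_tsflags onto one line and sk_flags onto another, which is presumably the property the patch relies on for the real sk_flags. demo_same_line() reports whether two field offsets land in the same line, i.e. whether a write to one field invalidates the line the reader needs for the other.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed cache-line size; 64 bytes is typical on x86-64. */
#define DEMO_CACHELINE 64

/* Hypothetical, heavily simplified stand-in for struct sock. The field
 * names follow the commit message, but the layout is invented for
 * illustration and is NOT the real struct sock layout. */
struct demo_sock {
	unsigned long sk_flags;                        /* line 0: read-mostly */
	_Alignas(DEMO_CACHELINE) int sk_forward_alloc; /* line 1: written by the BH */
	unsigned int sk_tsflags;                       /* line 1: read by recvmsg() */
};

/* The layout assumption the rest of the sketch relies on. */
static_assert(offsetof(struct demo_sock, sk_forward_alloc) == DEMO_CACHELINE,
	      "sk_forward_alloc is expected to start the second cache line");

/* Cache line a field starts in, assuming the object itself starts on a
 * cache-line boundary. */
static size_t demo_line_of(size_t offset)
{
	return offset / DEMO_CACHELINE;
}

/* Non-zero when two fields start in the same cache line, so a write to
 * one forces the line out of the caches of CPUs reading the other. */
static int demo_same_line(size_t a, size_t b)
{
	return demo_line_of(a) == demo_line_of(b);
}
```

In this invented layout, demo_same_line() is true for sk_forward_alloc and sk_tsflags and false for sk_forward_alloc and sk_flags.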

The receive path doesn't strictly need to access the problematic field;
change sock_set_timestamping() to maintain the relevant information
in a newly allocated sk_flags bit, so that sock_recv_cmsgs() can
make its decision by reading only the latter field.
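
The approach can be sketched in plain C as follows. Everything here is a hypothetical stand-in: struct mirror_sock, the MIRROR_* bit values and the helper names are assumptions for illustration, not kernel APIs, and the sketch omits the READ_ONCE()/WRITE_ONCE() annotations the kernel uses for these fields. The slow, setsockopt-style path stores the full value and mirrors a single derived bit; the hot receive path tests only that bit.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative timestamping bits, assumed to match
 * SOF_TIMESTAMPING_SOFTWARE and SOF_TIMESTAMPING_RAW_HARDWARE but
 * defined locally so the sketch builds without kernel headers. */
#define MIRROR_TS_SOFTWARE     (1U << 4)
#define MIRROR_TS_RAW_HARDWARE (1U << 6)
#define MIRROR_TSFLAGS_ANY     (MIRROR_TS_SOFTWARE | MIRROR_TS_RAW_HARDWARE)

/* Hypothetical bit index standing in for SOCK_TIMESTAMPING_ANY. */
#define MIRROR_TIMESTAMPING_ANY 5

struct mirror_sock {
	unsigned long sk_flags;   /* read-mostly word, tested on every receive */
	unsigned int  sk_tsflags; /* full timestamping configuration */
};

/* Analogue of sock_valbool_flag(): set or clear one flag bit. */
static void mirror_valbool_flag(struct mirror_sock *sk, int bit, bool on)
{
	assert(bit >= 0 && bit < (int)(8 * sizeof(sk->sk_flags)));
	if (on)
		sk->sk_flags |= 1UL << bit;
	else
		sk->sk_flags &= ~(1UL << bit);
}

/* Slow path: store the full value, then mirror the one property the
 * receive path needs into a flags bit. */
static void mirror_set_timestamping(struct mirror_sock *sk, unsigned int val)
{
	sk->sk_tsflags = val;
	mirror_valbool_flag(sk, MIRROR_TIMESTAMPING_ANY,
			    (val & MIRROR_TSFLAGS_ANY) != 0);
}

/* Hot path: decide from sk_flags alone, never loading sk_tsflags. */
static bool mirror_rx_wants_cmsgs(const struct mirror_sock *sk)
{
	return (sk->sk_flags & (1UL << MIRROR_TIMESTAMPING_ANY)) != 0;
}
```

The trade-off is a little redundancy, since the bit must be kept in sync wherever sk_tsflags is written, in exchange for a receive path that only loads a read-mostly word.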

With this patch applied, on an AMD EPYC server with i40e NICs, I
measured a 10% performance improvement in small-packet UDP flood
tests; a larger delta could possibly be observed on more recent
hardware.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/sock.h | 9 +++++----
 net/core/sock.c    | 1 +
 2 files changed, 6 insertions(+), 4 deletions(-)

Comments

Eric Dumazet Feb. 11, 2025, 8:13 p.m. UTC | #1
On Tue, Feb 11, 2025 at 6:17 PM Paolo Abeni <pabeni@redhat.com> wrote:
> [...]

Thanks a lot Paolo

Reviewed-by: Eric Dumazet <edumazet@google.com>

Willem de Bruijn Feb. 12, 2025, 1:35 a.m. UTC | #2
Paolo Abeni wrote:
> [...]

Reviewed-by: Willem de Bruijn <willemb@google.com>

Kuniyuki Iwashima Feb. 12, 2025, 7:39 a.m. UTC | #3
From: Paolo Abeni <pabeni@redhat.com>
Date: Tue, 11 Feb 2025 18:17:31 +0100
> [...]

Reviewed-by: Kuniyuki Iwashima <kuniyu@amazon.com>

patchwork-bot+netdevbpf@kernel.org Feb. 13, 2025, 4:10 a.m. UTC | #4
Hello:

This patch was applied to netdev/net-next.git (main)
by Jakub Kicinski <kuba@kernel.org>:

On Tue, 11 Feb 2025 18:17:31 +0100 you wrote:
> After commit 5d4cc87414c5 ("net: reorganize "struct sock" fields"),
> the sk_tsflags field shares the same cacheline with sk_forward_alloc.
> 
> The UDP protocol does not acquire the sock lock in the RX path;
> forward allocations are protected via the receive queue spinlock;
> additionally udp_recvmsg() calls sock_recv_cmsgs() unconditionally
> touching sk_tsflags on each packet reception.
> 
> [...]

Here is the summary with links:
  - [net-next] net: avoid unconditionally touching sk_tsflags on RX
    https://git.kernel.org/netdev/net-next/c/f0e70409b7eb

You are awesome, thank you!

Patch

diff --git a/include/net/sock.h b/include/net/sock.h
index 8036b3b79cd8..60ebf3c7b229 100644
--- a/include/net/sock.h
+++ b/include/net/sock.h
@@ -954,6 +954,7 @@  enum sock_flags {
 	SOCK_TSTAMP_NEW, /* Indicates 64 bit timestamps always */
 	SOCK_RCVMARK, /* Receive SO_MARK  ancillary data with packet */
 	SOCK_RCVPRIORITY, /* Receive SO_PRIORITY ancillary data with packet */
+	SOCK_TIMESTAMPING_ANY, /* Copy of sk_tsflags & TSFLAGS_ANY */
 };
 
 #define SK_FLAGS_TIMESTAMP ((1UL << SOCK_TIMESTAMP) | (1UL << SOCK_TIMESTAMPING_RX_SOFTWARE))
@@ -2664,13 +2665,13 @@  static inline void sock_recv_cmsgs(struct msghdr *msg, struct sock *sk,
 {
 #define FLAGS_RECV_CMSGS ((1UL << SOCK_RXQ_OVFL)			| \
 			   (1UL << SOCK_RCVTSTAMP)			| \
-			   (1UL << SOCK_RCVMARK)			|\
-			   (1UL << SOCK_RCVPRIORITY))
+			   (1UL << SOCK_RCVMARK)			| \
+			   (1UL << SOCK_RCVPRIORITY)			| \
+			   (1UL << SOCK_TIMESTAMPING_ANY))
 #define TSFLAGS_ANY	  (SOF_TIMESTAMPING_SOFTWARE			| \
 			   SOF_TIMESTAMPING_RAW_HARDWARE)
 
-	if (sk->sk_flags & FLAGS_RECV_CMSGS ||
-	    READ_ONCE(sk->sk_tsflags) & TSFLAGS_ANY)
+	if (READ_ONCE(sk->sk_flags) & FLAGS_RECV_CMSGS)
 		__sock_recv_cmsgs(msg, sk, skb);
 	else if (unlikely(sock_flag(sk, SOCK_TIMESTAMP)))
 		sock_write_timestamp(sk, skb->tstamp);
diff --git a/net/core/sock.c b/net/core/sock.c
index eae2ae70a2e0..a197f0a0b878 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -938,6 +938,7 @@  int sock_set_timestamping(struct sock *sk, int optname,
 
 	WRITE_ONCE(sk->sk_tsflags, val);
 	sock_valbool_flag(sk, SOCK_TSTAMP_NEW, optname == SO_TIMESTAMPING_NEW);
+	sock_valbool_flag(sk, SOCK_TIMESTAMPING_ANY, !!(val & TSFLAGS_ANY));
 
 	if (val & SOF_TIMESTAMPING_RX_SOFTWARE)
 		sock_enable_timestamp(sk,
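
One way to convince oneself that folding the sk_tsflags test into FLAGS_RECV_CMSGS preserves the branch decision in sock_recv_cmsgs() is an exhaustive check over a small bit space. This is a hypothetical userspace sketch: the EQ_* bit indexes are invented (only their distinctness matters), and EQ_TSFLAGS_ANY assumes the SOF_TIMESTAMPING_SOFTWARE and SOF_TIMESTAMPING_RAW_HARDWARE bit values from linux/net_tstamp.h. It also assumes, as the patch arranges in sock_set_timestamping(), that the mirrored bit is set exactly when sk_tsflags & TSFLAGS_ANY is non-zero.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical bit indexes standing in for the enum sock_flags values
 * used by FLAGS_RECV_CMSGS; only their distinctness matters here. */
enum {
	EQ_RXQ_OVFL,
	EQ_RCVTSTAMP,
	EQ_RCVMARK,
	EQ_RCVPRIORITY,
	EQ_TIMESTAMPING_ANY,
};

#define EQ_OLD_MASK ((1UL << EQ_RXQ_OVFL) | (1UL << EQ_RCVTSTAMP) | \
		     (1UL << EQ_RCVMARK) | (1UL << EQ_RCVPRIORITY))
#define EQ_NEW_MASK (EQ_OLD_MASK | (1UL << EQ_TIMESTAMPING_ANY))

/* Assumed values of SOF_TIMESTAMPING_SOFTWARE | SOF_TIMESTAMPING_RAW_HARDWARE. */
#define EQ_TSFLAGS_ANY ((1U << 4) | (1U << 6))

/* Pre-patch condition: reads both sk_flags and sk_tsflags. */
static bool eq_old_check(unsigned long flags, unsigned int tsflags)
{
	return (flags & EQ_OLD_MASK) || (tsflags & EQ_TSFLAGS_ANY);
}

/* Post-patch condition: reads sk_flags only. */
static bool eq_new_check(unsigned long flags)
{
	return (flags & EQ_NEW_MASK) != 0;
}

/* Walks every combination of the four pre-existing flag bits and the
 * low seven tsflags bits, derives the mirrored bit the way
 * sock_set_timestamping() does after the patch, and asserts that both
 * conditions agree. Returns the number of combinations checked. */
static unsigned long eq_check_all(void)
{
	unsigned long checked = 0;

	for (unsigned long f = 0; f <= EQ_OLD_MASK; f++) {
		for (unsigned int ts = 0; ts < (1U << 7); ts++) {
			unsigned long flags = f;

			if (ts & EQ_TSFLAGS_ANY)
				flags |= 1UL << EQ_TIMESTAMPING_ANY;
			assert(eq_old_check(flags, ts) == eq_new_check(flags));
			checked++;
		}
	}
	return checked;
}
```

eq_check_all() asserts agreement for each of the 16 * 128 combinations and returns 2048 when all of them pass.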