diff mbox series

[net-next,1/4] tcp: use RCU in __inet{6}_check_established()

Message ID 20250302124237.3913746-2-edumazet@google.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Headers show
Series tcp: scale connect() under pressure | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning CHECK: spinlock_t definition without comment
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eric Dumazet March 2, 2025, 12:42 p.m. UTC
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
__inet_check_established() and/or __inet6_check_established().

This patch adds an RCU lookup to avoid the spinlock
acquisition when the 4-tuple is found in the hash table.

Note that there are still spin_lock_bh() calls in
__inet_hash_connect() to protect inet_bind_hashbucket;
this will be fixed later in this series.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
---
 net/ipv4/inet_hashtables.c  | 19 ++++++++++++++++---
 net/ipv6/inet6_hashtables.c | 19 ++++++++++++++++---
 2 files changed, 32 insertions(+), 6 deletions(-)
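
The change follows a classic double-checked pattern: walk the ehash chain
under rcu_read_lock() and return -EADDRNOTAVAIL without ever touching the
bucket spinlock when a live matching 4-tuple already exists; only when the
lockless walk finds nothing (or only a TIME_WAIT socket) is the spinlock
taken for the authoritative check. The standalone sketch below models just
that control flow; all names in it (four_tuple, bucket, tuple_in_use(),
claim_tuple()) are hypothetical, and it substitutes an append-only list with
C11 atomics plus a pthread mutex for the kernel's RCU, nulls lists and
bh-disabled spinlocks.

/*
 * Minimal sketch of "lockless pre-check, locked commit".
 * Build with: gcc -std=c11 -pthread sketch.c
 * Nodes are append-only and never freed, so readers may traverse the
 * list without the lock (the role RCU plays in the kernel code).
 */
#include <pthread.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

struct four_tuple { unsigned saddr, daddr; unsigned short sport, dport; };

struct node {
	struct four_tuple key;
	struct node *next;
};

struct bucket {
	_Atomic(struct node *) head;	/* readers traverse without the lock */
	pthread_mutex_t lock;		/* writers serialize insertions */
};

static bool tuple_eq(const struct four_tuple *a, const struct four_tuple *b)
{
	return !memcmp(a, b, sizeof(*a));
}

/* Lockless walk: safe here because nodes are only ever appended. */
static bool tuple_in_use(struct bucket *b, const struct four_tuple *key)
{
	for (struct node *n = atomic_load_explicit(&b->head,
						   memory_order_acquire);
	     n; n = n->next)
		if (tuple_eq(&n->key, key))
			return true;
	return false;
}

/* Mirrors the patch's flow: bail out early without touching the lock when
 * the 4-tuple is already taken; otherwise take the lock and re-check,
 * because the lockless scan can race with a concurrent insert.
 */
static int claim_tuple(struct bucket *b, const struct four_tuple *key)
{
	if (tuple_in_use(b, key))
		return -1;			/* -EADDRNOTAVAIL in the kernel */

	pthread_mutex_lock(&b->lock);
	if (tuple_in_use(b, key)) {		/* authoritative re-check */
		pthread_mutex_unlock(&b->lock);
		return -1;
	}
	struct node *n = malloc(sizeof(*n));
	n->key = *key;
	n->next = atomic_load_explicit(&b->head, memory_order_relaxed);
	atomic_store_explicit(&b->head, n, memory_order_release);
	pthread_mutex_unlock(&b->lock);
	return 0;
}

int main(void)
{
	static struct bucket b = {
		.head = NULL,
		.lock = PTHREAD_MUTEX_INITIALIZER,
	};
	struct four_tuple t = { 0x0a000001, 0x0a000002, 40000, 443 };

	printf("first claim: %d\n", claim_tuple(&b, &t));	/* 0  */
	printf("second claim: %d\n", claim_tuple(&b, &t));	/* -1 */
	return 0;
}

The re-check under the lock matters because the lockless scan can miss an
entry inserted concurrently; for the same reason the patch keeps the locked
sk_nulls_for_each() loop after the new RCU fast path.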

Comments

Jason Xing March 3, 2025, 12:24 a.m. UTC | #1
On Sun, Mar 2, 2025 at 8:42 PM Eric Dumazet <edumazet@google.com> wrote:
>
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition when the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket;
> this will be fixed later in this series.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Yesterday, I did a few tests on this single patch and managed to see a
~7% increase in performance on my virtual machine[1] :) Thanks!

Tested-by: Jason Xing <kerneljasonxing@gmail.com>

[1]: https://lore.kernel.org/all/CAL+tcoBAVmTk_JBX=OEBqZZuoSzZd8bjuw9rgwRLMd9fvZOSkA@mail.gmail.com/

Patch

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e18275fb50fea8d77adc8a64fb12494..46d39aa2199ec3a405b50e8e85130e990d2c26b7 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -551,11 +551,24 @@  static int __inet_check_established(struct inet_timewait_death_row *death_row,
 	unsigned int hash = inet_ehashfn(net, daddr, lport,
 					 saddr, inet->inet_dport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
-	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
-	struct sock *sk2;
-	const struct hlist_nulls_node *node;
 	struct inet_timewait_sock *tw = NULL;
+	const struct hlist_nulls_node *node;
+	struct sock *sk2;
+	spinlock_t *lock;
+
+	rcu_read_lock();
+	sk_nulls_for_each(sk2, node, &head->chain) {
+		if (sk2->sk_hash != hash ||
+		    !inet_match(net, sk2, acookie, ports, dif, sdif))
+			continue;
+		if (sk2->sk_state == TCP_TIME_WAIT)
+			break;
+		rcu_read_unlock();
+		return -EADDRNOTAVAIL;
+	}
+	rcu_read_unlock();
 
+	lock = inet_ehash_lockp(hinfo, hash);
 	spin_lock(lock);
 
 	sk_nulls_for_each(sk2, node, &head->chain) {
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 9ec05e354baa69d14e88da37f5a9fce11e874e35..3604a5cae5d29a25d24f9513308334ff8e64b083 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -276,11 +276,24 @@  static int __inet6_check_established(struct inet_timewait_death_row *death_row,
 	const unsigned int hash = inet6_ehashfn(net, daddr, lport, saddr,
 						inet->inet_dport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
-	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
-	struct sock *sk2;
-	const struct hlist_nulls_node *node;
 	struct inet_timewait_sock *tw = NULL;
+	const struct hlist_nulls_node *node;
+	struct sock *sk2;
+	spinlock_t *lock;
+
+	rcu_read_lock();
+	sk_nulls_for_each(sk2, node, &head->chain) {
+		if (sk2->sk_hash != hash ||
+		    !inet6_match(net, sk2, saddr, daddr, ports, dif, sdif))
+			continue;
+		if (sk2->sk_state == TCP_TIME_WAIT)
+			break;
+		rcu_read_unlock();
+		return -EADDRNOTAVAIL;
+	}
+	rcu_read_unlock();
 
+	lock = inet_ehash_lockp(hinfo, hash);
 	spin_lock(lock);
 
 	sk_nulls_for_each(sk2, node, &head->chain) {