| Message ID | 20250301194624.1879919-1-edumazet@google.com (mailing list archive) |
|---|---|
| State | New |
| Delegated to: | Netdev Maintainers |
| Series | [net-next] tcp: use RCU in __inet{6}_check_established() |
On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@google.com> wrote:
>
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition if the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket,
> this will be fixed in a future patch.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

It can add a little overhead in the common case, because the new lookup
only pays off when the 4-tuple is already present in the hash table, and
I'm not sure how often that happens in practice. Since an extra RCU
lookup is cheap, it still looks good to me; that overhead is the only
thing I'm slightly worried about.

As you said, it truly mitigates the heavy contention in the case
mentioned earlier, where available ports are becoming scarce. We've seen
that situation cause high CPU load before. Thanks for the optimization!

Thanks,
Jason
On Sun, Mar 2, 2025 at 1:17 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > When __inet_hash_connect() has to try many 4-tuples before
> > finding an available one, we see a high spinlock cost from
> > __inet_check_established() and/or __inet6_check_established().
> >
> > This patch adds an RCU lookup to avoid the spinlock
> > acquisition if the 4-tuple is found in the hash table.
> >
> > Note that there are still spin_lock_bh() calls in
> > __inet_hash_connect() to protect inet_bind_hashbucket,
> > this will be fixed in a future patch.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
>
> It can add a little overhead in the common case, because the new lookup
> only pays off when the 4-tuple is already present in the hash table, and
> I'm not sure how often that happens in practice. Since an extra RCU
> lookup is cheap, it still looks good to me; that overhead is the only
> thing I'm slightly worried about.
>
> As you said, it truly mitigates the heavy contention in the case
> mentioned earlier, where available ports are becoming scarce. We've seen
> that situation cause high CPU load before. Thanks for the optimization!

Addition of bhash2 in 6.1 added a major regression.

This is the reason I started to work on this stuff.
I will send the whole series later today, but I get a ~200% increase
in performance.
I will provide numbers in the cover letter.

neper/tcp_crr can be used to measure the gains.

Both server/client have 240 cores, 480 hyperthreads (Intel(R) Xeon(R) 6985P-C)

Server
ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog

Client
ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog -c -H server

Before this first patch:

utime_start=0.210641
utime_end=1.704755
stime_start=11.842697
stime_end=1997.341498
nvcsw_start=18518
nvcsw_end=18672
nivcsw_start=26
nivcsw_end=14828
num_transactions=615906
latency_min=0.051826868
latency_max=12.015396087
latency_mean=0.642949344
latency_stddev=1.860316922
num_samples=207534
correlation_coefficient=1.00
throughput=62524.04

After this patch:

utime_start=0.185656
utime_end=2.436602
stime_start=11.470889
stime_end=1980.679087
nvcsw_start=17327
nvcsw_end=17514
nivcsw_start=48
nivcsw_end=77724
num_transactions=821025
latency_min=0.025097789
latency_max=11.581610596
latency_mean=0.475903462
latency_stddev=1.597439931
num_samples=206556
time_end=173.321207377
correlation_coefficient=1.00
throughput=84387.19
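(For scale: the throughput lines above go from 62524.04 to 84387.19 transactions per second, i.e. (84387.19 - 62524.04) / 62524.04 ≈ 0.35, roughly a 35% gain from this first patch alone; the ~200% figure mentioned above refers to the whole series.)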
On Sun, Mar 2, 2025 at 3:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sun, Mar 2, 2025 at 1:17 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > When __inet_hash_connect() has to try many 4-tuples before
> > > finding an available one, we see a high spinlock cost from
> > > __inet_check_established() and/or __inet6_check_established().
> > >
> > > This patch adds an RCU lookup to avoid the spinlock
> > > acquisition if the 4-tuple is found in the hash table.
> > >
> > > Note that there are still spin_lock_bh() calls in
> > > __inet_hash_connect() to protect inet_bind_hashbucket,
> > > this will be fixed in a future patch.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> >
> > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> >
> > It can add a little overhead in the common case, because the new lookup
> > only pays off when the 4-tuple is already present in the hash table, and
> > I'm not sure how often that happens in practice. Since an extra RCU
> > lookup is cheap, it still looks good to me; that overhead is the only
> > thing I'm slightly worried about.
> >
> > As you said, it truly mitigates the heavy contention in the case
> > mentioned earlier, where available ports are becoming scarce. We've seen
> > that situation cause high CPU load before. Thanks for the optimization!
>
> Addition of bhash2 in 6.1 added a major regression.
>
> This is the reason I started to work on this stuff.
> I will send the whole series later today, but I get a ~200% increase
> in performance.
> I will provide numbers in the cover letter.
>
> neper/tcp_crr can be used to measure the gains.
>
> Both server/client have 240 cores, 480 hyperthreads (Intel(R) Xeon(R) 6985P-C)
>
> Server
> ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog
>
> Client
> ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog -c -H server
>
> Before this first patch:
>
> utime_start=0.210641
> utime_end=1.704755
> stime_start=11.842697
> stime_end=1997.341498
> nvcsw_start=18518
> nvcsw_end=18672
> nivcsw_start=26
> nivcsw_end=14828
> num_transactions=615906
> latency_min=0.051826868
> latency_max=12.015396087
> latency_mean=0.642949344
> latency_stddev=1.860316922
> num_samples=207534
> correlation_coefficient=1.00
> throughput=62524.04
>
> After this patch:
>
> utime_start=0.185656
> utime_end=2.436602
> stime_start=11.470889
> stime_end=1980.679087
> nvcsw_start=17327
> nvcsw_end=17514
> nivcsw_start=48
> nivcsw_end=77724
> num_transactions=821025
> latency_min=0.025097789
> latency_max=11.581610596
> latency_mean=0.475903462
> latency_stddev=1.597439931
> num_samples=206556
> time_end=173.321207377
> correlation_coefficient=1.00
> throughput=84387.19

Amazing! Thanks for the work, and looking forward to the series!

Thanks,
Jason
On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@google.com> wrote:
>
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition if the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket,
> this will be fixed in a future patch.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

After Eric reminded me, I ran a similar test and saw a 7% performance
increase. During a single test run there were 1225112852 early returns
of -EADDRNOTAVAIL from the RCU section newly added in this patch, which
means 1225112852 unnecessary spinlock acquisitions were avoided. It's
really remarkable! So,

Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thank you.
```diff
diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e18275fb50fea8d77adc8a64fb12494..46d39aa2199ec3a405b50e8e85130e990d2c26b7 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -551,11 +551,24 @@ static int __inet_check_established(struct inet_timewait_death_row *death_row,
 	unsigned int hash = inet_ehashfn(net, daddr, lport,
 					 saddr, inet->inet_dport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
-	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
-	struct sock *sk2;
-	const struct hlist_nulls_node *node;
 	struct inet_timewait_sock *tw = NULL;
+	const struct hlist_nulls_node *node;
+	struct sock *sk2;
+	spinlock_t *lock;
+
+	rcu_read_lock();
+	sk_nulls_for_each(sk2, node, &head->chain) {
+		if (sk2->sk_hash != hash ||
+		    !inet_match(net, sk2, acookie, ports, dif, sdif))
+			continue;
+		if (sk2->sk_state == TCP_TIME_WAIT)
+			break;
+		rcu_read_unlock();
+		return -EADDRNOTAVAIL;
+	}
+	rcu_read_unlock();
 
+	lock = inet_ehash_lockp(hinfo, hash);
 	spin_lock(lock);
 
 	sk_nulls_for_each(sk2, node, &head->chain) {
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 9ec05e354baa69d14e88da37f5a9fce11e874e35..3604a5cae5d29a25d24f9513308334ff8e64b083 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -276,11 +276,24 @@ static int __inet6_check_established(struct inet_timewait_death_row *death_row,
 	const unsigned int hash = inet6_ehashfn(net, daddr, lport, saddr,
 						inet->inet_dport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
-	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
-	struct sock *sk2;
-	const struct hlist_nulls_node *node;
 	struct inet_timewait_sock *tw = NULL;
+	const struct hlist_nulls_node *node;
+	struct sock *sk2;
+	spinlock_t *lock;
+
+	rcu_read_lock();
+	sk_nulls_for_each(sk2, node, &head->chain) {
+		if (sk2->sk_hash != hash ||
+		    !inet6_match(net, sk2, saddr, daddr, ports, dif, sdif))
+			continue;
+		if (sk2->sk_state == TCP_TIME_WAIT)
+			break;
+		rcu_read_unlock();
+		return -EADDRNOTAVAIL;
+	}
+	rcu_read_unlock();
 
+	lock = inet_ehash_lockp(hinfo, hash);
 	spin_lock(lock);
 
 	sk_nulls_for_each(sk2, node, &head->chain) {
```
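One detail worth calling out in the hunks above: the new RCU pass returns -EADDRNOTAVAIL only when it finds an exact match that is not in TCP_TIME_WAIT. A matching TIME_WAIT socket instead breaks out to the existing locked path, because a timewait entry may still be reusable for the new connection, and that decision (twsk_unique()) keeps being made under the ehash bucket lock, exactly as before the patch.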
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
__inet_check_established() and/or __inet6_check_established().

This patch adds an RCU lookup to avoid the spinlock
acquisition if the 4-tuple is found in the hash table.

Note that there are still spin_lock_bh() calls in
__inet_hash_connect() to protect inet_bind_hashbucket,
this will be fixed in a future patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_hashtables.c  | 19 ++++++++++++++++---
 net/ipv6/inet6_hashtables.c | 19 ++++++++++++++++---
 2 files changed, 32 insertions(+), 6 deletions(-)
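To illustrate the locking pattern outside the kernel, here is a minimal userspace sketch of the same idea: scan the bucket locklessly first, and only take the bucket lock (and re-check) when the fast path finds nothing. It is an analogy, not the kernel code: a pthread mutex stands in for the ehash bucket spinlock, C11 acquire/release atomics stand in for RCU, the 4-tuple is reduced to a single integer key, and entries are never freed, which is what makes the lockless walk safe here (the kernel needs RCU and nulls-marked lists precisely because sockets can be freed and reused while another CPU is still walking the chain).

```c
#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>
#include <stdlib.h>

#define NBUCKETS 256

struct node {
	unsigned int key;		/* stand-in for the 4-tuple */
	_Atomic(struct node *) next;
};

static struct {
	_Atomic(struct node *) head;
	pthread_mutex_t lock;
} table[NBUCKETS];

static unsigned int hash_key(unsigned int key)
{
	return (key * 2654435761u) % NBUCKETS;
}

/* Lockless walk, analogous to the new RCU pre-check: safe here only
 * because nodes are published with a release store and never freed. */
static int bucket_contains(unsigned int b, unsigned int key)
{
	struct node *n = atomic_load_explicit(&table[b].head, memory_order_acquire);

	for (; n; n = atomic_load_explicit(&n->next, memory_order_acquire))
		if (n->key == key)
			return 1;
	return 0;
}

/* Insert @key if it is unique; return -EADDRNOTAVAIL if it already exists. */
static int check_established(unsigned int key)
{
	unsigned int b = hash_key(key);
	struct node *n;

	/* Fast path: the lock is never touched when the key is taken. */
	if (bucket_contains(b, key))
		return -EADDRNOTAVAIL;

	/* Slow path: lock, then re-check, since another thread may have
	 * inserted the same key between the two scans. */
	pthread_mutex_lock(&table[b].lock);
	if (bucket_contains(b, key)) {
		pthread_mutex_unlock(&table[b].lock);
		return -EADDRNOTAVAIL;
	}
	n = malloc(sizeof(*n));
	if (!n) {
		pthread_mutex_unlock(&table[b].lock);
		return -ENOMEM;
	}
	n->key = key;
	atomic_store_explicit(&n->next,
			      atomic_load_explicit(&table[b].head, memory_order_relaxed),
			      memory_order_relaxed);
	/* Publish the fully initialized node so lockless readers see it. */
	atomic_store_explicit(&table[b].head, n, memory_order_release);
	pthread_mutex_unlock(&table[b].lock);
	return 0;
}

int main(void)
{
	for (unsigned int i = 0; i < NBUCKETS; i++)
		pthread_mutex_init(&table[i].lock, NULL);

	printf("first insert:  %d\n", check_established(12345));	/* 0 */
	printf("second insert: %d\n", check_established(12345));	/* -EADDRNOTAVAIL */
	return 0;
}
```

Compile with something like `cc -pthread sketch.c` (the file name is just an example). The property mirrored from the patch is that the bucket lock is only acquired when the lockless scan comes up empty, and the scan is then repeated under the lock because the lockless result can be stale by the time the lock is held.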