
[net-next] tcp: use RCU in __inet{6}_check_established()

Message ID 20250301194624.1879919-1-edumazet@google.com (mailing list archive)
State New
Delegated to: Netdev Maintainers
Series: [net-next] tcp: use RCU in __inet{6}_check_established()

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 0 this patch: 0
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 0 this patch: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 0 this patch: 0
netdev/checkpatch warning CHECK: spinlock_t definition without comment
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2025-03-02--15-00 (tests: 893)

Commit Message

Eric Dumazet March 1, 2025, 7:46 p.m. UTC
When __inet_hash_connect() has to try many 4-tuples before
finding an available one, we see a high spinlock cost from
__inet_check_established() and/or __inet6_check_established().

This patch adds an RCU lookup to avoid the spinlock
acquisition if the 4-tuple is found in the hash table.

Note that there are still spin_lock_bh() calls in
__inet_hash_connect() to protect inet_bind_hashbucket;
this will be fixed in a future patch.

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 net/ipv4/inet_hashtables.c  | 19 ++++++++++++++++---
 net/ipv6/inet6_hashtables.c | 19 ++++++++++++++++---
 2 files changed, 32 insertions(+), 6 deletions(-)
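
The change follows the classic "lockless check, then lock and re-check"
idiom. Below is a minimal userspace sketch of that idiom in plain C11,
with atomics plus a pthread mutex standing in for RCU and the ehash
spinlock; every name in it is made up for illustration and it
deliberately ignores hash collisions.

#include <errno.h>
#include <pthread.h>
#include <stdatomic.h>
#include <stdio.h>

#define NSLOTS 1024

static _Atomic unsigned int table[NSLOTS];	/* 0 means "slot free" */
static pthread_mutex_t table_lock = PTHREAD_MUTEX_INITIALIZER;

/* Try to claim @key (a stand-in for the 4-tuple hash). */
static int claim_tuple(unsigned int key)
{
	unsigned int slot = key % NSLOTS;

	/*
	 * Lockless pre-check: if the tuple is already in use, fail fast
	 * without touching the lock. Under port exhaustion this is the
	 * overwhelmingly common outcome.
	 */
	if (atomic_load_explicit(&table[slot], memory_order_acquire) == key)
		return -EADDRNOTAVAIL;

	/*
	 * Slow path: the pre-check can race with a concurrent insert, so
	 * re-check under the lock before committing, just like the second
	 * sk_nulls_for_each() walk in the patch below.
	 */
	pthread_mutex_lock(&table_lock);
	if (table[slot] == key) {
		pthread_mutex_unlock(&table_lock);
		return -EADDRNOTAVAIL;
	}
	atomic_store_explicit(&table[slot], key, memory_order_release);
	pthread_mutex_unlock(&table_lock);
	return 0;
}

int main(void)
{
	printf("first claim:  %d\n", claim_tuple(42));	/* 0 */
	printf("second claim: %d\n", claim_tuple(42));	/* -EADDRNOTAVAIL, lock never taken */
	return 0;
}

As in the patch, the unlocked check may miss or race with a concurrent
insert, so the locked walk repeats the check before committing; a miss in
the fast path only costs a lock acquisition, never correctness. That is
also why the RCU walk in the diff simply breaks out and falls through to
the locked path when it finds a TCP_TIME_WAIT entry, leaving the timewait
handling where it was.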

Comments

Jason Xing March 2, 2025, 12:16 a.m. UTC | #1
On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@google.com> wrote:
>
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition if the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket,
> this will be fixed in a future patch.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

This can add a bit of overhead in the common case, because the new
lookup only pays off when the 4-tuple is already present in the hash
table, and I'm not sure how likely that case is in practice. Since an
extra RCU lookup is cheap, it looks good to me; that overhead is the
only thing I'm slightly worried about.

As you said, it really mitigates the heavy contention in the case
mentioned earlier, where available ports are becoming scarce. We have
seen that situation cause high CPU load before. Thanks for the
optimization!

Thanks,
Jason
Eric Dumazet March 2, 2025, 7:11 a.m. UTC | #2
On Sun, Mar 2, 2025 at 1:17 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
>
> On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > When __inet_hash_connect() has to try many 4-tuples before
> > finding an available one, we see a high spinlock cost from
> > __inet_check_established() and/or __inet6_check_established().
> >
> > This patch adds an RCU lookup to avoid the spinlock
> > acquisition if the 4-tuple is found in the hash table.
> >
> > Note that there are still spin_lock_bh() calls in
> > __inet_hash_connect() to protect inet_bind_hashbucket,
> > this will be fixed in a future patch.
> >
> > Signed-off-by: Eric Dumazet <edumazet@google.com>
>
> Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
>
> It can introduce extra system overhead in most cases because it takes
> effect only when the socket is not unique in the hash table. I'm not
> sure what the probability of seeing this case is in reality in
> general. Considering performing a look-up seems not to consume much, I
> think it looks good to me. Well, it's the only one I'm a bit worried
> about.
>
> As you said, it truly mitigates the huge contention in the earlier
> mentioned case where the available port resources are becoming rare.
> We've encountered this situation causing high cpu load before. Thanks
> for the optimization!

The addition of bhash2 in 6.1 introduced a major regression.

This is the reason I started working on this. I will send the whole
series later today; I am seeing a ~200% increase in performance and
will provide numbers in the cover letter.

neper/tcp_crr can be used to measure the gains.

Both server/client have 240 cores, 480 hyperthreads (Intel(R) Xeon(R) 6985P-C)

Server
ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog

Client
ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog -c -H server
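
Going by neper's usual option meanings, -6 selects IPv6, -T200 runs 200
threads, -F20000 opens 20000 flows in total, -c/-H put tcp_crr in client
mode against the given server, and the ulimit bump is needed because each
flow holds a file descriptor.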

Before this first patch:

utime_start=0.210641
utime_end=1.704755
stime_start=11.842697
stime_end=1997.341498
nvcsw_start=18518
nvcsw_end=18672
nivcsw_start=26
nivcsw_end=14828
num_transactions=615906
latency_min=0.051826868
latency_max=12.015396087
latency_mean=0.642949344
latency_stddev=1.860316922
num_samples=207534
correlation_coefficient=1.00
throughput=62524.04

After this patch:

utime_start=0.185656
utime_end=2.436602
stime_start=11.470889
stime_end=1980.679087
nvcsw_start=17327
nvcsw_end=17514
nivcsw_start=48
nivcsw_end=77724
num_transactions=821025
latency_min=0.025097789
latency_max=11.581610596
latency_mean=0.475903462
latency_stddev=1.597439931
num_samples=206556
time_end=173.321207377
correlation_coefficient=1.00
throughput=84387.19
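
The throughput numbers above work out to 84387.19 / 62524.04 ≈ 1.35, i.e.
roughly a 35% gain from this single patch; the ~200% figure quoted earlier
is for the full series.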
Jason Xing March 2, 2025, 8:56 a.m. UTC | #3
On Sun, Mar 2, 2025 at 3:11 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sun, Mar 2, 2025 at 1:17 AM Jason Xing <kerneljasonxing@gmail.com> wrote:
> >
> > On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@google.com> wrote:
> > >
> > > When __inet_hash_connect() has to try many 4-tuples before
> > > finding an available one, we see a high spinlock cost from
> > > __inet_check_established() and/or __inet6_check_established().
> > >
> > > This patch adds an RCU lookup to avoid the spinlock
> > > acquisition if the 4-tuple is found in the hash table.
> > >
> > > Note that there are still spin_lock_bh() calls in
> > > __inet_hash_connect() to protect inet_bind_hashbucket,
> > > this will be fixed in a future patch.
> > >
> > > Signed-off-by: Eric Dumazet <edumazet@google.com>
> >
> > Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>
> >
> > It can introduce extra system overhead in most cases because it takes
> > effect only when the socket is not unique in the hash table. I'm not
> > sure what the probability of seeing this case is in reality in
> > general. Considering performing a look-up seems not to consume much, I
> > think it looks good to me. Well, it's the only one I'm a bit worried
> > about.
> >
> > As you said, it truly mitigates the huge contention in the earlier
> > mentioned case where the available port resources are becoming rare.
> > We've encountered this situation causing high cpu load before. Thanks
> > for the optimization!
>
> Addition of bhash2 in 6.1 added a major regression.
>
> This is the reason I started to work on this stuff.
> I will send the whole series later today, but I get a ~200% increase
> in performance.
> I will provide numbers in the cover letter.
>
> neper/tcp_crr can be used to measure the gains.
>
> Both server/client have 240 cores, 480 hyperthreads (Intel(R) Xeon(R) 6985P-C)
>
> Server
> ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog
>
> Client
> ulimit -n 40000; neper/tcp_crr -6 -T200 -F20000 --nolog -c -H server
>
> Before this first patch:
>
> utime_start=0.210641
> utime_end=1.704755
> stime_start=11.842697
> stime_end=1997.341498
> nvcsw_start=18518
> nvcsw_end=18672
> nivcsw_start=26
> nivcsw_end=14828
> num_transactions=615906
> latency_min=0.051826868
> latency_max=12.015396087
> latency_mean=0.642949344
> latency_stddev=1.860316922
> num_samples=207534
> correlation_coefficient=1.00
> throughput=62524.04
>
> After this patch:
>
> utime_start=0.185656
> utime_end=2.436602
> stime_start=11.470889
> stime_end=1980.679087
> nvcsw_start=17327
> nvcsw_end=17514
> nivcsw_start=48
> nivcsw_end=77724
> num_transactions=821025
> latency_min=0.025097789
> latency_max=11.581610596
> latency_mean=0.475903462
> latency_stddev=1.597439931
> num_samples=206556
> time_end=173.321207377
> correlation_coefficient=1.00
> throughput=84387.19

Amazing! Thanks for the work and looking forward to your series then!

Thanks,
Jason
Jason Xing March 2, 2025, 9:48 a.m. UTC | #4
On Sun, Mar 2, 2025 at 3:46 AM Eric Dumazet <edumazet@google.com> wrote:
>
> When __inet_hash_connect() has to try many 4-tuples before
> finding an available one, we see a high spinlock cost from
> __inet_check_established() and/or __inet6_check_established().
>
> This patch adds an RCU lookup to avoid the spinlock
> acquisition if the 4-tuple is found in the hash table.
>
> Note that there are still spin_lock_bh() calls in
> __inet_hash_connect() to protect inet_bind_hashbucket,
> this will be fixed in a future patch.
>
> Signed-off-by: Eric Dumazet <edumazet@google.com>

After Eric's reminder, I ran a similar test and saw a 7% performance
increase. During a single test run there were 1225112852 early returns
with -EADDRNOTAVAIL from the RCU section newly added in this patch,
which means we avoided taking the spinlock 1225112852 times. It's
really remarkable!

So,
Tested-by: Jason Xing <kerneljasonxing@gmail.com>
Reviewed-by: Jason Xing <kerneljasonxing@gmail.com>

Thank you.

Patch

diff --git a/net/ipv4/inet_hashtables.c b/net/ipv4/inet_hashtables.c
index 9bfcfd016e18275fb50fea8d77adc8a64fb12494..46d39aa2199ec3a405b50e8e85130e990d2c26b7 100644
--- a/net/ipv4/inet_hashtables.c
+++ b/net/ipv4/inet_hashtables.c
@@ -551,11 +551,24 @@  static int __inet_check_established(struct inet_timewait_death_row *death_row,
 	unsigned int hash = inet_ehashfn(net, daddr, lport,
 					 saddr, inet->inet_dport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
-	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
-	struct sock *sk2;
-	const struct hlist_nulls_node *node;
 	struct inet_timewait_sock *tw = NULL;
+	const struct hlist_nulls_node *node;
+	struct sock *sk2;
+	spinlock_t *lock;
+
+	rcu_read_lock();
+	sk_nulls_for_each(sk2, node, &head->chain) {
+		if (sk2->sk_hash != hash ||
+		    !inet_match(net, sk2, acookie, ports, dif, sdif))
+			continue;
+		if (sk2->sk_state == TCP_TIME_WAIT)
+			break;
+		rcu_read_unlock();
+		return -EADDRNOTAVAIL;
+	}
+	rcu_read_unlock();
 
+	lock = inet_ehash_lockp(hinfo, hash);
 	spin_lock(lock);
 
 	sk_nulls_for_each(sk2, node, &head->chain) {
diff --git a/net/ipv6/inet6_hashtables.c b/net/ipv6/inet6_hashtables.c
index 9ec05e354baa69d14e88da37f5a9fce11e874e35..3604a5cae5d29a25d24f9513308334ff8e64b083 100644
--- a/net/ipv6/inet6_hashtables.c
+++ b/net/ipv6/inet6_hashtables.c
@@ -276,11 +276,24 @@  static int __inet6_check_established(struct inet_timewait_death_row *death_row,
 	const unsigned int hash = inet6_ehashfn(net, daddr, lport, saddr,
 						inet->inet_dport);
 	struct inet_ehash_bucket *head = inet_ehash_bucket(hinfo, hash);
-	spinlock_t *lock = inet_ehash_lockp(hinfo, hash);
-	struct sock *sk2;
-	const struct hlist_nulls_node *node;
 	struct inet_timewait_sock *tw = NULL;
+	const struct hlist_nulls_node *node;
+	struct sock *sk2;
+	spinlock_t *lock;
+
+	rcu_read_lock();
+	sk_nulls_for_each(sk2, node, &head->chain) {
+		if (sk2->sk_hash != hash ||
+		    !inet6_match(net, sk2, saddr, daddr, ports, dif, sdif))
+			continue;
+		if (sk2->sk_state == TCP_TIME_WAIT)
+			break;
+		rcu_read_unlock();
+		return -EADDRNOTAVAIL;
+	}
+	rcu_read_unlock();
 
+	lock = inet_ehash_lockp(hinfo, hash);
 	spin_lock(lock);
 
 	sk_nulls_for_each(sk2, node, &head->chain) {