
[net-next] net: Adjust sk_gso_max_size once when set

Message ID 20220125024511.27480-1-dsahern@kernel.org (mailing list archive)
State Accepted
Commit ab14f1802cfb2d7ca120bbf48e3ba6712314ffc3
Delegated to: Netdev Maintainers
Series [net-next] net: Adjust sk_gso_max_size once when set

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 6 this patch: 6
netdev/cc_maintainers warning 3 maintainers not CCed: yoshfuji@linux-ipv6.org davem@davemloft.net kuba@kernel.org
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 11 this patch: 11
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 24 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

David Ahern Jan. 25, 2022, 2:45 a.m. UTC
sk_gso_max_size is set based on the dst dev. Both users of it
adjust the value by the same offset - (MAX_TCP_HEADER + 1). Rather
than compute the same adjusted value on each call do the adjustment
once when set.

Signed-off-by: David Ahern <dsahern@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
---
 net/core/sock.c       | 1 +
 net/ipv4/tcp.c        | 3 +--
 net/ipv4/tcp_output.c | 2 +-
 3 files changed, 3 insertions(+), 3 deletions(-)

Comments

Eric Dumazet Jan. 25, 2022, 4:46 p.m. UTC | #1
On Mon, Jan 24, 2022 at 6:45 PM David Ahern <dsahern@kernel.org> wrote:
>
> sk_gso_max_size is set based on the dst dev. Both users of it
> adjust the value by the same offset - (MAX_TCP_HEADER + 1). Rather
> than compute the same adjusted value on each call do the adjustment
> once when set.
>
> Signed-off-by: David Ahern <dsahern@kernel.org>
> Cc: Eric Dumazet <edumazet@google.com>


SGTM, thanks.

Reviewed-by: Eric Dumazet <edumazet@google.com>
David Ahern Jan. 25, 2022, 5:16 p.m. UTC | #2
On 1/25/22 9:46 AM, Eric Dumazet wrote:
> On Mon, Jan 24, 2022 at 6:45 PM David Ahern <dsahern@kernel.org> wrote:
>>
>> sk_gso_max_size is set based on the dst dev. Both users of it
>> adjust the value by the same offset - (MAX_TCP_HEADER + 1). Rather
>> than compute the same adjusted value on each call do the adjustment
>> once when set.
>>
>> Signed-off-by: David Ahern <dsahern@kernel.org>
>> Cc: Eric Dumazet <edumazet@google.com>
> 
> 
> SGTM, thanks.
> 
> Reviewed-by: Eric Dumazet <edumazet@google.com>

The git history does not explain why MAX_TCP_HEADER is used to lower
sk_gso_max_size. Do you recall the history on it?
Eric Dumazet Jan. 25, 2022, 5:20 p.m. UTC | #3
On Tue, Jan 25, 2022 at 9:16 AM David Ahern <dsahern@gmail.com> wrote:
>
> On 1/25/22 9:46 AM, Eric Dumazet wrote:
> > On Mon, Jan 24, 2022 at 6:45 PM David Ahern <dsahern@kernel.org> wrote:
> >>
> >> sk_gso_max_size is set based on the dst dev. Both users of it
> >> adjust the value by the same offset - (MAX_TCP_HEADER + 1). Rather
> >> than compute the same adjusted value on each call do the adjustment
> >> once when set.
> >>
> >> Signed-off-by: David Ahern <dsahern@kernel.org>
> >> Cc: Eric Dumazet <edumazet@google.com>
> >
> >
> > SGTM, thanks.
> >
> > Reviewed-by: Eric Dumazet <edumazet@google.com>
>
> The git history does not explain why MAX_TCP_HEADER is used to lower
> sk_gso_max_size. Do you recall the history on it?

Simply that max IP datagram size is 64K

And TCP is sizing its payload size there (eg in  tcp_tso_autosize()),
when skb only contains payload.

Headers are added later in various xmit layers.

MAX_TCP_HEADER is chosen to avoid re-allocs of skb->head in typical workload.
patchwork-bot+netdevbpf@kernel.org Jan. 25, 2022, 11:40 p.m. UTC | #4
Hello:

This patch was applied to netdev/net-next.git (master)
by Jakub Kicinski <kuba@kernel.org>:

On Mon, 24 Jan 2022 19:45:11 -0700 you wrote:
> sk_gso_max_size is set based on the dst dev. Both users of it
> adjust the value by the same offset - (MAX_TCP_HEADER + 1). Rather
> than compute the same adjusted value on each call do the adjustment
> once when set.
> 
> Signed-off-by: David Ahern <dsahern@kernel.org>
> Cc: Eric Dumazet <edumazet@google.com>
> 
> [...]

Here is the summary with links:
  - [net-next] net: Adjust sk_gso_max_size once when set
    https://git.kernel.org/netdev/net-next/c/ab14f1802cfb

You are awesome, thank you!
David Ahern Jan. 25, 2022, 11:49 p.m. UTC | #5
On 1/25/22 10:20 AM, Eric Dumazet wrote:
>> The git history does not explain why MAX_TCP_HEADER is used to lower
>> sk_gso_max_size. Do you recall the history on it?
> 
> Simply that max IP datagram size is 64K
> 
> And TCP is sizing its payload size there (eg in  tcp_tso_autosize()),
> when skb only contains payload.
> 
> Headers are added later in various xmit layers.
> 
> MAX_TCP_HEADER is chosen to avoid re-allocs of skb->head in typical workload.

From what I can tell skb->head is allocated based on MAX_TCP_HEADER, and
payload is added as frags for TSO.

I was just curious because I noticed a few MTUs (I only looked at
multiples of 100 from 1500 to 9000) can get an extra segment in a TSO
packet and stay under the 64kB limit if that offset had better
information about the actual header size needed (if any beyond network
+ tcp).
Eric Dumazet Jan. 26, 2022, 12:27 a.m. UTC | #6
On Tue, Jan 25, 2022 at 3:49 PM David Ahern <dsahern@gmail.com> wrote:
>
> On 1/25/22 10:20 AM, Eric Dumazet wrote:
> >> The git history does not explain why MAX_TCP_HEADER is used to lower
> >> sk_gso_max_size. Do you recall the history on it?
> >
> > Simply that max IP datagram size is 64K
> >
> > And TCP is sizing its payload size there (eg in  tcp_tso_autosize()),
> > when skb only contains payload.
> >
> > Headers are added later in various xmit layers.
> >
> > MAX_TCP_HEADER is chosen to avoid re-allocs of skb->head in typical workload.
>
> From what I can tell skb->head is allocated based on MAX_TCP_HEADER, and
> payload is added as frags for TSO.

Sure, but in the end, the IP packet length field is 16 bits wide, so
sizeof(network + tcp headers) + tcp_payload <= 65535

-> tcp_payload <= 65535 - sizeof(headers)

-> tcp_payload_max_per_skb = 65536 - (MAX_TCP_HEADER + 1)

(This would not include Ethernet header)

>
> I was just curious because I noticed a few MTUs (I only looked at
> multiples of 100 from 1500 to 9000) can get an extra segment in a TSO
> packet and stay under the 64kB limit if that offset had better
> information about the actual header size needed (if any beyond network
> + tcp).

TCP does not care about the extra sub-mss bytes that _could_ be added
to a TSO packet

So if I have 4K MTU (4096 bytes of payload), max TSO size would be 15*4k = 60K

Application writing 60*1024+100 bytes in one sendmsg() would send one
TSO packet of 15 segments, plus one extra tiny skb with 100 bytes of
payload.

I have played in the past trying to cover this case, but adding tests
in the fast path gave no noticeable difference for common workloads.

Patch

diff --git a/net/core/sock.c b/net/core/sock.c
index e21485ab285d..114a6e220ba9 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -2261,6 +2261,7 @@  void sk_setup_caps(struct sock *sk, struct dst_entry *dst)
 			sk->sk_route_caps |= NETIF_F_SG | NETIF_F_HW_CSUM;
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_size() */
 			sk->sk_gso_max_size = READ_ONCE(dst->dev->gso_max_size);
+			sk->sk_gso_max_size -= (MAX_TCP_HEADER + 1);
 			/* pairs with the WRITE_ONCE() in netif_set_gso_max_segs() */
 			max_segs = max_t(u32, READ_ONCE(dst->dev->gso_max_segs), 1);
 		}
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 3b75836db19b..1afa3f2f9a6d 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -893,8 +893,7 @@  static unsigned int tcp_xmit_size_goal(struct sock *sk, u32 mss_now,
 		return mss_now;
 
 	/* Note : tcp_tso_autosize() will eventually split this later */
-	new_size_goal = sk->sk_gso_max_size - 1 - MAX_TCP_HEADER;
-	new_size_goal = tcp_bound_to_half_wnd(tp, new_size_goal);
+	new_size_goal = tcp_bound_to_half_wnd(tp, sk->sk_gso_max_size);
 
 	/* We try hard to avoid divides here */
 	size_goal = tp->gso_segs * mss_now;
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 5079832af5c1..11c06b9db801 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -1960,7 +1960,7 @@  static u32 tcp_tso_autosize(const struct sock *sk, unsigned int mss_now,
 
 	bytes = min_t(unsigned long,
 		      sk->sk_pacing_rate >> READ_ONCE(sk->sk_pacing_shift),
-		      sk->sk_gso_max_size - 1 - MAX_TCP_HEADER);
+		      sk->sk_gso_max_size);
 
 	/* Goal is to send at least one packet per ms,
 	 * not one big TSO packet every 100 ms.