[net-next,4/4] net: introduce a config option to tweak MAX_SKB_FRAGS

Message ID: 20220210175557.1843151-5-eric.dumazet@gmail.com
State: Changes Requested
Delegated to: Netdev Maintainers
Series: net: make MAX_SKB_FRAGS configurable

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 6020 this patch: 6020
netdev/cc_maintainers warning 1 maintainers not CCed: keescook@chromium.org
netdev/build_clang success Errors and warnings before: 882 this patch: 882
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 6173 this patch: 6173
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 38 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Eric Dumazet Feb. 10, 2022, 5:55 p.m. UTC
From: Eric Dumazet <edumazet@google.com>

Currently, the MAX_SKB_FRAGS value is 17.

For standard tcp sendmsg() traffic, this is no big deal, because
tcp_sendmsg() attempts order-3 allocations, stuffing 32768 bytes per frag.

But with zero copy, we use order-0 pages.
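
To make this concrete, here is a back-of-the-envelope sketch (my own
numbers, not part of the patch; it assumes 4KB pages):

/* Capacity of one skb at MAX_SKB_FRAGS=17, copy path vs zero copy.
 * The copy path packs up to order-3 (32KB) allocations into each frag,
 * while zero copy pins user memory one order-0 (4KB) page per frag.
 */
#include <stdio.h>

int main(void)
{
        const unsigned int max_frags = 17;
        const unsigned int order3_bytes = 32768, order0_bytes = 4096;

        printf("copy path: %u bytes\n", max_frags * order3_bytes); /* ~544KB */
        printf("zero copy: %u bytes\n", max_frags * order0_bytes); /*  ~68KB */
        return 0;
}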

For BIG TCP to show its full potential, we add a config option
to be able to fit up to 45 fragments per skb.

This is also needed for BIG TCP rx zerocopy, as zerocopy currently
does not support skbs with a frag_list.

We used the MAX_SKB_FRAGS=45 value for years at Google, before
we deployed 4K MTU, with no adverse effect.

Back then, the goal was to be able to receive full-size (64KB) GRO
packets without the frag_list overhead.

By default, we keep the old/legacy value of 17 until we get
more coverage for the updated values.

Sizes of struct skb_shared_info on 64-bit arches (in bytes):

MAX_SKB_FRAGS | sizeof(struct skb_shared_info)
----------------------------------------------
         17     320
         21     320+64  = 384
         25     320+128 = 448
         29     320+192 = 512
         33     320+256 = 576
         37     320+320 = 640
         41     320+384 = 704
         45     320+448 = 768
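
(The 64-byte steps follow from skb_frag_t being 16 bytes on 64-bit; a
quick userspace sketch reproducing the table from the base values above:)

#include <stdio.h>

int main(void)
{
        const unsigned int base_frags = 17, base_size = 320; /* row 17 above */
        const unsigned int frag_size = 16; /* sizeof(skb_frag_t) on 64-bit */
        unsigned int frags;

        for (frags = 17; frags <= 45; frags += 4)
                printf("%2u -> %u\n",
                       frags, base_size + (frags - base_frags) * frag_size);
        return 0;
}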

This inflation might cause problems for drivers that assume they can pack
both the incoming packet and skb_shared_info in half a page, using build_skb().
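
As an illustration, a sketch of the half-page check such a driver would
rely on (rx_buf_fits_half_page() and rx_buf_len are hypothetical, not
from this patch; SKB_DATA_ALIGN() and build_skb() are the real kernel
helpers):

#include <linux/skbuff.h>

/* Many drivers size their rx buffer so that packet data plus the
 * trailing skb_shared_info fit in the half page passed to build_skb().
 * With MAX_SKB_FRAGS=45, skb_shared_info alone takes 768 bytes,
 * leaving that much less room for packet data.
 */
static bool rx_buf_fits_half_page(unsigned int rx_buf_len)
{
        unsigned int truesize = SKB_DATA_ALIGN(rx_buf_len) +
                                SKB_DATA_ALIGN(sizeof(struct skb_shared_info));

        return truesize <= PAGE_SIZE / 2;
}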

Signed-off-by: Eric Dumazet <edumazet@google.com>
---
 include/linux/skbuff.h | 14 ++------------
 net/Kconfig            | 12 ++++++++++++
 2 files changed, 14 insertions(+), 12 deletions(-)

Comments

Jakub Kicinski Feb. 11, 2022, 10:16 p.m. UTC | #1
On Thu, 10 Feb 2022 09:55:57 -0800 Eric Dumazet wrote:
> From: Eric Dumazet <edumazet@google.com>
> 
> Currently, the MAX_SKB_FRAGS value is 17.
> 
> For standard tcp sendmsg() traffic, this is no big deal, because
> tcp_sendmsg() attempts order-3 allocations, stuffing 32768 bytes per frag.
> 
> But with zero copy, we use order-0 pages.

If I read this right, without this patch BIG TCP works except for zc cases,
but there's little point in applying this patch without BIG TCP.

Shouldn't the BIG TCP work go in first, and then we'll worry about how
many frags each skb can carry?
Eric Dumazet Feb. 12, 2022, 2:41 a.m. UTC | #2
On Fri, Feb 11, 2022 at 2:16 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 10 Feb 2022 09:55:57 -0800 Eric Dumazet wrote:
> > From: Eric Dumazet <edumazet@google.com>
> >
> > Currently, the MAX_SKB_FRAGS value is 17.
> >
> > For standard tcp sendmsg() traffic, this is no big deal, because
> > tcp_sendmsg() attempts order-3 allocations, stuffing 32768 bytes per frag.
> >
> > But with zero copy, we use order-0 pages.
>
> If I read this right, without this patch BIG TCP works except for zc cases,
> but there's little point in applying this patch without BIG TCP.
>
> Shouldn't the BIG TCP work go in first, and then we'll worry about how
> many frags each skb can carry?

This is orthogonal, really.

My guess is that most people do not use TCP RX zerocopy, apart from Google?
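
(For context, a minimal userspace sketch of the TCP RX zerocopy API under
discussion; error paths and the copybuf fallback are omitted, sk is
assumed to be a connected TCP socket, and chunk a multiple of the page
size:)

#include <linux/tcp.h>
#include <netinet/in.h>
#include <string.h>
#include <sys/mman.h>
#include <sys/socket.h>
#include <sys/types.h>

static ssize_t rx_zerocopy_once(int sk, size_t chunk)
{
        struct tcp_zerocopy_receive zc;
        socklen_t zc_len = sizeof(zc);
        void *map = mmap(NULL, chunk, PROT_READ, MAP_SHARED, sk, 0);

        if (map == MAP_FAILED)
                return -1;

        memset(&zc, 0, sizeof(zc));
        zc.address = (__u64)(unsigned long)map;
        zc.length = chunk;

        /* The kernel remaps the page frags of queued skbs into map.
         * skbs carrying a frag_list cannot be mapped this way, hence
         * the interest in more frags per skb for BIG TCP rx zerocopy.
         */
        if (getsockopt(sk, IPPROTO_TCP, TCP_ZEROCOPY_RECEIVE,
                       &zc, &zc_len) != 0) {
                munmap(map, chunk);
                return -1;
        }

        /* ... consume zc.length bytes at map ... */
        munmap(map, chunk);
        return zc.length;
}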

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index a5adbf6b51e86f955b7f4fcd4a65e38adce97601..6bba71532415019d33cd98e172b5469fa7a5c1bd 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -349,18 +349,8 @@ enum skb_drop_reason {
 	SKB_DROP_REASON_MAX,
 };
 
-/* To allow 64K frame to be packed as single skb without frag_list we
- * require 64K/PAGE_SIZE pages plus 1 additional page to allow for
- * buffers which do not start on a page boundary.
- *
- * Since GRO uses frags we allocate at least 16 regardless of page
- * size.
- */
-#if (65536/PAGE_SIZE + 1) < 16
-#define MAX_SKB_FRAGS 16UL
-#else
-#define MAX_SKB_FRAGS (65536/PAGE_SIZE + 1)
-#endif
+#define MAX_SKB_FRAGS CONFIG_MAX_SKB_FRAGS
+
 extern int sysctl_max_skb_frags;
 
 /* Set skb_shinfo(skb)->gso_size to this in case you want skb_segment to
diff --git a/net/Kconfig b/net/Kconfig
index 8a1f9d0287de3c32040eee03b60114c6e6d150bc..7b96047911ee78bf61e9a290ad430261e4fc91c8 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -253,6 +253,18 @@ config PCPU_DEV_REFCNT
 	  network device refcount are using per cpu variables if this option is set.
 	  This can be forced to N to detect underflows (with a performance drop).
 
+config MAX_SKB_FRAGS
+	int "Maximum number of fragments per skb_shared_info"
+	range 17 45
+	default 17
+	help
+	  Having more fragments per skb_shared_info can help GRO efficiency.
+	  This helps BIG TCP workloads, but might expose bugs in some
+	  legacy drivers.
+	  This also increases memory overhead of small packets,
+	  and in drivers using build_skb().
+	  If unsure, say 17.
+
 config RPS
 	bool
 	depends on SMP && SYSFS
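
For completeness, opting in is a one-line kernel config change, e.g.
picking the upper bound of the range above:

CONFIG_MAX_SKB_FRAGS=45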