
[v2,net-next,4/4] net: add dedicated kmem_cache for typical/small skb->head

Message ID 20230206173103.2617121-5-edumazet@google.com (mailing list archive)
State Accepted
Commit bf9f1baa279f0758dc2297080360c5a616843927
Delegated to: Netdev Maintainers
Series net: core: use a dedicated kmem_cache for skb head allocs

Checks

Context                        Check    Description
netdev/tree_selection          success  Clearly marked for net-next
netdev/fixes_present           success  Fixes tag not required for -next series
netdev/subject_prefix          success  Link
netdev/cover_letter            success  Series has a cover letter
netdev/patch_count             success  Link
netdev/header_inline           success  No static functions without inline keyword in header files
netdev/build_32bit             success  Errors and warnings before: 2 this patch: 2
netdev/cc_maintainers          success  CCed 5 of 5 maintainers
netdev/build_clang             success  Errors and warnings before: 1 this patch: 1
netdev/module_param            success  Was 0 now: 0
netdev/verify_signedoff        success  Signed-off-by tag matches author and committer
netdev/check_selftest          success  No net selftest shell script
netdev/verify_fixes            success  No Fixes tag
netdev/build_allmodconfig_warn success  Errors and warnings before: 2 this patch: 2
netdev/checkpatch              warning  CHECK: Alignment should match open parenthesis; CHECK: Blank lines aren't necessary after an open brace '{'
netdev/kdoc                    success  Errors and warnings before: 0 this patch: 0
netdev/source_inline           success  Was 0 now: 0

Commit Message

Eric Dumazet Feb. 6, 2023, 5:31 p.m. UTC
Recent removal of ksize() in alloc_skb() increased
performance because we no longer read
the associated struct page.

We have an equivalent cost at kfree_skb() time.

kfree(skb->head) has to access a struct page,
often cold in cpu caches, to get the owning
struct kmem_cache.

Considering that many allocations are small (at least for TCP ones),
we can have our own kmem_cache to avoid the cache line miss.

This also saves memory because these small heads
are no longer padded to 1024 bytes.

CONFIG_SLUB=y
$ grep skbuff_small_head /proc/slabinfo
skbuff_small_head   2907   2907    640   51    8 : tunables    0    0    0 : slabdata     57     57      0

CONFIG_SLAB=y
$ grep skbuff_small_head /proc/slabinfo
skbuff_small_head    607    624    640    6    1 : tunables   54   27    8 : slabdata    104    104      5
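
The 640-byte object size in the slabinfo output above, and the 1024-byte padding mentioned earlier, can be reproduced with a little arithmetic. The following is only a userspace sketch: the helper names are hypothetical, and the constants (64-byte cache lines, a 320-byte aligned struct skb_shared_info, MAX_TCP_HEADER equal to 320) are assumptions for a typical x86_64 build, not values taken from the patch.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed values for a typical x86_64 build; they are not stated in the patch. */
#define L1_BYTES     64u   /* assumed L1_CACHE_BYTES / SMP_CACHE_BYTES */
#define SHINFO_BYTES 320u  /* assumed SKB_DATA_ALIGN(sizeof(struct skb_shared_info)) */

static inline size_t align_up(size_t n, size_t a)
{
	return (n + a - 1) / a * a;
}

static inline bool is_pow2(size_t n)
{
	return n != 0 && (n & (n - 1)) == 0;
}

/* Models SKB_HEAD_ALIGN(): cache-aligned data area plus aligned shared info. */
static inline size_t skb_head_align(size_t data_len)
{
	return align_up(data_len, L1_BYTES) + SHINFO_BYTES;
}

/* Models SKB_SMALL_HEAD_CACHE_SIZE: add one cache line if the size is a power of two. */
static inline size_t small_head_cache_size(size_t small_head_size)
{
	return is_pow2(small_head_size) ? small_head_size + L1_BYTES
					: small_head_size;
}

/* Models SKB_WITH_OVERHEAD(): bytes left once the shared info is carved out. */
static inline size_t small_head_headroom(size_t cache_size)
{
	assert(cache_size >= SHINFO_BYTES);
	return cache_size - SHINFO_BYTES;
}

/* Simplified kmalloc bucket model: next power of two (ignores the 96 and 192 byte caches). */
static inline size_t kmalloc_bucket(size_t n)
{
	size_t p = 1;

	while (p < n)
		p <<= 1;
	return p;
}
```

Under these assumptions, skb_head_align(320) yields 640, small_head_cache_size(640) leaves it at 640 because 640 is not a power of two, and kmalloc_bucket(640) yields 1024, so each small head would save 384 bytes compared with a kmalloc-1k object.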

Notes:

- After Kees Cook's patches and this one, we might
  be able to revert commit
  dbae2b062824 ("net: skb: introduce and use a single page frag cache")
  because GRO_MAX_HEAD is also small.

- This patch is a NOP for CONFIG_SLOB=y builds.

Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Paolo Abeni <pabeni@redhat.com>
Acked-by: Soheil Hassas Yeganeh <soheil@google.com>
---
 net/core/skbuff.c | 72 +++++++++++++++++++++++++++++++++++++++++++----
 1 file changed, 67 insertions(+), 5 deletions(-)

Comments

kernel test robot Feb. 8, 2023, 8:37 a.m. UTC | #1
Greetings,

FYI, we noticed kernel_BUG_at_mm/usercopy.c due to commit (built with gcc-11):

commit: b9943e1e516b7fd27d5163cfee1250309fb10dd3 ("[PATCH v2 net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head")
url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-add-SKB_HEAD_ALIGN-helper/20230207-013333
base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git c21adf256f8dcfbc07436d45be4ba2edf7a6f463
patch link: https://lore.kernel.org/all/20230206173103.2617121-5-edumazet@google.com/
patch subject: [PATCH v2 net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head

in testcase: boot

on test machine: qemu-system-x86_64 -enable-kvm -cpu SandyBridge -smp 2 -m 16G

caused the changes below (please refer to attached dmesg/kmsg for entire log/backtrace):


If you fix the issue, kindly add the following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Link: https://lore.kernel.org/oe-lkp/202302081521.8e1a1948-oliver.sang@intel.com


[  133.916379][    T1] ------------[ cut here ]------------
[  133.917321][    T1] kernel BUG at mm/usercopy.c:102!
[  133.918172][    T1] invalid opcode: 0000 [#1] SMP PTI
[  133.919045][    T1] CPU: 1 PID: 1 Comm: systemd Not tainted 6.2.0-rc6-01338-gb9943e1e516b #2
[  133.920417][    T1] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
[ 133.921959][ T1] RIP: 0010:usercopy_abort (kbuild/src/x86_64-2/mm/usercopy.c:102 (discriminator 16)) 
[ 133.922891][ T1] Code: e8 dc d5 73 fe ff 74 24 08 49 89 d9 4d 89 e8 ff 74 24 08 4c 89 e1 4c 89 fa 48 89 ee 41 56 48 c7 c7 90 49 5c 83 e8 98 ea fe ff <0f> 0b e8 b0 d5 73 fe 41 0f b6 d5 4d 89 e0 48 89 e9 31 f6 48 c7 c7
All code
========
   0:	e8 dc d5 73 fe       	callq  0xfffffffffe73d5e1
   5:	ff 74 24 08          	pushq  0x8(%rsp)
   9:	49 89 d9             	mov    %rbx,%r9
   c:	4d 89 e8             	mov    %r13,%r8
   f:	ff 74 24 08          	pushq  0x8(%rsp)
  13:	4c 89 e1             	mov    %r12,%rcx
  16:	4c 89 fa             	mov    %r15,%rdx
  19:	48 89 ee             	mov    %rbp,%rsi
  1c:	41 56                	push   %r14
  1e:	48 c7 c7 90 49 5c 83 	mov    $0xffffffff835c4990,%rdi
  25:	e8 98 ea fe ff       	callq  0xfffffffffffeeac2
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	e8 b0 d5 73 fe       	callq  0xfffffffffe73d5e1
  31:	41 0f b6 d5          	movzbl %r13b,%edx
  35:	4d 89 e0             	mov    %r12,%r8
  38:	48 89 e9             	mov    %rbp,%rcx
  3b:	31 f6                	xor    %esi,%esi
  3d:	48                   	rex.W
  3e:	c7                   	.byte 0xc7
  3f:	c7                   	.byte 0xc7

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	e8 b0 d5 73 fe       	callq  0xfffffffffe73d5b7
   7:	41 0f b6 d5          	movzbl %r13b,%edx
   b:	4d 89 e0             	mov    %r12,%r8
   e:	48 89 e9             	mov    %rbp,%rcx
  11:	31 f6                	xor    %esi,%esi
  13:	48                   	rex.W
  14:	c7                   	.byte 0xc7
  15:	c7                   	.byte 0xc7
[  133.925607][    T1] RSP: 0018:ffffc90000013c00 EFLAGS: 00010286
[  133.926544][    T1] RAX: 000000000000006a RBX: ffffffff835877b0 RCX: 0000000000000000
[  133.927822][    T1] RDX: 0000000000000000 RSI: ffffffff811f3595 RDI: ffffffff83bd7cd8
[  133.929183][    T1] RBP: ffffffff835407f4 R08: ffffffff850ba350 R09: 0000000000000000
[  133.930540][    T1] R10: 0000000000000004 R11: 0001ffffffffffff R12: ffffffff8354969c
[  133.931802][    T1] R13: ffffffff8354a96e R14: ffffffff8354a96f R15: ffffffff835429ca
[  133.933094][    T1] FS:  00007fa58ae35900(0000) GS:ffff88842fd00000(0000) knlGS:0000000000000000
[  133.934557][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  133.935583][    T1] CR2: 00007fa58b9aff30 CR3: 0000000100e58000 CR4: 00000000000406e0
[  133.936852][    T1] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000
[  133.938174][    T1] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400
[  133.939489][    T1] Call Trace:
[  133.940122][    T1]  <TASK>
[ 133.940724][ T1] __check_heap_object (kbuild/src/x86_64-2/mm/slub.c:4738) 
[ 133.941613][ T1] check_heap_object (kbuild/src/x86_64-2/mm/usercopy.c:196) 
[ 133.942520][ T1] __check_object_size (kbuild/src/x86_64-2/mm/usercopy.c:113 kbuild/src/x86_64-2/mm/usercopy.c:127 kbuild/src/x86_64-2/mm/usercopy.c:254 kbuild/src/x86_64-2/mm/usercopy.c:213) 
[ 133.943386][ T1] ? skb_put (kbuild/src/x86_64-2/net/core/skbuff.c:2313) 
[ 133.944151][ T1] netlink_sendmsg (kbuild/src/x86_64-2/include/linux/uio.h:187 kbuild/src/x86_64-2/include/linux/uio.h:194 kbuild/src/x86_64-2/include/linux/skbuff.h:3977 kbuild/src/x86_64-2/net/netlink/af_netlink.c:1927) 
[ 133.944969][ T1] ? __pfx_netlink_sendmsg (kbuild/src/x86_64-2/net/netlink/af_netlink.c:1861) 
[ 133.945824][ T1] sock_sendmsg (kbuild/src/x86_64-2/net/socket.c:722 kbuild/src/x86_64-2/net/socket.c:745) 
[ 133.946591][ T1] __sys_sendto (kbuild/src/x86_64-2/net/socket.c:2142) 
[ 133.947478][ T1] ? netlink_getsockopt (kbuild/src/x86_64-2/net/netlink/af_netlink.c:1840) 
[ 133.948404][ T1] ? write_comp_data (kbuild/src/x86_64-2/kernel/kcov.c:236) 
[ 133.949240][ T1] ? __pfx_netlink_getsockopt (kbuild/src/x86_64-2/net/netlink/af_netlink.c:1742) 
[ 133.950214][ T1] ? __sys_getsockopt (kbuild/src/x86_64-2/net/socket.c:2325) 
[ 133.951117][ T1] __x64_sys_sendto (kbuild/src/x86_64-2/net/socket.c:2150) 
[ 133.951965][ T1] do_syscall_64 (kbuild/src/x86_64-2/arch/x86/entry/common.c:50 kbuild/src/x86_64-2/arch/x86/entry/common.c:80) 
[ 133.952772][ T1] entry_SYSCALL_64_after_hwframe (kbuild/src/x86_64-2/arch/x86/entry/entry_64.S:120) 
[  133.953782][    T1] RIP: 0033:0x7fa58b613366
[ 133.954582][ T1] Code: eb 0b 00 f7 d8 64 89 02 48 c7 c0 ff ff ff ff eb b8 0f 1f 00 41 89 ca 64 8b 04 25 18 00 00 00 85 c0 75 11 b8 2c 00 00 00 0f 05 <48> 3d 00 f0 ff ff 77 72 c3 90 55 48 83 ec 30 44 89 4c 24 2c 4c 89
All code
========
   0:	eb 0b                	jmp    0xd
   2:	00 f7                	add    %dh,%bh
   4:	d8 64 89 02          	fsubs  0x2(%rcx,%rcx,4)
   8:	48 c7 c0 ff ff ff ff 	mov    $0xffffffffffffffff,%rax
   f:	eb b8                	jmp    0xffffffffffffffc9
  11:	0f 1f 00             	nopl   (%rax)
  14:	41 89 ca             	mov    %ecx,%r10d
  17:	64 8b 04 25 18 00 00 	mov    %fs:0x18,%eax
  1e:	00 
  1f:	85 c0                	test   %eax,%eax
  21:	75 11                	jne    0x34
  23:	b8 2c 00 00 00       	mov    $0x2c,%eax
  28:	0f 05                	syscall 
  2a:*	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax		<-- trapping instruction
  30:	77 72                	ja     0xa4
  32:	c3                   	retq   
  33:	90                   	nop
  34:	55                   	push   %rbp
  35:	48 83 ec 30          	sub    $0x30,%rsp
  39:	44 89 4c 24 2c       	mov    %r9d,0x2c(%rsp)
  3e:	4c                   	rex.WR
  3f:	89                   	.byte 0x89

Code starting with the faulting instruction
===========================================
   0:	48 3d 00 f0 ff ff    	cmp    $0xfffffffffffff000,%rax
   6:	77 72                	ja     0x7a
   8:	c3                   	retq   
   9:	90                   	nop
   a:	55                   	push   %rbp
   b:	48 83 ec 30          	sub    $0x30,%rsp
   f:	44 89 4c 24 2c       	mov    %r9d,0x2c(%rsp)
  14:	4c                   	rex.WR
  15:	89                   	.byte 0x89
[  133.957460][    T1] RSP: 002b:00007ffe1e572498 EFLAGS: 00000246 ORIG_RAX: 000000000000002c
[  133.958872][    T1] RAX: ffffffffffffffda RBX: 00007ffe1e57251c RCX: 00007fa58b613366
[  133.960240][    T1] RDX: 0000000000000020 RSI: 00005600c1e5bf80 RDI: 0000000000000004
[  133.961601][    T1] RBP: 00005600c1e5fd10 R08: 00007ffe1e5724a0 R09: 0000000000000010
[  133.962897][    T1] R10: 0000000000000000 R11: 0000000000000246 R12: 00005600c1e5fdf0
[  133.966272][    T1] R13: 0000000000000001 R14: 00005600c1e5c790 R15: 00005600c0f4e543
[  133.967615][    T1]  </TASK>
[  133.968228][    T1] Modules linked in: ip_tables
[  133.969143][    T1] ---[ end trace 0000000000000000 ]---
[ 133.970098][ T1] RIP: 0010:usercopy_abort (kbuild/src/x86_64-2/mm/usercopy.c:102 (discriminator 16)) 
[ 133.971164][ T1] Code: e8 dc d5 73 fe ff 74 24 08 49 89 d9 4d 89 e8 ff 74 24 08 4c 89 e1 4c 89 fa 48 89 ee 41 56 48 c7 c7 90 49 5c 83 e8 98 ea fe ff <0f> 0b e8 b0 d5 73 fe 41 0f b6 d5 4d 89 e0 48 89 e9 31 f6 48 c7 c7
All code
========
   0:	e8 dc d5 73 fe       	callq  0xfffffffffe73d5e1
   5:	ff 74 24 08          	pushq  0x8(%rsp)
   9:	49 89 d9             	mov    %rbx,%r9
   c:	4d 89 e8             	mov    %r13,%r8
   f:	ff 74 24 08          	pushq  0x8(%rsp)
  13:	4c 89 e1             	mov    %r12,%rcx
  16:	4c 89 fa             	mov    %r15,%rdx
  19:	48 89 ee             	mov    %rbp,%rsi
  1c:	41 56                	push   %r14
  1e:	48 c7 c7 90 49 5c 83 	mov    $0xffffffff835c4990,%rdi
  25:	e8 98 ea fe ff       	callq  0xfffffffffffeeac2
  2a:*	0f 0b                	ud2    		<-- trapping instruction
  2c:	e8 b0 d5 73 fe       	callq  0xfffffffffe73d5e1
  31:	41 0f b6 d5          	movzbl %r13b,%edx
  35:	4d 89 e0             	mov    %r12,%r8
  38:	48 89 e9             	mov    %rbp,%rcx
  3b:	31 f6                	xor    %esi,%esi
  3d:	48                   	rex.W
  3e:	c7                   	.byte 0xc7
  3f:	c7                   	.byte 0xc7

Code starting with the faulting instruction
===========================================
   0:	0f 0b                	ud2    
   2:	e8 b0 d5 73 fe       	callq  0xfffffffffe73d5b7
   7:	41 0f b6 d5          	movzbl %r13b,%edx
   b:	4d 89 e0             	mov    %r12,%r8
   e:	48 89 e9             	mov    %rbp,%rcx
  11:	31 f6                	xor    %esi,%esi
  13:	48                   	rex.W
  14:	c7                   	.byte 0xc7
  15:	c7                   	.byte 0xc7


To reproduce:

        # build kernel
        cd linux
        cp config-6.2.0-rc6-01338-gb9943e1e516b .config
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 olddefconfig prepare modules_prepare bzImage modules
        make HOSTCC=gcc-11 CC=gcc-11 ARCH=x86_64 INSTALL_MOD_PATH=<mod-install-dir> modules_install
        cd <mod-install-dir>
        find lib/ | cpio -o -H newc --quiet | gzip > modules.cgz

        git clone https://github.com/intel/lkp-tests.git
        cd lkp-tests
        bin/lkp qemu -k <bzImage> -m modules.cgz job-script # job-script is attached in this email

        # if you come across any failure that blocks the test,
        # please remove the ~/.lkp and /lkp directories to run from a clean state.
Eric Dumazet Feb. 8, 2023, 1:38 p.m. UTC | #2
On Wed, Feb 8, 2023 at 9:37 AM kernel test robot <oliver.sang@intel.com> wrote:
>
>
> Greetings,
>
> FYI, we noticed kernel_BUG_at_mm/usercopy.c due to commit (built with gcc-11):
>
> commit: b9943e1e516b7fd27d5163cfee1250309fb10dd3 ("[PATCH v2 net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head")
> url: https://github.com/intel-lab-lkp/linux/commits/Eric-Dumazet/net-add-SKB_HEAD_ALIGN-helper/20230207-013333
> base: https://git.kernel.org/cgit/linux/kernel/git/davem/net-next.git c21adf256f8dcfbc07436d45be4ba2edf7a6f463
> patch link: https://lore.kernel.org/all/20230206173103.2617121-5-edumazet@google.com/
> patch subject: [PATCH v2 net-next 4/4] net: add dedicated kmem_cache for typical/small skb->head
>

Thanks for the report. I will use kmem_cache_create_usercopy() instead
of kmem_cache_create().
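
To illustrate why the choice of constructor matters here: hardened usercopy only lets copy_to_user()/copy_from_user() touch the part of a slab object that the cache has whitelisted, and a cache made with plain kmem_cache_create() whitelists nothing. The sketch below is a userspace model of that range check, not kernel code. The struct and function names are hypothetical, and using the 320-byte headroom as the whitelisted size is an assumption about what a fix could pass, based on a 640-byte object with a 320-byte shared-info tail.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical stand-in for the usercopy-related fields of a slab cache. */
struct cache_model {
	size_t object_size;  /* full object size */
	size_t useroffset;   /* start of the whitelisted region */
	size_t usersize;     /* length of the whitelisted region, 0 = none */
};

/*
 * Returns true when a copy of n bytes starting at offset inside the object
 * falls entirely within the whitelisted region. Unsigned wraparound in the
 * last term is intentional and harmless because the first two tests already
 * bound offset.
 */
static inline bool usercopy_allowed(const struct cache_model *c,
				    size_t offset, size_t n)
{
	assert(offset <= c->object_size);
	return offset >= c->useroffset &&
	       offset - c->useroffset <= c->usersize &&
	       n <= c->useroffset - offset + c->usersize;
}
```

With a model of the v2 cache, { 640, 0, 0 }, even a 32-byte copy at offset 0 is rejected, which is consistent with the BUG hit in netlink_sendmsg(). With a whitelist covering only the headroom, { 640, 0, 320 }, the same copy is allowed, while a copy that would reach into the shared-info tail (for example offset 300, length 32) is still refused.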

> --
> 0-DAY CI Kernel Test Service
> https://github.com/intel/lkp-tests
>
>

Patch

diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index c1232837cd0cb3befce0262fb8fda20272a26d45..bdb1e015e32b9386139e9ad73acd6efb3c357118 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -89,6 +89,34 @@  static struct kmem_cache *skbuff_fclone_cache __ro_after_init;
 #ifdef CONFIG_SKB_EXTENSIONS
 static struct kmem_cache *skbuff_ext_cache __ro_after_init;
 #endif
+
+/* skb_small_head_cache and related code is only supported
+ * for CONFIG_SLAB and CONFIG_SLUB.
+ * As soon as SLOB is removed from the kernel, we can clean this up.
+ */
+#if !defined(CONFIG_SLOB)
+# define HAVE_SKB_SMALL_HEAD_CACHE 1
+#endif
+
+#ifdef HAVE_SKB_SMALL_HEAD_CACHE
+static struct kmem_cache *skb_small_head_cache __ro_after_init;
+
+#define SKB_SMALL_HEAD_SIZE SKB_HEAD_ALIGN(MAX_TCP_HEADER)
+
+/* We want SKB_SMALL_HEAD_CACHE_SIZE to not be a power of two.
+ * This should ensure that SKB_SMALL_HEAD_HEADROOM is a unique
+ * size, and we can differentiate heads from skb_small_head_cache
+ * vs system slabs by looking at their size (skb_end_offset()).
+ */
+#define SKB_SMALL_HEAD_CACHE_SIZE					\
+	(is_power_of_2(SKB_SMALL_HEAD_SIZE) ?			\
+		(SKB_SMALL_HEAD_SIZE + L1_CACHE_BYTES) :	\
+		SKB_SMALL_HEAD_SIZE)
+
+#define SKB_SMALL_HEAD_HEADROOM						\
+	SKB_WITH_OVERHEAD(SKB_SMALL_HEAD_CACHE_SIZE)
+#endif /* HAVE_SKB_SMALL_HEAD_CACHE */
+
 int sysctl_max_skb_frags __read_mostly = MAX_SKB_FRAGS;
 EXPORT_SYMBOL(sysctl_max_skb_frags);
 
@@ -486,6 +514,23 @@  static void *kmalloc_reserve(unsigned int *size, gfp_t flags, int node,
 	void *obj;
 
 	obj_size = SKB_HEAD_ALIGN(*size);
+#ifdef HAVE_SKB_SMALL_HEAD_CACHE
+	if (obj_size <= SKB_SMALL_HEAD_CACHE_SIZE &&
+	    !(flags & KMALLOC_NOT_NORMAL_BITS)) {
+
+		/* skb_small_head_cache has non power of two size,
+		 * likely forcing SLUB to use order-3 pages.
+		 * We deliberately attempt a NOMEMALLOC allocation only.
+		 */
+		obj = kmem_cache_alloc_node(skb_small_head_cache,
+				flags | __GFP_NOMEMALLOC | __GFP_NOWARN,
+				node);
+		if (obj) {
+			*size = SKB_SMALL_HEAD_CACHE_SIZE;
+			goto out;
+		}
+	}
+#endif
 	*size = obj_size = kmalloc_size_roundup(obj_size);
 	/*
 	 * Try a regular allocation, when that fails and we're not entitled
@@ -805,6 +850,16 @@  static bool skb_pp_recycle(struct sk_buff *skb, void *data)
 	return page_pool_return_skb_page(virt_to_page(data));
 }
 
+static void skb_kfree_head(void *head, unsigned int end_offset)
+{
+#ifdef HAVE_SKB_SMALL_HEAD_CACHE
+	if (end_offset == SKB_SMALL_HEAD_HEADROOM)
+		kmem_cache_free(skb_small_head_cache, head);
+	else
+#endif
+		kfree(head);
+}
+
 static void skb_free_head(struct sk_buff *skb)
 {
 	unsigned char *head = skb->head;
@@ -814,7 +869,7 @@  static void skb_free_head(struct sk_buff *skb)
 			return;
 		skb_free_frag(head);
 	} else {
-		kfree(head);
+		skb_kfree_head(head, skb_end_offset(skb));
 	}
 }
 
@@ -1997,7 +2052,7 @@  int pskb_expand_head(struct sk_buff *skb, int nhead, int ntail,
 	return 0;
 
 nofrags:
-	kfree(data);
+	skb_kfree_head(data, size);
 nodata:
 	return -ENOMEM;
 }
@@ -4634,6 +4689,13 @@  void __init skb_init(void)
 						0,
 						SLAB_HWCACHE_ALIGN|SLAB_PANIC,
 						NULL);
+#ifdef HAVE_SKB_SMALL_HEAD_CACHE
+	skb_small_head_cache = kmem_cache_create("skbuff_small_head",
+						SKB_SMALL_HEAD_CACHE_SIZE,
+						0,
+						SLAB_HWCACHE_ALIGN | SLAB_PANIC,
+						NULL);
+#endif
 	skb_extensions_init();
 }
 
@@ -6298,7 +6360,7 @@  static int pskb_carve_inside_header(struct sk_buff *skb, const u32 off,
 	if (skb_cloned(skb)) {
 		/* drop the old head gracefully */
 		if (skb_orphan_frags(skb, gfp_mask)) {
-			kfree(data);
+			skb_kfree_head(data, size);
 			return -ENOMEM;
 		}
 		for (i = 0; i < skb_shinfo(skb)->nr_frags; i++)
@@ -6406,7 +6468,7 @@  static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 	memcpy((struct skb_shared_info *)(data + size),
 	       skb_shinfo(skb), offsetof(struct skb_shared_info, frags[0]));
 	if (skb_orphan_frags(skb, gfp_mask)) {
-		kfree(data);
+		skb_kfree_head(data, size);
 		return -ENOMEM;
 	}
 	shinfo = (struct skb_shared_info *)(data + size);
@@ -6442,7 +6504,7 @@  static int pskb_carve_inside_nonlinear(struct sk_buff *skb, const u32 off,
 		/* skb_frag_unref() is not needed here as shinfo->nr_frags = 0. */
 		if (skb_has_frag_list(skb))
 			kfree_skb_list(skb_shinfo(skb)->frag_list);
-		kfree(data);
+		skb_kfree_head(data, size);
 		return -ENOMEM;
 	}
 	skb_release_data(skb, SKB_CONSUMED);
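
A note on the free path in the hunks above: skb_kfree_head() never asks the allocator which cache a head belongs to; it infers the origin from the end offset alone. That only works because the small-cache size is kept from being a power of two, so no kmalloc bucket can produce the same end offset. The following userspace sketch models that invariant. All names are hypothetical, the constants assume x86_64 (640-byte cache objects, 320-byte aligned shared info), the kmalloc model only knows power-of-two buckets, and the fallback to kmalloc when the small cache cannot satisfy a request is left out.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Assumed x86_64 values; the patch derives them from kernel macros instead. */
#define SHINFO_BYTES      320u
#define SMALL_CACHE_BYTES 640u
#define SMALL_HEADROOM    (SMALL_CACHE_BYTES - SHINFO_BYTES)

enum head_origin { HEAD_SMALL_CACHE, HEAD_KMALLOC };

/* Simplified kmalloc bucket model: next power of two. */
static inline size_t kmalloc_bucket(size_t n)
{
	size_t p = 1;

	assert(n > 0);
	while (p < n)
		p <<= 1;
	return p;
}

/*
 * Allocation side, loosely following kmalloc_reserve(): returns the size the
 * head ends up with and reports which allocator it came from. special_gfp
 * stands for any of the KMALLOC_NOT_NORMAL_BITS flags being set.
 */
static inline size_t reserve_head(size_t obj_size, bool special_gfp,
				  enum head_origin *origin)
{
	if (obj_size <= SMALL_CACHE_BYTES && !special_gfp) {
		*origin = HEAD_SMALL_CACHE;
		return SMALL_CACHE_BYTES;
	}
	*origin = HEAD_KMALLOC;
	return kmalloc_bucket(obj_size);
}

/* Free side, following skb_kfree_head(): the end offset alone picks the allocator. */
static inline enum head_origin origin_from_end_offset(size_t end_offset)
{
	return end_offset == SMALL_HEADROOM ? HEAD_SMALL_CACHE : HEAD_KMALLOC;
}
```

In this model, the end offset is the reserved size minus SHINFO_BYTES. A 400-byte request lands in the small cache (size 640, end offset 320), whereas a 704-byte request goes to the 1024-byte bucket (end offset 704), and origin_from_end_offset() recovers the right allocator in both cases.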