diff mbox series

[v2,1/2] bpf-next: Introduced to support the ULP to get or set sockets

Message ID 20250210134550.3189616-2-zhangmingyi5@huawei.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series bpf-next: Introduced to support the ULP to get or set sockets | expand

Checks

Context Check Description
netdev/series_format warning Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 9 this patch: 9
netdev/build_tools success Errors and warnings before: 26 (+1) this patch: 26 (+1)
netdev/cc_maintainers warning 14 maintainers not CCed: mptcp@lists.linux.dev edumazet@google.com dsahern@kernel.org yonghong.song@linux.dev horms@kernel.org ncardwell@google.com kuba@kernel.org geliang@kernel.org matttbe@kernel.org eddyz87@gmail.com netdev@vger.kernel.org kuniyu@amazon.com martineau@kernel.org pabeni@redhat.com
netdev/build_clang success Errors and warnings before: 602 this patch: 602
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 912 this patch: 912
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 81 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 8 this patch: 8
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-PR fail PR summary
bpf/vmtest-bpf-next-VM_Test-4 success Logs for aarch64-gcc / GCC BPF
bpf/vmtest-bpf-next-VM_Test-6 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-5 fail Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for aarch64-gcc / test
bpf/vmtest-bpf-next-VM_Test-8 success Logs for aarch64-gcc / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-9 success Logs for aarch64-gcc / veristat-meta
bpf/vmtest-bpf-next-VM_Test-10 success Logs for s390x-gcc / GCC BPF
bpf/vmtest-bpf-next-VM_Test-11 fail Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-12 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-13 success Logs for s390x-gcc / test
bpf/vmtest-bpf-next-VM_Test-14 success Logs for s390x-gcc / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-15 success Logs for s390x-gcc / veristat-meta
bpf/vmtest-bpf-next-VM_Test-16 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-17 success Logs for x86_64-gcc / GCC BPF
bpf/vmtest-bpf-next-VM_Test-18 fail Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-19 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-20 success Logs for x86_64-gcc / test
bpf/vmtest-bpf-next-VM_Test-21 success Logs for x86_64-gcc / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-22 success Logs for x86_64-gcc / veristat-meta
bpf/vmtest-bpf-next-VM_Test-23 success Logs for x86_64-llvm-17 / GCC BPF
bpf/vmtest-bpf-next-VM_Test-24 fail Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-25 fail Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17-O2
bpf/vmtest-bpf-next-VM_Test-26 success Logs for x86_64-llvm-17 / test
bpf/vmtest-bpf-next-VM_Test-27 success Logs for x86_64-llvm-17 / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-28 success Logs for x86_64-llvm-17 / veristat-meta
bpf/vmtest-bpf-next-VM_Test-29 success Logs for x86_64-llvm-18 / GCC BPF
bpf/vmtest-bpf-next-VM_Test-30 fail Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-31 fail Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18-O2
bpf/vmtest-bpf-next-VM_Test-32 success Logs for x86_64-llvm-18 / test
bpf/vmtest-bpf-next-VM_Test-33 success Logs for x86_64-llvm-18 / veristat-kernel
bpf/vmtest-bpf-next-VM_Test-34 success Logs for x86_64-llvm-18 / veristat-meta

Commit Message

zhangmingyi Feb. 10, 2025, 1:45 p.m. UTC
From: Mingyi Zhang <zhangmingyi5@huawei.com>

Note that tcp_getsockopt and tcp_setsockopt support TCP_ULP, while
bpf_getsockopt and bpf_setsockopt do not support TCP_ULP.
I think we can add the handling of this case.

We want call bpf_setsockopt to replace the kernel module in the TCP_ULP
case. The purpose is to customize the behavior in connect and sendmsg.
We have an open source community project kmesh (kmesh.net). Based on
this, we refer to some processes of tcp fastopen to implement delayed
connet and perform HTTP DNAT when sendmsg.In this case, we need to parse
HTTP packets in the bpf program and set TCP_ULP for the specified socket.

Signed-off-by: Mingyi Zhang <zhangmingyi5@huawei.com>
Signed-off-by: Xin Liu <liuxin350@huawei.com>
---
 include/net/tcp.h   |  2 +-
 net/core/filter.c   |  1 +
 net/ipv4/tcp.c      |  2 +-
 net/ipv4/tcp_ulp.c  | 28 +++++++++++++++-------------
 net/mptcp/subflow.c |  2 +-
 5 files changed, 19 insertions(+), 16 deletions(-)

Comments

Martin KaFai Lau Feb. 13, 2025, 11:17 p.m. UTC | #1
On 2/10/25 5:45 AM, zhangmingyi wrote:
> From: Mingyi Zhang <zhangmingyi5@huawei.com>
> 
> Note that tcp_getsockopt and tcp_setsockopt support TCP_ULP, while
> bpf_getsockopt and bpf_setsockopt do not support TCP_ULP.
> I think we can add the handling of this case.

The commit message should talk about the "bool load" related changes in v2 and 
why it is needed.

The subject line is confusing. How about,
"Support TCP_ULP in bpf_get/setsockopt"

The code changes lgtm.

> 
> We want call bpf_setsockopt to replace the kernel module in the TCP_ULP
> case. The purpose is to customize the behavior in connect and sendmsg.
> We have an open source community project kmesh (kmesh.net). Based on
> this, we refer to some processes of tcp fastopen to implement delayed
> connet and perform HTTP DNAT when sendmsg.In this case, we need to parse
> HTTP packets in the bpf program and set TCP_ULP for the specified socket.
Oliver Sang Feb. 14, 2025, 2:13 a.m. UTC | #2
Hello,

kernel test robot noticed "BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c" on:

commit: 8f510de3f26b2fabaf47eacd59053469e9c32754 ("[PATCH v2 1/2] bpf-next: Introduced to support the ULP to get or set sockets")
url: https://github.com/intel-lab-lkp/linux/commits/zhangmingyi/bpf-next-Introduced-to-support-the-ULP-to-get-or-set-sockets/20250210-215203
base: https://git.kernel.org/cgit/linux/kernel/git/bpf/bpf-next.git master
patch link: https://lore.kernel.org/all/20250210134550.3189616-2-zhangmingyi5@huawei.com/
patch subject: [PATCH v2 1/2] bpf-next: Introduced to support the ULP to get or set sockets

in testcase: trinity
version: trinity-i386-abe9de86-1_20230429
with following parameters:

	runtime: 300s
	group: group-03
	nr_groups: 5



config: i386-randconfig-054-20250212
compiler: gcc-12
test machine: qemu-system-i386 -enable-kvm -cpu SandyBridge -smp 2 -m 4G

(please refer to attached dmesg/kmsg for entire log/backtrace)


+-----------------------------------------------------------------------------+------------+------------+
|                                                                             | 9b6cdaf2ac | 8f510de3f2 |
+-----------------------------------------------------------------------------+------------+------------+
| BUG:sleeping_function_called_from_invalid_context_at_kernel/locking/mutex.c | 0          | 6          |
+-----------------------------------------------------------------------------+------------+------------+


If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202502140959.f66e2ba6-lkp@intel.com


[   71.099773][ T3759] BUG: sleeping function called from invalid context at kernel/locking/mutex.c:562
[   71.101798][ T3759] in_atomic(): 0, irqs_disabled(): 0, non_block: 0, pid: 3759, name: trinity-c4
[   71.103659][ T3759] preempt_count: 0, expected: 0
[   71.104658][ T3759] RCU nest depth: 1, expected: 0
[   71.105669][ T3759] 2 locks held by trinity-c4/3759:
[ 71.106777][ T3759] #0: ecffcd80 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock (include/net/sock.h:1625) 
[ 71.108460][ T3759] #1: c3500498 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire (include/linux/rcupdate.h:336) 
[   71.110397][ T3759] CPU: 1 UID: 65534 PID: 3759 Comm: trinity-c4 Tainted: G                T  6.14.0-rc1-00030-g8f510de3f26b #1 8ad64aae41fa4cb8babad52c8f50e0a7d5e34569
[   71.110406][ T3759] Tainted: [T]=RANDSTRUCT
[   71.110407][ T3759] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   71.110410][ T3759] Call Trace:
[ 71.110416][ T3759] dump_stack_lvl (lib/dump_stack.c:123) 
[ 71.110423][ T3759] dump_stack (lib/dump_stack.c:130) 
[ 71.110428][ T3759] __might_resched (kernel/sched/core.c:8767) 
[ 71.110440][ T3759] __might_sleep (kernel/sched/core.c:8696 (discriminator 17)) 
[ 71.110446][ T3759] __mutex_lock (include/linux/kernel.h:73 kernel/locking/mutex.c:562 kernel/locking/mutex.c:730) 
[ 71.110452][ T3759] ? rcu_read_unlock (include/linux/rcupdate.h:335) 
[ 71.110462][ T3759] ? mark_held_locks (kernel/locking/lockdep.c:4323) 
[ 71.110470][ T3759] ? lock_sock_nested (net/core/sock.c:3653) 
[ 71.110481][ T3759] mutex_lock_nested (kernel/locking/mutex.c:783) 
[ 71.110486][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993) 
[ 71.110494][ T3759] tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993) 
[ 71.110505][ T3759] tcp_set_ulp (net/ipv4/tcp_ulp.c:140 net/ipv4/tcp_ulp.c:166) 
[ 71.110513][ T3759] do_tcp_setsockopt (net/ipv4/tcp.c:3747) 
[ 71.110534][ T3759] tcp_setsockopt (net/ipv4/tcp.c:4032) 
[ 71.110542][ T3759] ? sock_common_recvmsg (net/core/sock.c:3833) 
[ 71.110548][ T3759] sock_common_setsockopt (net/core/sock.c:3838) 
[ 71.110561][ T3759] do_sock_setsockopt (net/socket.c:2298) 
[ 71.110577][ T3759] __sys_setsockopt (net/socket.c:2323) 
[ 71.110592][ T3759] __ia32_sys_setsockopt (net/socket.c:2326) 
[ 71.110599][ T3759] ia32_sys_call (kbuild/obj/consumer/i386-randconfig-054-20250212/./arch/x86/include/generated/asm/syscalls_32.h:367) 
[ 71.110607][ T3759] do_int80_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:339) 
[ 71.110616][ T3759] entry_INT80_32 (arch/x86/entry/entry_32.S:942) 
[   71.110621][ T3759] EIP: 0xb4014092
[ 71.110626][ T3759] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 f8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 <c3> 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 e9                	add    %ch,%cl
   4:	90                   	nop
   5:	ff                   	(bad)
   6:	ff                   	(bad)
   7:	ff                   	(bad)
   8:	ff a3 24 00 00 00    	jmp    *0x24(%rbx)
   e:	68 30 00 00 00       	push   $0x30
  13:	e9 80 ff ff ff       	jmp    0xffffffffffffff98
  18:	ff a3 f8 ff ff ff    	jmp    *-0x8(%rbx)
  1e:	66 90                	xchg   %ax,%ax
	...
  28:	cd 80                	int    $0x80
  2a:*	c3                   	ret		<-- trapping instruction
  2b:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
  32:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
  38:	8b 1c 24             	mov    (%rsp),%ebx
  3b:	c3                   	ret
  3c:	8d                   	.byte 0x8d
  3d:	b4 26                	mov    $0x26,%ah
	...

Code starting with the faulting instruction
===========================================
   0:	c3                   	ret
   1:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
   8:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
   e:	8b 1c 24             	mov    (%rsp),%ebx
  11:	c3                   	ret
  12:	8d                   	.byte 0x8d
  13:	b4 26                	mov    $0x26,%ah
	...
[   71.110630][ T3759] EAX: ffffffda EBX: 00000134 ECX: 00000006 EDX: 0000001f
[   71.110634][ T3759] ESI: 08fee650 EDI: 00000004 EBP: 000012cf ESP: bfc1c538
[   71.110638][ T3759] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296
[   71.182507][ T3759]
[   71.182999][ T3759] =============================
[   71.183907][ T3759] [ BUG: Invalid wait context ]
[   71.184819][ T3759] 6.14.0-rc1-00030-g8f510de3f26b #1 Tainted: G        W       T
[   71.186327][ T3759] -----------------------------
[   71.187265][ T3759] trinity-c4/3759 is trying to lock:
[ 71.188287][ T3759] c37b35e0 (tcpv4_prot_mutex){....}-{4:4}, at: tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993) 
[   71.189847][ T3759] other info that might help us debug this:
[   71.191018][ T3759] context-{5:5}
[   71.191678][ T3759] 2 locks held by trinity-c4/3759:
[ 71.192635][ T3759] #0: ecffcd80 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock (include/net/sock.h:1625) 
[ 71.194220][ T3759] #1: c3500498 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire (include/linux/rcupdate.h:336) 
[   71.196078][ T3759] stack backtrace:
[   71.196797][ T3759] CPU: 0 UID: 65534 PID: 3759 Comm: trinity-c4 Tainted: G        W       T  6.14.0-rc1-00030-g8f510de3f26b #1 8ad64aae41fa4cb8babad52c8f50e0a7d5e34569
[   71.196807][ T3759] Tainted: [W]=WARN, [T]=RANDSTRUCT
[   71.196809][ T3759] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
[   71.196812][ T3759] Call Trace:
[ 71.196818][ T3759] dump_stack_lvl (lib/dump_stack.c:123) 
[ 71.196825][ T3759] dump_stack (lib/dump_stack.c:130) 
[ 71.196830][ T3759] __lock_acquire (kernel/locking/lockdep.c:4830 kernel/locking/lockdep.c:4900 kernel/locking/lockdep.c:5178) 
[ 71.196840][ T3759] lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853) 
[ 71.196846][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993) 
[ 71.196856][ T3759] ? __schedule (kernel/sched/core.c:5380) 
[ 71.196866][ T3759] __mutex_lock (kernel/locking/mutex.c:587 kernel/locking/mutex.c:730) 
[ 71.196872][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993) 
[ 71.196878][ T3759] ? rcu_read_unlock (include/linux/rcupdate.h:335) 
[ 71.196885][ T3759] ? mark_held_locks (kernel/locking/lockdep.c:4323) 
[ 71.196889][ T3759] ? lock_sock_nested (net/core/sock.c:3653) 
[ 71.196898][ T3759] mutex_lock_nested (kernel/locking/mutex.c:783) 
[ 71.196904][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993) 
[ 71.196909][ T3759] tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993) 
[ 71.196916][ T3759] tcp_set_ulp (net/ipv4/tcp_ulp.c:140 net/ipv4/tcp_ulp.c:166) 
[ 71.196923][ T3759] do_tcp_setsockopt (net/ipv4/tcp.c:3747) 
[ 71.196934][ T3759] tcp_setsockopt (net/ipv4/tcp.c:4032) 
[ 71.196939][ T3759] ? sock_common_recvmsg (net/core/sock.c:3833) 
[ 71.196946][ T3759] sock_common_setsockopt (net/core/sock.c:3838) 
[ 71.196952][ T3759] do_sock_setsockopt (net/socket.c:2298) 
[ 71.196961][ T3759] __sys_setsockopt (net/socket.c:2323) 
[ 71.196967][ T3759] __ia32_sys_setsockopt (net/socket.c:2326) 
[ 71.196972][ T3759] ia32_sys_call (kbuild/obj/consumer/i386-randconfig-054-20250212/./arch/x86/include/generated/asm/syscalls_32.h:367) 
[ 71.196979][ T3759] do_int80_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:339) 
[ 71.196985][ T3759] entry_INT80_32 (arch/x86/entry/entry_32.S:942) 
[   71.196990][ T3759] EIP: 0xb4014092
[ 71.196995][ T3759] Code: 00 00 00 e9 90 ff ff ff ff a3 24 00 00 00 68 30 00 00 00 e9 80 ff ff ff ff a3 f8 ff ff ff 66 90 00 00 00 00 00 00 00 00 cd 80 <c3> 8d b4 26 00 00 00 00 8d b6 00 00 00 00 8b 1c 24 c3 8d b4 26 00
All code
========
   0:	00 00                	add    %al,(%rax)
   2:	00 e9                	add    %ch,%cl
   4:	90                   	nop
   5:	ff                   	(bad)
   6:	ff                   	(bad)
   7:	ff                   	(bad)
   8:	ff a3 24 00 00 00    	jmp    *0x24(%rbx)
   e:	68 30 00 00 00       	push   $0x30
  13:	e9 80 ff ff ff       	jmp    0xffffffffffffff98
  18:	ff a3 f8 ff ff ff    	jmp    *-0x8(%rbx)
  1e:	66 90                	xchg   %ax,%ax
	...
  28:	cd 80                	int    $0x80
  2a:*	c3                   	ret		<-- trapping instruction
  2b:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
  32:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
  38:	8b 1c 24             	mov    (%rsp),%ebx
  3b:	c3                   	ret
  3c:	8d                   	.byte 0x8d
  3d:	b4 26                	mov    $0x26,%ah
	...

Code starting with the faulting instruction
===========================================
   0:	c3                   	ret
   1:	8d b4 26 00 00 00 00 	lea    0x0(%rsi,%riz,1),%esi
   8:	8d b6 00 00 00 00    	lea    0x0(%rsi),%esi
   e:	8b 1c 24             	mov    (%rsp),%ebx
  11:	c3                   	ret
  12:	8d                   	.byte 0x8d
  13:	b4 26                	mov    $0x26,%ah
	...
[   71.196999][ T3759] EAX: ffffffda EBX: 00000134 ECX: 00000006 EDX: 0000001f
[   71.197004][ T3759] ESI: 08fee650 EDI: 00000004 EBP: 000012cf ESP: bfc1c538
[   71.197008][ T3759] DS: 007b ES: 007b FS: 0000 GS: 0033 SS: 007b EFLAGS: 00000296



The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20250214/202502140959.f66e2ba6-lkp@intel.com
Martin KaFai Lau Feb. 14, 2025, 6:23 a.m. UTC | #3
On 2/13/25 6:13 PM, kernel test robot wrote:
> [   71.182999][ T3759] =============================
> [   71.183907][ T3759] [ BUG: Invalid wait context ]
> [   71.184819][ T3759] 6.14.0-rc1-00030-g8f510de3f26b #1 Tainted: G        W       T
> [   71.186327][ T3759] -----------------------------
> [   71.187265][ T3759] trinity-c4/3759 is trying to lock:
> [ 71.188287][ T3759] c37b35e0 (tcpv4_prot_mutex){....}-{4:4}, at: tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
> [   71.189847][ T3759] other info that might help us debug this:
> [   71.191018][ T3759] context-{5:5}
> [   71.191678][ T3759] 2 locks held by trinity-c4/3759:
> [ 71.192635][ T3759] #0: ecffcd80 (sk_lock-AF_INET){+.+.}-{0:0}, at: lock_sock (include/net/sock.h:1625)
> [ 71.194220][ T3759] #1: c3500498 (rcu_read_lock){....}-{1:3}, at: rcu_lock_acquire (include/linux/rcupdate.h:336)
> [   71.196078][ T3759] stack backtrace:
> [   71.196797][ T3759] CPU: 0 UID: 65534 PID: 3759 Comm: trinity-c4 Tainted: G        W       T  6.14.0-rc1-00030-g8f510de3f26b #1 8ad64aae41fa4cb8babad52c8f50e0a7d5e34569
> [   71.196807][ T3759] Tainted: [W]=WARN, [T]=RANDSTRUCT
> [   71.196809][ T3759] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.2-debian-1.16.2-1 04/01/2014
> [   71.196812][ T3759] Call Trace:
> [ 71.196818][ T3759] dump_stack_lvl (lib/dump_stack.c:123)
> [ 71.196825][ T3759] dump_stack (lib/dump_stack.c:130)
> [ 71.196830][ T3759] __lock_acquire (kernel/locking/lockdep.c:4830 kernel/locking/lockdep.c:4900 kernel/locking/lockdep.c:5178)
> [ 71.196840][ T3759] lock_acquire (kernel/locking/lockdep.c:469 kernel/locking/lockdep.c:5853)
> [ 71.196846][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
> [ 71.196856][ T3759] ? __schedule (kernel/sched/core.c:5380)
> [ 71.196866][ T3759] __mutex_lock (kernel/locking/mutex.c:587 kernel/locking/mutex.c:730)
> [ 71.196872][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
> [ 71.196878][ T3759] ? rcu_read_unlock (include/linux/rcupdate.h:335)
> [ 71.196885][ T3759] ? mark_held_locks (kernel/locking/lockdep.c:4323)
> [ 71.196889][ T3759] ? lock_sock_nested (net/core/sock.c:3653)
> [ 71.196898][ T3759] mutex_lock_nested (kernel/locking/mutex.c:783)

This is probably because __tcp_set_ulp is now under the rcu_read_lock() in patch 1.

Even fixing patch 1 will not be enough. The bpf cgrp prog (e.g. sockops) cannot 
sleep now, so it still cannot call bpf_setsockopt(TCP_ULP, "tls") which will 
take a mutex. This is a blocker :(

> [ 71.196904][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
> [ 71.196909][ T3759] tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
> [ 71.196916][ T3759] tcp_set_ulp (net/ipv4/tcp_ulp.c:140 net/ipv4/tcp_ulp.c:166)
> [ 71.196923][ T3759] do_tcp_setsockopt (net/ipv4/tcp.c:3747)
> [ 71.196934][ T3759] tcp_setsockopt (net/ipv4/tcp.c:4032)
> [ 71.196939][ T3759] ? sock_common_recvmsg (net/core/sock.c:3833)
> [ 71.196946][ T3759] sock_common_setsockopt (net/core/sock.c:3838)
> [ 71.196952][ T3759] do_sock_setsockopt (net/socket.c:2298)
> [ 71.196961][ T3759] __sys_setsockopt (net/socket.c:2323)
> [ 71.196967][ T3759] __ia32_sys_setsockopt (net/socket.c:2326)
> [ 71.196972][ T3759] ia32_sys_call (kbuild/obj/consumer/i386-randconfig-054-20250212/./arch/x86/include/generated/asm/syscalls_32.h:367)
> [ 71.196979][ T3759] do_int80_syscall_32 (arch/x86/entry/common.c:165 arch/x86/entry/common.c:339)
> [ 71.196985][ T3759] entry_INT80_32 (arch/x86/entry/entry_32.S:942)
Jakub Kicinski Feb. 14, 2025, 9:20 p.m. UTC | #4
On Thu, 13 Feb 2025 22:23:39 -0800 Martin KaFai Lau wrote:
> On 2/13/25 6:13 PM, kernel test robot wrote:
> > [ 71.196846][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
> > [ 71.196856][ T3759] ? __schedule (kernel/sched/core.c:5380)
> > [ 71.196866][ T3759] __mutex_lock (kernel/locking/mutex.c:587 kernel/locking/mutex.c:730)
> > [ 71.196872][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
> > [ 71.196878][ T3759] ? rcu_read_unlock (include/linux/rcupdate.h:335)
> > [ 71.196885][ T3759] ? mark_held_locks (kernel/locking/lockdep.c:4323)
> > [ 71.196889][ T3759] ? lock_sock_nested (net/core/sock.c:3653)
> > [ 71.196898][ T3759] mutex_lock_nested (kernel/locking/mutex.c:783)  
> 
> This is probably because __tcp_set_ulp is now under the rcu_read_lock() in patch 1.
> 
> Even fixing patch 1 will not be enough. The bpf cgrp prog (e.g. sockops) cannot 
> sleep now, so it still cannot call bpf_setsockopt(TCP_ULP, "tls") which will 
> take a mutex. This is a blocker :(

Oh, kbuild bot was nice enough to CC netdev, it wasn't CCed on 
the submission.

I'd really rather we didn't allow setting ULP from BPF unless there 
is a strong and clear use case. The ULP configuration and stacking
is a source of many bugs. And the use case here AFAIU is to allow
attaching some ULP from an OOT module to a socket, which I think
won't make core BPF folks happy either, right?
Martin KaFai Lau Feb. 14, 2025, 10:11 p.m. UTC | #5
On 2/14/25 1:20 PM, Jakub Kicinski wrote:
> On Thu, 13 Feb 2025 22:23:39 -0800 Martin KaFai Lau wrote:
>> On 2/13/25 6:13 PM, kernel test robot wrote:
>>> [ 71.196846][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
>>> [ 71.196856][ T3759] ? __schedule (kernel/sched/core.c:5380)
>>> [ 71.196866][ T3759] __mutex_lock (kernel/locking/mutex.c:587 kernel/locking/mutex.c:730)
>>> [ 71.196872][ T3759] ? tls_init (net/tls/tls_main.c:934 net/tls/tls_main.c:993)
>>> [ 71.196878][ T3759] ? rcu_read_unlock (include/linux/rcupdate.h:335)
>>> [ 71.196885][ T3759] ? mark_held_locks (kernel/locking/lockdep.c:4323)
>>> [ 71.196889][ T3759] ? lock_sock_nested (net/core/sock.c:3653)
>>> [ 71.196898][ T3759] mutex_lock_nested (kernel/locking/mutex.c:783)
>>
>> This is probably because __tcp_set_ulp is now under the rcu_read_lock() in patch 1.
>>
>> Even fixing patch 1 will not be enough. The bpf cgrp prog (e.g. sockops) cannot
>> sleep now, so it still cannot call bpf_setsockopt(TCP_ULP, "tls") which will
>> take a mutex. This is a blocker :(
> 
> Oh, kbuild bot was nice enough to CC netdev, it wasn't CCed on
> the submission.

Ah. I also didn't notice netdev was not cc-ed. will pay attention in the future.

> 
> I'd really rather we didn't allow setting ULP from BPF unless there
> is a strong and clear use case. The ULP configuration and stacking
> is a source of many bugs. And the use case here AFAIU is to allow
> attaching some ULP from an OOT module to a socket, which I think
> won't make core BPF folks happy either, right?

If the in-tree ulp does not work, there is little reason to do it for the 
out-of-tree module only.

My question on the ulp use case went to silence in v1, so we can assume it is 
out-of-tree ulp only. I also asked to replace the "smc" ulp testing with a more 
real "tls" ulp testing to see how it goes first. It does not work as the bot 
reported it.
diff mbox series

Patch

diff --git a/include/net/tcp.h b/include/net/tcp.h
index e9b37b76e894..f26e92099b86 100644
--- a/include/net/tcp.h
+++ b/include/net/tcp.h
@@ -2582,7 +2582,7 @@  struct tcp_ulp_ops {
 };
 int tcp_register_ulp(struct tcp_ulp_ops *type);
 void tcp_unregister_ulp(struct tcp_ulp_ops *type);
-int tcp_set_ulp(struct sock *sk, const char *name);
+int tcp_set_ulp(struct sock *sk, const char *name, bool load);
 void tcp_get_available_ulp(char *buf, size_t len);
 void tcp_cleanup_ulp(struct sock *sk);
 void tcp_update_ulp(struct sock *sk, struct proto *p,
diff --git a/net/core/filter.c b/net/core/filter.c
index 713d6f454df3..bdb5c43d6fb0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -5380,6 +5380,7 @@  static int sol_tcp_sockopt(struct sock *sk, int optname,
 	case TCP_CONGESTION:
 		return sol_tcp_sockopt_congestion(sk, optval, optlen, getopt);
 	case TCP_SAVED_SYN:
+	case TCP_ULP:
 		if (*optlen < 1)
 			return -EINVAL;
 		break;
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 0d704bda6c41..88ccd0e211f9 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -3744,7 +3744,7 @@  int do_tcp_setsockopt(struct sock *sk, int level, int optname,
 		name[val] = 0;
 
 		sockopt_lock_sock(sk);
-		err = tcp_set_ulp(sk, name);
+		err = tcp_set_ulp(sk, name, !has_current_bpf_ctx());
 		sockopt_release_sock(sk);
 		return err;
 	}
diff --git a/net/ipv4/tcp_ulp.c b/net/ipv4/tcp_ulp.c
index 2aa442128630..c1c39dbef417 100644
--- a/net/ipv4/tcp_ulp.c
+++ b/net/ipv4/tcp_ulp.c
@@ -33,10 +33,7 @@  static struct tcp_ulp_ops *tcp_ulp_find(const char *name)
 
 static const struct tcp_ulp_ops *__tcp_ulp_find_autoload(const char *name)
 {
-	const struct tcp_ulp_ops *ulp = NULL;
-
-	rcu_read_lock();
-	ulp = tcp_ulp_find(name);
+	const struct tcp_ulp_ops *ulp = tcp_ulp_find(name);
 
 #ifdef CONFIG_MODULES
 	if (!ulp && capable(CAP_NET_ADMIN)) {
@@ -46,10 +43,6 @@  static const struct tcp_ulp_ops *__tcp_ulp_find_autoload(const char *name)
 		ulp = tcp_ulp_find(name);
 	}
 #endif
-	if (!ulp || !try_module_get(ulp->owner))
-		ulp = NULL;
-
-	rcu_read_unlock();
 	return ulp;
 }
 
@@ -154,15 +147,24 @@  static int __tcp_set_ulp(struct sock *sk, const struct tcp_ulp_ops *ulp_ops)
 	return err;
 }
 
-int tcp_set_ulp(struct sock *sk, const char *name)
+int tcp_set_ulp(struct sock *sk, const char *name, bool load)
 {
 	const struct tcp_ulp_ops *ulp_ops;
+	int err = 0;
 
 	sock_owned_by_me(sk);
 
-	ulp_ops = __tcp_ulp_find_autoload(name);
-	if (!ulp_ops)
-		return -ENOENT;
+	rcu_read_lock();
+	if (!load)
+		ulp_ops = tcp_ulp_find(name);
+	else
+		ulp_ops = __tcp_ulp_find_autoload(name);
+
+	if (!ulp_ops || !try_module_get(ulp_ops->owner))
+		err = -ENOENT;
+	else
+		err = __tcp_set_ulp(sk, ulp_ops);
 
-	return __tcp_set_ulp(sk, ulp_ops);
+	rcu_read_unlock();
+	return err;
 }
diff --git a/net/mptcp/subflow.c b/net/mptcp/subflow.c
index fd021cf8286e..fb936d280b83 100644
--- a/net/mptcp/subflow.c
+++ b/net/mptcp/subflow.c
@@ -1776,7 +1776,7 @@  int mptcp_subflow_create_socket(struct sock *sk, unsigned short family,
 	sf->sk->sk_net_refcnt = 1;
 	get_net_track(net, &sf->sk->ns_tracker, GFP_KERNEL);
 	sock_inuse_add(net, 1);
-	err = tcp_set_ulp(sf->sk, "mptcp");
+	err = tcp_set_ulp(sf->sk, "mptcp", true);
 	if (err)
 		goto err_free;