Message ID | 20250407-tcpsendmsg-v2-2-9f0ea843ef99@debian.org (mailing list archive) |
---|---|
State | Handled Elsewhere |
Headers | show |
Series | trace: add tracepoint to tcp_sendmsg_locked | expand |
From: Breno Leitao <leitao@debian.org> Date: Mon, 07 Apr 2025 06:40:44 -0700 > Add a tracepoint to monitor TCP send operations, enabling detailed > visibility into TCP message transmission. > > Create a new tracepoint within the tcp_sendmsg_locked function, > capturing traditional fields along with size_goal, which indicates the > optimal data size for a single TCP segment. Additionally, a reference to > the struct sock sk is passed, allowing direct access for BPF programs. > The implementation is largely based on David's patch and suggestions. > > The implementation is largely based on David's patch[1] and suggestions. nit: duplicate sentences. > > Link: https://lore.kernel.org/all/70168c8f-bf52-4279-b4c4-be64527aa1ac@kernel.org/ [1] > Signed-off-by: Breno Leitao <leitao@debian.org> > --- > include/trace/events/tcp.h | 24 ++++++++++++++++++++++++ > net/ipv4/tcp.c | 2 ++ > 2 files changed, 26 insertions(+) > > diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h > index 1a40c41ff8c30..cab25504c4f9d 100644 > --- a/include/trace/events/tcp.h > +++ b/include/trace/events/tcp.h > @@ -259,6 +259,30 @@ TRACE_EVENT(tcp_retransmit_synack, > __entry->saddr_v6, __entry->daddr_v6) > ); > > +TRACE_EVENT(tcp_sendmsg_locked, > + TP_PROTO(const struct sock *sk, const struct msghdr *msg, > + const struct sk_buff *skb, int size_goal), > + > + TP_ARGS(sk, msg, skb, size_goal), > + > + TP_STRUCT__entry( > + __field(const void *, skb_addr) > + __field(int, skb_len) > + __field(int, msg_left) > + __field(int, size_goal) > + ), > + > + TP_fast_assign( > + __entry->skb_addr = skb; > + __entry->skb_len = skb ? skb->len : 0; > + __entry->msg_left = msg_data_left(msg); > + __entry->size_goal = size_goal; > + ), > + > + TP_printk("skb_addr %p skb_len %d msg_left %d size_goal %d", > + __entry->skb_addr, __entry->skb_len, __entry->msg_left, > + __entry->size_goal)); > + > DECLARE_TRACE(tcp_cwnd_reduction_tp, > TP_PROTO(const struct sock *sk, int newly_acked_sacked, > int newly_lost, int flag), > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index ea8de00f669d0..270ce2c8c2d54 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -1160,6 +1160,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) > if (skb) > copy = size_goal - skb->len; > > + trace_tcp_sendmsg_locked(sk, msg, skb, size_goal); skb could be NULL, so I think raw_tp_null_args[] needs to be updated. Maybe try attaching a bpf prog that dereferences skb unconditionally and see if the bpf verifier rejects it. See this commit for the similar issue: commit 5da7e15fb5a12e78de974d8908f348e279922ce9 Author: Kuniyuki Iwashima <kuniyu@amazon.com> Date: Fri Jan 31 19:01:42 2025 -0800 net: Add rx_skb of kfree_skb to raw_tp_null_args[]. > + > if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) { > bool first_skb; > > > -- > 2.47.1
From: Breno Leitao <leitao@debian.org> Date: Mon, 07 Apr 2025 06:40:44 -0700 > + TP_printk("skb_addr %p skb_len %d msg_left %d size_goal %d", > + __entry->skb_addr, __entry->skb_len, __entry->msg_left, > + __entry->size_goal)); Also could you align these two lines as other events ?
Hello Kuniyuki, On Mon, Apr 07, 2025 at 06:00:35PM -0700, Kuniyuki Iwashima wrote: > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index ea8de00f669d0..270ce2c8c2d54 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -1160,6 +1160,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) > > if (skb) > > copy = size_goal - skb->len; > > > > + trace_tcp_sendmsg_locked(sk, msg, skb, size_goal); > > skb could be NULL, so I think raw_tp_null_args[] needs to be updated. > > Maybe try attaching a bpf prog that dereferences skb unconditionally > and see if the bpf verifier rejects it. I've been trying to dereference skb (while 0), and bpf verifier rejects it. Here is the code I wrote to test: SEC("tracepoint/tcp/tcp_sendmsg_locked") int bpf_tcp_sendmsg_locked(struct tcp_sendmsg_locked_args *ctx) { bpf_printk("TCP: skb_addr %p skb_len %d msg_left %d size_goal %d", ctx->skb_addr, ctx->skb_len, ctx->msg_left, ctx->size_goal); return 0; } And it matches the tracepoint, but, trying to dereference skb_addr fails in all forms. I tried with a proper dereference, or, something as simple as the following, and the program is not loaded. bpf_printk("deref %d\n", *(int *) ctx->skb_addr); Here is all it returns. libbpf: prog 'bpf_tcp_sendmsg_locked': BPF program load failed: Permission denied libbpf: prog 'bpf_tcp_sendmsg_locked': -- BEGIN PROG LOAD LOG -- 0: R1=ctx() R10=fp0 ; int bpf_tcp_sendmsg_locked(struct tcp_sendmsg_locked_args *ctx) @ tcp_sendmsg_locked_bpf.c:16 0: (bf) r6 = r1 ; R1=ctx() R6_w=ctx() ; bpf_printk("TCP: skb_addr %p skb_len %d msg_left %d size_goal %d", @ tcp_sendmsg_locked_bpf.c:19 1: (79) r1 = *(u64 *)(r6 +8) ; R1_w=scalar() R6_w=ctx() 2: (7b) *(u64 *)(r10 -32) = r1 ; R1_w=scalar(id=1) R10=fp0 fp-32_w=scalar(id=1) 3: (61) r1 = *(u32 *)(r6 +16) ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=ctx() 4: (67) r1 <<= 32 ; R1_w=scalar(smax=0x7fffffff00000000,umax=0xffffffff00000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xffffffff00000000)) 5: (c7) r1 s>>= 32 ; R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff) 6: (7b) *(u64 *)(r10 -24) = r1 ; R1_w=scalar(id=2,smin=0xffffffff80000000,smax=0x7fffffff) R10=fp0 fp-24_w=scalar(id=2,smin=0xffffffff80000000,smax=0x7fffffff) 7: (61) r1 = *(u32 *)(r6 +20) ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=ctx() 8: (67) r1 <<= 32 ; R1_w=scalar(smax=0x7fffffff00000000,umax=0xffffffff00000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xffffffff00000000)) 9: (c7) r1 s>>= 32 ; R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff) 10: (7b) *(u64 *)(r10 -16) = r1 ; R1_w=scalar(id=3,smin=0xffffffff80000000,smax=0x7fffffff) R10=fp0 fp-16_w=scalar(id=3,smin=0xffffffff80000000,smax=0x7fffffff) 11: (61) r1 = *(u32 *)(r6 +24) ; R1_w=scalar(smin=0,smax=umax=0xffffffff,var_off=(0x0; 0xffffffff)) R6_w=ctx() 12: (67) r1 <<= 32 ; R1_w=scalar(smax=0x7fffffff00000000,umax=0xffffffff00000000,smin32=0,smax32=umax32=0,var_off=(0x0; 0xffffffff00000000)) 13: (c7) r1 s>>= 32 ; R1_w=scalar(smin=0xffffffff80000000,smax=0x7fffffff) 14: (7b) *(u64 *)(r10 -8) = r1 ; R1_w=scalar(id=4,smin=0xffffffff80000000,smax=0x7fffffff) R10=fp0 fp-8_w=scalar(id=4,smin=0xffffffff80000000,smax=0x7fffffff) 15: (bf) r3 = r10 ; R3_w=fp0 R10=fp0 16: (07) r3 += -32 ; R3_w=fp-32 17: (18) r1 = 0xff1100010965cdd8 ; R1_w=map_value(map=tcp_send.rodata,ks=4,vs=63) 19: (b7) r2 = 53 ; R2_w=53 20: (b7) r4 = 32 ; R4_w=32 21: (85) call bpf_trace_vprintk#177 ; R0_w=scalar() ; bpf_printk("deref %d\n", *(int *) ctx->skb_addr); @ tcp_sendmsg_locked_bpf.c:22 22: (79) r1 = *(u64 *)(r6 +8) ; R1_w=scalar() R6_w=ctx() 23: (61) r3 = *(u32 *)(r1 +0) R1 invalid mem access 'scalar' processed 23 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'bpf_tcp_sendmsg_locked': failed to load: -13 libbpf: failed to load object 'tcp_sendmsg_locked_bpf.o' Failed to load BPF object: -13 I've pushed this example to the following URL, if you want to experiment as well: https://github.com/leitao/debug/blob/main/bpf/tracepoint/tcp_sendmsg_locked_bpf.c > See this commit for the similar issue: > > commit 5da7e15fb5a12e78de974d8908f348e279922ce9 > Author: Kuniyuki Iwashima <kuniyu@amazon.com> > Date: Fri Jan 31 19:01:42 2025 -0800 > > net: Add rx_skb of kfree_skb to raw_tp_null_args[]. > Thanks for the heads-up. I can populate raw_tp_null_args with this new tracepoint function, if that is the right thing to do, even without being able to reproduce the issue above. Thanks for the review, --breno
On 4/8/25 8:27 AM, Breno Leitao wrote: > > SEC("tracepoint/tcp/tcp_sendmsg_locked") Try `raw_tracepoint/tcp/tcp_sendmsg_locked`. This is the form I use for my tracepoint based packet capture (not tied to this tracepoint, but traces inside our driver) and it works fine. As suggested, you might need to update raw_tp_null_args
On Tue, Apr 08, 2025 at 09:16:51AM -0600, David Ahern wrote: > On 4/8/25 8:27 AM, Breno Leitao wrote: > > > > SEC("tracepoint/tcp/tcp_sendmsg_locked") > > Try `raw_tracepoint/tcp/tcp_sendmsg_locked`. > > This is the form I use for my tracepoint based packet capture (not tied > to this tracepoint, but traces inside our driver) and it works fine. Thanks. I was not able to get this crashing as well. In fact, the following program fails to be loaded: SEC("raw_tracepoint/tcp/tcp_sendmsg_locked") int bpf_tcp_sendmsg_locked(struct bpf_raw_tracepoint_args *ctx) { void *skb_addr = (void *) ctx->args[0]; bpf_printk("deref %d\n", *(int *) skb_addr); return 0; } libbpf refuses to load it, and drumps: libbpf: prog 'bpf_tcp_sendmsg_locked': BPF program load failed: Permission denied libbpf: prog 'bpf_tcp_sendmsg_locked': -- BEGIN PROG LOAD LOG -- 0: R1=ctx() R10=fp0 ; void *skb_addr = (void *) ctx->args[0]; @ tcp_sendmsg_locked_bpf.c:18 0: (79) r1 = *(u64 *)(r1 +0) ; R1_w=scalar() ; bpf_printk("deref %d\n", *(int *) skb_addr); @ tcp_sendmsg_locked_bpf.c:20 1: (61) r3 = *(u32 *)(r1 +0) R1 invalid mem access 'scalar' processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 -- END PROG LOAD LOG -- libbpf: prog 'bpf_tcp_sendmsg_locked': failed to load: -13 libbpf: failed to load object 'tcp_sendmsg_locked_bpf.o' Failed to load BPF object: -13 > As suggested, you might need to update raw_tp_null_args Thanks for confirming it. I will update raw_tp_null_args, assuming that the problem exists but I am failing to reproduce it. I will send an updated version soon. Thanks --breno
From: Breno Leitao <leitao@debian.org> Date: Tue, 8 Apr 2025 10:01:05 -0700 > On Tue, Apr 08, 2025 at 09:16:51AM -0600, David Ahern wrote: > > On 4/8/25 8:27 AM, Breno Leitao wrote: > > > > > > SEC("tracepoint/tcp/tcp_sendmsg_locked") > > > > Try `raw_tracepoint/tcp/tcp_sendmsg_locked`. > > > > This is the form I use for my tracepoint based packet capture (not tied > > to this tracepoint, but traces inside our driver) and it works fine. > > Thanks. I was not able to get this crashing as well. In fact, the > following program fails to be loaded: > > SEC("raw_tracepoint/tcp/tcp_sendmsg_locked") Try SEC("tp_btf/tcp_sendmsg_locked") and access the raw argument (struct sk_buff *skb) instead of bpf_raw_tracepoint_args. The original report used it and I was able to reproduce it at that time. https://lore.kernel.org/netdev/Z50zebTRzI962e6X@debian.debian/ > int bpf_tcp_sendmsg_locked(struct bpf_raw_tracepoint_args *ctx) > { > void *skb_addr = (void *) ctx->args[0]; > > bpf_printk("deref %d\n", *(int *) skb_addr); > > return 0; > } > > libbpf refuses to load it, and drumps: > > libbpf: prog 'bpf_tcp_sendmsg_locked': BPF program load failed: Permission denied > libbpf: prog 'bpf_tcp_sendmsg_locked': -- BEGIN PROG LOAD LOG -- > 0: R1=ctx() R10=fp0 > ; void *skb_addr = (void *) ctx->args[0]; @ tcp_sendmsg_locked_bpf.c:18 > 0: (79) r1 = *(u64 *)(r1 +0) ; R1_w=scalar() > ; bpf_printk("deref %d\n", *(int *) skb_addr); @ tcp_sendmsg_locked_bpf.c:20 > 1: (61) r3 = *(u32 *)(r1 +0) > R1 invalid mem access 'scalar' > processed 2 insns (limit 1000000) max_states_per_insn 0 total_states 0 peak_states 0 mark_read 0 > -- END PROG LOAD LOG -- > libbpf: prog 'bpf_tcp_sendmsg_locked': failed to load: -13 > libbpf: failed to load object 'tcp_sendmsg_locked_bpf.o' > Failed to load BPF object: -13 > > > As suggested, you might need to update raw_tp_null_args > > Thanks for confirming it. I will update raw_tp_null_args, assuming that > the problem exists but I am failing to reproduce it. > > I will send an updated version soon.
On Tue, Apr 08, 2025 at 10:12:14AM -0700, Kuniyuki Iwashima wrote: > From: Breno Leitao <leitao@debian.org> > Date: Tue, 8 Apr 2025 10:01:05 -0700 > > On Tue, Apr 08, 2025 at 09:16:51AM -0600, David Ahern wrote: > > > On 4/8/25 8:27 AM, Breno Leitao wrote: > > > > > > > > SEC("tracepoint/tcp/tcp_sendmsg_locked") > > > > > > Try `raw_tracepoint/tcp/tcp_sendmsg_locked`. > > > > > > This is the form I use for my tracepoint based packet capture (not tied > > > to this tracepoint, but traces inside our driver) and it works fine. > > > > Thanks. I was not able to get this crashing as well. In fact, the > > following program fails to be loaded: > > > > SEC("raw_tracepoint/tcp/tcp_sendmsg_locked") > > Try SEC("tp_btf/tcp_sendmsg_locked") and access the raw argument > (struct sk_buff *skb) instead of bpf_raw_tracepoint_args. Nice, I was able to crash the host, with the following code: SEC("tp_btf/tcp_sendmsg_locked") int BPF_PROG(tcp_sendmsg_locked, struct sock *sk, struct msghdr *msg, struct sk_buff *skb, int size_goal) { bpf_printk("skb->len %d\n", skb->len); return 0; } This is the unusually expected stacktrace. :-) BUG: kernel NULL pointer dereference, address: 0000000000000070 #PF: supervisor read access in kernel mode "virtme-ng" 11:03 08-Apr-25 #PF: error_code(0x0000) - not-present page PGD 10ca78067 P4D 0 Oops: Oops: 0000 [#1] SMP DEBUG_PAGEALLOC NOPTI CPU: 13 UID: 0 PID: 1020 Comm: nc Tainted: G E N 6.14.0-upstream-05880-g14fbb7a1a500 #73 PREEMPT(undef) Tainted: [E]=UNSIGNED_MODULE, [N]=TEST Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS rel-1.16.3-0-ga6ed6b701f0a-prebuilt.qemu.org 04/01/2014 RIP: 0010:bpf_prog_5b31430a4390397c_tcp_sendmsg_locked+0x18/0x37 Code: cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc cc f3 0f 1e fa 0f 1f 44 00 00 0f 1f 00 55 48 89 e5 f3 0f 1e fa 48 8b 7f 10 <8b> 57 70 48 bf d8 d9 03 06 01 00 11 ff be 0d 00 00 00 e8 15 f4 4c RSP: 0018:ffa0000003c03bd0 EFLAGS: 00010282 RAX: 5aab7562e1de3200 RBX: ffa0000003be4000 RCX: 0000000000000018 RDX: 0000000000000000 RSI: ffa0000003be4048 RDI: 0000000000000000 RBP: ffa0000003c03bd0 R08: 000000000006043d R09: ffffffffffffffff R10: 0000000000000000 R11: ffffffffa000096c R12: ff11000104ae5b00 R13: ff1100010610a3c0 R14: ffffffff814d34ef R15: 0000000000000000 FS: 00007fd67d550740(0000) GS:ff110005a40a9000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 0000000000000070 CR3: 000000010d9ec002 CR4: 0000000000771ef0 PKRU: 55555554 Call Trace: <TASK> ? __die_body+0xaf/0xc0 ? page_fault_oops+0x35b/0x3c0 ? do_user_addr_fault+0x6d4/0x730 ? srso_alias_return_thunk+0x5/0xfbef5 ? exc_page_fault+0x5f/0xe0 ? asm_exc_page_fault+0x26/0x30 ? bpf_trace_run4+0xbf/0x240 ? 0xffffffffa000096c ? bpf_prog_5b31430a4390397c_tcp_sendmsg_locked+0x18/0x37 bpf_trace_run4+0x14c/0x240 ? trace_event_raw_event_tcp_sendmsg_locked+0xc3/0xf0 __traceiter_tcp_sendmsg_locked+0x44/0x60 tcp_sendmsg_locked+0x10c8/0x15b0 ? __local_bh_enable_ip+0x166/0x1c0 ? srso_alias_return_thunk+0x5/0xfbef5 tcp_sendmsg+0x2c/0x50 ? __pfx_inet6_sendmsg+0x10/0x10 sock_sendmsg_nosec+0xa0/0x100 __sys_sendto+0x1b4/0x1f0 __x64_sys_sendto+0x26/0x30 do_syscall_64+0x83/0x170 entry_SYSCALL_64_after_hwframe+0x76/0x7e
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h index 1a40c41ff8c30..cab25504c4f9d 100644 --- a/include/trace/events/tcp.h +++ b/include/trace/events/tcp.h @@ -259,6 +259,30 @@ TRACE_EVENT(tcp_retransmit_synack, __entry->saddr_v6, __entry->daddr_v6) ); +TRACE_EVENT(tcp_sendmsg_locked, + TP_PROTO(const struct sock *sk, const struct msghdr *msg, + const struct sk_buff *skb, int size_goal), + + TP_ARGS(sk, msg, skb, size_goal), + + TP_STRUCT__entry( + __field(const void *, skb_addr) + __field(int, skb_len) + __field(int, msg_left) + __field(int, size_goal) + ), + + TP_fast_assign( + __entry->skb_addr = skb; + __entry->skb_len = skb ? skb->len : 0; + __entry->msg_left = msg_data_left(msg); + __entry->size_goal = size_goal; + ), + + TP_printk("skb_addr %p skb_len %d msg_left %d size_goal %d", + __entry->skb_addr, __entry->skb_len, __entry->msg_left, + __entry->size_goal)); + DECLARE_TRACE(tcp_cwnd_reduction_tp, TP_PROTO(const struct sock *sk, int newly_acked_sacked, int newly_lost, int flag), diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index ea8de00f669d0..270ce2c8c2d54 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -1160,6 +1160,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size) if (skb) copy = size_goal - skb->len; + trace_tcp_sendmsg_locked(sk, msg, skb, size_goal); + if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) { bool first_skb;
Add a tracepoint to monitor TCP send operations, enabling detailed visibility into TCP message transmission. Create a new tracepoint within the tcp_sendmsg_locked function, capturing traditional fields along with size_goal, which indicates the optimal data size for a single TCP segment. Additionally, a reference to the struct sock sk is passed, allowing direct access for BPF programs. The implementation is largely based on David's patch and suggestions. The implementation is largely based on David's patch[1] and suggestions. Link: https://lore.kernel.org/all/70168c8f-bf52-4279-b4c4-be64527aa1ac@kernel.org/ [1] Signed-off-by: Breno Leitao <leitao@debian.org> --- include/trace/events/tcp.h | 24 ++++++++++++++++++++++++ net/ipv4/tcp.c | 2 ++ 2 files changed, 26 insertions(+)