| Message ID | 20250224-tcpsendmsg-v1-1-bac043c59cc8@debian.org (mailing list archive) |
|---|---|
| State | Handled Elsewhere |
| Series | [net-next] trace: tcp: Add tracepoint for tcp_sendmsg() |
On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>
> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling the tracing of TCP messages being sent.
>
> Meta has been using BPF programs to monitor this function for years, indicating significant interest in observing this important functionality. Adding a proper tracepoint provides a stable API for all users who need visibility into TCP message transmission.
>
> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid creating unnecessary trace event infrastructure and tracefs exports, keeping the implementation minimal while stabilizing the API.
>
> Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as:
>
> rawtracepoint:tcp_sendmsg_tp {
>     ....
> }

I would expect tcp_sendmsg() being stable enough ?

kprobe:tcp_sendmsg {
}
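A minimal bpftrace sketch of the kprobe approach suggested here might look like the following. The argument layout assumes the tcp_sendmsg(sk, msg, size) signature quoted later in the thread; the printf format is an arbitrary illustration:

    kprobe:tcp_sendmsg
    {
        // Positional args mirror the function parameters:
        // arg0 = struct sock *sk, arg1 = struct msghdr *msg, arg2 = size_t size
        printf("%s pid=%d size=%d\n", comm, pid, arg2);
    }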
> ________________________________________
>
> On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling the tracing of TCP messages being sent.
>>
>> Meta has been using BPF programs to monitor this function for years, indicating significant interest in observing this important functionality. Adding a proper tracepoint provides a stable API for all users who need visibility into TCP message transmission.
>>
>> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid creating unnecessary trace event infrastructure and tracefs exports, keeping the implementation minimal while stabilizing the API.
>>
>> Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as:
>>
>> rawtracepoint:tcp_sendmsg_tp {
>>     ....
>> }
>
> I would expect tcp_sendmsg() being stable enough ?
>
> kprobe:tcp_sendmsg {
> }

In LTO mode, tcp_sendmsg could be inlined cross files. For example,

net/ipv4/tcp.c:
    int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
net/ipv4/tcp_bpf.c:
    ...
    return tcp_sendmsg(sk, msg, size);
net/ipv6/af_inet6.c:
    ...
    return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)

And this does happen in our production environment.
On 2/24/25 12:03 PM, Eric Dumazet wrote:
> On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling the tracing of TCP messages being sent.
>>
>> Meta has been using BPF programs to monitor this function for years, indicating significant interest in observing this important functionality. Adding a proper tracepoint provides a stable API for all users who need visibility into TCP message transmission.
>>
>> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid creating unnecessary trace event infrastructure and tracefs exports, keeping the implementation minimal while stabilizing the API.
>>
>> Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as:
>>
>> rawtracepoint:tcp_sendmsg_tp {
>>     ....
>> }
>
> I would expect tcp_sendmsg() being stable enough ?
>
> kprobe:tcp_sendmsg {
> }

Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover more use cases (see kernel references to it).

We have a patch for a couple years now with a tracepoint inside the

    while (msg_data_left(msg)) {
    }

loop which is more useful than just entry to sendmsg.
On Mon, Feb 24, 2025 at 8:13 PM Yonghong Song <yhs@meta.com> wrote:
>
> > ________________________________________
> >
> > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> >>
> >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling the tracing of TCP messages being sent.
> >>
> >> Meta has been using BPF programs to monitor this function for years, indicating significant interest in observing this important functionality. Adding a proper tracepoint provides a stable API for all users who need visibility into TCP message transmission.
> >>
> >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid creating unnecessary trace event infrastructure and tracefs exports, keeping the implementation minimal while stabilizing the API.
> >>
> >> Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as:
> >>
> >> rawtracepoint:tcp_sendmsg_tp {
> >>     ....
> >> }
> >
> > I would expect tcp_sendmsg() being stable enough ?
> >
> > kprobe:tcp_sendmsg {
> > }
>
> In LTO mode, tcp_sendmsg could be inlined cross files. For example,
>
> net/ipv4/tcp.c:
>     int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> net/ipv4/tcp_bpf.c:
>     ...
>     return tcp_sendmsg(sk, msg, size);
> net/ipv6/af_inet6.c:
>     ...
>     return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
>
> And this does happen in our production environment.

And we do not have a way to make the kprobe work even if LTO decided to inline a function ?

This seems like a tracing or LTO issue, this could be addressed there in a generic way and avoid many other patches to work around this.
Hello Eric,

On Mon, Feb 24, 2025 at 08:23:00PM +0100, Eric Dumazet wrote:
> On Mon, Feb 24, 2025 at 8:13 PM Yonghong Song <yhs@meta.com> wrote:
> >
> > > ________________________________________
> > >
> > > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> > >>
> > >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling the tracing of TCP messages being sent.
> > >>
> > >> Meta has been using BPF programs to monitor this function for years, indicating significant interest in observing this important functionality. Adding a proper tracepoint provides a stable API for all users who need visibility into TCP message transmission.
> > >>
> > >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid creating unnecessary trace event infrastructure and tracefs exports, keeping the implementation minimal while stabilizing the API.
> > >>
> > >> Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as:
> > >>
> > >> rawtracepoint:tcp_sendmsg_tp {
> > >>     ....
> > >> }
> > >
> > > I would expect tcp_sendmsg() being stable enough ?
> > >
> > > kprobe:tcp_sendmsg {
> > > }
> >
> > In LTO mode, tcp_sendmsg could be inlined cross files. For example,
> >
> > net/ipv4/tcp.c:
> >     int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > net/ipv4/tcp_bpf.c:
> >     ...
> >     return tcp_sendmsg(sk, msg, size);
> > net/ipv6/af_inet6.c:
> >     ...
> >     return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
> >
> > And this does happen in our production environment.
>
> And we do not have a way to make the kprobe work even if LTO decided to inline a function ?
>
> This seems like a tracing or LTO issue, this could be addressed there in a generic way and avoid many other patches to work around this.

I understand that the {raw}tracepoint ensures the compiler cannot interfere with these hook points. For everything else, we rely on the hope that the compiler behaves favorably, which is far from ideal.
Hello David,

On Mon, Feb 24, 2025 at 12:16:04PM -0700, David Ahern wrote:
> On 2/24/25 12:03 PM, Eric Dumazet wrote:
> > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> >>
> >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling the tracing of TCP messages being sent.
> >>
> >> Meta has been using BPF programs to monitor this function for years, indicating significant interest in observing this important functionality. Adding a proper tracepoint provides a stable API for all users who need visibility into TCP message transmission.
> >>
> >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid creating unnecessary trace event infrastructure and tracefs exports, keeping the implementation minimal while stabilizing the API.
> >>
> >> Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as:
> >>
> >> rawtracepoint:tcp_sendmsg_tp {
> >>     ....
> >> }
> >
> > I would expect tcp_sendmsg() being stable enough ?
> >
> > kprobe:tcp_sendmsg {
> > }
>
> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
> more use cases (see kernel references to it).

Agree, this seems to provide more useful information

> We have a patch for a couple years now with a tracepoint inside the

Sorry, where do you have this patch? is it downstream?

> while (msg_data_left(msg)) {
> }
>
> loop which is more useful than just entry to sendmsg.

Do you mean something like the following?

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1a40c41ff8c30..23318e252d6b9 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,11 @@ TRACE_EVENT(tcp_retransmit_synack,
 		  __entry->saddr_v6, __entry->daddr_v6)
 );
 
+DECLARE_TRACE(tcp_sendmsg_tp,
+	TP_PROTO(const struct sock *sk, const struct msghdr *msg, size_t size, ssize_t copied),
+	TP_ARGS(sk, msg, size, copied)
+);
+
 DECLARE_TRACE(tcp_cwnd_reduction_tp,
 	TP_PROTO(const struct sock *sk, int newly_acked_sacked,
 		 int newly_lost, int flag),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 08d73f17e8162..5fcef82275d4a 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1290,6 +1290,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 			sk_mem_charge(sk, copy);
 		}
 
+		trace_tcp_sendmsg_tp(sk, msg, size, copy);
+
 		if (!copied)
 			TCP_SKB_CB(skb)->tcp_flags &= ~TCPHDR_PSH;
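For illustration, a bpftrace program attached to the raw tracepoint sketched in the diff above might look like the following. The argument order is taken from that TP_PROTO (sk, msg, size, copied); the output format is an arbitrary choice:

    rawtracepoint:tcp_sendmsg_tp
    {
        // Raw tracepoint args follow the TP_PROTO order in the diff above:
        // arg0 = sk, arg1 = msg, arg2 = size, arg3 = copied
        printf("%s pid=%d size=%d copied=%d\n", comm, pid, arg2, arg3);
    }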
On 2/26/25 9:10 AM, Breno Leitao wrote:
>> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
>> more use cases (see kernel references to it).
>
> Agree, this seems to provide more useful information
>
>> We have a patch for a couple years now with a tracepoint inside the
>
> Sorry, where do you have this patch? is it downstream?

company tree. Attached. Where to put tracepoints and what arguments to supply so that it is beneficial to multiple users is always a touchy subject :-), so I have not tried to push the patch out. sock arg should be added to it for example.

The key is to see how tcp_sendmsg_locked breaks up the buffers, and then another one in tcp_write_xmit to see when the actual push out happens. At the time I was looking at latency in the stack - from sendmsg call to driver pushing descriptors to hardware.

From 2298ca66c15bae6a95698abd8d029b9271fbefa3 Mon Sep 17 00:00:00 2001
From: David Ahern <dsahern@gmail.com>
Date: Fri, 24 Mar 2023 22:07:53 +0000
Subject: [PATCH] tcp: Add tracepoints to tcp_write_xmit and tcp_sendmsg_locked

Signed-off-by: David Ahern <dsahern@gmail.com>
---
 include/trace/events/tcp.h | 57 ++++++++++++++++++++++++++++++++++++++
 net/ipv4/tcp.c             |  4 +++
 net/ipv4/tcp_output.c      |  1 +
 3 files changed, 62 insertions(+)

diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 901b440238d5..6b19e1d4d79d 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -187,6 +187,63 @@ DEFINE_EVENT(tcp_event_sk, tcp_rcv_space_adjust,
 	TP_ARGS(sk)
 );
 
+TRACE_EVENT(tcp_sendmsg_locked,
+
+	TP_PROTO(struct msghdr *msg, struct sk_buff *skb, int size_goal),
+
+	TP_ARGS(msg, skb, size_goal),
+
+	TP_STRUCT__entry(
+		__field(__u64, skb)
+		__field(int, skb_len)
+		__field(int, msg_left)
+		__field(int, size_goal)
+	),
+
+	TP_fast_assign(
+		__entry->skb = (__u64)skb;
+		__entry->skb_len = skb ? skb->len : 0;
+		__entry->msg_left = msg_data_left(msg);
+		__entry->size_goal = size_goal;
+	),
+
+	TP_printk("skb %llx len %d msg_left %d size_goal %d",
+		  __entry->skb, __entry->skb_len,
+		  __entry->msg_left, __entry->size_goal)
+);
+
+TRACE_EVENT(tcp_write_xmit,
+
+	TP_PROTO(struct sock *sk, unsigned int mss_now, int nonagle, u32 max_segs),
+
+	TP_ARGS(sk, mss_now, nonagle, max_segs),
+
+	TP_STRUCT__entry(
+		__field(__u64, tcp_wstamp_ns)
+		__field(__u64, tcp_clock_cache)
+		__field(unsigned int, mss_now)
+		__field(unsigned int, max_segs)
+		__field(int, nonagle)
+		__field(int, sk_pacing)
+	),
+
+	TP_fast_assign(
+		struct tcp_sock *tp = tcp_sk(sk);
+
+		__entry->mss_now = mss_now;
+		__entry->nonagle = nonagle;
+		__entry->max_segs = max_segs;
+		__entry->sk_pacing = tcp_needs_internal_pacing(sk);
+		__entry->tcp_wstamp_ns = tp->tcp_wstamp_ns;
+		__entry->tcp_clock_cache = tp->tcp_clock_cache;
+	),
+
+	TP_printk("mss %u segs %u nonagle %d sk_pacing %d tcp_wstamp_ns %lld tcp_clock_cache %lld",
+		  __entry->mss_now, __entry->max_segs,
+		  __entry->nonagle, __entry->sk_pacing,
+		  __entry->tcp_wstamp_ns, __entry->tcp_clock_cache)
+);
+
 TRACE_EVENT(tcp_retransmit_synack,
 
 	TP_PROTO(const struct sock *sk, const struct request_sock *req),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 6bb8eb803105..0fa2c8e035b7 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -279,6 +279,7 @@
 #include <linux/uaccess.h>
 #include <asm/ioctls.h>
 #include <net/busy_poll.h>
+#include <trace/events/tcp.h>
 
 /* Track pending CMSGs. */
 enum {
@@ -1312,6 +1313,8 @@ int tcp_sendmsg_locked(struct sock *sk, struct msghdr *msg, size_t size)
 		if (skb)
 			copy = size_goal - skb->len;
 
+		trace_tcp_sendmsg_locked(msg, skb, size_goal);
+
 		if (copy <= 0 || !tcp_skb_can_collapse_to(skb)) {
 			bool first_skb;
 
diff --git a/net/ipv4/tcp_output.c b/net/ipv4/tcp_output.c
index 74190518708a..c84f18cd9b7e 100644
--- a/net/ipv4/tcp_output.c
+++ b/net/ipv4/tcp_output.c
@@ -2623,6 +2623,7 @@ static bool tcp_write_xmit(struct sock *sk, unsigned int mss_now, int nonagle,
 	}
 
 	max_segs = tcp_tso_segs(sk, mss_now);
+	trace_tcp_write_xmit(sk, mss_now, nonagle, max_segs);
 
 	while ((skb = tcp_send_head(sk))) {
 		unsigned int limit;
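Because the attached patch uses TRACE_EVENT, the resulting events would also be reachable through tracefs and ordinary tracepoint tooling. As a rough sketch, assuming the patch applies as-is and the events land in the existing tcp trace system, bpftrace could consume them like this:

    tracepoint:tcp:tcp_sendmsg_locked
    {
        // Fields correspond to TP_STRUCT__entry in the patch above
        printf("%s skb_len=%d msg_left=%d size_goal=%d\n",
               comm, args->skb_len, args->msg_left, args->size_goal);
    }

    tracepoint:tcp:tcp_write_xmit
    {
        printf("%s mss=%d segs=%d nonagle=%d pacing=%d\n",
               comm, args->mss_now, args->max_segs, args->nonagle, args->sk_pacing);
    }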
Hello David,

On Wed, Feb 26, 2025 at 10:12:08AM -0700, David Ahern wrote:
> On 2/26/25 9:10 AM, Breno Leitao wrote:
> >> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
> >> more use cases (see kernel references to it).
> >
> > Agree, this seems to provide more useful information
> >
> >> We have a patch for a couple years now with a tracepoint inside the
> >
> > Sorry, where do you have this patch? is it downstream?
>
> company tree. Attached. Where to put tracepoints and what arguments to
> supply so that it is beneficial to multiple users is always a touchy
> subject :-)

Thanks. I would like to state that this would be useful for Meta as well.

Right now, we (Meta) are using nasty `noinline` attribute in tcp_sendmsg() in order to make the API stable, and this tracepoint would solve this problem avoiding the `noinline` hack. So, at least two type of users would benefit from it.

> so I have not tried to push the patch out. sock arg should
> be added to it for example.

True, if it becomes a tracepoint instead of a rawtracepoint, the sock arg might be useful.

How would you recommend me proceeding in this case?

Thanks
--breno
On Wed, Feb 26, 2025 at 7:18 PM Breno Leitao <leitao@debian.org> wrote:
>
> Hello David,
>
> On Wed, Feb 26, 2025 at 10:12:08AM -0700, David Ahern wrote:
> > On 2/26/25 9:10 AM, Breno Leitao wrote:
> > >> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
> > >> more use cases (see kernel references to it).
> > >
> > > Agree, this seems to provide more useful information
> > >
> > >> We have a patch for a couple years now with a tracepoint inside the
> > >
> > > Sorry, where do you have this patch? is it downstream?
> >
> > company tree. Attached. Where to put tracepoints and what arguments to
> > supply so that it is beneficial to multiple users is always a touchy
> > subject :-)
>
> Thanks. I would like to state that this would be useful for Meta as well.
>
> Right now, we (Meta) are using nasty `noinline` attribute in tcp_sendmsg() in order to make the API stable, and this tracepoint would solve this problem avoiding the `noinline` hack. So, at least two type of users would benefit from it.
>
> > so I have not tried to push the patch out. sock arg should
> > be added to it for example.
>
> True, if it becomes a tracepoint instead of a rawtracepoint, the sock arg might be useful.
>
> How would you recommend me proceeding in this case?

In 2022, Menglong Dong added __fix_address.

Then later, Yafang Shao added noinline_for_tracing.

Would one of them be sufficient?
On 2/26/25 11:27 AM, Eric Dumazet wrote:
> On Wed, Feb 26, 2025 at 7:18 PM Breno Leitao <leitao@debian.org> wrote:
>>
>> Hello David,
>>
>> On Wed, Feb 26, 2025 at 10:12:08AM -0700, David Ahern wrote:
>>> On 2/26/25 9:10 AM, Breno Leitao wrote:
>>>>> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
>>>>> more use cases (see kernel references to it).
>>>>
>>>> Agree, this seems to provide more useful information
>>>>
>>>>> We have a patch for a couple years now with a tracepoint inside the
>>>>
>>>> Sorry, where do you have this patch? is it downstream?
>>>
>>> company tree. Attached. Where to put tracepoints and what arguments to
>>> supply so that it is beneficial to multiple users is always a touchy
>>> subject :-)
>>
>> Thanks. I would like to state that this would be useful for Meta as well.
>>
>> Right now, we (Meta) are using nasty `noinline` attribute in tcp_sendmsg() in order to make the API stable, and this tracepoint would solve this problem avoiding the `noinline` hack. So, at least two type of users would benefit from it.
>>
>>> so I have not tried to push the patch out. sock arg should
>>> be added to it for example.
>>
>> True, if it becomes a tracepoint instead of a rawtracepoint, the sock arg might be useful.
>>
>> How would you recommend me proceeding in this case?
>
> In 2022, Menglong Dong added __fix_address.
>
> Then later, Yafang Shao added noinline_for_tracing.
>
> Would one of them be sufficient?

tcp_sendmsg_locked should not be getting inlined; it is the sendmsg_locked handler and directly called by several subsystems.

ie., moving the tracepoint to tcp_sendmsg_locked should solve the inline problem. From there, the question is inside the loop or at entry to the function. Inside the loop has been very helpful for me.
Hi David,

On Thu, Feb 27, 2025 at 1:14 AM David Ahern <dsahern@kernel.org> wrote:
>
> On 2/26/25 9:10 AM, Breno Leitao wrote:
> >> Also, if a tracepoint is added, inside of tcp_sendmsg_locked would cover
> >> more use cases (see kernel references to it).
> >
> > Agree, this seems to provide more useful information
> >
> >> We have a patch for a couple years now with a tracepoint inside the
> >
> > Sorry, where do you have this patch? is it downstream?
>
> company tree. Attached. Where to put tracepoints and what arguments to
> supply so that it is beneficial to multiple users is always a touchy

Right. I am always eager to establish a standard evaluation method that developers share a common sense about. It's really hard; I gave it a try before. Maintainers seem not to like to see too many tracepoints appearing in the stack.

> subject :-), so I have not tried to push the patch out. sock arg should
> be added to it for example.
>
> The key is to see how tcp_sendmsg_locked breaks up the buffers, and then
> another one in tcp_write_xmit to see when the actual push out happens.

Agreed on this point, because a fine-grained BPF program can take advantage of it. But it seems like another small topic, probably different from Breno's original motivation in this patch: I guess making tcp_sendmsg_locked non-inlined would allow a BPF program to calculate the delta between when tcp_sendmsg_locked starts and when it ends? I don't know. Probably, as Eric said, using noinline or something like this is simpler?

> At the time I was looking at latency in the stack - from sendmsg call to
> driver pushing descriptors to hardware.

So do I.

Thanks,
Jason
On Mon, 24 Feb 2025 20:23:00 +0100 Eric Dumazet <edumazet@google.com> wrote:
> On Mon, Feb 24, 2025 at 8:13 PM Yonghong Song <yhs@meta.com> wrote:
> >
> > > ________________________________________
> > >
> > > On Mon, Feb 24, 2025 at 7:24 PM Breno Leitao <leitao@debian.org> wrote:
> > >>
> > >> Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling the tracing of TCP messages being sent.
> > >>
> > >> Meta has been using BPF programs to monitor this function for years, indicating significant interest in observing this important functionality. Adding a proper tracepoint provides a stable API for all users who need visibility into TCP message transmission.
> > >>
> > >> The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid creating unnecessary trace event infrastructure and tracefs exports, keeping the implementation minimal while stabilizing the API.
> > >>
> > >> Given that this patch creates a rawtracepoint, you could hook into it using regular tooling, like bpftrace, using regular rawtracepoint infrastructure, such as:
> > >>
> > >> rawtracepoint:tcp_sendmsg_tp {
> > >>     ....
> > >> }
> > >
> > > I would expect tcp_sendmsg() being stable enough ?
> > >
> > > kprobe:tcp_sendmsg {
> > > }
> >
> > In LTO mode, tcp_sendmsg could be inlined cross files. For example,
> >
> > net/ipv4/tcp.c:
> >     int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
> > net/ipv4/tcp_bpf.c:
> >     ...
> >     return tcp_sendmsg(sk, msg, size);
> > net/ipv6/af_inet6.c:
> >     ...
> >     return INDIRECT_CALL_2(prot->sendmsg, tcp_sendmsg, udpv6_sendmsg, ...)
> >
> > And this does happen in our production environment.
>
> And we do not have a way to make the kprobe work even if LTO decided
> to inline a function ?

There is the `perf probe` command to set it up based on the debuginfo, but the function parameters are sometimes optimized out and no longer accessible. (You can try.)

Thus, if you need this tracepoint for more than temporary debugging, I would recommend using a tracepoint (or trace event).

Thank you,

> This seems like a tracing or LTO issue, this could be addressed there
> in a generic way
> and avoid many other patches to work around this.
On Wed, Feb 26, 2025 at 11:31:49AM -0700, David Ahern wrote:
> On 2/26/25 11:27 AM, Eric Dumazet wrote:
> > In 2022, Menglong Dong added __fix_address.
> >
> > Then later, Yafang Shao added noinline_for_tracing.
> >
> > Would one of them be sufficient?
>
> tcp_sendmsg_locked should not be getting inlined; it is the
> sendmsg_locked handler and directly called by several subsystems.
>
> ie., moving the tracepoint to tcp_sendmsg_locked should solve the inline
> problem. From there, the question is inside the loop or at entry to the
> function. Inside the loop has been very helpful for me.

Inside the loop would work for me as well.
diff --git a/include/trace/events/tcp.h b/include/trace/events/tcp.h
index 1a40c41ff8c30..7c0171d2dacdc 100644
--- a/include/trace/events/tcp.h
+++ b/include/trace/events/tcp.h
@@ -259,6 +259,11 @@ TRACE_EVENT(tcp_retransmit_synack,
 		  __entry->saddr_v6, __entry->daddr_v6)
 );
 
+DECLARE_TRACE(tcp_sendmsg_tp,
+	TP_PROTO(const struct sock *sk, const struct msghdr *msg, size_t size),
+	TP_ARGS(sk, msg, size)
+);
+
 DECLARE_TRACE(tcp_cwnd_reduction_tp,
 	TP_PROTO(const struct sock *sk, int newly_acked_sacked,
 		 int newly_lost, int flag),
diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c
index 08d73f17e8162..5ef86fbd8aa85 100644
--- a/net/ipv4/tcp.c
+++ b/net/ipv4/tcp.c
@@ -1362,6 +1362,8 @@ int tcp_sendmsg(struct sock *sk, struct msghdr *msg, size_t size)
 {
 	int ret;
 
+	trace_tcp_sendmsg_tp(sk, msg, size);
+
 	lock_sock(sk);
 	ret = tcp_sendmsg_locked(sk, msg, size);
 	release_sock(sk);
Add a lightweight tracepoint to monitor TCP sendmsg operations, enabling
the tracing of TCP messages being sent.

Meta has been using BPF programs to monitor this function for years,
indicating significant interest in observing this important
functionality. Adding a proper tracepoint provides a stable API for all
users who need visibility into TCP message transmission.

The implementation uses DECLARE_TRACE instead of TRACE_EVENT to avoid
creating unnecessary trace event infrastructure and tracefs exports,
keeping the implementation minimal while stabilizing the API.

Given that this patch creates a rawtracepoint, you could hook into it
using regular tooling, like bpftrace, using regular rawtracepoint
infrastructure, such as:

    rawtracepoint:tcp_sendmsg_tp {
        ....
    }

Signed-off-by: Breno Leitao <leitao@debian.org>
---
 include/trace/events/tcp.h | 5 +++++
 net/ipv4/tcp.c             | 2 ++
 2 files changed, 7 insertions(+)
---
base-commit: e13b6da7045f997e1a5a5efd61d40e63c4fc20e8
change-id: 20250224-tcpsendmsg-4f0a236751d7

Best regards,
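Expanding the stub above: with this patch applied, a slightly fuller bpftrace sketch against the raw tracepoint could aggregate send sizes per process. The argument order follows the TP_PROTO in the diff (sk, msg, size); the map names and output are illustrative only:

    rawtracepoint:tcp_sendmsg_tp
    {
        // arg0 = const struct sock *sk, arg1 = const struct msghdr *msg, arg2 = size_t size
        @bytes[comm] = sum(arg2);
        @calls[comm] = count();
    }

    END
    {
        print(@bytes);
        print(@calls);
    }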