Message ID | 20250218050125.73676-11-kerneljasonxing@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | BPF |
Headers | show |
Series | net-timestamp: bpf extension to equip applications transparently | expand |
Jason Xing wrote: > This patch introduces a new callback in tcp_tx_timestamp() to correlate > tcp_sendmsg timestamp with timestamps from other tx timestamping > callbacks (e.g., SND/SW/ACK). > > Without this patch, BPF program wouldn't know which timestamps belong > to which flow because of no socket lock protection. This new callback > is inserted in tcp_tx_timestamp() to address this issue because > tcp_tx_timestamp() still owns the same socket lock with > tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes > the timestamping related fields for the skb, especially tskey. The > tskey is the bridge to do the correlation. > > For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and > then stores the sendmsg timestamp at the bpf_sk_storage, correlating > this timestamp with its tskey that are later used in other sending > timestamping callbacks. > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> Reviewed-by: Willem de Bruijn <willemb@google.com>
Jason Xing wrote: > This patch introduces a new callback in tcp_tx_timestamp() to correlate > tcp_sendmsg timestamp with timestamps from other tx timestamping > callbacks (e.g., SND/SW/ACK). > > Without this patch, BPF program wouldn't know which timestamps belong > to which flow because of no socket lock protection. This new callback > is inserted in tcp_tx_timestamp() to address this issue because > tcp_tx_timestamp() still owns the same socket lock with > tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes > the timestamping related fields for the skb, especially tskey. The > tskey is the bridge to do the correlation. > > For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and > then stores the sendmsg timestamp at the bpf_sk_storage, correlating > this timestamp with its tskey that are later used in other sending > timestamping callbacks. > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> > --- > include/uapi/linux/bpf.h | 5 +++++ > net/ipv4/tcp.c | 4 ++++ > tools/include/uapi/linux/bpf.h | 5 +++++ > 3 files changed, 14 insertions(+) > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > index 9355d617767f..86fca729fbd8 100644 > --- a/include/uapi/linux/bpf.h > +++ b/include/uapi/linux/bpf.h > @@ -7052,6 +7052,11 @@ enum { > * when SK_BPF_CB_TX_TIMESTAMPING > * feature is on. > */ > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > + * is triggered. It's used to correlate > + * sendmsg timestamp with corresponding > + * tskey. > + */ > }; > > /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > index 12b9c4f9c151..4b9739cd3bc5 100644 > --- a/net/ipv4/tcp.c > +++ b/net/ipv4/tcp.c > @@ -492,6 +492,10 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc) > if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) > shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1; > } > + > + if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) && > + SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) > + bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_SND_CB); > } > > static bool tcp_stream_is_readable(struct sock *sk, int target) > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > index d3e2988b3b4c..2739ee0154a0 100644 > --- a/tools/include/uapi/linux/bpf.h > +++ b/tools/include/uapi/linux/bpf.h > @@ -7042,6 +7042,11 @@ enum { > * when SK_BPF_CB_TX_TIMESTAMPING > * feature is on. > */ > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > + * is triggered. It's used to correlate > + * sendmsg timestamp with corresponding > + * tskey. > + */ Feel free to decline this late in the review process, but a bit more bikeshedding.. Can we spell out TSTAMP instead of TS in these definitions? Within the context of this series it is self-explanatory, but when reading kernel code the meaning of a two letter acronym is not that clear. And instead of SND can we use SENDMSG or something like that? SND here confused me as the software timestamp is SCM_TSTAMP_SND. For instance: BPF_SOCK_OPS_TSTAMP_SENDMSG_CB, BPF_SOCK_OPS_TSTAMP_SCHED_CB, BPF_SOCK_OPS_TSTAMP_SND_SW_CB, BPF_SOCK_OPS_TSTAMP_SND_HW_CB, (BPF_SOCK_OPS_TSTAMP_TX_COMPLETION_CB,) BPF_SOCK_OPS_TSTAMP_ACK_CB, (not sure what the OPT in OPT_CB added).
On Thu, Feb 20, 2025 at 10:55 AM Willem de Bruijn <willemdebruijn.kernel@gmail.com> wrote: > > Jason Xing wrote: > > This patch introduces a new callback in tcp_tx_timestamp() to correlate > > tcp_sendmsg timestamp with timestamps from other tx timestamping > > callbacks (e.g., SND/SW/ACK). > > > > Without this patch, BPF program wouldn't know which timestamps belong > > to which flow because of no socket lock protection. This new callback > > is inserted in tcp_tx_timestamp() to address this issue because > > tcp_tx_timestamp() still owns the same socket lock with > > tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes > > the timestamping related fields for the skb, especially tskey. The > > tskey is the bridge to do the correlation. > > > > For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and > > then stores the sendmsg timestamp at the bpf_sk_storage, correlating > > this timestamp with its tskey that are later used in other sending > > timestamping callbacks. > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> > > --- > > include/uapi/linux/bpf.h | 5 +++++ > > net/ipv4/tcp.c | 4 ++++ > > tools/include/uapi/linux/bpf.h | 5 +++++ > > 3 files changed, 14 insertions(+) > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > index 9355d617767f..86fca729fbd8 100644 > > --- a/include/uapi/linux/bpf.h > > +++ b/include/uapi/linux/bpf.h > > @@ -7052,6 +7052,11 @@ enum { > > * when SK_BPF_CB_TX_TIMESTAMPING > > * feature is on. > > */ > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > > + * is triggered. It's used to correlate > > + * sendmsg timestamp with corresponding > > + * tskey. > > + */ > > }; > > > > /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > index 12b9c4f9c151..4b9739cd3bc5 100644 > > --- a/net/ipv4/tcp.c > > +++ b/net/ipv4/tcp.c > > @@ -492,6 +492,10 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc) > > if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) > > shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1; > > } > > + > > + if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) && > > + SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) > > + bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_SND_CB); > > } > > > > static bool tcp_stream_is_readable(struct sock *sk, int target) > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > index d3e2988b3b4c..2739ee0154a0 100644 > > --- a/tools/include/uapi/linux/bpf.h > > +++ b/tools/include/uapi/linux/bpf.h > > @@ -7042,6 +7042,11 @@ enum { > > * when SK_BPF_CB_TX_TIMESTAMPING > > * feature is on. > > */ > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > > + * is triggered. It's used to correlate > > + * sendmsg timestamp with corresponding > > + * tskey. > > + */ > > Feel free to decline this late in the review process, but a bit more > bikeshedding.. > > Can we spell out TSTAMP instead of TS in these definitions? Within > the context of this series it is self-explanatory, but when reading > kernel code the meaning of a two letter acronym is not that clear. Even though I feel reluctant to change across the whole series because if so, I will adjust in many places. Of course, you're right about the new name being clearer :) > > And instead of SND can we use SENDMSG or something like that? > SND here confused me as the software timestamp is SCM_TSTAMP_SND. I'm not sure about this. For TCP, it's not implemented in the tcp_sendmsg_locked but tcp_tx_timestamp. Well, I have no strong preference. You can make the final call :) Thanks, Jason > > For instance: > > BPF_SOCK_OPS_TSTAMP_SENDMSG_CB, > BPF_SOCK_OPS_TSTAMP_SCHED_CB, > BPF_SOCK_OPS_TSTAMP_SND_SW_CB, > BPF_SOCK_OPS_TSTAMP_SND_HW_CB, > (BPF_SOCK_OPS_TSTAMP_TX_COMPLETION_CB,) > BPF_SOCK_OPS_TSTAMP_ACK_CB, > > (not sure what the OPT in OPT_CB added). > >
On Thu, Feb 20, 2025 at 11:15 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > On Thu, Feb 20, 2025 at 10:55 AM Willem de Bruijn > <willemdebruijn.kernel@gmail.com> wrote: > > > > Jason Xing wrote: > > > This patch introduces a new callback in tcp_tx_timestamp() to correlate > > > tcp_sendmsg timestamp with timestamps from other tx timestamping > > > callbacks (e.g., SND/SW/ACK). > > > > > > Without this patch, BPF program wouldn't know which timestamps belong > > > to which flow because of no socket lock protection. This new callback > > > is inserted in tcp_tx_timestamp() to address this issue because > > > tcp_tx_timestamp() still owns the same socket lock with > > > tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes > > > the timestamping related fields for the skb, especially tskey. The > > > tskey is the bridge to do the correlation. > > > > > > For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and > > > then stores the sendmsg timestamp at the bpf_sk_storage, correlating > > > this timestamp with its tskey that are later used in other sending > > > timestamping callbacks. > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> > > > --- > > > include/uapi/linux/bpf.h | 5 +++++ > > > net/ipv4/tcp.c | 4 ++++ > > > tools/include/uapi/linux/bpf.h | 5 +++++ > > > 3 files changed, 14 insertions(+) > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > index 9355d617767f..86fca729fbd8 100644 > > > --- a/include/uapi/linux/bpf.h > > > +++ b/include/uapi/linux/bpf.h > > > @@ -7052,6 +7052,11 @@ enum { > > > * when SK_BPF_CB_TX_TIMESTAMPING > > > * feature is on. > > > */ > > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > > > + * is triggered. It's used to correlate > > > + * sendmsg timestamp with corresponding > > > + * tskey. > > > + */ > > > }; > > > > > > /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > > index 12b9c4f9c151..4b9739cd3bc5 100644 > > > --- a/net/ipv4/tcp.c > > > +++ b/net/ipv4/tcp.c > > > @@ -492,6 +492,10 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc) > > > if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) > > > shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1; > > > } > > > + > > > + if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) && > > > + SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) > > > + bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_SND_CB); > > > } > > > > > > static bool tcp_stream_is_readable(struct sock *sk, int target) > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > > index d3e2988b3b4c..2739ee0154a0 100644 > > > --- a/tools/include/uapi/linux/bpf.h > > > +++ b/tools/include/uapi/linux/bpf.h > > > @@ -7042,6 +7042,11 @@ enum { > > > * when SK_BPF_CB_TX_TIMESTAMPING > > > * feature is on. > > > */ > > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > > > + * is triggered. It's used to correlate > > > + * sendmsg timestamp with corresponding > > > + * tskey. > > > + */ > > > > Feel free to decline this late in the review process, but a bit more > > bikeshedding.. > > > > Can we spell out TSTAMP instead of TS in these definitions? Within > > the context of this series it is self-explanatory, but when reading > > kernel code the meaning of a two letter acronym is not that clear. > > Even though I feel reluctant to change across the whole series because > if so, I will adjust in many places. Of course, you're right about the > new name being clearer :) > > > > > And instead of SND can we use SENDMSG or something like that? > > SND here confused me as the software timestamp is SCM_TSTAMP_SND. > > I'm not sure about this. For TCP, it's not implemented in the > tcp_sendmsg_locked but tcp_tx_timestamp. Well, I have no strong > preference. > > You can make the final call :) After taking a break, I feel full of energy and I will update them all as you requested :) Thanks, Jason
Jason Xing wrote: > On Thu, Feb 20, 2025 at 11:15 AM Jason Xing <kerneljasonxing@gmail.com> wrote: > > > > On Thu, Feb 20, 2025 at 10:55 AM Willem de Bruijn > > <willemdebruijn.kernel@gmail.com> wrote: > > > > > > Jason Xing wrote: > > > > This patch introduces a new callback in tcp_tx_timestamp() to correlate > > > > tcp_sendmsg timestamp with timestamps from other tx timestamping > > > > callbacks (e.g., SND/SW/ACK). > > > > > > > > Without this patch, BPF program wouldn't know which timestamps belong > > > > to which flow because of no socket lock protection. This new callback > > > > is inserted in tcp_tx_timestamp() to address this issue because > > > > tcp_tx_timestamp() still owns the same socket lock with > > > > tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes > > > > the timestamping related fields for the skb, especially tskey. The > > > > tskey is the bridge to do the correlation. > > > > > > > > For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and > > > > then stores the sendmsg timestamp at the bpf_sk_storage, correlating > > > > this timestamp with its tskey that are later used in other sending > > > > timestamping callbacks. > > > > > > > > Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> > > > > --- > > > > include/uapi/linux/bpf.h | 5 +++++ > > > > net/ipv4/tcp.c | 4 ++++ > > > > tools/include/uapi/linux/bpf.h | 5 +++++ > > > > 3 files changed, 14 insertions(+) > > > > > > > > diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h > > > > index 9355d617767f..86fca729fbd8 100644 > > > > --- a/include/uapi/linux/bpf.h > > > > +++ b/include/uapi/linux/bpf.h > > > > @@ -7052,6 +7052,11 @@ enum { > > > > * when SK_BPF_CB_TX_TIMESTAMPING > > > > * feature is on. > > > > */ > > > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > > > > + * is triggered. It's used to correlate > > > > + * sendmsg timestamp with corresponding > > > > + * tskey. > > > > + */ > > > > }; > > > > > > > > /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect > > > > diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c > > > > index 12b9c4f9c151..4b9739cd3bc5 100644 > > > > --- a/net/ipv4/tcp.c > > > > +++ b/net/ipv4/tcp.c > > > > @@ -492,6 +492,10 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc) > > > > if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) > > > > shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1; > > > > } > > > > + > > > > + if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) && > > > > + SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) > > > > + bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_SND_CB); > > > > } > > > > > > > > static bool tcp_stream_is_readable(struct sock *sk, int target) > > > > diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h > > > > index d3e2988b3b4c..2739ee0154a0 100644 > > > > --- a/tools/include/uapi/linux/bpf.h > > > > +++ b/tools/include/uapi/linux/bpf.h > > > > @@ -7042,6 +7042,11 @@ enum { > > > > * when SK_BPF_CB_TX_TIMESTAMPING > > > > * feature is on. > > > > */ > > > > + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall > > > > + * is triggered. It's used to correlate > > > > + * sendmsg timestamp with corresponding > > > > + * tskey. > > > > + */ > > > > > > Feel free to decline this late in the review process, but a bit more > > > bikeshedding.. > > > > > > Can we spell out TSTAMP instead of TS in these definitions? Within > > > the context of this series it is self-explanatory, but when reading > > > kernel code the meaning of a two letter acronym is not that clear. > > > > Even though I feel reluctant to change across the whole series because > > if so, I will adjust in many places. Of course, you're right about the > > new name being clearer :) > > > > > > > > And instead of SND can we use SENDMSG or something like that? > > > SND here confused me as the software timestamp is SCM_TSTAMP_SND. > > > > I'm not sure about this. For TCP, it's not implemented in the > > tcp_sendmsg_locked but tcp_tx_timestamp. Well, I have no strong > > preference. > > > > You can make the final call :) > > After taking a break, I feel full of energy and I will update them all > as you requested :) Awesome, thank you.
diff --git a/include/uapi/linux/bpf.h b/include/uapi/linux/bpf.h index 9355d617767f..86fca729fbd8 100644 --- a/include/uapi/linux/bpf.h +++ b/include/uapi/linux/bpf.h @@ -7052,6 +7052,11 @@ enum { * when SK_BPF_CB_TX_TIMESTAMPING * feature is on. */ + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall + * is triggered. It's used to correlate + * sendmsg timestamp with corresponding + * tskey. + */ }; /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect diff --git a/net/ipv4/tcp.c b/net/ipv4/tcp.c index 12b9c4f9c151..4b9739cd3bc5 100644 --- a/net/ipv4/tcp.c +++ b/net/ipv4/tcp.c @@ -492,6 +492,10 @@ static void tcp_tx_timestamp(struct sock *sk, struct sockcm_cookie *sockc) if (tsflags & SOF_TIMESTAMPING_TX_RECORD_MASK) shinfo->tskey = TCP_SKB_CB(skb)->seq + skb->len - 1; } + + if (cgroup_bpf_enabled(CGROUP_SOCK_OPS) && + SK_BPF_CB_FLAG_TEST(sk, SK_BPF_CB_TX_TIMESTAMPING) && skb) + bpf_skops_tx_timestamping(sk, skb, BPF_SOCK_OPS_TS_SND_CB); } static bool tcp_stream_is_readable(struct sock *sk, int target) diff --git a/tools/include/uapi/linux/bpf.h b/tools/include/uapi/linux/bpf.h index d3e2988b3b4c..2739ee0154a0 100644 --- a/tools/include/uapi/linux/bpf.h +++ b/tools/include/uapi/linux/bpf.h @@ -7042,6 +7042,11 @@ enum { * when SK_BPF_CB_TX_TIMESTAMPING * feature is on. */ + BPF_SOCK_OPS_TS_SND_CB, /* Called when every sendmsg syscall + * is triggered. It's used to correlate + * sendmsg timestamp with corresponding + * tskey. + */ }; /* List of TCP states. There is a build check in net/ipv4/tcp.c to detect
This patch introduces a new callback in tcp_tx_timestamp() to correlate tcp_sendmsg timestamp with timestamps from other tx timestamping callbacks (e.g., SND/SW/ACK). Without this patch, BPF program wouldn't know which timestamps belong to which flow because of no socket lock protection. This new callback is inserted in tcp_tx_timestamp() to address this issue because tcp_tx_timestamp() still owns the same socket lock with tcp_sendmsg_locked() in the meanwhile tcp_tx_timestamp() initializes the timestamping related fields for the skb, especially tskey. The tskey is the bridge to do the correlation. For TCP, BPF program hooks the beginning of tcp_sendmsg_locked() and then stores the sendmsg timestamp at the bpf_sk_storage, correlating this timestamp with its tskey that are later used in other sending timestamping callbacks. Signed-off-by: Jason Xing <kerneljasonxing@gmail.com> --- include/uapi/linux/bpf.h | 5 +++++ net/ipv4/tcp.c | 4 ++++ tools/include/uapi/linux/bpf.h | 5 +++++ 3 files changed, 14 insertions(+)