mbox series

[bpf-next,v7,00/13] net-timestamp: bpf extension to equip applications transparently

Message ID 20250128084620.57547-1-kerneljasonxing@gmail.com (mailing list archive)
Headers show
Series net-timestamp: bpf extension to equip applications transparently | expand

Message

Jason Xing Jan. 28, 2025, 8:46 a.m. UTC
"Timestamping is key to debugging network stack latency. With
SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
network issues can be attributed to the kernel." This is extracted
from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
addressed by Willem de Bruijn at netdevconf 0x17).

There are a few areas that need optimization with the consideration of
easier use and less performance impact, which I highlighted and mainly
discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
uAPI compatibility, extra system call overhead, and the need for
application modification. I initially managed to solve these issues
by writing a kernel module that hooks various key functions. However,
this approach is not suitable for the next kernel release. Therefore,
a BPF extension was proposed. During recent period, Martin KaFai Lau
provides invaluable suggestions about BPF along the way. Many thanks
here!

In this series, I only support foundamental codes and tx for TCP.
This approach mostly relies on existing SO_TIMESTAMPING feature, users
only needs to pass certain flags through bpf_setsocktopt() to a separate
tsflags. Please see the last selftest patch in this series.

After this series, we could step by step implement more advanced
functions/flags already in SO_TIMESTAMPING feature for bpf extension.

---
v7
Link: https://lore.kernel.org/all/20250121012901.87763-1-kerneljasonxing@gmail.com/
1. target bpf-next tree
2. simplely and directly stop timestamping callbacks calling a few BPF
CALLS due to safety concern.
3. add more new testcases and adjust the existing testcases
4. revise some comments of new timestamping callbacks
5. remove a few BPF CGROUP locks

RFC v6
In the meantime, any suggestions and reviews are welcome!
Link: https://lore.kernel.org/all/20250112113748.73504-1-kerneljasonxing@gmail.com/
1. handle those safety problem by using the correct method.
2. support bpf_getsockopt.
3. adjust the position of BPF_SOCK_OPS_TS_TCP_SND_CB
4. fix mishandling the hardware timestamp error
5. add more corresponding tests

v5
Link: https://lore.kernel.org/all/20241207173803.90744-1-kerneljasonxing@gmail.com/
1. handle the safety issus when someone tries to call unrelated bpf
helpers.
2. avoid adding direct function call in the hot path like
__dev_queue_xmit()
3. remove reporting the hardware timestamp and tskey since they can be
fetched through the existing helper with the help of
bpf_skops_init_skb(), please see the selftest.
4. add new sendmsg callback in tcp_sendmsg, and introduce tskey_bpf used
by bpf program to correlate tcp_sendmsg with other hook points in patch [13/15].

v4
Link: https://lore.kernel.org/all/20241028110535.82999-1-kerneljasonxing@gmail.com/
1. introduce sk->sk_bpf_cb_flags to let user use bpf_setsockopt() (Martin)
2. introduce SKBTX_BPF to enable the bpf SO_TIMESTAMPING feature (Martin)
3. introduce bpf map in tests (Martin)
4. I choose to make this series as simple as possible, so I only support
most cases in the tx path for TCP protocol.

v3
Link: https://lore.kernel.org/all/20241012040651.95616-1-kerneljasonxing@gmail.com/
1. support UDP proto by introducing a new generation point.
2. for OPT_ID, introducing sk_tskey_bpf_offset to compute the delta
between the current socket key and bpf socket key. It is desiged for
UDP, which also applies to TCP.
3. support bpf_getsockopt()
4. use cgroup static key instead.
5. add one simple bpf selftest to show how it can be used.
6. remove the rx support from v2 because the number of patches could
exceed the limit of one series.

V2
Link: https://lore.kernel.org/all/20241008095109.99918-1-kerneljasonxing@gmail.com/
1. Introduce tsflag requestors so that we are able to extend more in the
future. Besides, it enables TX flags for bpf extension feature separately
without breaking users. It is suggested by Vadim Fedorenko.
2. introduce a static key to control the whole feature. (Willem)
3. Open the gate of bpf_setsockopt for the SO_TIMESTAMPING feature in
some TX/RX cases, not all the cases.

Jason Xing (13):
  net-timestamp: add support for bpf_setsockopt()
  net-timestamp: prepare for timestamping callbacks use
  bpf: stop unsafely accessing TCP fields in bpf callbacks
  bpf: stop calling some sock_op BPF CALLs in new timestamping callbacks
  net-timestamp: prepare for isolating two modes of SO_TIMESTAMPING
  net-timestamp: support SCM_TSTAMP_SCHED for bpf extension
  net-timestamp: support sw SCM_TSTAMP_SND for bpf extension
  net-timestamp: support hw SCM_TSTAMP_SND for bpf extension
  net-timestamp: support SCM_TSTAMP_ACK for bpf extension
  net-timestamp: make TCP tx timestamp bpf extension work
  net-timestamp: add a new callback in tcp_tx_timestamp()
  net-timestamp: introduce cgroup lock to avoid affecting non-bpf cases
  bpf: add simple bpf tests in the tx path for so_timestamping feature

 include/linux/filter.h                        |   5 +
 include/linux/skbuff.h                        |  25 +-
 include/net/sock.h                            |  10 +
 include/net/tcp.h                             |   4 +-
 include/uapi/linux/bpf.h                      |  35 ++
 net/core/dev.c                                |   5 +-
 net/core/filter.c                             |  48 ++-
 net/core/skbuff.c                             |  65 +++-
 net/core/sock.c                               |  15 +
 net/dsa/user.c                                |   2 +-
 net/ipv4/tcp.c                                |  11 +
 net/ipv4/tcp_input.c                          |   8 +-
 net/ipv4/tcp_output.c                         |   7 +
 net/socket.c                                  |   2 +-
 tools/include/uapi/linux/bpf.h                |  28 ++
 .../bpf/prog_tests/so_timestamping.c          |  86 +++++
 .../selftests/bpf/progs/so_timestamping.c     | 299 ++++++++++++++++++
 17 files changed, 633 insertions(+), 22 deletions(-)
 create mode 100644 tools/testing/selftests/bpf/prog_tests/so_timestamping.c
 create mode 100644 tools/testing/selftests/bpf/progs/so_timestamping.c

Comments

Martin KaFai Lau Feb. 4, 2025, 2:27 a.m. UTC | #1
On 1/28/25 12:46 AM, Jason Xing wrote:
> "Timestamping is key to debugging network stack latency. With
> SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> network issues can be attributed to the kernel." This is extracted
> from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> addressed by Willem de Bruijn at netdevconf 0x17).
> 
> There are a few areas that need optimization with the consideration of
> easier use and less performance impact, which I highlighted and mainly
> discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> uAPI compatibility, extra system call overhead, and the need for
> application modification. I initially managed to solve these issues
> by writing a kernel module that hooks various key functions. However,
> this approach is not suitable for the next kernel release. Therefore,
> a BPF extension was proposed. During recent period, Martin KaFai Lau
> provides invaluable suggestions about BPF along the way. Many thanks
> here!
> 
> In this series, I only support foundamental codes and tx for TCP.

*fundamental*.

May be just "only tx time stamping for TCP is supported..."

> This approach mostly relies on existing SO_TIMESTAMPING feature, users
> only needs to pass certain flags through bpf_setsocktopt() to a separate
> tsflags. Please see the last selftest patch in this series.
> 
> After this series, we could step by step implement more advanced
> functions/flags already in SO_TIMESTAMPING feature for bpf extension.

Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13 
should be "selftests/bpf:" instead of "bpf:" in the subject.

Please revisit the commit messages of this patch set to check for outdated 
comments from the earlier revisions. I may have missed some of them.

Overall, it looks close. I will review at your replies later.

Willem, could you also take a look? Thanks.
Jason Xing Feb. 4, 2025, 2:44 a.m. UTC | #2
On Tue, Feb 4, 2025 at 10:27 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
>
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > "Timestamping is key to debugging network stack latency. With
> > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > network issues can be attributed to the kernel." This is extracted
> > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > addressed by Willem de Bruijn at netdevconf 0x17).
> >
> > There are a few areas that need optimization with the consideration of
> > easier use and less performance impact, which I highlighted and mainly
> > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > uAPI compatibility, extra system call overhead, and the need for
> > application modification. I initially managed to solve these issues
> > by writing a kernel module that hooks various key functions. However,
> > this approach is not suitable for the next kernel release. Therefore,
> > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > provides invaluable suggestions about BPF along the way. Many thanks
> > here!
> >
> > In this series, I only support foundamental codes and tx for TCP.
>
> *fundamental*.
>
> May be just "only tx time stamping for TCP is supported..."
>
> > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > only needs to pass certain flags through bpf_setsocktopt() to a separate
> > tsflags. Please see the last selftest patch in this series.
> >
> > After this series, we could step by step implement more advanced
> > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
>
> Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13
> should be "selftests/bpf:" instead of "bpf:" in the subject.
>
> Please revisit the commit messages of this patch set to check for outdated
> comments from the earlier revisions. I may have missed some of them.

Roger that, sir. Thanks for your help!

>
> Overall, it looks close. I will review at your replies later.
>
> Willem, could you also take a look? Thanks.

Right, some related parts need reviews from netdev experts as well.

Willem, please help me review this when you're available. No rush :)

Thanks,
Jason
Willem de Bruijn Feb. 4, 2025, 5:06 p.m. UTC | #3
Martin KaFai Lau wrote:
> On 1/28/25 12:46 AM, Jason Xing wrote:
> > "Timestamping is key to debugging network stack latency. With
> > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > network issues can be attributed to the kernel." This is extracted
> > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > addressed by Willem de Bruijn at netdevconf 0x17).
> > 
> > There are a few areas that need optimization with the consideration of
> > easier use and less performance impact, which I highlighted and mainly
> > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > uAPI compatibility, extra system call overhead, and the need for
> > application modification. I initially managed to solve these issues
> > by writing a kernel module that hooks various key functions. However,
> > this approach is not suitable for the next kernel release. Therefore,
> > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > provides invaluable suggestions about BPF along the way. Many thanks
> > here!
> > 
> > In this series, I only support foundamental codes and tx for TCP.
> 
> *fundamental*.
> 
> May be just "only tx time stamping for TCP is supported..."
> 
> > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > only needs to pass certain flags through bpf_setsocktopt() to a separate
> > tsflags. Please see the last selftest patch in this series.
> > 
> > After this series, we could step by step implement more advanced
> > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> 
> Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13 
> should be "selftests/bpf:" instead of "bpf:" in the subject.
> 
> Please revisit the commit messages of this patch set to check for outdated 
> comments from the earlier revisions. I may have missed some of them.
> 
> Overall, it looks close. I will review at your replies later.
> 
> Willem, could you also take a look? Thanks.

Will do. Traveling, but took a first quick skim.
Willem de Bruijn Feb. 4, 2025, 5:11 p.m. UTC | #4
Jason Xing wrote:
> On Tue, Feb 4, 2025 at 10:27 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> >
> > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > "Timestamping is key to debugging network stack latency. With
> > > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > > network issues can be attributed to the kernel." This is extracted
> > > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > > addressed by Willem de Bruijn at netdevconf 0x17).
> > >
> > > There are a few areas that need optimization with the consideration of
> > > easier use and less performance impact, which I highlighted and mainly
> > > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > > uAPI compatibility, extra system call overhead, and the need for
> > > application modification. I initially managed to solve these issues
> > > by writing a kernel module that hooks various key functions. However,
> > > this approach is not suitable for the next kernel release. Therefore,
> > > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > > provides invaluable suggestions about BPF along the way. Many thanks
> > > here!
> > >
> > > In this series, I only support foundamental codes and tx for TCP.
> >
> > *fundamental*.
> >
> > May be just "only tx time stamping for TCP is supported..."
> >
> > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > only needs to pass certain flags through bpf_setsocktopt() to a separate
> > > tsflags. Please see the last selftest patch in this series.
> > >
> > > After this series, we could step by step implement more advanced
> > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> >
> > Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13
> > should be "selftests/bpf:" instead of "bpf:" in the subject.
> >
> > Please revisit the commit messages of this patch set to check for outdated
> > comments from the earlier revisions. I may have missed some of them.
> 
> Roger that, sir. Thanks for your help!
> 
> >
> > Overall, it looks close. I will review at your replies later.
> >
> > Willem, could you also take a look? Thanks.
> 
> Right, some related parts need reviews from netdev experts as well.
> 
> Willem, please help me review this when you're available. No rush :)

I won't have much to add for the BPF side, to be clear.

One small high level commit message point: as submitting-patches
suggests, use imperative mood: "adds X" when the patch introduces a
feature, not "I add". And "caller gets" rather than "we get".

Specific case, with capitalization issue: "we need to Introduce".

I'll respond to a few inline code elements later. Nothing huge.
Also feel free to post the next version and I'll respond to that, if
you prefer.
Jason Xing Feb. 4, 2025, 6:12 p.m. UTC | #5
On Wed, Feb 5, 2025 at 1:11 AM Willem de Bruijn
<willemdebruijn.kernel@gmail.com> wrote:
>
> Jason Xing wrote:
> > On Tue, Feb 4, 2025 at 10:27 AM Martin KaFai Lau <martin.lau@linux.dev> wrote:
> > >
> > > On 1/28/25 12:46 AM, Jason Xing wrote:
> > > > "Timestamping is key to debugging network stack latency. With
> > > > SO_TIMESTAMPING, bugs that are otherwise incorrectly assumed to be
> > > > network issues can be attributed to the kernel." This is extracted
> > > > from the talk "SO_TIMESTAMPING: Powering Fleetwide RPC Monitoring"
> > > > addressed by Willem de Bruijn at netdevconf 0x17).
> > > >
> > > > There are a few areas that need optimization with the consideration of
> > > > easier use and less performance impact, which I highlighted and mainly
> > > > discussed at netconf 2024 with Willem de Bruijn and John Fastabend:
> > > > uAPI compatibility, extra system call overhead, and the need for
> > > > application modification. I initially managed to solve these issues
> > > > by writing a kernel module that hooks various key functions. However,
> > > > this approach is not suitable for the next kernel release. Therefore,
> > > > a BPF extension was proposed. During recent period, Martin KaFai Lau
> > > > provides invaluable suggestions about BPF along the way. Many thanks
> > > > here!
> > > >
> > > > In this series, I only support foundamental codes and tx for TCP.
> > >
> > > *fundamental*.
> > >
> > > May be just "only tx time stamping for TCP is supported..."
> > >
> > > > This approach mostly relies on existing SO_TIMESTAMPING feature, users
> > > > only needs to pass certain flags through bpf_setsocktopt() to a separate
> > > > tsflags. Please see the last selftest patch in this series.
> > > >
> > > > After this series, we could step by step implement more advanced
> > > > functions/flags already in SO_TIMESTAMPING feature for bpf extension.
> > >
> > > Patch 1-4 and 6-11 can use an extra "bpf:" tag in the subject line. Patch 13
> > > should be "selftests/bpf:" instead of "bpf:" in the subject.
> > >
> > > Please revisit the commit messages of this patch set to check for outdated
> > > comments from the earlier revisions. I may have missed some of them.
> >
> > Roger that, sir. Thanks for your help!
> >
> > >
> > > Overall, it looks close. I will review at your replies later.
> > >
> > > Willem, could you also take a look? Thanks.
> >
> > Right, some related parts need reviews from netdev experts as well.
> >
> > Willem, please help me review this when you're available. No rush :)
>
> I won't have much to add for the BPF side, to be clear.
>
> One small high level commit message point: as submitting-patches
> suggests, use imperative mood: "adds X" when the patch introduces a
> feature, not "I add". And "caller gets" rather than "we get".
>
> Specific case, with capitalization issue: "we need to Introduce".

Thanks for learning a new lesson. I will adjust them.

>
> I'll respond to a few inline code elements later. Nothing huge.
> Also feel free to post the next version and I'll respond to that, if
> you prefer.

I will post v8 soon. Thanks for your precious time. Have fun with your trip :p

Thanks,
Jason