diff mbox series

[net-next,v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets

Message ID 20240301201348.2815102-1-quic_abchauha@quicinc.com (mailing list archive)
State Accepted
Commit 885c36e59f46375c138de18ff1692f18eff67b7f
Delegated to: Netdev Maintainers
Headers show
Series [net-next,v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 5835 this patch: 5835
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: dsahern@kernel.org
netdev/build_clang success Errors and warnings before: 2069 this patch: 2069
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 6150 this patch: 6150
netdev/checkpatch warning WARNING: line length of 83 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-03-05--06-00 (tests: 891)

Commit Message

Abhishek Chauhan (ABC) March 1, 2024, 8:13 p.m. UTC
Bridge driver today has no support to forward the userspace timestamp
packets and ends up resetting the timestamp. ETF qdisc checks the
packet coming from userspace and encounters to be 0 thereby dropping
time sensitive packets. These changes will allow userspace timestamps
packets to be forwarded from the bridge to NIC drivers.

Setting the same bit (mono_delivery_time) to avoid dropping of
userspace tstamp packets in the forwarding path.

Existing functionality of mono_delivery_time remains unaltered here,
instead just extended with userspace tstamp support for bridge
forwarding path.

Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
---
Changes since v3
- Setting mono_delivery_time at all instances where the skb->tstamp is 
  initialized with sockcm.transmit_time as reviewed by Willem
- Removed repetitive comments from all the sources file and limited only
  to skbuff.h as suggested by Willem
- Re-phrased the comment explanation in skbuff.h and made it much simpler 
  and generic as suggested by Willem 

Changes since v2
- Updated the commit subject and message. 
- Took care of few comments from Willem to re-use mono_delivery_time
  with comments and documentations in the header and source file.
- Took care of comment from Andrew on the typo in the comment.
- Existing self-test test cases are executed to make sure existing 
  implementation is not impacted as stated by Paolo.(so_txtime.sh). 
- Internal validation of UDP packets using iperf/so_priority/so_txtime
  with MQPRIO + ETF offload is executed as well.
- Test case is included below

Test 1 :- FQ + ETF (SW path)

[root@ecbldauto-lvarm04-lnx ~]# ./so_txtime.sh
[  280.640551] q->last time is 1707955476143297550
[  283.338947] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[  284.078429] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready

SO_TXTIME ipv4 clock monotonic
payload:a delay:109 expected:0 (us)

SO_TXTIME ipv6 clock monotonic
payload:a delay:140 expected:0 (us)

SO_TXTIME ipv6 clock monotonic
payload:a delay:12739 expected:10000 (us)

SO_TXTIME ipv4 clock monotonic
payload:a delay:10054 expected:10000 (us)
payload:b delay:20043 expected:20000 (us)

SO_TXTIME ipv6 clock monotonic
payload:b delay:20078 expected:20000 (us)
payload:a delay:20177 expected:20000 (us)

SO_TXTIME ipv4 clock tai
send: pkt a at -1707955482913ms dropped: invalid txtime
[  287.070504] now is set to 1707955482913404839
[  287.070509] tx time from SKB is 0
./so_txtime: recv: timeout: Resource temporarily unavailable

SO_TXTIME ipv6 clock tai
send: pkt a at 0ms dropped: invalid txtime
[  287.070510] q->last time is 0
[  287.420590] now is set to 1707955483263491298
[  287.420596] tx time from SKB is 1707955483263454527
./so_txtime: recv: timeout: Resource temporarily unavailable

SO_TXTIME ipv6 clock tai
[  287.420597] q->last time is 0
[  287.700598] now is set to 1707955483543498954
[  287.700604] tx time from SKB is 1707955483553463173
payload:a delay:9655 expected:10000 (us)

SO_TXTIME ipv4 clock tai
[  287.700605] q->last time is 0
[  288.100532] now is set to 1707955483943432391
[  288.100537] tx time from SKB is 1707955483953413016
payload:a delay:9668 expected:10000 (us)[  288.100538] q->last time is 1707955483553463173

[  288.100546] now is set to 1707955483943446975
[  288.100547] tx time from SKB is 1707955483963413016
payload:b delay:20484 expected:20000 (us)

SO_TXTIME ipv6 clock tai
[  288.100547] q->last time is 1707955483553463173
[  288.440582] now is set to 1707955484283482495
[  288.440587] tx time from SKB is 1707955484303452808
payload:b delay:9648 expected:10000 (us)[  288.440588] q->last time is 1707955483963413016

[  288.440598] now is set to 1707955484283499370
payload:a delay:22037 expected:20000 (us)
[  288.440599] tx time from SKB is 1707955484293452808
OK. All tests passed


Test case 2 (MQPRIO + ETF HW offload)

[root@ecbldauto-lvarm04-lnx ~]# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 4 \
            map 0 2 1 3 3 2 2 2 2 2 2 2 2 2 2 2 \
            queues 1@0 1@1 1@2 1@3\
            hw 0
[root@ecbldauto-lvarm04-lnx ~]#
tc qdisc replace dev eth0 parent 100:4 etf \
            clockid CLOCK_TAI delta 40000  offload skip_sock_check
[   89.145838] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue test log 3, number of queues 4, qopt enable 1, tbs queue bit 1
[   89.145846] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue 3


[root@ecbldauto-lvarm04-lnx ~]# ./a.out -4 -c tai -S 192.168.1.1 -D 192.168.1.2 a,1,b,2

SO_TXTIME ipv4 clock tai

 glob_tstat = 1707955395256170394
[  199.623650] now is set to 1707955395256215810
[  199.623655] tx time from SKB is 1707955395257170394
[  199.623656] q->last time is 0
[  199.623663] now is set to 1707955395256230029
[  199.623664] tx time from SKB is 1707955395258170394
[  199.623665] q->last time is 0
[  199.624589] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 257170394 nsec
[  199.625573] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 258170394 nsec

Changes since v1 
- Changed the commit subject as i am modifying the mono_delivery_time 
  bit with clockid_delivery_time.
- Took care of suggestion mentioned by Willem to use the same bit for 
  userspace delivery time as there are no conflicts between TCP and 
  SCM_TXTIME, because explicit cmsg makes no sense for TCP and only
  RAW and DGRAM sockets interprets it. 
- Clear explaination of why this is needed mentioned below and this 
  is extending the work done by Martin for mono_delivery_time 
  https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/
- Version 1 patch can be referenced with below link which states 
  the exact problem with tc-etf and discussions which took place
  https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/ 

 include/linux/skbuff.h | 6 +++---
 net/ipv4/ip_output.c   | 1 +
 net/ipv4/raw.c         | 1 +
 net/ipv6/ip6_output.c  | 2 +-
 net/ipv6/raw.c         | 2 +-
 net/packet/af_packet.c | 4 +++-
 6 files changed, 10 insertions(+), 6 deletions(-)

Comments

Willem de Bruijn March 1, 2024, 10:21 p.m. UTC | #1
Abhishek Chauhan wrote:
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
> 
> Setting the same bit (mono_delivery_time) to avoid dropping of
> userspace tstamp packets in the forwarding path.
> 
> Existing functionality of mono_delivery_time remains unaltered here,
> instead just extended with userspace tstamp support for bridge
> forwarding path.
> 
> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>
patchwork-bot+netdevbpf@kernel.org March 5, 2024, 1 p.m. UTC | #2
Hello:

This patch was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Fri,  1 Mar 2024 12:13:48 -0800 you wrote:
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
> 
> Setting the same bit (mono_delivery_time) to avoid dropping of
> userspace tstamp packets in the forwarding path.
> 
> [...]

Here is the summary with links:
  - [net-next,v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets
    https://git.kernel.org/netdev/net-next/c/885c36e59f46

You are awesome, thank you!
Martin KaFai Lau March 12, 2024, 11:52 p.m. UTC | #3
On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
> 
> Setting the same bit (mono_delivery_time) to avoid dropping of
> userspace tstamp packets in the forwarding path.
> 
> Existing functionality of mono_delivery_time remains unaltered here,
> instead just extended with userspace tstamp support for bridge
> forwarding path.

The patch currently broke the bpf selftest test_tc_dtime: 
https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675

In particular, there is a uapi field __sk_buff->tstamp_type which currently has 
BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. 
BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at 
ingress or a delivery time set by user space).

__sk_buff->tstamp_type depends on skb->mono_delivery_time which does not 
necessarily mean mono after this patch. I thought about fixing it on the bpf 
side such that reading __sk_buff->tstamp_type only returns 
BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk 
is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().

There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, 
BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the 
skb->mono_delivery_time. The expectation is this could change skb->tstamp in the 
ingress skb and redirect to egress sch_fq. It could also set a mono time to 
skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then 
bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, 
BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects 
BPF_SKB_TSTAMP_DELIVERY_MONO also.

I ran out of idea to solve this uapi breakage.

I am afraid it may need to go back to v1 idea and use another bit 
(user_delivery_time) in the skb.
Abhishek Chauhan (ABC) March 13, 2024, 4:34 a.m. UTC | #4
On 3/12/2024 4:52 PM, Martin KaFai Lau wrote:
> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>> Bridge driver today has no support to forward the userspace timestamp
>> packets and ends up resetting the timestamp. ETF qdisc checks the
>> packet coming from userspace and encounters to be 0 thereby dropping
>> time sensitive packets. These changes will allow userspace timestamps
>> packets to be forwarded from the bridge to NIC drivers.
>>
>> Setting the same bit (mono_delivery_time) to avoid dropping of
>> userspace tstamp packets in the forwarding path.
>>
>> Existing functionality of mono_delivery_time remains unaltered here,
>> instead just extended with userspace tstamp support for bridge
>> forwarding path.
> 
> The patch currently broke the bpf selftest test_tc_dtime: https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> 
> In particular, there is a uapi field __sk_buff->tstamp_type which currently has BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at ingress or a delivery time set by user space).
> 
> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono after this patch. I thought about fixing it on the bpf side such that reading __sk_buff->tstamp_type only returns BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> 
> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the skb->mono_delivery_time. The expectation is this could change skb->tstamp in the ingress skb and redirect to egress sch_fq. It could also set a mono time to skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects BPF_SKB_TSTAMP_DELIVERY_MONO also.
> 
> I ran out of idea to solve this uapi breakage.
> 
> I am afraid it may need to go back to v1 idea and use another bit (user_delivery_time) in the skb.
> 
I am okay to switch back to version 1 of this patch by adding another bit called userspace_time_stamp as that was the initial intent. 

Martin what do you suggest ?
Abhishek Chauhan (ABC) March 13, 2024, 5:32 a.m. UTC | #5
On 3/12/2024 9:34 PM, Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/12/2024 4:52 PM, Martin KaFai Lau wrote:
>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>> Bridge driver today has no support to forward the userspace timestamp
>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>> packet coming from userspace and encounters to be 0 thereby dropping
>>> time sensitive packets. These changes will allow userspace timestamps
>>> packets to be forwarded from the bridge to NIC drivers.
>>>
>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>> userspace tstamp packets in the forwarding path.
>>>
>>> Existing functionality of mono_delivery_time remains unaltered here,
>>> instead just extended with userspace tstamp support for bridge
>>> forwarding path.
>>
>> The patch currently broke the bpf selftest test_tc_dtime: https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>
>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at ingress or a delivery time set by user space).
>>
>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono after this patch. I thought about fixing it on the bpf side such that reading __sk_buff->tstamp_type only returns BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>
>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the skb->mono_delivery_time. The expectation is this could change skb->tstamp in the ingress skb and redirect to egress sch_fq. It could also set a mono time to skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>
>> I ran out of idea to solve this uapi breakage.
>>
>> I am afraid it may need to go back to v1 idea and use another bit (user_delivery_time) in the skb.
>>
> I am okay to switch back to version 1 of this patch by adding another bit called userspace_time_stamp as that was the initial intent. 
> 
> Martin what do you suggest ? 
Sorry i meant Willem. 

Willem do we want to fall back to version 1 where we add an extra bit for userspace timestamp as Martin is facing some issue in one of his test case.
Willem de Bruijn March 13, 2024, 8:52 a.m. UTC | #6
Martin KaFai Lau wrote:
> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> > Bridge driver today has no support to forward the userspace timestamp
> > packets and ends up resetting the timestamp. ETF qdisc checks the
> > packet coming from userspace and encounters to be 0 thereby dropping
> > time sensitive packets. These changes will allow userspace timestamps
> > packets to be forwarded from the bridge to NIC drivers.
> > 
> > Setting the same bit (mono_delivery_time) to avoid dropping of
> > userspace tstamp packets in the forwarding path.
> > 
> > Existing functionality of mono_delivery_time remains unaltered here,
> > instead just extended with userspace tstamp support for bridge
> > forwarding path.
> 
> The patch currently broke the bpf selftest test_tc_dtime: 
> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> 
> In particular, there is a uapi field __sk_buff->tstamp_type which currently has 
> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. 
> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at 
> ingress or a delivery time set by user space).
> 
> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not 
> necessarily mean mono after this patch. I thought about fixing it on the bpf 
> side such that reading __sk_buff->tstamp_type only returns 
> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk 
> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> 
> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, 
> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the 
> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the 
> ingress skb and redirect to egress sch_fq. It could also set a mono time to 
> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then 
> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, 
> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects 
> BPF_SKB_TSTAMP_DELIVERY_MONO also.
> 
> I ran out of idea to solve this uapi breakage.
> 
> I am afraid it may need to go back to v1 idea and use another bit 
> (user_delivery_time) in the skb.

Is the only conflict when bpf_skb_set_tstamp is called for an skb from
a socket with sk_clockid set (and thus SO_TXTIME called)?

Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
skb->sk is NULL is fine. This is the ingress to egress redirect case.

I don't see an immediate use for this BPF function on egress where it
would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.

Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
already explicitly programmed?

    skb->sk &&
    sock_flag(sk, SOCK_TXTIME) &&
    skb->sk->sk_clockid != CLOCK_MONOTONIC

Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
fall back on interpreting as monotonic.
Martin KaFai Lau March 13, 2024, 6:42 p.m. UTC | #7
On 3/13/24 1:52 AM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>> Bridge driver today has no support to forward the userspace timestamp
>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>> packet coming from userspace and encounters to be 0 thereby dropping
>>> time sensitive packets. These changes will allow userspace timestamps
>>> packets to be forwarded from the bridge to NIC drivers.
>>>
>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>> userspace tstamp packets in the forwarding path.
>>>
>>> Existing functionality of mono_delivery_time remains unaltered here,
>>> instead just extended with userspace tstamp support for bridge
>>> forwarding path.
>>
>> The patch currently broke the bpf selftest test_tc_dtime:
>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>
>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>> ingress or a delivery time set by user space).
>>
>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>> side such that reading __sk_buff->tstamp_type only returns
>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>
>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>
>> I ran out of idea to solve this uapi breakage.
>>
>> I am afraid it may need to go back to v1 idea and use another bit
>> (user_delivery_time) in the skb.
> 
> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
> a socket with sk_clockid set (and thus SO_TXTIME called)?

Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and 
its interpretation depends on skb->sk->sk_clockid.

> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
> skb->sk is NULL is fine. This is the ingress to egress redirect case.

skb->sk == NULL is fine. I tried something like this in 
bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:

__sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:

	skb->mono_delivery_time == 1 &&
	(!skb->sk ||
	 !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
	 skb->sk->sk_protocol == IPPROTO_TCP)

Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.

> 
> I don't see an immediate use for this BPF function on egress where it
> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.

The bpf prog may act as a traffic shaper that limits the bandwidth usage of all 
outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before 
sending to the sch_fq.

I currently also don't have a use case for skb->sk->sk_clockid != 
CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now 
before queuing to sch_fq.

The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) 
setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() 
between the veth pair.

> 
> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
> already explicitly programmed?

This will change the existing bpf_skb_set_tstamp() behavior, so probably not 
acceptable.

> 
>      skb->sk &&
>      sock_flag(sk, SOCK_TXTIME) &&
>      skb->sk->sk_clockid != CLOCK_MONOTONIC

> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
> fall back on interpreting as monotonic.

Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.

sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The 
tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only 
changes skb and does not change skb->sk.

I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space 
visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing 
from looking at __sock_cmsg_send().

There may be a short period of disconnect between what is in sk->sk_flags and 
what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) 
again after skb->tstamp is set by bpf. This could be considered a small glitch 
for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).

I think all this is crying for another bit in skb to mean user_delivery_time 
(skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the 
mono time either set by kernel-tcp or bpf. If we need to revert the 
mono_delivery_time reuse patch later, we will need to revert the netdev patch 
and the (to-be-made) bpf patch.
Willem de Bruijn March 13, 2024, 7:36 p.m. UTC | #8
Martin KaFai Lau wrote:
> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
> > Martin KaFai Lau wrote:
> >> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> >>> Bridge driver today has no support to forward the userspace timestamp
> >>> packets and ends up resetting the timestamp. ETF qdisc checks the
> >>> packet coming from userspace and encounters to be 0 thereby dropping
> >>> time sensitive packets. These changes will allow userspace timestamps
> >>> packets to be forwarded from the bridge to NIC drivers.
> >>>
> >>> Setting the same bit (mono_delivery_time) to avoid dropping of
> >>> userspace tstamp packets in the forwarding path.
> >>>
> >>> Existing functionality of mono_delivery_time remains unaltered here,
> >>> instead just extended with userspace tstamp support for bridge
> >>> forwarding path.
> >>
> >> The patch currently broke the bpf selftest test_tc_dtime:
> >> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> >>
> >> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
> >> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
> >> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
> >> ingress or a delivery time set by user space).
> >>
> >> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
> >> necessarily mean mono after this patch. I thought about fixing it on the bpf
> >> side such that reading __sk_buff->tstamp_type only returns
> >> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
> >> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> >>
> >> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
> >> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
> >> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
> >> ingress skb and redirect to egress sch_fq. It could also set a mono time to
> >> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
> >> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
> >> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
> >> BPF_SKB_TSTAMP_DELIVERY_MONO also.
> >>
> >> I ran out of idea to solve this uapi breakage.
> >>
> >> I am afraid it may need to go back to v1 idea and use another bit
> >> (user_delivery_time) in the skb.
> > 
> > Is the only conflict when bpf_skb_set_tstamp is called for an skb from
> > a socket with sk_clockid set (and thus SO_TXTIME called)?
> 
> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and 
> its interpretation depends on skb->sk->sk_clockid.
> 
> > Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
> > skb->sk is NULL is fine. This is the ingress to egress redirect case.
> 
> skb->sk == NULL is fine. I tried something like this in 
> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
> 
> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
> 
> 	skb->mono_delivery_time == 1 &&
> 	(!skb->sk ||
> 	 !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
> 	 skb->sk->sk_protocol == IPPROTO_TCP)
> 
> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
> 
> > 
> > I don't see an immediate use for this BPF function on egress where it
> > would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
> > but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
> 
> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all 
> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before 
> sending to the sch_fq.
> 
> I currently also don't have a use case for skb->sk->sk_clockid != 
> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now 
> before queuing to sch_fq.
> 
> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) 
> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() 
> between the veth pair.
> 
> > 
> > Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
> > already explicitly programmed?
> 
> This will change the existing bpf_skb_set_tstamp() behavior, so probably not 
> acceptable.
>
> > 
> >      skb->sk &&
> >      sock_flag(sk, SOCK_TXTIME) &&
> >      skb->sk->sk_clockid != CLOCK_MONOTONIC
> 
> > Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
> > fall back on interpreting as monotonic.
> 
> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
> 
> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The 
> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only 
> changes skb and does not change skb->sk.
> 
> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space 
> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing 
> from looking at __sock_cmsg_send().
> 
> There may be a short period of disconnect between what is in sk->sk_flags and 
> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) 
> again after skb->tstamp is set by bpf. This could be considered a small glitch 
> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
> 
> I think all this is crying for another bit in skb to mean user_delivery_time 
> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the 
> mono time either set by kernel-tcp or bpf.

It does sound like the approach with least side effects.

If we're going to increase to two bits per skb, it's perhaps
better to just encode the (selected supported) CLOCK_ type, rather
than only supporting clockid through skb->sk->sk_clockid.

This BPF function is the analogue to SO_TXTIME. It is clearly
extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
work with sch_etf, say.

> If we need to revert the 
> mono_delivery_time reuse patch later, we will need to revert the netdev patch 
> and the (to-be-made) bpf patch.
Abhishek Chauhan (ABC) March 13, 2024, 8:59 p.m. UTC | #9
On 3/13/2024 12:36 PM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
>>> Martin KaFai Lau wrote:
>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>>>> Bridge driver today has no support to forward the userspace timestamp
>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>>> time sensitive packets. These changes will allow userspace timestamps
>>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>>
>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>>>> userspace tstamp packets in the forwarding path.
>>>>>
>>>>> Existing functionality of mono_delivery_time remains unaltered here,
>>>>> instead just extended with userspace tstamp support for bridge
>>>>> forwarding path.
>>>>
>>>> The patch currently broke the bpf selftest test_tc_dtime:
>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>>>
>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>>>> ingress or a delivery time set by user space).
>>>>
>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>>>> side such that reading __sk_buff->tstamp_type only returns
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>>>
>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>>>
>>>> I ran out of idea to solve this uapi breakage.
>>>>
>>>> I am afraid it may need to go back to v1 idea and use another bit
>>>> (user_delivery_time) in the skb.
>>>
>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
>>
>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and 
>> its interpretation depends on skb->sk->sk_clockid.
>>
>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
>>
>> skb->sk == NULL is fine. I tried something like this in 
>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
>>
>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
>>
>> 	skb->mono_delivery_time == 1 &&
>> 	(!skb->sk ||
>> 	 !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
>> 	 skb->sk->sk_protocol == IPPROTO_TCP)
>>
>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
>>
>>>
>>> I don't see an immediate use for this BPF function on egress where it
>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
>>
>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all 
>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before 
>> sending to the sch_fq.
>>
>> I currently also don't have a use case for skb->sk->sk_clockid != 
>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now 
>> before queuing to sch_fq.
>>
>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) 
>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() 
>> between the veth pair.
>>
>>>
>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
>>> already explicitly programmed?
>>
>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not 
>> acceptable.
>>
>>>
>>>      skb->sk &&
>>>      sock_flag(sk, SOCK_TXTIME) &&
>>>      skb->sk->sk_clockid != CLOCK_MONOTONIC
>>
>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
>>> fall back on interpreting as monotonic.
>>
>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
>>
>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The 
>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only 
>> changes skb and does not change skb->sk.
>>
>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space 
>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing 
>> from looking at __sock_cmsg_send().
>>
>> There may be a short period of disconnect between what is in sk->sk_flags and 
>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) 
>> again after skb->tstamp is set by bpf. This could be considered a small glitch 
>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
>>
>> I think all this is crying for another bit in skb to mean user_delivery_time 
>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the 
>> mono time either set by kernel-tcp or bpf.
> 
> It does sound like the approach with least side effects.
> 
Martin and Willem , while we are discussing to use seperate bit per skb to check if its 
SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff
called in filter framework
@tc_at_ingress: used within tc_classify to distinguish in/egress
@from_ingress: packet was redirected from the ingress path

Since i believe the above testcase is based on redirecting from ingress to egress part
why cant we use these already existing flags ? Please advice 

I am not completely sure if we can use these flags to solve the bpf problem or not. 
but considering the problem statment put forth by Willem stating that we are just adding 
flags for every usecase and not keeping the design scalable for future needs. 

> If we're going to increase to two bits per skb, it's perhaps
> better to just encode the (selected supported) CLOCK_ type, rather
> than only supporting clockid through skb->sk->sk_clockid.
> 
> This BPF function is the analogue to SO_TXTIME. It is clearly
> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
> work with sch_etf, say.
> 
>> If we need to revert the 
>> mono_delivery_time reuse patch later, we will need to revert the netdev patch 
>> and the (to-be-made) bpf patch.
> 
> 
>  
> 
>
Martin KaFai Lau March 13, 2024, 9:01 p.m. UTC | #10
On 3/13/24 12:36 PM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
>>> Martin KaFai Lau wrote:
>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>>>> Bridge driver today has no support to forward the userspace timestamp
>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>>> time sensitive packets. These changes will allow userspace timestamps
>>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>>
>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>>>> userspace tstamp packets in the forwarding path.
>>>>>
>>>>> Existing functionality of mono_delivery_time remains unaltered here,
>>>>> instead just extended with userspace tstamp support for bridge
>>>>> forwarding path.
>>>>
>>>> The patch currently broke the bpf selftest test_tc_dtime:
>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>>>
>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>>>> ingress or a delivery time set by user space).
>>>>
>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>>>> side such that reading __sk_buff->tstamp_type only returns
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>>>
>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>>>
>>>> I ran out of idea to solve this uapi breakage.
>>>>
>>>> I am afraid it may need to go back to v1 idea and use another bit
>>>> (user_delivery_time) in the skb.
>>>
>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
>>
>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
>> its interpretation depends on skb->sk->sk_clockid.
>>
>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
>>
>> skb->sk == NULL is fine. I tried something like this in
>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
>>
>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
>>
>> 	skb->mono_delivery_time == 1 &&
>> 	(!skb->sk ||
>> 	 !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
>> 	 skb->sk->sk_protocol == IPPROTO_TCP)
>>
>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
>>
>>>
>>> I don't see an immediate use for this BPF function on egress where it
>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
>>
>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
>> sending to the sch_fq.
>>
>> I currently also don't have a use case for skb->sk->sk_clockid !=
>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
>> before queuing to sch_fq.
>>
>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
>> between the veth pair.
>>
>>>
>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
>>> already explicitly programmed?
>>
>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
>> acceptable.
>>
>>>
>>>       skb->sk &&
>>>       sock_flag(sk, SOCK_TXTIME) &&
>>>       skb->sk->sk_clockid != CLOCK_MONOTONIC
>>
>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
>>> fall back on interpreting as monotonic.
>>
>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
>>
>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
>> changes skb and does not change skb->sk.
>>
>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
>> from looking at __sock_cmsg_send().
>>
>> There may be a short period of disconnect between what is in sk->sk_flags and
>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
>> again after skb->tstamp is set by bpf. This could be considered a small glitch
>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
>>
>> I think all this is crying for another bit in skb to mean user_delivery_time
>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>> mono time either set by kernel-tcp or bpf.
> 
> It does sound like the approach with least side effects.
> 
> If we're going to increase to two bits per skb, it's perhaps
> better to just encode the (selected supported) CLOCK_ type, rather
> than only supporting clockid through skb->sk->sk_clockid.

Good idea. May be starting with mono and tai (Abishek's use case?), only forward 
these two clocks and reset the skb->tstamp for others.

> 
> This BPF function is the analogue to SO_TXTIME. It is clearly
> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
> work with sch_etf, say.

Yes, if there are bits in skb to describe the clock in the skb->tstamp, 
BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the 
values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.

The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes 
to include the consideration of these two bits. I think we have mostly settled 
with the approach (thanks for the discussion!). Abhishek, not sure how much can 
be reused from this patch for the two bits apporach, do you want to revert the 
current patch first and then start from clean?

> 
>> If we need to revert the
>> mono_delivery_time reuse patch later, we will need to revert the netdev patch
>> and the (to-be-made) bpf patch.
> 
> 
>   
> 
>
Martin KaFai Lau March 13, 2024, 9:19 p.m. UTC | #11
On 3/13/24 1:59 PM, Abhishek Chauhan (ABC) wrote:
>>> I think all this is crying for another bit in skb to mean user_delivery_time
>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>>> mono time either set by kernel-tcp or bpf.
>>
>> It does sound like the approach with least side effects.
>>
> Martin and Willem , while we are discussing to use seperate bit per skb to check if its
> SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff
> called in filter framework
> @tc_at_ingress: used within tc_classify to distinguish in/egress
> @from_ingress: packet was redirected from the ingress path
> 
> Since i believe the above testcase is based on redirecting from ingress to egress part
> why cant we use these already existing flags ? Please advice
> 
> I am not completely sure if we can use these flags to solve the bpf problem or not.

I don't see how they are related. It can tell where a skb is at. It cannot tell 
what is in skb->tstamp.
Abhishek Chauhan (ABC) March 13, 2024, 9:26 p.m. UTC | #12
On 3/13/2024 2:01 PM, Martin KaFai Lau wrote:
> On 3/13/24 12:36 PM, Willem de Bruijn wrote:
>> Martin KaFai Lau wrote:
>>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
>>>> Martin KaFai Lau wrote:
>>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>>>>> Bridge driver today has no support to forward the userspace timestamp
>>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>>>> time sensitive packets. These changes will allow userspace timestamps
>>>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>>>
>>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>>>>> userspace tstamp packets in the forwarding path.
>>>>>>
>>>>>> Existing functionality of mono_delivery_time remains unaltered here,
>>>>>> instead just extended with userspace tstamp support for bridge
>>>>>> forwarding path.
>>>>>
>>>>> The patch currently broke the bpf selftest test_tc_dtime:
>>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>>>>
>>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>>>>> ingress or a delivery time set by user space).
>>>>>
>>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>>>>> side such that reading __sk_buff->tstamp_type only returns
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>>>>
>>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>>>>
>>>>> I ran out of idea to solve this uapi breakage.
>>>>>
>>>>> I am afraid it may need to go back to v1 idea and use another bit
>>>>> (user_delivery_time) in the skb.
>>>>
>>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
>>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
>>>
>>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
>>> its interpretation depends on skb->sk->sk_clockid.
>>>
>>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
>>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
>>>
>>> skb->sk == NULL is fine. I tried something like this in
>>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
>>>
>>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
>>>
>>>     skb->mono_delivery_time == 1 &&
>>>     (!skb->sk ||
>>>      !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
>>>      skb->sk->sk_protocol == IPPROTO_TCP)
>>>
>>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
>>>
>>>>
>>>> I don't see an immediate use for this BPF function on egress where it
>>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
>>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
>>>
>>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
>>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
>>> sending to the sch_fq.
>>>
>>> I currently also don't have a use case for skb->sk->sk_clockid !=
>>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
>>> before queuing to sch_fq.
>>>
>>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
>>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
>>> between the veth pair.
>>>
>>>>
>>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
>>>> already explicitly programmed?
>>>
>>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
>>> acceptable.
>>>
>>>>
>>>>       skb->sk &&
>>>>       sock_flag(sk, SOCK_TXTIME) &&
>>>>       skb->sk->sk_clockid != CLOCK_MONOTONIC
>>>
>>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
>>>> fall back on interpreting as monotonic.
>>>
>>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
>>>
>>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
>>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
>>> changes skb and does not change skb->sk.
>>>
>>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
>>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
>>> from looking at __sock_cmsg_send().
>>>
>>> There may be a short period of disconnect between what is in sk->sk_flags and
>>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
>>> again after skb->tstamp is set by bpf. This could be considered a small glitch
>>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
>>>
>>> I think all this is crying for another bit in skb to mean user_delivery_time
>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>>> mono time either set by kernel-tcp or bpf.
>>
>> It does sound like the approach with least side effects.
>>
>> If we're going to increase to two bits per skb, it's perhaps
>> better to just encode the (selected supported) CLOCK_ type, rather
>> than only supporting clockid through skb->sk->sk_clockid.
> 
> Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others.
> 
>>
>> This BPF function is the analogue to SO_TXTIME. It is clearly
>> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
>> work with sch_etf, say.
> 
> Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.
> 
> The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean?
> 
Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem)

Here is what i will do from myside
1. Revert the v4 patch :-  net: Re-use and set mono_delivery_time bit for userspace tstamp packets 
2. Keep mono_delivery_time as ease 
3. Add user_delivery_time as a new bitfield 
4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support
5. do not reset the time if either mono_delivery_time or user_delivery_time is set. 

Let me know if i have covered all the design details or add if i missed anything. 

>>
>>> If we need to revert the
>>> mono_delivery_time reuse patch later, we will need to revert the netdev patch
>>> and the (to-be-made) bpf patch.
>>
>>
>>  
>>
>
Willem de Bruijn March 13, 2024, 9:40 p.m. UTC | #13
Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/13/2024 2:01 PM, Martin KaFai Lau wrote:
> > On 3/13/24 12:36 PM, Willem de Bruijn wrote:
> >> Martin KaFai Lau wrote:
> >>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
> >>>> Martin KaFai Lau wrote:
> >>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> >>>>>> Bridge driver today has no support to forward the userspace timestamp
> >>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
> >>>>>> packet coming from userspace and encounters to be 0 thereby dropping
> >>>>>> time sensitive packets. These changes will allow userspace timestamps
> >>>>>> packets to be forwarded from the bridge to NIC drivers.
> >>>>>>
> >>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
> >>>>>> userspace tstamp packets in the forwarding path.
> >>>>>>
> >>>>>> Existing functionality of mono_delivery_time remains unaltered here,
> >>>>>> instead just extended with userspace tstamp support for bridge
> >>>>>> forwarding path.
> >>>>>
> >>>>> The patch currently broke the bpf selftest test_tc_dtime:
> >>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> >>>>>
> >>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
> >>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
> >>>>> ingress or a delivery time set by user space).
> >>>>>
> >>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
> >>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
> >>>>> side such that reading __sk_buff->tstamp_type only returns
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
> >>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> >>>>>
> >>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
> >>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
> >>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
> >>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
> >>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
> >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
> >>>>>
> >>>>> I ran out of idea to solve this uapi breakage.
> >>>>>
> >>>>> I am afraid it may need to go back to v1 idea and use another bit
> >>>>> (user_delivery_time) in the skb.
> >>>>
> >>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
> >>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
> >>>
> >>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
> >>> its interpretation depends on skb->sk->sk_clockid.
> >>>
> >>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
> >>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
> >>>
> >>> skb->sk == NULL is fine. I tried something like this in
> >>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
> >>>
> >>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
> >>>
> >>>     skb->mono_delivery_time == 1 &&
> >>>     (!skb->sk ||
> >>>      !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
> >>>      skb->sk->sk_protocol == IPPROTO_TCP)
> >>>
> >>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
> >>>
> >>>>
> >>>> I don't see an immediate use for this BPF function on egress where it
> >>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
> >>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
> >>>
> >>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
> >>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
> >>> sending to the sch_fq.
> >>>
> >>> I currently also don't have a use case for skb->sk->sk_clockid !=
> >>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
> >>> before queuing to sch_fq.
> >>>
> >>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
> >>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
> >>> between the veth pair.
> >>>
> >>>>
> >>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
> >>>> already explicitly programmed?
> >>>
> >>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
> >>> acceptable.
> >>>
> >>>>
> >>>>       skb->sk &&
> >>>>       sock_flag(sk, SOCK_TXTIME) &&
> >>>>       skb->sk->sk_clockid != CLOCK_MONOTONIC
> >>>
> >>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
> >>>> fall back on interpreting as monotonic.
> >>>
> >>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
> >>>
> >>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
> >>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
> >>> changes skb and does not change skb->sk.
> >>>
> >>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
> >>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
> >>> from looking at __sock_cmsg_send().
> >>>
> >>> There may be a short period of disconnect between what is in sk->sk_flags and
> >>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
> >>> again after skb->tstamp is set by bpf. This could be considered a small glitch
> >>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
> >>>
> >>> I think all this is crying for another bit in skb to mean user_delivery_time
> >>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
> >>> mono time either set by kernel-tcp or bpf.
> >>
> >> It does sound like the approach with least side effects.
> >>
> >> If we're going to increase to two bits per skb, it's perhaps
> >> better to just encode the (selected supported) CLOCK_ type, rather
> >> than only supporting clockid through skb->sk->sk_clockid.
> > 
> > Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others.
> > 
> >>
> >> This BPF function is the analogue to SO_TXTIME. It is clearly
> >> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
> >> work with sch_etf, say.
> > 
> > Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.
> > 
> > The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean?
> > 
> Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem)
> 
> Here is what i will do from myside
> 1. Revert the v4 patch :-  net: Re-use and set mono_delivery_time bit for userspace tstamp packets 
> 2. Keep mono_delivery_time as ease 
> 3. Add user_delivery_time as a new bitfield 
> 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support
> 5. do not reset the time if either mono_delivery_time or user_delivery_time is set. 
> 
> Let me know if i have covered all the design details or add if i missed anything. 

Thanks for revising this.

No need to add the BPF part here.

I was mistaken that we can encode the clock_id in two skb bits.
SO_TXTIME allows essentially all CLOCK_...

So indeed we will need a user_delivery_time bit and use that to look
at sk_clockid.
Daniel Borkmann March 13, 2024, 9:41 p.m. UTC | #14
On 3/13/24 10:19 PM, Martin KaFai Lau wrote:
> On 3/13/24 1:59 PM, Abhishek Chauhan (ABC) wrote:
>>>> I think all this is crying for another bit in skb to mean user_delivery_time
>>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>>>> mono time either set by kernel-tcp or bpf.
>>>
>>> It does sound like the approach with least side effects.
>>>
>> Martin and Willem , while we are discussing to use seperate bit per skb to check if its
>> SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff
>> called in filter framework
>> @tc_at_ingress: used within tc_classify to distinguish in/egress
>> @from_ingress: packet was redirected from the ingress path
>>
>> Since i believe the above testcase is based on redirecting from ingress to egress part
>> why cant we use these already existing flags ? Please advice
>>
>> I am not completely sure if we can use these flags to solve the bpf problem or not.
> 
> I don't see how they are related. It can tell where a skb is at. It cannot tell what is in skb->tstamp.

+1, if we go that route as per discussion, I'd also prefer clearly encoding delivery
time clock base in skb to avoid ambiguity.
Martin KaFai Lau March 13, 2024, 10:08 p.m. UTC | #15
On 3/13/24 2:40 PM, Willem de Bruijn wrote:
> Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 3/13/2024 2:01 PM, Martin KaFai Lau wrote:
>>> On 3/13/24 12:36 PM, Willem de Bruijn wrote:
>>>> Martin KaFai Lau wrote:
>>>>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
>>>>>> Martin KaFai Lau wrote:
>>>>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
>>>>>>>> Bridge driver today has no support to forward the userspace timestamp
>>>>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
>>>>>>>> packet coming from userspace and encounters to be 0 thereby dropping
>>>>>>>> time sensitive packets. These changes will allow userspace timestamps
>>>>>>>> packets to be forwarded from the bridge to NIC drivers.
>>>>>>>>
>>>>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
>>>>>>>> userspace tstamp packets in the forwarding path.
>>>>>>>>
>>>>>>>> Existing functionality of mono_delivery_time remains unaltered here,
>>>>>>>> instead just extended with userspace tstamp support for bridge
>>>>>>>> forwarding path.
>>>>>>>
>>>>>>> The patch currently broke the bpf selftest test_tc_dtime:
>>>>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
>>>>>>>
>>>>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
>>>>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
>>>>>>> ingress or a delivery time set by user space).
>>>>>>>
>>>>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
>>>>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
>>>>>>> side such that reading __sk_buff->tstamp_type only returns
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
>>>>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
>>>>>>>
>>>>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
>>>>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
>>>>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
>>>>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
>>>>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
>>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
>>>>>>>
>>>>>>> I ran out of idea to solve this uapi breakage.
>>>>>>>
>>>>>>> I am afraid it may need to go back to v1 idea and use another bit
>>>>>>> (user_delivery_time) in the skb.
>>>>>>
>>>>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
>>>>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
>>>>>
>>>>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
>>>>> its interpretation depends on skb->sk->sk_clockid.
>>>>>
>>>>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
>>>>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
>>>>>
>>>>> skb->sk == NULL is fine. I tried something like this in
>>>>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
>>>>>
>>>>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
>>>>>
>>>>>      skb->mono_delivery_time == 1 &&
>>>>>      (!skb->sk ||
>>>>>       !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
>>>>>       skb->sk->sk_protocol == IPPROTO_TCP)
>>>>>
>>>>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
>>>>>
>>>>>>
>>>>>> I don't see an immediate use for this BPF function on egress where it
>>>>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
>>>>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
>>>>>
>>>>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
>>>>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
>>>>> sending to the sch_fq.
>>>>>
>>>>> I currently also don't have a use case for skb->sk->sk_clockid !=
>>>>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
>>>>> before queuing to sch_fq.
>>>>>
>>>>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
>>>>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
>>>>> between the veth pair.
>>>>>
>>>>>>
>>>>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
>>>>>> already explicitly programmed?
>>>>>
>>>>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
>>>>> acceptable.
>>>>>
>>>>>>
>>>>>>        skb->sk &&
>>>>>>        sock_flag(sk, SOCK_TXTIME) &&
>>>>>>        skb->sk->sk_clockid != CLOCK_MONOTONIC
>>>>>
>>>>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
>>>>>> fall back on interpreting as monotonic.
>>>>>
>>>>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
>>>>>
>>>>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
>>>>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
>>>>> changes skb and does not change skb->sk.
>>>>>
>>>>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
>>>>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
>>>>> from looking at __sock_cmsg_send().
>>>>>
>>>>> There may be a short period of disconnect between what is in sk->sk_flags and
>>>>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
>>>>> again after skb->tstamp is set by bpf. This could be considered a small glitch
>>>>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
>>>>>
>>>>> I think all this is crying for another bit in skb to mean user_delivery_time
>>>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
>>>>> mono time either set by kernel-tcp or bpf.
>>>>
>>>> It does sound like the approach with least side effects.
>>>>
>>>> If we're going to increase to two bits per skb, it's perhaps
>>>> better to just encode the (selected supported) CLOCK_ type, rather
>>>> than only supporting clockid through skb->sk->sk_clockid.
>>>
>>> Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others.
>>>
>>>>
>>>> This BPF function is the analogue to SO_TXTIME. It is clearly
>>>> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
>>>> work with sch_etf, say.
>>>
>>> Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.
>>>
>>> The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean?
>>>
>> Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem)
>>
>> Here is what i will do from myside
>> 1. Revert the v4 patch :-  net: Re-use and set mono_delivery_time bit for userspace tstamp packets
>> 2. Keep mono_delivery_time as ease
>> 3. Add user_delivery_time as a new bitfield
>> 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support
>> 5. do not reset the time if either mono_delivery_time or user_delivery_time is set.
>>
>> Let me know if i have covered all the design details or add if i missed anything.
> 
> Thanks for revising this.
> 
> No need to add the BPF part here.
> 
> I was mistaken that we can encode the clock_id in two skb bits.
> SO_TXTIME allows essentially all CLOCK_...

The two bits could potentially only encode the delivery time that is allowed to 
be forwarded without reset. 0 could mean refering back to sk_clockid and don't 
forward. The final consumer of the forwarded skb->tstamp is the qdisc which 
currently only has mono and tai. The concern is more on future extension to 
allow more clock type to be forwarded?

I don't have a use case for having BPF_SKB_TSTAMP_DELIVERY_TAI but curious on 
how other clock types are used now.

Thanks.

> 
> So indeed we will need a user_delivery_time bit and use that to look
> at sk_clockid.
Willem de Bruijn March 14, 2024, 9:49 a.m. UTC | #16
Martin KaFai Lau wrote:
> On 3/13/24 2:40 PM, Willem de Bruijn wrote:
> > Abhishek Chauhan (ABC) wrote:
> >>
> >>
> >> On 3/13/2024 2:01 PM, Martin KaFai Lau wrote:
> >>> On 3/13/24 12:36 PM, Willem de Bruijn wrote:
> >>>> Martin KaFai Lau wrote:
> >>>>> On 3/13/24 1:52 AM, Willem de Bruijn wrote:
> >>>>>> Martin KaFai Lau wrote:
> >>>>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote:
> >>>>>>>> Bridge driver today has no support to forward the userspace timestamp
> >>>>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the
> >>>>>>>> packet coming from userspace and encounters to be 0 thereby dropping
> >>>>>>>> time sensitive packets. These changes will allow userspace timestamps
> >>>>>>>> packets to be forwarded from the bridge to NIC drivers.
> >>>>>>>>
> >>>>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of
> >>>>>>>> userspace tstamp packets in the forwarding path.
> >>>>>>>>
> >>>>>>>> Existing functionality of mono_delivery_time remains unaltered here,
> >>>>>>>> instead just extended with userspace tstamp support for bridge
> >>>>>>>> forwarding path.
> >>>>>>>
> >>>>>>> The patch currently broke the bpf selftest test_tc_dtime:
> >>>>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675
> >>>>>>>
> >>>>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time.
> >>>>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at
> >>>>>>> ingress or a delivery time set by user space).
> >>>>>>>
> >>>>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not
> >>>>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf
> >>>>>>> side such that reading __sk_buff->tstamp_type only returns
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk
> >>>>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp().
> >>>>>>>
> >>>>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp,
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the
> >>>>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the
> >>>>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to
> >>>>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then
> >>>>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp,
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects
> >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also.
> >>>>>>>
> >>>>>>> I ran out of idea to solve this uapi breakage.
> >>>>>>>
> >>>>>>> I am afraid it may need to go back to v1 idea and use another bit
> >>>>>>> (user_delivery_time) in the skb.
> >>>>>>
> >>>>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from
> >>>>>> a socket with sk_clockid set (and thus SO_TXTIME called)?
> >>>>>
> >>>>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and
> >>>>> its interpretation depends on skb->sk->sk_clockid.
> >>>>>
> >>>>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and
> >>>>>> skb->sk is NULL is fine. This is the ingress to egress redirect case.
> >>>>>
> >>>>> skb->sk == NULL is fine. I tried something like this in
> >>>>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type:
> >>>>>
> >>>>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when:
> >>>>>
> >>>>>      skb->mono_delivery_time == 1 &&
> >>>>>      (!skb->sk ||
> >>>>>       !sk_fullsock(skb->sk) /* tcp tw or req sk */ ||
> >>>>>       skb->sk->sk_protocol == IPPROTO_TCP)
> >>>>>
> >>>>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable.
> >>>>>
> >>>>>>
> >>>>>> I don't see an immediate use for this BPF function on egress where it
> >>>>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono,
> >>>>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC.
> >>>>>
> >>>>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all
> >>>>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before
> >>>>> sending to the sch_fq.
> >>>>>
> >>>>> I currently also don't have a use case for skb->sk->sk_clockid !=
> >>>>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now
> >>>>> before queuing to sch_fq.
> >>>>>
> >>>>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf)
> >>>>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb()
> >>>>> between the veth pair.
> >>>>>
> >>>>>>
> >>>>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is
> >>>>>> already explicitly programmed?
> >>>>>
> >>>>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not
> >>>>> acceptable.
> >>>>>
> >>>>>>
> >>>>>>        skb->sk &&
> >>>>>>        sock_flag(sk, SOCK_TXTIME) &&
> >>>>>>        skb->sk->sk_clockid != CLOCK_MONOTONIC
> >>>>>
> >>>>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and
> >>>>>> fall back on interpreting as monotonic.
> >>>>>
> >>>>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also.
> >>>>>
> >>>>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The
> >>>>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only
> >>>>> changes skb and does not change skb->sk.
> >>>>>
> >>>>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space
> >>>>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing
> >>>>> from looking at __sock_cmsg_send().
> >>>>>
> >>>>> There may be a short period of disconnect between what is in sk->sk_flags and
> >>>>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME)
> >>>>> again after skb->tstamp is set by bpf. This could be considered a small glitch
> >>>>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME).
> >>>>>
> >>>>> I think all this is crying for another bit in skb to mean user_delivery_time
> >>>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the
> >>>>> mono time either set by kernel-tcp or bpf.
> >>>>
> >>>> It does sound like the approach with least side effects.
> >>>>
> >>>> If we're going to increase to two bits per skb, it's perhaps
> >>>> better to just encode the (selected supported) CLOCK_ type, rather
> >>>> than only supporting clockid through skb->sk->sk_clockid.
> >>>
> >>> Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others.
> >>>
> >>>>
> >>>> This BPF function is the analogue to SO_TXTIME. It is clearly
> >>>> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To
> >>>> work with sch_etf, say.
> >>>
> >>> Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*.
> >>>
> >>> The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean?
> >>>
> >> Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem)
> >>
> >> Here is what i will do from myside
> >> 1. Revert the v4 patch :-  net: Re-use and set mono_delivery_time bit for userspace tstamp packets
> >> 2. Keep mono_delivery_time as ease
> >> 3. Add user_delivery_time as a new bitfield
> >> 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support
> >> 5. do not reset the time if either mono_delivery_time or user_delivery_time is set.
> >>
> >> Let me know if i have covered all the design details or add if i missed anything.
> > 
> > Thanks for revising this.
> > 
> > No need to add the BPF part here.
> > 
> > I was mistaken that we can encode the clock_id in two skb bits.
> > SO_TXTIME allows essentially all CLOCK_...
> 
> The two bits could potentially only encode the delivery time that is allowed to 
> be forwarded without reset. 0 could mean refering back to sk_clockid and don't 
> forward. The final consumer of the forwarded skb->tstamp is the qdisc which 
> currently only has mono and tai.

So the followinng meaning of bit pair
{ skb->mono_delivery_time, skb->user_delivery_time } ?
 
- { 0, 0 } legacy skb->tstamp: realtime on rx
- { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit
- { 0, 1 } skb->tstamp is tai: analogous to mono case
- { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid

> The concern is more on future extension to 
> allow more clock type to be forwarded?

Right. Not because I have an immediate use for them. Only because
SO_TXTIME already allows configuring other values for sk_clockid.
  
> I don't have a use case for having BPF_SKB_TSTAMP_DELIVERY_TAI but curious on 
> how other clock types are used now.
> 
> Thanks.
> 
> > 
> > So indeed we will need a user_delivery_time bit and use that to look
> > at sk_clockid.
>
Martin KaFai Lau March 14, 2024, 7:21 p.m. UTC | #17
On 3/14/24 2:49 AM, Willem de Bruijn wrote:
>> The two bits could potentially only encode the delivery time that is allowed to
>> be forwarded without reset. 0 could mean refering back to sk_clockid and don't
>> forward. The final consumer of the forwarded skb->tstamp is the qdisc which
>> currently only has mono and tai.
> 
> So the followinng meaning of bit pair
> { skb->mono_delivery_time, skb->user_delivery_time } ?
>   
> - { 0, 0 } legacy skb->tstamp: realtime on rx
> - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit
> - { 0, 1 } skb->tstamp is tai: analogous to mono case
> - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid

I was thinking only forward mono and tai until it is clearer how other clocks 
will be useful for forwarding between e/ingress. By resetting all skb->tstamp 
other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 
} at egress will mean go look skb->sk->sk_clockid.

I do like your scheme such that it is much clearer what is in skb->tstamp 
without depending on other bits like tc_at_ingress or not.

"{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped 
for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed.
Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by 
skb->sk->sk_clockid".

The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be 
changed to handle the new "user_delivery_time" bit anyway, e.g. 
bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the 
"user_delivery_time" bit.

I think the "struct inet_frag_queue" also needs a new "user_delivery_time" 
field. "mono_delivery_time" is already in there.

It may as well be cleaner to combine mono_delivery_time and user_delivery_time 
into a 2 bits field like:

struct sk_buff {
	__u8 tstamp_type:2;
};

enum {
	SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
	SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */

	/* A TX (delivery) time and its clock is in skb->sk->sk_clockid.
	 *
	 * BPF_SKB_TSTAMP_DELIVERY_USER should be added
	 * such that reading __sk_buff->tstamp_type will match the
	 * SKB_TSTAMP_TYPE_TX_USER.
	 *
	 * The bpf program can learn the clockid by
	 * reading skb->sk->sk_clockid.
	 *
	 * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER)
	 * should be disallowed for now until the use case
	 * is more clear. Potentially, we could allow it
	 * in the future as long as
	 * the sock_flag(sk, SOCK_TXTIME) is true at that moment.
	 */
	SKB_TSTAMP_TYPE_TX_USER = 2,

	/* UNUSED_FOR_FUTURE = 3, */
};

It will have more code churns in the first patch to rename 
s/mono_delivery_time/tstamp_type/.

wdyt?
Willem de Bruijn March 14, 2024, 8:28 p.m. UTC | #18
Martin KaFai Lau wrote:
> On 3/14/24 2:49 AM, Willem de Bruijn wrote:
> >> The two bits could potentially only encode the delivery time that is allowed to
> >> be forwarded without reset. 0 could mean refering back to sk_clockid and don't
> >> forward. The final consumer of the forwarded skb->tstamp is the qdisc which
> >> currently only has mono and tai.
> > 
> > So the followinng meaning of bit pair
> > { skb->mono_delivery_time, skb->user_delivery_time } ?
> >   
> > - { 0, 0 } legacy skb->tstamp: realtime on rx
> > - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit
> > - { 0, 1 } skb->tstamp is tai: analogous to mono case
> > - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid
> 
> I was thinking only forward mono and tai until it is clearer how other clocks 
> will be useful for forwarding between e/ingress. By resetting all skb->tstamp 
> other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 
> } at egress will mean go look skb->sk->sk_clockid.
> 
> I do like your scheme such that it is much clearer what is in skb->tstamp 
> without depending on other bits like tc_at_ingress or not.
> 
> "{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped 
> for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed.
> Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by 
> skb->sk->sk_clockid".
> 
> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be 
> changed to handle the new "user_delivery_time" bit anyway, e.g. 
> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the 
> "user_delivery_time" bit.
> 
> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" 
> field. "mono_delivery_time" is already in there.
> 
> It may as well be cleaner to combine mono_delivery_time and user_delivery_time 
> into a 2 bits field like:
> 
> struct sk_buff {
> 	__u8 tstamp_type:2;
> };
> 
> enum {
> 	SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
> 	SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
> 
> 	/* A TX (delivery) time and its clock is in skb->sk->sk_clockid.
> 	 *
> 	 * BPF_SKB_TSTAMP_DELIVERY_USER should be added
> 	 * such that reading __sk_buff->tstamp_type will match the
> 	 * SKB_TSTAMP_TYPE_TX_USER.
> 	 *
> 	 * The bpf program can learn the clockid by
> 	 * reading skb->sk->sk_clockid.
> 	 *
> 	 * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER)
> 	 * should be disallowed for now until the use case
> 	 * is more clear. Potentially, we could allow it
> 	 * in the future as long as
> 	 * the sock_flag(sk, SOCK_TXTIME) is true at that moment.
> 	 */
> 	SKB_TSTAMP_TYPE_TX_USER = 2,
> 
> 	/* UNUSED_FOR_FUTURE = 3, */
> };
> 
> It will have more code churns in the first patch to rename 
> s/mono_delivery_time/tstamp_type/.
> 
> wdyt?

I asked for such code churn in the original patch. We then decided to
leave the variable name as is, as the churn was significant.

Long term, it is obviously cleaner.

I don't have a strong opinion. If doing this, let's at least make it
two separate patches, one that is a NOOP rename only.
Abhishek Chauhan (ABC) March 14, 2024, 8:53 p.m. UTC | #19
On 3/14/2024 1:28 PM, Willem de Bruijn wrote:
> Martin KaFai Lau wrote:
>> On 3/14/24 2:49 AM, Willem de Bruijn wrote:
>>>> The two bits could potentially only encode the delivery time that is allowed to
>>>> be forwarded without reset. 0 could mean refering back to sk_clockid and don't
>>>> forward. The final consumer of the forwarded skb->tstamp is the qdisc which
>>>> currently only has mono and tai.
>>>
>>> So the followinng meaning of bit pair
>>> { skb->mono_delivery_time, skb->user_delivery_time } ?
>>>   
>>> - { 0, 0 } legacy skb->tstamp: realtime on rx
>>> - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit
>>> - { 0, 1 } skb->tstamp is tai: analogous to mono case
>>> - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid
>>
>> I was thinking only forward mono and tai until it is clearer how other clocks 
>> will be useful for forwarding between e/ingress. By resetting all skb->tstamp 
>> other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 
>> } at egress will mean go look skb->sk->sk_clockid.
>>
>> I do like your scheme such that it is much clearer what is in skb->tstamp 
>> without depending on other bits like tc_at_ingress or not.
>>
>> "{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped 
>> for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed.
>> Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by 
>> skb->sk->sk_clockid".
>>
>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be 
>> changed to handle the new "user_delivery_time" bit anyway, e.g. 
>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the 
>> "user_delivery_time" bit.
>>
>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" 
>> field. "mono_delivery_time" is already in there.
>>
>> It may as well be cleaner to combine mono_delivery_time and user_delivery_time 
>> into a 2 bits field like:
>>
>> struct sk_buff {
>> 	__u8 tstamp_type:2;
>> };
>>
>> enum {
>> 	SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>> 	SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
>>
>> 	/* A TX (delivery) time and its clock is in skb->sk->sk_clockid.
>> 	 *
>> 	 * BPF_SKB_TSTAMP_DELIVERY_USER should be added
>> 	 * such that reading __sk_buff->tstamp_type will match the
>> 	 * SKB_TSTAMP_TYPE_TX_USER.
>> 	 *
>> 	 * The bpf program can learn the clockid by
>> 	 * reading skb->sk->sk_clockid.
>> 	 *
>> 	 * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER)
>> 	 * should be disallowed for now until the use case
>> 	 * is more clear. Potentially, we could allow it
>> 	 * in the future as long as
>> 	 * the sock_flag(sk, SOCK_TXTIME) is true at that moment.
>> 	 */
>> 	SKB_TSTAMP_TYPE_TX_USER = 2,
>>
>> 	/* UNUSED_FOR_FUTURE = 3, */
>> };
>>
>> It will have more code churns in the first patch to rename 
>> s/mono_delivery_time/tstamp_type/.
>>
>> wdyt?
> 
> I asked for such code churn in the original patch. We then decided to
> leave the variable name as is, as the churn was significant.
> 
> Long term, it is obviously cleaner.
> 
> I don't have a strong opinion. If doing this, let's at least make it
> two separate patches, one that is a NOOP rename only.
> Martin and Willem.
Lets do the cleaner approach. I feel its now or never. 

1. I will raise one patch to introduce rename mono_delivery_time to 
tstamp_type 
2. I will introduce setting of userspace timestamp type as the second bit 
whem transmit_time is set. 
3. This will be a first step to make the design scalable. 
4. Tomorrow if we have more timestamp to support, upstream community has to do is 
update the enum and increase the bitfield from 2=>3 and so on. 

I need help from Martin to test the patch which renames the mono_delivery_time 
to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)

Sounds like a plan ? 

>
Martin KaFai Lau March 14, 2024, 9:48 p.m. UTC | #20
On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>> "user_delivery_time" bit.
>>>
>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>> field. "mono_delivery_time" is already in there.

[ ... ]

I would think the first step is to revert this patch. I don't think much of the 
current patch can be reused.

> 1. I will raise one patch to introduce rename mono_delivery_time to
> tstamp_type

Right, I expect something like this:

struct sk_buff {
		/* ... */
-	        __u8                    mono_delivery_time:1;
+		__u8			tstamp_type:1;
		/* ... */
};

> 2. I will introduce setting of userspace timestamp type as the second bit
> whem transmit_time is set.

I expect the second patch should be introducing the enum first

enum skb_tstamp_type {
	SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
	SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
};

and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
"skb->tstamp_type = 1;"

and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
"skb->tstamp_type = 0;"


This one I am not sure but probably need to change the skb_set_delivery_time() 
function signature also:

static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
-                                        bool mono)
+					 enum skb_tstamp_type tstamp_type)

The third patch is to change tstamp_type from 1 bit to 2 bits and add 
SKB_TSTAMP_TYPE_TX_USER.

struct sk_buff {
		/* ... */
-		__u8			tstamp_type:1;
+		__u8			tstamp_type:2;
		/* ... */
};

enum skb_tstamp_type {
	SKB_TSTAMP_TYPE_RX_REAL = 0,	/* A RX (receive) time in real */
	SKB_TSTAMP_TYPE_TX_MONO = 1,	/* A TX (delivery) time in mono */
+	SKB_TSTAMP_TYPE_TX_USER = 2,	/* A TX (delivery) time and its clock
					 * is in skb->sk->sk_clockid.
					 */
				
};

This will shift a bit out of the byte where tstamp_type lives. It should be the 
"inner_protocol_type" bit by my hand count. Please check if it is directly used 
in bpf instruction (filter.c). As far as I look, it is not, so should be fine. 
Some details about bpf instruction accessible skb bit field here: 
https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/


> 3. This will be a first step to make the design scalable.
> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
> update the enum and increase the bitfield from 2=>3 and so on.
> 
> I need help from Martin to test the patch which renames the mono_delivery_time
> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)

The bpf change is not a no-op rename of mono_delivery_time. It needs to take 
care of the new bit added to the tstamp_type. Please see the previous email (and 
I also left it in the beginning of this email).

Thus, you need to compile the selftests/bpf/ and run it to verify the changes 
when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto 
details. You probably only need the newer llvm (newer gcc should work also as 
bpf CI has been using it) and the newer pahole. I can definitely help if there 
is issue in running the test_progs in selftests/bpf or you have question on 
making the changes in filter.c. To run the test: "./test_progs -t 
tc_redirect/tc_redirect_dtime"
Martin KaFai Lau March 14, 2024, 9:54 p.m. UTC | #21
On 3/14/24 2:48 PM, Martin KaFai Lau wrote:
> 
> I would think the first step is to revert this patch. I don't think much of the 
> current patch can be reused.

My bad. Please ignore. Just catch up on the revert patch you sent.
Abhishek Chauhan (ABC) March 14, 2024, 10:29 p.m. UTC | #22
On 3/14/2024 2:48 PM, Martin KaFai Lau wrote:
> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>>> "user_delivery_time" bit.
>>>>
>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>> field. "mono_delivery_time" is already in there.
> 
> [ ... ]
> 
> I would think the first step is to revert this patch. I don't think much of the current patch can be reused.
> 
>> 1. I will raise one patch to introduce rename mono_delivery_time to
>> tstamp_type
> 
> Right, I expect something like this:
> 
> struct sk_buff {
>         /* ... */
> -            __u8                    mono_delivery_time:1;
> +        __u8            tstamp_type:1;
>         /* ... */
> };
> 

Okay ,This should be straight-forward. 

>> 2. I will introduce setting of userspace timestamp type as the second bit
>> whem transmit_time is set.
> 
> I expect the second patch should be introducing the enum first
> 
> enum skb_tstamp_type {
>     SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>     SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
> };
> 
> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
> "skb->tstamp_type = 1;"
> 
> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
> "skb->tstamp_type = 0;"
> 
> 
> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also:
> 
> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
> -                                        bool mono)
> +                     enum skb_tstamp_type tstamp_type)
> 
This should be straight-forward as well 

> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER.
> 
> struct sk_buff {
>         /* ... */
> -        __u8            tstamp_type:1;
> +        __u8            tstamp_type:2;
>         /* ... */
> };
> 
> enum skb_tstamp_type {
>     SKB_TSTAMP_TYPE_RX_REAL = 0,    /* A RX (receive) time in real */
>     SKB_TSTAMP_TYPE_TX_MONO = 1,    /* A TX (delivery) time in mono */
> +    SKB_TSTAMP_TYPE_TX_USER = 2,    /* A TX (delivery) time and its clock
>                      * is in skb->sk->sk_clockid.
>                      */
>                
> };
> 
> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/
This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. 
But i have plan on this and i know how to do it. 

Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm 
Fingers crossed :) 

> 
> 
>> 3. This will be a first step to make the design scalable.
>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
>> update the enum and increase the bitfield from 2=>3 and so on.
>>
>> I need help from Martin to test the patch which renames the mono_delivery_time
>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)
> 
> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email).
> 
> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
>
Abhishek Chauhan (ABC) March 18, 2024, 7:02 p.m. UTC | #23
On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/14/2024 2:48 PM, Martin KaFai Lau wrote:
>> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>>>> "user_delivery_time" bit.
>>>>>
>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>> field. "mono_delivery_time" is already in there.
>>
>> [ ... ]
>>

Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to 
distinguish between timestamp anyway .

Let me know what you think ? 

>> I would think the first step is to revert this patch. I don't think much of the current patch can be reused.
>>
>>> 1. I will raise one patch to introduce rename mono_delivery_time to
>>> tstamp_type
>>
>> Right, I expect something like this:
>>
>> struct sk_buff {
>>         /* ... */
>> -            __u8                    mono_delivery_time:1;
>> +        __u8            tstamp_type:1;
>>         /* ... */
>> };
>>
> 
> Okay ,This should be straight-forward. 
> 
>>> 2. I will introduce setting of userspace timestamp type as the second bit
>>> whem transmit_time is set.
>>
>> I expect the second patch should be introducing the enum first
>>
>> enum skb_tstamp_type {
>>     SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>>     SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
>> };
>>
>> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
>> "skb->tstamp_type = 1;"
>>
>> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
>> "skb->tstamp_type = 0;"
>>
>>
>> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also:
>>
>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>> -                                        bool mono)
>> +                     enum skb_tstamp_type tstamp_type)
>>
> This should be straight-forward as well 
> 
>> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER.
>>
>> struct sk_buff {
>>         /* ... */
>> -        __u8            tstamp_type:1;
>> +        __u8            tstamp_type:2;
>>         /* ... */
>> };
>>
>> enum skb_tstamp_type {
>>     SKB_TSTAMP_TYPE_RX_REAL = 0,    /* A RX (receive) time in real */
>>     SKB_TSTAMP_TYPE_TX_MONO = 1,    /* A TX (delivery) time in mono */
>> +    SKB_TSTAMP_TYPE_TX_USER = 2,    /* A TX (delivery) time and its clock
>>                      * is in skb->sk->sk_clockid.
>>                      */
>>                
>> };
>>
>> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/
> This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. 
> But i have plan on this and i know how to do it. 
> 
> Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm 
> Fingers crossed :) 
> 
>>
>>
>>> 3. This will be a first step to make the design scalable.
>>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
>>> update the enum and increase the bitfield from 2=>3 and so on.
>>>
>>> I need help from Martin to test the patch which renames the mono_delivery_time
>>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)
>>
>> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email).
>>
>> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
>>
Martin KaFai Lau March 19, 2024, 7:46 p.m. UTC | #24
On 3/18/24 12:02 PM, Abhishek Chauhan (ABC) wrote:
>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>>> field. "mono_delivery_time" is already in there.
>>> [ ... ]
>>>
> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to
> distinguish between timestamp anyway .


The context for this was before combining mono_delivery_time:1 and 
user_delivery_time:1 into tstamp_type:2. No need to add user_delivery_time to 
inet_frag_queue because it is combined into tstamp_type:2. If 
mono_delivery_time:1 is replaced with tstamp_type:2 in sk_buff, the same should 
be done in inet_frag_queue.
Abhishek Chauhan (ABC) March 19, 2024, 8:12 p.m. UTC | #25
On 3/19/2024 12:46 PM, Martin KaFai Lau wrote:
> On 3/18/24 12:02 PM, Abhishek Chauhan (ABC) wrote:
>>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>>>> field. "mono_delivery_time" is already in there.
>>>> [ ... ]
>>>>
>> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to
>> distinguish between timestamp anyway .
> 
> 
> The context for this was before combining mono_delivery_time:1 and user_delivery_time:1 into tstamp_type:2. No need to add user_delivery_time to inet_frag_queue because it is combined into tstamp_type:2. If mono_delivery_time:1 is replaced with tstamp_type:2 in sk_buff, the same should be done in inet_frag_queue.
> 
Thats what i was planning to do. This is more or like the patch 3 - when brings in the changes of two bits. 
Thanks Martin for clarification. 
>
Abhishek Chauhan (ABC) March 20, 2024, 6:22 a.m. UTC | #26
On 3/18/2024 12:02 PM, Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 3/14/2024 2:48 PM, Martin KaFai Lau wrote:
>>> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>>>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>>>>> "user_delivery_time" bit.
>>>>>>
>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>>> field. "mono_delivery_time" is already in there.
>>>
>>> [ ... ]
>>>
> 
> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to 
> distinguish between timestamp anyway .
> 
> Let me know what you think ? 
> 
>>> I would think the first step is to revert this patch. I don't think much of the current patch can be reused.
>>>
>>>> 1. I will raise one patch to introduce rename mono_delivery_time to
>>>> tstamp_type
>>>
>>> Right, I expect something like this:
>>>
>>> struct sk_buff {
>>>         /* ... */
>>> -            __u8                    mono_delivery_time:1;
>>> +        __u8            tstamp_type:1;
>>>         /* ... */
>>> };
>>>
>>
>> Okay ,This should be straight-forward. 
>>
>>>> 2. I will introduce setting of userspace timestamp type as the second bit
>>>> whem transmit_time is set.
>>>
>>> I expect the second patch should be introducing the enum first
>>>
>>> enum skb_tstamp_type {
>>>     SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>>>     SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
>>> };
>>>
>>> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
>>> "skb->tstamp_type = 1;"
>>>
>>> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
>>> "skb->tstamp_type = 0;"
>>>
>>>
>>> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also:
>>>
>>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>>> -                                        bool mono)
>>> +                     enum skb_tstamp_type tstamp_type)
>>>
>> This should be straight-forward as well 
>>
>>> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER.
>>>
>>> struct sk_buff {
>>>         /* ... */
>>> -        __u8            tstamp_type:1;
>>> +        __u8            tstamp_type:2;
>>>         /* ... */
>>> };
>>>
>>> enum skb_tstamp_type {
>>>     SKB_TSTAMP_TYPE_RX_REAL = 0,    /* A RX (receive) time in real */
>>>     SKB_TSTAMP_TYPE_TX_MONO = 1,    /* A TX (delivery) time in mono */
>>> +    SKB_TSTAMP_TYPE_TX_USER = 2,    /* A TX (delivery) time and its clock
>>>                      * is in skb->sk->sk_clockid.
>>>                      */
>>>                
>>> };
>>>
>>> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/
>> This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. 
>> But i have plan on this and i know how to do it. 
>>
>> Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm 
>> Fingers crossed :) 
>>
>>>
>>>
>>>> 3. This will be a first step to make the design scalable.
>>>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
>>>> update the enum and increase the bitfield from 2=>3 and so on.
>>>>
>>>> I need help from Martin to test the patch which renames the mono_delivery_time
>>>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)
>>>
>>> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email).
>>>
>>> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
>>>

Martin,
I was able to compile test_progs and execute the above command mentioned by you . Does the output look okay for you ? 

[ 3076.040766] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src_fwd: link becomes ready
[ 3076.040809] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready
[ 3076.072844] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst: link becomes ready
[ 3076.072880] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst_fwd: link becomes ready
#214/5   tc_redirect/tc_redirect_dtime:OK
#214     tc_redirect:OK
Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
Martin KaFai Lau March 20, 2024, 8:30 p.m. UTC | #27
On 3/19/24 11:22 PM, Abhishek Chauhan (ABC) wrote:
> 
> 
> On 3/18/2024 12:02 PM, Abhishek Chauhan (ABC) wrote:
>>
>>
>> On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote:
>>>
>>>
>>> On 3/14/2024 2:48 PM, Martin KaFai Lau wrote:
>>>> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote:
>>>>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be
>>>>>>> changed to handle the new "user_delivery_time" bit anyway, e.g.
>>>>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the
>>>>>>> "user_delivery_time" bit.
>>>>>>>
>>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time"
>>>>>>> field. "mono_delivery_time" is already in there.
>>>>
>>>> [ ... ]
>>>>
>>
>> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to
>> distinguish between timestamp anyway .
>>
>> Let me know what you think ?
>>
>>>> I would think the first step is to revert this patch. I don't think much of the current patch can be reused.
>>>>
>>>>> 1. I will raise one patch to introduce rename mono_delivery_time to
>>>>> tstamp_type
>>>>
>>>> Right, I expect something like this:
>>>>
>>>> struct sk_buff {
>>>>          /* ... */
>>>> -            __u8                    mono_delivery_time:1;
>>>> +        __u8            tstamp_type:1;
>>>>          /* ... */
>>>> };
>>>>
>>>
>>> Okay ,This should be straight-forward.
>>>
>>>>> 2. I will introduce setting of userspace timestamp type as the second bit
>>>>> whem transmit_time is set.
>>>>
>>>> I expect the second patch should be introducing the enum first
>>>>
>>>> enum skb_tstamp_type {
>>>>      SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */
>>>>      SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */
>>>> };
>>>>
>>>> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of
>>>> "skb->tstamp_type = 1;"
>>>>
>>>> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of
>>>> "skb->tstamp_type = 0;"
>>>>
>>>>
>>>> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also:
>>>>
>>>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt,
>>>> -                                        bool mono)
>>>> +                     enum skb_tstamp_type tstamp_type)
>>>>
>>> This should be straight-forward as well
>>>
>>>> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER.
>>>>
>>>> struct sk_buff {
>>>>          /* ... */
>>>> -        __u8            tstamp_type:1;
>>>> +        __u8            tstamp_type:2;
>>>>          /* ... */
>>>> };
>>>>
>>>> enum skb_tstamp_type {
>>>>      SKB_TSTAMP_TYPE_RX_REAL = 0,    /* A RX (receive) time in real */
>>>>      SKB_TSTAMP_TYPE_TX_MONO = 1,    /* A TX (delivery) time in mono */
>>>> +    SKB_TSTAMP_TYPE_TX_USER = 2,    /* A TX (delivery) time and its clock
>>>>                       * is in skb->sk->sk_clockid.
>>>>                       */
>>>>                 
>>>> };
>>>>
>>>> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/
>>> This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them.
>>> But i have plan on this and i know how to do it.
>>>
>>> Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm
>>> Fingers crossed :)
>>>
>>>>
>>>>
>>>>> 3. This will be a first step to make the design scalable.
>>>>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is
>>>>> update the enum and increase the bitfield from 2=>3 and so on.
>>>>>
>>>>> I need help from Martin to test the patch which renames the mono_delivery_time
>>>>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1)
>>>>
>>>> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email).
>>>>
>>>> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
>>>>
> 
> Martin,
> I was able to compile test_progs and execute the above command mentioned by you . Does the output look okay for you ?
> 
> [ 3076.040766] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src_fwd: link becomes ready
> [ 3076.040809] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready
> [ 3076.072844] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst: link becomes ready
> [ 3076.072880] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst_fwd: link becomes ready
> #214/5   tc_redirect/tc_redirect_dtime:OK
> #214     tc_redirect:OK

lgtm. thanks.
diff mbox series

Patch

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2dde34c29203..4726298d4ed4 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -817,9 +817,9 @@  typedef unsigned char *sk_buff_data_t;
  *	@decrypted: Decrypted SKB
  *	@slow_gro: state present at GRO time, slower prepare step required
  *	@mono_delivery_time: When set, skb->tstamp has the
- *		delivery_time in mono clock base (i.e. EDT).  Otherwise, the
- *		skb->tstamp has the (rcv) timestamp at ingress and
- *		delivery_time at egress.
+ *		delivery_time in mono clock base (i.e., EDT) or a clock base chosen
+ *		by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at
+ *		ingress.
  *	@napi_id: id of the NAPI struct this skb came from
  *	@sender_cpu: (aka @napi_id) source CPU in XPS
  *	@alloc_cpu: CPU which did the skb allocation.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 5b5a0adb927f..ff1df64c5697 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1455,6 +1455,7 @@  struct sk_buff *__ip_make_skb(struct sock *sk,
 	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
 	skb->mark = cork->mark;
 	skb->tstamp = cork->transmit_time;
+	skb->mono_delivery_time = !!skb->tstamp;
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
 	 * on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index aea89326c697..c4c29fc5b73f 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -353,6 +353,7 @@  static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
 	skb->tstamp = sockc->transmit_time;
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_dst_set(skb, &rt->dst);
 	*rtp = NULL;
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a722a43dd668..2fc1d03dc07d 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1922,7 +1922,7 @@  struct sk_buff *__ip6_make_skb(struct sock *sk,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = cork->base.mark;
 	skb->tstamp = cork->base.transmit_time;
-
+	skb->mono_delivery_time = !!skb->tstamp;
 	ip6_cork_steal_dst(skb, cork);
 	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
 	if (proto == IPPROTO_ICMPV6) {
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 03dbb874c363..13f54f8eea35 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -616,7 +616,7 @@  static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
 	skb->tstamp = sockc->transmit_time;
-
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_put(skb, length);
 	skb_reset_network_header(skb);
 	iph = ipv6_hdr(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9bbc2686690..0db31ca4982d 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2057,7 +2057,7 @@  static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = READ_ONCE(sk->sk_mark);
 	skb->tstamp = sockc.transmit_time;
-
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_setup_tx_timestamp(skb, sockc.tsflags);
 
 	if (unlikely(extra_len == 4))
@@ -2586,6 +2586,7 @@  static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb,
 	skb->priority = READ_ONCE(po->sk.sk_priority);
 	skb->mark = READ_ONCE(po->sk.sk_mark);
 	skb->tstamp = sockc->transmit_time;
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_setup_tx_timestamp(skb, sockc->tsflags);
 	skb_zcopy_set_nouarg(skb, ph.raw);
 
@@ -3064,6 +3065,7 @@  static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len)
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc.mark;
 	skb->tstamp = sockc.transmit_time;
+	skb->mono_delivery_time = !!skb->tstamp;
 
 	if (unlikely(extra_len == 4))
 		skb->no_fcs = 1;