Message ID | 20240301201348.2815102-1-quic_abchauha@quicinc.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 885c36e59f46375c138de18ff1692f18eff67b7f |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets | expand |
Abhishek Chauhan wrote: > Bridge driver today has no support to forward the userspace timestamp > packets and ends up resetting the timestamp. ETF qdisc checks the > packet coming from userspace and encounters to be 0 thereby dropping > time sensitive packets. These changes will allow userspace timestamps > packets to be forwarded from the bridge to NIC drivers. > > Setting the same bit (mono_delivery_time) to avoid dropping of > userspace tstamp packets in the forwarding path. > > Existing functionality of mono_delivery_time remains unaltered here, > instead just extended with userspace tstamp support for bridge > forwarding path. > > Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> Reviewed-by: Willem de Bruijn <willemb@google.com>
Hello: This patch was applied to netdev/net-next.git (main) by Paolo Abeni <pabeni@redhat.com>: On Fri, 1 Mar 2024 12:13:48 -0800 you wrote: > Bridge driver today has no support to forward the userspace timestamp > packets and ends up resetting the timestamp. ETF qdisc checks the > packet coming from userspace and encounters to be 0 thereby dropping > time sensitive packets. These changes will allow userspace timestamps > packets to be forwarded from the bridge to NIC drivers. > > Setting the same bit (mono_delivery_time) to avoid dropping of > userspace tstamp packets in the forwarding path. > > [...] Here is the summary with links: - [net-next,v4] net: Re-use and set mono_delivery_time bit for userspace tstamp packets https://git.kernel.org/netdev/net-next/c/885c36e59f46 You are awesome, thank you!
On 3/1/24 12:13 PM, Abhishek Chauhan wrote: > Bridge driver today has no support to forward the userspace timestamp > packets and ends up resetting the timestamp. ETF qdisc checks the > packet coming from userspace and encounters to be 0 thereby dropping > time sensitive packets. These changes will allow userspace timestamps > packets to be forwarded from the bridge to NIC drivers. > > Setting the same bit (mono_delivery_time) to avoid dropping of > userspace tstamp packets in the forwarding path. > > Existing functionality of mono_delivery_time remains unaltered here, > instead just extended with userspace tstamp support for bridge > forwarding path. The patch currently broke the bpf selftest test_tc_dtime: https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 In particular, there is a uapi field __sk_buff->tstamp_type which currently has BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at ingress or a delivery time set by user space). __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono after this patch. I thought about fixing it on the bpf side such that reading __sk_buff->tstamp_type only returns BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the skb->mono_delivery_time. The expectation is this could change skb->tstamp in the ingress skb and redirect to egress sch_fq. It could also set a mono time to skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects BPF_SKB_TSTAMP_DELIVERY_MONO also. I ran out of idea to solve this uapi breakage. I am afraid it may need to go back to v1 idea and use another bit (user_delivery_time) in the skb.
On 3/12/2024 4:52 PM, Martin KaFai Lau wrote: > On 3/1/24 12:13 PM, Abhishek Chauhan wrote: >> Bridge driver today has no support to forward the userspace timestamp >> packets and ends up resetting the timestamp. ETF qdisc checks the >> packet coming from userspace and encounters to be 0 thereby dropping >> time sensitive packets. These changes will allow userspace timestamps >> packets to be forwarded from the bridge to NIC drivers. >> >> Setting the same bit (mono_delivery_time) to avoid dropping of >> userspace tstamp packets in the forwarding path. >> >> Existing functionality of mono_delivery_time remains unaltered here, >> instead just extended with userspace tstamp support for bridge >> forwarding path. > > The patch currently broke the bpf selftest test_tc_dtime: https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 > > In particular, there is a uapi field __sk_buff->tstamp_type which currently has BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at ingress or a delivery time set by user space). > > __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono after this patch. I thought about fixing it on the bpf side such that reading __sk_buff->tstamp_type only returns BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). > > There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the skb->mono_delivery_time. The expectation is this could change skb->tstamp in the ingress skb and redirect to egress sch_fq. It could also set a mono time to skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects BPF_SKB_TSTAMP_DELIVERY_MONO also. > > I ran out of idea to solve this uapi breakage. > > I am afraid it may need to go back to v1 idea and use another bit (user_delivery_time) in the skb. > I am okay to switch back to version 1 of this patch by adding another bit called userspace_time_stamp as that was the initial intent. Martin what do you suggest ?
On 3/12/2024 9:34 PM, Abhishek Chauhan (ABC) wrote: > > > On 3/12/2024 4:52 PM, Martin KaFai Lau wrote: >> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: >>> Bridge driver today has no support to forward the userspace timestamp >>> packets and ends up resetting the timestamp. ETF qdisc checks the >>> packet coming from userspace and encounters to be 0 thereby dropping >>> time sensitive packets. These changes will allow userspace timestamps >>> packets to be forwarded from the bridge to NIC drivers. >>> >>> Setting the same bit (mono_delivery_time) to avoid dropping of >>> userspace tstamp packets in the forwarding path. >>> >>> Existing functionality of mono_delivery_time remains unaltered here, >>> instead just extended with userspace tstamp support for bridge >>> forwarding path. >> >> The patch currently broke the bpf selftest test_tc_dtime: https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 >> >> In particular, there is a uapi field __sk_buff->tstamp_type which currently has BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at ingress or a delivery time set by user space). >> >> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not necessarily mean mono after this patch. I thought about fixing it on the bpf side such that reading __sk_buff->tstamp_type only returns BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). >> >> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the skb->mono_delivery_time. The expectation is this could change skb->tstamp in the ingress skb and redirect to egress sch_fq. It could also set a mono time to skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects BPF_SKB_TSTAMP_DELIVERY_MONO also. >> >> I ran out of idea to solve this uapi breakage. >> >> I am afraid it may need to go back to v1 idea and use another bit (user_delivery_time) in the skb. >> > I am okay to switch back to version 1 of this patch by adding another bit called userspace_time_stamp as that was the initial intent. > > Martin what do you suggest ? Sorry i meant Willem. Willem do we want to fall back to version 1 where we add an extra bit for userspace timestamp as Martin is facing some issue in one of his test case.
Martin KaFai Lau wrote: > On 3/1/24 12:13 PM, Abhishek Chauhan wrote: > > Bridge driver today has no support to forward the userspace timestamp > > packets and ends up resetting the timestamp. ETF qdisc checks the > > packet coming from userspace and encounters to be 0 thereby dropping > > time sensitive packets. These changes will allow userspace timestamps > > packets to be forwarded from the bridge to NIC drivers. > > > > Setting the same bit (mono_delivery_time) to avoid dropping of > > userspace tstamp packets in the forwarding path. > > > > Existing functionality of mono_delivery_time remains unaltered here, > > instead just extended with userspace tstamp support for bridge > > forwarding path. > > The patch currently broke the bpf selftest test_tc_dtime: > https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 > > In particular, there is a uapi field __sk_buff->tstamp_type which currently has > BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. > BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at > ingress or a delivery time set by user space). > > __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not > necessarily mean mono after this patch. I thought about fixing it on the bpf > side such that reading __sk_buff->tstamp_type only returns > BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk > is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). > > There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, > BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the > skb->mono_delivery_time. The expectation is this could change skb->tstamp in the > ingress skb and redirect to egress sch_fq. It could also set a mono time to > skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then > bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, > BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects > BPF_SKB_TSTAMP_DELIVERY_MONO also. > > I ran out of idea to solve this uapi breakage. > > I am afraid it may need to go back to v1 idea and use another bit > (user_delivery_time) in the skb. Is the only conflict when bpf_skb_set_tstamp is called for an skb from a socket with sk_clockid set (and thus SO_TXTIME called)? Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and skb->sk is NULL is fine. This is the ingress to egress redirect case. I don't see an immediate use for this BPF function on egress where it would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is already explicitly programmed? skb->sk && sock_flag(sk, SOCK_TXTIME) && skb->sk->sk_clockid != CLOCK_MONOTONIC Either that, or unset SOCK_TXTIME to make sk_clockid undefined and fall back on interpreting as monotonic.
On 3/13/24 1:52 AM, Willem de Bruijn wrote: > Martin KaFai Lau wrote: >> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: >>> Bridge driver today has no support to forward the userspace timestamp >>> packets and ends up resetting the timestamp. ETF qdisc checks the >>> packet coming from userspace and encounters to be 0 thereby dropping >>> time sensitive packets. These changes will allow userspace timestamps >>> packets to be forwarded from the bridge to NIC drivers. >>> >>> Setting the same bit (mono_delivery_time) to avoid dropping of >>> userspace tstamp packets in the forwarding path. >>> >>> Existing functionality of mono_delivery_time remains unaltered here, >>> instead just extended with userspace tstamp support for bridge >>> forwarding path. >> >> The patch currently broke the bpf selftest test_tc_dtime: >> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 >> >> In particular, there is a uapi field __sk_buff->tstamp_type which currently has >> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. >> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at >> ingress or a delivery time set by user space). >> >> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not >> necessarily mean mono after this patch. I thought about fixing it on the bpf >> side such that reading __sk_buff->tstamp_type only returns >> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk >> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). >> >> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, >> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the >> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the >> ingress skb and redirect to egress sch_fq. It could also set a mono time to >> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then >> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, >> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects >> BPF_SKB_TSTAMP_DELIVERY_MONO also. >> >> I ran out of idea to solve this uapi breakage. >> >> I am afraid it may need to go back to v1 idea and use another bit >> (user_delivery_time) in the skb. > > Is the only conflict when bpf_skb_set_tstamp is called for an skb from > a socket with sk_clockid set (and thus SO_TXTIME called)? Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and its interpretation depends on skb->sk->sk_clockid. > Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and > skb->sk is NULL is fine. This is the ingress to egress redirect case. skb->sk == NULL is fine. I tried something like this in bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type: __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when: skb->mono_delivery_time == 1 && (!skb->sk || !sk_fullsock(skb->sk) /* tcp tw or req sk */ || skb->sk->sk_protocol == IPPROTO_TCP) Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable. > > I don't see an immediate use for this BPF function on egress where it > would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, > but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. The bpf prog may act as a traffic shaper that limits the bandwidth usage of all outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before sending to the sch_fq. I currently also don't have a use case for skb->sk->sk_clockid != CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now before queuing to sch_fq. The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() between the veth pair. > > Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is > already explicitly programmed? This will change the existing bpf_skb_set_tstamp() behavior, so probably not acceptable. > > skb->sk && > sock_flag(sk, SOCK_TXTIME) && > skb->sk->sk_clockid != CLOCK_MONOTONIC > Either that, or unset SOCK_TXTIME to make sk_clockid undefined and > fall back on interpreting as monotonic. Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also. sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only changes skb and does not change skb->sk. I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing from looking at __sock_cmsg_send(). There may be a short period of disconnect between what is in sk->sk_flags and what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) again after skb->tstamp is set by bpf. This could be considered a small glitch for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME). I think all this is crying for another bit in skb to mean user_delivery_time (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the mono time either set by kernel-tcp or bpf. If we need to revert the mono_delivery_time reuse patch later, we will need to revert the netdev patch and the (to-be-made) bpf patch.
Martin KaFai Lau wrote: > On 3/13/24 1:52 AM, Willem de Bruijn wrote: > > Martin KaFai Lau wrote: > >> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: > >>> Bridge driver today has no support to forward the userspace timestamp > >>> packets and ends up resetting the timestamp. ETF qdisc checks the > >>> packet coming from userspace and encounters to be 0 thereby dropping > >>> time sensitive packets. These changes will allow userspace timestamps > >>> packets to be forwarded from the bridge to NIC drivers. > >>> > >>> Setting the same bit (mono_delivery_time) to avoid dropping of > >>> userspace tstamp packets in the forwarding path. > >>> > >>> Existing functionality of mono_delivery_time remains unaltered here, > >>> instead just extended with userspace tstamp support for bridge > >>> forwarding path. > >> > >> The patch currently broke the bpf selftest test_tc_dtime: > >> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 > >> > >> In particular, there is a uapi field __sk_buff->tstamp_type which currently has > >> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. > >> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at > >> ingress or a delivery time set by user space). > >> > >> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not > >> necessarily mean mono after this patch. I thought about fixing it on the bpf > >> side such that reading __sk_buff->tstamp_type only returns > >> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk > >> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). > >> > >> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, > >> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the > >> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the > >> ingress skb and redirect to egress sch_fq. It could also set a mono time to > >> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then > >> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, > >> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects > >> BPF_SKB_TSTAMP_DELIVERY_MONO also. > >> > >> I ran out of idea to solve this uapi breakage. > >> > >> I am afraid it may need to go back to v1 idea and use another bit > >> (user_delivery_time) in the skb. > > > > Is the only conflict when bpf_skb_set_tstamp is called for an skb from > > a socket with sk_clockid set (and thus SO_TXTIME called)? > > Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and > its interpretation depends on skb->sk->sk_clockid. > > > Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and > > skb->sk is NULL is fine. This is the ingress to egress redirect case. > > skb->sk == NULL is fine. I tried something like this in > bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type: > > __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when: > > skb->mono_delivery_time == 1 && > (!skb->sk || > !sk_fullsock(skb->sk) /* tcp tw or req sk */ || > skb->sk->sk_protocol == IPPROTO_TCP) > > Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable. > > > > > I don't see an immediate use for this BPF function on egress where it > > would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, > > but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. > > The bpf prog may act as a traffic shaper that limits the bandwidth usage of all > outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before > sending to the sch_fq. > > I currently also don't have a use case for skb->sk->sk_clockid != > CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now > before queuing to sch_fq. > > The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) > setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() > between the veth pair. > > > > > Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is > > already explicitly programmed? > > This will change the existing bpf_skb_set_tstamp() behavior, so probably not > acceptable. > > > > > skb->sk && > > sock_flag(sk, SOCK_TXTIME) && > > skb->sk->sk_clockid != CLOCK_MONOTONIC > > > Either that, or unset SOCK_TXTIME to make sk_clockid undefined and > > fall back on interpreting as monotonic. > > Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also. > > sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The > tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only > changes skb and does not change skb->sk. > > I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space > visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing > from looking at __sock_cmsg_send(). > > There may be a short period of disconnect between what is in sk->sk_flags and > what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) > again after skb->tstamp is set by bpf. This could be considered a small glitch > for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME). > > I think all this is crying for another bit in skb to mean user_delivery_time > (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the > mono time either set by kernel-tcp or bpf. It does sound like the approach with least side effects. If we're going to increase to two bits per skb, it's perhaps better to just encode the (selected supported) CLOCK_ type, rather than only supporting clockid through skb->sk->sk_clockid. This BPF function is the analogue to SO_TXTIME. It is clearly extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To work with sch_etf, say. > If we need to revert the > mono_delivery_time reuse patch later, we will need to revert the netdev patch > and the (to-be-made) bpf patch.
On 3/13/2024 12:36 PM, Willem de Bruijn wrote: > Martin KaFai Lau wrote: >> On 3/13/24 1:52 AM, Willem de Bruijn wrote: >>> Martin KaFai Lau wrote: >>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: >>>>> Bridge driver today has no support to forward the userspace timestamp >>>>> packets and ends up resetting the timestamp. ETF qdisc checks the >>>>> packet coming from userspace and encounters to be 0 thereby dropping >>>>> time sensitive packets. These changes will allow userspace timestamps >>>>> packets to be forwarded from the bridge to NIC drivers. >>>>> >>>>> Setting the same bit (mono_delivery_time) to avoid dropping of >>>>> userspace tstamp packets in the forwarding path. >>>>> >>>>> Existing functionality of mono_delivery_time remains unaltered here, >>>>> instead just extended with userspace tstamp support for bridge >>>>> forwarding path. >>>> >>>> The patch currently broke the bpf selftest test_tc_dtime: >>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 >>>> >>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has >>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. >>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at >>>> ingress or a delivery time set by user space). >>>> >>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not >>>> necessarily mean mono after this patch. I thought about fixing it on the bpf >>>> side such that reading __sk_buff->tstamp_type only returns >>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk >>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). >>>> >>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, >>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the >>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the >>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to >>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then >>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, >>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects >>>> BPF_SKB_TSTAMP_DELIVERY_MONO also. >>>> >>>> I ran out of idea to solve this uapi breakage. >>>> >>>> I am afraid it may need to go back to v1 idea and use another bit >>>> (user_delivery_time) in the skb. >>> >>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from >>> a socket with sk_clockid set (and thus SO_TXTIME called)? >> >> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and >> its interpretation depends on skb->sk->sk_clockid. >> >>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and >>> skb->sk is NULL is fine. This is the ingress to egress redirect case. >> >> skb->sk == NULL is fine. I tried something like this in >> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type: >> >> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when: >> >> skb->mono_delivery_time == 1 && >> (!skb->sk || >> !sk_fullsock(skb->sk) /* tcp tw or req sk */ || >> skb->sk->sk_protocol == IPPROTO_TCP) >> >> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable. >> >>> >>> I don't see an immediate use for this BPF function on egress where it >>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, >>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. >> >> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all >> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before >> sending to the sch_fq. >> >> I currently also don't have a use case for skb->sk->sk_clockid != >> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now >> before queuing to sch_fq. >> >> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) >> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() >> between the veth pair. >> >>> >>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is >>> already explicitly programmed? >> >> This will change the existing bpf_skb_set_tstamp() behavior, so probably not >> acceptable. >> >>> >>> skb->sk && >>> sock_flag(sk, SOCK_TXTIME) && >>> skb->sk->sk_clockid != CLOCK_MONOTONIC >> >>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and >>> fall back on interpreting as monotonic. >> >> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also. >> >> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The >> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only >> changes skb and does not change skb->sk. >> >> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space >> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing >> from looking at __sock_cmsg_send(). >> >> There may be a short period of disconnect between what is in sk->sk_flags and >> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) >> again after skb->tstamp is set by bpf. This could be considered a small glitch >> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME). >> >> I think all this is crying for another bit in skb to mean user_delivery_time >> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the >> mono time either set by kernel-tcp or bpf. > > It does sound like the approach with least side effects. > Martin and Willem , while we are discussing to use seperate bit per skb to check if its SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff called in filter framework @tc_at_ingress: used within tc_classify to distinguish in/egress @from_ingress: packet was redirected from the ingress path Since i believe the above testcase is based on redirecting from ingress to egress part why cant we use these already existing flags ? Please advice I am not completely sure if we can use these flags to solve the bpf problem or not. but considering the problem statment put forth by Willem stating that we are just adding flags for every usecase and not keeping the design scalable for future needs. > If we're going to increase to two bits per skb, it's perhaps > better to just encode the (selected supported) CLOCK_ type, rather > than only supporting clockid through skb->sk->sk_clockid. > > This BPF function is the analogue to SO_TXTIME. It is clearly > extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To > work with sch_etf, say. > >> If we need to revert the >> mono_delivery_time reuse patch later, we will need to revert the netdev patch >> and the (to-be-made) bpf patch. > > > > >
On 3/13/24 12:36 PM, Willem de Bruijn wrote: > Martin KaFai Lau wrote: >> On 3/13/24 1:52 AM, Willem de Bruijn wrote: >>> Martin KaFai Lau wrote: >>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: >>>>> Bridge driver today has no support to forward the userspace timestamp >>>>> packets and ends up resetting the timestamp. ETF qdisc checks the >>>>> packet coming from userspace and encounters to be 0 thereby dropping >>>>> time sensitive packets. These changes will allow userspace timestamps >>>>> packets to be forwarded from the bridge to NIC drivers. >>>>> >>>>> Setting the same bit (mono_delivery_time) to avoid dropping of >>>>> userspace tstamp packets in the forwarding path. >>>>> >>>>> Existing functionality of mono_delivery_time remains unaltered here, >>>>> instead just extended with userspace tstamp support for bridge >>>>> forwarding path. >>>> >>>> The patch currently broke the bpf selftest test_tc_dtime: >>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 >>>> >>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has >>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. >>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at >>>> ingress or a delivery time set by user space). >>>> >>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not >>>> necessarily mean mono after this patch. I thought about fixing it on the bpf >>>> side such that reading __sk_buff->tstamp_type only returns >>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk >>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). >>>> >>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, >>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the >>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the >>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to >>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then >>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, >>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects >>>> BPF_SKB_TSTAMP_DELIVERY_MONO also. >>>> >>>> I ran out of idea to solve this uapi breakage. >>>> >>>> I am afraid it may need to go back to v1 idea and use another bit >>>> (user_delivery_time) in the skb. >>> >>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from >>> a socket with sk_clockid set (and thus SO_TXTIME called)? >> >> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and >> its interpretation depends on skb->sk->sk_clockid. >> >>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and >>> skb->sk is NULL is fine. This is the ingress to egress redirect case. >> >> skb->sk == NULL is fine. I tried something like this in >> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type: >> >> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when: >> >> skb->mono_delivery_time == 1 && >> (!skb->sk || >> !sk_fullsock(skb->sk) /* tcp tw or req sk */ || >> skb->sk->sk_protocol == IPPROTO_TCP) >> >> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable. >> >>> >>> I don't see an immediate use for this BPF function on egress where it >>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, >>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. >> >> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all >> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before >> sending to the sch_fq. >> >> I currently also don't have a use case for skb->sk->sk_clockid != >> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now >> before queuing to sch_fq. >> >> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) >> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() >> between the veth pair. >> >>> >>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is >>> already explicitly programmed? >> >> This will change the existing bpf_skb_set_tstamp() behavior, so probably not >> acceptable. >> >>> >>> skb->sk && >>> sock_flag(sk, SOCK_TXTIME) && >>> skb->sk->sk_clockid != CLOCK_MONOTONIC >> >>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and >>> fall back on interpreting as monotonic. >> >> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also. >> >> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The >> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only >> changes skb and does not change skb->sk. >> >> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space >> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing >> from looking at __sock_cmsg_send(). >> >> There may be a short period of disconnect between what is in sk->sk_flags and >> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) >> again after skb->tstamp is set by bpf. This could be considered a small glitch >> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME). >> >> I think all this is crying for another bit in skb to mean user_delivery_time >> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the >> mono time either set by kernel-tcp or bpf. > > It does sound like the approach with least side effects. > > If we're going to increase to two bits per skb, it's perhaps > better to just encode the (selected supported) CLOCK_ type, rather > than only supporting clockid through skb->sk->sk_clockid. Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others. > > This BPF function is the analogue to SO_TXTIME. It is clearly > extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To > work with sch_etf, say. Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*. The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean? > >> If we need to revert the >> mono_delivery_time reuse patch later, we will need to revert the netdev patch >> and the (to-be-made) bpf patch. > > > > >
On 3/13/24 1:59 PM, Abhishek Chauhan (ABC) wrote: >>> I think all this is crying for another bit in skb to mean user_delivery_time >>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the >>> mono time either set by kernel-tcp or bpf. >> >> It does sound like the approach with least side effects. >> > Martin and Willem , while we are discussing to use seperate bit per skb to check if its > SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff > called in filter framework > @tc_at_ingress: used within tc_classify to distinguish in/egress > @from_ingress: packet was redirected from the ingress path > > Since i believe the above testcase is based on redirecting from ingress to egress part > why cant we use these already existing flags ? Please advice > > I am not completely sure if we can use these flags to solve the bpf problem or not. I don't see how they are related. It can tell where a skb is at. It cannot tell what is in skb->tstamp.
On 3/13/2024 2:01 PM, Martin KaFai Lau wrote: > On 3/13/24 12:36 PM, Willem de Bruijn wrote: >> Martin KaFai Lau wrote: >>> On 3/13/24 1:52 AM, Willem de Bruijn wrote: >>>> Martin KaFai Lau wrote: >>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: >>>>>> Bridge driver today has no support to forward the userspace timestamp >>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the >>>>>> packet coming from userspace and encounters to be 0 thereby dropping >>>>>> time sensitive packets. These changes will allow userspace timestamps >>>>>> packets to be forwarded from the bridge to NIC drivers. >>>>>> >>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of >>>>>> userspace tstamp packets in the forwarding path. >>>>>> >>>>>> Existing functionality of mono_delivery_time remains unaltered here, >>>>>> instead just extended with userspace tstamp support for bridge >>>>>> forwarding path. >>>>> >>>>> The patch currently broke the bpf selftest test_tc_dtime: >>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 >>>>> >>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. >>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at >>>>> ingress or a delivery time set by user space). >>>>> >>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not >>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf >>>>> side such that reading __sk_buff->tstamp_type only returns >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk >>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). >>>>> >>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the >>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the >>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to >>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then >>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also. >>>>> >>>>> I ran out of idea to solve this uapi breakage. >>>>> >>>>> I am afraid it may need to go back to v1 idea and use another bit >>>>> (user_delivery_time) in the skb. >>>> >>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from >>>> a socket with sk_clockid set (and thus SO_TXTIME called)? >>> >>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and >>> its interpretation depends on skb->sk->sk_clockid. >>> >>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and >>>> skb->sk is NULL is fine. This is the ingress to egress redirect case. >>> >>> skb->sk == NULL is fine. I tried something like this in >>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type: >>> >>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when: >>> >>> skb->mono_delivery_time == 1 && >>> (!skb->sk || >>> !sk_fullsock(skb->sk) /* tcp tw or req sk */ || >>> skb->sk->sk_protocol == IPPROTO_TCP) >>> >>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable. >>> >>>> >>>> I don't see an immediate use for this BPF function on egress where it >>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, >>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. >>> >>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all >>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before >>> sending to the sch_fq. >>> >>> I currently also don't have a use case for skb->sk->sk_clockid != >>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now >>> before queuing to sch_fq. >>> >>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) >>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() >>> between the veth pair. >>> >>>> >>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is >>>> already explicitly programmed? >>> >>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not >>> acceptable. >>> >>>> >>>> skb->sk && >>>> sock_flag(sk, SOCK_TXTIME) && >>>> skb->sk->sk_clockid != CLOCK_MONOTONIC >>> >>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and >>>> fall back on interpreting as monotonic. >>> >>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also. >>> >>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The >>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only >>> changes skb and does not change skb->sk. >>> >>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space >>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing >>> from looking at __sock_cmsg_send(). >>> >>> There may be a short period of disconnect between what is in sk->sk_flags and >>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) >>> again after skb->tstamp is set by bpf. This could be considered a small glitch >>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME). >>> >>> I think all this is crying for another bit in skb to mean user_delivery_time >>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the >>> mono time either set by kernel-tcp or bpf. >> >> It does sound like the approach with least side effects. >> >> If we're going to increase to two bits per skb, it's perhaps >> better to just encode the (selected supported) CLOCK_ type, rather >> than only supporting clockid through skb->sk->sk_clockid. > > Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others. > >> >> This BPF function is the analogue to SO_TXTIME. It is clearly >> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To >> work with sch_etf, say. > > Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*. > > The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean? > Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem) Here is what i will do from myside 1. Revert the v4 patch :- net: Re-use and set mono_delivery_time bit for userspace tstamp packets 2. Keep mono_delivery_time as ease 3. Add user_delivery_time as a new bitfield 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support 5. do not reset the time if either mono_delivery_time or user_delivery_time is set. Let me know if i have covered all the design details or add if i missed anything. >> >>> If we need to revert the >>> mono_delivery_time reuse patch later, we will need to revert the netdev patch >>> and the (to-be-made) bpf patch. >> >> >> >> >
Abhishek Chauhan (ABC) wrote: > > > On 3/13/2024 2:01 PM, Martin KaFai Lau wrote: > > On 3/13/24 12:36 PM, Willem de Bruijn wrote: > >> Martin KaFai Lau wrote: > >>> On 3/13/24 1:52 AM, Willem de Bruijn wrote: > >>>> Martin KaFai Lau wrote: > >>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: > >>>>>> Bridge driver today has no support to forward the userspace timestamp > >>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the > >>>>>> packet coming from userspace and encounters to be 0 thereby dropping > >>>>>> time sensitive packets. These changes will allow userspace timestamps > >>>>>> packets to be forwarded from the bridge to NIC drivers. > >>>>>> > >>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of > >>>>>> userspace tstamp packets in the forwarding path. > >>>>>> > >>>>>> Existing functionality of mono_delivery_time remains unaltered here, > >>>>>> instead just extended with userspace tstamp support for bridge > >>>>>> forwarding path. > >>>>> > >>>>> The patch currently broke the bpf selftest test_tc_dtime: > >>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 > >>>>> > >>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has > >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. > >>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at > >>>>> ingress or a delivery time set by user space). > >>>>> > >>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not > >>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf > >>>>> side such that reading __sk_buff->tstamp_type only returns > >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk > >>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). > >>>>> > >>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, > >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the > >>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the > >>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to > >>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then > >>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, > >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects > >>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also. > >>>>> > >>>>> I ran out of idea to solve this uapi breakage. > >>>>> > >>>>> I am afraid it may need to go back to v1 idea and use another bit > >>>>> (user_delivery_time) in the skb. > >>>> > >>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from > >>>> a socket with sk_clockid set (and thus SO_TXTIME called)? > >>> > >>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and > >>> its interpretation depends on skb->sk->sk_clockid. > >>> > >>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and > >>>> skb->sk is NULL is fine. This is the ingress to egress redirect case. > >>> > >>> skb->sk == NULL is fine. I tried something like this in > >>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type: > >>> > >>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when: > >>> > >>> skb->mono_delivery_time == 1 && > >>> (!skb->sk || > >>> !sk_fullsock(skb->sk) /* tcp tw or req sk */ || > >>> skb->sk->sk_protocol == IPPROTO_TCP) > >>> > >>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable. > >>> > >>>> > >>>> I don't see an immediate use for this BPF function on egress where it > >>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, > >>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. > >>> > >>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all > >>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before > >>> sending to the sch_fq. > >>> > >>> I currently also don't have a use case for skb->sk->sk_clockid != > >>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now > >>> before queuing to sch_fq. > >>> > >>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) > >>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() > >>> between the veth pair. > >>> > >>>> > >>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is > >>>> already explicitly programmed? > >>> > >>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not > >>> acceptable. > >>> > >>>> > >>>> skb->sk && > >>>> sock_flag(sk, SOCK_TXTIME) && > >>>> skb->sk->sk_clockid != CLOCK_MONOTONIC > >>> > >>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and > >>>> fall back on interpreting as monotonic. > >>> > >>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also. > >>> > >>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The > >>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only > >>> changes skb and does not change skb->sk. > >>> > >>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space > >>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing > >>> from looking at __sock_cmsg_send(). > >>> > >>> There may be a short period of disconnect between what is in sk->sk_flags and > >>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) > >>> again after skb->tstamp is set by bpf. This could be considered a small glitch > >>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME). > >>> > >>> I think all this is crying for another bit in skb to mean user_delivery_time > >>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the > >>> mono time either set by kernel-tcp or bpf. > >> > >> It does sound like the approach with least side effects. > >> > >> If we're going to increase to two bits per skb, it's perhaps > >> better to just encode the (selected supported) CLOCK_ type, rather > >> than only supporting clockid through skb->sk->sk_clockid. > > > > Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others. > > > >> > >> This BPF function is the analogue to SO_TXTIME. It is clearly > >> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To > >> work with sch_etf, say. > > > > Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*. > > > > The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean? > > > Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem) > > Here is what i will do from myside > 1. Revert the v4 patch :- net: Re-use and set mono_delivery_time bit for userspace tstamp packets > 2. Keep mono_delivery_time as ease > 3. Add user_delivery_time as a new bitfield > 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support > 5. do not reset the time if either mono_delivery_time or user_delivery_time is set. > > Let me know if i have covered all the design details or add if i missed anything. Thanks for revising this. No need to add the BPF part here. I was mistaken that we can encode the clock_id in two skb bits. SO_TXTIME allows essentially all CLOCK_... So indeed we will need a user_delivery_time bit and use that to look at sk_clockid.
On 3/13/24 10:19 PM, Martin KaFai Lau wrote: > On 3/13/24 1:59 PM, Abhishek Chauhan (ABC) wrote: >>>> I think all this is crying for another bit in skb to mean user_delivery_time >>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the >>>> mono time either set by kernel-tcp or bpf. >>> >>> It does sound like the approach with least side effects. >>> >> Martin and Willem , while we are discussing to use seperate bit per skb to check if its >> SO_TXTIME from userspace or (rcv) timestamp . I came across two flags used in skbuff >> called in filter framework >> @tc_at_ingress: used within tc_classify to distinguish in/egress >> @from_ingress: packet was redirected from the ingress path >> >> Since i believe the above testcase is based on redirecting from ingress to egress part >> why cant we use these already existing flags ? Please advice >> >> I am not completely sure if we can use these flags to solve the bpf problem or not. > > I don't see how they are related. It can tell where a skb is at. It cannot tell what is in skb->tstamp. +1, if we go that route as per discussion, I'd also prefer clearly encoding delivery time clock base in skb to avoid ambiguity.
On 3/13/24 2:40 PM, Willem de Bruijn wrote: > Abhishek Chauhan (ABC) wrote: >> >> >> On 3/13/2024 2:01 PM, Martin KaFai Lau wrote: >>> On 3/13/24 12:36 PM, Willem de Bruijn wrote: >>>> Martin KaFai Lau wrote: >>>>> On 3/13/24 1:52 AM, Willem de Bruijn wrote: >>>>>> Martin KaFai Lau wrote: >>>>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: >>>>>>>> Bridge driver today has no support to forward the userspace timestamp >>>>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the >>>>>>>> packet coming from userspace and encounters to be 0 thereby dropping >>>>>>>> time sensitive packets. These changes will allow userspace timestamps >>>>>>>> packets to be forwarded from the bridge to NIC drivers. >>>>>>>> >>>>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of >>>>>>>> userspace tstamp packets in the forwarding path. >>>>>>>> >>>>>>>> Existing functionality of mono_delivery_time remains unaltered here, >>>>>>>> instead just extended with userspace tstamp support for bridge >>>>>>>> forwarding path. >>>>>>> >>>>>>> The patch currently broke the bpf selftest test_tc_dtime: >>>>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 >>>>>>> >>>>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. >>>>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at >>>>>>> ingress or a delivery time set by user space). >>>>>>> >>>>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not >>>>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf >>>>>>> side such that reading __sk_buff->tstamp_type only returns >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk >>>>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). >>>>>>> >>>>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the >>>>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the >>>>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to >>>>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then >>>>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also. >>>>>>> >>>>>>> I ran out of idea to solve this uapi breakage. >>>>>>> >>>>>>> I am afraid it may need to go back to v1 idea and use another bit >>>>>>> (user_delivery_time) in the skb. >>>>>> >>>>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from >>>>>> a socket with sk_clockid set (and thus SO_TXTIME called)? >>>>> >>>>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and >>>>> its interpretation depends on skb->sk->sk_clockid. >>>>> >>>>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and >>>>>> skb->sk is NULL is fine. This is the ingress to egress redirect case. >>>>> >>>>> skb->sk == NULL is fine. I tried something like this in >>>>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type: >>>>> >>>>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when: >>>>> >>>>> skb->mono_delivery_time == 1 && >>>>> (!skb->sk || >>>>> !sk_fullsock(skb->sk) /* tcp tw or req sk */ || >>>>> skb->sk->sk_protocol == IPPROTO_TCP) >>>>> >>>>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable. >>>>> >>>>>> >>>>>> I don't see an immediate use for this BPF function on egress where it >>>>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, >>>>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. >>>>> >>>>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all >>>>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before >>>>> sending to the sch_fq. >>>>> >>>>> I currently also don't have a use case for skb->sk->sk_clockid != >>>>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now >>>>> before queuing to sch_fq. >>>>> >>>>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) >>>>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() >>>>> between the veth pair. >>>>> >>>>>> >>>>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is >>>>>> already explicitly programmed? >>>>> >>>>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not >>>>> acceptable. >>>>> >>>>>> >>>>>> skb->sk && >>>>>> sock_flag(sk, SOCK_TXTIME) && >>>>>> skb->sk->sk_clockid != CLOCK_MONOTONIC >>>>> >>>>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and >>>>>> fall back on interpreting as monotonic. >>>>> >>>>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also. >>>>> >>>>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The >>>>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only >>>>> changes skb and does not change skb->sk. >>>>> >>>>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space >>>>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing >>>>> from looking at __sock_cmsg_send(). >>>>> >>>>> There may be a short period of disconnect between what is in sk->sk_flags and >>>>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) >>>>> again after skb->tstamp is set by bpf. This could be considered a small glitch >>>>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME). >>>>> >>>>> I think all this is crying for another bit in skb to mean user_delivery_time >>>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the >>>>> mono time either set by kernel-tcp or bpf. >>>> >>>> It does sound like the approach with least side effects. >>>> >>>> If we're going to increase to two bits per skb, it's perhaps >>>> better to just encode the (selected supported) CLOCK_ type, rather >>>> than only supporting clockid through skb->sk->sk_clockid. >>> >>> Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others. >>> >>>> >>>> This BPF function is the analogue to SO_TXTIME. It is clearly >>>> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To >>>> work with sch_etf, say. >>> >>> Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*. >>> >>> The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean? >>> >> Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem) >> >> Here is what i will do from myside >> 1. Revert the v4 patch :- net: Re-use and set mono_delivery_time bit for userspace tstamp packets >> 2. Keep mono_delivery_time as ease >> 3. Add user_delivery_time as a new bitfield >> 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support >> 5. do not reset the time if either mono_delivery_time or user_delivery_time is set. >> >> Let me know if i have covered all the design details or add if i missed anything. > > Thanks for revising this. > > No need to add the BPF part here. > > I was mistaken that we can encode the clock_id in two skb bits. > SO_TXTIME allows essentially all CLOCK_... The two bits could potentially only encode the delivery time that is allowed to be forwarded without reset. 0 could mean refering back to sk_clockid and don't forward. The final consumer of the forwarded skb->tstamp is the qdisc which currently only has mono and tai. The concern is more on future extension to allow more clock type to be forwarded? I don't have a use case for having BPF_SKB_TSTAMP_DELIVERY_TAI but curious on how other clock types are used now. Thanks. > > So indeed we will need a user_delivery_time bit and use that to look > at sk_clockid.
Martin KaFai Lau wrote: > On 3/13/24 2:40 PM, Willem de Bruijn wrote: > > Abhishek Chauhan (ABC) wrote: > >> > >> > >> On 3/13/2024 2:01 PM, Martin KaFai Lau wrote: > >>> On 3/13/24 12:36 PM, Willem de Bruijn wrote: > >>>> Martin KaFai Lau wrote: > >>>>> On 3/13/24 1:52 AM, Willem de Bruijn wrote: > >>>>>> Martin KaFai Lau wrote: > >>>>>>> On 3/1/24 12:13 PM, Abhishek Chauhan wrote: > >>>>>>>> Bridge driver today has no support to forward the userspace timestamp > >>>>>>>> packets and ends up resetting the timestamp. ETF qdisc checks the > >>>>>>>> packet coming from userspace and encounters to be 0 thereby dropping > >>>>>>>> time sensitive packets. These changes will allow userspace timestamps > >>>>>>>> packets to be forwarded from the bridge to NIC drivers. > >>>>>>>> > >>>>>>>> Setting the same bit (mono_delivery_time) to avoid dropping of > >>>>>>>> userspace tstamp packets in the forwarding path. > >>>>>>>> > >>>>>>>> Existing functionality of mono_delivery_time remains unaltered here, > >>>>>>>> instead just extended with userspace tstamp support for bridge > >>>>>>>> forwarding path. > >>>>>>> > >>>>>>> The patch currently broke the bpf selftest test_tc_dtime: > >>>>>>> https://github.com/kernel-patches/bpf/actions/runs/8242487344/job/22541746675 > >>>>>>> > >>>>>>> In particular, there is a uapi field __sk_buff->tstamp_type which currently has > >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO to mean skb->tstamp has the MONO "delivery" time. > >>>>>>> BPF_SKB_TSTAMP_UNSPEC means everything else (this could be a rx timestamp at > >>>>>>> ingress or a delivery time set by user space). > >>>>>>> > >>>>>>> __sk_buff->tstamp_type depends on skb->mono_delivery_time which does not > >>>>>>> necessarily mean mono after this patch. I thought about fixing it on the bpf > >>>>>>> side such that reading __sk_buff->tstamp_type only returns > >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO when the skb->mono_delivery_time is set and skb->sk > >>>>>>> is IPPROTO_TCP. However, it won't work because of bpf_skb_set_tstamp(). > >>>>>>> > >>>>>>> There is a bpf helper, bpf_skb_set_tstamp(skb, tstamp, > >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO). This helper changes both the skb->tstamp and the > >>>>>>> skb->mono_delivery_time. The expectation is this could change skb->tstamp in the > >>>>>>> ingress skb and redirect to egress sch_fq. It could also set a mono time to > >>>>>>> skb->tstamp where the udp sk->sk_clockid may not be necessary in mono and then > >>>>>>> bpf_redirect to egress sch_fq. When bpf_skb_set_tstamp(skb, tstamp, > >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO) succeeds, reading __sk_buff->tstamp_type expects > >>>>>>> BPF_SKB_TSTAMP_DELIVERY_MONO also. > >>>>>>> > >>>>>>> I ran out of idea to solve this uapi breakage. > >>>>>>> > >>>>>>> I am afraid it may need to go back to v1 idea and use another bit > >>>>>>> (user_delivery_time) in the skb. > >>>>>> > >>>>>> Is the only conflict when bpf_skb_set_tstamp is called for an skb from > >>>>>> a socket with sk_clockid set (and thus SO_TXTIME called)? > >>>>> > >>>>> Right, because skb->mono_delivery_time does not mean skb->tstamp is mono now and > >>>>> its interpretation depends on skb->sk->sk_clockid. > >>>>> > >>>>>> Interpreting skb->tstamp as mono if skb->mono_delivery_time is set and > >>>>>> skb->sk is NULL is fine. This is the ingress to egress redirect case. > >>>>> > >>>>> skb->sk == NULL is fine. I tried something like this in > >>>>> bpf_convert_tstamp_type_read() for reading __sk_buff->tstamp_type: > >>>>> > >>>>> __sk_buff->tstamp_type is BPF_SKB_TSTAMP_DELIVERY_MONO when: > >>>>> > >>>>> skb->mono_delivery_time == 1 && > >>>>> (!skb->sk || > >>>>> !sk_fullsock(skb->sk) /* tcp tw or req sk */ || > >>>>> skb->sk->sk_protocol == IPPROTO_TCP) > >>>>> > >>>>> Not a small bpf instruction addition to bpf_convert_tstamp_type_read() but doable. > >>>>> > >>>>>> > >>>>>> I don't see an immediate use for this BPF function on egress where it > >>>>>> would overwrite an SO_TXTIME timestamp and now skb->tstamp is mono, > >>>>>> but skb->sk != NULL and skb->sk->sk_clockid != CLOCK_MONOTONIC. > >>>>> > >>>>> The bpf prog may act as a traffic shaper that limits the bandwidth usage of all > >>>>> outgoing packets (tcp/udp/...) by setting the mono EDT in skb->tstamp before > >>>>> sending to the sch_fq. > >>>>> > >>>>> I currently also don't have a use case for skb->sk->sk_clockid != > >>>>> CLOCK_MONOTONIC. However, it is something that bpf_skb_set_tstamp() can do now > >>>>> before queuing to sch_fq. > >>>>> > >>>>> The container (in netns + veth) may use other sk_clockid/qdisc (e.g. sch_etf) > >>>>> setup and the non mono skb->tstamp is not cleared now during dev_forward_skb() > >>>>> between the veth pair. > >>>>> > >>>>>> > >>>>>> Perhaps bpf_skb_set_tstamp() can just fail if another delivery time is > >>>>>> already explicitly programmed? > >>>>> > >>>>> This will change the existing bpf_skb_set_tstamp() behavior, so probably not > >>>>> acceptable. > >>>>> > >>>>>> > >>>>>> skb->sk && > >>>>>> sock_flag(sk, SOCK_TXTIME) && > >>>>>> skb->sk->sk_clockid != CLOCK_MONOTONIC > >>>>> > >>>>>> Either that, or unset SOCK_TXTIME to make sk_clockid undefined and > >>>>>> fall back on interpreting as monotonic. > >>>>> > >>>>> Change sk->sk_flags in tc bpf prog? hmm... I am not sure this will work well also. > >>>>> > >>>>> sock_valbool_flag(SOCK_TXTIME) should require a lock_sock() to make changes. The > >>>>> tc bpf prog cannot take the lock_sock, so bpf_skb_set_tstamp() currently only > >>>>> changes skb and does not change skb->sk. > >>>>> > >>>>> I think changing sock_valbool_flag(SOCK_TXTIME) will also have a new user space > >>>>> visible side effect. The sendmsg for cmsg with SCM_TXTIME will start failing > >>>>> from looking at __sock_cmsg_send(). > >>>>> > >>>>> There may be a short period of disconnect between what is in sk->sk_flags and > >>>>> what is set in skb->tstamp. e.g. what if user space does setsockopt(SO_TXTIME) > >>>>> again after skb->tstamp is set by bpf. This could be considered a small glitch > >>>>> for some amount of skb(s) until the user space settled on setsockopt(SO_TXTIME). > >>>>> > >>>>> I think all this is crying for another bit in skb to mean user_delivery_time > >>>>> (skb->tstamp depends on skb->sk->sk_clockid) while mono_delivery_time is the > >>>>> mono time either set by kernel-tcp or bpf. > >>>> > >>>> It does sound like the approach with least side effects. > >>>> > >>>> If we're going to increase to two bits per skb, it's perhaps > >>>> better to just encode the (selected supported) CLOCK_ type, rather > >>>> than only supporting clockid through skb->sk->sk_clockid. > >>> > >>> Good idea. May be starting with mono and tai (Abishek's use case?), only forward these two clocks and reset the skb->tstamp for others. > >>> > >>>> > >>>> This BPF function is the analogue to SO_TXTIME. It is clearly > >>>> extensible to additional BPF_SKB_TSTAMP_DELIVERY_.. types. To > >>>> work with sch_etf, say. > >>> > >>> Yes, if there are bits in skb to describe the clock in the skb->tstamp, BPF_SKB_TSTAMP_DELIVERY_ can be extended to match it. It will be easier if the values in the skb bits is the same as the BPF_SKB_TSTAMP_DELIVERY_*. > >>> > >>> The bpf_convert_tstamp_*() and the bpf_skb_set_tstamp helper will need changes to include the consideration of these two bits. I think we have mostly settled with the approach (thanks for the discussion!). Abhishek, not sure how much can be reused from this patch for the two bits apporach, do you want to revert the current patch first and then start from clean? > >>> > >> Yes , I think since we have concluded the two bit .(Thanks for discussion again, Martin and Willem) > >> > >> Here is what i will do from myside > >> 1. Revert the v4 patch :- net: Re-use and set mono_delivery_time bit for userspace tstamp packets > >> 2. Keep mono_delivery_time as ease > >> 3. Add user_delivery_time as a new bitfield > >> 4. Add BPF_SKB_TSTAMP_DELIVERY_TAI in the bpf.h for etf support > >> 5. do not reset the time if either mono_delivery_time or user_delivery_time is set. > >> > >> Let me know if i have covered all the design details or add if i missed anything. > > > > Thanks for revising this. > > > > No need to add the BPF part here. > > > > I was mistaken that we can encode the clock_id in two skb bits. > > SO_TXTIME allows essentially all CLOCK_... > > The two bits could potentially only encode the delivery time that is allowed to > be forwarded without reset. 0 could mean refering back to sk_clockid and don't > forward. The final consumer of the forwarded skb->tstamp is the qdisc which > currently only has mono and tai. So the followinng meaning of bit pair { skb->mono_delivery_time, skb->user_delivery_time } ? - { 0, 0 } legacy skb->tstamp: realtime on rx - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit - { 0, 1 } skb->tstamp is tai: analogous to mono case - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid > The concern is more on future extension to > allow more clock type to be forwarded? Right. Not because I have an immediate use for them. Only because SO_TXTIME already allows configuring other values for sk_clockid. > I don't have a use case for having BPF_SKB_TSTAMP_DELIVERY_TAI but curious on > how other clock types are used now. > > Thanks. > > > > > So indeed we will need a user_delivery_time bit and use that to look > > at sk_clockid. >
On 3/14/24 2:49 AM, Willem de Bruijn wrote: >> The two bits could potentially only encode the delivery time that is allowed to >> be forwarded without reset. 0 could mean refering back to sk_clockid and don't >> forward. The final consumer of the forwarded skb->tstamp is the qdisc which >> currently only has mono and tai. > > So the followinng meaning of bit pair > { skb->mono_delivery_time, skb->user_delivery_time } ? > > - { 0, 0 } legacy skb->tstamp: realtime on rx > - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit > - { 0, 1 } skb->tstamp is tai: analogous to mono case > - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid I was thinking only forward mono and tai until it is clearer how other clocks will be useful for forwarding between e/ingress. By resetting all skb->tstamp other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 } at egress will mean go look skb->sk->sk_clockid. I do like your scheme such that it is much clearer what is in skb->tstamp without depending on other bits like tc_at_ingress or not. "{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed. Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by skb->sk->sk_clockid". The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be changed to handle the new "user_delivery_time" bit anyway, e.g. bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the "user_delivery_time" bit. I think the "struct inet_frag_queue" also needs a new "user_delivery_time" field. "mono_delivery_time" is already in there. It may as well be cleaner to combine mono_delivery_time and user_delivery_time into a 2 bits field like: struct sk_buff { __u8 tstamp_type:2; }; enum { SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ /* A TX (delivery) time and its clock is in skb->sk->sk_clockid. * * BPF_SKB_TSTAMP_DELIVERY_USER should be added * such that reading __sk_buff->tstamp_type will match the * SKB_TSTAMP_TYPE_TX_USER. * * The bpf program can learn the clockid by * reading skb->sk->sk_clockid. * * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER) * should be disallowed for now until the use case * is more clear. Potentially, we could allow it * in the future as long as * the sock_flag(sk, SOCK_TXTIME) is true at that moment. */ SKB_TSTAMP_TYPE_TX_USER = 2, /* UNUSED_FOR_FUTURE = 3, */ }; It will have more code churns in the first patch to rename s/mono_delivery_time/tstamp_type/. wdyt?
Martin KaFai Lau wrote: > On 3/14/24 2:49 AM, Willem de Bruijn wrote: > >> The two bits could potentially only encode the delivery time that is allowed to > >> be forwarded without reset. 0 could mean refering back to sk_clockid and don't > >> forward. The final consumer of the forwarded skb->tstamp is the qdisc which > >> currently only has mono and tai. > > > > So the followinng meaning of bit pair > > { skb->mono_delivery_time, skb->user_delivery_time } ? > > > > - { 0, 0 } legacy skb->tstamp: realtime on rx > > - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit > > - { 0, 1 } skb->tstamp is tai: analogous to mono case > > - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid > > I was thinking only forward mono and tai until it is clearer how other clocks > will be useful for forwarding between e/ingress. By resetting all skb->tstamp > other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 > } at egress will mean go look skb->sk->sk_clockid. > > I do like your scheme such that it is much clearer what is in skb->tstamp > without depending on other bits like tc_at_ingress or not. > > "{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped > for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed. > Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by > skb->sk->sk_clockid". > > The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be > changed to handle the new "user_delivery_time" bit anyway, e.g. > bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the > "user_delivery_time" bit. > > I think the "struct inet_frag_queue" also needs a new "user_delivery_time" > field. "mono_delivery_time" is already in there. > > It may as well be cleaner to combine mono_delivery_time and user_delivery_time > into a 2 bits field like: > > struct sk_buff { > __u8 tstamp_type:2; > }; > > enum { > SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ > SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ > > /* A TX (delivery) time and its clock is in skb->sk->sk_clockid. > * > * BPF_SKB_TSTAMP_DELIVERY_USER should be added > * such that reading __sk_buff->tstamp_type will match the > * SKB_TSTAMP_TYPE_TX_USER. > * > * The bpf program can learn the clockid by > * reading skb->sk->sk_clockid. > * > * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER) > * should be disallowed for now until the use case > * is more clear. Potentially, we could allow it > * in the future as long as > * the sock_flag(sk, SOCK_TXTIME) is true at that moment. > */ > SKB_TSTAMP_TYPE_TX_USER = 2, > > /* UNUSED_FOR_FUTURE = 3, */ > }; > > It will have more code churns in the first patch to rename > s/mono_delivery_time/tstamp_type/. > > wdyt? I asked for such code churn in the original patch. We then decided to leave the variable name as is, as the churn was significant. Long term, it is obviously cleaner. I don't have a strong opinion. If doing this, let's at least make it two separate patches, one that is a NOOP rename only.
On 3/14/2024 1:28 PM, Willem de Bruijn wrote: > Martin KaFai Lau wrote: >> On 3/14/24 2:49 AM, Willem de Bruijn wrote: >>>> The two bits could potentially only encode the delivery time that is allowed to >>>> be forwarded without reset. 0 could mean refering back to sk_clockid and don't >>>> forward. The final consumer of the forwarded skb->tstamp is the qdisc which >>>> currently only has mono and tai. >>> >>> So the followinng meaning of bit pair >>> { skb->mono_delivery_time, skb->user_delivery_time } ? >>> >>> - { 0, 0 } legacy skb->tstamp: realtime on rx >>> - { 1, 0 } skb->tstamp is mono: existing behavior of mono_delivery_time bit >>> - { 0, 1 } skb->tstamp is tai: analogous to mono case >>> - { 1, 1 } skb->tstamp defined by skb->sk->sk_clockid >> >> I was thinking only forward mono and tai until it is clearer how other clocks >> will be useful for forwarding between e/ingress. By resetting all skb->tstamp >> other than mono and tai, { 0, 0 } at ingress will mean realtime on rx and { 0, 0 >> } at egress will mean go look skb->sk->sk_clockid. >> >> I do like your scheme such that it is much clearer what is in skb->tstamp >> without depending on other bits like tc_at_ingress or not. >> >> "{ 0, 1 } skb->tstamp is tai: analogous to mono case" can probably be dropped >> for now until bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_TAI) is needed. >> Otherwise, it is mostly a duplicate of "{ 1, 1 } skb->tstamp defined by >> skb->sk->sk_clockid". >> >> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be >> changed to handle the new "user_delivery_time" bit anyway, e.g. >> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the >> "user_delivery_time" bit. >> >> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" >> field. "mono_delivery_time" is already in there. >> >> It may as well be cleaner to combine mono_delivery_time and user_delivery_time >> into a 2 bits field like: >> >> struct sk_buff { >> __u8 tstamp_type:2; >> }; >> >> enum { >> SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ >> SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ >> >> /* A TX (delivery) time and its clock is in skb->sk->sk_clockid. >> * >> * BPF_SKB_TSTAMP_DELIVERY_USER should be added >> * such that reading __sk_buff->tstamp_type will match the >> * SKB_TSTAMP_TYPE_TX_USER. >> * >> * The bpf program can learn the clockid by >> * reading skb->sk->sk_clockid. >> * >> * bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_USER) >> * should be disallowed for now until the use case >> * is more clear. Potentially, we could allow it >> * in the future as long as >> * the sock_flag(sk, SOCK_TXTIME) is true at that moment. >> */ >> SKB_TSTAMP_TYPE_TX_USER = 2, >> >> /* UNUSED_FOR_FUTURE = 3, */ >> }; >> >> It will have more code churns in the first patch to rename >> s/mono_delivery_time/tstamp_type/. >> >> wdyt? > > I asked for such code churn in the original patch. We then decided to > leave the variable name as is, as the churn was significant. > > Long term, it is obviously cleaner. > > I don't have a strong opinion. If doing this, let's at least make it > two separate patches, one that is a NOOP rename only. > Martin and Willem. Lets do the cleaner approach. I feel its now or never. 1. I will raise one patch to introduce rename mono_delivery_time to tstamp_type 2. I will introduce setting of userspace timestamp type as the second bit whem transmit_time is set. 3. This will be a first step to make the design scalable. 4. Tomorrow if we have more timestamp to support, upstream community has to do is update the enum and increase the bitfield from 2=>3 and so on. I need help from Martin to test the patch which renames the mono_delivery_time to tstamp_type (Which i feel should be straight forward as the value of the bit is 1) Sounds like a plan ? >
On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote: >>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be >>> changed to handle the new "user_delivery_time" bit anyway, e.g. >>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the >>> "user_delivery_time" bit. >>> >>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" >>> field. "mono_delivery_time" is already in there. [ ... ] I would think the first step is to revert this patch. I don't think much of the current patch can be reused. > 1. I will raise one patch to introduce rename mono_delivery_time to > tstamp_type Right, I expect something like this: struct sk_buff { /* ... */ - __u8 mono_delivery_time:1; + __u8 tstamp_type:1; /* ... */ }; > 2. I will introduce setting of userspace timestamp type as the second bit > whem transmit_time is set. I expect the second patch should be introducing the enum first enum skb_tstamp_type { SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ }; and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of "skb->tstamp_type = 1;" and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of "skb->tstamp_type = 0;" This one I am not sure but probably need to change the skb_set_delivery_time() function signature also: static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, - bool mono) + enum skb_tstamp_type tstamp_type) The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER. struct sk_buff { /* ... */ - __u8 tstamp_type:1; + __u8 tstamp_type:2; /* ... */ }; enum skb_tstamp_type { SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ + SKB_TSTAMP_TYPE_TX_USER = 2, /* A TX (delivery) time and its clock * is in skb->sk->sk_clockid. */ }; This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/ > 3. This will be a first step to make the design scalable. > 4. Tomorrow if we have more timestamp to support, upstream community has to do is > update the enum and increase the bitfield from 2=>3 and so on. > > I need help from Martin to test the patch which renames the mono_delivery_time > to tstamp_type (Which i feel should be straight forward as the value of the bit is 1) The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email). Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime"
On 3/14/24 2:48 PM, Martin KaFai Lau wrote: > > I would think the first step is to revert this patch. I don't think much of the > current patch can be reused. My bad. Please ignore. Just catch up on the revert patch you sent.
On 3/14/2024 2:48 PM, Martin KaFai Lau wrote: > On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote: >>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be >>>> changed to handle the new "user_delivery_time" bit anyway, e.g. >>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the >>>> "user_delivery_time" bit. >>>> >>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" >>>> field. "mono_delivery_time" is already in there. > > [ ... ] > > I would think the first step is to revert this patch. I don't think much of the current patch can be reused. > >> 1. I will raise one patch to introduce rename mono_delivery_time to >> tstamp_type > > Right, I expect something like this: > > struct sk_buff { > /* ... */ > - __u8 mono_delivery_time:1; > + __u8 tstamp_type:1; > /* ... */ > }; > Okay ,This should be straight-forward. >> 2. I will introduce setting of userspace timestamp type as the second bit >> whem transmit_time is set. > > I expect the second patch should be introducing the enum first > > enum skb_tstamp_type { > SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ > SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ > }; > > and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of > "skb->tstamp_type = 1;" > > and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of > "skb->tstamp_type = 0;" > > > This one I am not sure but probably need to change the skb_set_delivery_time() function signature also: > > static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, > - bool mono) > + enum skb_tstamp_type tstamp_type) > This should be straight-forward as well > The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER. > > struct sk_buff { > /* ... */ > - __u8 tstamp_type:1; > + __u8 tstamp_type:2; > /* ... */ > }; > > enum skb_tstamp_type { > SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ > SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ > + SKB_TSTAMP_TYPE_TX_USER = 2, /* A TX (delivery) time and its clock > * is in skb->sk->sk_clockid. > */ > > }; > > This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/ This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. But i have plan on this and i know how to do it. Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm Fingers crossed :) > > >> 3. This will be a first step to make the design scalable. >> 4. Tomorrow if we have more timestamp to support, upstream community has to do is >> update the enum and increase the bitfield from 2=>3 and so on. >> >> I need help from Martin to test the patch which renames the mono_delivery_time >> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1) > > The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email). > > Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime" >
On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote: > > > On 3/14/2024 2:48 PM, Martin KaFai Lau wrote: >> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote: >>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be >>>>> changed to handle the new "user_delivery_time" bit anyway, e.g. >>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the >>>>> "user_delivery_time" bit. >>>>> >>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" >>>>> field. "mono_delivery_time" is already in there. >> >> [ ... ] >> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to distinguish between timestamp anyway . Let me know what you think ? >> I would think the first step is to revert this patch. I don't think much of the current patch can be reused. >> >>> 1. I will raise one patch to introduce rename mono_delivery_time to >>> tstamp_type >> >> Right, I expect something like this: >> >> struct sk_buff { >> /* ... */ >> - __u8 mono_delivery_time:1; >> + __u8 tstamp_type:1; >> /* ... */ >> }; >> > > Okay ,This should be straight-forward. > >>> 2. I will introduce setting of userspace timestamp type as the second bit >>> whem transmit_time is set. >> >> I expect the second patch should be introducing the enum first >> >> enum skb_tstamp_type { >> SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ >> SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ >> }; >> >> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of >> "skb->tstamp_type = 1;" >> >> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of >> "skb->tstamp_type = 0;" >> >> >> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also: >> >> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >> - bool mono) >> + enum skb_tstamp_type tstamp_type) >> > This should be straight-forward as well > >> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER. >> >> struct sk_buff { >> /* ... */ >> - __u8 tstamp_type:1; >> + __u8 tstamp_type:2; >> /* ... */ >> }; >> >> enum skb_tstamp_type { >> SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ >> SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ >> + SKB_TSTAMP_TYPE_TX_USER = 2, /* A TX (delivery) time and its clock >> * is in skb->sk->sk_clockid. >> */ >> >> }; >> >> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/ > This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. > But i have plan on this and i know how to do it. > > Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm > Fingers crossed :) > >> >> >>> 3. This will be a first step to make the design scalable. >>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is >>> update the enum and increase the bitfield from 2=>3 and so on. >>> >>> I need help from Martin to test the patch which renames the mono_delivery_time >>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1) >> >> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email). >> >> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime" >>
On 3/18/24 12:02 PM, Abhishek Chauhan (ABC) wrote: >>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" >>>>>> field. "mono_delivery_time" is already in there. >>> [ ... ] >>> > Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to > distinguish between timestamp anyway . The context for this was before combining mono_delivery_time:1 and user_delivery_time:1 into tstamp_type:2. No need to add user_delivery_time to inet_frag_queue because it is combined into tstamp_type:2. If mono_delivery_time:1 is replaced with tstamp_type:2 in sk_buff, the same should be done in inet_frag_queue.
On 3/19/2024 12:46 PM, Martin KaFai Lau wrote: > On 3/18/24 12:02 PM, Abhishek Chauhan (ABC) wrote: >>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" >>>>>>> field. "mono_delivery_time" is already in there. >>>> [ ... ] >>>> >> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to >> distinguish between timestamp anyway . > > > The context for this was before combining mono_delivery_time:1 and user_delivery_time:1 into tstamp_type:2. No need to add user_delivery_time to inet_frag_queue because it is combined into tstamp_type:2. If mono_delivery_time:1 is replaced with tstamp_type:2 in sk_buff, the same should be done in inet_frag_queue. > Thats what i was planning to do. This is more or like the patch 3 - when brings in the changes of two bits. Thanks Martin for clarification. >
On 3/18/2024 12:02 PM, Abhishek Chauhan (ABC) wrote: > > > On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote: >> >> >> On 3/14/2024 2:48 PM, Martin KaFai Lau wrote: >>> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote: >>>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be >>>>>> changed to handle the new "user_delivery_time" bit anyway, e.g. >>>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the >>>>>> "user_delivery_time" bit. >>>>>> >>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" >>>>>> field. "mono_delivery_time" is already in there. >>> >>> [ ... ] >>> > > Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to > distinguish between timestamp anyway . > > Let me know what you think ? > >>> I would think the first step is to revert this patch. I don't think much of the current patch can be reused. >>> >>>> 1. I will raise one patch to introduce rename mono_delivery_time to >>>> tstamp_type >>> >>> Right, I expect something like this: >>> >>> struct sk_buff { >>> /* ... */ >>> - __u8 mono_delivery_time:1; >>> + __u8 tstamp_type:1; >>> /* ... */ >>> }; >>> >> >> Okay ,This should be straight-forward. >> >>>> 2. I will introduce setting of userspace timestamp type as the second bit >>>> whem transmit_time is set. >>> >>> I expect the second patch should be introducing the enum first >>> >>> enum skb_tstamp_type { >>> SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ >>> SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ >>> }; >>> >>> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of >>> "skb->tstamp_type = 1;" >>> >>> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of >>> "skb->tstamp_type = 0;" >>> >>> >>> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also: >>> >>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >>> - bool mono) >>> + enum skb_tstamp_type tstamp_type) >>> >> This should be straight-forward as well >> >>> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER. >>> >>> struct sk_buff { >>> /* ... */ >>> - __u8 tstamp_type:1; >>> + __u8 tstamp_type:2; >>> /* ... */ >>> }; >>> >>> enum skb_tstamp_type { >>> SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ >>> SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ >>> + SKB_TSTAMP_TYPE_TX_USER = 2, /* A TX (delivery) time and its clock >>> * is in skb->sk->sk_clockid. >>> */ >>> >>> }; >>> >>> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/ >> This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. >> But i have plan on this and i know how to do it. >> >> Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm >> Fingers crossed :) >> >>> >>> >>>> 3. This will be a first step to make the design scalable. >>>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is >>>> update the enum and increase the bitfield from 2=>3 and so on. >>>> >>>> I need help from Martin to test the patch which renames the mono_delivery_time >>>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1) >>> >>> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email). >>> >>> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime" >>> Martin, I was able to compile test_progs and execute the above command mentioned by you . Does the output look okay for you ? [ 3076.040766] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src_fwd: link becomes ready [ 3076.040809] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready [ 3076.072844] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst: link becomes ready [ 3076.072880] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst_fwd: link becomes ready #214/5 tc_redirect/tc_redirect_dtime:OK #214 tc_redirect:OK Summary: 1/1 PASSED, 0 SKIPPED, 0 FAILED
On 3/19/24 11:22 PM, Abhishek Chauhan (ABC) wrote: > > > On 3/18/2024 12:02 PM, Abhishek Chauhan (ABC) wrote: >> >> >> On 3/14/2024 3:29 PM, Abhishek Chauhan (ABC) wrote: >>> >>> >>> On 3/14/2024 2:48 PM, Martin KaFai Lau wrote: >>>> On 3/14/24 1:53 PM, Abhishek Chauhan (ABC) wrote: >>>>>>> The bpf_convert_tstamp_{read,write} and the helper bpf_skb_set_tstamp need to be >>>>>>> changed to handle the new "user_delivery_time" bit anyway, e.g. >>>>>>> bpf_skb_set_tstamp(BPF_SKB_TSTAMP_DELIVERY_MONO) needs to clear the >>>>>>> "user_delivery_time" bit. >>>>>>> >>>>>>> I think the "struct inet_frag_queue" also needs a new "user_delivery_time" >>>>>>> field. "mono_delivery_time" is already in there. >>>> >>>> [ ... ] >>>> >> >> Martin, Do we really need to add user_delivery_time as part of inet_frag_queue struct? I was wondering why is this required since we are using tstamp_type:2 to >> distinguish between timestamp anyway . >> >> Let me know what you think ? >> >>>> I would think the first step is to revert this patch. I don't think much of the current patch can be reused. >>>> >>>>> 1. I will raise one patch to introduce rename mono_delivery_time to >>>>> tstamp_type >>>> >>>> Right, I expect something like this: >>>> >>>> struct sk_buff { >>>> /* ... */ >>>> - __u8 mono_delivery_time:1; >>>> + __u8 tstamp_type:1; >>>> /* ... */ >>>> }; >>>> >>> >>> Okay ,This should be straight-forward. >>> >>>>> 2. I will introduce setting of userspace timestamp type as the second bit >>>>> whem transmit_time is set. >>>> >>>> I expect the second patch should be introducing the enum first >>>> >>>> enum skb_tstamp_type { >>>> SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ >>>> SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ >>>> }; >>>> >>>> and start doing "skb->tstamp_type = SKB_TSTAMP_TYPE_TX_MONO;" instead of >>>> "skb->tstamp_type = 1;" >>>> >>>> and the same for "skb->tstamp_type = SKB_TSTAMP_TYPE_RX_REAL;" instead of >>>> "skb->tstamp_type = 0;" >>>> >>>> >>>> This one I am not sure but probably need to change the skb_set_delivery_time() function signature also: >>>> >>>> static inline void skb_set_delivery_time(struct sk_buff *skb, ktime_t kt, >>>> - bool mono) >>>> + enum skb_tstamp_type tstamp_type) >>>> >>> This should be straight-forward as well >>> >>>> The third patch is to change tstamp_type from 1 bit to 2 bits and add SKB_TSTAMP_TYPE_TX_USER. >>>> >>>> struct sk_buff { >>>> /* ... */ >>>> - __u8 tstamp_type:1; >>>> + __u8 tstamp_type:2; >>>> /* ... */ >>>> }; >>>> >>>> enum skb_tstamp_type { >>>> SKB_TSTAMP_TYPE_RX_REAL = 0, /* A RX (receive) time in real */ >>>> SKB_TSTAMP_TYPE_TX_MONO = 1, /* A TX (delivery) time in mono */ >>>> + SKB_TSTAMP_TYPE_TX_USER = 2, /* A TX (delivery) time and its clock >>>> * is in skb->sk->sk_clockid. >>>> */ >>>> >>>> }; >>>> >>>> This will shift a bit out of the byte where tstamp_type lives. It should be the "inner_protocol_type" bit by my hand count. Please check if it is directly used in bpf instruction (filter.c). As far as I look, it is not, so should be fine. Some details about bpf instruction accessible skb bit field here: https://lore.kernel.org/all/20230321014115.997841-1-kuba@kernel.org/ >>> This is where i would need thorough reviews from you and Willem as my area of expertise is limited to part of network stack and BPF is not one of them. >>> But i have plan on this and i know how to do it. >>> >>> Expect patches to be arriving to your inboxes next week, as we have a long weekend in Qualcomm >>> Fingers crossed :) >>> >>>> >>>> >>>>> 3. This will be a first step to make the design scalable. >>>>> 4. Tomorrow if we have more timestamp to support, upstream community has to do is >>>>> update the enum and increase the bitfield from 2=>3 and so on. >>>>> >>>>> I need help from Martin to test the patch which renames the mono_delivery_time >>>>> to tstamp_type (Which i feel should be straight forward as the value of the bit is 1) >>>> >>>> The bpf change is not a no-op rename of mono_delivery_time. It needs to take care of the new bit added to the tstamp_type. Please see the previous email (and I also left it in the beginning of this email). >>>> >>>> Thus, you need to compile the selftests/bpf/ and run it to verify the changes when handling the new bit. The Documentation/bpf/bpf_devel_QA.rst has the howto details. You probably only need the newer llvm (newer gcc should work also as bpf CI has been using it) and the newer pahole. I can definitely help if there is issue in running the test_progs in selftests/bpf or you have question on making the changes in filter.c. To run the test: "./test_progs -t tc_redirect/tc_redirect_dtime" >>>> > > Martin, > I was able to compile test_progs and execute the above command mentioned by you . Does the output look okay for you ? > > [ 3076.040766] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src_fwd: link becomes ready > [ 3076.040809] IPv6: ADDRCONF(NETDEV_CHANGE): veth_src: link becomes ready > [ 3076.072844] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst: link becomes ready > [ 3076.072880] IPv6: ADDRCONF(NETDEV_CHANGE): veth_dst_fwd: link becomes ready > #214/5 tc_redirect/tc_redirect_dtime:OK > #214 tc_redirect:OK lgtm. thanks.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h index 2dde34c29203..4726298d4ed4 100644 --- a/include/linux/skbuff.h +++ b/include/linux/skbuff.h @@ -817,9 +817,9 @@ typedef unsigned char *sk_buff_data_t; * @decrypted: Decrypted SKB * @slow_gro: state present at GRO time, slower prepare step required * @mono_delivery_time: When set, skb->tstamp has the - * delivery_time in mono clock base (i.e. EDT). Otherwise, the - * skb->tstamp has the (rcv) timestamp at ingress and - * delivery_time at egress. + * delivery_time in mono clock base (i.e., EDT) or a clock base chosen + * by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at + * ingress. * @napi_id: id of the NAPI struct this skb came from * @sender_cpu: (aka @napi_id) source CPU in XPS * @alloc_cpu: CPU which did the skb allocation. diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c index 5b5a0adb927f..ff1df64c5697 100644 --- a/net/ipv4/ip_output.c +++ b/net/ipv4/ip_output.c @@ -1455,6 +1455,7 @@ struct sk_buff *__ip_make_skb(struct sock *sk, skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority); skb->mark = cork->mark; skb->tstamp = cork->transmit_time; + skb->mono_delivery_time = !!skb->tstamp; /* * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec * on dst refcount diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c index aea89326c697..c4c29fc5b73f 100644 --- a/net/ipv4/raw.c +++ b/net/ipv4/raw.c @@ -353,6 +353,7 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; skb->tstamp = sockc->transmit_time; + skb->mono_delivery_time = !!skb->tstamp; skb_dst_set(skb, &rt->dst); *rtp = NULL; diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c index a722a43dd668..2fc1d03dc07d 100644 --- a/net/ipv6/ip6_output.c +++ b/net/ipv6/ip6_output.c @@ -1922,7 +1922,7 @@ struct sk_buff *__ip6_make_skb(struct sock *sk, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = cork->base.mark; skb->tstamp = cork->base.transmit_time; - + skb->mono_delivery_time = !!skb->tstamp; ip6_cork_steal_dst(skb, cork); IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS); if (proto == IPPROTO_ICMPV6) { diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c index 03dbb874c363..13f54f8eea35 100644 --- a/net/ipv6/raw.c +++ b/net/ipv6/raw.c @@ -616,7 +616,7 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc->mark; skb->tstamp = sockc->transmit_time; - + skb->mono_delivery_time = !!skb->tstamp; skb_put(skb, length); skb_reset_network_header(skb); iph = ipv6_hdr(skb); diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c index c9bbc2686690..0db31ca4982d 100644 --- a/net/packet/af_packet.c +++ b/net/packet/af_packet.c @@ -2057,7 +2057,7 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg, skb->priority = READ_ONCE(sk->sk_priority); skb->mark = READ_ONCE(sk->sk_mark); skb->tstamp = sockc.transmit_time; - + skb->mono_delivery_time = !!skb->tstamp; skb_setup_tx_timestamp(skb, sockc.tsflags); if (unlikely(extra_len == 4)) @@ -2586,6 +2586,7 @@ static int tpacket_fill_skb(struct packet_sock *po, struct sk_buff *skb, skb->priority = READ_ONCE(po->sk.sk_priority); skb->mark = READ_ONCE(po->sk.sk_mark); skb->tstamp = sockc->transmit_time; + skb->mono_delivery_time = !!skb->tstamp; skb_setup_tx_timestamp(skb, sockc->tsflags); skb_zcopy_set_nouarg(skb, ph.raw); @@ -3064,6 +3065,7 @@ static int packet_snd(struct socket *sock, struct msghdr *msg, size_t len) skb->priority = READ_ONCE(sk->sk_priority); skb->mark = sockc.mark; skb->tstamp = sockc.transmit_time; + skb->mono_delivery_time = !!skb->tstamp; if (unlikely(extra_len == 4)) skb->no_fcs = 1;
Bridge driver today has no support to forward the userspace timestamp packets and ends up resetting the timestamp. ETF qdisc checks the packet coming from userspace and encounters to be 0 thereby dropping time sensitive packets. These changes will allow userspace timestamps packets to be forwarded from the bridge to NIC drivers. Setting the same bit (mono_delivery_time) to avoid dropping of userspace tstamp packets in the forwarding path. Existing functionality of mono_delivery_time remains unaltered here, instead just extended with userspace tstamp support for bridge forwarding path. Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> --- Changes since v3 - Setting mono_delivery_time at all instances where the skb->tstamp is initialized with sockcm.transmit_time as reviewed by Willem - Removed repetitive comments from all the sources file and limited only to skbuff.h as suggested by Willem - Re-phrased the comment explanation in skbuff.h and made it much simpler and generic as suggested by Willem Changes since v2 - Updated the commit subject and message. - Took care of few comments from Willem to re-use mono_delivery_time with comments and documentations in the header and source file. - Took care of comment from Andrew on the typo in the comment. - Existing self-test test cases are executed to make sure existing implementation is not impacted as stated by Paolo.(so_txtime.sh). - Internal validation of UDP packets using iperf/so_priority/so_txtime with MQPRIO + ETF offload is executed as well. - Test case is included below Test 1 :- FQ + ETF (SW path) [root@ecbldauto-lvarm04-lnx ~]# ./so_txtime.sh [ 280.640551] q->last time is 1707955476143297550 [ 283.338947] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready [ 284.078429] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready SO_TXTIME ipv4 clock monotonic payload:a delay:109 expected:0 (us) SO_TXTIME ipv6 clock monotonic payload:a delay:140 expected:0 (us) SO_TXTIME ipv6 clock monotonic payload:a delay:12739 expected:10000 (us) SO_TXTIME ipv4 clock monotonic payload:a delay:10054 expected:10000 (us) payload:b delay:20043 expected:20000 (us) SO_TXTIME ipv6 clock monotonic payload:b delay:20078 expected:20000 (us) payload:a delay:20177 expected:20000 (us) SO_TXTIME ipv4 clock tai send: pkt a at -1707955482913ms dropped: invalid txtime [ 287.070504] now is set to 1707955482913404839 [ 287.070509] tx time from SKB is 0 ./so_txtime: recv: timeout: Resource temporarily unavailable SO_TXTIME ipv6 clock tai send: pkt a at 0ms dropped: invalid txtime [ 287.070510] q->last time is 0 [ 287.420590] now is set to 1707955483263491298 [ 287.420596] tx time from SKB is 1707955483263454527 ./so_txtime: recv: timeout: Resource temporarily unavailable SO_TXTIME ipv6 clock tai [ 287.420597] q->last time is 0 [ 287.700598] now is set to 1707955483543498954 [ 287.700604] tx time from SKB is 1707955483553463173 payload:a delay:9655 expected:10000 (us) SO_TXTIME ipv4 clock tai [ 287.700605] q->last time is 0 [ 288.100532] now is set to 1707955483943432391 [ 288.100537] tx time from SKB is 1707955483953413016 payload:a delay:9668 expected:10000 (us)[ 288.100538] q->last time is 1707955483553463173 [ 288.100546] now is set to 1707955483943446975 [ 288.100547] tx time from SKB is 1707955483963413016 payload:b delay:20484 expected:20000 (us) SO_TXTIME ipv6 clock tai [ 288.100547] q->last time is 1707955483553463173 [ 288.440582] now is set to 1707955484283482495 [ 288.440587] tx time from SKB is 1707955484303452808 payload:b delay:9648 expected:10000 (us)[ 288.440588] q->last time is 1707955483963413016 [ 288.440598] now is set to 1707955484283499370 payload:a delay:22037 expected:20000 (us) [ 288.440599] tx time from SKB is 1707955484293452808 OK. All tests passed Test case 2 (MQPRIO + ETF HW offload) [root@ecbldauto-lvarm04-lnx ~]# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 4 \ map 0 2 1 3 3 2 2 2 2 2 2 2 2 2 2 2 \ queues 1@0 1@1 1@2 1@3\ hw 0 [root@ecbldauto-lvarm04-lnx ~]# tc qdisc replace dev eth0 parent 100:4 etf \ clockid CLOCK_TAI delta 40000 offload skip_sock_check [ 89.145838] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue test log 3, number of queues 4, qopt enable 1, tbs queue bit 1 [ 89.145846] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue 3 [root@ecbldauto-lvarm04-lnx ~]# ./a.out -4 -c tai -S 192.168.1.1 -D 192.168.1.2 a,1,b,2 SO_TXTIME ipv4 clock tai glob_tstat = 1707955395256170394 [ 199.623650] now is set to 1707955395256215810 [ 199.623655] tx time from SKB is 1707955395257170394 [ 199.623656] q->last time is 0 [ 199.623663] now is set to 1707955395256230029 [ 199.623664] tx time from SKB is 1707955395258170394 [ 199.623665] q->last time is 0 [ 199.624589] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 257170394 nsec [ 199.625573] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 258170394 nsec Changes since v1 - Changed the commit subject as i am modifying the mono_delivery_time bit with clockid_delivery_time. - Took care of suggestion mentioned by Willem to use the same bit for userspace delivery time as there are no conflicts between TCP and SCM_TXTIME, because explicit cmsg makes no sense for TCP and only RAW and DGRAM sockets interprets it. - Clear explaination of why this is needed mentioned below and this is extending the work done by Martin for mono_delivery_time https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/ - Version 1 patch can be referenced with below link which states the exact problem with tc-etf and discussions which took place https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/ include/linux/skbuff.h | 6 +++--- net/ipv4/ip_output.c | 1 + net/ipv4/raw.c | 1 + net/ipv6/ip6_output.c | 2 +- net/ipv6/raw.c | 2 +- net/packet/af_packet.c | 4 +++- 6 files changed, 10 insertions(+), 6 deletions(-)