Message ID | 20240301020556.2303531-1-quic_abchauha@quicinc.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Series | [net-next,v3] net: Re-use and set mono_delivery_time bit for userspace tstamp packets |
Abhishek Chauhan wrote:
> Bridge driver today has no support to forward the userspace timestamp
> packets and ends up resetting the timestamp. ETF qdisc checks the
> packet coming from userspace and encounters to be 0 thereby dropping
> time sensitive packets. These changes will allow userspace timestamps
> packets to be forwarded from the bridge to NIC drivers.
>
> Setting the same bit (mono_delivery_time) to avoid dropping of
> userspace tstamp packets in the forwarding path.
>
> Existing functionality of mono_delivery_time remains unaltered here,
> instead just extended with userspace tstamp support for bridge
> forwarding path.
>
> Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
> ---
> Changes since v2
> - Updated the commit subject and message.
> - Took care of few comments from Willem to re-use mono_delivery_time
>   with comments and documentations in the header and source file.
> - Took care of comment from Andrew on the typo in the comment.
> - Existing self-test test cases are executed to make sure existing
>   implementation is not impacted as stated by Paolo.(so_txtime.sh).
> - Internal validation of UDP packets using iperf/so_priority/so_txtime
>   with MQPRIO + ETF offload is executed as well.
> - Test case is included below
>
> Test 1 :- FQ + ETF (SW path)
>
> [root@ecbldauto-lvarm04-lnx ~]# ./so_txtime.sh
> [ 280.640551] q->last time is 1707955476143297550
> [ 283.338947] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
> [ 284.078429] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
>
> SO_TXTIME ipv4 clock monotonic
> payload:a delay:109 expected:0 (us)
>
> SO_TXTIME ipv6 clock monotonic
> payload:a delay:140 expected:0 (us)
>
> SO_TXTIME ipv6 clock monotonic
> payload:a delay:12739 expected:10000 (us)
>
> SO_TXTIME ipv4 clock monotonic
> payload:a delay:10054 expected:10000 (us)
> payload:b delay:20043 expected:20000 (us)
>
> SO_TXTIME ipv6 clock monotonic
> payload:b delay:20078 expected:20000 (us)
> payload:a delay:20177 expected:20000 (us)
>
> SO_TXTIME ipv4 clock tai
> send: pkt a at -1707955482913ms dropped: invalid txtime
> [ 287.070504] now is set to 1707955482913404839
> [ 287.070509] tx time from SKB is 0
> ./so_txtime: recv: timeout: Resource temporarily unavailable
>
> SO_TXTIME ipv6 clock tai
> send: pkt a at 0ms dropped: invalid txtime
> [ 287.070510] q->last time is 0
> [ 287.420590] now is set to 1707955483263491298
> [ 287.420596] tx time from SKB is 1707955483263454527
> ./so_txtime: recv: timeout: Resource temporarily unavailable
>
> SO_TXTIME ipv6 clock tai
> [ 287.420597] q->last time is 0
> [ 287.700598] now is set to 1707955483543498954
> [ 287.700604] tx time from SKB is 1707955483553463173
> payload:a delay:9655 expected:10000 (us)
>
> SO_TXTIME ipv4 clock tai
> [ 287.700605] q->last time is 0
> [ 288.100532] now is set to 1707955483943432391
> [ 288.100537] tx time from SKB is 1707955483953413016
> payload:a delay:9668 expected:10000 (us)[ 288.100538] q->last time is 1707955483553463173
>
> [ 288.100546] now is set to 1707955483943446975
> [ 288.100547] tx time from SKB is 1707955483963413016
> payload:b delay:20484 expected:20000 (us)
>
> SO_TXTIME ipv6 clock tai
> [ 288.100547] q->last time is 1707955483553463173
> [ 288.440582] now is set to 1707955484283482495
> [ 288.440587] tx time from SKB is 1707955484303452808
> payload:b delay:9648 expected:10000 (us)[ 288.440588] q->last time is 1707955483963413016
>
> [ 288.440598] now is set to 1707955484283499370
> payload:a delay:22037 expected:20000 (us)
> [ 288.440599] tx time from SKB is 1707955484293452808
> OK. All tests passed
>
>
> Test case 2 (MQPRIO + ETF HW offload)
>
> [root@ecbldauto-lvarm04-lnx ~]# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 4 \
>   map 0 2 1 3 3 2 2 2 2 2 2 2 2 2 2 2 \
>   queues 1@0 1@1 1@2 1@3\
>   hw 0
> [root@ecbldauto-lvarm04-lnx ~]#
> tc qdisc replace dev eth0 parent 100:4 etf \
>   clockid CLOCK_TAI delta 40000 offload skip_sock_check
> [ 89.145838] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue test log 3, number of queues 4, qopt enable 1, tbs queue bit 1
> [ 89.145846] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue 3
>
>
> [root@ecbldauto-lvarm04-lnx ~]# ./a.out -4 -c tai -S 192.168.1.1 -D 192.168.1.2 a,1,b,2
>
> SO_TXTIME ipv4 clock tai
>
> glob_tstat = 1707955395256170394
> [ 199.623650] now is set to 1707955395256215810
> [ 199.623655] tx time from SKB is 1707955395257170394
> [ 199.623656] q->last time is 0
> [ 199.623663] now is set to 1707955395256230029
> [ 199.623664] tx time from SKB is 1707955395258170394
> [ 199.623665] q->last time is 0
> [ 199.624589] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 257170394 nsec
> [ 199.625573] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 258170394 nsec
>
> Changes since v1
> - Changed the commit subject as i am modifying the mono_delivery_time
>   bit with clockid_delivery_time.
> - Took care of suggestion mentioned by Willem to use the same bit for
>   userspace delivery time as there are no conflicts between TCP and
>   SCM_TXTIME, because explicit cmsg makes no sense for TCP and only
>   RAW and DGRAM sockets interprets it.
> - Clear explaination of why this is needed mentioned below and this
>   is extending the work done by Martin for mono_delivery_time
>   https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/
> - Version 1 patch can be referenced with below link which states
>   the exact problem with tc-etf and discussions which took place
>   https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/
>
>  include/linux/skbuff.h | 4 ++++
>  net/ipv4/ip_output.c   | 7 +++++++
>  net/ipv4/raw.c         | 7 +++++++
>  net/ipv6/ip6_output.c  | 8 +++++++-
>  net/ipv6/raw.c         | 8 +++++++-
>  net/packet/af_packet.c | 8 +++++++-
>  6 files changed, 39 insertions(+), 3 deletions(-)
>
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 2dde34c29203..58586d56b19f 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -820,6 +820,10 @@ typedef unsigned char *sk_buff_data_t;
>   *		delivery_time in mono clock base (i.e. EDT). Otherwise, the
>   *		skb->tstamp has the (rcv) timestamp at ingress and
>   *		delivery_time at egress.
> + *		This bit is also set for tstamp coming from userspace which
> + *		acts as an information in the bridge forwarding path to avoid
> + *		resetting the tstamp value when user sets the timestamp using
> + *		SO_TXTIME sockopts.

There are multiple applications of this information aside from
bridging. I'd drop that and instead rewrite the existing. Something
like

"delivery_time in mono clock base (i.e., EDT) or a clock base chosen
by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at
ingress."

>   *	@napi_id: id of the NAPI struct this skb came from
>   *	@sender_cpu: (aka @napi_id) source CPU in XPS
>   *	@alloc_cpu: CPU which did the skb allocation.
> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
> index 5b5a0adb927f..4ae6aea8f8d6 100644
> --- a/net/ipv4/ip_output.c
> +++ b/net/ipv4/ip_output.c
> @@ -1455,6 +1455,13 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>  	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>  	skb->mark = cork->mark;
>  	skb->tstamp = cork->transmit_time;
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of cork. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path.We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;

This patch adds too much verbose commentary, repeated multiple times,
for such a small change. Keep only the comment in skbuff.h.

>  	/*
>  	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
>  	 * on dst refcount
> diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
> index aea89326c697..6e67c0203be8 100644
> --- a/net/ipv4/raw.c
> +++ b/net/ipv4/raw.c
> @@ -353,6 +353,13 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
>  	skb->priority = READ_ONCE(sk->sk_priority);
>  	skb->mark = sockc->mark;
>  	skb->tstamp = sockc->transmit_time;
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path. We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;
>  	skb_dst_set(skb, &rt->dst);
>  	*rtp = NULL;
>
> diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
> index a722a43dd668..f5b5e13a920f 100644
> --- a/net/ipv6/ip6_output.c
> +++ b/net/ipv6/ip6_output.c
> @@ -1922,7 +1922,13 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
>  	skb->priority = READ_ONCE(sk->sk_priority);
>  	skb->mark = cork->base.mark;
>  	skb->tstamp = cork->base.transmit_time;
> -
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of cork. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path. We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;
>  	ip6_cork_steal_dst(skb, cork);
>  	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
>  	if (proto == IPPROTO_ICMPV6) {
> diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
> index 03dbb874c363..d2e2a1ec3de4 100644
> --- a/net/ipv6/raw.c
> +++ b/net/ipv6/raw.c
> @@ -616,7 +616,13 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
>  	skb->priority = READ_ONCE(sk->sk_priority);
>  	skb->mark = sockc->mark;
>  	skb->tstamp = sockc->transmit_time;
> -
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path.We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;
>  	skb_put(skb, length);
>  	skb_reset_network_header(skb);
>  	iph = ipv6_hdr(skb);
> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
> index c9bbc2686690..949e936b5786 100644
> --- a/net/packet/af_packet.c
> +++ b/net/packet/af_packet.c
> @@ -2057,7 +2057,13 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
>  	skb->priority = READ_ONCE(sk->sk_priority);
>  	skb->mark = READ_ONCE(sk->sk_mark);
>  	skb->tstamp = sockc.transmit_time;
> -
> +	/* Timestamp coming from userspace using CMSG is stored as part
> +	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
> +	 * drop the tstamp in the forwarding path. We are reusing bit
> +	 * mono_delivery_time to avoid reset of tstamp in bridge
> +	 * forwarding path.
> +	 */
> +	skb->mono_delivery_time = !!skb->tstamp;

Search for all occurrences of skb->tstamp getting initialized from
sockc.transmit_time. af_packet.c has three such cases.

>  	skb_setup_tx_timestamp(skb, sockc.tsflags);
>
>  	if (unlikely(extra_len == 4))
> --
> 2.25.1
>
On 3/1/2024 10:45 AM, Willem de Bruijn wrote:
> Abhishek Chauhan wrote:
>> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
>> index 2dde34c29203..58586d56b19f 100644
>> --- a/include/linux/skbuff.h
>> +++ b/include/linux/skbuff.h
>> @@ -820,6 +820,10 @@ typedef unsigned char *sk_buff_data_t;
>>   *		delivery_time in mono clock base (i.e. EDT). Otherwise, the
>>   *		skb->tstamp has the (rcv) timestamp at ingress and
>>   *		delivery_time at egress.
>> + *		This bit is also set for tstamp coming from userspace which
>> + *		acts as an information in the bridge forwarding path to avoid
>> + *		resetting the tstamp value when user sets the timestamp using
>> + *		SO_TXTIME sockopts.
>
> There are multiple applications of this information aside from
> bridging. I'd drop that and instead rewrite the existing. Something
> like
>
> "delivery_time in mono clock base (i.e., EDT) or a clock base chosen
> by SO_TXTIME. If zero, skb->tstamp has the (rcv) timestamp at
> ingress."
>
Will make the changes accordingly.

>> diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
>> index 5b5a0adb927f..4ae6aea8f8d6 100644
>> --- a/net/ipv4/ip_output.c
>> +++ b/net/ipv4/ip_output.c
>> @@ -1455,6 +1455,13 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
>>  	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
>>  	skb->mark = cork->mark;
>>  	skb->tstamp = cork->transmit_time;
>> +	/* Timestamp coming from userspace using CMSG is stored as part
>> +	 * of transmit_time as part of cork. To ensure bridge does not
>> +	 * drop the tstamp in the forwarding path.We are reusing bit
>> +	 * mono_delivery_time to avoid reset of tstamp in bridge
>> +	 * forwarding path.
>> +	 */
>> +	skb->mono_delivery_time = !!skb->tstamp;
>
> This patch adds too much verbose commentary, repeated multiple times,
> for such a small change. Keep only the comment in skbuff.h.
>
Got it. I was thinking of the same. I will make the change.

>> diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
>> index c9bbc2686690..949e936b5786 100644
>> --- a/net/packet/af_packet.c
>> +++ b/net/packet/af_packet.c
>> @@ -2057,7 +2057,13 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
>>  	skb->priority = READ_ONCE(sk->sk_priority);
>>  	skb->mark = READ_ONCE(sk->sk_mark);
>>  	skb->tstamp = sockc.transmit_time;
>> -
>> +	/* Timestamp coming from userspace using CMSG is stored as part
>> +	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
>> +	 * drop the tstamp in the forwarding path. We are reusing bit
>> +	 * mono_delivery_time to avoid reset of tstamp in bridge
>> +	 * forwarding path.
>> +	 */
>> +	skb->mono_delivery_time = !!skb->tstamp;
>
> Search for all occurrences of skb->tstamp getting initialized from
> sockc.transmit_time. af_packet.c has three such cases.
>
Let me check and add at every instance.
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 2dde34c29203..58586d56b19f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -820,6 +820,10 @@ typedef unsigned char *sk_buff_data_t;
  *		delivery_time in mono clock base (i.e. EDT). Otherwise, the
  *		skb->tstamp has the (rcv) timestamp at ingress and
  *		delivery_time at egress.
+ *		This bit is also set for tstamp coming from userspace which
+ *		acts as an information in the bridge forwarding path to avoid
+ *		resetting the tstamp value when user sets the timestamp using
+ *		SO_TXTIME sockopts.
  *	@napi_id: id of the NAPI struct this skb came from
  *	@sender_cpu: (aka @napi_id) source CPU in XPS
  *	@alloc_cpu: CPU which did the skb allocation.
diff --git a/net/ipv4/ip_output.c b/net/ipv4/ip_output.c
index 5b5a0adb927f..4ae6aea8f8d6 100644
--- a/net/ipv4/ip_output.c
+++ b/net/ipv4/ip_output.c
@@ -1455,6 +1455,13 @@ struct sk_buff *__ip_make_skb(struct sock *sk,
 	skb->priority = (cork->tos != -1) ? cork->priority: READ_ONCE(sk->sk_priority);
 	skb->mark = cork->mark;
 	skb->tstamp = cork->transmit_time;
+	/* Timestamp coming from userspace using CMSG is stored as part
+	 * of transmit_time as part of cork. To ensure bridge does not
+	 * drop the tstamp in the forwarding path.We are reusing bit
+	 * mono_delivery_time to avoid reset of tstamp in bridge
+	 * forwarding path.
+	 */
+	skb->mono_delivery_time = !!skb->tstamp;
 	/*
 	 * Steal rt from cork.dst to avoid a pair of atomic_inc/atomic_dec
 	 * on dst refcount
diff --git a/net/ipv4/raw.c b/net/ipv4/raw.c
index aea89326c697..6e67c0203be8 100644
--- a/net/ipv4/raw.c
+++ b/net/ipv4/raw.c
@@ -353,6 +353,13 @@ static int raw_send_hdrinc(struct sock *sk, struct flowi4 *fl4,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
 	skb->tstamp = sockc->transmit_time;
+	/* Timestamp coming from userspace using CMSG is stored as part
+	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
+	 * drop the tstamp in the forwarding path. We are reusing bit
+	 * mono_delivery_time to avoid reset of tstamp in bridge
+	 * forwarding path.
+	 */
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_dst_set(skb, &rt->dst);
 	*rtp = NULL;
 
diff --git a/net/ipv6/ip6_output.c b/net/ipv6/ip6_output.c
index a722a43dd668..f5b5e13a920f 100644
--- a/net/ipv6/ip6_output.c
+++ b/net/ipv6/ip6_output.c
@@ -1922,7 +1922,13 @@ struct sk_buff *__ip6_make_skb(struct sock *sk,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = cork->base.mark;
 	skb->tstamp = cork->base.transmit_time;
-
+	/* Timestamp coming from userspace using CMSG is stored as part
+	 * of transmit_time as part of cork. To ensure bridge does not
+	 * drop the tstamp in the forwarding path. We are reusing bit
+	 * mono_delivery_time to avoid reset of tstamp in bridge
+	 * forwarding path.
+	 */
+	skb->mono_delivery_time = !!skb->tstamp;
 	ip6_cork_steal_dst(skb, cork);
 	IP6_INC_STATS(net, rt->rt6i_idev, IPSTATS_MIB_OUTREQUESTS);
 	if (proto == IPPROTO_ICMPV6) {
diff --git a/net/ipv6/raw.c b/net/ipv6/raw.c
index 03dbb874c363..d2e2a1ec3de4 100644
--- a/net/ipv6/raw.c
+++ b/net/ipv6/raw.c
@@ -616,7 +616,13 @@ static int rawv6_send_hdrinc(struct sock *sk, struct msghdr *msg, int length,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = sockc->mark;
 	skb->tstamp = sockc->transmit_time;
-
+	/* Timestamp coming from userspace using CMSG is stored as part
+	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
+	 * drop the tstamp in the forwarding path.We are reusing bit
+	 * mono_delivery_time to avoid reset of tstamp in bridge
+	 * forwarding path.
+	 */
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_put(skb, length);
 	skb_reset_network_header(skb);
 	iph = ipv6_hdr(skb);
diff --git a/net/packet/af_packet.c b/net/packet/af_packet.c
index c9bbc2686690..949e936b5786 100644
--- a/net/packet/af_packet.c
+++ b/net/packet/af_packet.c
@@ -2057,7 +2057,13 @@ static int packet_sendmsg_spkt(struct socket *sock, struct msghdr *msg,
 	skb->priority = READ_ONCE(sk->sk_priority);
 	skb->mark = READ_ONCE(sk->sk_mark);
 	skb->tstamp = sockc.transmit_time;
-
+	/* Timestamp coming from userspace using CMSG is stored as part
+	 * of transmit_time as part of sockcmcookie. To ensure bridge does not
+	 * drop the tstamp in the forwarding path. We are reusing bit
+	 * mono_delivery_time to avoid reset of tstamp in bridge
+	 * forwarding path.
+	 */
+	skb->mono_delivery_time = !!skb->tstamp;
 	skb_setup_tx_timestamp(skb, sockc.tsflags);
 
 	if (unlikely(extra_len == 4))
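The effect of the one-line change repeated in each hunk can be illustrated with a small userspace model. This is only a sketch: `struct fake_skb`, `fake_set_txtime()` and `fake_forward()` are hypothetical names, not kernel API, and the forwarding rule is an assumption taken from the commit message (the forwarding path leaves the timestamp alone only when mono_delivery_time is set).

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the two sk_buff fields the patch touches. */
struct fake_skb {
	int64_t tstamp;           /* delivery time in ns, 0 when unset */
	bool mono_delivery_time;  /* "tstamp is a delivery time, keep it" */
};

/* Mirrors the assignment pair each hunk ends up with: copy the
 * userspace transmit time, then set the bit only if it is non-zero.
 */
static void fake_set_txtime(struct fake_skb *skb, int64_t transmit_time)
{
	assert(skb != NULL);
	skb->tstamp = transmit_time;
	skb->mono_delivery_time = !!skb->tstamp;
}

/* Assumed forwarding behaviour: reset tstamp unless the bit is set. */
static void fake_forward(struct fake_skb *skb)
{
	assert(skb != NULL);
	if (skb->mono_delivery_time)
		return;
	skb->tstamp = 0;
}
```

In this model a packet whose transmit time came from userspace keeps its tstamp across fake_forward(), while a packet that never had the bit set arrives with tstamp 0, which is the value the ETF logs show for the dropped packets.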
The bridge driver today has no support for forwarding packets that carry
a userspace timestamp and ends up resetting the timestamp. The ETF qdisc
then checks the packet coming from userspace, finds the timestamp to be 0,
and drops the time-sensitive packet. These changes allow packets with
userspace timestamps to be forwarded from the bridge to NIC drivers.

Set the same bit (mono_delivery_time) to avoid dropping userspace tstamp
packets in the forwarding path.

The existing functionality of mono_delivery_time remains unaltered; it is
just extended with userspace tstamp support for the bridge forwarding
path.

Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com>
---
Changes since v2
- Updated the commit subject and message.
- Addressed comments from Willem to re-use mono_delivery_time, with
  comments and documentation in the header and source files.
- Addressed the comment from Andrew on the typo in the comment.
- Ran the existing selftest (so_txtime.sh) to make sure the existing
  implementation is not impacted, as stated by Paolo.
- Internal validation of UDP packets using iperf/so_priority/so_txtime
  with MQPRIO + ETF offload was executed as well.
- Test cases are included below.

Test 1 :- FQ + ETF (SW path)

[root@ecbldauto-lvarm04-lnx ~]# ./so_txtime.sh
[ 280.640551] q->last time is 1707955476143297550
[ 283.338947] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready
[ 284.078429] IPv6: ADDRCONF(NETDEV_CHANGE): veth0: link becomes ready

SO_TXTIME ipv4 clock monotonic
payload:a delay:109 expected:0 (us)

SO_TXTIME ipv6 clock monotonic
payload:a delay:140 expected:0 (us)

SO_TXTIME ipv6 clock monotonic
payload:a delay:12739 expected:10000 (us)

SO_TXTIME ipv4 clock monotonic
payload:a delay:10054 expected:10000 (us)
payload:b delay:20043 expected:20000 (us)

SO_TXTIME ipv6 clock monotonic
payload:b delay:20078 expected:20000 (us)
payload:a delay:20177 expected:20000 (us)

SO_TXTIME ipv4 clock tai
send: pkt a at -1707955482913ms dropped: invalid txtime
[ 287.070504] now is set to 1707955482913404839
[ 287.070509] tx time from SKB is 0
./so_txtime: recv: timeout: Resource temporarily unavailable

SO_TXTIME ipv6 clock tai
send: pkt a at 0ms dropped: invalid txtime
[ 287.070510] q->last time is 0
[ 287.420590] now is set to 1707955483263491298
[ 287.420596] tx time from SKB is 1707955483263454527
./so_txtime: recv: timeout: Resource temporarily unavailable

SO_TXTIME ipv6 clock tai
[ 287.420597] q->last time is 0
[ 287.700598] now is set to 1707955483543498954
[ 287.700604] tx time from SKB is 1707955483553463173
payload:a delay:9655 expected:10000 (us)

SO_TXTIME ipv4 clock tai
[ 287.700605] q->last time is 0
[ 288.100532] now is set to 1707955483943432391
[ 288.100537] tx time from SKB is 1707955483953413016
payload:a delay:9668 expected:10000 (us)[ 288.100538] q->last time is 1707955483553463173

[ 288.100546] now is set to 1707955483943446975
[ 288.100547] tx time from SKB is 1707955483963413016
payload:b delay:20484 expected:20000 (us)

SO_TXTIME ipv6 clock tai
[ 288.100547] q->last time is 1707955483553463173
[ 288.440582] now is set to 1707955484283482495
[ 288.440587] tx time from SKB is 1707955484303452808
payload:b delay:9648 expected:10000 (us)[ 288.440588] q->last time is 1707955483963413016

[ 288.440598] now is set to 1707955484283499370
payload:a delay:22037 expected:20000 (us)
[ 288.440599] tx time from SKB is 1707955484293452808
OK. All tests passed


Test case 2 (MQPRIO + ETF HW offload)

[root@ecbldauto-lvarm04-lnx ~]# tc qdisc add dev eth0 handle 100: parent root mqprio num_tc 4 \
  map 0 2 1 3 3 2 2 2 2 2 2 2 2 2 2 2 \
  queues 1@0 1@1 1@2 1@3\
  hw 0
[root@ecbldauto-lvarm04-lnx ~]#
tc qdisc replace dev eth0 parent 100:4 etf \
  clockid CLOCK_TAI delta 40000 offload skip_sock_check
[ 89.145838] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue test log 3, number of queues 4, qopt enable 1, tbs queue bit 1
[ 89.145846] qcom-ethqos 23040000.ethernet eth0: enabled ETF for Queue 3


[root@ecbldauto-lvarm04-lnx ~]# ./a.out -4 -c tai -S 192.168.1.1 -D 192.168.1.2 a,1,b,2

SO_TXTIME ipv4 clock tai

glob_tstat = 1707955395256170394
[ 199.623650] now is set to 1707955395256215810
[ 199.623655] tx time from SKB is 1707955395257170394
[ 199.623656] q->last time is 0
[ 199.623663] now is set to 1707955395256230029
[ 199.623664] tx time from SKB is 1707955395258170394
[ 199.623665] q->last time is 0
[ 199.624589] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 257170394 nsec
[ 199.625573] qcom-ethqos 23040000.ethernet eth0: emac ethqos tx_xmit : lauching tbs packet at 1707955395 sec and 258170394 nsec

Changes since v1
- Changed the commit subject as I am modifying the mono_delivery_time
  bit with clockid_delivery_time.
- Took Willem's suggestion to use the same bit for userspace delivery
  time, as there are no conflicts between TCP and SCM_TXTIME: an
  explicit cmsg makes no sense for TCP, and only RAW and DGRAM sockets
  interpret it.
- A clear explanation of why this is needed is mentioned below; this
  extends the work done by Martin for mono_delivery_time
  https://patchwork.kernel.org/project/netdevbpf/patch/20220302195525.3480280-1-kafai@fb.com/
- The version 1 patch can be found at the link below, which states
  the exact problem with tc-etf and the discussions which took place
  https://lore.kernel.org/all/20240215215632.2899370-1-quic_abchauha@quicinc.com/

 include/linux/skbuff.h | 4 ++++
 net/ipv4/ip_output.c   | 7 +++++++
 net/ipv4/raw.c         | 7 +++++++
 net/ipv6/ip6_output.c  | 8 +++++++-
 net/ipv6/raw.c         | 8 +++++++-
 net/packet/af_packet.c | 8 +++++++-
 6 files changed, 39 insertions(+), 3 deletions(-)
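For readers unfamiliar with how the "userspace tstamp" in the commit message reaches sockc.transmit_time, the following is a minimal sketch of the sending side: it attaches one SCM_TXTIME control message carrying a 64-bit transmit time in nanoseconds to a struct msghdr. The helper name `attach_txtime()` is hypothetical, and the fallback value 61 for SO_TXTIME is the asm-generic one, assumed here only for older headers that do not define it.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>
#include <sys/socket.h>

#ifndef SO_TXTIME
#define SO_TXTIME 61          /* asm-generic value; assumption for old headers */
#endif
#ifndef SCM_TXTIME
#define SCM_TXTIME SO_TXTIME
#endif

/* Hypothetical helper: attach one SCM_TXTIME cmsg carrying txtime_ns.
 * ctrl must be suitably aligned for struct cmsghdr.
 * Returns 0 on success, -1 if the control buffer is too small.
 */
static int attach_txtime(struct msghdr *msg, void *ctrl, size_t ctrl_len,
			 uint64_t txtime_ns)
{
	struct cmsghdr *cm;

	if (ctrl_len < CMSG_SPACE(sizeof(txtime_ns)))
		return -1;

	memset(ctrl, 0, ctrl_len);
	msg->msg_control = ctrl;
	msg->msg_controllen = CMSG_SPACE(sizeof(txtime_ns));

	cm = CMSG_FIRSTHDR(msg);
	assert(cm != NULL);
	cm->cmsg_level = SOL_SOCKET;
	cm->cmsg_type = SCM_TXTIME;
	cm->cmsg_len = CMSG_LEN(sizeof(txtime_ns));
	memcpy(CMSG_DATA(cm), &txtime_ns, sizeof(txtime_ns));
	return 0;
}
```

Before sendmsg() is called with such a message, the socket is expected to have SO_TXTIME enabled through setsockopt() with a struct sock_txtime (from linux/net_tstamp.h) naming the clock, for example CLOCK_TAI as in the ETF test above.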