Message ID | 20240424180458.56211-5-nbd@nbd.name (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Add TCP fraglist GRO support | expand |
On Wed, Apr 24, 2024 at 8:05 PM Felix Fietkau <nbd@nbd.name> wrote: > > When forwarding TCP after GRO, software segmentation is very expensive, > especially when the checksum needs to be recalculated. > One case where that's currently unavoidable is when routing packets over > PPPoE. Performance improves significantly when using fraglist GRO > implemented in the same way as for UDP. > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established > socket in the same netns as the receiving device. While this may not > cover all relevant use cases in multi-netns configurations, it should be > good enough for most configurations that need this. > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622 > device (2-core Cortex-A53), which runs NAT with flow offload enabled from > one ethernet port to PPPoE on another ethernet port + cake qdisc set to > 1Gbps. > > rx-gro-list off: 630 Mbit/s, CPU 35% idle > rx-gro-list on: 770 Mbit/s, CPU 40% idle > > Signe-off-by: Felix Fietkau <nbd@nbd.name> > --- > net/ipv4/tcp_offload.c | 45 ++++++++++++++++++++++++++++++++++++++- > net/ipv6/tcpv6_offload.c | 46 +++++++++++++++++++++++++++++++++++++++- > 2 files changed, 89 insertions(+), 2 deletions(-) > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c > index 6294e7a5c099..f987e2d8423a 100644 > --- a/net/ipv4/tcp_offload.c > +++ b/net/ipv4/tcp_offload.c > @@ -404,6 +404,49 @@ void tcp_gro_complete(struct sk_buff *skb) > } > EXPORT_SYMBOL(tcp_gro_complete); > > +static bool tcp4_check_fraglist_gro(struct sk_buff *skb) > +{ > + const struct iphdr *iph = skb_gro_network_header(skb); > + struct net *net = dev_net(skb->dev); > + unsigned int off, hlen, thlen; > + struct tcphdr *th; > + struct sock *sk; > + int iif, sdif; > + > + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) > + return false; > + > + inet_get_iif_sdif(skb, &iif, &sdif); > + > + off = skb_gro_offset(skb); > + hlen = off + sizeof(*th); > + th = skb_gro_header(skb, hlen, off); > + if (unlikely(!th)) > + return false; > + > + thlen = th->doff * 4; > + if (thlen < sizeof(*th)) > + return false; > + > + hlen = off + thlen; > + if (!skb_gro_may_pull(skb, hlen)) { > + th = skb_gro_header_slow(skb, hlen, off); > + if (unlikely(!th)) > + return false; > + } > + > + sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, > + iph->saddr, th->source, > + iph->daddr, ntohs(th->dest), > + iif, sdif); Presumably all this could be done only for the first skb/segment of a GRO train. We could store the fraglist in a single bit in NAPI_GRO_CB(skb) ? GRO does a full tuple evaluation, we can trust it.
On 24.04.24 20:23, Eric Dumazet wrote: > On Wed, Apr 24, 2024 at 8:05 PM Felix Fietkau <nbd@nbd.name> wrote: >> >> When forwarding TCP after GRO, software segmentation is very expensive, >> especially when the checksum needs to be recalculated. >> One case where that's currently unavoidable is when routing packets over >> PPPoE. Performance improves significantly when using fraglist GRO >> implemented in the same way as for UDP. >> >> When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established >> socket in the same netns as the receiving device. While this may not >> cover all relevant use cases in multi-netns configurations, it should be >> good enough for most configurations that need this. >> >> Here's a measurement of running 2 TCP streams through a MediaTek MT7622 >> device (2-core Cortex-A53), which runs NAT with flow offload enabled from >> one ethernet port to PPPoE on another ethernet port + cake qdisc set to >> 1Gbps. >> >> rx-gro-list off: 630 Mbit/s, CPU 35% idle >> rx-gro-list on: 770 Mbit/s, CPU 40% idle >> >> Signe-off-by: Felix Fietkau <nbd@nbd.name> >> --- >> net/ipv4/tcp_offload.c | 45 ++++++++++++++++++++++++++++++++++++++- >> net/ipv6/tcpv6_offload.c | 46 +++++++++++++++++++++++++++++++++++++++- >> 2 files changed, 89 insertions(+), 2 deletions(-) >> >> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c >> index 6294e7a5c099..f987e2d8423a 100644 >> --- a/net/ipv4/tcp_offload.c >> +++ b/net/ipv4/tcp_offload.c >> @@ -404,6 +404,49 @@ void tcp_gro_complete(struct sk_buff *skb) >> } >> EXPORT_SYMBOL(tcp_gro_complete); >> >> +static bool tcp4_check_fraglist_gro(struct sk_buff *skb) >> +{ >> + const struct iphdr *iph = skb_gro_network_header(skb); >> + struct net *net = dev_net(skb->dev); >> + unsigned int off, hlen, thlen; >> + struct tcphdr *th; >> + struct sock *sk; >> + int iif, sdif; >> + >> + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) >> + return false; >> + >> + inet_get_iif_sdif(skb, &iif, &sdif); >> + >> + off = skb_gro_offset(skb); >> + hlen = off + sizeof(*th); >> + th = skb_gro_header(skb, hlen, off); >> + if (unlikely(!th)) >> + return false; >> + >> + thlen = th->doff * 4; >> + if (thlen < sizeof(*th)) >> + return false; >> + >> + hlen = off + thlen; >> + if (!skb_gro_may_pull(skb, hlen)) { >> + th = skb_gro_header_slow(skb, hlen, off); >> + if (unlikely(!th)) >> + return false; >> + } >> + >> + sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, >> + iph->saddr, th->source, >> + iph->daddr, ntohs(th->dest), >> + iif, sdif); > > Presumably all this could be done only for the first skb/segment of a GRO train. > > We could store the fraglist in a single bit in NAPI_GRO_CB(skb) ? > > GRO does a full tuple evaluation, we can trust it. I will look into that, thanks. - Felix
Hi Felix, kernel test robot noticed the following build warnings: [auto build test WARNING on net-next/main] url: https://github.com/intel-lab-lkp/linux/commits/Felix-Fietkau/net-move-skb_gro_receive_list-from-udp-to-core/20240425-060838 base: net-next/main patch link: https://lore.kernel.org/r/20240424180458.56211-5-nbd%40nbd.name patch subject: [PATCH net-next 4/4] net: add heuristic for enabling TCP fraglist GRO config: openrisc-defconfig (https://download.01.org/0day-ci/archive/20240425/202404251517.ZUALo8e5-lkp@intel.com/config) compiler: or1k-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240425/202404251517.ZUALo8e5-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202404251517.ZUALo8e5-lkp@intel.com/ All warnings (new ones prefixed by >>): net/ipv6/tcpv6_offload.c: In function 'tcp6_check_fraglist_gro': net/ipv6/tcpv6_offload.c:48:14: error: implicit declaration of function '__inet6_lookup_established'; did you mean '__inet_lookup_established'? [-Werror=implicit-function-declaration] 48 | sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, | ^~~~~~~~~~~~~~~~~~~~~~~~~~ | __inet_lookup_established >> net/ipv6/tcpv6_offload.c:48:12: warning: assignment to 'struct sock *' from 'int' makes pointer from integer without a cast [-Wint-conversion] 48 | sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, | ^ cc1: some warnings being treated as errors vim +48 net/ipv6/tcpv6_offload.c 16 17 static bool tcp6_check_fraglist_gro(struct sk_buff *skb) 18 { 19 const struct ipv6hdr *hdr = skb_gro_network_header(skb); 20 struct net *net = dev_net(skb->dev); 21 unsigned int off, hlen, thlen; 22 struct tcphdr *th; 23 struct sock *sk; 24 int iif, sdif; 25 26 if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) 27 return false; 28 29 inet6_get_iif_sdif(skb, &iif, &sdif); 30 31 off = skb_gro_offset(skb); 32 hlen = off + sizeof(*th); 33 th = skb_gro_header(skb, hlen, off); 34 if (unlikely(!th)) 35 return false; 36 37 thlen = th->doff * 4; 38 if (thlen < sizeof(*th)) 39 return false; 40 41 hlen = off + thlen; 42 if (!skb_gro_may_pull(skb, hlen)) { 43 th = skb_gro_header_slow(skb, hlen, off); 44 if (unlikely(!th)) 45 return false; 46 } 47 > 48 sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, 49 &hdr->saddr, th->source, 50 &hdr->daddr, ntohs(th->dest), 51 iif, sdif); 52 if (!sk) 53 return true; 54 55 sock_put(sk); 56 57 return false; 58 } 59
Hi Felix, kernel test robot noticed the following build errors: [auto build test ERROR on net-next/main] url: https://github.com/intel-lab-lkp/linux/commits/Felix-Fietkau/net-move-skb_gro_receive_list-from-udp-to-core/20240425-060838 base: net-next/main patch link: https://lore.kernel.org/r/20240424180458.56211-5-nbd%40nbd.name patch subject: [PATCH net-next 4/4] net: add heuristic for enabling TCP fraglist GRO config: um-x86_64_defconfig (https://download.01.org/0day-ci/archive/20240425/202404251744.Tq24y05K-lkp@intel.com/config) compiler: clang version 15.0.7 (https://github.com/llvm/llvm-project.git 8dfdcc7b7bf66834a761bd8de445840ef68e4d1a) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240425/202404251744.Tq24y05K-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202404251744.Tq24y05K-lkp@intel.com/ All errors (new ones prefixed by >>): In file included from net/ipv6/tcpv6_offload.c:9: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from arch/um/include/asm/hardirq.h:5: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/um/include/asm/io.h:24: include/asm-generic/io.h:547:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __raw_readb(PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:560:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr)); ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:37:51: note: expanded from macro '__le16_to_cpu' #define __le16_to_cpu(x) ((__force __u16)(__le16)(x)) ^ In file included from net/ipv6/tcpv6_offload.c:9: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from arch/um/include/asm/hardirq.h:5: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/um/include/asm/io.h:24: include/asm-generic/io.h:573:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr)); ~~~~~~~~~~ ^ include/uapi/linux/byteorder/little_endian.h:35:51: note: expanded from macro '__le32_to_cpu' #define __le32_to_cpu(x) ((__force __u32)(__le32)(x)) ^ In file included from net/ipv6/tcpv6_offload.c:9: In file included from include/linux/skbuff.h:17: In file included from include/linux/bvec.h:10: In file included from include/linux/highmem.h:12: In file included from include/linux/hardirq.h:11: In file included from arch/um/include/asm/hardirq.h:5: In file included from include/asm-generic/hardirq.h:17: In file included from include/linux/irq.h:20: In file included from include/linux/io.h:13: In file included from arch/um/include/asm/io.h:24: include/asm-generic/io.h:584:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writeb(value, PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:594:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:604:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr); ~~~~~~~~~~ ^ include/asm-generic/io.h:692:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] readsb(PCI_IOBASE + addr, buffer, count); ~~~~~~~~~~ ^ include/asm-generic/io.h:700:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] readsw(PCI_IOBASE + addr, buffer, count); ~~~~~~~~~~ ^ include/asm-generic/io.h:708:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] readsl(PCI_IOBASE + addr, buffer, count); ~~~~~~~~~~ ^ include/asm-generic/io.h:717:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] writesb(PCI_IOBASE + addr, buffer, count); ~~~~~~~~~~ ^ include/asm-generic/io.h:726:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] writesw(PCI_IOBASE + addr, buffer, count); ~~~~~~~~~~ ^ include/asm-generic/io.h:735:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic] writesl(PCI_IOBASE + addr, buffer, count); ~~~~~~~~~~ ^ >> net/ipv6/tcpv6_offload.c:48:7: error: call to undeclared function '__inet6_lookup_established'; ISO C99 and later do not support implicit function declarations [-Werror,-Wimplicit-function-declaration] sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, ^ net/ipv6/tcpv6_offload.c:48:7: note: did you mean '__inet_lookup_established'? include/net/inet_hashtables.h:371:14: note: '__inet_lookup_established' declared here struct sock *__inet_lookup_established(struct net *net, ^ net/ipv6/tcpv6_offload.c:48:5: error: incompatible integer to pointer conversion assigning to 'struct sock *' from 'int' [-Wint-conversion] sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, ^ ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~ 12 warnings and 2 errors generated. vim +/__inet6_lookup_established +48 net/ipv6/tcpv6_offload.c 16 17 static bool tcp6_check_fraglist_gro(struct sk_buff *skb) 18 { 19 const struct ipv6hdr *hdr = skb_gro_network_header(skb); 20 struct net *net = dev_net(skb->dev); 21 unsigned int off, hlen, thlen; 22 struct tcphdr *th; 23 struct sock *sk; 24 int iif, sdif; 25 26 if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) 27 return false; 28 29 inet6_get_iif_sdif(skb, &iif, &sdif); 30 31 off = skb_gro_offset(skb); 32 hlen = off + sizeof(*th); 33 th = skb_gro_header(skb, hlen, off); 34 if (unlikely(!th)) 35 return false; 36 37 thlen = th->doff * 4; 38 if (thlen < sizeof(*th)) 39 return false; 40 41 hlen = off + thlen; 42 if (!skb_gro_may_pull(skb, hlen)) { 43 th = skb_gro_header_slow(skb, hlen, off); 44 if (unlikely(!th)) 45 return false; 46 } 47 > 48 sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, 49 &hdr->saddr, th->source, 50 &hdr->daddr, ntohs(th->dest), 51 iif, sdif); 52 if (!sk) 53 return true; 54 55 sock_put(sk); 56 57 return false; 58 } 59
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 6294e7a5c099..f987e2d8423a 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -404,6 +404,49 @@ void tcp_gro_complete(struct sk_buff *skb) } EXPORT_SYMBOL(tcp_gro_complete); +static bool tcp4_check_fraglist_gro(struct sk_buff *skb) +{ + const struct iphdr *iph = skb_gro_network_header(skb); + struct net *net = dev_net(skb->dev); + unsigned int off, hlen, thlen; + struct tcphdr *th; + struct sock *sk; + int iif, sdif; + + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) + return false; + + inet_get_iif_sdif(skb, &iif, &sdif); + + off = skb_gro_offset(skb); + hlen = off + sizeof(*th); + th = skb_gro_header(skb, hlen, off); + if (unlikely(!th)) + return false; + + thlen = th->doff * 4; + if (thlen < sizeof(*th)) + return false; + + hlen = off + thlen; + if (!skb_gro_may_pull(skb, hlen)) { + th = skb_gro_header_slow(skb, hlen, off); + if (unlikely(!th)) + return false; + } + + sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, + iph->saddr, th->source, + iph->daddr, ntohs(th->dest), + iif, sdif); + if (!sk) + return true; + + sock_put(sk); + + return false; +} + INDIRECT_CALLABLE_SCOPE struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb) { @@ -415,7 +458,7 @@ struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb) return NULL; } - return tcp_gro_receive(head, skb, false); + return tcp_gro_receive(head, skb, tcp4_check_fraglist_gro(skb)); } INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff) diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c index 239588557dc4..c214f5cfe595 100644 --- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -7,12 +7,56 @@ */ #include <linux/indirect_call_wrapper.h> #include <linux/skbuff.h> +#include <net/inet6_hashtables.h> #include <net/gro.h> #include <net/protocol.h> #include <net/tcp.h> #include <net/ip6_checksum.h> #include "ip6_offload.h" +static bool tcp6_check_fraglist_gro(struct sk_buff *skb) +{ + const struct ipv6hdr *hdr = skb_gro_network_header(skb); + struct net *net = dev_net(skb->dev); + unsigned int off, hlen, thlen; + struct tcphdr *th; + struct sock *sk; + int iif, sdif; + + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) + return false; + + inet6_get_iif_sdif(skb, &iif, &sdif); + + off = skb_gro_offset(skb); + hlen = off + sizeof(*th); + th = skb_gro_header(skb, hlen, off); + if (unlikely(!th)) + return false; + + thlen = th->doff * 4; + if (thlen < sizeof(*th)) + return false; + + hlen = off + thlen; + if (!skb_gro_may_pull(skb, hlen)) { + th = skb_gro_header_slow(skb, hlen, off); + if (unlikely(!th)) + return false; + } + + sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, + &hdr->saddr, th->source, + &hdr->daddr, ntohs(th->dest), + iif, sdif); + if (!sk) + return true; + + sock_put(sk); + + return false; +} + INDIRECT_CALLABLE_SCOPE struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) { @@ -24,7 +68,7 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) return NULL; } - return tcp_gro_receive(head, skb, false); + return tcp_gro_receive(head, skb, tcp6_check_fraglist_gro(skb)); } INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff)