Message ID | 20240427182305.24461-7-nbd@nbd.name (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Add TCP fraglist GRO support | expand |
On Sat, 27 Apr 2024 20:23:02 +0200 Felix Fietkau wrote:
> Signe-off-by: Felix Fietkau <nbd@nbd.name>
^
d
If you have to respin - please update.
On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote: > When forwarding TCP after GRO, software segmentation is very expensive, > especially when the checksum needs to be recalculated. > One case where that's currently unavoidable is when routing packets over > PPPoE. Performance improves significantly when using fraglist GRO > implemented in the same way as for UDP. > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established > socket in the same netns as the receiving device. While this may not > cover all relevant use cases in multi-netns configurations, it should be > good enough for most configurations that need this. > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622 > device (2-core Cortex-A53), which runs NAT with flow offload enabled from > one ethernet port to PPPoE on another ethernet port + cake qdisc set to > 1Gbps. > > rx-gro-list off: 630 Mbit/s, CPU 35% idle > rx-gro-list on: 770 Mbit/s, CPU 40% idle > > Signe-off-by: Felix Fietkau <nbd@nbd.name> > --- > net/ipv4/tcp_offload.c | 32 ++++++++++++++++++++++++++++++++ > net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++ > 2 files changed, 67 insertions(+) > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c > index 87ae9808e260..3e9b8c6f9c8c 100644 > --- a/net/ipv4/tcp_offload.c > +++ b/net/ipv4/tcp_offload.c > @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb) > } > EXPORT_SYMBOL(tcp_gro_complete); > > +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb, > + struct tcphdr *th) > +{ > + const struct iphdr *iph; > + struct sk_buff *p; > + struct sock *sk; > + struct net *net; > + int iif, sdif; > + > + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) Should we add an 'unlikely()' here to pair with unlikely(is_flist) in *gro_receive / *gro_complete? Should this test be moved into the caller, to avoid an unconditional function call in the ipv6 code? (Also waiting for explicit ack from Eric) Thank, Paolo
On 30.04.24 12:12, Paolo Abeni wrote: > On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote: >> When forwarding TCP after GRO, software segmentation is very expensive, >> especially when the checksum needs to be recalculated. >> One case where that's currently unavoidable is when routing packets over >> PPPoE. Performance improves significantly when using fraglist GRO >> implemented in the same way as for UDP. >> >> When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established >> socket in the same netns as the receiving device. While this may not >> cover all relevant use cases in multi-netns configurations, it should be >> good enough for most configurations that need this. >> >> Here's a measurement of running 2 TCP streams through a MediaTek MT7622 >> device (2-core Cortex-A53), which runs NAT with flow offload enabled from >> one ethernet port to PPPoE on another ethernet port + cake qdisc set to >> 1Gbps. >> >> rx-gro-list off: 630 Mbit/s, CPU 35% idle >> rx-gro-list on: 770 Mbit/s, CPU 40% idle >> >> Signe-off-by: Felix Fietkau <nbd@nbd.name> >> --- >> net/ipv4/tcp_offload.c | 32 ++++++++++++++++++++++++++++++++ >> net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++ >> 2 files changed, 67 insertions(+) >> >> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c >> index 87ae9808e260..3e9b8c6f9c8c 100644 >> --- a/net/ipv4/tcp_offload.c >> +++ b/net/ipv4/tcp_offload.c >> @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb) >> } >> EXPORT_SYMBOL(tcp_gro_complete); >> >> +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb, >> + struct tcphdr *th) >> +{ >> + const struct iphdr *iph; >> + struct sk_buff *p; >> + struct sock *sk; >> + struct net *net; >> + int iif, sdif; >> + >> + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) > > Should we add an 'unlikely()' here to pair with unlikely(is_flist) in > *gro_receive / *gro_complete? Not sure if unlikely() will make any difference here. I think it makes more sense in the other places than here. > Should this test be moved into the caller, to avoid an unconditional > function call in the ipv6 code? The function is already called from tcp4_gro_receive, which is only called by IPv4 code. Also, since it's a static function called in only one place, it gets inlined by the compiler (at least in my builds). Not sure what unconditional function call you're referring to. - Felix
On Tue, 2024-04-30 at 12:23 +0200, Felix Fietkau wrote: > On 30.04.24 12:12, Paolo Abeni wrote: > > On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote: > > > When forwarding TCP after GRO, software segmentation is very expensive, > > > especially when the checksum needs to be recalculated. > > > One case where that's currently unavoidable is when routing packets over > > > PPPoE. Performance improves significantly when using fraglist GRO > > > implemented in the same way as for UDP. > > > > > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established > > > socket in the same netns as the receiving device. While this may not > > > cover all relevant use cases in multi-netns configurations, it should be > > > good enough for most configurations that need this. > > > > > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622 > > > device (2-core Cortex-A53), which runs NAT with flow offload enabled from > > > one ethernet port to PPPoE on another ethernet port + cake qdisc set to > > > 1Gbps. > > > > > > rx-gro-list off: 630 Mbit/s, CPU 35% idle > > > rx-gro-list on: 770 Mbit/s, CPU 40% idle > > > > > > Signe-off-by: Felix Fietkau <nbd@nbd.name> > > > --- > > > net/ipv4/tcp_offload.c | 32 ++++++++++++++++++++++++++++++++ > > > net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++ > > > 2 files changed, 67 insertions(+) > > > > > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c > > > index 87ae9808e260..3e9b8c6f9c8c 100644 > > > --- a/net/ipv4/tcp_offload.c > > > +++ b/net/ipv4/tcp_offload.c > > > @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb) > > > } > > > EXPORT_SYMBOL(tcp_gro_complete); > > > > > > +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb, > > > + struct tcphdr *th) > > > +{ > > > + const struct iphdr *iph; > > > + struct sk_buff *p; > > > + struct sock *sk; > > > + struct net *net; > > > + int iif, sdif; > > > + > > > + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) > > > > Should we add an 'unlikely()' here to pair with unlikely(is_flist) in > > *gro_receive / *gro_complete? > Not sure if unlikely() will make any difference here. I think it makes > more sense in the other places than here. Why? AFAICS this will be called for every packet on the wire, exactly as the code getting this annotation in patch 3/6. > > Should this test be moved into the caller, to avoid an unconditional > > function call in the ipv6 code? > > The function is already called from tcp4_gro_receive, which is only > called by IPv4 code. Also, since it's a static function called in only > one place, it gets inlined by the compiler (at least in my builds). > Not sure what unconditional function call you're referring to. Right you are. I just got fooled by my hope to reuse the same function for ipv4 and v6. Please just ignore this last part. Cheers, Paolo
On Sat, Apr 27, 2024 at 8:23 PM Felix Fietkau <nbd@nbd.name> wrote: > > When forwarding TCP after GRO, software segmentation is very expensive, > especially when the checksum needs to be recalculated. > One case where that's currently unavoidable is when routing packets over > PPPoE. Performance improves significantly when using fraglist GRO > implemented in the same way as for UDP. > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established > socket in the same netns as the receiving device. While this may not > cover all relevant use cases in multi-netns configurations, it should be > good enough for most configurations that need this. > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622 > device (2-core Cortex-A53), which runs NAT with flow offload enabled from > one ethernet port to PPPoE on another ethernet port + cake qdisc set to > 1Gbps. > > rx-gro-list off: 630 Mbit/s, CPU 35% idle > rx-gro-list on: 770 Mbit/s, CPU 40% idle > > Signe-off-by: Felix Fietkau <nbd@nbd.name> Reviewed-by: Eric Dumazet <edumazet@google.com> Thanks
On 30.04.24 12:31, Paolo Abeni wrote: > On Tue, 2024-04-30 at 12:23 +0200, Felix Fietkau wrote: >> On 30.04.24 12:12, Paolo Abeni wrote: >> > On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote: >> > > When forwarding TCP after GRO, software segmentation is very expensive, >> > > especially when the checksum needs to be recalculated. >> > > One case where that's currently unavoidable is when routing packets over >> > > PPPoE. Performance improves significantly when using fraglist GRO >> > > implemented in the same way as for UDP. >> > > >> > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established >> > > socket in the same netns as the receiving device. While this may not >> > > cover all relevant use cases in multi-netns configurations, it should be >> > > good enough for most configurations that need this. >> > > >> > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622 >> > > device (2-core Cortex-A53), which runs NAT with flow offload enabled from >> > > one ethernet port to PPPoE on another ethernet port + cake qdisc set to >> > > 1Gbps. >> > > >> > > rx-gro-list off: 630 Mbit/s, CPU 35% idle >> > > rx-gro-list on: 770 Mbit/s, CPU 40% idle >> > > >> > > Signe-off-by: Felix Fietkau <nbd@nbd.name> >> > > --- >> > > net/ipv4/tcp_offload.c | 32 ++++++++++++++++++++++++++++++++ >> > > net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++ >> > > 2 files changed, 67 insertions(+) >> > > >> > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c >> > > index 87ae9808e260..3e9b8c6f9c8c 100644 >> > > --- a/net/ipv4/tcp_offload.c >> > > +++ b/net/ipv4/tcp_offload.c >> > > @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb) >> > > } >> > > EXPORT_SYMBOL(tcp_gro_complete); >> > > >> > > +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb, >> > > + struct tcphdr *th) >> > > +{ >> > > + const struct iphdr *iph; >> > > + struct sk_buff *p; >> > > + struct sock *sk; >> > > + struct net *net; >> > > + int iif, sdif; >> > > + >> > > + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) >> > >> > Should we add an 'unlikely()' here to pair with unlikely(is_flist) in >> > *gro_receive / *gro_complete? >> Not sure if unlikely() will make any difference here. I think it makes >> more sense in the other places than here. > > Why? AFAICS this will be called for every packet on the wire, exactly > as the code getting this annotation in patch 3/6. I had compared assembly after adding an annotation and didn't see a difference. However, my annotation was wrong. When I add: if (likely(!(skb->dev->features & NETIF_F_GRO_FRAGLIST))) the generated code is different, and I probably should use that. - Felix
On Tue, 2024-04-30 at 12:55 +0200, Felix Fietkau wrote: > On 30.04.24 12:31, Paolo Abeni wrote: > > On Tue, 2024-04-30 at 12:23 +0200, Felix Fietkau wrote: > > > On 30.04.24 12:12, Paolo Abeni wrote: > > > > On Sat, 2024-04-27 at 20:23 +0200, Felix Fietkau wrote: > > > > > When forwarding TCP after GRO, software segmentation is very expensive, > > > > > especially when the checksum needs to be recalculated. > > > > > One case where that's currently unavoidable is when routing packets over > > > > > PPPoE. Performance improves significantly when using fraglist GRO > > > > > implemented in the same way as for UDP. > > > > > > > > > > When NETIF_F_GRO_FRAGLIST is enabled, perform a lookup for an established > > > > > socket in the same netns as the receiving device. While this may not > > > > > cover all relevant use cases in multi-netns configurations, it should be > > > > > good enough for most configurations that need this. > > > > > > > > > > Here's a measurement of running 2 TCP streams through a MediaTek MT7622 > > > > > device (2-core Cortex-A53), which runs NAT with flow offload enabled from > > > > > one ethernet port to PPPoE on another ethernet port + cake qdisc set to > > > > > 1Gbps. > > > > > > > > > > rx-gro-list off: 630 Mbit/s, CPU 35% idle > > > > > rx-gro-list on: 770 Mbit/s, CPU 40% idle > > > > > > > > > > Signe-off-by: Felix Fietkau <nbd@nbd.name> > > > > > --- > > > > > net/ipv4/tcp_offload.c | 32 ++++++++++++++++++++++++++++++++ > > > > > net/ipv6/tcpv6_offload.c | 35 +++++++++++++++++++++++++++++++++++ > > > > > 2 files changed, 67 insertions(+) > > > > > > > > > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c > > > > > index 87ae9808e260..3e9b8c6f9c8c 100644 > > > > > --- a/net/ipv4/tcp_offload.c > > > > > +++ b/net/ipv4/tcp_offload.c > > > > > @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb) > > > > > } > > > > > EXPORT_SYMBOL(tcp_gro_complete); > > > > > > > > > > +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb, > > > > > + struct tcphdr *th) > > > > > +{ > > > > > + const struct iphdr *iph; > > > > > + struct sk_buff *p; > > > > > + struct sock *sk; > > > > > + struct net *net; > > > > > + int iif, sdif; > > > > > + > > > > > + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) > > > > > > > > Should we add an 'unlikely()' here to pair with unlikely(is_flist) in > > > > *gro_receive / *gro_complete? > > > Not sure if unlikely() will make any difference here. I think it makes > > > more sense in the other places than here. > > > > Why? AFAICS this will be called for every packet on the wire, exactly > > as the code getting this annotation in patch 3/6. > > I had compared assembly after adding an annotation and didn't see a > difference. However, my annotation was wrong. > When I add: if (likely(!(skb->dev->features & NETIF_F_GRO_FRAGLIST))) > the generated code is different, and I probably should use that. I read the above as you intend to send a new revision. If so, feel free to add Acked-by: Paolo Abeni <pabeni@redhat.com> to the whole series. Otherwise, please LMK, I think we can merge it as-is and ev. follow-up. Thanks, Paolo
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 87ae9808e260..3e9b8c6f9c8c 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -407,6 +407,36 @@ void tcp_gro_complete(struct sk_buff *skb) } EXPORT_SYMBOL(tcp_gro_complete); +static void tcp4_check_fraglist_gro(struct list_head *head, struct sk_buff *skb, + struct tcphdr *th) +{ + const struct iphdr *iph; + struct sk_buff *p; + struct sock *sk; + struct net *net; + int iif, sdif; + + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) + return; + + p = tcp_gro_lookup(head, th); + if (p) { + NAPI_GRO_CB(skb)->is_flist = NAPI_GRO_CB(p)->is_flist; + return; + } + + inet_get_iif_sdif(skb, &iif, &sdif); + iph = skb_gro_network_header(skb); + net = dev_net(skb->dev); + sk = __inet_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, + iph->saddr, th->source, + iph->daddr, ntohs(th->dest), + iif, sdif); + NAPI_GRO_CB(skb)->is_flist = !sk; + if (sk) + sock_put(sk); +} + INDIRECT_CALLABLE_SCOPE struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb) { @@ -422,6 +452,8 @@ struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb) if (!th) goto flush; + tcp4_check_fraglist_gro(head, skb, th); + return tcp_gro_receive(head, skb, th); flush: diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c index e73a4f74fd96..ba7b0b3cb9f2 100644 --- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -7,12 +7,45 @@ */ #include <linux/indirect_call_wrapper.h> #include <linux/skbuff.h> +#include <net/inet6_hashtables.h> #include <net/gro.h> #include <net/protocol.h> #include <net/tcp.h> #include <net/ip6_checksum.h> #include "ip6_offload.h" +static void tcp6_check_fraglist_gro(struct list_head *head, struct sk_buff *skb, + struct tcphdr *th) +{ +#if IS_ENABLED(CONFIG_IPV6) + const struct ipv6hdr *hdr; + struct sk_buff *p; + struct sock *sk; + struct net *net; + int iif, sdif; + + if (!(skb->dev->features & NETIF_F_GRO_FRAGLIST)) + return; + + p = tcp_gro_lookup(head, th); + if (p) { + NAPI_GRO_CB(skb)->is_flist = NAPI_GRO_CB(p)->is_flist; + return; + } + + inet6_get_iif_sdif(skb, &iif, &sdif); + hdr = skb_gro_network_header(skb); + net = dev_net(skb->dev); + sk = __inet6_lookup_established(net, net->ipv4.tcp_death_row.hashinfo, + &hdr->saddr, th->source, + &hdr->daddr, ntohs(th->dest), + iif, sdif); + NAPI_GRO_CB(skb)->is_flist = !sk; + if (sk) + sock_put(sk); +#endif /* IS_ENABLED(CONFIG_IPV6) */ +} + INDIRECT_CALLABLE_SCOPE struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) { @@ -28,6 +61,8 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) if (!th) goto flush; + tcp6_check_fraglist_gro(head, skb, th); + return tcp_gro_receive(head, skb, th); flush: