diff mbox series

[net-next,3/8] udp: properly complete L4 GRO over UDP tunnel packet

Message ID 72d8fc8a6d35a74d267cca6c9eddb3ff7852868b.1616345643.git.pabeni@redhat.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series udp: GRO L4 improvements | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 2 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning WARNING: line length of 90 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/header_inline success Link

Commit Message

Paolo Abeni March 21, 2021, 5:01 p.m. UTC
After the previous patch the stack can do L4 UDP aggregation
on top of an UDP tunnel.

The current GRO complete code tries frag based aggregation first;
in the above scenario will generate corrupted frames.

We need to try first UDP tunnel based aggregation, if the GRO
packet requires that. We can use time GRO 'encap_mark' field
to track the need GRO complete action. If encap_mark is set,
skip the frag_list aggregation.

On tunnel encap GRO complete clear such field, so that an inner
frag_list GRO complete could take action.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 net/ipv4/udp_offload.c | 8 +++++++-
 net/ipv6/udp_offload.c | 3 ++-
 2 files changed, 9 insertions(+), 2 deletions(-)

Comments

Willem de Bruijn March 22, 2021, 1:30 p.m. UTC | #1
On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> After the previous patch the stack can do L4 UDP aggregation
> on top of an UDP tunnel.
>
> The current GRO complete code tries frag based aggregation first;
> in the above scenario will generate corrupted frames.
>
> We need to try first UDP tunnel based aggregation, if the GRO
> packet requires that. We can use time GRO 'encap_mark' field
> to track the need GRO complete action. If encap_mark is set,
> skip the frag_list aggregation.
>
> On tunnel encap GRO complete clear such field, so that an inner
> frag_list GRO complete could take action.
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> ---
>  net/ipv4/udp_offload.c | 8 +++++++-
>  net/ipv6/udp_offload.c | 3 ++-
>  2 files changed, 9 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index 25134a3548e99..54e06b88af69a 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -642,6 +642,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
>                 skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
>                                         : SKB_GSO_UDP_TUNNEL;
>
> +               /* clear the encap mark, so that inner frag_list gro_complete
> +                * can take place
> +                */
> +               NAPI_GRO_CB(skb)->encap_mark = 0;
> +
>                 /* Set encapsulation before calling into inner gro_complete()
>                  * functions to make them set up the inner offsets.
>                  */
> @@ -665,7 +670,8 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
>         const struct iphdr *iph = ip_hdr(skb);
>         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
>
> -       if (NAPI_GRO_CB(skb)->is_flist) {
> +       /* do fraglist only if there is no outer UDP encap (or we already processed it) */
> +       if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {

Sorry, I don't follow. I thought the point was to avoid fraglist if an
outer udp tunnel header is present. But the above code clears the mark
and allows entering the fraglist branch exactly when such a header is
encountered?

>                 uh->len = htons(skb->len - nhoff);
>
>                 skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
> diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
> index faa823c242923..b3d9ed96e5ea5 100644
> --- a/net/ipv6/udp_offload.c
> +++ b/net/ipv6/udp_offload.c
> @@ -163,7 +163,8 @@ INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
>         const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
>         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
>
> -       if (NAPI_GRO_CB(skb)->is_flist) {
> +       /* do fraglist only if there is no outer UDP encap (or we already processed it) */
> +       if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
>                 uh->len = htons(skb->len - nhoff);
>
>                 skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
> --
> 2.26.2
>
Paolo Abeni March 22, 2021, 4:59 p.m. UTC | #2
On Mon, 2021-03-22 at 09:30 -0400, Willem de Bruijn wrote:
> On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > After the previous patch the stack can do L4 UDP aggregation
> > on top of an UDP tunnel.
> > 
> > The current GRO complete code tries frag based aggregation first;
> > in the above scenario will generate corrupted frames.
> > 
> > We need to try first UDP tunnel based aggregation, if the GRO
> > packet requires that. We can use time GRO 'encap_mark' field
> > to track the need GRO complete action. If encap_mark is set,
> > skip the frag_list aggregation.
> > 
> > On tunnel encap GRO complete clear such field, so that an inner
> > frag_list GRO complete could take action.
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > ---
> >  net/ipv4/udp_offload.c | 8 +++++++-
> >  net/ipv6/udp_offload.c | 3 ++-
> >  2 files changed, 9 insertions(+), 2 deletions(-)
> > 
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index 25134a3548e99..54e06b88af69a 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -642,6 +642,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
> >                 skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
> >                                         : SKB_GSO_UDP_TUNNEL;
> > 
> > +               /* clear the encap mark, so that inner frag_list gro_complete
> > +                * can take place
> > +                */
> > +               NAPI_GRO_CB(skb)->encap_mark = 0;
> > +
> >                 /* Set encapsulation before calling into inner gro_complete()
> >                  * functions to make them set up the inner offsets.
> >                  */
> > @@ -665,7 +670,8 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
> >         const struct iphdr *iph = ip_hdr(skb);
> >         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> > 
> > -       if (NAPI_GRO_CB(skb)->is_flist) {
> > +       /* do fraglist only if there is no outer UDP encap (or we already processed it) */
> > +       if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
> 
> Sorry, I don't follow. I thought the point was to avoid fraglist if an
> outer udp tunnel header is present. But the above code clears the mark
> and allows entering the fraglist branch exactly when such a header is
> encountered?

The relevant UDP packet has gone through:

[l2/l3 GRO] -> udp_gro_receive  -> udp_sk(sk)->gro_receive -> [some
more GRO layers] -> udp_gro_receive (again)

The first udp_gro_receive set NAPI_GRO_CB(skb)->encap_mark, the
latter udp_gro_receive set NAPI_GRO_CB(skb)->is_flist.

Then, at GRO complete time:

[l2/l3 GRO] -> udp{4,6}_gro_complete -> udp_sk(sk)->gro_complete ->
[more GRO layers] -> udp{4,6}_gro_complete (again).

In the first udp{4,6}_gro_complete invocation 'encap_mark' is 1, so
with this patch we do the 'udp_sk(sk)->gro_complete' path. In the
second udp{4,6}_gro_complete invocation 'encap_mark' has been cleared
(by udp_gro_complete), so we do the SKB_GSO_FRAGLIST completion.

In case SKB_GSO_FRAGLIST with no UDP tunnel, 'encap_mark' is 0 and we
do the SKB_GSO_FRAGLIST completion.

Another alternative, possibly more readable, would be avoid clearing
'encap_mark' in udp_gro_complete() and replacing the above check with:

	if (NAPI_GRO_CB(skb)->is_flist &&
            (!NAPI_GRO_CB(skb)->encap_mark ||
 	     (NAPI_GRO_CB(skb)->encap_mark && skb->encapsulation))) {

I opted otherwise to simplify the conditional expression.

Please let me know if the above is somewhat more clear and if you have
preferecens between the two options.

Cheers,

Paolo
Willem de Bruijn March 24, 2021, 2:13 a.m. UTC | #3
On Mon, Mar 22, 2021 at 1:00 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2021-03-22 at 09:30 -0400, Willem de Bruijn wrote:
> > On Sun, Mar 21, 2021 at 1:01 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > After the previous patch the stack can do L4 UDP aggregation
> > > on top of an UDP tunnel.
> > >
> > > The current GRO complete code tries frag based aggregation first;
> > > in the above scenario will generate corrupted frames.
> > >
> > > We need to try first UDP tunnel based aggregation, if the GRO
> > > packet requires that. We can use time GRO 'encap_mark' field
> > > to track the need GRO complete action. If encap_mark is set,
> > > skip the frag_list aggregation.
> > >
> > > On tunnel encap GRO complete clear such field, so that an inner
> > > frag_list GRO complete could take action.
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > ---
> > >  net/ipv4/udp_offload.c | 8 +++++++-
> > >  net/ipv6/udp_offload.c | 3 ++-
> > >  2 files changed, 9 insertions(+), 2 deletions(-)
> > >
> > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > > index 25134a3548e99..54e06b88af69a 100644
> > > --- a/net/ipv4/udp_offload.c
> > > +++ b/net/ipv4/udp_offload.c
> > > @@ -642,6 +642,11 @@ int udp_gro_complete(struct sk_buff *skb, int nhoff,
> > >                 skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
> > >                                         : SKB_GSO_UDP_TUNNEL;
> > >
> > > +               /* clear the encap mark, so that inner frag_list gro_complete
> > > +                * can take place
> > > +                */
> > > +               NAPI_GRO_CB(skb)->encap_mark = 0;
> > > +
> > >                 /* Set encapsulation before calling into inner gro_complete()
> > >                  * functions to make them set up the inner offsets.
> > >                  */
> > > @@ -665,7 +670,8 @@ INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
> > >         const struct iphdr *iph = ip_hdr(skb);
> > >         struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
> > >
> > > -       if (NAPI_GRO_CB(skb)->is_flist) {
> > > +       /* do fraglist only if there is no outer UDP encap (or we already processed it) */
> > > +       if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
> >
> > Sorry, I don't follow. I thought the point was to avoid fraglist if an
> > outer udp tunnel header is present. But the above code clears the mark
> > and allows entering the fraglist branch exactly when such a header is
> > encountered?
>
> The relevant UDP packet has gone through:
>
> [l2/l3 GRO] -> udp_gro_receive  -> udp_sk(sk)->gro_receive -> [some
> more GRO layers] -> udp_gro_receive (again)
>
> The first udp_gro_receive set NAPI_GRO_CB(skb)->encap_mark, the
> latter udp_gro_receive set NAPI_GRO_CB(skb)->is_flist.
>
> Then, at GRO complete time:
>
> [l2/l3 GRO] -> udp{4,6}_gro_complete -> udp_sk(sk)->gro_complete ->
> [more GRO layers] -> udp{4,6}_gro_complete (again).
>
> In the first udp{4,6}_gro_complete invocation 'encap_mark' is 1, so
> with this patch we do the 'udp_sk(sk)->gro_complete' path. In the
> second udp{4,6}_gro_complete invocation 'encap_mark' has been cleared
> (by udp_gro_complete), so we do the SKB_GSO_FRAGLIST completion.
>
> In case SKB_GSO_FRAGLIST with no UDP tunnel, 'encap_mark' is 0 and we
> do the SKB_GSO_FRAGLIST completion.
>
> Another alternative, possibly more readable, would be avoid clearing
> 'encap_mark' in udp_gro_complete() and replacing the above check with:
>
>         if (NAPI_GRO_CB(skb)->is_flist &&
>             (!NAPI_GRO_CB(skb)->encap_mark ||
>              (NAPI_GRO_CB(skb)->encap_mark && skb->encapsulation))) {
>
> I opted otherwise to simplify the conditional expression.
>
> Please let me know if the above is somewhat more clear and if you have
> preferecens between the two options.

Got it now, thanks. Yes, this patch makes sense.

When the GRO layer has built a GSO packet with both inner UDP
(GSO_UDP_L4) and UDP tunnel header (GSO_UDP_TUNNEL), then on
completion udp{4,6}_gro_complete will be called twice. This function
will enter its is_flist branch immediately, even though that is only
correct on the second call, as GSO_FRAGLIST is only relevant for the
inner packet. Skip this while encap_mark == 1, identifying processing
of the outer tunnel header. Clear the field to enter the path on the next
round, for the inner header.

This is subject to only supporting a single layer of encap, but that
is indeed the current state afaik.
diff mbox series

Patch

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index 25134a3548e99..54e06b88af69a 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -642,6 +642,11 @@  int udp_gro_complete(struct sk_buff *skb, int nhoff,
 		skb_shinfo(skb)->gso_type = uh->check ? SKB_GSO_UDP_TUNNEL_CSUM
 					: SKB_GSO_UDP_TUNNEL;
 
+		/* clear the encap mark, so that inner frag_list gro_complete
+		 * can take place
+		 */
+		NAPI_GRO_CB(skb)->encap_mark = 0;
+
 		/* Set encapsulation before calling into inner gro_complete()
 		 * functions to make them set up the inner offsets.
 		 */
@@ -665,7 +670,8 @@  INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct iphdr *iph = ip_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
-	if (NAPI_GRO_CB(skb)->is_flist) {
+	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
+	if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
 		uh->len = htons(skb->len - nhoff);
 
 		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);
diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c
index faa823c242923..b3d9ed96e5ea5 100644
--- a/net/ipv6/udp_offload.c
+++ b/net/ipv6/udp_offload.c
@@ -163,7 +163,8 @@  INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff)
 	const struct ipv6hdr *ipv6h = ipv6_hdr(skb);
 	struct udphdr *uh = (struct udphdr *)(skb->data + nhoff);
 
-	if (NAPI_GRO_CB(skb)->is_flist) {
+	/* do fraglist only if there is no outer UDP encap (or we already processed it) */
+	if (NAPI_GRO_CB(skb)->is_flist && !NAPI_GRO_CB(skb)->encap_mark) {
 		uh->len = htons(skb->len - nhoff);
 
 		skb_shinfo(skb)->gso_type |= (SKB_GSO_FRAGLIST|SKB_GSO_UDP_L4);