diff mbox series

[net-next,v2,1/8] udp: fixup csum for GSO receive slow path

Message ID 28d04433c648ea8143c199459bfe60650b1a0d28.1616692794.git.pabeni@redhat.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series udp: GRO L4 improvements

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net-next
netdev/subject_prefix success Link
netdev/cc_maintainers warning 2 maintainers not CCed: yoshfuji@linux-ipv6.org dsahern@kernel.org
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 185 this patch: 185
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch warning WARNING: line length of 83 exceeds 80 columns
netdev/build_allmodconfig_warn success Errors and warnings before: 293 this patch: 293
netdev/header_inline success Link

Commit Message

Paolo Abeni March 25, 2021, 5:24 p.m. UTC
When UDP packets generated locally by a socket with UDP_SEGMENT
traverse the following path:

UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
	UDP tunnel (rx) -> UDP socket (no UDP_GRO)

they are segmented as part of the rx socket receive operation, and
present a CHECKSUM_NONE after segmentation.

Additionally, the segmented packets' UDP CB still refers to the original
GSO packet len. Overall that causes unexpected/wrong csum validation
errors later in the UDP receive path.

We could possibly address the issue with some additional checks and
csum mangling in the UDP tunnel code. Since the issue affects only
this UDP receive slow path, let's set a suitable csum status there.

v1 -> v2:
 - restrict the csum update to the packets strictly needing them
 - hopefully clarify the commit message and code comments

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
 include/net/udp.h | 18 ++++++++++++++++++
 net/ipv4/udp.c    |  2 ++
 net/ipv6/udp.c    |  1 +
 3 files changed, 21 insertions(+)

Comments

Willem de Bruijn March 26, 2021, 6:30 p.m. UTC | #1
On Thu, Mar 25, 2021 at 1:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> When UDP packets generated locally by a socket with UDP_SEGMENT
> traverse the following path:
>
> UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
>         UDP tunnel (rx) -> UDP socket (no UDP_GRO)
>
> they are segmented as part of the rx socket receive operation, and
> present a CHECKSUM_NONE after segmentation.

would be good to capture how this happens, as it was not immediately obvious.

>
> Additionally the segmented packets UDP CB still refers to the original
> GSO packet len. Overall that causes unexpected/wrong csum validation
> errors later in the UDP receive path.
>
> We could possibly address the issue with some additional checks and
> csum mangling in the UDP tunnel code. Since the issue affects only
> this UDP receive slow path, let's set a suitable csum status there.
>
> v1 -> v2:
>  - restrict the csum update to the packets strictly needing them
>  - hopefully clarify the commit message and code comments
>
> Signed-off-by: Paolo Abeni <pabeni@redhat.com>

> +       if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> +               skb->csum_valid = 1;

Not entirely obvious is that UDP packets arriving on a device with rx
checksum offload off, i.e., with CHECKSUM_NONE, are not matched by
this test.

I assume that such packets are not coalesced by the GRO layer in the
first place. But I can't immediately spot the reason for it..
Paolo Abeni March 29, 2021, 11:25 a.m. UTC | #2
On Fri, 2021-03-26 at 14:30 -0400, Willem de Bruijn wrote:
> On Thu, Mar 25, 2021 at 1:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > When UDP packets generated locally by a socket with UDP_SEGMENT
> > traverse the following path:
> > 
> > UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
> >         UDP tunnel (rx) -> UDP socket (no UDP_GRO)
> > 
> > they are segmented as part of the rx socket receive operation, and
> > present a CHECKSUM_NONE after segmentation.
> 
> would be good to capture how this happens, as it was not immediately obvious.

The CHECKSUM_PARTIAL is propagated up to the UDP tunnel processing,
where we have:

	__iptunnel_pull_header() -> skb_pull_rcsum() ->
		skb_postpull_rcsum() -> __skb_postpull_rcsum()

and the latter does the conversion.

> > Additionally the segmented packets UDP CB still refers to the original
> > GSO packet len. Overall that causes unexpected/wrong csum validation
> > errors later in the UDP receive path.
> > 
> > We could possibly address the issue with some additional checks and
> > csum mangling in the UDP tunnel code. Since the issue affects only
> > this UDP receive slow path, let's set a suitable csum status there.
> > 
> > v1 -> v2:
> >  - restrict the csum update to the packets strictly needing them
> >  - hopefully clarify the commit message and code comments
> > 
> > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > +       if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> > +               skb->csum_valid = 1;
> 
> Not entirely obvious is that UDP packets arriving on a device with rx
> checksum offload off, i.e., with CHECKSUM_NONE, are not matched by
> this test.
> 
> I assume that such packets are not coalesced by the GRO layer in the
> first place. But I can't immediately spot the reason for it..

Packets with CHECKSUM_NONE are actually aggregated by the GRO engine. 

Their checksum is validated by:

udp4_gro_receive -> skb_gro_checksum_validate_zero_check()
	-> __skb_gro_checksum_validate -> __skb_gro_checksum_validate_complete() 

and skb->ip_summed is changed to CHECKSUM_UNNECESSARY by:

__skb_gro_checksum_validate -> skb_gro_incr_csum_unnecessary
	-> __skb_incr_checksum_unnecessary()

and finally to CHECKSUM_PARTIAL by:

udp4_gro_complete() -> udp_gro_complete() -> udp_gro_complete_segment()

Do you prefer I resubmit with some more comments, either in the commit
message or in the code?

Thanks

Paolo

side note: perf probe here is fooled by skb->ip_summed being a bitfield
and does not dump the real value. I had to look at
skb->__pkt_type_offset[0] instead.
Willem de Bruijn March 29, 2021, 12:28 p.m. UTC | #3
On Mon, Mar 29, 2021 at 7:26 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Fri, 2021-03-26 at 14:30 -0400, Willem de Bruijn wrote:
> > On Thu, Mar 25, 2021 at 1:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > When UDP packets generated locally by a socket with UDP_SEGMENT
> > > traverse the following path:
> > >
> > > UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
> > >         UDP tunnel (rx) -> UDP socket (no UDP_GRO)
> > >
> > > they are segmented as part of the rx socket receive operation, and
> > > present a CHECKSUM_NONE after segmentation.
> >
> > would be good to capture how this happens, as it was not immediately obvious.
>
> The CHECKSUM_PARTIAL is propagated up to the UDP tunnel processing,
> where we have:
>
>         __iptunnel_pull_header() -> skb_pull_rcsum() ->
> skb_postpull_rcsum() -> __skb_postpull_rcsum() and the latter do the
> conversion.

Please capture this in the commit message.

> > > Additionally the segmented packets UDP CB still refers to the original
> > > GSO packet len. Overall that causes unexpected/wrong csum validation
> > > errors later in the UDP receive path.
> > >
> > > We could possibly address the issue with some additional checks and
> > > csum mangling in the UDP tunnel code. Since the issue affects only
> > > this UDP receive slow path, let's set a suitable csum status there.
> > >
> > > v1 -> v2:
> > >  - restrict the csum update to the packets strictly needing them
> > >  - hopefully clarify the commit message and code comments
> > >
> > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > +       if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> > > +               skb->csum_valid = 1;
> >
> > Not entirely obvious is that UDP packets arriving on a device with rx
> > checksum offload off, i.e., with CHECKSUM_NONE, are not matched by
> > this test.
> >
> > I assume that such packets are not coalesced by the GRO layer in the
> > first place. But I can't immediately spot the reason for it..
>
> Packets with CHECKSUM_NONE are actually aggregated by the GRO engine.
>
> Their checksum is validated by:
>
> udp4_gro_receive -> skb_gro_checksum_validate_zero_check()
>         -> __skb_gro_checksum_validate -> __skb_gro_checksum_validate_complete()
>
> and skb->ip_summed is changed to CHECKSUM_UNNECESSARY by:
>
> __skb_gro_checksum_validate -> skb_gro_incr_csum_unnecessary
>         -> __skb_incr_checksum_unnecessary()
>
> and finally to CHECKSUM_PARTIAL by:
>
> udp4_gro_complete() -> udp_gro_complete() -> udp_gro_complete_segment()
>
> Do you prefer I resubmit with some more comments, either in the commit
> message or in the code?

That breaks the checksum-and-copy optimization when delivering to
local sockets. I wonder if that is a regression.
Paolo Abeni March 29, 2021, 1:24 p.m. UTC | #4
On Mon, 2021-03-29 at 08:28 -0400, Willem de Bruijn wrote:
> On Mon, Mar 29, 2021 at 7:26 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Fri, 2021-03-26 at 14:30 -0400, Willem de Bruijn wrote:
> > > On Thu, Mar 25, 2021 at 1:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
> > > > When UDP packets generated locally by a socket with UDP_SEGMENT
> > > > traverse the following path:
> > > > 
> > > > UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) ->
> > > >         UDP tunnel (rx) -> UDP socket (no UDP_GRO)
> > > > 
> > > > they are segmented as part of the rx socket receive operation, and
> > > > present a CHECKSUM_NONE after segmentation.
> > > 
> > > would be good to capture how this happens, as it was not immediately obvious.
> > 
> > The CHECKSUM_PARTIAL is propagated up to the UDP tunnel processing,
> > where we have:
> > 
> >         __iptunnel_pull_header() -> skb_pull_rcsum() ->
> > skb_postpull_rcsum() -> __skb_postpull_rcsum() and the latter do the
> > conversion.
> 
> Please capture this in the commit message.

I will do.

> > > > Additionally the segmented packets UDP CB still refers to the original
> > > > GSO packet len. Overall that causes unexpected/wrong csum validation
> > > > errors later in the UDP receive path.
> > > > 
> > > > We could possibly address the issue with some additional checks and
> > > > csum mangling in the UDP tunnel code. Since the issue affects only
> > > > this UDP receive slow path, let's set a suitable csum status there.
> > > > 
> > > > v1 -> v2:
> > > >  - restrict the csum update to the packets strictly needing them
> > > >  - hopefully clarify the commit message and code comments
> > > > 
> > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > +       if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> > > > +               skb->csum_valid = 1;
> > > 
> > > Not entirely obvious is that UDP packets arriving on a device with rx
> > > checksum offload off, i.e., with CHECKSUM_NONE, are not matched by
> > > this test.
> > > 
> > > I assume that such packets are not coalesced by the GRO layer in the
> > > first place. But I can't immediately spot the reason for it..
> > 
> > Packets with CHECKSUM_NONE are actually aggregated by the GRO engine.
> > 
> > Their checksum is validated by:
> > 
> > udp4_gro_receive -> skb_gro_checksum_validate_zero_check()
> >         -> __skb_gro_checksum_validate -> __skb_gro_checksum_validate_complete()
> > 
> > and skb->ip_summed is changed to CHECKSUM_UNNECESSARY by:
> > 
> > __skb_gro_checksum_validate -> skb_gro_incr_csum_unnecessary
> >         -> __skb_incr_checksum_unnecessary()
> > 
> > and finally to CHECKSUM_PARTIAL by:
> > 
> > udp4_gro_complete() -> udp_gro_complete() -> udp_gro_complete_segment()
> > 
> > Do you prefer I resubmit with some more comments, either in the commit
> > message or in the code?
> 
> That breaks the checksum-and-copy optimization when delivering to
> local sockets. I wonder if that is a regression.

The conversion to CHECKSUM_UNNECESSARY happens since
commit 573e8fca255a27e3573b51f9b183d62641c47a3d.

Even the conversion to CHECKSUM_PARTIAL happens independently from this
series, since commit 6f1c0ea133a6e4a193a7b285efe209664caeea43.

I don't see a regression here ?!?

Thanks!

Paolo
Willem de Bruijn March 29, 2021, 1:52 p.m. UTC | #5
> > > > > Additionally the segmented packets UDP CB still refers to the original
> > > > > GSO packet len. Overall that causes unexpected/wrong csum validation
> > > > > errors later in the UDP receive path.
> > > > >
> > > > > We could possibly address the issue with some additional checks and
> > > > > csum mangling in the UDP tunnel code. Since the issue affects only
> > > > > this UDP receive slow path, let's set a suitable csum status there.
> > > > >
> > > > > v1 -> v2:
> > > > >  - restrict the csum update to the packets strictly needing them
> > > > >  - hopefully clarify the commit message and code comments
> > > > >
> > > > > Signed-off-by: Paolo Abeni <pabeni@redhat.com>
> > > > > +       if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> > > > > +               skb->csum_valid = 1;
> > > >
> > > > Not entirely obvious is that UDP packets arriving on a device with rx
> > > > checksum offload off, i.e., with CHECKSUM_NONE, are not matched by
> > > > this test.
> > > >
> > > > I assume that such packets are not coalesced by the GRO layer in the
> > > > first place. But I can't immediately spot the reason for it..
> > >
> > > Packets with CHECKSUM_NONE are actually aggregated by the GRO engine.
> > >
> > > Their checksum is validated by:
> > >
> > > udp4_gro_receive -> skb_gro_checksum_validate_zero_check()
> > >         -> __skb_gro_checksum_validate -> __skb_gro_checksum_validate_complete()
> > >
> > > and skb->ip_summed is changed to CHECKSUM_UNNECESSARY by:
> > >
> > > __skb_gro_checksum_validate -> skb_gro_incr_csum_unnecessary
> > >         -> __skb_incr_checksum_unnecessary()
> > >
> > > and finally to CHECKSUM_PARTIAL by:
> > >
> > > udp4_gro_complete() -> udp_gro_complete() -> udp_gro_complete_segment()
> > >
> > > Do you prefer I resubmit with some more comments, either in the commit
> > > message or in the code?
> >
> > That breaks the checksum-and-copy optimization when delivering to
> > local sockets. I wonder if that is a regression.
>
> The conversion to CHECKSUM_UNNECESSARY happens since
> commit 573e8fca255a27e3573b51f9b183d62641c47a3d.
>
> Even the conversion to CHECKSUM_PARTIAL happens independently from this
> series, since commit 6f1c0ea133a6e4a193a7b285efe209664caeea43.
>
> I don't see a regression here ?!?

I mean that UDP packets with local destination socket and no tunnels
that arrive with CHECKSUM_NONE normally benefit from the
checksum-and-copy optimization in recvmsg() when copying to user.

If those packets are now checksummed during GRO, that voids that
optimization, and the packet payload is now touched twice.
Paolo Abeni March 29, 2021, 3 p.m. UTC | #6
On Mon, 2021-03-29 at 09:52 -0400, Willem de Bruijn wrote:
> > > That breaks the checksum-and-copy optimization when delivering to
> > > local sockets. I wonder if that is a regression.
> > 
> > The conversion to CHECKSUM_UNNECESSARY happens since
> > commit 573e8fca255a27e3573b51f9b183d62641c47a3d.
> > 
> > Even the conversion to CHECKSUM_PARTIAL happens independently from this
> > series, since commit 6f1c0ea133a6e4a193a7b285efe209664caeea43.
> > 
> > I don't see a regression here ?!?
> 
> I mean that UDP packets with local destination socket and no tunnels
> that arrive with CHECKSUM_NONE normally benefit from the
> checksum-and-copy optimization in recvmsg() when copying to user.
> 
> If those packets are now checksummed during GRO, that voids that
> optimization, and the packet payload is now touched twice.

The 'now' part confuses me. Nothing in this patch or this series
changes the processing of CHECKSUM_NONE UDP packets with no tunnel.

I do see checksum validation in the GRO engine for CHECKSUM_NONE UDP
packet prior to this series.

I *think* the checksum-and-copy optimization is lost
since 573e8fca255a27e3573b51f9b183d62641c47a3d.

Regards,

Paolo
Willem de Bruijn March 29, 2021, 3:24 p.m. UTC | #7
On Mon, Mar 29, 2021 at 11:01 AM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2021-03-29 at 09:52 -0400, Willem de Bruijn wrote:
> > > > That breaks the checksum-and-copy optimization when delivering to
> > > > local sockets. I wonder if that is a regression.
> > >
> > > The conversion to CHECKSUM_UNNECESSARY happens since
> > > commit 573e8fca255a27e3573b51f9b183d62641c47a3d.
> > >
> > > Even the conversion to CHECKSUM_PARTIAL happens independently from this
> > > series, since commit 6f1c0ea133a6e4a193a7b285efe209664caeea43.
> > >
> > > I don't see a regression here ?!?
> >
> > I mean that UDP packets with local destination socket and no tunnels
> > that arrive with CHECKSUM_NONE normally benefit from the
> > checksum-and-copy optimization in recvmsg() when copying to user.
> >
> > If those packets are now checksummed during GRO, that voids that
> > optimization, and the packet payload is now touched twice.
>
> The 'now' part confuses me. Nothing in this patch or this series
> changes the processing of CHECKSUM_NONE UDP packets with no tunnel.

Agreed. I did not mean to imply that this patch changes that. I was
responding to

> > > +       if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> > > +               skb->csum_valid = 1;
> >
> > Not entirely obvious is that UDP packets arriving on a device with rx
> > checksum offload off, i.e., with CHECKSUM_NONE, are not matched by
> > this test.
> >
> > I assume that such packets are not coalesced by the GRO layer in the
> > first place. But I can't immediately spot the reason for it..

As you point out, such packets will already have had their checksum
verified at this point, so this branch only matches tunneled packets.
That point is just not immediately obvious from the code.

> I do see checksum validation in the GRO engine for CHECKSUM_NONE UDP
> packet prior to this series.
>
> I *think* the checksum-and-copy optimization is lost
> since 573e8fca255a27e3573b51f9b183d62641c47a3d.

Wouldn't this have been introduced with UDP_GRO?
Paolo Abeni March 29, 2021, 4:23 p.m. UTC | #8
On Mon, 2021-03-29 at 11:24 -0400, Willem de Bruijn wrote:
> On Mon, Mar 29, 2021 at 11:01 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > On Mon, 2021-03-29 at 09:52 -0400, Willem de Bruijn wrote:
> > > > +       if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> > > > +               skb->csum_valid = 1;
> > > 
> > > Not entirely obvious is that UDP packets arriving on a device with rx
> > > checksum offload off, i.e., with CHECKSUM_NONE, are not matched by
> > > this test.
> > > 
> > > I assume that such packets are not coalesced by the GRO layer in the
> > > first place. But I can't immediately spot the reason for it..
> 
> As you point out, such packets will already have had their checksum
> verified at this point, so this branch only matches tunneled packets.
> That point is just not immediately obvious from the code.

I understand this is a matter of comment clarity ?!?

I'll rewrite the related code comment - in udp_post_segment_fix_csum()
- as:

	/* UDP packets generated with UDP_SEGMENT and traversing:
	 *
	 * UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx)
	 *
	 * land here with CHECKSUM_NONE, because __iptunnel_pull_header() converts
	 * CHECKSUM_PARTIAL into NONE.
	 * SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets with no UDP tunnel will land
	 * here with valid checksum, as the GRO engine validates the UDP csum
	 * before the aggregation and nobody strips such info in between.
	 * Instead of adding another check in the tunnel fastpath, we can force
	 * a valid csum here.
	 * Additionally fixup the UDP CB.
	 */

Would that be clear enough?

> > I do see checksum validation in the GRO engine for CHECKSUM_NONE UDP
> > packet prior to this series.
> > 
> > I *think* the checksum-and-copy optimization is lost
> > since 573e8fca255a27e3573b51f9b183d62641c47a3d.
> 
> Wouldn't this have been introduced with UDP_GRO?

Uhmm.... looks like the checksum-and-copy optimization has been lost
and recovered a few times. I think the last time was with commit
9fd1ff5d2ac7181844735806b0a703c942365291, which moves the csum
validation before the static branch on udp_encap_needed_key.

Can we agree re-introducing the optimization is independent from this
series?

Thanks!

Paolo
Willem de Bruijn March 29, 2021, 10:37 p.m. UTC | #9
On Mon, Mar 29, 2021 at 12:24 PM Paolo Abeni <pabeni@redhat.com> wrote:
>
> On Mon, 2021-03-29 at 11:24 -0400, Willem de Bruijn wrote:
> > On Mon, Mar 29, 2021 at 11:01 AM Paolo Abeni <pabeni@redhat.com> wrote:
> > > On Mon, 2021-03-29 at 09:52 -0400, Willem de Bruijn wrote:
> > > > > +       if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
> > > > > +               skb->csum_valid = 1;
> > > >
> > > > Not entirely obvious is that UDP packets arriving on a device with rx
> > > > checksum offload off, i.e., with CHECKSUM_NONE, are not matched by
> > > > this test.
> > > >
> > > > I assume that such packets are not coalesced by the GRO layer in the
> > > > first place. But I can't immediately spot the reason for it..
> >
> > As you point out, such packets will already have had their checksum
> > verified at this point, so this branch only matches tunneled packets.
> > That point is just not immediately obvious from the code.
>
> I understand is a matter of comment clarity ?!?
>
> I'll rewrite the related code comment - in udp_post_segment_fix_csum()
> - as:
>
>         /* UDP packets generated with UDP_SEGMENT and traversing:
>          *
>          * UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx)
>          *
>          * land here with CHECKSUM_NONE, because __iptunnel_pull_header() converts
>          * CHECKSUM_PARTIAL into NONE.
>          * SKB_GSO_UDP_L4 or SKB_GSO_FRAGLIST packets with no UDP tunnel will land
>          * here with valid checksum, as the GRO engine validates the UDP csum
>          * before the aggregation and nobody strips such info in between.
>          * Instead of adding another check in the tunnel fastpath, we can force
>          * a valid csum here.
>          * Additionally fixup the UDP CB.
>          */
>
> Would that be clear enough?

Definitely. Thanks!

> > > I do see checksum validation in the GRO engine for CHECKSUM_NONE UDP
> > > packet prior to this series.
> > >
> > > I *think* the checksum-and-copy optimization is lost
> > > since 573e8fca255a27e3573b51f9b183d62641c47a3d.
> >
> > Wouldn't this have been introduced with UDP_GRO?
>
> Uhmm.... looks like the checksum-and-copy optimization has been lost
> and recovered a few times. I think the last one
> with 9fd1ff5d2ac7181844735806b0a703c942365291, which move the csum
> validation before the static branch on udp_encap_needed_key.
>
> Can we agree re-introducing the optimization is independent from this
> series?

Yep :)
> Thanks!
>
> Paolo
>
>

Patch

diff --git a/include/net/udp.h b/include/net/udp.h
index d4d064c592328..7fc735919f4df 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -515,6 +515,24 @@  static inline struct sk_buff *udp_rcv_segment(struct sock *sk,
 	return segs;
 }
 
+static inline void udp_post_segment_fix_csum(struct sk_buff *skb)
+{
+	/* UDP-lite can't land here - no GRO */
+	WARN_ON_ONCE(UDP_SKB_CB(skb)->partial_cov);
+
+	/* UDP packets generated with UDP_SEGMENT and traversing:
+	 * UDP tunnel(xmit) -> veth (segmentation) -> veth (gro) -> UDP tunnel (rx)
+	 * land here with CHECKSUM_NONE. Instead of adding another check
+	 * in the tunnel fastpath, we can force valid csums here:
+	 * packets are locally generated and the GRO engine already validated
+	 * the csum.
+	 * Additionally fixup the UDP CB
+	 */
+	UDP_SKB_CB(skb)->cscov = skb->len;
+	if (skb->ip_summed == CHECKSUM_NONE && !skb->csum_valid)
+		skb->csum_valid = 1;
+}
+
 #ifdef CONFIG_BPF_SYSCALL
 struct sk_psock;
 struct proto *udp_bpf_get_proto(struct sock *sk, struct sk_psock *psock);
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 4a0478b17243a..fe85dcf8c0087 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -2178,6 +2178,8 @@  static int udp_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	segs = udp_rcv_segment(sk, skb, true);
 	skb_list_walk_safe(segs, skb, next) {
 		__skb_pull(skb, skb_transport_offset(skb));
+
+		udp_post_segment_fix_csum(skb);
 		ret = udp_queue_rcv_one_skb(sk, skb);
 		if (ret > 0)
 			ip_protocol_deliver_rcu(dev_net(skb->dev), skb, ret);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index d25e5a9252fdb..fa2f547383925 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -749,6 +749,7 @@  static int udpv6_queue_rcv_skb(struct sock *sk, struct sk_buff *skb)
 	skb_list_walk_safe(segs, skb, next) {
 		__skb_pull(skb, skb_transport_offset(skb));
 
+		udp_post_segment_fix_csum(skb);
 		ret = udpv6_queue_rcv_one_skb(sk, skb);
 		if (ret > 0)
 			ip6_protocol_deliver_rcu(dev_net(skb->dev), skb, ret,