Message ID | 20240725-udp-gso-egress-from-tunnel-v1-1-5e5530ead524@cloudflare.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Series | Fix bad offload warning when sending UDP GSO from a tunnel device |
On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
> doesn't support checksum offload. This was done to satisfy the offload
> checks in the gso stack.
>
> However, when sending a UDP GSO packet from a tunnel device, we will go
> through the TX path and the GSO offload twice. Once for the tunnel device,
> which acts as a passthru for GSO packets, and once for the underlying
> egress device.
>
> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
> offload checks still happen on transmit from a tunnel device. So if the skb
> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
> warning from the gso stack.

I don't entirely understand. The check should not hit on pass through,
where segs == skb:

        if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
                            !IS_ERR(segs)))
                skb_warn_bad_offload(skb);

> Today this can occur in two situations, which we check for in
> __ip_append_data() and __ip6_append_data():
>
> 1) when the tunnel device does not advertise checksum offload, or
> 2) when there are IPv6 extension headers present.
>
> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
> path, when still in the udp layer, since we need to have ip_summed set up
> correctly for GSO processing by tunnel devices.

The previous patch converted segments post segmentation to
CHECKSUM_UNNECESSARY, which is fine as they had
already been checksummed in software, and CHECKSUM_NONE
packets on egress are common.

This creates GSO packets without CHECKSUM_PARTIAL.
Segmentation offload always requires checksum offload. So these
would be weird new packets. And having CHECKSUM_NONE (or
equivalent), but entering software checksumming is also confusing.

The crux is that I don't understand why the warning fires on tunnel exit when no segmentation takes place there. Hopefully we can fix in a way that does not introduce these weird GSO packets (but if not, so be it).
On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> doesn't support checksum offload. This was done to satisfy the offload
>> checks in the gso stack.
>>
>> However, when sending a UDP GSO packet from a tunnel device, we will go
>> through the TX path and the GSO offload twice. Once for the tunnel device,
>> which acts as a passthru for GSO packets, and once for the underlying
>> egress device.
>>
>> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> offload checks still happen on transmit from a tunnel device. So if the skb
>> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> warning from the gso stack.
>
> I don't entirely understand. The check should not hit on pass through,
> where segs == skb:
>
>         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
>                             !IS_ERR(segs)))
>                 skb_warn_bad_offload(skb);
>

That's something I should have explained better. Let me try to shed some
light on it now. We're hitting the skb_warn_bad_offload warning because
skb_mac_gso_segment doesn't return any segments (segs == NULL).

And that's because we bail out early from __udp_gso_segment when we
detect that the tunnel device is capable of tx-udp-segmentation
(GSO_UDP_L4):

        if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
                /* Packet is from an untrusted source, reset gso_segs. */
                skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
                                                             mss);
                return NULL;
        }

It has not occurred to me before, but in the spirit of commit
8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
tighten the check to exclude cases when segs == NULL. I'm thinking of:

        if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
                skb_warn_bad_offload(skb);

That would be an alternative. Though I'm not sure I understand the
consequences of such a change fully yet. Namely, whether we wouldn't be
losing some diagnostics from the bad offload warning.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada

>> Today this can occur in two situations, which we check for in
>> __ip_append_data() and __ip6_append_data():
>>
>> 1) when the tunnel device does not advertise checksum offload, or
>> 2) when there are IPv6 extension headers present.
>>
>> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
>> path, when still in the udp layer, since we need to have ip_summed set up
>> correctly for GSO processing by tunnel devices.
>
> The previous patch converted segments post segmentation to
> CHECKSUM_UNNECESSARY, which is fine as they had
> already been checksummed in software, and CHECKSUM_NONE
> packets on egress are common.
>
> This creates GSO packets without CHECKSUM_PARTIAL.
> Segmentation offload always requires checksum offload. So these
> would be weird new packets. And having CHECKSUM_NONE (or
> equivalent), but entering software checksumming is also confusing.

I agree this is confusing to reason about. That is a GSO packet with
CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
offload in software.

Kind of related, I noticed that turning off tx-checksum-ip-generic with
ethtool doesn't disable tx-udp-segmentation. That looks like a bug.

> The crux is that I don't understand why the warning fires on tunnel
> exit when no segmentation takes place there. Hopefully we can fix
> in a way that does not introduce these weird GSO packets (but if
> not, so be it).

Attaching a self contained repro which I've been using to trace and
understand the GSO code:

---8<---

sh# cat repro-full.py
#!/bin/env python
#
# `modprobe ip6_tunnel` might be needed.
#

import os
import subprocess
import shutil
from socket import *

UDP_SEGMENT = 103

cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
script = b"""
link set dev lo up

link add name sink mtu 1540 type dummy
addr add dev sink fd11::2/48 nodad
link set dev sink up

tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
link set dev iptnl mtu 1500
addr add dev iptnl fd00::2/48 nodad
link set dev iptnl up
"""
proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
proc.communicate(input=script)

os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")

# Alternatively to hopopts:
# os.system("ethtool -K iptnl tx-checksum-ip-generic off")

hopopts = b"\x00" * 8
s = socket(AF_INET6, SOCK_DGRAM)
s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
s.sendto(b"x" * 3000, ("fd00::1", 9))
sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
# tracer: function_graph
#
# CPU  DURATION                  FUNCTION CALLS
# |     |   |                     |   |   |   |
 16)               |  __skb_gso_segment() {
 16)   0.288 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.172 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.184 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.161 us    |      __rcu_read_lock(); /* = 0x2 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.514 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.152 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        udp6_ufo_fragment() {
 16)   0.237 us    |          __udp_gso_segment(); /* = 0x0 */
 16)   0.727 us    |        } /* udp6_ufo_fragment = 0x0 */
 16)   3.049 us    |      } /* ipv6_gso_segment = 0x0 */
 16)   0.171 us    |      __rcu_read_unlock(); /* = 0x1 */
 16)   4.748 us    |    } /* skb_mac_gso_segment = 0x0 */
 16)               |    skb_warn_bad_offload() {
[...]
 16) ! 785.215 us  |    } /* skb_warn_bad_offload = 0x0 */
 16) ! 800.986 us  |  } /* __skb_gso_segment = 0x0 */
 16)               |  __skb_gso_segment() {
 16)   0.394 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
 16)   0.181 us    |    idle_cpu(); /* = 0x0 */
 16)               |    skb_mac_gso_segment() {
 16)   0.182 us    |      skb_network_protocol(); /* = 0xdd86 */
 16)   0.178 us    |      __rcu_read_lock(); /* = 0x3 */
 16)               |      ipv6_gso_segment() {
 16)               |        rcu_read_lock_held() {
 16)   0.155 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.556 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        rcu_read_lock_held() {
 16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
 16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
 16)               |        ip6ip6_gso_segment() {
 16) + 22.176 us   |          ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
 16) + 24.875 us   |        } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
 16) + 27.416 us   |      } /* ipv6_gso_segment = 0xffffa00c03018c00 */
 16)   0.230 us    |      __rcu_read_unlock(); /* = 0x2 */
 16) + 29.065 us   |    } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
 16) + 32.828 us   |  } /* __skb_gso_segment = 0xffffa00c03018c00 */
sh#
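
As a side note on the numbers in the repro: it sends a 3000-byte payload with UDP_SEGMENT set to 145, and the early-return branch of __udp_gso_segment() quoted earlier recomputes gso_segs as DIV_ROUND_UP(gso_skb->len - sizeof(*uh), mss). The sketch below works through that arithmetic. It assumes gso_skb->len covers the 8-byte UDP header plus the payload at that point; the helper names div_round_up and segment_sizes are made up for illustration and are not kernel functions.

```python
# Worked arithmetic for the repro: 3000-byte payload, UDP_SEGMENT = 145.
UDP_HEADER_LEN = 8  # sizeof(struct udphdr)


def div_round_up(n, d):
    """Integer ceiling division, like the kernel's DIV_ROUND_UP macro."""
    return (n + d - 1) // d


def segment_sizes(payload_len, mss):
    """Payload sizes of the datagrams a UDP GSO packet would be split into."""
    full, rest = divmod(payload_len, mss)
    return [mss] * full + ([rest] if rest else [])


payload_len = 3000
mss = 145
# Assumption: skb length at the UDP layer is header plus payload.
skb_len = payload_len + UDP_HEADER_LEN

gso_segs = div_round_up(skb_len - UDP_HEADER_LEN, mss)
sizes = segment_sizes(payload_len, mss)

print("gso_segs:", gso_segs)
print("last segment payload:", sizes[-1])
```

Under these assumptions the packet splits into 20 full 145-byte datagrams plus one 100-byte tail, so gso_segs comes out as 21.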
On Fri, Jul 26, 2024 at 7:23 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
> > On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> >> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
> >> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
> >> doesn't support checksum offload. This was done to satisfy the offload
> >> checks in the gso stack.
> >>
> >> However, when sending a UDP GSO packet from a tunnel device, we will go
> >> through the TX path and the GSO offload twice. Once for the tunnel device,
> >> which acts as a passthru for GSO packets, and once for the underlying
> >> egress device.
> >>
> >> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
> >> offload checks still happen on transmit from a tunnel device. So if the skb
> >> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
> >> warning from the gso stack.
> >
> > I don't entirely understand. The check should not hit on pass through,
> > where segs == skb:
> >
> >         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
> >                             !IS_ERR(segs)))
> >                 skb_warn_bad_offload(skb);
> >
>
> That's something I should have explained better. Let me try to shed some
> light on it now. We're hitting the skb_warn_bad_offload warning because
> skb_mac_gso_segment doesn't return any segments (segs == NULL).
>
> And that's because we bail out early from __udp_gso_segment when we
> detect that the tunnel device is capable of tx-udp-segmentation
> (GSO_UDP_L4):
>
>         if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
>                 /* Packet is from an untrusted source, reset gso_segs. */
>                 skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
>                                                              mss);
>                 return NULL;
>         }

Oh I see. Thanks.

> It has not occurred to me before, but in the spirit of commit
> 8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
> tighten the check to exclude cases when segs == NULL. I'm thinking of:
>
>         if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
>                 skb_warn_bad_offload(skb);

That looks sensible to me. And nicer than the ip_summed conversion in
udp_send_skb.

> That would be an alternative. Though I'm not sure I understand the
> consequences of such a change fully yet. Namely, whether we wouldn't be
> losing some diagnostics from the bad offload warning.
>
> [1]
> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/commit/?id=8d74e9f88d65af8bb2e095aff506aa6eac755ada
>
> >> Today this can occur in two situations, which we check for in
> >> __ip_append_data() and __ip6_append_data():
> >>
> >> 1) when the tunnel device does not advertise checksum offload, or
> >> 2) when there are IPv6 extension headers present.
> >>
> >> To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
> >> path, when still in the udp layer, since we need to have ip_summed set up
> >> correctly for GSO processing by tunnel devices.
> >
> > The previous patch converted segments post segmentation to
> > CHECKSUM_UNNECESSARY, which is fine as they had
> > already been checksummed in software, and CHECKSUM_NONE
> > packets on egress are common.
> >
> > This creates GSO packets without CHECKSUM_PARTIAL.
> > Segmentation offload always requires checksum offload. So these
> > would be weird new packets. And having CHECKSUM_NONE (or
> > equivalent), but entering software checksumming is also confusing.
>
> I agree this is confusing to reason about. That is a GSO packet with
> CHECKSUM_UNNECESSARY which has not undergone segmentation and csum
> offload in software.

I was mistaken earlier.
Was looking at this code just yesterday too for
https://lore.kernel.org/netdev/20240726023359.879166-1-willemdebruijn.kernel@gmail.com/

We do set the GSO skb already skb CHECKSUM_NONE. So your suggestion is not
a significant change.

> Kind of related, I noticed that turning off tx-checksum-ip-generic with
> ethtool doesn't disable tx-udp-segmentation. That looks like a bug.

I saw the same :)

> > The crux is that I don't understand why the warning fires on tunnel
> > exit when no segmentation takes place there. Hopefully we can fix
> > in a way that does not introduce these weird GSO packets (but if
> > not, so be it).
>
> Attaching a self contained repro which I've been using to trace and
> understand the GSO code:
>
> ---8<---
>
> sh# cat repro-full.py
> #!/bin/env python
> #
> # `modprobe ip6_tunnel` might be needed.
> #
>
> import os
> import subprocess
> import shutil
> from socket import *
>
> UDP_SEGMENT = 103
>
> cmd = [shutil.which("ip"), "-batch", "/dev/stdin"]
> script = b"""
> link set dev lo up
>
> link add name sink mtu 1540 type dummy
> addr add dev sink fd11::2/48 nodad
> link set dev sink up
>
> tunnel add iptnl mode ip6ip6 remote fd11::1 local fd11::2 dev sink
> link set dev iptnl mtu 1500
> addr add dev iptnl fd00::2/48 nodad
> link set dev iptnl up
> """
> proc = subprocess.Popen(cmd, stdin=subprocess.PIPE)
> proc.communicate(input=script)
>
> os.system("ethtool -K sink tx-udp-segmentation off > /dev/null")
> os.system("ethtool -K sink tx-checksum-ip-generic off > /dev/null")
>
> # Alternatively to hopopts:
> # os.system("ethtool -K iptnl tx-checksum-ip-generic off")
>
> hopopts = b"\x00" * 8
> s = socket(AF_INET6, SOCK_DGRAM)
> s.setsockopt(IPPROTO_IPV6, IPV6_HOPOPTS, hopopts)
> s.setsockopt(SOL_UDP, UDP_SEGMENT, 145)
> s.sendto(b"x" * 3000, ("fd00::1", 9))
> sh# perf ftrace -G __skb_gso_segment --graph-opts noirqs,depth=5 -- unshare -n python repro-full.py
> # tracer: function_graph
> #
> # CPU  DURATION                  FUNCTION CALLS
> # |     |   |                     |   |   |   |
>  16)               |  __skb_gso_segment() {
>  16)   0.288 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
>  16)   0.172 us    |    idle_cpu(); /* = 0x0 */
>  16)               |    skb_mac_gso_segment() {
>  16)   0.184 us    |      skb_network_protocol(); /* = 0xdd86 */
>  16)   0.161 us    |      __rcu_read_lock(); /* = 0x2 */
>  16)               |      ipv6_gso_segment() {
>  16)               |        rcu_read_lock_held() {
>  16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.514 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.152 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.151 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.459 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        udp6_ufo_fragment() {
>  16)   0.237 us    |          __udp_gso_segment(); /* = 0x0 */
>  16)   0.727 us    |        } /* udp6_ufo_fragment = 0x0 */
>  16)   3.049 us    |      } /* ipv6_gso_segment = 0x0 */
>  16)   0.171 us    |      __rcu_read_unlock(); /* = 0x1 */
>  16)   4.748 us    |    } /* skb_mac_gso_segment = 0x0 */
>  16)               |    skb_warn_bad_offload() {
> [...]
>  16) ! 785.215 us  |    } /* skb_warn_bad_offload = 0x0 */
>  16) ! 800.986 us  |  } /* __skb_gso_segment = 0x0 */
>  16)               |  __skb_gso_segment() {
>  16)   0.394 us    |    irq_enter_rcu(); /* = 0xffffa00c03d89ac0 */
>  16)   0.181 us    |    idle_cpu(); /* = 0x0 */
>  16)               |    skb_mac_gso_segment() {
>  16)   0.182 us    |      skb_network_protocol(); /* = 0xdd86 */
>  16)   0.178 us    |      __rcu_read_lock(); /* = 0x3 */
>  16)               |      ipv6_gso_segment() {
>  16)               |        rcu_read_lock_held() {
>  16)   0.155 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.556 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        rcu_read_lock_held() {
>  16)   0.159 us    |          rcu_lockdep_current_cpu_online(); /* = 0x1 */
>  16)   0.480 us    |        } /* rcu_read_lock_held = 0x1 */
>  16)               |        ip6ip6_gso_segment() {
>  16) + 22.176 us   |          ipv6_gso_segment(); /* = 0xffffa00c03018c00 */
>  16) + 24.875 us   |        } /* ip6ip6_gso_segment = 0xffffa00c03018c00 */
>  16) + 27.416 us   |      } /* ipv6_gso_segment = 0xffffa00c03018c00 */
>  16)   0.230 us    |      __rcu_read_unlock(); /* = 0x2 */
>  16) + 29.065 us   |    } /* skb_mac_gso_segment = 0xffffa00c03018c00 */
>  16) + 32.828 us   |  } /* __skb_gso_segment = 0xffffa00c03018c00 */
> sh#
On Fri, Jul 26, 2024 at 09:58 AM -04, Willem de Bruijn wrote:
> On Fri, Jul 26, 2024 at 7:23 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> On Thu, Jul 25, 2024 at 10:21 AM -04, Willem de Bruijn wrote:
>> > On Thu, Jul 25, 2024 at 5:56 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>> >>
>> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> >> checksum offload") we have added a tweak in the UDP GSO code to mark GSO
>> >> packets being sent out as CHECKSUM_UNNECESSARY when the egress device
>> >> doesn't support checksum offload. This was done to satisfy the offload
>> >> checks in the gso stack.
>> >>
>> >> However, when sending a UDP GSO packet from a tunnel device, we will go
>> >> through the TX path and the GSO offload twice. Once for the tunnel device,
>> >> which acts as a passthru for GSO packets, and once for the underlying
>> >> egress device.
>> >>
>> >> Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
>> >> offload checks still happen on transmit from a tunnel device. So if the skb
>> >> is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
>> >> warning from the gso stack.
>> >
>> > I don't entirely understand. The check should not hit on pass through,
>> > where segs == skb:
>> >
>> >         if (segs != skb && unlikely(skb_needs_check(skb, tx_path) &&
>> >                             !IS_ERR(segs)))
>> >                 skb_warn_bad_offload(skb);
>> >
>>
>> That's something I should have explained better. Let me try to shed some
>> light on it now. We're hitting the skb_warn_bad_offload warning because
>> skb_mac_gso_segment doesn't return any segments (segs == NULL).
>>
>> And that's because we bail out early out of __udp_gso_segment when we
>> detect that the tunnel device is capable of tx-udp-segmentation
>> (GSO_UDP_L4):
>>
>>         if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
>>                 /* Packet is from an untrusted source, reset gso_segs. */
>>                 skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
>>                                                              mss);
>>                 return NULL;
>>         }
>
> Oh I see. Thanks.
>
>> It has not occurred to me before, but in the spirit of commit
>> 8d74e9f88d65 "net: avoid skb_warn_bad_offload on IS_ERR" [1], we could
>> tighten the check to exclude cases when segs == NULL. I'm thinking of:
>>
>>         if (segs != skb && !IS_ERR_OR_NULL(segs) && unlikely(skb_needs_check(skb, tx_path)))
>>                 skb_warn_bad_offload(skb);
>
> That looks sensible to me. And nicer than the ip_summed conversion in
> udp_send_skb.

I've audited all existing ->gso_segment callbacks. skb_mac_gso_segment()
returns no segments, that is segs == NULL, if the callback chain ends
with either of these:

  … → udp[46]_ufo_fragment → __udp_gso_segment → skb_gso_ok == true
  … → tcp[46]_gso_segment → tcp_gso_segment → skb_gso_ok == true
  … → sctp_gso_segment → skb_gso_ok == true

IOW when the device advertises that it can handle the desired GSO kind
(skb_gso_ok() returns true).

Considering that a device offering HW GSO and no checksum offload at the
same time makes no sense, I also think that tweaking the bad offload
detection to exclude the !segs case doesn't deprive us of diagnostics.

I will change to that in v2.
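
To make the effect of the proposed change concrete, here is a small truth-table sketch comparing the current warning condition with the tightened one discussed in this thread. It is a userspace model, not kernel code: the Segs labels are hypothetical names for the possible results of skb_mac_gso_segment(), and skb_needs_check() is assumed to behave as the patch description states (on the TX path, anything other than CHECKSUM_PARTIAL or CHECKSUM_UNNECESSARY needs a check).

```python
from enum import Enum
from itertools import product


class Csum(Enum):
    NONE = 0
    UNNECESSARY = 1
    COMPLETE = 2
    PARTIAL = 3


class Segs(Enum):
    """Hypothetical labels for what skb_mac_gso_segment() can hand back."""
    SAME_SKB = "segs == skb"
    NULL = "segs == NULL"
    ERR = "IS_ERR(segs)"
    NEW_LIST = "new segment list"


def skb_needs_check(ip_summed, tx_path=True):
    # Assumed model of the kernel helper of the same name.
    if tx_path:
        return ip_summed not in (Csum.PARTIAL, Csum.UNNECESSARY)
    return ip_summed is Csum.NONE


def warns_current(segs, ip_summed):
    # segs != skb && skb_needs_check(skb, tx_path) && !IS_ERR(segs)
    return (segs is not Segs.SAME_SKB
            and skb_needs_check(ip_summed)
            and segs is not Segs.ERR)


def warns_proposed(segs, ip_summed):
    # segs != skb && !IS_ERR_OR_NULL(segs) && skb_needs_check(skb, tx_path)
    return (segs is not Segs.SAME_SKB
            and segs not in (Segs.ERR, Segs.NULL)
            and skb_needs_check(ip_summed))


# Collect every combination where the two conditions disagree.
changed = [(s, c) for s, c in product(Segs, Csum)
           if warns_current(s, c) != warns_proposed(s, c)]

for segs, csum in changed:
    print(f"{segs.value}, ip_summed={csum.name}: warning no longer fires")
```

In this model the two conditions disagree only when segs is NULL and the skb would otherwise need a check, which is the case the audit above attributes to skb_gso_ok() returning true.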
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 49c622e743e8..b7254b8a1e56 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -946,6 +946,13 @@ static int udp_send_skb(struct sk_buff *skb, struct flowi4 *fl4,
 	}
 
 	if (datalen > cork->gso_size) {
+		/* On the TX path CHECKSUM_NONE and CHECKSUM_UNNECESSARY
+		 * have the same meaning. However, check for bad
+		 * offloads in the GSO stack expects the latter, if the
+		 * checksum can be calculated in software.
+		 */
+		if (skb->ip_summed == CHECKSUM_NONE)
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
 		skb_shinfo(skb)->gso_size = cork->gso_size;
 		skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
 		skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index aa2e0a28ca61..59448a2dbf2c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -357,14 +357,6 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 	else
 		uh->check = gso_make_checksum(seg, ~check) ? : CSUM_MANGLED_0;
 
-	/* On the TX path, CHECKSUM_NONE and CHECKSUM_UNNECESSARY have the same
-	 * meaning. However, check for bad offloads in the GSO stack expects the
-	 * latter, if the checksum was calculated in software. To vouch for the
-	 * segment skbs we actually need to set it on the gso_skb.
-	 */
-	if (gso_skb->ip_summed == CHECKSUM_NONE)
-		gso_skb->ip_summed = CHECKSUM_UNNECESSARY;
-
 	/* update refcount for the packet */
 	if (copy_dtor) {
 		int delta = sum_truesize - gso_skb->truesize;
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 6602a2e9cdb5..360392fc2b68 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -1262,6 +1262,13 @@ static int udp_v6_send_skb(struct sk_buff *skb, struct flowi6 *fl6,
 	}
 
 	if (datalen > cork->gso_size) {
+		/* On the TX path CHECKSUM_NONE and CHECKSUM_UNNECESSARY
+		 * have the same meaning. However, check for bad
+		 * offloads in the GSO stack expects the latter, if the
+		 * checksum can be calculated in software.
+		 */
+		if (skb->ip_summed == CHECKSUM_NONE)
+			skb->ip_summed = CHECKSUM_UNNECESSARY;
 		skb_shinfo(skb)->gso_size = cork->gso_size;
 		skb_shinfo(skb)->gso_type = SKB_GSO_UDP_L4;
 		skb_shinfo(skb)->gso_segs = DIV_ROUND_UP(datalen,
In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
checksum offload") we have added a tweak in the UDP GSO code to mark GSO
packets being sent out as CHECKSUM_UNNECESSARY when the egress device
doesn't support checksum offload. This was done to satisfy the offload
checks in the gso stack.

However, when sending a UDP GSO packet from a tunnel device, we will go
through the TX path and the GSO offload twice. Once for the tunnel device,
which acts as a passthru for GSO packets, and once for the underlying
egress device.

Even though a tunnel device acts as a passthru for a UDP GSO packet, GSO
offload checks still happen on transmit from a tunnel device. So if the skb
is not marked as CHECKSUM_UNNECESSARY or CHECKSUM_PARTIAL, we will get a
warning from the gso stack.

Today this can occur in two situations, which we check for in
__ip_append_data() and __ip6_append_data():

1) when the tunnel device does not advertise checksum offload, or
2) when there are IPv6 extension headers present.

Syzbot has triggered the second case, producing a warning as below:

  ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
  WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
  Modules linked in:
  CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
  RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
  [...]
  Call Trace:
   <TASK>
   __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
   skb_gso_segment include/net/gso.h:83 [inline]
   validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
   __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
   neigh_output include/net/neighbour.h:542 [inline]
   ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
   ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
   ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
   udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
   udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0xef/0x270 net/socket.c:745
   ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
   ___sys_sendmsg net/socket.c:2639 [inline]
   __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
   __do_sys_sendmmsg net/socket.c:2754 [inline]
   __se_sys_sendmmsg net/socket.c:2751 [inline]
   __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   [...]
   </TASK>

To fix it mark UDP_GSO packets as CHECKSUM_UNNECESSARY early on the TX
path, when still in the udp layer, since we need to have ip_summed set up
correctly for GSO processing by tunnel devices.

Note that even if a GSO packet gets marked as CHECKSUM_PARTIAL due to the
tunnel advertising HW csum offload, it will not prevent software csum
offload in UDP GSO from kicking in if the underlying device doesn't offer
csum offload (for example, a TUN/TAP device with default config). This is
because we recheck device features in the gso stack instead of relying on
the ip_summed hint.

Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/ipv4/udp.c         | 7 +++++++
 net/ipv4/udp_offload.c | 8 --------
 net/ipv6/udp.c         | 7 +++++++
 3 files changed, 14 insertions(+), 8 deletions(-)
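
For readers who want to see what the v1 hunk in udp_send_skb() and udp_v6_send_skb() does to ip_summed, the following is a minimal userspace model. It is a sketch under stated assumptions, not kernel code: mark_udp_gso and needs_check_on_tx are hypothetical helper names, the CHECKSUM_* constants are plain integers chosen for illustration, and the TX-path check is modelled on the patch description (a warning is possible unless the skb is CHECKSUM_PARTIAL or CHECKSUM_UNNECESSARY).

```python
# Illustrative stand-ins for the kernel's ip_summed states.
CHECKSUM_NONE, CHECKSUM_UNNECESSARY, CHECKSUM_COMPLETE, CHECKSUM_PARTIAL = range(4)


def mark_udp_gso(ip_summed, datalen, gso_size):
    """Model of the v1 hunk: only GSO-sized sends (datalen > gso_size)
    that are still CHECKSUM_NONE get promoted to CHECKSUM_UNNECESSARY."""
    if datalen > gso_size and ip_summed == CHECKSUM_NONE:
        return CHECKSUM_UNNECESSARY
    return ip_summed


def needs_check_on_tx(ip_summed):
    """Assumed TX-path behaviour of the GSO stack's offload check."""
    return ip_summed not in (CHECKSUM_PARTIAL, CHECKSUM_UNNECESSARY)


# The repro's send: 3000 bytes of payload with a GSO size of 145.
before = CHECKSUM_NONE
after = mark_udp_gso(before, datalen=3000, gso_size=145)

print("needs check before marking:", needs_check_on_tx(before))
print("needs check after marking: ", needs_check_on_tx(after))
```

Note that the thread above settles on a different approach for v2 (excluding the segs == NULL case from the bad offload check), so this model only describes the v1 patch as posted.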