Message ID | 20240801-udp-gso-egress-from-tunnel-v2-1-9a2af2f15d8d@cloudflare.com (mailing list archive) |
---|---|
State | Deferred |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | Silence bad offload warning when sending UDP GSO with IPv6 extension headers | expand |
On Thu, Aug 1, 2024 at 10:09 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > > In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no > checksum offload") we have intentionally allowed UDP GSO packets marked > CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and > checksummed by a software fallback when the egress device lacks these > features. > > What was not taken into consideration is that a CHECKSUM_NONE skb can be > handed over to the GSO stack also when the egress device advertises the > tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature. > > This can happen in two situations, which we detect in __ip_append_data() > and __ip6_append_data(): > > 1) when there are IPv6 extension headers present, or > 2) when the tunnel device does not advertise checksum offload. > > Note that in the latter case we have a nonsensical device configuration. > Device support for UDP segmentation offload requires checksum offload in > hardware as well. > > Syzbot has discovered the first case, producing a warning as below: > > ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869) > WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 > Modules linked in: > CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0 > Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 > RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 > [...] > Call Trace: > <TASK> > __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127 > skb_gso_segment include/net/gso.h:83 [inline] > validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661 > __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415 > neigh_output include/net/neighbour.h:542 [inline] > ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137 > ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222 > ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958 > udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292 > udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588 > sock_sendmsg_nosec net/socket.c:730 [inline] > __sock_sendmsg+0xef/0x270 net/socket.c:745 > ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585 > ___sys_sendmsg net/socket.c:2639 [inline] > __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725 > __do_sys_sendmmsg net/socket.c:2754 [inline] > __se_sys_sendmmsg net/socket.c:2751 [inline] > __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751 > do_syscall_x64 arch/x86/entry/common.c:52 [inline] > do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 > entry_SYSCALL_64_after_hwframe+0x77/0x7f > [...] > </TASK> > > We are hitting the bad offload warning because when an egress device is > capable of handling segmentation offload requested by > skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce > any segment skbs and return NULL. See the skb_gso_ok() branch in > {__udp,tcp,sctp}_gso_segment helpers. > > To fix it, skip bad offload detection when gso_segment has returned > nothing. We know that in such case the egress device supports the desired > GSO offload, which implies that it can fill in L4 checksums. Hence we don't > need to check the skb->ip_summed value, which reflects the egress device > checksum capabilities. > > Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") > Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com > Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/ > Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> Reviewed-by: Willem de Bruijn <willemb@google.com> It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just being ignored and devices are trusted to always be able to checksum offload when they can segment offload -- even when the device does not advertise checksum offload. I think we should have a follow-on that makes advertising NETIF_F_GSO_UDP_L4 dependent on having at least one of the NETIF_F_*_CSUM bits set (handwaving over what happens when only advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM).
On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote: > On Thu, Aug 1, 2024 at 10:09 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: >> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no >> checksum offload") we have intentionally allowed UDP GSO packets marked >> CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and >> checksummed by a software fallback when the egress device lacks these >> features. >> >> What was not taken into consideration is that a CHECKSUM_NONE skb can be >> handed over to the GSO stack also when the egress device advertises the >> tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature. >> >> This can happen in two situations, which we detect in __ip_append_data() >> and __ip6_append_data(): >> >> 1) when there are IPv6 extension headers present, or >> 2) when the tunnel device does not advertise checksum offload. >> >> Note that in the latter case we have a nonsensical device configuration. >> Device support for UDP segmentation offload requires checksum offload in >> hardware as well. >> >> Syzbot has discovered the first case, producing a warning as below: >> >> ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869) >> WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 >> Modules linked in: >> CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0 >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 >> RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 >> [...] >> Call Trace: >> <TASK> >> __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127 >> skb_gso_segment include/net/gso.h:83 [inline] >> validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661 >> __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415 >> neigh_output include/net/neighbour.h:542 [inline] >> ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137 >> ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222 >> ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958 >> udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292 >> udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588 >> sock_sendmsg_nosec net/socket.c:730 [inline] >> __sock_sendmsg+0xef/0x270 net/socket.c:745 >> ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585 >> ___sys_sendmsg net/socket.c:2639 [inline] >> __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725 >> __do_sys_sendmmsg net/socket.c:2754 [inline] >> __se_sys_sendmmsg net/socket.c:2751 [inline] >> __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751 >> do_syscall_x64 arch/x86/entry/common.c:52 [inline] >> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 >> entry_SYSCALL_64_after_hwframe+0x77/0x7f >> [...] >> </TASK> >> >> We are hitting the bad offload warning because when an egress device is >> capable of handling segmentation offload requested by >> skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce >> any segment skbs and return NULL. See the skb_gso_ok() branch in >> {__udp,tcp,sctp}_gso_segment helpers. >> >> To fix it, skip bad offload detection when gso_segment has returned >> nothing. We know that in such case the egress device supports the desired >> GSO offload, which implies that it can fill in L4 checksums. Hence we don't >> need to check the skb->ip_summed value, which reflects the egress device >> checksum capabilities. >> >> Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") >> Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com >> Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/ >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> > > Reviewed-by: Willem de Bruijn <willemb@google.com> > > It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just > being ignored and devices are trusted to always be able to checksum > offload when they can segment offload -- even when the device does not > advertise checksum offload. > > I think we should have a follow-on that makes advertising > NETIF_F_GSO_UDP_L4 dependent on having at least one of the > NETIF_F_*_CSUM bits set (handwaving over what happens when only > advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM). I agree. I've also gained some clarity as to how the fix should look. Let's circle back to it, if we still think it's relevant once we hash out the fix. After spending some quality time debugging the addded regression test [1], I've realized this fix is wrong. You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload"), I've opened up the UDP_SEGMENT API to two uses, which I think should not be allowed: 1. Hardware USO for IPv6 dgrams with extension headers Previously that led to -EIO, because __ip6_append_data won't annotate such packets as CHECKSUM_PARTIAL. I'm guessing that we do this because some drivers that advertise csum offload can't actually handle checksumming when extension headers are present. Extension headers are not part of IPv6 pseudo header, but who knows what limitations NIC firmwares have. Either way, changing it just like that sounds risky, so I think we need to fall back to software USO with software checksum in this case. Alternatively, we could catch it in the udp layer and error out with EIO as ealier. But that shifts some burden onto the user space (detect and segment before sendmsg()). 2. Hardware USO when hardware csum is unsupported / disabled That sounds like a pathological device configuration case, but since it is possible today with some drivers to disable csum offload for one IP version and not the other, it seems safest to just handle that gracefully. I don't know why one might want to do that. Perhaps as a workaround for some firmware bug while waiting for a fix? In this scenario I think we also need to fall back to software USO and checksum. Code-wise that could look like below. WDYT? [1] I feel silly for submitting a broken patch. I've been running a stale test prog in a VM. It seems that virtme-ng with a read+write overlay for rootfs played a trick on me. Changes to the host files are not (always?) visible to the guest. ---8<--- diff --git a/net/core/skbuff.c b/net/core/skbuff.c index 83f8cd8aa2d1..819173807c81 100644 --- a/net/core/skbuff.c +++ b/net/core/skbuff.c @@ -4806,7 +4806,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, goto perform_csum_check; if (!sg) { - if (!csum) { + /* Fall back to software checksum for segments if: + * 1) device can't checksum this network protocol, OR + * 2) we consider the packet to be not checksummable in + * hardware, for example IPv6 extension headers are + * present. + */ + if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) { if (!nskb->remcsum_offload) nskb->ip_summed = CHECKSUM_NONE; SKB_GSO_CB(nskb)->csum = @@ -4896,7 +4902,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, nskb->truesize += nskb->data_len; perform_csum_check: - if (!csum) { + if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) { if (skb_has_shared_frag(nskb) && __skb_linearize(nskb)) goto err; diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index bc8a9da750fe..0a34b418b83c 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -282,7 +282,15 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, skb_transport_header(gso_skb))) return ERR_PTR(-EINVAL); - if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) { + /* Pass-through to device for segmentation only if: + * 1) we consider the packet checksummable in hardware, that is no + * IPv6 extension headers present, AND + * 2) device supports checksum offload for this network protocol + * (NETIF_F_{IP,IPV6}_CSUM or NETIF_F_HW_CSUM), AND + * 3) device supports the requested GSO kind. + */ + if (gso_skb->ip_summed == CHECKSUM_PARTIAL && + skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) { /* Packet is from an untrusted source, reset gso_segs. */ skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh), mss);
Jakub Sitnicki wrote: > On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote: > > On Thu, Aug 1, 2024 at 10:09 AM Jakub Sitnicki <jakub@cloudflare.com> wrote: > >> > >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no > >> checksum offload") we have intentionally allowed UDP GSO packets marked > >> CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and > >> checksummed by a software fallback when the egress device lacks these > >> features. > >> > >> What was not taken into consideration is that a CHECKSUM_NONE skb can be > >> handed over to the GSO stack also when the egress device advertises the > >> tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature. > >> > >> This can happen in two situations, which we detect in __ip_append_data() > >> and __ip6_append_data(): > >> > >> 1) when there are IPv6 extension headers present, or > >> 2) when the tunnel device does not advertise checksum offload. > >> > >> Note that in the latter case we have a nonsensical device configuration. > >> Device support for UDP segmentation offload requires checksum offload in > >> hardware as well. > >> > >> Syzbot has discovered the first case, producing a warning as below: > >> > >> ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869) > >> WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 > >> Modules linked in: > >> CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0 > >> Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 > >> RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 > >> [...] > >> Call Trace: > >> <TASK> > >> __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127 > >> skb_gso_segment include/net/gso.h:83 [inline] > >> validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661 > >> __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415 > >> neigh_output include/net/neighbour.h:542 [inline] > >> ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137 > >> ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222 > >> ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958 > >> udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292 > >> udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588 > >> sock_sendmsg_nosec net/socket.c:730 [inline] > >> __sock_sendmsg+0xef/0x270 net/socket.c:745 > >> ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585 > >> ___sys_sendmsg net/socket.c:2639 [inline] > >> __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725 > >> __do_sys_sendmmsg net/socket.c:2754 [inline] > >> __se_sys_sendmmsg net/socket.c:2751 [inline] > >> __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751 > >> do_syscall_x64 arch/x86/entry/common.c:52 [inline] > >> do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 > >> entry_SYSCALL_64_after_hwframe+0x77/0x7f > >> [...] > >> </TASK> > >> > >> We are hitting the bad offload warning because when an egress device is > >> capable of handling segmentation offload requested by > >> skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce > >> any segment skbs and return NULL. See the skb_gso_ok() branch in > >> {__udp,tcp,sctp}_gso_segment helpers. > >> > >> To fix it, skip bad offload detection when gso_segment has returned > >> nothing. We know that in such case the egress device supports the desired > >> GSO offload, which implies that it can fill in L4 checksums. Hence we don't > >> need to check the skb->ip_summed value, which reflects the egress device > >> checksum capabilities. > >> > >> Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") > >> Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com > >> Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/ > >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> > > > > Reviewed-by: Willem de Bruijn <willemb@google.com> > > > > It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just > > being ignored and devices are trusted to always be able to checksum > > offload when they can segment offload -- even when the device does not > > advertise checksum offload. > > > > I think we should have a follow-on that makes advertising > > NETIF_F_GSO_UDP_L4 dependent on having at least one of the > > NETIF_F_*_CSUM bits set (handwaving over what happens when only > > advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM). > > I agree. I've also gained some clarity as to how the fix should > look. Let's circle back to it, if we still think it's relevant once we > hash out the fix. > > After spending some quality time debugging the addded regression test > [1], I've realized this fix is wrong. > > You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices > with no checksum offload"), I've opened up the UDP_SEGMENT API to two > uses, which I think should not be allowed: > > 1. Hardware USO for IPv6 dgrams with extension headers > > Previously that led to -EIO, because __ip6_append_data won't annotate > such packets as CHECKSUM_PARTIAL. > > I'm guessing that we do this because some drivers that advertise csum > offload can't actually handle checksumming when extension headers are > present. > > Extension headers are not part of IPv6 pseudo header, but who knows what > limitations NIC firmwares have. > > Either way, changing it just like that sounds risky, so I think we need > to fall back to software USO with software checksum in this case. > > Alternatively, we could catch it in the udp layer and error out with EIO > as ealier. But that shifts some burden onto the user space (detect and > segment before sendmsg()). > > 2. Hardware USO when hardware csum is unsupported / disabled > > That sounds like a pathological device configuration case, but since it > is possible today with some drivers to disable csum offload for one IP > version and not the other, it seems safest to just handle that > gracefully. > > I don't know why one might want to do that. Perhaps as a workaround for > some firmware bug while waiting for a fix? I doubt that this is actually used. But today it can be configured. Which is why I suggested making NETIF_F_GSO_UDP_L4 dependent on csum offload (in netdev_fix_features). I doubt that that will break any real user. > In this scenario I think we also need to fall back to software USO and > checksum. > > Code-wise that could look like below. WDYT? Since this only affects USO, can we fix this is in __udp_gso_segment. Basically, not taking the NETIF_F_GSO_ROBUST path. skb_segment is so complicated already. Whatever we can do to avoid adding to that. > [1] I feel silly for submitting a broken patch. I've been running a > stale test prog in a VM. It seems that virtme-ng with a read+write > overlay for rootfs played a trick on me. Changes to the host files are > not (always?) visible to the guest. > > ---8<--- > diff --git a/net/core/skbuff.c b/net/core/skbuff.c > index 83f8cd8aa2d1..819173807c81 100644 > --- a/net/core/skbuff.c > +++ b/net/core/skbuff.c > @@ -4806,7 +4806,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, > goto perform_csum_check; > > if (!sg) { > - if (!csum) { > + /* Fall back to software checksum for segments if: > + * 1) device can't checksum this network protocol, OR > + * 2) we consider the packet to be not checksummable in > + * hardware, for example IPv6 extension headers are > + * present. > + */ > + if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) { > if (!nskb->remcsum_offload) > nskb->ip_summed = CHECKSUM_NONE; > SKB_GSO_CB(nskb)->csum = > @@ -4896,7 +4902,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb, > nskb->truesize += nskb->data_len; > > perform_csum_check: > - if (!csum) { > + if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) { > if (skb_has_shared_frag(nskb) && > __skb_linearize(nskb)) > goto err; > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index bc8a9da750fe..0a34b418b83c 100644 > --- a/net/ipv4/udp_offload.c > +++ b/net/ipv4/udp_offload.c > @@ -282,7 +282,15 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb, > skb_transport_header(gso_skb))) > return ERR_PTR(-EINVAL); > > - if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) { > + /* Pass-through to device for segmentation only if: > + * 1) we consider the packet checksummable in hardware, that is no > + * IPv6 extension headers present, AND > + * 2) device supports checksum offload for this network protocol > + * (NETIF_F_{IP,IPV6}_CSUM or NETIF_F_HW_CSUM), AND > + * 3) device supports the requested GSO kind. > + */ > + if (gso_skb->ip_summed == CHECKSUM_PARTIAL && > + skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) { > /* Packet is from an untrusted source, reset gso_segs. */ > skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh), > mss);
On Mon, Aug 05, 2024 at 10:25 AM -04, Willem de Bruijn wrote: > Jakub Sitnicki wrote: >> On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote: [...] >> > It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just >> > being ignored and devices are trusted to always be able to checksum >> > offload when they can segment offload -- even when the device does not >> > advertise checksum offload. >> > >> > I think we should have a follow-on that makes advertising >> > NETIF_F_GSO_UDP_L4 dependent on having at least one of the >> > NETIF_F_*_CSUM bits set (handwaving over what happens when only >> > advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM). >> >> I agree. I've also gained some clarity as to how the fix should >> look. Let's circle back to it, if we still think it's relevant once we >> hash out the fix. >> >> After spending some quality time debugging the addded regression test >> [1], I've realized this fix is wrong. >> >> You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices >> with no checksum offload"), I've opened up the UDP_SEGMENT API to two >> uses, which I think should not be allowed: >> >> 1. Hardware USO for IPv6 dgrams with extension headers >> >> Previously that led to -EIO, because __ip6_append_data won't annotate >> such packets as CHECKSUM_PARTIAL. >> >> I'm guessing that we do this because some drivers that advertise csum >> offload can't actually handle checksumming when extension headers are >> present. >> >> Extension headers are not part of IPv6 pseudo header, but who knows what >> limitations NIC firmwares have. >> >> Either way, changing it just like that sounds risky, so I think we need >> to fall back to software USO with software checksum in this case. >> >> Alternatively, we could catch it in the udp layer and error out with EIO >> as ealier. But that shifts some burden onto the user space (detect and >> segment before sendmsg()). >> >> 2. Hardware USO when hardware csum is unsupported / disabled >> >> That sounds like a pathological device configuration case, but since it >> is possible today with some drivers to disable csum offload for one IP >> version and not the other, it seems safest to just handle that >> gracefully. >> >> I don't know why one might want to do that. Perhaps as a workaround for >> some firmware bug while waiting for a fix? > > I doubt that this is actually used. But today it can be configured. > > Which is why I suggested making NETIF_F_GSO_UDP_L4 dependent on csum > offload (in netdev_fix_features). I doubt that that will break any > real user. Sounds like a plan. If we're talking about simply disabling GSO_UDP_L4 whenever either NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM is "off", then that is straightforward. And the NETIF_F_HW_CSUM dependency is clear. I could even piggy back it on this series, at the risk of additional iterations. > >> In this scenario I think we also need to fall back to software USO and >> checksum. >> >> Code-wise that could look like below. WDYT? > > Since this only affects USO, can we fix this is in __udp_gso_segment. > Basically, not taking the NETIF_F_GSO_ROBUST path. > > skb_segment is so complicated already. Whatever we can do to avoid > adding to that. skb_segment is a complex beast. No disagreement there. Keeping the changes down seems doable. We can drive skb_segment to compute the checksum, when we know that's needed (because IPv6 extension headers are present -> ip_summed is CHECKSUM_NONE) by masking off csum flags. Thanks for the suggestion. [...]
Jakub Sitnicki wrote: > On Mon, Aug 05, 2024 at 10:25 AM -04, Willem de Bruijn wrote: > > Jakub Sitnicki wrote: > >> On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote: > > [...] > > >> > It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just > >> > being ignored and devices are trusted to always be able to checksum > >> > offload when they can segment offload -- even when the device does not > >> > advertise checksum offload. > >> > > >> > I think we should have a follow-on that makes advertising > >> > NETIF_F_GSO_UDP_L4 dependent on having at least one of the > >> > NETIF_F_*_CSUM bits set (handwaving over what happens when only > >> > advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM). > >> > >> I agree. I've also gained some clarity as to how the fix should > >> look. Let's circle back to it, if we still think it's relevant once we > >> hash out the fix. > >> > >> After spending some quality time debugging the addded regression test > >> [1], I've realized this fix is wrong. > >> > >> You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices > >> with no checksum offload"), I've opened up the UDP_SEGMENT API to two > >> uses, which I think should not be allowed: > >> > >> 1. Hardware USO for IPv6 dgrams with extension headers > >> > >> Previously that led to -EIO, because __ip6_append_data won't annotate > >> such packets as CHECKSUM_PARTIAL. > >> > >> I'm guessing that we do this because some drivers that advertise csum > >> offload can't actually handle checksumming when extension headers are > >> present. > >> > >> Extension headers are not part of IPv6 pseudo header, but who knows what > >> limitations NIC firmwares have. > >> > >> Either way, changing it just like that sounds risky, so I think we need > >> to fall back to software USO with software checksum in this case. > >> > >> Alternatively, we could catch it in the udp layer and error out with EIO > >> as ealier. But that shifts some burden onto the user space (detect and > >> segment before sendmsg()). > >> > >> 2. Hardware USO when hardware csum is unsupported / disabled > >> > >> That sounds like a pathological device configuration case, but since it > >> is possible today with some drivers to disable csum offload for one IP > >> version and not the other, it seems safest to just handle that > >> gracefully. > >> > >> I don't know why one might want to do that. Perhaps as a workaround for > >> some firmware bug while waiting for a fix? > > > > I doubt that this is actually used. But today it can be configured. > > > > Which is why I suggested making NETIF_F_GSO_UDP_L4 dependent on csum > > offload (in netdev_fix_features). I doubt that that will break any > > real user. > > Sounds like a plan. If we're talking about simply disabling GSO_UDP_L4 > whenever either NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM is "off", then that > is straightforward. And the NETIF_F_HW_CSUM dependency is clear. > > I could even piggy back it on this series, at the risk of additional > iterations. A two patch series SGTM. > > > >> In this scenario I think we also need to fall back to software USO and > >> checksum. > >> > >> Code-wise that could look like below. WDYT? > > > > Since this only affects USO, can we fix this is in __udp_gso_segment. > > Basically, not taking the NETIF_F_GSO_ROBUST path. > > > > skb_segment is so complicated already. Whatever we can do to avoid > > adding to that. > > skb_segment is a complex beast. No disagreement there. > > Keeping the changes down seems doable. We can drive skb_segment to > compute the checksum, when we know that's needed (because IPv6 extension > headers are present -> ip_summed is CHECKSUM_NONE) by masking off csum > flags. Thanks for the suggestion. Perfect. The effort to add a selftest already proved its worth btw :)
diff --git a/net/core/gso.c b/net/core/gso.c index bcd156372f4d..88d5ca7d4fe7 100644 --- a/net/core/gso.c +++ b/net/core/gso.c @@ -122,10 +122,12 @@ struct sk_buff *__skb_gso_segment(struct sk_buff *skb, skb_reset_mac_len(skb); segs = skb_mac_gso_segment(skb, features); + if (IS_ERR_OR_NULL(segs)) + goto out; - if (segs != skb && unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs))) + if (segs != skb && unlikely(skb_needs_check(skb, tx_path))) skb_warn_bad_offload(skb); - +out: return segs; } EXPORT_SYMBOL(__skb_gso_segment);
In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") we have intentionally allowed UDP GSO packets marked CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and checksummed by a software fallback when the egress device lacks these features. What was not taken into consideration is that a CHECKSUM_NONE skb can be handed over to the GSO stack also when the egress device advertises the tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature. This can happen in two situations, which we detect in __ip_append_data() and __ip6_append_data(): 1) when there are IPv6 extension headers present, or 2) when the tunnel device does not advertise checksum offload. Note that in the latter case we have a nonsensical device configuration. Device support for UDP segmentation offload requires checksum offload in hardware as well. Syzbot has discovered the first case, producing a warning as below: ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869) WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 Modules linked in: CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024 RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291 [...] Call Trace: <TASK> __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127 skb_gso_segment include/net/gso.h:83 [inline] validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661 __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415 neigh_output include/net/neighbour.h:542 [inline] ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137 ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222 ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958 udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292 udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588 sock_sendmsg_nosec net/socket.c:730 [inline] __sock_sendmsg+0xef/0x270 net/socket.c:745 ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585 ___sys_sendmsg net/socket.c:2639 [inline] __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725 __do_sys_sendmmsg net/socket.c:2754 [inline] __se_sys_sendmmsg net/socket.c:2751 [inline] __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751 do_syscall_x64 arch/x86/entry/common.c:52 [inline] do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83 entry_SYSCALL_64_after_hwframe+0x77/0x7f [...] </TASK> We are hitting the bad offload warning because when an egress device is capable of handling segmentation offload requested by skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce any segment skbs and return NULL. See the skb_gso_ok() branch in {__udp,tcp,sctp}_gso_segment helpers. To fix it, skip bad offload detection when gso_segment has returned nothing. We know that in such case the egress device supports the desired GSO offload, which implies that it can fill in L4 checksums. Hence we don't need to check the skb->ip_summed value, which reflects the egress device checksum capabilities. Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload") Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/ Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com> --- net/core/gso.c | 6 ++++-- 1 file changed, 4 insertions(+), 2 deletions(-)