diff mbox series

[net,v2,1/2] gso: Skip bad offload detection when device supports requested GSO

Message ID 20240801-udp-gso-egress-from-tunnel-v2-1-9a2af2f15d8d@cloudflare.com (mailing list archive)
State Deferred
Delegated to: Netdev Maintainers
Headers show
Series Silence bad offload warning when sending UDP GSO with IPv6 extension headers | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 42 this patch: 42
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 2 maintainers not CCed: dsahern@kernel.org soheil@google.com
netdev/build_clang success Errors and warnings before: 43 this patch: 43
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 43 this patch: 43
netdev/checkpatch warning WARNING: Possible repeated word: 'Google'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 9 this patch: 9
netdev/source_inline success Was 0 now: 0
netdev/contest warning net-next-2024-08-01--21-00 (tests: 346)

Commit Message

Jakub Sitnicki Aug. 1, 2024, 1:52 p.m. UTC
In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
checksum offload") we have intentionally allowed UDP GSO packets marked
CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and
checksummed by a software fallback when the egress device lacks these
features.

What was not taken into consideration is that a CHECKSUM_NONE skb can be
handed over to the GSO stack also when the egress device advertises the
tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature.

This can happen in two situations, which we detect in __ip_append_data()
and __ip6_append_data():

1) when there are IPv6 extension headers present, or
2) when the tunnel device does not advertise checksum offload.

Note that in the latter case we have a nonsensical device configuration.
Device support for UDP segmentation offload requires checksum offload in
hardware as well.

Syzbot has discovered the first case, producing a warning as below:

  ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
  WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
  Modules linked in:
  CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
  Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
  RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
  [...]
  Call Trace:
   <TASK>
   __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
   skb_gso_segment include/net/gso.h:83 [inline]
   validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
   __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
   neigh_output include/net/neighbour.h:542 [inline]
   ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
   ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
   ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
   udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
   udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
   sock_sendmsg_nosec net/socket.c:730 [inline]
   __sock_sendmsg+0xef/0x270 net/socket.c:745
   ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
   ___sys_sendmsg net/socket.c:2639 [inline]
   __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
   __do_sys_sendmmsg net/socket.c:2754 [inline]
   __se_sys_sendmmsg net/socket.c:2751 [inline]
   __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
   do_syscall_x64 arch/x86/entry/common.c:52 [inline]
   do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
   entry_SYSCALL_64_after_hwframe+0x77/0x7f
   [...]
   </TASK>

We are hitting the bad offload warning because when an egress device is
capable of handling segmentation offload requested by
skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce
any segment skbs and return NULL. See the skb_gso_ok() branch in
{__udp,tcp,sctp}_gso_segment helpers.

To fix it, skip bad offload detection when gso_segment has returned
nothing. We know that in such case the egress device supports the desired
GSO offload, which implies that it can fill in L4 checksums. Hence we don't
need to check the skb->ip_summed value, which reflects the egress device
checksum capabilities.

Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
---
 net/core/gso.c | 6 ++++--
 1 file changed, 4 insertions(+), 2 deletions(-)

Comments

Willem de Bruijn Aug. 1, 2024, 7:13 p.m. UTC | #1
On Thu, Aug 1, 2024 at 10:09 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>
> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> checksum offload") we have intentionally allowed UDP GSO packets marked
> CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and
> checksummed by a software fallback when the egress device lacks these
> features.
>
> What was not taken into consideration is that a CHECKSUM_NONE skb can be
> handed over to the GSO stack also when the egress device advertises the
> tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature.
>
> This can happen in two situations, which we detect in __ip_append_data()
> and __ip6_append_data():
>
> 1) when there are IPv6 extension headers present, or
> 2) when the tunnel device does not advertise checksum offload.
>
> Note that in the latter case we have a nonsensical device configuration.
> Device support for UDP segmentation offload requires checksum offload in
> hardware as well.
>
> Syzbot has discovered the first case, producing a warning as below:
>
>   ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
>   WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
>   Modules linked in:
>   CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
>   RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
>   [...]
>   Call Trace:
>    <TASK>
>    __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
>    skb_gso_segment include/net/gso.h:83 [inline]
>    validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
>    __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
>    neigh_output include/net/neighbour.h:542 [inline]
>    ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
>    ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
>    ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
>    udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
>    udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
>    sock_sendmsg_nosec net/socket.c:730 [inline]
>    __sock_sendmsg+0xef/0x270 net/socket.c:745
>    ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
>    ___sys_sendmsg net/socket.c:2639 [inline]
>    __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
>    __do_sys_sendmmsg net/socket.c:2754 [inline]
>    __se_sys_sendmmsg net/socket.c:2751 [inline]
>    __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
>    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>    do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>    entry_SYSCALL_64_after_hwframe+0x77/0x7f
>    [...]
>    </TASK>
>
> We are hitting the bad offload warning because when an egress device is
> capable of handling segmentation offload requested by
> skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce
> any segment skbs and return NULL. See the skb_gso_ok() branch in
> {__udp,tcp,sctp}_gso_segment helpers.
>
> To fix it, skip bad offload detection when gso_segment has returned
> nothing. We know that in such case the egress device supports the desired
> GSO offload, which implies that it can fill in L4 checksums. Hence we don't
> need to check the skb->ip_summed value, which reflects the egress device
> checksum capabilities.
>
> Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
> Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>

Reviewed-by: Willem de Bruijn <willemb@google.com>

It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just
being ignored and devices are trusted to always be able to checksum
offload when they can segment offload -- even when the device does not
advertise checksum offload.

I think we should have a follow-on that makes advertising
NETIF_F_GSO_UDP_L4 dependent on having at least one of the
NETIF_F_*_CSUM bits set (handwaving over what happens when only
advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM).
Jakub Sitnicki Aug. 5, 2024, 10:04 a.m. UTC | #2
On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote:
> On Thu, Aug 1, 2024 at 10:09 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
>>
>> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
>> checksum offload") we have intentionally allowed UDP GSO packets marked
>> CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and
>> checksummed by a software fallback when the egress device lacks these
>> features.
>>
>> What was not taken into consideration is that a CHECKSUM_NONE skb can be
>> handed over to the GSO stack also when the egress device advertises the
>> tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature.
>>
>> This can happen in two situations, which we detect in __ip_append_data()
>> and __ip6_append_data():
>>
>> 1) when there are IPv6 extension headers present, or
>> 2) when the tunnel device does not advertise checksum offload.
>>
>> Note that in the latter case we have a nonsensical device configuration.
>> Device support for UDP segmentation offload requires checksum offload in
>> hardware as well.
>>
>> Syzbot has discovered the first case, producing a warning as below:
>>
>>   ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
>>   WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
>>   Modules linked in:
>>   CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
>>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
>>   RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
>>   [...]
>>   Call Trace:
>>    <TASK>
>>    __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
>>    skb_gso_segment include/net/gso.h:83 [inline]
>>    validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
>>    __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
>>    neigh_output include/net/neighbour.h:542 [inline]
>>    ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
>>    ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
>>    ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
>>    udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
>>    udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
>>    sock_sendmsg_nosec net/socket.c:730 [inline]
>>    __sock_sendmsg+0xef/0x270 net/socket.c:745
>>    ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
>>    ___sys_sendmsg net/socket.c:2639 [inline]
>>    __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
>>    __do_sys_sendmmsg net/socket.c:2754 [inline]
>>    __se_sys_sendmmsg net/socket.c:2751 [inline]
>>    __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
>>    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
>>    do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
>>    entry_SYSCALL_64_after_hwframe+0x77/0x7f
>>    [...]
>>    </TASK>
>>
>> We are hitting the bad offload warning because when an egress device is
>> capable of handling segmentation offload requested by
>> skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce
>> any segment skbs and return NULL. See the skb_gso_ok() branch in
>> {__udp,tcp,sctp}_gso_segment helpers.
>>
>> To fix it, skip bad offload detection when gso_segment has returned
>> nothing. We know that in such case the egress device supports the desired
>> GSO offload, which implies that it can fill in L4 checksums. Hence we don't
>> need to check the skb->ip_summed value, which reflects the egress device
>> checksum capabilities.
>>
>> Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
>> Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
>> Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
>> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
>
> Reviewed-by: Willem de Bruijn <willemb@google.com>
>
> It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just
> being ignored and devices are trusted to always be able to checksum
> offload when they can segment offload -- even when the device does not
> advertise checksum offload.
>
> I think we should have a follow-on that makes advertising
> NETIF_F_GSO_UDP_L4 dependent on having at least one of the
> NETIF_F_*_CSUM bits set (handwaving over what happens when only
> advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM).

I agree. I've also gained some clarity as to how the fix should
look. Let's circle back to it, if we still think it's relevant once we
hash out the fix.

After spending some quality time debugging the addded regression test
[1], I've realized this fix is wrong.

You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices
with no checksum offload"), I've opened up the UDP_SEGMENT API to two
uses, which I think should not be allowed:

1. Hardware USO for IPv6 dgrams with extension headers

Previously that led to -EIO, because __ip6_append_data won't annotate
such packets as CHECKSUM_PARTIAL.

I'm guessing that we do this because some drivers that advertise csum
offload can't actually handle checksumming when extension headers are
present.

Extension headers are not part of IPv6 pseudo header, but who knows what
limitations NIC firmwares have.

Either way, changing it just like that sounds risky, so I think we need
to fall back to software USO with software checksum in this case.

Alternatively, we could catch it in the udp layer and error out with EIO
as ealier. But that shifts some burden onto the user space (detect and
segment before sendmsg()).

2. Hardware USO when hardware csum is unsupported / disabled

That sounds like a pathological device configuration case, but since it
is possible today with some drivers to disable csum offload for one IP
version and not the other, it seems safest to just handle that
gracefully.

I don't know why one might want to do that. Perhaps as a workaround for
some firmware bug while waiting for a fix?

In this scenario I think we also need to fall back to software USO and
checksum.

Code-wise that could look like below. WDYT?

[1] I feel silly for submitting a broken patch. I've been running a
stale test prog in a VM. It seems that virtme-ng with a read+write
overlay for rootfs played a trick on me. Changes to the host files are
not (always?) visible to the guest.

---8<---
diff --git a/net/core/skbuff.c b/net/core/skbuff.c
index 83f8cd8aa2d1..819173807c81 100644
--- a/net/core/skbuff.c
+++ b/net/core/skbuff.c
@@ -4806,7 +4806,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 			goto perform_csum_check;
 
 		if (!sg) {
-			if (!csum) {
+			/* Fall back to software checksum for segments if:
+			 * 1) device can't checksum this network protocol, OR
+			 * 2) we consider the packet to be not checksummable in
+			 *    hardware, for example IPv6 extension headers are
+			 *    present.
+			 */
+			if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) {
 				if (!nskb->remcsum_offload)
 					nskb->ip_summed = CHECKSUM_NONE;
 				SKB_GSO_CB(nskb)->csum =
@@ -4896,7 +4902,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
 		nskb->truesize += nskb->data_len;
 
 perform_csum_check:
-		if (!csum) {
+		if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) {
 			if (skb_has_shared_frag(nskb) &&
 			    __skb_linearize(nskb))
 				goto err;
diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index bc8a9da750fe..0a34b418b83c 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -282,7 +282,15 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
 		     skb_transport_header(gso_skb)))
 		return ERR_PTR(-EINVAL);
 
-	if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
+	/* Pass-through to device for segmentation only if:
+	 * 1) we consider the packet checksummable in hardware, that is no
+	 *    IPv6 extension headers present, AND
+	 * 2) device supports checksum offload for this network protocol
+	 *    (NETIF_F_{IP,IPV6}_CSUM or NETIF_F_HW_CSUM), AND
+	 * 3) device supports the requested GSO kind.
+	 */
+	if (gso_skb->ip_summed == CHECKSUM_PARTIAL &&
+	    skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
 		/* Packet is from an untrusted source, reset gso_segs. */
 		skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
 							     mss);
Willem de Bruijn Aug. 5, 2024, 2:25 p.m. UTC | #3
Jakub Sitnicki wrote:
> On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote:
> > On Thu, Aug 1, 2024 at 10:09 AM Jakub Sitnicki <jakub@cloudflare.com> wrote:
> >>
> >> In commit 10154dbded6d ("udp: Allow GSO transmit from devices with no
> >> checksum offload") we have intentionally allowed UDP GSO packets marked
> >> CHECKSUM_NONE to pass to the GSO stack, so that they can be segmented and
> >> checksummed by a software fallback when the egress device lacks these
> >> features.
> >>
> >> What was not taken into consideration is that a CHECKSUM_NONE skb can be
> >> handed over to the GSO stack also when the egress device advertises the
> >> tx-udp-segmentation / NETIF_F_GSO_UDP_L4 feature.
> >>
> >> This can happen in two situations, which we detect in __ip_append_data()
> >> and __ip6_append_data():
> >>
> >> 1) when there are IPv6 extension headers present, or
> >> 2) when the tunnel device does not advertise checksum offload.
> >>
> >> Note that in the latter case we have a nonsensical device configuration.
> >> Device support for UDP segmentation offload requires checksum offload in
> >> hardware as well.
> >>
> >> Syzbot has discovered the first case, producing a warning as below:
> >>
> >>   ip6tnl0: caps=(0x00000006401d7869, 0x00000006401d7869)
> >>   WARNING: CPU: 0 PID: 5112 at net/core/dev.c:3293 skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
> >>   Modules linked in:
> >>   CPU: 0 PID: 5112 Comm: syz-executor391 Not tainted 6.10.0-rc7-syzkaller-01603-g80ab5445da62 #0
> >>   Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 06/07/2024
> >>   RIP: 0010:skb_warn_bad_offload+0x166/0x1a0 net/core/dev.c:3291
> >>   [...]
> >>   Call Trace:
> >>    <TASK>
> >>    __skb_gso_segment+0x3be/0x4c0 net/core/gso.c:127
> >>    skb_gso_segment include/net/gso.h:83 [inline]
> >>    validate_xmit_skb+0x585/0x1120 net/core/dev.c:3661
> >>    __dev_queue_xmit+0x17a4/0x3e90 net/core/dev.c:4415
> >>    neigh_output include/net/neighbour.h:542 [inline]
> >>    ip6_finish_output2+0xffa/0x1680 net/ipv6/ip6_output.c:137
> >>    ip6_finish_output+0x41e/0x810 net/ipv6/ip6_output.c:222
> >>    ip6_send_skb+0x112/0x230 net/ipv6/ip6_output.c:1958
> >>    udp_v6_send_skb+0xbf5/0x1870 net/ipv6/udp.c:1292
> >>    udpv6_sendmsg+0x23b3/0x3270 net/ipv6/udp.c:1588
> >>    sock_sendmsg_nosec net/socket.c:730 [inline]
> >>    __sock_sendmsg+0xef/0x270 net/socket.c:745
> >>    ____sys_sendmsg+0x525/0x7d0 net/socket.c:2585
> >>    ___sys_sendmsg net/socket.c:2639 [inline]
> >>    __sys_sendmmsg+0x3b2/0x740 net/socket.c:2725
> >>    __do_sys_sendmmsg net/socket.c:2754 [inline]
> >>    __se_sys_sendmmsg net/socket.c:2751 [inline]
> >>    __x64_sys_sendmmsg+0xa0/0xb0 net/socket.c:2751
> >>    do_syscall_x64 arch/x86/entry/common.c:52 [inline]
> >>    do_syscall_64+0xf3/0x230 arch/x86/entry/common.c:83
> >>    entry_SYSCALL_64_after_hwframe+0x77/0x7f
> >>    [...]
> >>    </TASK>
> >>
> >> We are hitting the bad offload warning because when an egress device is
> >> capable of handling segmentation offload requested by
> >> skb_shinfo(skb)->gso_type, the chain of gso_segment callbacks won't produce
> >> any segment skbs and return NULL. See the skb_gso_ok() branch in
> >> {__udp,tcp,sctp}_gso_segment helpers.
> >>
> >> To fix it, skip bad offload detection when gso_segment has returned
> >> nothing. We know that in such case the egress device supports the desired
> >> GSO offload, which implies that it can fill in L4 checksums. Hence we don't
> >> need to check the skb->ip_summed value, which reflects the egress device
> >> checksum capabilities.
> >>
> >> Fixes: 10154dbded6d ("udp: Allow GSO transmit from devices with no checksum offload")
> >> Reported-by: syzbot+e15b7e15b8a751a91d9a@syzkaller.appspotmail.com
> >> Closes: https://lore.kernel.org/all/000000000000e1609a061d5330ce@google.com/
> >> Signed-off-by: Jakub Sitnicki <jakub@cloudflare.com>
> >
> > Reviewed-by: Willem de Bruijn <willemb@google.com>
> >
> > It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just
> > being ignored and devices are trusted to always be able to checksum
> > offload when they can segment offload -- even when the device does not
> > advertise checksum offload.
> >
> > I think we should have a follow-on that makes advertising
> > NETIF_F_GSO_UDP_L4 dependent on having at least one of the
> > NETIF_F_*_CSUM bits set (handwaving over what happens when only
> > advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM).
> 
> I agree. I've also gained some clarity as to how the fix should
> look. Let's circle back to it, if we still think it's relevant once we
> hash out the fix.
> 
> After spending some quality time debugging the addded regression test
> [1], I've realized this fix is wrong.
> 
> You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices
> with no checksum offload"), I've opened up the UDP_SEGMENT API to two
> uses, which I think should not be allowed:
> 
> 1. Hardware USO for IPv6 dgrams with extension headers
> 
> Previously that led to -EIO, because __ip6_append_data won't annotate
> such packets as CHECKSUM_PARTIAL.
> 
> I'm guessing that we do this because some drivers that advertise csum
> offload can't actually handle checksumming when extension headers are
> present.
> 
> Extension headers are not part of IPv6 pseudo header, but who knows what
> limitations NIC firmwares have.
> 
> Either way, changing it just like that sounds risky, so I think we need
> to fall back to software USO with software checksum in this case.
> 
> Alternatively, we could catch it in the udp layer and error out with EIO
> as ealier. But that shifts some burden onto the user space (detect and
> segment before sendmsg()).
> 
> 2. Hardware USO when hardware csum is unsupported / disabled
> 
> That sounds like a pathological device configuration case, but since it
> is possible today with some drivers to disable csum offload for one IP
> version and not the other, it seems safest to just handle that
> gracefully.
> 
> I don't know why one might want to do that. Perhaps as a workaround for
> some firmware bug while waiting for a fix?

I doubt that this is actually used. But today it can be configured.

Which is why I suggested making NETIF_F_GSO_UDP_L4 dependent on csum
offload (in netdev_fix_features). I doubt that that will break any
real user.
 
> In this scenario I think we also need to fall back to software USO and
> checksum.
> 
> Code-wise that could look like below. WDYT?

Since this only affects USO, can we fix this is in __udp_gso_segment.
Basically, not taking the NETIF_F_GSO_ROBUST path.

skb_segment is so complicated already. Whatever we can do to avoid
adding to that.

> [1] I feel silly for submitting a broken patch. I've been running a
> stale test prog in a VM. It seems that virtme-ng with a read+write
> overlay for rootfs played a trick on me. Changes to the host files are
> not (always?) visible to the guest.
> 
> ---8<---
> diff --git a/net/core/skbuff.c b/net/core/skbuff.c
> index 83f8cd8aa2d1..819173807c81 100644
> --- a/net/core/skbuff.c
> +++ b/net/core/skbuff.c
> @@ -4806,7 +4806,13 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  			goto perform_csum_check;
>  
>  		if (!sg) {
> -			if (!csum) {
> +			/* Fall back to software checksum for segments if:
> +			 * 1) device can't checksum this network protocol, OR
> +			 * 2) we consider the packet to be not checksummable in
> +			 *    hardware, for example IPv6 extension headers are
> +			 *    present.
> +			 */
> +			if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) {
>  				if (!nskb->remcsum_offload)
>  					nskb->ip_summed = CHECKSUM_NONE;
>  				SKB_GSO_CB(nskb)->csum =
> @@ -4896,7 +4902,7 @@ struct sk_buff *skb_segment(struct sk_buff *head_skb,
>  		nskb->truesize += nskb->data_len;
>  
>  perform_csum_check:
> -		if (!csum) {
> +		if (!csum || head_skb->ip_summed != CHECKSUM_PARTIAL) {
>  			if (skb_has_shared_frag(nskb) &&
>  			    __skb_linearize(nskb))
>  				goto err;
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index bc8a9da750fe..0a34b418b83c 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -282,7 +282,15 @@ struct sk_buff *__udp_gso_segment(struct sk_buff *gso_skb,
>  		     skb_transport_header(gso_skb)))
>  		return ERR_PTR(-EINVAL);
>  
> -	if (skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
> +	/* Pass-through to device for segmentation only if:
> +	 * 1) we consider the packet checksummable in hardware, that is no
> +	 *    IPv6 extension headers present, AND
> +	 * 2) device supports checksum offload for this network protocol
> +	 *    (NETIF_F_{IP,IPV6}_CSUM or NETIF_F_HW_CSUM), AND
> +	 * 3) device supports the requested GSO kind.
> +	 */
> +	if (gso_skb->ip_summed == CHECKSUM_PARTIAL &&
> +	    skb_gso_ok(gso_skb, features | NETIF_F_GSO_ROBUST)) {
>  		/* Packet is from an untrusted source, reset gso_segs. */
>  		skb_shinfo(gso_skb)->gso_segs = DIV_ROUND_UP(gso_skb->len - sizeof(*uh),
>  							     mss);
Jakub Sitnicki Aug. 5, 2024, 3:59 p.m. UTC | #4
On Mon, Aug 05, 2024 at 10:25 AM -04, Willem de Bruijn wrote:
> Jakub Sitnicki wrote:
>> On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote:

[...]

>> > It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just
>> > being ignored and devices are trusted to always be able to checksum
>> > offload when they can segment offload -- even when the device does not
>> > advertise checksum offload.
>> >
>> > I think we should have a follow-on that makes advertising
>> > NETIF_F_GSO_UDP_L4 dependent on having at least one of the
>> > NETIF_F_*_CSUM bits set (handwaving over what happens when only
>> > advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM).
>> 
>> I agree. I've also gained some clarity as to how the fix should
>> look. Let's circle back to it, if we still think it's relevant once we
>> hash out the fix.
>> 
>> After spending some quality time debugging the addded regression test
>> [1], I've realized this fix is wrong.
>> 
>> You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices
>> with no checksum offload"), I've opened up the UDP_SEGMENT API to two
>> uses, which I think should not be allowed:
>> 
>> 1. Hardware USO for IPv6 dgrams with extension headers
>> 
>> Previously that led to -EIO, because __ip6_append_data won't annotate
>> such packets as CHECKSUM_PARTIAL.
>> 
>> I'm guessing that we do this because some drivers that advertise csum
>> offload can't actually handle checksumming when extension headers are
>> present.
>> 
>> Extension headers are not part of IPv6 pseudo header, but who knows what
>> limitations NIC firmwares have.
>> 
>> Either way, changing it just like that sounds risky, so I think we need
>> to fall back to software USO with software checksum in this case.
>> 
>> Alternatively, we could catch it in the udp layer and error out with EIO
>> as ealier. But that shifts some burden onto the user space (detect and
>> segment before sendmsg()).
>> 
>> 2. Hardware USO when hardware csum is unsupported / disabled
>> 
>> That sounds like a pathological device configuration case, but since it
>> is possible today with some drivers to disable csum offload for one IP
>> version and not the other, it seems safest to just handle that
>> gracefully.
>> 
>> I don't know why one might want to do that. Perhaps as a workaround for
>> some firmware bug while waiting for a fix?
>
> I doubt that this is actually used. But today it can be configured.
>
> Which is why I suggested making NETIF_F_GSO_UDP_L4 dependent on csum
> offload (in netdev_fix_features). I doubt that that will break any
> real user.

Sounds like a plan. If we're talking about simply disabling GSO_UDP_L4
whenever either NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM is "off", then that
is straightforward. And the NETIF_F_HW_CSUM dependency is clear.

I could even piggy back it on this series, at the risk of additional
iterations.

>  
>> In this scenario I think we also need to fall back to software USO and
>> checksum.
>> 
>> Code-wise that could look like below. WDYT?
>
> Since this only affects USO, can we fix this is in __udp_gso_segment.
> Basically, not taking the NETIF_F_GSO_ROBUST path.
>
> skb_segment is so complicated already. Whatever we can do to avoid
> adding to that.

skb_segment is a complex beast. No disagreement there.

Keeping the changes down seems doable. We can drive skb_segment to
compute the checksum, when we know that's needed (because IPv6 extension
headers are present -> ip_summed is CHECKSUM_NONE) by masking off csum
flags. Thanks for the suggestion.

[...]
Willem de Bruijn Aug. 5, 2024, 4:11 p.m. UTC | #5
Jakub Sitnicki wrote:
> On Mon, Aug 05, 2024 at 10:25 AM -04, Willem de Bruijn wrote:
> > Jakub Sitnicki wrote:
> >> On Thu, Aug 01, 2024 at 03:13 PM -04, Willem de Bruijn wrote:
> 
> [...]
> 
> >> > It's a bit odd, in that the ip_summed == CHECKSUM_NONE ends up just
> >> > being ignored and devices are trusted to always be able to checksum
> >> > offload when they can segment offload -- even when the device does not
> >> > advertise checksum offload.
> >> >
> >> > I think we should have a follow-on that makes advertising
> >> > NETIF_F_GSO_UDP_L4 dependent on having at least one of the
> >> > NETIF_F_*_CSUM bits set (handwaving over what happens when only
> >> > advertising NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM).
> >> 
> >> I agree. I've also gained some clarity as to how the fix should
> >> look. Let's circle back to it, if we still think it's relevant once we
> >> hash out the fix.
> >> 
> >> After spending some quality time debugging the addded regression test
> >> [1], I've realized this fix is wrong.
> >> 
> >> You see, with commit 10154dbded6d ("udp: Allow GSO transmit from devices
> >> with no checksum offload"), I've opened up the UDP_SEGMENT API to two
> >> uses, which I think should not be allowed:
> >> 
> >> 1. Hardware USO for IPv6 dgrams with extension headers
> >> 
> >> Previously that led to -EIO, because __ip6_append_data won't annotate
> >> such packets as CHECKSUM_PARTIAL.
> >> 
> >> I'm guessing that we do this because some drivers that advertise csum
> >> offload can't actually handle checksumming when extension headers are
> >> present.
> >> 
> >> Extension headers are not part of IPv6 pseudo header, but who knows what
> >> limitations NIC firmwares have.
> >> 
> >> Either way, changing it just like that sounds risky, so I think we need
> >> to fall back to software USO with software checksum in this case.
> >> 
> >> Alternatively, we could catch it in the udp layer and error out with EIO
> >> as ealier. But that shifts some burden onto the user space (detect and
> >> segment before sendmsg()).
> >> 
> >> 2. Hardware USO when hardware csum is unsupported / disabled
> >> 
> >> That sounds like a pathological device configuration case, but since it
> >> is possible today with some drivers to disable csum offload for one IP
> >> version and not the other, it seems safest to just handle that
> >> gracefully.
> >> 
> >> I don't know why one might want to do that. Perhaps as a workaround for
> >> some firmware bug while waiting for a fix?
> >
> > I doubt that this is actually used. But today it can be configured.
> >
> > Which is why I suggested making NETIF_F_GSO_UDP_L4 dependent on csum
> > offload (in netdev_fix_features). I doubt that that will break any
> > real user.
> 
> Sounds like a plan. If we're talking about simply disabling GSO_UDP_L4
> whenever either NETIF_F_IP_CSUM or NETIF_F_IPV6_CSUM is "off", then that
> is straightforward. And the NETIF_F_HW_CSUM dependency is clear.
> 
> I could even piggy back it on this series, at the risk of additional
> iterations.

A two patch series SGTM.

> >  
> >> In this scenario I think we also need to fall back to software USO and
> >> checksum.
> >> 
> >> Code-wise that could look like below. WDYT?
> >
> > Since this only affects USO, can we fix this is in __udp_gso_segment.
> > Basically, not taking the NETIF_F_GSO_ROBUST path.
> >
> > skb_segment is so complicated already. Whatever we can do to avoid
> > adding to that.
> 
> skb_segment is a complex beast. No disagreement there.
> 
> Keeping the changes down seems doable. We can drive skb_segment to
> compute the checksum, when we know that's needed (because IPv6 extension
> headers are present -> ip_summed is CHECKSUM_NONE) by masking off csum
> flags. Thanks for the suggestion.

Perfect.

The effort to add a selftest already proved its worth btw :)
diff mbox series

Patch

diff --git a/net/core/gso.c b/net/core/gso.c
index bcd156372f4d..88d5ca7d4fe7 100644
--- a/net/core/gso.c
+++ b/net/core/gso.c
@@ -122,10 +122,12 @@  struct sk_buff *__skb_gso_segment(struct sk_buff *skb,
 	skb_reset_mac_len(skb);
 
 	segs = skb_mac_gso_segment(skb, features);
+	if (IS_ERR_OR_NULL(segs))
+		goto out;
 
-	if (segs != skb && unlikely(skb_needs_check(skb, tx_path) && !IS_ERR(segs)))
+	if (segs != skb && unlikely(skb_needs_check(skb, tx_path)))
 		skb_warn_bad_offload(skb);
-
+out:
 	return segs;
 }
 EXPORT_SYMBOL(__skb_gso_segment);