
[ipsec,v2] xfrm: check MAC header is shown with both skb->mac_len and skb_mac_header_was_set()

Message ID 20240912071702.221128-1-en-wei.wu@canonical.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Series [ipsec,v2] xfrm: check MAC header is shown with both skb->mac_len and skb_mac_header_was_set()

Checks

Context Check Description
netdev/series_format warning Single patches do not need cover letters; Target tree name not specified in the subject
netdev/tree_selection success Guessed tree name to be net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 7 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 19 this patch: 19
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

En-Wei Wu Sept. 12, 2024, 7:17 a.m. UTC
When we use Intel WWAN with xfrm, our system always hangs after
browsing websites for a few seconds. The error message shows that
it is a slab-out-of-bounds error:

[ 67.162014] BUG: KASAN: slab-out-of-bounds in xfrm_input+0x426e/0x6740
[ 67.162030] Write of size 2 at addr ffff888156cb814b by task ksoftirqd/2/26

[ 67.162043] CPU: 2 UID: 0 PID: 26 Comm: ksoftirqd/2 Not tainted 6.11.0-rc6-c763c4339688+ #2
[ 67.162053] Hardware name: Dell Inc. Latitude 5340/0SG010, BIOS 1.15.0 07/15/2024
[ 67.162058] Call Trace:
[ 67.162062] <TASK>
[ 67.162068] dump_stack_lvl+0x76/0xa0
[ 67.162079] print_report+0xce/0x5f0
[ 67.162088] ? xfrm_input+0x426e/0x6740
[ 67.162096] ? kasan_complete_mode_report_info+0x26/0x200
[ 67.162105] ? xfrm_input+0x426e/0x6740
[ 67.162112] kasan_report+0xbe/0x110
[ 67.162119] ? xfrm_input+0x426e/0x6740
[ 67.162129] __asan_report_store_n_noabort+0x12/0x30
[ 67.162138] xfrm_input+0x426e/0x6740
[ 67.162149] ? __pfx_xfrm_input+0x10/0x10
[ 67.162160] ? __kasan_check_read+0x11/0x20
[ 67.162168] ? __call_rcu_common+0x3e7/0x15b0
[ 67.162178] xfrm4_rcv_encap+0x214/0x470
[ 67.162186] ? __xfrm4_udp_encap_rcv.part.0+0x3cd/0x560
[ 67.162195] xfrm4_udp_encap_rcv+0xdd/0xf0
[ 67.162203] udp_queue_rcv_one_skb+0x880/0x12f0
[ 67.162212] udp_queue_rcv_skb+0x139/0xa90
[ 67.162221] udp_unicast_rcv_skb+0x116/0x350
[ 67.162229] __udp4_lib_rcv+0x213b/0x3410
[ 67.162237] ? ldsem_down_write+0x211/0x4ed
[ 67.162246] ? __pfx___udp4_lib_rcv+0x10/0x10
[ 67.162254] ? __pfx_raw_local_deliver+0x10/0x10
[ 67.162262] ? __pfx_cache_tag_flush_range_np+0x10/0x10
[ 67.162273] udp_rcv+0x86/0xb0
[ 67.162280] ip_protocol_deliver_rcu+0x152/0x380
[ 67.162289] ip_local_deliver_finish+0x282/0x370
[ 67.162296] ip_local_deliver+0x1a8/0x380
[ 67.162303] ? __pfx_ip_local_deliver+0x10/0x10
[ 67.162310] ? ip_rcv_finish_core.constprop.0+0x481/0x1ce0
[ 67.162317] ? ip_rcv_core+0x5df/0xd60
[ 67.162325] ip_rcv+0x2fc/0x380
[ 67.162332] ? __pfx_ip_rcv+0x10/0x10
[ 67.162338] ? __pfx_dma_map_page_attrs+0x10/0x10
[ 67.162346] ? __kasan_check_write+0x14/0x30
[ 67.162354] ? __build_skb_around+0x23a/0x350
[ 67.162363] ? __pfx_ip_rcv+0x10/0x10
[ 67.162369] __netif_receive_skb_one_core+0x173/0x1d0
[ 67.162377] ? __pfx___netif_receive_skb_one_core+0x10/0x10
[ 67.162386] ? __kasan_check_write+0x14/0x30
[ 67.162394] ? _raw_spin_lock_irq+0x8b/0x100
[ 67.162402] __netif_receive_skb+0x21/0x160
[ 67.162409] process_backlog+0x1c0/0x590
[ 67.162417] __napi_poll+0xab/0x550
[ 67.162425] net_rx_action+0x53e/0xd10
[ 67.162434] ? __pfx_net_rx_action+0x10/0x10
[ 67.162443] ? __pfx_wake_up_var+0x10/0x10
[ 67.162453] ? tasklet_action_common.constprop.0+0x22c/0x670
[ 67.162463] handle_softirqs+0x18f/0x5d0
[ 67.162472] ? __pfx_run_ksoftirqd+0x10/0x10
[ 67.162480] run_ksoftirqd+0x3c/0x60
[ 67.162487] smpboot_thread_fn+0x2f3/0x700
[ 67.162497] kthread+0x2b5/0x390
[ 67.162505] ? __pfx_smpboot_thread_fn+0x10/0x10
[ 67.162512] ? __pfx_kthread+0x10/0x10
[ 67.162519] ret_from_fork+0x43/0x90
[ 67.162527] ? __pfx_kthread+0x10/0x10
[ 67.162534] ret_from_fork_asm+0x1a/0x30
[ 67.162544] </TASK>

[ 67.162551] The buggy address belongs to the object at ffff888156cb8000
                which belongs to the cache kmalloc-rnd-09-8k of size 8192
[ 67.162557] The buggy address is located 331 bytes inside of
                allocated 8192-byte region [ffff888156cb8000, ffff888156cba000)

[ 67.162566] The buggy address belongs to the physical page:
[ 67.162570] page: refcount:1 mapcount:0 mapping:0000000000000000 index:0x0 pfn:0x156cb8
[ 67.162578] head: order:3 mapcount:0 entire_mapcount:0 nr_pages_mapped:0 pincount:0
[ 67.162583] flags: 0x17ffffc0000040(head|node=0|zone=2|lastcpupid=0x1fffff)
[ 67.162591] page_type: 0xfdffffff(slab)
[ 67.162599] raw: 0017ffffc0000040 ffff888100056780 dead000000000122 0000000000000000
[ 67.162605] raw: 0000000000000000 0000000080020002 00000001fdffffff 0000000000000000
[ 67.162611] head: 0017ffffc0000040 ffff888100056780 dead000000000122 0000000000000000
[ 67.162616] head: 0000000000000000 0000000080020002 00000001fdffffff 0000000000000000
[ 67.162621] head: 0017ffffc0000003 ffffea00055b2e01 ffffffffffffffff 0000000000000000
[ 67.162626] head: 0000000000000008 0000000000000000 00000000ffffffff 0000000000000000
[ 67.162630] page dumped because: kasan: bad access detected

[ 67.162636] Memory state around the buggy address:
[ 67.162640] ffff888156cb8000: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 67.162645] ffff888156cb8080: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 67.162650] >ffff888156cb8100: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 67.162653] ^
[ 67.162658] ffff888156cb8180: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc
[ 67.162663] ffff888156cb8200: fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc fc

The reason is that eth_hdr(skb) in the if statement evaluated to an
invalid address because skb->mac_header was ~0U (meaning there is no
MAC header). Since skb->mac_len is unreliable in that case, the if
statement was true even though there was no MAC header in the skb
data buffer.

Checking both skb->mac_len and skb_mac_header_was_set(skb) fixes this issue.

Fixes: 87cdf3148b11 ("xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto")
Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
---
Changes in v2:
* Change the title from "xfrm: avoid using skb->mac_len to decide if mac header is shown"
* Keep the skb->mac_len check
* Apply the fix to the IPv6 path too
---
 net/xfrm/xfrm_input.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Sept. 12, 2024, 7:35 a.m. UTC | #1
On Thu, Sep 12, 2024 at 9:17 AM En-Wei Wu <en-wei.wu@canonical.com> wrote:
>
> When we use Intel WWAN with xfrm, our system always hangs after
> browsing websites for a few seconds. The error message shows that
> it is a slab-out-of-bounds error:
>
> [ 67.162014] BUG: KASAN: slab-out-of-bounds in xfrm_input+0x426e/0x6740
> [ 67.162030] Write of size 2 at addr ffff888156cb814b by task ksoftirqd/2/26
>
> The reason is that eth_hdr(skb) in the if statement evaluated to an
> invalid address because skb->mac_header was ~0U (meaning there is no
> MAC header). Since skb->mac_len is unreliable in that case, the if
> statement was true even though there was no MAC header in the skb
> data buffer.
>
> Checking both skb->mac_len and skb_mac_header_was_set(skb) fixes this issue.
>
> Fixes: 87cdf3148b11 ("xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto")
> Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
> ---
> Changes in v2:
> * Change the title from "xfrm: avoid using skb->mac_len to decide if mac header is shown"
> * Keep the skb->mac_len check
> * Apply the fix to the IPv6 path too
> ---
>  net/xfrm/xfrm_input.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 749e7eea99e4..eef0145c73a7 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -251,7 +251,7 @@ static int xfrm4_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
>
>         skb_reset_network_header(skb);
>         skb_mac_header_rebuild(skb);
> -       if (skb->mac_len)
> +       if (skb->mac_len && skb_mac_header_was_set(skb))
>                 eth_hdr(skb)->h_proto = skb->protocol;

I would swap the two conditions: in future debug kernels, we might
leave mac_len uninitialized if mac_header was never set.

It would be nice to catch the issue sooner.
Is something calling skb_reset_mac_len() while mac_header is not set?
Looking at the stack trace, I cannot see why mac_header would not be set.
Could you try the following patch, and compile your test kernel with
CONFIG_DEBUG_NET=y ?

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 39f1d16f362887821caa022464695c4045461493..fb06dc81039253bafeb49f0b7228748e898f480f 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2909,9 +2909,19 @@ static inline void skb_reset_inner_headers(struct sk_buff *skb)
        skb->inner_transport_header = skb->transport_header;
 }

+static inline int skb_mac_header_was_set(const struct sk_buff *skb)
+{
+       return skb->mac_header != (typeof(skb->mac_header))~0U;
+}
+
 static inline void skb_reset_mac_len(struct sk_buff *skb)
 {
-       skb->mac_len = skb->network_header - skb->mac_header;
+       if (!skb_mac_header_was_set(skb)) {
+               DEBUG_NET_WARN_ON_ONCE(1);
+               skb->mac_len = 0;
+       } else {
+               skb->mac_len = skb->network_header - skb->mac_header;
+       }
 }

 static inline unsigned char *skb_inner_transport_header(const struct sk_buff
@@ -3014,11 +3024,6 @@ static inline void skb_set_network_header(struct sk_buff *skb, const int offset)
        skb->network_header += offset;
 }

-static inline int skb_mac_header_was_set(const struct sk_buff *skb)
-{
-       return skb->mac_header != (typeof(skb->mac_header))~0U;
-}
-
 static inline unsigned char *skb_mac_header(const struct sk_buff *skb)
 {
        DEBUG_NET_WARN_ON_ONCE(!skb_mac_header_was_set(skb));
Peter Seiderer Sept. 12, 2024, 9:35 a.m. UTC | #2
Hello *,

On Thu, 12 Sep 2024 15:17:02 +0800, En-Wei Wu <en-wei.wu@canonical.com> wrote:

> When we use Intel WWAN with xfrm, our system always hangs after
> browsing websites for a few seconds. The error message shows that
> it is a slab-out-of-bounds error:
>
> [ 67.162014] BUG: KASAN: slab-out-of-bounds in xfrm_input+0x426e/0x6740
> [ 67.162030] Write of size 2 at addr ffff888156cb814b by task ksoftirqd/2/26
>
> The reason is that eth_hdr(skb) in the if statement evaluated to an
> invalid address because skb->mac_header was ~0U (meaning there is no
> MAC header). Since skb->mac_len is unreliable in that case, the if
> statement was true even though there was no MAC header in the skb
> data buffer.
>
> Checking both skb->mac_len and skb_mac_header_was_set(skb) fixes this issue.
>
> Fixes: 87cdf3148b11 ("xfrm: Verify MAC header exists before overwriting eth_hdr(skb)->h_proto")
> Signed-off-by: En-Wei Wu <en-wei.wu@canonical.com>
> ---
> Changes in v2:
> * Change the title from "xfrm: avoid using skb->mac_len to decide if mac header is shown"
> * Keep the skb->mac_len check
> * Apply the fix to the IPv6 path too
> ---
>  net/xfrm/xfrm_input.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
> index 749e7eea99e4..eef0145c73a7 100644
> --- a/net/xfrm/xfrm_input.c
> +++ b/net/xfrm/xfrm_input.c
> @@ -251,7 +251,7 @@ static int xfrm4_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
>
>  	skb_reset_network_header(skb);
>  	skb_mac_header_rebuild(skb);
> -	if (skb->mac_len)
> +	if (skb->mac_len && skb_mac_header_was_set(skb))
>  		eth_hdr(skb)->h_proto = skb->protocol;
>
>  	err = 0;
> @@ -288,7 +288,7 @@ static int xfrm6_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
>
>  	skb_reset_network_header(skb);
>  	skb_mac_header_rebuild(skb);
> -	if (skb->mac_len)
> +	if (skb->mac_len && skb_mac_header_was_set(skb))
>  		eth_hdr(skb)->h_proto = skb->protocol;
>
>  	err = 0;

Same change (and request for more debugging) already suggested in 2023, see [1]...

Regards,
Peter

[1] https://lore.kernel.org/netdev/d1cf5a66-03e1-44b8-929d-ac123b1bbd7b@sylv.io/T/
Eric Dumazet Sept. 12, 2024, 10:53 a.m. UTC | #3
On Thu, Sep 12, 2024 at 11:35 AM Peter Seiderer <ps.report@gmx.net> wrote:
>

> Same change (and request for more debugging) already suggested in 2023, see [1]...
>
> Regards,
> Peter
>
> [1] https://lore.kernel.org/netdev/d1cf5a66-03e1-44b8-929d-ac123b1bbd7b@sylv.io/T/

Indeed !
Nice to see some consistency among us :)
En-Wei Wu Sept. 13, 2024, 5:29 a.m. UTC | #4
> Could you try the following patch, and compile your test kernel with
> CONFIG_DEBUG_NET=y ?
[  323.870221] ------------[ cut here ]------------
[  323.870226] WARNING: CPU: 2 PID: 26 at include/linux/skbuff.h:2904 __netif_receive_skb_core.constprop.0+0x201/0x39d0
[  323.870369] CPU: 2 UID: 0 PID: 26 Comm: ksoftirqd/2 Not tainted 6.11.0-rc6-c763c4339688+ #12
[  323.870372] Hardware name: Dell Inc. Latitude 5340/0SG010, BIOS 1.15.0 07/15/2024
[  323.870373] RIP: 0010:__netif_receive_skb_core.constprop.0+0x201/0x39d0
[  323.870376] Code: 03 0f b6 14 02 48 89 f8 83 e0 07 83 c0 01 38 d0 7c 08 84 d2 0f 85 b4 24 00 00 41 0f b7 87 ba 00 00 00 29 c3 66 83 f8 ff 75 04 <0f> 0b 31 db 48 b8 00 00 00 00 00 fc ff df 49 8d 7f 78 48 89 fa 48
[  323.870378] RSP: 0018:ffffc90000377838 EFLAGS: 00010246
[  323.870380] RAX: 000000000000ffff RBX: 00000000ffff0061 RCX: ffff88876cf48090
[  323.870381] RDX: 0000000000000000 RSI: 0000000000000000 RDI: ffff8881756b2e7a
[  323.870382] RBP: ffffc90000377a88 R08: ffff88876cf48184 R09: 0000000000000000
[  323.870383] R10: 0000000000000000 R11: 1ffff1102ead65b9 R12: ffff8881756b2dc0
[  323.870384] R13: ffffc90000377b20 R14: ffff8881635ca000 R15: ffff8881756b2dc0
[  323.870385] FS:  0000000000000000(0000) GS:ffff88876cf00000(0000) knlGS:0000000000000000
[  323.870387] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  323.870388] CR2: 0000769acfa9d080 CR3: 0000000712498000 CR4: 0000000000f50ef0
[  323.870389] PKRU: 55555554
[  323.870390] Call Trace:
[  323.870391]  <TASK>
[  323.870393]  ? show_regs+0x71/0x90
[  323.870397]  ? __warn+0xce/0x270
[  323.870399]  ? __netif_receive_skb_core.constprop.0+0x201/0x39d0
[  323.870401]  ? report_bug+0x2ad/0x300
[  323.870404]  ? handle_bug+0x46/0x90
[  323.870407]  ? exc_invalid_op+0x19/0x50
[  323.870409]  ? asm_exc_invalid_op+0x1b/0x20
[  323.870413]  ? __netif_receive_skb_core.constprop.0+0x201/0x39d0
[  323.870415]  ? intel_iommu_iotlb_sync_map+0x1a/0x30
[  323.870418]  ? iommu_map+0xab/0x140
[  323.870421]  ? __pfx___netif_receive_skb_core.constprop.0+0x10/0x10
[  323.870423]  ? iommu_dma_map_page+0x159/0x720
[  323.870425]  ? dma_map_page_attrs+0x568/0xdc0
[  323.870427]  ? __kasan_slab_alloc+0x9d/0xa0
[  323.870430]  ? __pfx_dma_map_page_attrs+0x10/0x10
[  323.870431]  ? __kasan_check_write+0x14/0x30
[  323.870434]  ? __build_skb_around+0x23a/0x350
[  323.870437]  __netif_receive_skb_one_core+0xb4/0x1d0
[  323.870439]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  323.870441]  ? __kasan_check_write+0x14/0x30
[  323.870443]  ? _raw_spin_lock_irq+0x8b/0x100
[  323.870445]  __netif_receive_skb+0x21/0x160
[  323.870447]  process_backlog+0x1c0/0x590
[  323.870449]  __napi_poll+0xab/0x560
[  323.870451]  net_rx_action+0x53e/0xd10
[  323.870453]  ? __pfx_net_rx_action+0x10/0x10
[  323.870455]  ? __pfx_wake_up_var+0x10/0x10
[  323.870457]  ? tasklet_action_common.constprop.0+0x22c/0x670
[  323.870461]  handle_softirqs+0x18f/0x5d0
[  323.870463]  ? __pfx_run_ksoftirqd+0x10/0x10
[  323.870465]  run_ksoftirqd+0x3c/0x60
[  323.870467]  smpboot_thread_fn+0x2f3/0x700
[  323.870470]  kthread+0x2b5/0x390
[  323.870472]  ? __pfx_smpboot_thread_fn+0x10/0x10
[  323.870474]  ? __pfx_kthread+0x10/0x10
[  323.870476]  ret_from_fork+0x43/0x90
[  323.870478]  ? __pfx_kthread+0x10/0x10
[  323.870480]  ret_from_fork_asm+0x1a/0x30
[  323.870483]  </TASK>
[  323.870484] ---[ end trace 0000000000000000 ]---
[  350.300485] Initializing XFRM netlink socket
[  351.586993] ------------[ cut here ]------------
[  351.586999] WARNING: CPU: 2 PID: 26 at include/linux/skbuff.h:2904 dev_gro_receive+0x172c/0x2860
[  351.587141] CPU: 2 UID: 0 PID: 26 Comm: ksoftirqd/2 Tainted: G        W          6.11.0-rc6-c763c4339688+ #12
[  351.587144] Tainted: [W]=WARN
[  351.587145] Hardware name: Dell Inc. Latitude 5340/0SG010, BIOS 1.15.0 07/15/2024
[  351.587147] RIP: 0010:dev_gro_receive+0x172c/0x2860
[  351.587149] Code: 07 83 c2 01 38 ca 7c 08 84 c9 0f 85 d2 09 00 00 8d 14 c5 00 00 00 00 41 0f b6 45 46 83 e0 c7 09 d0 41 88 45 46 e9 ee f9 ff ff <0f> 0b 45 31 f6 e9 64 f7 ff ff 45 31 e4 81 e3 c0 00 00 00 41 0f 95
[  351.587151] RSP: 0018:ffffc90000377aa8 EFLAGS: 00010246
[  351.587153] RAX: ffff888128d72840 RBX: ffffffff95a0d9c0 RCX: 0000000000000000
[  351.587154] RDX: 000000000000ffff RSI: ffff88876cf52418 RDI: ffff88815880ad3a
[  351.587155] RBP: ffffc90000377b48 R08: 0000000000000000 R09: 0000000000000000
[  351.587156] R10: 1ffff110ed9ea481 R11: 0000000000000000 R12: ffffffff95a0d9d0
[  351.587157] R13: ffff88815880ac80 R14: 00000000ffff008d R15: ffff88815880acb8
[  351.587159] FS:  0000000000000000(0000) GS:ffff88876cf00000(0000) knlGS:0000000000000000
[  351.587160] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[  351.587161] CR2: 000078e9ea9e25b0 CR3: 0000000712498000 CR4: 0000000000f50ef0
[  351.587163] PKRU: 55555554
[  351.587163] Call Trace:
[  351.587164]  <TASK>
[  351.587167]  ? show_regs+0x71/0x90
[  351.587171]  ? __warn+0xce/0x270
[  351.587173]  ? dev_gro_receive+0x172c/0x2860
[  351.587175]  ? report_bug+0x2ad/0x300
[  351.587178]  ? handle_bug+0x46/0x90
[  351.587181]  ? exc_invalid_op+0x19/0x50
[  351.587182]  ? asm_exc_invalid_op+0x1b/0x20
[  351.587187]  ? dev_gro_receive+0x172c/0x2860
[  351.587188]  ? dev_gro_receive+0xcdd/0x2860
[  351.587190]  ? __pfx___netif_receive_skb_one_core+0x10/0x10
[  351.587192]  ? __mutex_lock.constprop.0+0x150/0x1180
[  351.587195]  napi_gro_receive+0x3a2/0x900
[  351.587197]  gro_cell_poll+0xe5/0x1d0
[  351.587200]  __napi_poll+0xab/0x560
[  351.587202]  net_rx_action+0x53e/0xd10
[  351.587204]  ? __pfx_net_rx_action+0x10/0x10
[  351.587206]  ? __pfx_wake_up_var+0x10/0x10
[  351.587209]  ? tasklet_action_common.constprop.0+0x22c/0x670
[  351.587212]  handle_softirqs+0x18f/0x5d0
[  351.587214]  ? __pfx_run_ksoftirqd+0x10/0x10
[  351.587216]  run_ksoftirqd+0x3c/0x60
[  351.587218]  smpboot_thread_fn+0x2f3/0x700
[  351.587220]  kthread+0x2b5/0x390
[  351.587223]  ? __pfx_smpboot_thread_fn+0x10/0x10
[  351.587224]  ? __pfx_kthread+0x10/0x10
[  351.587226]  ret_from_fork+0x43/0x90
[  351.587229]  ? __pfx_kthread+0x10/0x10
[  351.587231]  ret_from_fork_asm+0x1a/0x30
[  351.587234]  </TASK>
[  351.587235] ---[ end trace 0000000000000000 ]---

It seems that __netif_receive_skb_core() and dev_gro_receive() are
the places that call skb_reset_mac_len() with skb->mac_header set to
~0U.

On Thu, 12 Sept 2024 at 18:54, Eric Dumazet <edumazet@google.com> wrote:
>
> On Thu, Sep 12, 2024 at 11:35 AM Peter Seiderer <ps.report@gmx.net> wrote:
> >
>
> > Same change (and request for more debugging) already suggested in 2023, see [1]...
> >
> > Regards,
> > Peter
> >
> > [1] https://lore.kernel.org/netdev/d1cf5a66-03e1-44b8-929d-ac123b1bbd7b@sylv.io/T/
>
> Indeed !
> Nice to see some consistency among us :)
Eric Dumazet Sept. 13, 2024, 7:04 a.m. UTC | #5
On Fri, Sep 13, 2024 at 7:29 AM En-Wei WU <en-wei.wu@canonical.com> wrote:
>
> > Could you try the following patch, and compile your test kernel with
> > CONFIG_DEBUG_NET=y ?
>
> It seems that __netif_receive_skb_core() and dev_gro_receive() are
> the places that call skb_reset_mac_len() with skb->mac_header set to
> ~0U.

Ouch, let me take a look.
En-Wei Wu Oct. 2, 2024, 10:40 a.m. UTC | #6
Hi,

May I kindly ask whether there has been any progress? :)

Thanks.
En-Wei.

Eric Dumazet Oct. 2, 2024, 12:59 p.m. UTC | #7
On Wed, Oct 2, 2024 at 12:40 PM En-Wei WU <en-wei.wu@canonical.com> wrote:
>
> Hi,
>
> I would kindly ask if there is any progress :)

Can you now try this debug patch (with CONFIG_DEBUG_NET=y):

diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 39f1d16f362887821caa022464695c4045461493..e0e4154cbeb90474d92634d505869526c566f132 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -2909,9 +2909,19 @@ static inline void skb_reset_inner_headers(struct sk_buff *skb)
        skb->inner_transport_header = skb->transport_header;
 }

+static inline int skb_mac_header_was_set(const struct sk_buff *skb)
+{
+       return skb->mac_header != (typeof(skb->mac_header))~0U;
+}
+
 static inline void skb_reset_mac_len(struct sk_buff *skb)
 {
-       skb->mac_len = skb->network_header - skb->mac_header;
+       if (!skb_mac_header_was_set(skb)) {
+               DEBUG_NET_WARN_ON_ONCE(1);
+               skb->mac_len = 0;
+       } else {
+               skb->mac_len = skb->network_header - skb->mac_header;
+       }
 }

 static inline unsigned char *skb_inner_transport_header(const struct sk_buff
@@ -3014,11 +3024,6 @@ static inline void skb_set_network_header(struct sk_buff *skb, const int offset)
        skb->network_header += offset;
 }

-static inline int skb_mac_header_was_set(const struct sk_buff *skb)
-{
-       return skb->mac_header != (typeof(skb->mac_header))~0U;
-}
-
 static inline unsigned char *skb_mac_header(const struct sk_buff *skb)
 {
        DEBUG_NET_WARN_ON_ONCE(!skb_mac_header_was_set(skb));
@@ -3043,6 +3048,7 @@ static inline void skb_unset_mac_header(struct sk_buff *skb)

 static inline void skb_reset_mac_header(struct sk_buff *skb)
 {
+       DEBUG_NET_WARN_ON_ONCE(skb->data < skb->head);
        skb->mac_header = skb->data - skb->head;
 }

@@ -3050,6 +3056,7 @@ static inline void skb_set_mac_header(struct sk_buff *skb, const int offset)
 {
        skb_reset_mac_header(skb);
        skb->mac_header += offset;
+       DEBUG_NET_WARN_ON_ONCE(skb_mac_header(skb) < skb->head);
 }

 static inline void skb_pop_mac_header(struct sk_buff *skb)
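The effect of the debug version of skb_reset_mac_len() in the patch above can be modelled the same way. This is a hypothetical userspace sketch: toy_warned merely counts how often the warning branch is taken, whereas the real DEBUG_NET_WARN_ON_ONCE() is only active with CONFIG_DEBUG_NET=y and fires at most once. With the guard, an unset MAC header yields mac_len == 0 instead of a wrapped value.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the sk_buff offsets (16-bit, 0xFFFF = unset). */
struct toy_skb {
	uint16_t mac_header;
	uint16_t network_header;
	uint16_t mac_len;
};

#define TOY_MAC_UNSET ((uint16_t)~0U)

/* Number of times the warning branch was taken in this model. */
static unsigned int toy_warned;

static int toy_mac_header_was_set(const struct toy_skb *skb)
{
	return skb->mac_header != TOY_MAC_UNSET;
}

/* Mirrors the debug skb_reset_mac_len(): an unset MAC header gives
 * mac_len = 0 rather than network_header - 0xFFFF (mod 2^16). */
static void toy_reset_mac_len_checked(struct toy_skb *skb)
{
	if (!toy_mac_header_was_set(skb)) {
		toy_warned++;
		skb->mac_len = 0;
	} else {
		skb->mac_len = skb->network_header - skb->mac_header;
	}
}

/* Example:
 *   struct toy_skb s = { .mac_header = TOY_MAC_UNSET, .network_header = 64 };
 *   toy_reset_mac_len_checked(&s);
 *   assert(s.mac_len == 0 && toy_warned == 1);
 */
```

In this model, an skb that reached xfrm with mac_len == 0 would skip the `if (skb->mac_len)` branch that writes eth_hdr(skb)->h_proto.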
En-Wei Wu Oct. 14, 2024, 8:06 p.m. UTC | #8
Hi, sorry for the late reply.

I've tested this debug patch (with CONFIG_DEBUG_NET=y) on my machine,
and the DEBUG_NET_WARN_ON_ONCE never got triggered.

Thanks.

En-Wei Wu Oct. 18, 2024, 1:21 p.m. UTC | #9
> Seems like the __netif_receive_skb_core() and dev_gro_receive() are
> the places where it calls skb_reset_mac_len() with skb->mac_header =
> ~0U.
I believe it's the root cause.

My concern is that if we put something like:
+       if (!skb_mac_header_was_set(skb)) {
+               DEBUG_NET_WARN_ON_ONCE(1);
+               skb->mac_len = 0;
in skb_reset_mac_len(), it may degrade the RX path a bit.

Catching the bug in xfrm4_remove_tunnel_encap() and
xfrm6_remove_tunnel_encap() (the original patch) is nice because it
won't affect the systems which are not using the xfrm.

Kind Regards,
En-Wei.

En-Wei Wu Nov. 5, 2024, 8:05 a.m. UTC | #10
Hi,

Can I kindly ask if there is any progress?

Thanks,
Regards.

Eric Dumazet Nov. 5, 2024, 9:25 a.m. UTC | #11
On Fri, Oct 18, 2024 at 3:22 PM En-Wei WU <en-wei.wu@canonical.com> wrote:
>
> > Seems like the __netif_receive_skb_core() and dev_gro_receive() are
> > the places where it calls skb_reset_mac_len() with skb->mac_header =
> > ~0U.
> I believe it's the root cause.
>
> My concern is that if we put something like:
> +       if (!skb_mac_header_was_set(skb)) {
> +               DEBUG_NET_WARN_ON_ONCE(1);
> +               skb->mac_len = 0;
> in skb_reset_mac_len(), it may degrade the RX path a bit.

I do not have such concerns. Note this is temporary until we fix the root cause.

>
> Catching the bug in xfrm4_remove_tunnel_encap() and
> xfrm6_remove_tunnel_encap() (the original patch) is nice because it
> won't affect the systems which are not using the xfrm.
>

Somehow xfrm is feeding packets to gro_cells_receive() without the MAC
header being set; this is the bug that needs to be fixed.

GRO needs skb_mac_header() to return the correct pointer.

For normal GRO, it is set in either:

1) napi_gro_frags(): napi_frags_skb() calls skb_reset_mac_header(skb);

2) napi_gro_receive(): callers are supposed to call eth_type_trans()
   before calling napi_gro_receive(), and eth_type_trans() calls
   skb_reset_mac_header() as expected.

xfrm calls skb_mac_header_rebuild(), but it might be a NOP if MAC
header was never set.
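The point about skb_mac_header_rebuild() can be sketched in userspace. This is a hypothetical model, not kernel code: toy_mac_header_rebuild() assumes, as described above, that the rebuild only acts when the MAC header was set, moving mac_len bytes so that the header ends right before skb->data. When mac_header is still 0xFFFF the call does nothing, and eth_hdr(skb)->h_proto (offset 12 in an Ethernet header) would be written about 64 KiB past skb->head, typically outside the allocated buffer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define TOY_MAC_UNSET ((uint16_t)~0U)
#define TOY_ETH_H_PROTO_OFF 12 /* h_proto follows two 6-byte addresses */

/* Hypothetical model: offsets are relative to head, as in sk_buff. */
struct toy_skb {
	unsigned char *head;
	unsigned char *data;
	uint16_t mac_header;
	uint16_t mac_len;
};

static int toy_mac_header_was_set(const struct toy_skb *skb)
{
	return skb->mac_header != TOY_MAC_UNSET;
}

/* What skb_reset_mac_header() does (eth_type_trans() calls it):
 * the MAC header starts at the current data pointer. */
static void toy_reset_mac_header(struct toy_skb *skb)
{
	skb->mac_header = (uint16_t)(skb->data - skb->head);
}

/* Assumed behaviour of skb_mac_header_rebuild(): a no-op when the MAC
 * header was never set; otherwise move mac_len bytes so that the MAC
 * header ends immediately before skb->data. */
static void toy_mac_header_rebuild(struct toy_skb *skb)
{
	if (toy_mac_header_was_set(skb)) {
		const unsigned char *old_mac = skb->head + skb->mac_header;

		skb->mac_header = (uint16_t)(skb->data - skb->head - skb->mac_len);
		memmove(skb->head + skb->mac_header, old_mac, skb->mac_len);
	}
}

/* Offset from head where eth_hdr(skb)->h_proto would be written. */
static size_t toy_h_proto_offset(const struct toy_skb *skb)
{
	return (size_t)skb->mac_header + TOY_ETH_H_PROTO_OFF;
}

/* Example: with mac_header never set, toy_mac_header_rebuild() leaves it
 * at 0xFFFF, so toy_h_proto_offset() returns 0xFFFF + 12. */
```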

Patch

diff --git a/net/xfrm/xfrm_input.c b/net/xfrm/xfrm_input.c
index 749e7eea99e4..eef0145c73a7 100644
--- a/net/xfrm/xfrm_input.c
+++ b/net/xfrm/xfrm_input.c
@@ -251,7 +251,7 @@  static int xfrm4_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 	skb_mac_header_rebuild(skb);
-	if (skb->mac_len)
+	if (skb->mac_len && skb_mac_header_was_set(skb))
 		eth_hdr(skb)->h_proto = skb->protocol;
 
 	err = 0;
@@ -288,7 +288,7 @@  static int xfrm6_remove_tunnel_encap(struct xfrm_state *x, struct sk_buff *skb)
 
 	skb_reset_network_header(skb);
 	skb_mac_header_rebuild(skb);
-	if (skb->mac_len)
+	if (skb->mac_len && skb_mac_header_was_set(skb))
 		eth_hdr(skb)->h_proto = skb->protocol;
 
 	err = 0;
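The changed condition in xfrm4_remove_tunnel_encap() and xfrm6_remove_tunnel_encap() can be checked in isolation. In this hypothetical model, toy_old_should_write() and toy_new_should_write() stand for the old and new `if` conditions guarding `eth_hdr(skb)->h_proto = skb->protocol;`. An skb with an unset MAC header but a wrapped, non-zero mac_len passes the old test and fails the new one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define TOY_MAC_UNSET ((uint16_t)~0U)

/* Hypothetical stand-in for the two sk_buff fields the condition reads. */
struct toy_skb {
	uint16_t mac_header; /* offset from head, 0xFFFF = never set */
	uint16_t mac_len;
};

static bool toy_mac_header_was_set(const struct toy_skb *skb)
{
	return skb->mac_header != TOY_MAC_UNSET;
}

/* Before the patch: if (skb->mac_len) */
static bool toy_old_should_write(const struct toy_skb *skb)
{
	return skb->mac_len != 0;
}

/* After the patch: if (skb->mac_len && skb_mac_header_was_set(skb)) */
static bool toy_new_should_write(const struct toy_skb *skb)
{
	return skb->mac_len != 0 && toy_mac_header_was_set(skb);
}

/* Example:
 *   struct toy_skb unset = { .mac_header = TOY_MAC_UNSET, .mac_len = 65 };
 *   assert(toy_old_should_write(&unset) && !toy_new_should_write(&unset));
 */
```

This only skips the out-of-bounds write; as noted in the thread, the MAC header still has not been set before the packet reaches gro_cells_receive().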