diff mbox series

[net] bpf: Don't redirect too small packets

Message ID 20240322122407.1329861-1-edumazet@google.com (mailing list archive)
State Accepted
Commit 96b0e83341e8305a759d776689325857be054ef3
Delegated to: BPF
Headers show
Series [net] bpf: Don't redirect too small packets | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 972 this patch: 972
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers fail 1 blamed authors not CCed: shaozhengchao@huawei.com; 8 maintainers not CCed: haoluo@google.com john.fastabend@gmail.com eddyz87@gmail.com song@kernel.org kpsingh@kernel.org yonghong.song@linux.dev jolsa@kernel.org shaozhengchao@huawei.com
netdev/build_clang success Errors and warnings before: 956 this patch: 956
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 983 this patch: 983
netdev/checkpatch warning WARNING: Possible repeated word: 'Google'
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next-PR success PR summary
bpf/vmtest-bpf-next-VM_Test-2 success Logs for Unittests
bpf/vmtest-bpf-next-VM_Test-5 success Logs for aarch64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-3 success Logs for Validate matrix.py
bpf/vmtest-bpf-next-VM_Test-6 pending Logs for s390x-gcc / build / build for s390x with gcc
bpf/vmtest-bpf-next-VM_Test-8 success Logs for set-matrix
bpf/vmtest-bpf-next-VM_Test-4 pending Logs for aarch64-gcc / build / build for aarch64 with gcc
bpf/vmtest-bpf-next-VM_Test-7 success Logs for s390x-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-9 pending Logs for x86_64-gcc / build / build for x86_64 with gcc
bpf/vmtest-bpf-next-VM_Test-0 success Logs for Lint
bpf/vmtest-bpf-next-VM_Test-11 success Logs for x86_64-llvm-17 / build / build for x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-12 pending Logs for x86_64-llvm-17 / build-release / build for x86_64 with llvm-17 and -O2 optimization
bpf/vmtest-bpf-next-VM_Test-1 success Logs for ShellCheck
bpf/vmtest-bpf-next-VM_Test-16 pending Logs for x86_64-llvm-17 / test (test_verifier, false, 360) / test_verifier on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-14 pending Logs for x86_64-llvm-17 / test (test_progs, false, 360) / test_progs on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-18 pending Logs for x86_64-llvm-18 / build / build for x86_64 with llvm-18
bpf/vmtest-bpf-next-VM_Test-10 success Logs for x86_64-gcc / build-release
bpf/vmtest-bpf-next-VM_Test-13 pending Logs for x86_64-llvm-17 / test (test_maps, false, 360) / test_maps on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-15 pending Logs for x86_64-llvm-17 / test (test_progs_no_alu32, false, 360) / test_progs_no_alu32 on x86_64 with llvm-17
bpf/vmtest-bpf-next-VM_Test-17 success Logs for x86_64-llvm-17 / veristat
bpf/vmtest-bpf-next-VM_Test-19 pending Logs for x86_64-llvm-18 / build-release / build for x86_64 with llvm-18 and -O2 optimization

Commit Message

Eric Dumazet March 22, 2024, 12:24 p.m. UTC
Some drivers ndo_start_xmit() expect a minimal size, as shown
by various syzbot reports [1].

Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
the missing attribute that can be used by upper layers.

We need to use it in __bpf_redirect_common().

[1]

BUG: KMSAN: uninit-value in erspan_build_header+0x170/0x2f0 include/net/erspan.h:197
  erspan_build_header+0x170/0x2f0 include/net/erspan.h:197
  erspan_xmit+0x128a/0x1ec0 net/ipv4/ip_gre.c:706
  __netdev_start_xmit include/linux/netdevice.h:4903 [inline]
  netdev_start_xmit include/linux/netdevice.h:4917 [inline]
  xmit_one net/core/dev.c:3531 [inline]
  dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547
  sch_direct_xmit+0x3c5/0xd50 net/sched/sch_generic.c:343
  __dev_xmit_skb net/core/dev.c:3760 [inline]
  __dev_queue_xmit+0x2e6a/0x52c0 net/core/dev.c:4301
  dev_queue_xmit include/linux/netdevice.h:3091 [inline]
  __bpf_tx_skb net/core/filter.c:2136 [inline]
  __bpf_redirect_common net/core/filter.c:2180 [inline]
  __bpf_redirect+0x14a6/0x1620 net/core/filter.c:2187
  ____bpf_clone_redirect net/core/filter.c:2460 [inline]
  bpf_clone_redirect+0x328/0x470 net/core/filter.c:2432
  ___bpf_prog_run+0x13fe/0xe0f0 kernel/bpf/core.c:1997
  __bpf_prog_run512+0xb5/0xe0 kernel/bpf/core.c:2238
  bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
  __bpf_prog_run include/linux/filter.h:657 [inline]
  bpf_prog_run include/linux/filter.h:664 [inline]
  bpf_test_run+0x499/0xc30 net/bpf/test_run.c:425
  bpf_prog_test_run_skb+0x14ea/0x1f20 net/bpf/test_run.c:1058
  bpf_prog_test_run+0x6b7/0xad0 kernel/bpf/syscall.c:4240
  __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5649
  __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
  __x64_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5736
 do_syscall_64+0xd5/0x1f0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

Uninit was created at:
  slab_post_alloc_hook mm/slub.c:3804 [inline]
  slab_alloc_node mm/slub.c:3845 [inline]
  kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888
  kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577
  pskb_expand_head+0x222/0x19d0 net/core/skbuff.c:2245
  __skb_cow include/linux/skbuff.h:3671 [inline]
  skb_cow_head include/linux/skbuff.h:3705 [inline]
  erspan_xmit+0xb08/0x1ec0 net/ipv4/ip_gre.c:692
  __netdev_start_xmit include/linux/netdevice.h:4903 [inline]
  netdev_start_xmit include/linux/netdevice.h:4917 [inline]
  xmit_one net/core/dev.c:3531 [inline]
  dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547
  sch_direct_xmit+0x3c5/0xd50 net/sched/sch_generic.c:343
  __dev_xmit_skb net/core/dev.c:3760 [inline]
  __dev_queue_xmit+0x2e6a/0x52c0 net/core/dev.c:4301
  dev_queue_xmit include/linux/netdevice.h:3091 [inline]
  __bpf_tx_skb net/core/filter.c:2136 [inline]
  __bpf_redirect_common net/core/filter.c:2180 [inline]
  __bpf_redirect+0x14a6/0x1620 net/core/filter.c:2187
  ____bpf_clone_redirect net/core/filter.c:2460 [inline]
  bpf_clone_redirect+0x328/0x470 net/core/filter.c:2432
  ___bpf_prog_run+0x13fe/0xe0f0 kernel/bpf/core.c:1997
  __bpf_prog_run512+0xb5/0xe0 kernel/bpf/core.c:2238
  bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline]
  __bpf_prog_run include/linux/filter.h:657 [inline]
  bpf_prog_run include/linux/filter.h:664 [inline]
  bpf_test_run+0x499/0xc30 net/bpf/test_run.c:425
  bpf_prog_test_run_skb+0x14ea/0x1f20 net/bpf/test_run.c:1058
  bpf_prog_test_run+0x6b7/0xad0 kernel/bpf/syscall.c:4240
  __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5649
  __do_sys_bpf kernel/bpf/syscall.c:5738 [inline]
  __se_sys_bpf kernel/bpf/syscall.c:5736 [inline]
  __x64_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5736
 do_syscall_64+0xd5/0x1f0
 entry_SYSCALL_64_after_hwframe+0x6d/0x75

CPU: 0 PID: 5041 Comm: syz-executor167 Not tainted 6.8.0-syzkaller-11743-ga4145ce1e7bc #0
Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024

Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len")
Reported-by: syzbot+9e27778c0edc62cb97d8@syzkaller.appspotmail.com
Closes: https://lore.kernel.org/netdev/000000000000205af206143ece22@google.com/
Signed-off-by: Eric Dumazet <edumazet@google.com>
Cc: Stanislav Fomichev <sdf@google.com>
Cc: Willem de Bruijn <willemb@google.com>
---
 net/core/filter.c | 5 +++++
 1 file changed, 5 insertions(+)

Comments

patchwork-bot+netdevbpf@kernel.org March 22, 2024, 2:10 p.m. UTC | #1
Hello:

This patch was applied to bpf/bpf.git (master)
by Daniel Borkmann <daniel@iogearbox.net>:

On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> Some drivers ndo_start_xmit() expect a minimal size, as shown
> by various syzbot reports [1].
> 
> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> the missing attribute that can be used by upper layers.
> 
> We need to use it in __bpf_redirect_common().
> 
> [...]

Here is the summary with links:
  - [net] bpf: Don't redirect too small packets
    https://git.kernel.org/bpf/bpf/c/96b0e83341e8

You are awesome, thank you!
Alexei Starovoitov March 23, 2024, 3:01 a.m. UTC | #2
On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
>
> Hello:
>
> This patch was applied to bpf/bpf.git (master)
> by Daniel Borkmann <daniel@iogearbox.net>:
>
> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> > Some drivers ndo_start_xmit() expect a minimal size, as shown
> > by various syzbot reports [1].
> >
> > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> > the missing attribute that can be used by upper layers.
> >
> > We need to use it in __bpf_redirect_common().

This patch broke empty_skb test:
$ test_progs -t empty_skb

test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
[redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
[redirect_ingress]: actual -34 != expected 0
test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
[redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
[redirect_egress]: actual -34 != expected 1

And looking at the test I think it's not a test issue.
This check
if (unlikely(skb->len < dev->min_header_len))
is rejecting more than it should.

So I reverted this patch for now.
Eric Dumazet March 25, 2024, 1:33 p.m. UTC | #3
On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
<alexei.starovoitov@gmail.com> wrote:
>
> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> >
> > Hello:
> >
> > This patch was applied to bpf/bpf.git (master)
> > by Daniel Borkmann <daniel@iogearbox.net>:
> >
> > On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> > > Some drivers ndo_start_xmit() expect a minimal size, as shown
> > > by various syzbot reports [1].
> > >
> > > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> > > the missing attribute that can be used by upper layers.
> > >
> > > We need to use it in __bpf_redirect_common().
>
> This patch broke empty_skb test:
> $ test_progs -t empty_skb
>
> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
> [redirect_ingress]: actual -34 != expected 0
> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> [redirect_egress]: actual -34 != expected 1
>
> And looking at the test I think it's not a test issue.
> This check
> if (unlikely(skb->len < dev->min_header_len))
> is rejecting more than it should.
>
> So I reverted this patch for now.

OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
move my sanity test in __bpf_tx_skb(),
the bpf test program still fails, I am suspecting the test needs to be adjusted.



diff --git a/net/core/filter.c b/net/core/filter.c
index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
net_device *dev, struct sk_buff *skb)
                return -ENETDOWN;
        }

+       if (unlikely(skb->len < dev->min_header_len)) {
+               pr_err_once("__bpf_tx_skb skb->len=%u <
dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
dev->min_header_len);
+               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
+               kfree_skb(skb);
+               return -ERANGE;
+       } // Note: this is before we change skb->dev
        skb->dev = dev;
        skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
        skb_clear_tstamp(skb);


-->


test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
[redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
[redirect_egress]: actual -34 != expected 1

[   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
[   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
               mac=(64,14) net=(78,-1) trans=-1
               shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
               csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
               hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0

Note that veth driver is one of the few 'Ethernet' drivers that make
sure to get at least 14 bytes in the skb at ndo_start_xmit()
after commit 726e2c5929de841fdcef4e2bf995680688ae1b87 ("veth: Ensure
eth header is in skb's linear part")

BTW this last patch (changing veth) should have been done generically
from act_mirred

(We do not want to patch ~400 ethernet drivers in the tree)
Daniel Borkmann March 25, 2024, 3:47 p.m. UTC | #4
On 3/25/24 2:33 PM, Eric Dumazet wrote:
> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
>>>
>>> Hello:
>>>
>>> This patch was applied to bpf/bpf.git (master)
>>> by Daniel Borkmann <daniel@iogearbox.net>:
>>>
>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown
>>>> by various syzbot reports [1].
>>>>
>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
>>>> the missing attribute that can be used by upper layers.
>>>>
>>>> We need to use it in __bpf_redirect_common().
>>
>> This patch broke empty_skb test:
>> $ test_progs -t empty_skb
>>
>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
>> [redirect_ingress]: actual -34 != expected 0
>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
>> [redirect_egress]: actual -34 != expected 1
>>
>> And looking at the test I think it's not a test issue.
>> This check
>> if (unlikely(skb->len < dev->min_header_len))
>> is rejecting more than it should.
>>
>> So I reverted this patch for now.
> 
> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
> move my sanity test in __bpf_tx_skb(),
> the bpf test program still fails, I am suspecting the test needs to be adjusted.

Let me take a look, I do think so too that we'd need to adjust the test.
I'll see to have a patch for the latter so that we can reapply the fix
as-is along with it given it's correct.

> diff --git a/net/core/filter.c b/net/core/filter.c
> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
> 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
> net_device *dev, struct sk_buff *skb)
>                  return -ENETDOWN;
>          }
> 
> +       if (unlikely(skb->len < dev->min_header_len)) {
> +               pr_err_once("__bpf_tx_skb skb->len=%u <
> dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
> dev->min_header_len);
> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> +               kfree_skb(skb);
> +               return -ERANGE;
> +       } // Note: this is before we change skb->dev
>          skb->dev = dev;
>          skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
>          skb_clear_tstamp(skb);
> 
> 
> -->
> 
> 
> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> [redirect_egress]: actual -34 != expected 1
> 
> [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
> [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
>                 mac=(64,14) net=(78,-1) trans=-1
>                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>                 csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
>                 hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0
> 
> Note that veth driver is one of the few 'Ethernet' drivers that make
> sure to get at least 14 bytes in the skb at ndo_start_xmit()
> after commit 726e2c5929de841fdcef4e2bf995680688ae1b87 ("veth: Ensure
> eth header is in skb's linear part")
> 
> BTW this last patch (changing veth) should have been done generically
> from act_mirred
> 
> (We do not want to patch ~400 ethernet drivers in the tree)
>
Alexei Starovoitov March 25, 2024, 3:56 p.m. UTC | #5
On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote:
>
> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
> <alexei.starovoitov@gmail.com> wrote:
> >
> > On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > >
> > > Hello:
> > >
> > > This patch was applied to bpf/bpf.git (master)
> > > by Daniel Borkmann <daniel@iogearbox.net>:
> > >
> > > On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> > > > Some drivers ndo_start_xmit() expect a minimal size, as shown
> > > > by various syzbot reports [1].
> > > >
> > > > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> > > > the missing attribute that can be used by upper layers.
> > > >
> > > > We need to use it in __bpf_redirect_common().
> >
> > This patch broke empty_skb test:
> > $ test_progs -t empty_skb
> >
> > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > [redirect_ingress]: actual -34 != expected 0
> > test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
> > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > [redirect_egress]: actual -34 != expected 1
> >
> > And looking at the test I think it's not a test issue.
> > This check
> > if (unlikely(skb->len < dev->min_header_len))
> > is rejecting more than it should.
> >
> > So I reverted this patch for now.
>
> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
> move my sanity test in __bpf_tx_skb(),
> the bpf test program still fails, I am suspecting the test needs to be adjusted.
>
>
>
> diff --git a/net/core/filter.c b/net/core/filter.c
> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
> 100644
> --- a/net/core/filter.c
> +++ b/net/core/filter.c
> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
> net_device *dev, struct sk_buff *skb)
>                 return -ENETDOWN;
>         }
>
> +       if (unlikely(skb->len < dev->min_header_len)) {
> +               pr_err_once("__bpf_tx_skb skb->len=%u <
> dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
> dev->min_header_len);
> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> +               kfree_skb(skb);
> +               return -ERANGE;
> +       } // Note: this is before we change skb->dev
>         skb->dev = dev;
>         skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
>         skb_clear_tstamp(skb);
>
>
> -->
>
>
> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> [redirect_egress]: actual -34 != expected 1
>
> [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
> [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
>                mac=(64,14) net=(78,-1) trans=-1
>                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>                csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
>                hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0

Hmm. Something is off.
That test creates 15 byte skb.
It's not obvious to me how it got reduced to 1.
Something stripped L2 header and the prog is trying to redirect
such skb into veth that expects skb with L2 ?

Stan,
please take a look.
Since you wrote that test.
Stanislav Fomichev March 25, 2024, 4:28 p.m. UTC | #6
On 03/25, Alexei Starovoitov wrote:
> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote:
> >
> > On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
> > <alexei.starovoitov@gmail.com> wrote:
> > >
> > > On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > > >
> > > > Hello:
> > > >
> > > > This patch was applied to bpf/bpf.git (master)
> > > > by Daniel Borkmann <daniel@iogearbox.net>:
> > > >
> > > > On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> > > > > Some drivers ndo_start_xmit() expect a minimal size, as shown
> > > > > by various syzbot reports [1].
> > > > >
> > > > > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> > > > > the missing attribute that can be used by upper layers.
> > > > >
> > > > > We need to use it in __bpf_redirect_common().
> > >
> > > This patch broke empty_skb test:
> > > $ test_progs -t empty_skb
> > >
> > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > > [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > > [redirect_ingress]: actual -34 != expected 0
> > > test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
> > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > > [redirect_egress]: actual -34 != expected 1
> > >
> > > And looking at the test I think it's not a test issue.
> > > This check
> > > if (unlikely(skb->len < dev->min_header_len))
> > > is rejecting more than it should.
> > >
> > > So I reverted this patch for now.
> >
> > OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
> > move my sanity test in __bpf_tx_skb(),
> > the bpf test program still fails, I am suspecting the test needs to be adjusted.
> >
> >
> >
> > diff --git a/net/core/filter.c b/net/core/filter.c
> > index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
> > 100644
> > --- a/net/core/filter.c
> > +++ b/net/core/filter.c
> > @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
> > net_device *dev, struct sk_buff *skb)
> >                 return -ENETDOWN;
> >         }
> >
> > +       if (unlikely(skb->len < dev->min_header_len)) {
> > +               pr_err_once("__bpf_tx_skb skb->len=%u <
> > dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
> > dev->min_header_len);
> > +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> > +               kfree_skb(skb);
> > +               return -ERANGE;
> > +       } // Note: this is before we change skb->dev
> >         skb->dev = dev;
> >         skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
> >         skb_clear_tstamp(skb);
> >
> >
> > -->
> >
> >
> > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > [redirect_egress]: actual -34 != expected 1
> >
> > [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
> > [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
> >                mac=(64,14) net=(78,-1) trans=-1
> >                shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> >                csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> >                hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0
> 
> Hmm. Something is off.
> That test creates 15 byte skb.
> It's not obvious to me how it got reduced to 1.
> Something stripped L2 header and the prog is trying to redirect
> such skb into veth that expects skb with L2 ?
> 
> Stan,
> please take a look.
> Since you wrote that test.

Sure. Daniel wants to take a look on a separate thread, so we can sync
up. Tentatively, seems like the failure is in the lwt path that does
indeed drop the l2.
Daniel Borkmann March 26, 2024, 12:46 p.m. UTC | #7
On 3/25/24 5:28 PM, Stanislav Fomichev wrote:
> On 03/25, Alexei Starovoitov wrote:
>> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote:
>>>
>>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
>>> <alexei.starovoitov@gmail.com> wrote:
>>>>
>>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
>>>>>
>>>>> Hello:
>>>>>
>>>>> This patch was applied to bpf/bpf.git (master)
>>>>> by Daniel Borkmann <daniel@iogearbox.net>:
>>>>>
>>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
>>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown
>>>>>> by various syzbot reports [1].
>>>>>>
>>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
>>>>>> the missing attribute that can be used by upper layers.
>>>>>>
>>>>>> We need to use it in __bpf_redirect_common().
>>>>
>>>> This patch broke empty_skb test:
>>>> $ test_progs -t empty_skb
>>>>
>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
>>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
>>>> [redirect_ingress]: actual -34 != expected 0
>>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
>>>> [redirect_egress]: actual -34 != expected 1
>>>>
>>>> And looking at the test I think it's not a test issue.
>>>> This check
>>>> if (unlikely(skb->len < dev->min_header_len))
>>>> is rejecting more than it should.
>>>>
>>>> So I reverted this patch for now.
>>>
>>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
>>> move my sanity test in __bpf_tx_skb(),
>>> the bpf test program still fails, I am suspecting the test needs to be adjusted.
>>>
>>>
>>>
>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
>>> 100644
>>> --- a/net/core/filter.c
>>> +++ b/net/core/filter.c
>>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
>>> net_device *dev, struct sk_buff *skb)
>>>                  return -ENETDOWN;
>>>          }
>>>
>>> +       if (unlikely(skb->len < dev->min_header_len)) {
>>> +               pr_err_once("__bpf_tx_skb skb->len=%u <
>>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
>>> dev->min_header_len);
>>> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
>>> +               kfree_skb(skb);
>>> +               return -ERANGE;
>>> +       } // Note: this is before we change skb->dev
>>>          skb->dev = dev;
>>>          skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
>>>          skb_clear_tstamp(skb);
>>>
>>>
>>> -->
>>>
>>>
>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
>>> [redirect_egress]: actual -34 != expected 1
>>>
>>> [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
>>> [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
>>>                 mac=(64,14) net=(78,-1) trans=-1
>>>                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>>>                 csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
>>>                 hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0
>>
>> Hmm. Something is off.
>> That test creates 15 byte skb.
>> It's not obvious to me how it got reduced to 1.
>> Something stripped L2 header and the prog is trying to redirect
>> such skb into veth that expects skb with L2 ?
>>
>> Stan,
>> please take a look.
>> Since you wrote that test.
> 
> Sure. Daniel wants to take a look on a separate thread, so we can sync
> up. Tentatively, seems like the failure is in the lwt path that does
> indeed drop the l2.

If we'd change the test into the below, the tc and empty_skb tests pass.
run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus
skb->len is 1 in this test. We do use skb_mac_header_len() also in other
tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best
way forward..

static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
                                  u32 flags)
{
         /* Verify that a link layer header is carried */
         if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) {
                 kfree_skb(skb);
                 return -ERANGE;
         }

         if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) {
                 kfree_skb(skb);
                 return -ERANGE;
         }

         bpf_push_mac_rcsum(skb);
         return flags & BPF_F_INGRESS ?
                __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
}

Thanks,
Daniel
Eric Dumazet March 26, 2024, 1:37 p.m. UTC | #8
On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 3/25/24 5:28 PM, Stanislav Fomichev wrote:
> > On 03/25, Alexei Starovoitov wrote:
> >> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote:
> >>>
> >>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
> >>> <alexei.starovoitov@gmail.com> wrote:
> >>>>
> >>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> >>>>>
> >>>>> Hello:
> >>>>>
> >>>>> This patch was applied to bpf/bpf.git (master)
> >>>>> by Daniel Borkmann <daniel@iogearbox.net>:
> >>>>>
> >>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> >>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown
> >>>>>> by various syzbot reports [1].
> >>>>>>
> >>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> >>>>>> the missing attribute that can be used by upper layers.
> >>>>>>
> >>>>>> We need to use it in __bpf_redirect_common().
> >>>>
> >>>> This patch broke empty_skb test:
> >>>> $ test_progs -t empty_skb
> >>>>
> >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> >>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
> >>>> [redirect_ingress]: actual -34 != expected 0
> >>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
> >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> >>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> >>>> [redirect_egress]: actual -34 != expected 1
> >>>>
> >>>> And looking at the test I think it's not a test issue.
> >>>> This check
> >>>> if (unlikely(skb->len < dev->min_header_len))
> >>>> is rejecting more than it should.
> >>>>
> >>>> So I reverted this patch for now.
> >>>
> >>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
> >>> move my sanity test in __bpf_tx_skb(),
> >>> the bpf test program still fails, I am suspecting the test needs to be adjusted.
> >>>
> >>>
> >>>
> >>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
> >>> 100644
> >>> --- a/net/core/filter.c
> >>> +++ b/net/core/filter.c
> >>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
> >>> net_device *dev, struct sk_buff *skb)
> >>>                  return -ENETDOWN;
> >>>          }
> >>>
> >>> +       if (unlikely(skb->len < dev->min_header_len)) {
> >>> +               pr_err_once("__bpf_tx_skb skb->len=%u <
> >>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
> >>> dev->min_header_len);
> >>> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> >>> +               kfree_skb(skb);
> >>> +               return -ERANGE;
> >>> +       } // Note: this is before we change skb->dev
> >>>          skb->dev = dev;
> >>>          skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
> >>>          skb_clear_tstamp(skb);
> >>>
> >>>
> >>> -->
> >>>
> >>>
> >>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> >>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> >>> [redirect_egress]: actual -34 != expected 1
> >>>
> >>> [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
> >>> [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
> >>>                 mac=(64,14) net=(78,-1) trans=-1
> >>>                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> >>>                 csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> >>>                 hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0
> >>
> >> Hmm. Something is off.
> >> That test creates 15 byte skb.
> >> It's not obvious to me how it got reduced to 1.
> >> Something stripped L2 header and the prog is trying to redirect
> >> such skb into veth that expects skb with L2 ?
> >>
> >> Stan,
> >> please take a look.
> >> Since you wrote that test.
> >
> > Sure. Daniel wants to take a look on a separate thread, so we can sync
> > up. Tentatively, seems like the failure is in the lwt path that does
> > indeed drop the l2.
>
> If we'd change the test into the below, the tc and empty_skb tests pass.
> run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus
> skb->len is 1 in this test. We do use skb_mac_header_len() also in other
> tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best
> way forward..
>
> static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
>                                   u32 flags)
> {
>          /* Verify that a link layer header is carried */
>          if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) {
>                  kfree_skb(skb);
>                  return -ERANGE;
>          }
>
>          if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) {

Unfortunately this will not prevent frames with skb->len == 1 to reach
an Ethernet driver ndo_start_xmit()

At ndo_start_xmit(), we do not look where the MAC header supposedly
starts in the skb, we only use skb->data

I have a syzbot repro using team driver, so I added the following part in team :

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c
100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff
*skb, struct net_device *dev)
        bool tx_success;
        unsigned int len = skb->len;

+       if (len < 14) {
+               pr_err_once("team_xmit(len=%u)\n", len);
+               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
+               WARN_ON_ONCE(1);
+       }
        tx_success = team_queue_override_transmit(team, skb);
        if (!tx_success)
                tx_success = team->ops.transmit(team, skb);


And I get (with your suggestion instead of skb->len)

mac=(78,0) net=(78,-1) trans=-1
shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
hash(0x0 sw=0 l4=0) proto=0x88a8 pkttype=3 iif=0
[   41.126553] dev name=team0 feat=0x0000e0064fddfbe9
[   41.127132] skb linear:   00000000: 55
[   41.128487] ------------[ cut here ]------------
[   41.128551] WARNING: CPU: 2 PID: 1880 at
drivers/net/team/team.c:1720 team_xmit (drivers/net/team/team.c:1720
(discriminator 1))
[   41.129072] Modules linked in: macsec macvtap macvlan hsr wireguard
curve25519_x86_64 libcurve25519_generic libchacha20poly1305
chacha_x86_64 libchacha poly1305_x86_64 batman_adv dummy bridge sr_mod
cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p netfs 9pnet
[   41.129613] CPU: 2 PID: 1880 Comm: b330650301 Not tainted 6.8.0-virtme #238
[   41.129664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
BIOS 1.16.3-debian-1.16.3-2 04/01/2014
[   41.129780] RIP: 0010:team_xmit (drivers/net/team/team.c:1720
(discriminator 1))
[ 41.129847] Code: 41 54 53 44 8b 7f 70 48 89 fb 41 83 ff 0d 77 1c 80
3d a0 24 8d 01 00 0f 84 0d 01 00 00 80 3d 92 24 8d 01 00 0f 84 e3 00
00 00 <0f> 0b 41 80 be 21 0b 00 00 00 0f 84 9d 00 00 00 0f b7 43 7c 66
85
All code
========
   0: 41 54                push   %r12
   2: 53                    push   %rbx
   3: 44 8b 7f 70          mov    0x70(%rdi),%r15d
   7: 48 89 fb              mov    %rdi,%rbx
   a: 41 83 ff 0d          cmp    $0xd,%r15d
   e: 77 1c                ja     0x2c
  10: 80 3d a0 24 8d 01 00 cmpb   $0x0,0x18d24a0(%rip)        # 0x18d24b7
  17: 0f 84 0d 01 00 00    je     0x12a
  1d: 80 3d 92 24 8d 01 00 cmpb   $0x0,0x18d2492(%rip)        # 0x18d24b6
  24: 0f 84 e3 00 00 00    je     0x10d
  2a:* 0f 0b                ud2 <-- trapping instruction
  2c: 41 80 be 21 0b 00 00 cmpb   $0x0,0xb21(%r14)
  33: 00
  34: 0f 84 9d 00 00 00    je     0xd7
  3a: 0f b7 43 7c          movzwl 0x7c(%rbx),%eax
  3e: 66                    data16
  3f: 85                    .byte 0x85

Code starting with the faulting instruction
===========================================
   0: 0f 0b                ud2
   2: 41 80 be 21 0b 00 00 cmpb   $0x0,0xb21(%r14)
   9: 00
   a: 0f 84 9d 00 00 00    je     0xad
  10: 0f b7 43 7c          movzwl 0x7c(%rbx),%eax
  14: 66                    data16
  15: 85                    .byte 0x85
[   41.129902] RSP: 0018:ffffa4210433b938 EFLAGS: 00000246
[   41.129945] RAX: 0000000000000000 RBX: ffffa4210858a300 RCX: 0000000000000000
[   41.129961] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: 0000000000000001
[   41.129975] RBP: ffffa4210433b960 R08: 0000000000000000 R09: ffffa4210433b630
[   41.129989] R10: 0000000000000001 R11: ffffffff8407d340 R12: 0000000000000000
[   41.130004] R13: ffffa4210ecee000 R14: ffffa4210ece4000 R15: 0000000000000001
[   41.130074] FS:  00007f91d9549740(0000) GS:ffffa42fffa80000(0000)
knlGS:0000000000000000
[   41.130095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[   41.130140] CR2: 00007f8953077fb0 CR3: 0000000104f42000 CR4: 00000000000006f0
[   41.130229] Call Trace:
[   41.130331]  <TASK>
[   41.130530] ? show_regs (arch/x86/kernel/dumpstack.c:479)
[   41.130598] ? __warn (kernel/panic.c:694)
[   41.130611] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1))
[   41.130625] ? report_bug (lib/bug.c:180 lib/bug.c:219)
[   41.130640] ? handle_bug (arch/x86/kernel/traps.c:239)
[   41.130653] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
[   41.130665] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
[   41.130700] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1))
[   41.130714] ? team_xmit (drivers/net/team/team.c:1719 (discriminator 6))
[   41.130734] dev_hard_start_xmit (./include/linux/netdevice.h:4903
./include/linux/netdevice.h:4917 net/core/dev.c:3531
net/core/dev.c:3547)
[   41.130768] __dev_queue_xmit (./include/linux/netdevice.h:3287
(discriminator 25) net/core/dev.c:4336 (discriminator 25))
[   41.130780] ? kmalloc_reserve (net/core/skbuff.c:580 (discriminator 4))
[   41.130796] ? pskb_expand_head (net/core/skbuff.c:2292)
[   41.130815] __bpf_redirect (./include/linux/netdevice.h:3287
(discriminator 25) net/core/filter.c:2143 (discriminator 25)
net/core/filter.c:2172 (discriminator 25) net/core/filter.c:2196
(discriminator 25))
[   41.130825] bpf_clone_redirect (net/core/filter.c:2467
(discriminator 1) net/core/filter.c:2439 (discriminator 1))
[   41.130841] bpf_prog_9845f5eee09e82c6+0x61/0x66
[   41.130948] ? bpf_ksym_find (./include/linux/rbtree_latch.h:113
./include/linux/rbtree_latch.h:208 kernel/bpf/core.c:734)
[   41.130963] ? is_bpf_text_address
(./arch/x86/include/asm/preempt.h:84 (discriminator 13)
./include/linux/rcupdate.h:97 (discriminator 13)
./include/linux/rcupdate.h:813 (discriminator 13)
kernel/bpf/core.c:769 (discriminator 13))
[   41.130976] ? kernel_text_address (kernel/extable.c:125
(discriminator 1) kernel/extable.c:94 (discriminator 1))
[   41.130989] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1))
[   41.131002] ? unwind_get_return_address
(arch/x86/kernel/unwind_frame.c:19 (discriminator 1))
[   41.131014] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83)
[   41.131028] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26)
[   41.131044] ? stack_depot_save_flags (lib/stackdepot.c:675)
[   41.131062] ? ktime_get (kernel/time/timekeeping.c:292
kernel/time/timekeeping.c:388 kernel/time/timekeeping.c:848)
[   41.131076] bpf_test_run (./include/linux/bpf.h:1234
./include/linux/filter.h:657 ./include/linux/filter.h:664
net/bpf/test_run.c:425)
[   41.131087] ? security_sk_alloc (security/security.c:4662 (discriminator 13))
Eric Dumazet March 26, 2024, 1:38 p.m. UTC | #9
On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 3/25/24 5:28 PM, Stanislav Fomichev wrote:
> > > On 03/25, Alexei Starovoitov wrote:
> > >> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote:
> > >>>
> > >>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
> > >>> <alexei.starovoitov@gmail.com> wrote:
> > >>>>
> > >>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > >>>>>
> > >>>>> Hello:
> > >>>>>
> > >>>>> This patch was applied to bpf/bpf.git (master)
> > >>>>> by Daniel Borkmann <daniel@iogearbox.net>:
> > >>>>>
> > >>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> > >>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown
> > >>>>>> by various syzbot reports [1].
> > >>>>>>
> > >>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> > >>>>>> the missing attribute that can be used by upper layers.
> > >>>>>>
> > >>>>>> We need to use it in __bpf_redirect_common().
> > >>>>
> > >>>> This patch broke empty_skb test:
> > >>>> $ test_progs -t empty_skb
> > >>>>
> > >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > >>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > >>>> [redirect_ingress]: actual -34 != expected 0
> > >>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
> > >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > >>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > >>>> [redirect_egress]: actual -34 != expected 1
> > >>>>
> > >>>> And looking at the test I think it's not a test issue.
> > >>>> This check
> > >>>> if (unlikely(skb->len < dev->min_header_len))
> > >>>> is rejecting more than it should.
> > >>>>
> > >>>> So I reverted this patch for now.
> > >>>
> > >>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
> > >>> move my sanity test in __bpf_tx_skb(),
> > >>> the bpf test program still fails, I am suspecting the test needs to be adjusted.
> > >>>
> > >>>
> > >>>
> > >>> diff --git a/net/core/filter.c b/net/core/filter.c
> > >>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
> > >>> 100644
> > >>> --- a/net/core/filter.c
> > >>> +++ b/net/core/filter.c
> > >>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
> > >>> net_device *dev, struct sk_buff *skb)
> > >>>                  return -ENETDOWN;
> > >>>          }
> > >>>
> > >>> +       if (unlikely(skb->len < dev->min_header_len)) {
> > >>> +               pr_err_once("__bpf_tx_skb skb->len=%u <
> > >>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
> > >>> dev->min_header_len);
> > >>> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> > >>> +               kfree_skb(skb);
> > >>> +               return -ERANGE;
> > >>> +       } // Note: this is before we change skb->dev
> > >>>          skb->dev = dev;
> > >>>          skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
> > >>>          skb_clear_tstamp(skb);
> > >>>
> > >>>
> > >>> -->
> > >>>
> > >>>
> > >>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > >>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > >>> [redirect_egress]: actual -34 != expected 1
> > >>>
> > >>> [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
> > >>> [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
> > >>>                 mac=(64,14) net=(78,-1) trans=-1
> > >>>                 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> > >>>                 csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> > >>>                 hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0
> > >>
> > >> Hmm. Something is off.
> > >> That test creates 15 byte skb.
> > >> It's not obvious to me how it got reduced to 1.
> > >> Something stripped L2 header and the prog is trying to redirect
> > >> such skb into veth that expects skb with L2 ?
> > >>
> > >> Stan,
> > >> please take a look.
> > >> Since you wrote that test.
> > >
> > > Sure. Daniel wants to take a look on a separate thread, so we can sync
> > > up. Tentatively, seems like the failure is in the lwt path that does
> > > indeed drop the l2.
> >
> > If we'd change the test into the below, the tc and empty_skb tests pass.
> > run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus
> > skb->len is 1 in this test. We do use skb_mac_header_len() also in other
> > tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best
> > way forward..
> >
> > static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
> >                                   u32 flags)
> > {
> >          /* Verify that a link layer header is carried */
> >          if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) {
> >                  kfree_skb(skb);
> >                  return -ERANGE;
> >          }
> >
> >          if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) {
>
> Unfortunately this will not prevent frames with skb->len == 1 to reach
> an Ethernet driver ndo_start_xmit()
>
> At ndo_start_xmit(), we do not look where the MAC header supposedly
> starts in the skb, we only use skb->data
>
> I have a syzbot repro using team driver, so I added the following part in team :
>
> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c
> 100644
> --- a/drivers/net/team/team.c
> +++ b/drivers/net/team/team.c
> @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff
> *skb, struct net_device *dev)
>         bool tx_success;
>         unsigned int len = skb->len;
>
> +       if (len < 14) {
> +               pr_err_once("team_xmit(len=%u)\n", len);
> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> +               WARN_ON_ONCE(1);
> +       }
>         tx_success = team_queue_override_transmit(team, skb);
>         if (!tx_success)
>                 tx_success = team->ops.transmit(team, skb);
>
>
> And I get (with your suggestion instead of skb->len)
>

Missing part in my copy/paste :

[   41.123829] team_xmit(len=1)
[   41.124335] skb len=1 headroom=78 headlen=1 tailroom=113


> mac=(78,0) net=(78,-1) trans=-1
> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> hash(0x0 sw=0 l4=0) proto=0x88a8 pkttype=3 iif=0
> [   41.126553] dev name=team0 feat=0x0000e0064fddfbe9
> [   41.127132] skb linear:   00000000: 55
> [   41.128487] ------------[ cut here ]------------
> [   41.128551] WARNING: CPU: 2 PID: 1880 at
> drivers/net/team/team.c:1720 team_xmit (drivers/net/team/team.c:1720
> (discriminator 1))
> [   41.129072] Modules linked in: macsec macvtap macvlan hsr wireguard
> curve25519_x86_64 libcurve25519_generic libchacha20poly1305
> chacha_x86_64 libchacha poly1305_x86_64 batman_adv dummy bridge sr_mod
> cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p netfs 9pnet
> [   41.129613] CPU: 2 PID: 1880 Comm: b330650301 Not tainted 6.8.0-virtme #238
> [   41.129664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
> BIOS 1.16.3-debian-1.16.3-2 04/01/2014
> [   41.129780] RIP: 0010:team_xmit (drivers/net/team/team.c:1720
> (discriminator 1))
> [ 41.129847] Code: 41 54 53 44 8b 7f 70 48 89 fb 41 83 ff 0d 77 1c 80
> 3d a0 24 8d 01 00 0f 84 0d 01 00 00 80 3d 92 24 8d 01 00 0f 84 e3 00
> 00 00 <0f> 0b 41 80 be 21 0b 00 00 00 0f 84 9d 00 00 00 0f b7 43 7c 66
> 85
> All code
> ========
>    0: 41 54                push   %r12
>    2: 53                    push   %rbx
>    3: 44 8b 7f 70          mov    0x70(%rdi),%r15d
>    7: 48 89 fb              mov    %rdi,%rbx
>    a: 41 83 ff 0d          cmp    $0xd,%r15d
>    e: 77 1c                ja     0x2c
>   10: 80 3d a0 24 8d 01 00 cmpb   $0x0,0x18d24a0(%rip)        # 0x18d24b7
>   17: 0f 84 0d 01 00 00    je     0x12a
>   1d: 80 3d 92 24 8d 01 00 cmpb   $0x0,0x18d2492(%rip)        # 0x18d24b6
>   24: 0f 84 e3 00 00 00    je     0x10d
>   2a:* 0f 0b                ud2 <-- trapping instruction
>   2c: 41 80 be 21 0b 00 00 cmpb   $0x0,0xb21(%r14)
>   33: 00
>   34: 0f 84 9d 00 00 00    je     0xd7
>   3a: 0f b7 43 7c          movzwl 0x7c(%rbx),%eax
>   3e: 66                    data16
>   3f: 85                    .byte 0x85
>
> Code starting with the faulting instruction
> ===========================================
>    0: 0f 0b                ud2
>    2: 41 80 be 21 0b 00 00 cmpb   $0x0,0xb21(%r14)
>    9: 00
>    a: 0f 84 9d 00 00 00    je     0xad
>   10: 0f b7 43 7c          movzwl 0x7c(%rbx),%eax
>   14: 66                    data16
>   15: 85                    .byte 0x85
> [   41.129902] RSP: 0018:ffffa4210433b938 EFLAGS: 00000246
> [   41.129945] RAX: 0000000000000000 RBX: ffffa4210858a300 RCX: 0000000000000000
> [   41.129961] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: 0000000000000001
> [   41.129975] RBP: ffffa4210433b960 R08: 0000000000000000 R09: ffffa4210433b630
> [   41.129989] R10: 0000000000000001 R11: ffffffff8407d340 R12: 0000000000000000
> [   41.130004] R13: ffffa4210ecee000 R14: ffffa4210ece4000 R15: 0000000000000001
> [   41.130074] FS:  00007f91d9549740(0000) GS:ffffa42fffa80000(0000)
> knlGS:0000000000000000
> [   41.130095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
> [   41.130140] CR2: 00007f8953077fb0 CR3: 0000000104f42000 CR4: 00000000000006f0
> [   41.130229] Call Trace:
> [   41.130331]  <TASK>
> [   41.130530] ? show_regs (arch/x86/kernel/dumpstack.c:479)
> [   41.130598] ? __warn (kernel/panic.c:694)
> [   41.130611] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1))
> [   41.130625] ? report_bug (lib/bug.c:180 lib/bug.c:219)
> [   41.130640] ? handle_bug (arch/x86/kernel/traps.c:239)
> [   41.130653] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
> [   41.130665] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
> [   41.130700] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1))
> [   41.130714] ? team_xmit (drivers/net/team/team.c:1719 (discriminator 6))
> [   41.130734] dev_hard_start_xmit (./include/linux/netdevice.h:4903
> ./include/linux/netdevice.h:4917 net/core/dev.c:3531
> net/core/dev.c:3547)
> [   41.130768] __dev_queue_xmit (./include/linux/netdevice.h:3287
> (discriminator 25) net/core/dev.c:4336 (discriminator 25))
> [   41.130780] ? kmalloc_reserve (net/core/skbuff.c:580 (discriminator 4))
> [   41.130796] ? pskb_expand_head (net/core/skbuff.c:2292)
> [   41.130815] __bpf_redirect (./include/linux/netdevice.h:3287
> (discriminator 25) net/core/filter.c:2143 (discriminator 25)
> net/core/filter.c:2172 (discriminator 25) net/core/filter.c:2196
> (discriminator 25))
> [   41.130825] bpf_clone_redirect (net/core/filter.c:2467
> (discriminator 1) net/core/filter.c:2439 (discriminator 1))
> [   41.130841] bpf_prog_9845f5eee09e82c6+0x61/0x66
> [   41.130948] ? bpf_ksym_find (./include/linux/rbtree_latch.h:113
> ./include/linux/rbtree_latch.h:208 kernel/bpf/core.c:734)
> [   41.130963] ? is_bpf_text_address
> (./arch/x86/include/asm/preempt.h:84 (discriminator 13)
> ./include/linux/rcupdate.h:97 (discriminator 13)
> ./include/linux/rcupdate.h:813 (discriminator 13)
> kernel/bpf/core.c:769 (discriminator 13))
> [   41.130976] ? kernel_text_address (kernel/extable.c:125
> (discriminator 1) kernel/extable.c:94 (discriminator 1))
> [   41.130989] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1))
> [   41.131002] ? unwind_get_return_address
> (arch/x86/kernel/unwind_frame.c:19 (discriminator 1))
> [   41.131014] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83)
> [   41.131028] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26)
> [   41.131044] ? stack_depot_save_flags (lib/stackdepot.c:675)
> [   41.131062] ? ktime_get (kernel/time/timekeeping.c:292
> kernel/time/timekeeping.c:388 kernel/time/timekeeping.c:848)
> [   41.131076] bpf_test_run (./include/linux/bpf.h:1234
> ./include/linux/filter.h:657 ./include/linux/filter.h:664
> net/bpf/test_run.c:425)
> [   41.131087] ? security_sk_alloc (security/security.c:4662 (discriminator 13))
Daniel Borkmann March 26, 2024, 5:57 p.m. UTC | #10
On 3/26/24 2:38 PM, Eric Dumazet wrote:
> On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@google.com> wrote:
>> On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>>> On 3/25/24 5:28 PM, Stanislav Fomichev wrote:
>>>> On 03/25, Alexei Starovoitov wrote:
>>>>> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote:
>>>>>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
>>>>>> <alexei.starovoitov@gmail.com> wrote:
>>>>>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
>>>>>>>>
>>>>>>>> Hello:
>>>>>>>>
>>>>>>>> This patch was applied to bpf/bpf.git (master)
>>>>>>>> by Daniel Borkmann <daniel@iogearbox.net>:
>>>>>>>>
>>>>>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
>>>>>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown
>>>>>>>>> by various syzbot reports [1].
>>>>>>>>>
>>>>>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
>>>>>>>>> the missing attribute that can be used by upper layers.
>>>>>>>>>
>>>>>>>>> We need to use it in __bpf_redirect_common().
>>>>>>>
>>>>>>> This patch broke empty_skb test:
>>>>>>> $ test_progs -t empty_skb
>>>>>>>
>>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
>>>>>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
>>>>>>> [redirect_ingress]: actual -34 != expected 0
>>>>>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
>>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
>>>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
>>>>>>> [redirect_egress]: actual -34 != expected 1
>>>>>>>
>>>>>>> And looking at the test I think it's not a test issue.
>>>>>>> This check
>>>>>>> if (unlikely(skb->len < dev->min_header_len))
>>>>>>> is rejecting more than it should.
>>>>>>>
>>>>>>> So I reverted this patch for now.
>>>>>>
>>>>>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
>>>>>> move my sanity test in __bpf_tx_skb(),
>>>>>> the bpf test program still fails, I am suspecting the test needs to be adjusted.
>>>>>>
>>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
>>>>>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
>>>>>> 100644
>>>>>> --- a/net/core/filter.c
>>>>>> +++ b/net/core/filter.c
>>>>>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
>>>>>> net_device *dev, struct sk_buff *skb)
>>>>>>                   return -ENETDOWN;
>>>>>>           }
>>>>>>
>>>>>> +       if (unlikely(skb->len < dev->min_header_len)) {
>>>>>> +               pr_err_once("__bpf_tx_skb skb->len=%u <
>>>>>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
>>>>>> dev->min_header_len);
>>>>>> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
>>>>>> +               kfree_skb(skb);
>>>>>> +               return -ERANGE;
>>>>>> +       } // Note: this is before we change skb->dev
>>>>>>           skb->dev = dev;
>>>>>>           skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
>>>>>>           skb_clear_tstamp(skb);
>>>>>>
>>>>>>
>>>>>> -->
>>>>>>
>>>>>>
>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
>>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
>>>>>> [redirect_egress]: actual -34 != expected 1
>>>>>>
>>>>>> [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
>>>>>> [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
>>>>>>                  mac=(64,14) net=(78,-1) trans=-1
>>>>>>                  shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>>>>>>                  csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
>>>>>>                  hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0
>>>>>
>>>>> Hmm. Something is off.
>>>>> That test creates 15 byte skb.
>>>>> It's not obvious to me how it got reduced to 1.
>>>>> Something stripped L2 header and the prog is trying to redirect
>>>>> such skb into veth that expects skb with L2 ?
>>>>>
>>>>> Stan,
>>>>> please take a look.
>>>>> Since you wrote that test.
>>>>
>>>> Sure. Daniel wants to take a look on a separate thread, so we can sync
>>>> up. Tentatively, seems like the failure is in the lwt path that does
>>>> indeed drop the l2.
>>>
>>> If we'd change the test into the below, the tc and empty_skb tests pass.
>>> run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus
>>> skb->len is 1 in this test. We do use skb_mac_header_len() also in other
>>> tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best
>>> way forward..
>>>
>>> static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
>>>                                    u32 flags)
>>> {
>>>           /* Verify that a link layer header is carried */
>>>           if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) {
>>>                   kfree_skb(skb);
>>>                   return -ERANGE;
>>>           }
>>>
>>>           if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) {
>>
>> Unfortunately this will not prevent frames with skb->len == 1 to reach
>> an Ethernet driver ndo_start_xmit()
>>
>> At ndo_start_xmit(), we do not look where the MAC header supposedly
>> starts in the skb, we only use skb->data
>>
>> I have a syzbot repro using team driver, so I added the following part in team :
>>
>> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
>> index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c
>> 100644
>> --- a/drivers/net/team/team.c
>> +++ b/drivers/net/team/team.c
>> @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff
>> *skb, struct net_device *dev)
>>          bool tx_success;
>>          unsigned int len = skb->len;
>>
>> +       if (len < 14) {
>> +               pr_err_once("team_xmit(len=%u)\n", len);
>> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
>> +               WARN_ON_ONCE(1);
>> +       }
>>          tx_success = team_queue_override_transmit(team, skb);
>>          if (!tx_success)
>>                  tx_success = team->ops.transmit(team, skb);
>>
>>
>> And I get (with your suggestion instead of skb->len)
> 
> Missing part in my copy/paste :
> 
> [   41.123829] team_xmit(len=1)
> [   41.124335] skb len=1 headroom=78 headlen=1 tailroom=113
> 
>> mac=(78,0) net=(78,-1) trans=-1

Interesting.

Could you also dump dev->type and/or dev->min_header_len? I suspect
this case may not be ARPHRD_ETHER in team.

Above says mac=(78,0), so mac len is 0 and the check against the
dev->min_header_len should have dropped it if it went that branch.

I wonder, is team driver missing sth like :

diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
index 0a44bbdcfb7b..6256f0d2f565 100644
--- a/drivers/net/team/team.c
+++ b/drivers/net/team/team.c
@@ -2124,6 +2124,7 @@ static void team_setup_by_port(struct net_device *dev,
         dev->type = port_dev->type;
         dev->hard_header_len = port_dev->hard_header_len;
         dev->needed_headroom = port_dev->needed_headroom;
+       dev->min_header_len = port_dev->min_header_len;
         dev->addr_len = port_dev->addr_len;
         dev->mtu = port_dev->mtu;
         memcpy(dev->broadcast, port_dev->broadcast, port_dev->addr_len);

>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
>> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
>> hash(0x0 sw=0 l4=0) proto=0x88a8 pkttype=3 iif=0
>> [   41.126553] dev name=team0 feat=0x0000e0064fddfbe9
>> [   41.127132] skb linear:   00000000: 55
>> [   41.128487] ------------[ cut here ]------------
>> [   41.128551] WARNING: CPU: 2 PID: 1880 at
>> drivers/net/team/team.c:1720 team_xmit (drivers/net/team/team.c:1720
>> (discriminator 1))
>> [   41.129072] Modules linked in: macsec macvtap macvlan hsr wireguard
>> curve25519_x86_64 libcurve25519_generic libchacha20poly1305
>> chacha_x86_64 libchacha poly1305_x86_64 batman_adv dummy bridge sr_mod
>> cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p netfs 9pnet
>> [   41.129613] CPU: 2 PID: 1880 Comm: b330650301 Not tainted 6.8.0-virtme #238
>> [   41.129664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996),
>> BIOS 1.16.3-debian-1.16.3-2 04/01/2014
>> [   41.129780] RIP: 0010:team_xmit (drivers/net/team/team.c:1720
>> (discriminator 1))
>> [ 41.129847] Code: 41 54 53 44 8b 7f 70 48 89 fb 41 83 ff 0d 77 1c 80
>> 3d a0 24 8d 01 00 0f 84 0d 01 00 00 80 3d 92 24 8d 01 00 0f 84 e3 00
>> 00 00 <0f> 0b 41 80 be 21 0b 00 00 00 0f 84 9d 00 00 00 0f b7 43 7c 66
>> 85
>> All code
>> ========
>>     0: 41 54                push   %r12
>>     2: 53                    push   %rbx
>>     3: 44 8b 7f 70          mov    0x70(%rdi),%r15d
>>     7: 48 89 fb              mov    %rdi,%rbx
>>     a: 41 83 ff 0d          cmp    $0xd,%r15d
>>     e: 77 1c                ja     0x2c
>>    10: 80 3d a0 24 8d 01 00 cmpb   $0x0,0x18d24a0(%rip)        # 0x18d24b7
>>    17: 0f 84 0d 01 00 00    je     0x12a
>>    1d: 80 3d 92 24 8d 01 00 cmpb   $0x0,0x18d2492(%rip)        # 0x18d24b6
>>    24: 0f 84 e3 00 00 00    je     0x10d
>>    2a:* 0f 0b                ud2 <-- trapping instruction
>>    2c: 41 80 be 21 0b 00 00 cmpb   $0x0,0xb21(%r14)
>>    33: 00
>>    34: 0f 84 9d 00 00 00    je     0xd7
>>    3a: 0f b7 43 7c          movzwl 0x7c(%rbx),%eax
>>    3e: 66                    data16
>>    3f: 85                    .byte 0x85
>>
>> Code starting with the faulting instruction
>> ===========================================
>>     0: 0f 0b                ud2
>>     2: 41 80 be 21 0b 00 00 cmpb   $0x0,0xb21(%r14)
>>     9: 00
>>     a: 0f 84 9d 00 00 00    je     0xad
>>    10: 0f b7 43 7c          movzwl 0x7c(%rbx),%eax
>>    14: 66                    data16
>>    15: 85                    .byte 0x85
>> [   41.129902] RSP: 0018:ffffa4210433b938 EFLAGS: 00000246
>> [   41.129945] RAX: 0000000000000000 RBX: ffffa4210858a300 RCX: 0000000000000000
>> [   41.129961] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: 0000000000000001
>> [   41.129975] RBP: ffffa4210433b960 R08: 0000000000000000 R09: ffffa4210433b630
>> [   41.129989] R10: 0000000000000001 R11: ffffffff8407d340 R12: 0000000000000000
>> [   41.130004] R13: ffffa4210ecee000 R14: ffffa4210ece4000 R15: 0000000000000001
>> [   41.130074] FS:  00007f91d9549740(0000) GS:ffffa42fffa80000(0000)
>> knlGS:0000000000000000
>> [   41.130095] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
>> [   41.130140] CR2: 00007f8953077fb0 CR3: 0000000104f42000 CR4: 00000000000006f0
>> [   41.130229] Call Trace:
>> [   41.130331]  <TASK>
>> [   41.130530] ? show_regs (arch/x86/kernel/dumpstack.c:479)
>> [   41.130598] ? __warn (kernel/panic.c:694)
>> [   41.130611] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1))
>> [   41.130625] ? report_bug (lib/bug.c:180 lib/bug.c:219)
>> [   41.130640] ? handle_bug (arch/x86/kernel/traps.c:239)
>> [   41.130653] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1))
>> [   41.130665] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621)
>> [   41.130700] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1))
>> [   41.130714] ? team_xmit (drivers/net/team/team.c:1719 (discriminator 6))
>> [   41.130734] dev_hard_start_xmit (./include/linux/netdevice.h:4903
>> ./include/linux/netdevice.h:4917 net/core/dev.c:3531
>> net/core/dev.c:3547)
>> [   41.130768] __dev_queue_xmit (./include/linux/netdevice.h:3287
>> (discriminator 25) net/core/dev.c:4336 (discriminator 25))
>> [   41.130780] ? kmalloc_reserve (net/core/skbuff.c:580 (discriminator 4))
>> [   41.130796] ? pskb_expand_head (net/core/skbuff.c:2292)
>> [   41.130815] __bpf_redirect (./include/linux/netdevice.h:3287
>> (discriminator 25) net/core/filter.c:2143 (discriminator 25)
>> net/core/filter.c:2172 (discriminator 25) net/core/filter.c:2196
>> (discriminator 25))
>> [   41.130825] bpf_clone_redirect (net/core/filter.c:2467
>> (discriminator 1) net/core/filter.c:2439 (discriminator 1))
>> [   41.130841] bpf_prog_9845f5eee09e82c6+0x61/0x66
>> [   41.130948] ? bpf_ksym_find (./include/linux/rbtree_latch.h:113
>> ./include/linux/rbtree_latch.h:208 kernel/bpf/core.c:734)
>> [   41.130963] ? is_bpf_text_address
>> (./arch/x86/include/asm/preempt.h:84 (discriminator 13)
>> ./include/linux/rcupdate.h:97 (discriminator 13)
>> ./include/linux/rcupdate.h:813 (discriminator 13)
>> kernel/bpf/core.c:769 (discriminator 13))
>> [   41.130976] ? kernel_text_address (kernel/extable.c:125
>> (discriminator 1) kernel/extable.c:94 (discriminator 1))
>> [   41.130989] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1))
>> [   41.131002] ? unwind_get_return_address
>> (arch/x86/kernel/unwind_frame.c:19 (discriminator 1))
>> [   41.131014] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83)
>> [   41.131028] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26)
>> [   41.131044] ? stack_depot_save_flags (lib/stackdepot.c:675)
>> [   41.131062] ? ktime_get (kernel/time/timekeeping.c:292
>> kernel/time/timekeeping.c:388 kernel/time/timekeeping.c:848)
>> [   41.131076] bpf_test_run (./include/linux/bpf.h:1234
>> ./include/linux/filter.h:657 ./include/linux/filter.h:664
>> net/bpf/test_run.c:425)
>> [   41.131087] ? security_sk_alloc (security/security.c:4662 (discriminator 13))
Eric Dumazet March 26, 2024, 6:08 p.m. UTC | #11
On Tue, Mar 26, 2024 at 6:57 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 3/26/24 2:38 PM, Eric Dumazet wrote:
> > On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@google.com> wrote:
> >> On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >>> On 3/25/24 5:28 PM, Stanislav Fomichev wrote:
> >>>> On 03/25, Alexei Starovoitov wrote:
> >>>>> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote:
> >>>>>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
> >>>>>> <alexei.starovoitov@gmail.com> wrote:
> >>>>>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> >>>>>>>>
> >>>>>>>> Hello:
> >>>>>>>>
> >>>>>>>> This patch was applied to bpf/bpf.git (master)
> >>>>>>>> by Daniel Borkmann <daniel@iogearbox.net>:
> >>>>>>>>
> >>>>>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> >>>>>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown
> >>>>>>>>> by various syzbot reports [1].
> >>>>>>>>>
> >>>>>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> >>>>>>>>> the missing attribute that can be used by upper layers.
> >>>>>>>>>
> >>>>>>>>> We need to use it in __bpf_redirect_common().
> >>>>>>>
> >>>>>>> This patch broke empty_skb test:
> >>>>>>> $ test_progs -t empty_skb
> >>>>>>>
> >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> >>>>>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
> >>>>>>> [redirect_ingress]: actual -34 != expected 0
> >>>>>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
> >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> >>>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> >>>>>>> [redirect_egress]: actual -34 != expected 1
> >>>>>>>
> >>>>>>> And looking at the test I think it's not a test issue.
> >>>>>>> This check
> >>>>>>> if (unlikely(skb->len < dev->min_header_len))
> >>>>>>> is rejecting more than it should.
> >>>>>>>
> >>>>>>> So I reverted this patch for now.
> >>>>>>
> >>>>>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
> >>>>>> move my sanity test in __bpf_tx_skb(),
> >>>>>> the bpf test program still fails, I am suspecting the test needs to be adjusted.
> >>>>>>
> >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> >>>>>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
> >>>>>> 100644
> >>>>>> --- a/net/core/filter.c
> >>>>>> +++ b/net/core/filter.c
> >>>>>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
> >>>>>> net_device *dev, struct sk_buff *skb)
> >>>>>>                   return -ENETDOWN;
> >>>>>>           }
> >>>>>>
> >>>>>> +       if (unlikely(skb->len < dev->min_header_len)) {
> >>>>>> +               pr_err_once("__bpf_tx_skb skb->len=%u <
> >>>>>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
> >>>>>> dev->min_header_len);
> >>>>>> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> >>>>>> +               kfree_skb(skb);
> >>>>>> +               return -ERANGE;
> >>>>>> +       } // Note: this is before we change skb->dev
> >>>>>>           skb->dev = dev;
> >>>>>>           skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
> >>>>>>           skb_clear_tstamp(skb);
> >>>>>>
> >>>>>>
> >>>>>> -->
> >>>>>>
> >>>>>>
> >>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> >>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> >>>>>> [redirect_egress]: actual -34 != expected 1
> >>>>>>
> >>>>>> [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
> >>>>>> [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
> >>>>>>                  mac=(64,14) net=(78,-1) trans=-1
> >>>>>>                  shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> >>>>>>                  csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> >>>>>>                  hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0
> >>>>>
> >>>>> Hmm. Something is off.
> >>>>> That test creates 15 byte skb.
> >>>>> It's not obvious to me how it got reduced to 1.
> >>>>> Something stripped L2 header and the prog is trying to redirect
> >>>>> such skb into veth that expects skb with L2 ?
> >>>>>
> >>>>> Stan,
> >>>>> please take a look.
> >>>>> Since you wrote that test.
> >>>>
> >>>> Sure. Daniel wants to take a look on a separate thread, so we can sync
> >>>> up. Tentatively, seems like the failure is in the lwt path that does
> >>>> indeed drop the l2.
> >>>
> >>> If we'd change the test into the below, the tc and empty_skb tests pass.
> >>> run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus
> >>> skb->len is 1 in this test. We do use skb_mac_header_len() also in other
> >>> tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best
> >>> way forward..
> >>>
> >>> static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
> >>>                                    u32 flags)
> >>> {
> >>>           /* Verify that a link layer header is carried */
> >>>           if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) {
> >>>                   kfree_skb(skb);
> >>>                   return -ERANGE;
> >>>           }
> >>>
> >>>           if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) {
> >>
> >> Unfortunately this will not prevent frames with skb->len == 1 to reach
> >> an Ethernet driver ndo_start_xmit()
> >>
> >> At ndo_start_xmit(), we do not look where the MAC header supposedly
> >> starts in the skb, we only use skb->data
> >>
> >> I have a syzbot repro using team driver, so I added the following part in team :
> >>
> >> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> >> index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c
> >> 100644
> >> --- a/drivers/net/team/team.c
> >> +++ b/drivers/net/team/team.c
> >> @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff
> >> *skb, struct net_device *dev)
> >>          bool tx_success;
> >>          unsigned int len = skb->len;
> >>
> >> +       if (len < 14) {
> >> +               pr_err_once("team_xmit(len=%u)\n", len);
> >> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> >> +               WARN_ON_ONCE(1);
> >> +       }
> >>          tx_success = team_queue_override_transmit(team, skb);
> >>          if (!tx_success)
> >>                  tx_success = team->ops.transmit(team, skb);
> >>
> >>
> >> And I get (with your suggestion instead of skb->len)
> >
> > Missing part in my copy/paste :
> >
> > [   41.123829] team_xmit(len=1)
> > [   41.124335] skb len=1 headroom=78 headlen=1 tailroom=113
> >
> >> mac=(78,0) net=(78,-1) trans=-1
>
> Interesting.
>
> Could you also dump dev->type and/or dev->min_header_len? I suspect
> this case may not be ARPHRD_ETHER in team.
>
> Above says mac=(78,0), so mac len is 0 and the check against the
> dev->min_header_len should have dropped it if it went that branch.

mac header is reset in __dev_queue_xmit() :

         skb_reset_mac_header(skb);

So when the bpf code ran, skb_mac_header_len(skb) was 14,
but later the MAC header was set (to skb->data)

>
> I wonder, is team driver missing sth like :
>
> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> index 0a44bbdcfb7b..6256f0d2f565 100644
> --- a/drivers/net/team/team.c
> +++ b/drivers/net/team/team.c
> @@ -2124,6 +2124,7 @@ static void team_setup_by_port(struct net_device *dev,
>          dev->type = port_dev->type;
>          dev->hard_header_len = port_dev->hard_header_len;
>          dev->needed_headroom = port_dev->needed_headroom;
> +       dev->min_header_len = port_dev->min_header_len;
>          dev->addr_len = port_dev->addr_len;
>          dev->mtu = port_dev->mtu;
>          memcpy(dev->broadcast, port_dev->broadcast, port_dev->addr_len);
>


I have confirmed that team min_header_len is 14, nothing seems to be
missing I think.

team_xmit(dev team0, skb->len=1, dev->min_header_len=14)
Eric Dumazet Sept. 22, 2024, 5:39 p.m. UTC | #12
On Tue, Mar 26, 2024 at 7:08 PM Eric Dumazet <edumazet@google.com> wrote:
>
> On Tue, Mar 26, 2024 at 6:57 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 3/26/24 2:38 PM, Eric Dumazet wrote:
> > > On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@google.com> wrote:
> > >> On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >>> On 3/25/24 5:28 PM, Stanislav Fomichev wrote:
> > >>>> On 03/25, Alexei Starovoitov wrote:
> > >>>>> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote:
> > >>>>>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov
> > >>>>>> <alexei.starovoitov@gmail.com> wrote:
> > >>>>>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote:
> > >>>>>>>>
> > >>>>>>>> Hello:
> > >>>>>>>>
> > >>>>>>>> This patch was applied to bpf/bpf.git (master)
> > >>>>>>>> by Daniel Borkmann <daniel@iogearbox.net>:
> > >>>>>>>>
> > >>>>>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote:
> > >>>>>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown
> > >>>>>>>>> by various syzbot reports [1].
> > >>>>>>>>>
> > >>>>>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len")
> > >>>>>>>>> the missing attribute that can be used by upper layers.
> > >>>>>>>>>
> > >>>>>>>>> We need to use it in __bpf_redirect_common().
> > >>>>>>>
> > >>>>>>> This patch broke empty_skb test:
> > >>>>>>> $ test_progs -t empty_skb
> > >>>>>>>
> > >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > >>>>>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > >>>>>>> [redirect_ingress]: actual -34 != expected 0
> > >>>>>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec
> > >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > >>>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > >>>>>>> [redirect_egress]: actual -34 != expected 1
> > >>>>>>>
> > >>>>>>> And looking at the test I think it's not a test issue.
> > >>>>>>> This check
> > >>>>>>> if (unlikely(skb->len < dev->min_header_len))
> > >>>>>>> is rejecting more than it should.
> > >>>>>>>
> > >>>>>>> So I reverted this patch for now.
> > >>>>>>
> > >>>>>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I
> > >>>>>> move my sanity test in __bpf_tx_skb(),
> > >>>>>> the bpf test program still fails, I am suspecting the test needs to be adjusted.
> > >>>>>>
> > >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c
> > >>>>>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d
> > >>>>>> 100644
> > >>>>>> --- a/net/core/filter.c
> > >>>>>> +++ b/net/core/filter.c
> > >>>>>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct
> > >>>>>> net_device *dev, struct sk_buff *skb)
> > >>>>>>                   return -ENETDOWN;
> > >>>>>>           }
> > >>>>>>
> > >>>>>> +       if (unlikely(skb->len < dev->min_header_len)) {
> > >>>>>> +               pr_err_once("__bpf_tx_skb skb->len=%u <
> > >>>>>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name,
> > >>>>>> dev->min_header_len);
> > >>>>>> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> > >>>>>> +               kfree_skb(skb);
> > >>>>>> +               return -ERANGE;
> > >>>>>> +       } // Note: this is before we change skb->dev
> > >>>>>>           skb->dev = dev;
> > >>>>>>           skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb));
> > >>>>>>           skb_clear_tstamp(skb);
> > >>>>>>
> > >>>>>>
> > >>>>>> -->
> > >>>>>>
> > >>>>>>
> > >>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress
> > >>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress
> > >>>>>> [redirect_egress]: actual -34 != expected 1
> > >>>>>>
> > >>>>>> [   58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14)
> > >>>>>> [   58.382778] skb len=1 headroom=78 headlen=1 tailroom=113
> > >>>>>>                  mac=(64,14) net=(78,-1) trans=-1
> > >>>>>>                  shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0))
> > >>>>>>                  csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0)
> > >>>>>>                  hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0
> > >>>>>
> > >>>>> Hmm. Something is off.
> > >>>>> That test creates 15 byte skb.
> > >>>>> It's not obvious to me how it got reduced to 1.
> > >>>>> Something stripped L2 header and the prog is trying to redirect
> > >>>>> such skb into veth that expects skb with L2 ?
> > >>>>>
> > >>>>> Stan,
> > >>>>> please take a look.
> > >>>>> Since you wrote that test.
> > >>>>
> > >>>> Sure. Daniel wants to take a look on a separate thread, so we can sync
> > >>>> up. Tentatively, seems like the failure is in the lwt path that does
> > >>>> indeed drop the l2.
> > >>>
> > >>> If we'd change the test into the below, the tc and empty_skb tests pass.
> > >>> run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus
> > >>> skb->len is 1 in this test. We do use skb_mac_header_len() also in other
> > >>> tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best
> > >>> way forward..
> > >>>
> > >>> static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
> > >>>                                    u32 flags)
> > >>> {
> > >>>           /* Verify that a link layer header is carried */
> > >>>           if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) {
> > >>>                   kfree_skb(skb);
> > >>>                   return -ERANGE;
> > >>>           }
> > >>>
> > >>>           if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) {
> > >>
> > >> Unfortunately this will not prevent frames with skb->len == 1 to reach
> > >> an Ethernet driver ndo_start_xmit()
> > >>
> > >> At ndo_start_xmit(), we do not look where the MAC header supposedly
> > >> starts in the skb, we only use skb->data
> > >>
> > >> I have a syzbot repro using team driver, so I added the following part in team :
> > >>
> > >> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> > >> index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c
> > >> 100644
> > >> --- a/drivers/net/team/team.c
> > >> +++ b/drivers/net/team/team.c
> > >> @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff
> > >> *skb, struct net_device *dev)
> > >>          bool tx_success;
> > >>          unsigned int len = skb->len;
> > >>
> > >> +       if (len < 14) {
> > >> +               pr_err_once("team_xmit(len=%u)\n", len);
> > >> +               DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false);
> > >> +               WARN_ON_ONCE(1);
> > >> +       }
> > >>          tx_success = team_queue_override_transmit(team, skb);
> > >>          if (!tx_success)
> > >>                  tx_success = team->ops.transmit(team, skb);
> > >>
> > >>
> > >> And I get (with your suggestion instead of skb->len)
> > >
> > > Missing part in my copy/paste :
> > >
> > > [   41.123829] team_xmit(len=1)
> > > [   41.124335] skb len=1 headroom=78 headlen=1 tailroom=113
> > >
> > >> mac=(78,0) net=(78,-1) trans=-1
> >
> > Interesting.
> >
> > Could you also dump dev->type and/or dev->min_header_len? I suspect
> > this case may not be ARPHRD_ETHER in team.
> >
> > Above says mac=(78,0), so mac len is 0 and the check against the
> > dev->min_header_len should have dropped it if it went that branch.
>
> mac header is reset in __dev_queue_xmit() :
>
>          skb_reset_mac_header(skb);
>
> So when the bpf code ran, skb_mac_header_len(skb) was 14,
> but later the MAC header was set (to skb->data)
>
> >
> > I wonder, is team driver missing sth like :
> >
> > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c
> > index 0a44bbdcfb7b..6256f0d2f565 100644
> > --- a/drivers/net/team/team.c
> > +++ b/drivers/net/team/team.c
> > @@ -2124,6 +2124,7 @@ static void team_setup_by_port(struct net_device *dev,
> >          dev->type = port_dev->type;
> >          dev->hard_header_len = port_dev->hard_header_len;
> >          dev->needed_headroom = port_dev->needed_headroom;
> > +       dev->min_header_len = port_dev->min_header_len;
> >          dev->addr_len = port_dev->addr_len;
> >          dev->mtu = port_dev->mtu;
> >          memcpy(dev->broadcast, port_dev->broadcast, port_dev->addr_len);
> >
>
>
> I have confirmed that team min_header_len is 14, nothing seems to be
> missing I think.
>
> team_xmit(dev team0, skb->len=1, dev->min_header_len=14)

FYI, I am releasing today a syzbot report with the same problem.
diff mbox series

Patch

diff --git a/net/core/filter.c b/net/core/filter.c
index 8adf95765cdd967a15b2661dfb454db0ccf350b0..745697c08acb3a74721d26ee93389efa81e973a0 100644
--- a/net/core/filter.c
+++ b/net/core/filter.c
@@ -2175,6 +2175,11 @@  static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev,
 		return -ERANGE;
 	}
 
+	if (unlikely(skb->len < dev->min_header_len)) {
+		kfree_skb(skb);
+		return -ERANGE;
+	}
+
 	bpf_push_mac_rcsum(skb);
 	return flags & BPF_F_INGRESS ?
 	       __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);