Message ID | 20240322122407.1329861-1-edumazet@google.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 96b0e83341e8305a759d776689325857be054ef3 |
Delegated to: | BPF |
Headers | show |
Series | [net] bpf: Don't redirect too small packets | expand |
Hello: This patch was applied to bpf/bpf.git (master) by Daniel Borkmann <daniel@iogearbox.net>: On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > Some drivers ndo_start_xmit() expect a minimal size, as shown > by various syzbot reports [1]. > > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > the missing attribute that can be used by upper layers. > > We need to use it in __bpf_redirect_common(). > > [...] Here is the summary with links: - [net] bpf: Don't redirect too small packets https://git.kernel.org/bpf/bpf/c/96b0e83341e8 You are awesome, thank you!
On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > > Hello: > > This patch was applied to bpf/bpf.git (master) > by Daniel Borkmann <daniel@iogearbox.net>: > > On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > > Some drivers ndo_start_xmit() expect a minimal size, as shown > > by various syzbot reports [1]. > > > > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > > the missing attribute that can be used by upper layers. > > > > We need to use it in __bpf_redirect_common(). This patch broke empty_skb test: $ test_progs -t empty_skb test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress [redirect_ingress]: actual -34 != expected 0 test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress [redirect_egress]: actual -34 != expected 1 And looking at the test I think it's not a test issue. This check if (unlikely(skb->len < dev->min_header_len)) is rejecting more than it should. So I reverted this patch for now.
On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov <alexei.starovoitov@gmail.com> wrote: > > On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > > > > Hello: > > > > This patch was applied to bpf/bpf.git (master) > > by Daniel Borkmann <daniel@iogearbox.net>: > > > > On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > > > Some drivers ndo_start_xmit() expect a minimal size, as shown > > > by various syzbot reports [1]. > > > > > > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > > > the missing attribute that can be used by upper layers. > > > > > > We need to use it in __bpf_redirect_common(). > > This patch broke empty_skb test: > $ test_progs -t empty_skb > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress > [redirect_ingress]: actual -34 != expected 0 > test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > [redirect_egress]: actual -34 != expected 1 > > And looking at the test I think it's not a test issue. > This check > if (unlikely(skb->len < dev->min_header_len)) > is rejecting more than it should. > > So I reverted this patch for now. OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I move my sanity test in __bpf_tx_skb(), the bpf test program still fails, I am suspecting the test needs to be adjusted. diff --git a/net/core/filter.c b/net/core/filter.c index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct net_device *dev, struct sk_buff *skb) return -ENETDOWN; } + if (unlikely(skb->len < dev->min_header_len)) { + pr_err_once("__bpf_tx_skb skb->len=%u < dev(%s)->min_header_len(%u)\n", skb->len, dev->name, dev->min_header_len); + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); + kfree_skb(skb); + return -ERANGE; + } // Note: this is before we change skb->dev skb->dev = dev; skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); skb_clear_tstamp(skb); --> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress [redirect_egress]: actual -34 != expected 1 [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 mac=(64,14) net=(78,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 Note that veth driver is one of the few 'Ethernet' drivers that make sure to get at least 14 bytes in the skb at ndo_start_xmit() after commit 726e2c5929de841fdcef4e2bf995680688ae1b87 ("veth: Ensure eth header is in skb's linear part") BTW this last patch (changing veth) should have been done generically from act_mirred (We do not want to patch ~400 ethernet drivers in the tree)
On 3/25/24 2:33 PM, Eric Dumazet wrote: > On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: >> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: >>> >>> Hello: >>> >>> This patch was applied to bpf/bpf.git (master) >>> by Daniel Borkmann <daniel@iogearbox.net>: >>> >>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: >>>> Some drivers ndo_start_xmit() expect a minimal size, as shown >>>> by various syzbot reports [1]. >>>> >>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") >>>> the missing attribute that can be used by upper layers. >>>> >>>> We need to use it in __bpf_redirect_common(). >> >> This patch broke empty_skb test: >> $ test_progs -t empty_skb >> >> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress >> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress >> [redirect_ingress]: actual -34 != expected 0 >> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec >> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress >> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress >> [redirect_egress]: actual -34 != expected 1 >> >> And looking at the test I think it's not a test issue. >> This check >> if (unlikely(skb->len < dev->min_header_len)) >> is rejecting more than it should. >> >> So I reverted this patch for now. > > OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I > move my sanity test in __bpf_tx_skb(), > the bpf test program still fails, I am suspecting the test needs to be adjusted. Let me take a look, I do think so too that we'd need to adjust the test. I'll see to have a patch for the latter so that we can reapply the fix as-is along with it given it's correct. > diff --git a/net/core/filter.c b/net/core/filter.c > index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d > 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct > net_device *dev, struct sk_buff *skb) > return -ENETDOWN; > } > > + if (unlikely(skb->len < dev->min_header_len)) { > + pr_err_once("__bpf_tx_skb skb->len=%u < > dev(%s)->min_header_len(%u)\n", skb->len, dev->name, > dev->min_header_len); > + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > + kfree_skb(skb); > + return -ERANGE; > + } // Note: this is before we change skb->dev > skb->dev = dev; > skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); > skb_clear_tstamp(skb); > > > --> > > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > [redirect_egress]: actual -34 != expected 1 > > [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) > [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 > mac=(64,14) net=(78,-1) trans=-1 > shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 > > Note that veth driver is one of the few 'Ethernet' drivers that make > sure to get at least 14 bytes in the skb at ndo_start_xmit() > after commit 726e2c5929de841fdcef4e2bf995680688ae1b87 ("veth: Ensure > eth header is in skb's linear part") > > BTW this last patch (changing veth) should have been done generically > from act_mirred > > (We do not want to patch ~400 ethernet drivers in the tree) >
On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote: > > On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov > <alexei.starovoitov@gmail.com> wrote: > > > > On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > > > > > > Hello: > > > > > > This patch was applied to bpf/bpf.git (master) > > > by Daniel Borkmann <daniel@iogearbox.net>: > > > > > > On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > > > > Some drivers ndo_start_xmit() expect a minimal size, as shown > > > > by various syzbot reports [1]. > > > > > > > > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > > > > the missing attribute that can be used by upper layers. > > > > > > > > We need to use it in __bpf_redirect_common(). > > > > This patch broke empty_skb test: > > $ test_progs -t empty_skb > > > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress > > [redirect_ingress]: actual -34 != expected 0 > > test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > [redirect_egress]: actual -34 != expected 1 > > > > And looking at the test I think it's not a test issue. > > This check > > if (unlikely(skb->len < dev->min_header_len)) > > is rejecting more than it should. > > > > So I reverted this patch for now. > > OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I > move my sanity test in __bpf_tx_skb(), > the bpf test program still fails, I am suspecting the test needs to be adjusted. > > > > diff --git a/net/core/filter.c b/net/core/filter.c > index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d > 100644 > --- a/net/core/filter.c > +++ b/net/core/filter.c > @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct > net_device *dev, struct sk_buff *skb) > return -ENETDOWN; > } > > + if (unlikely(skb->len < dev->min_header_len)) { > + pr_err_once("__bpf_tx_skb skb->len=%u < > dev(%s)->min_header_len(%u)\n", skb->len, dev->name, > dev->min_header_len); > + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > + kfree_skb(skb); > + return -ERANGE; > + } // Note: this is before we change skb->dev > skb->dev = dev; > skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); > skb_clear_tstamp(skb); > > > --> > > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > [redirect_egress]: actual -34 != expected 1 > > [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) > [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 > mac=(64,14) net=(78,-1) trans=-1 > shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 Hmm. Something is off. That test creates 15 byte skb. It's not obvious to me how it got reduced to 1. Something stripped L2 header and the prog is trying to redirect such skb into veth that expects skb with L2 ? Stan, please take a look. Since you wrote that test.
On 03/25, Alexei Starovoitov wrote: > On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov > > <alexei.starovoitov@gmail.com> wrote: > > > > > > On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > > > > > > > > Hello: > > > > > > > > This patch was applied to bpf/bpf.git (master) > > > > by Daniel Borkmann <daniel@iogearbox.net>: > > > > > > > > On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > > > > > Some drivers ndo_start_xmit() expect a minimal size, as shown > > > > > by various syzbot reports [1]. > > > > > > > > > > Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > > > > > the missing attribute that can be used by upper layers. > > > > > > > > > > We need to use it in __bpf_redirect_common(). > > > > > > This patch broke empty_skb test: > > > $ test_progs -t empty_skb > > > > > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > > [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress > > > [redirect_ingress]: actual -34 != expected 0 > > > test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec > > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > > [redirect_egress]: actual -34 != expected 1 > > > > > > And looking at the test I think it's not a test issue. > > > This check > > > if (unlikely(skb->len < dev->min_header_len)) > > > is rejecting more than it should. > > > > > > So I reverted this patch for now. > > > > OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I > > move my sanity test in __bpf_tx_skb(), > > the bpf test program still fails, I am suspecting the test needs to be adjusted. > > > > > > > > diff --git a/net/core/filter.c b/net/core/filter.c > > index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d > > 100644 > > --- a/net/core/filter.c > > +++ b/net/core/filter.c > > @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct > > net_device *dev, struct sk_buff *skb) > > return -ENETDOWN; > > } > > > > + if (unlikely(skb->len < dev->min_header_len)) { > > + pr_err_once("__bpf_tx_skb skb->len=%u < > > dev(%s)->min_header_len(%u)\n", skb->len, dev->name, > > dev->min_header_len); > > + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > > + kfree_skb(skb); > > + return -ERANGE; > > + } // Note: this is before we change skb->dev > > skb->dev = dev; > > skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); > > skb_clear_tstamp(skb); > > > > > > --> > > > > > > test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > [redirect_egress]: actual -34 != expected 1 > > > > [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) > > [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 > > mac=(64,14) net=(78,-1) trans=-1 > > shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > > csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > > hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 > > Hmm. Something is off. > That test creates 15 byte skb. > It's not obvious to me how it got reduced to 1. > Something stripped L2 header and the prog is trying to redirect > such skb into veth that expects skb with L2 ? > > Stan, > please take a look. > Since you wrote that test. Sure. Daniel wants to take a look on a separate thread, so we can sync up. Tentatively, seems like the failure is in the lwt path that does indeed drop the l2.
On 3/25/24 5:28 PM, Stanislav Fomichev wrote: > On 03/25, Alexei Starovoitov wrote: >> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote: >>> >>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov >>> <alexei.starovoitov@gmail.com> wrote: >>>> >>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: >>>>> >>>>> Hello: >>>>> >>>>> This patch was applied to bpf/bpf.git (master) >>>>> by Daniel Borkmann <daniel@iogearbox.net>: >>>>> >>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: >>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown >>>>>> by various syzbot reports [1]. >>>>>> >>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") >>>>>> the missing attribute that can be used by upper layers. >>>>>> >>>>>> We need to use it in __bpf_redirect_common(). >>>> >>>> This patch broke empty_skb test: >>>> $ test_progs -t empty_skb >>>> >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress >>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress >>>> [redirect_ingress]: actual -34 != expected 0 >>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress >>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress >>>> [redirect_egress]: actual -34 != expected 1 >>>> >>>> And looking at the test I think it's not a test issue. >>>> This check >>>> if (unlikely(skb->len < dev->min_header_len)) >>>> is rejecting more than it should. >>>> >>>> So I reverted this patch for now. >>> >>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I >>> move my sanity test in __bpf_tx_skb(), >>> the bpf test program still fails, I am suspecting the test needs to be adjusted. >>> >>> >>> >>> diff --git a/net/core/filter.c b/net/core/filter.c >>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d >>> 100644 >>> --- a/net/core/filter.c >>> +++ b/net/core/filter.c >>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct >>> net_device *dev, struct sk_buff *skb) >>> return -ENETDOWN; >>> } >>> >>> + if (unlikely(skb->len < dev->min_header_len)) { >>> + pr_err_once("__bpf_tx_skb skb->len=%u < >>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name, >>> dev->min_header_len); >>> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); >>> + kfree_skb(skb); >>> + return -ERANGE; >>> + } // Note: this is before we change skb->dev >>> skb->dev = dev; >>> skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); >>> skb_clear_tstamp(skb); >>> >>> >>> --> >>> >>> >>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress >>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress >>> [redirect_egress]: actual -34 != expected 1 >>> >>> [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) >>> [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 >>> mac=(64,14) net=(78,-1) trans=-1 >>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) >>> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) >>> hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 >> >> Hmm. Something is off. >> That test creates 15 byte skb. >> It's not obvious to me how it got reduced to 1. >> Something stripped L2 header and the prog is trying to redirect >> such skb into veth that expects skb with L2 ? >> >> Stan, >> please take a look. >> Since you wrote that test. > > Sure. Daniel wants to take a look on a separate thread, so we can sync > up. Tentatively, seems like the failure is in the lwt path that does > indeed drop the l2. If we'd change the test into the below, the tc and empty_skb tests pass. run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus skb->len is 1 in this test. We do use skb_mac_header_len() also in other tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best way forward.. static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev, u32 flags) { /* Verify that a link layer header is carried */ if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) { kfree_skb(skb); return -ERANGE; } if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) { kfree_skb(skb); return -ERANGE; } bpf_push_mac_rcsum(skb); return flags & BPF_F_INGRESS ? __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb); } Thanks, Daniel
On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 3/25/24 5:28 PM, Stanislav Fomichev wrote: > > On 03/25, Alexei Starovoitov wrote: > >> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote: > >>> > >>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov > >>> <alexei.starovoitov@gmail.com> wrote: > >>>> > >>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > >>>>> > >>>>> Hello: > >>>>> > >>>>> This patch was applied to bpf/bpf.git (master) > >>>>> by Daniel Borkmann <daniel@iogearbox.net>: > >>>>> > >>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > >>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown > >>>>>> by various syzbot reports [1]. > >>>>>> > >>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > >>>>>> the missing attribute that can be used by upper layers. > >>>>>> > >>>>>> We need to use it in __bpf_redirect_common(). > >>>> > >>>> This patch broke empty_skb test: > >>>> $ test_progs -t empty_skb > >>>> > >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > >>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress > >>>> [redirect_ingress]: actual -34 != expected 0 > >>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec > >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > >>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > >>>> [redirect_egress]: actual -34 != expected 1 > >>>> > >>>> And looking at the test I think it's not a test issue. > >>>> This check > >>>> if (unlikely(skb->len < dev->min_header_len)) > >>>> is rejecting more than it should. > >>>> > >>>> So I reverted this patch for now. > >>> > >>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I > >>> move my sanity test in __bpf_tx_skb(), > >>> the bpf test program still fails, I am suspecting the test needs to be adjusted. > >>> > >>> > >>> > >>> diff --git a/net/core/filter.c b/net/core/filter.c > >>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d > >>> 100644 > >>> --- a/net/core/filter.c > >>> +++ b/net/core/filter.c > >>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct > >>> net_device *dev, struct sk_buff *skb) > >>> return -ENETDOWN; > >>> } > >>> > >>> + if (unlikely(skb->len < dev->min_header_len)) { > >>> + pr_err_once("__bpf_tx_skb skb->len=%u < > >>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name, > >>> dev->min_header_len); > >>> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > >>> + kfree_skb(skb); > >>> + return -ERANGE; > >>> + } // Note: this is before we change skb->dev > >>> skb->dev = dev; > >>> skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); > >>> skb_clear_tstamp(skb); > >>> > >>> > >>> --> > >>> > >>> > >>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > >>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > >>> [redirect_egress]: actual -34 != expected 1 > >>> > >>> [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) > >>> [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 > >>> mac=(64,14) net=(78,-1) trans=-1 > >>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > >>> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > >>> hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 > >> > >> Hmm. Something is off. > >> That test creates 15 byte skb. > >> It's not obvious to me how it got reduced to 1. > >> Something stripped L2 header and the prog is trying to redirect > >> such skb into veth that expects skb with L2 ? > >> > >> Stan, > >> please take a look. > >> Since you wrote that test. > > > > Sure. Daniel wants to take a look on a separate thread, so we can sync > > up. Tentatively, seems like the failure is in the lwt path that does > > indeed drop the l2. > > If we'd change the test into the below, the tc and empty_skb tests pass. > run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus > skb->len is 1 in this test. We do use skb_mac_header_len() also in other > tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best > way forward.. > > static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev, > u32 flags) > { > /* Verify that a link layer header is carried */ > if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) { > kfree_skb(skb); > return -ERANGE; > } > > if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) { Unfortunately this will not prevent frames with skb->len == 1 to reach an Ethernet driver ndo_start_xmit() At ndo_start_xmit(), we do not look where the MAC header supposedly starts in the skb, we only use skb->data I have a syzbot repro using team driver, so I added the following part in team : diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c 100644 --- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff *skb, struct net_device *dev) bool tx_success; unsigned int len = skb->len; + if (len < 14) { + pr_err_once("team_xmit(len=%u)\n", len); + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); + WARN_ON_ONCE(1); + } tx_success = team_queue_override_transmit(team, skb); if (!tx_success) tx_success = team->ops.transmit(team, skb); And I get (with your suggestion instead of skb->len) mac=(78,0) net=(78,-1) trans=-1 shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) hash(0x0 sw=0 l4=0) proto=0x88a8 pkttype=3 iif=0 [ 41.126553] dev name=team0 feat=0x0000e0064fddfbe9 [ 41.127132] skb linear: 00000000: 55 [ 41.128487] ------------[ cut here ]------------ [ 41.128551] WARNING: CPU: 2 PID: 1880 at drivers/net/team/team.c:1720 team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) [ 41.129072] Modules linked in: macsec macvtap macvlan hsr wireguard curve25519_x86_64 libcurve25519_generic libchacha20poly1305 chacha_x86_64 libchacha poly1305_x86_64 batman_adv dummy bridge sr_mod cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p netfs 9pnet [ 41.129613] CPU: 2 PID: 1880 Comm: b330650301 Not tainted 6.8.0-virtme #238 [ 41.129664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.16.3-debian-1.16.3-2 04/01/2014 [ 41.129780] RIP: 0010:team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) [ 41.129847] Code: 41 54 53 44 8b 7f 70 48 89 fb 41 83 ff 0d 77 1c 80 3d a0 24 8d 01 00 0f 84 0d 01 00 00 80 3d 92 24 8d 01 00 0f 84 e3 00 00 00 <0f> 0b 41 80 be 21 0b 00 00 00 0f 84 9d 00 00 00 0f b7 43 7c 66 85 All code ======== 0: 41 54 push %r12 2: 53 push %rbx 3: 44 8b 7f 70 mov 0x70(%rdi),%r15d 7: 48 89 fb mov %rdi,%rbx a: 41 83 ff 0d cmp $0xd,%r15d e: 77 1c ja 0x2c 10: 80 3d a0 24 8d 01 00 cmpb $0x0,0x18d24a0(%rip) # 0x18d24b7 17: 0f 84 0d 01 00 00 je 0x12a 1d: 80 3d 92 24 8d 01 00 cmpb $0x0,0x18d2492(%rip) # 0x18d24b6 24: 0f 84 e3 00 00 00 je 0x10d 2a:* 0f 0b ud2 <-- trapping instruction 2c: 41 80 be 21 0b 00 00 cmpb $0x0,0xb21(%r14) 33: 00 34: 0f 84 9d 00 00 00 je 0xd7 3a: 0f b7 43 7c movzwl 0x7c(%rbx),%eax 3e: 66 data16 3f: 85 .byte 0x85 Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: 41 80 be 21 0b 00 00 cmpb $0x0,0xb21(%r14) 9: 00 a: 0f 84 9d 00 00 00 je 0xad 10: 0f b7 43 7c movzwl 0x7c(%rbx),%eax 14: 66 data16 15: 85 .byte 0x85 [ 41.129902] RSP: 0018:ffffa4210433b938 EFLAGS: 00000246 [ 41.129945] RAX: 0000000000000000 RBX: ffffa4210858a300 RCX: 0000000000000000 [ 41.129961] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: 0000000000000001 [ 41.129975] RBP: ffffa4210433b960 R08: 0000000000000000 R09: ffffa4210433b630 [ 41.129989] R10: 0000000000000001 R11: ffffffff8407d340 R12: 0000000000000000 [ 41.130004] R13: ffffa4210ecee000 R14: ffffa4210ece4000 R15: 0000000000000001 [ 41.130074] FS: 00007f91d9549740(0000) GS:ffffa42fffa80000(0000) knlGS:0000000000000000 [ 41.130095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 41.130140] CR2: 00007f8953077fb0 CR3: 0000000104f42000 CR4: 00000000000006f0 [ 41.130229] Call Trace: [ 41.130331] <TASK> [ 41.130530] ? show_regs (arch/x86/kernel/dumpstack.c:479) [ 41.130598] ? __warn (kernel/panic.c:694) [ 41.130611] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) [ 41.130625] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 41.130640] ? handle_bug (arch/x86/kernel/traps.c:239) [ 41.130653] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) [ 41.130665] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) [ 41.130700] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) [ 41.130714] ? team_xmit (drivers/net/team/team.c:1719 (discriminator 6)) [ 41.130734] dev_hard_start_xmit (./include/linux/netdevice.h:4903 ./include/linux/netdevice.h:4917 net/core/dev.c:3531 net/core/dev.c:3547) [ 41.130768] __dev_queue_xmit (./include/linux/netdevice.h:3287 (discriminator 25) net/core/dev.c:4336 (discriminator 25)) [ 41.130780] ? kmalloc_reserve (net/core/skbuff.c:580 (discriminator 4)) [ 41.130796] ? pskb_expand_head (net/core/skbuff.c:2292) [ 41.130815] __bpf_redirect (./include/linux/netdevice.h:3287 (discriminator 25) net/core/filter.c:2143 (discriminator 25) net/core/filter.c:2172 (discriminator 25) net/core/filter.c:2196 (discriminator 25)) [ 41.130825] bpf_clone_redirect (net/core/filter.c:2467 (discriminator 1) net/core/filter.c:2439 (discriminator 1)) [ 41.130841] bpf_prog_9845f5eee09e82c6+0x61/0x66 [ 41.130948] ? bpf_ksym_find (./include/linux/rbtree_latch.h:113 ./include/linux/rbtree_latch.h:208 kernel/bpf/core.c:734) [ 41.130963] ? is_bpf_text_address (./arch/x86/include/asm/preempt.h:84 (discriminator 13) ./include/linux/rcupdate.h:97 (discriminator 13) ./include/linux/rcupdate.h:813 (discriminator 13) kernel/bpf/core.c:769 (discriminator 13)) [ 41.130976] ? kernel_text_address (kernel/extable.c:125 (discriminator 1) kernel/extable.c:94 (discriminator 1)) [ 41.130989] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1)) [ 41.131002] ? unwind_get_return_address (arch/x86/kernel/unwind_frame.c:19 (discriminator 1)) [ 41.131014] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83) [ 41.131028] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26) [ 41.131044] ? stack_depot_save_flags (lib/stackdepot.c:675) [ 41.131062] ? ktime_get (kernel/time/timekeeping.c:292 kernel/time/timekeeping.c:388 kernel/time/timekeeping.c:848) [ 41.131076] bpf_test_run (./include/linux/bpf.h:1234 ./include/linux/filter.h:657 ./include/linux/filter.h:664 net/bpf/test_run.c:425) [ 41.131087] ? security_sk_alloc (security/security.c:4662 (discriminator 13))
On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > > > On 3/25/24 5:28 PM, Stanislav Fomichev wrote: > > > On 03/25, Alexei Starovoitov wrote: > > >> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote: > > >>> > > >>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov > > >>> <alexei.starovoitov@gmail.com> wrote: > > >>>> > > >>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > > >>>>> > > >>>>> Hello: > > >>>>> > > >>>>> This patch was applied to bpf/bpf.git (master) > > >>>>> by Daniel Borkmann <daniel@iogearbox.net>: > > >>>>> > > >>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > > >>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown > > >>>>>> by various syzbot reports [1]. > > >>>>>> > > >>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > > >>>>>> the missing attribute that can be used by upper layers. > > >>>>>> > > >>>>>> We need to use it in __bpf_redirect_common(). > > >>>> > > >>>> This patch broke empty_skb test: > > >>>> $ test_progs -t empty_skb > > >>>> > > >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>>> [redirect_ingress]: actual -34 != expected 0 > > >>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec > > >>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>>> [redirect_egress]: actual -34 != expected 1 > > >>>> > > >>>> And looking at the test I think it's not a test issue. > > >>>> This check > > >>>> if (unlikely(skb->len < dev->min_header_len)) > > >>>> is rejecting more than it should. > > >>>> > > >>>> So I reverted this patch for now. > > >>> > > >>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I > > >>> move my sanity test in __bpf_tx_skb(), > > >>> the bpf test program still fails, I am suspecting the test needs to be adjusted. > > >>> > > >>> > > >>> > > >>> diff --git a/net/core/filter.c b/net/core/filter.c > > >>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d > > >>> 100644 > > >>> --- a/net/core/filter.c > > >>> +++ b/net/core/filter.c > > >>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct > > >>> net_device *dev, struct sk_buff *skb) > > >>> return -ENETDOWN; > > >>> } > > >>> > > >>> + if (unlikely(skb->len < dev->min_header_len)) { > > >>> + pr_err_once("__bpf_tx_skb skb->len=%u < > > >>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name, > > >>> dev->min_header_len); > > >>> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > > >>> + kfree_skb(skb); > > >>> + return -ERANGE; > > >>> + } // Note: this is before we change skb->dev > > >>> skb->dev = dev; > > >>> skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); > > >>> skb_clear_tstamp(skb); > > >>> > > >>> > > >>> --> > > >>> > > >>> > > >>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>> [redirect_egress]: actual -34 != expected 1 > > >>> > > >>> [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) > > >>> [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 > > >>> mac=(64,14) net=(78,-1) trans=-1 > > >>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > > >>> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > > >>> hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 > > >> > > >> Hmm. Something is off. > > >> That test creates 15 byte skb. > > >> It's not obvious to me how it got reduced to 1. > > >> Something stripped L2 header and the prog is trying to redirect > > >> such skb into veth that expects skb with L2 ? > > >> > > >> Stan, > > >> please take a look. > > >> Since you wrote that test. > > > > > > Sure. Daniel wants to take a look on a separate thread, so we can sync > > > up. Tentatively, seems like the failure is in the lwt path that does > > > indeed drop the l2. > > > > If we'd change the test into the below, the tc and empty_skb tests pass. > > run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus > > skb->len is 1 in this test. We do use skb_mac_header_len() also in other > > tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best > > way forward.. > > > > static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev, > > u32 flags) > > { > > /* Verify that a link layer header is carried */ > > if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) { > > kfree_skb(skb); > > return -ERANGE; > > } > > > > if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) { > > Unfortunately this will not prevent frames with skb->len == 1 to reach > an Ethernet driver ndo_start_xmit() > > At ndo_start_xmit(), we do not look where the MAC header supposedly > starts in the skb, we only use skb->data > > I have a syzbot repro using team driver, so I added the following part in team : > > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c > 100644 > --- a/drivers/net/team/team.c > +++ b/drivers/net/team/team.c > @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff > *skb, struct net_device *dev) > bool tx_success; > unsigned int len = skb->len; > > + if (len < 14) { > + pr_err_once("team_xmit(len=%u)\n", len); > + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > + WARN_ON_ONCE(1); > + } > tx_success = team_queue_override_transmit(team, skb); > if (!tx_success) > tx_success = team->ops.transmit(team, skb); > > > And I get (with your suggestion instead of skb->len) > Missing part in my copy/paste : [ 41.123829] team_xmit(len=1) [ 41.124335] skb len=1 headroom=78 headlen=1 tailroom=113 > mac=(78,0) net=(78,-1) trans=-1 > shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > hash(0x0 sw=0 l4=0) proto=0x88a8 pkttype=3 iif=0 > [ 41.126553] dev name=team0 feat=0x0000e0064fddfbe9 > [ 41.127132] skb linear: 00000000: 55 > [ 41.128487] ------------[ cut here ]------------ > [ 41.128551] WARNING: CPU: 2 PID: 1880 at > drivers/net/team/team.c:1720 team_xmit (drivers/net/team/team.c:1720 > (discriminator 1)) > [ 41.129072] Modules linked in: macsec macvtap macvlan hsr wireguard > curve25519_x86_64 libcurve25519_generic libchacha20poly1305 > chacha_x86_64 libchacha poly1305_x86_64 batman_adv dummy bridge sr_mod > cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p netfs 9pnet > [ 41.129613] CPU: 2 PID: 1880 Comm: b330650301 Not tainted 6.8.0-virtme #238 > [ 41.129664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), > BIOS 1.16.3-debian-1.16.3-2 04/01/2014 > [ 41.129780] RIP: 0010:team_xmit (drivers/net/team/team.c:1720 > (discriminator 1)) > [ 41.129847] Code: 41 54 53 44 8b 7f 70 48 89 fb 41 83 ff 0d 77 1c 80 > 3d a0 24 8d 01 00 0f 84 0d 01 00 00 80 3d 92 24 8d 01 00 0f 84 e3 00 > 00 00 <0f> 0b 41 80 be 21 0b 00 00 00 0f 84 9d 00 00 00 0f b7 43 7c 66 > 85 > All code > ======== > 0: 41 54 push %r12 > 2: 53 push %rbx > 3: 44 8b 7f 70 mov 0x70(%rdi),%r15d > 7: 48 89 fb mov %rdi,%rbx > a: 41 83 ff 0d cmp $0xd,%r15d > e: 77 1c ja 0x2c > 10: 80 3d a0 24 8d 01 00 cmpb $0x0,0x18d24a0(%rip) # 0x18d24b7 > 17: 0f 84 0d 01 00 00 je 0x12a > 1d: 80 3d 92 24 8d 01 00 cmpb $0x0,0x18d2492(%rip) # 0x18d24b6 > 24: 0f 84 e3 00 00 00 je 0x10d > 2a:* 0f 0b ud2 <-- trapping instruction > 2c: 41 80 be 21 0b 00 00 cmpb $0x0,0xb21(%r14) > 33: 00 > 34: 0f 84 9d 00 00 00 je 0xd7 > 3a: 0f b7 43 7c movzwl 0x7c(%rbx),%eax > 3e: 66 data16 > 3f: 85 .byte 0x85 > > Code starting with the faulting instruction > =========================================== > 0: 0f 0b ud2 > 2: 41 80 be 21 0b 00 00 cmpb $0x0,0xb21(%r14) > 9: 00 > a: 0f 84 9d 00 00 00 je 0xad > 10: 0f b7 43 7c movzwl 0x7c(%rbx),%eax > 14: 66 data16 > 15: 85 .byte 0x85 > [ 41.129902] RSP: 0018:ffffa4210433b938 EFLAGS: 00000246 > [ 41.129945] RAX: 0000000000000000 RBX: ffffa4210858a300 RCX: 0000000000000000 > [ 41.129961] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: 0000000000000001 > [ 41.129975] RBP: ffffa4210433b960 R08: 0000000000000000 R09: ffffa4210433b630 > [ 41.129989] R10: 0000000000000001 R11: ffffffff8407d340 R12: 0000000000000000 > [ 41.130004] R13: ffffa4210ecee000 R14: ffffa4210ece4000 R15: 0000000000000001 > [ 41.130074] FS: 00007f91d9549740(0000) GS:ffffa42fffa80000(0000) > knlGS:0000000000000000 > [ 41.130095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > [ 41.130140] CR2: 00007f8953077fb0 CR3: 0000000104f42000 CR4: 00000000000006f0 > [ 41.130229] Call Trace: > [ 41.130331] <TASK> > [ 41.130530] ? show_regs (arch/x86/kernel/dumpstack.c:479) > [ 41.130598] ? __warn (kernel/panic.c:694) > [ 41.130611] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) > [ 41.130625] ? report_bug (lib/bug.c:180 lib/bug.c:219) > [ 41.130640] ? handle_bug (arch/x86/kernel/traps.c:239) > [ 41.130653] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) > [ 41.130665] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) > [ 41.130700] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) > [ 41.130714] ? team_xmit (drivers/net/team/team.c:1719 (discriminator 6)) > [ 41.130734] dev_hard_start_xmit (./include/linux/netdevice.h:4903 > ./include/linux/netdevice.h:4917 net/core/dev.c:3531 > net/core/dev.c:3547) > [ 41.130768] __dev_queue_xmit (./include/linux/netdevice.h:3287 > (discriminator 25) net/core/dev.c:4336 (discriminator 25)) > [ 41.130780] ? kmalloc_reserve (net/core/skbuff.c:580 (discriminator 4)) > [ 41.130796] ? pskb_expand_head (net/core/skbuff.c:2292) > [ 41.130815] __bpf_redirect (./include/linux/netdevice.h:3287 > (discriminator 25) net/core/filter.c:2143 (discriminator 25) > net/core/filter.c:2172 (discriminator 25) net/core/filter.c:2196 > (discriminator 25)) > [ 41.130825] bpf_clone_redirect (net/core/filter.c:2467 > (discriminator 1) net/core/filter.c:2439 (discriminator 1)) > [ 41.130841] bpf_prog_9845f5eee09e82c6+0x61/0x66 > [ 41.130948] ? bpf_ksym_find (./include/linux/rbtree_latch.h:113 > ./include/linux/rbtree_latch.h:208 kernel/bpf/core.c:734) > [ 41.130963] ? is_bpf_text_address > (./arch/x86/include/asm/preempt.h:84 (discriminator 13) > ./include/linux/rcupdate.h:97 (discriminator 13) > ./include/linux/rcupdate.h:813 (discriminator 13) > kernel/bpf/core.c:769 (discriminator 13)) > [ 41.130976] ? kernel_text_address (kernel/extable.c:125 > (discriminator 1) kernel/extable.c:94 (discriminator 1)) > [ 41.130989] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1)) > [ 41.131002] ? unwind_get_return_address > (arch/x86/kernel/unwind_frame.c:19 (discriminator 1)) > [ 41.131014] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83) > [ 41.131028] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26) > [ 41.131044] ? stack_depot_save_flags (lib/stackdepot.c:675) > [ 41.131062] ? ktime_get (kernel/time/timekeeping.c:292 > kernel/time/timekeeping.c:388 kernel/time/timekeeping.c:848) > [ 41.131076] bpf_test_run (./include/linux/bpf.h:1234 > ./include/linux/filter.h:657 ./include/linux/filter.h:664 > net/bpf/test_run.c:425) > [ 41.131087] ? security_sk_alloc (security/security.c:4662 (discriminator 13))
On 3/26/24 2:38 PM, Eric Dumazet wrote: > On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@google.com> wrote: >> On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote: >>> On 3/25/24 5:28 PM, Stanislav Fomichev wrote: >>>> On 03/25, Alexei Starovoitov wrote: >>>>> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote: >>>>>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov >>>>>> <alexei.starovoitov@gmail.com> wrote: >>>>>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: >>>>>>>> >>>>>>>> Hello: >>>>>>>> >>>>>>>> This patch was applied to bpf/bpf.git (master) >>>>>>>> by Daniel Borkmann <daniel@iogearbox.net>: >>>>>>>> >>>>>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: >>>>>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown >>>>>>>>> by various syzbot reports [1]. >>>>>>>>> >>>>>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") >>>>>>>>> the missing attribute that can be used by upper layers. >>>>>>>>> >>>>>>>>> We need to use it in __bpf_redirect_common(). >>>>>>> >>>>>>> This patch broke empty_skb test: >>>>>>> $ test_progs -t empty_skb >>>>>>> >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress >>>>>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress >>>>>>> [redirect_ingress]: actual -34 != expected 0 >>>>>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress >>>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress >>>>>>> [redirect_egress]: actual -34 != expected 1 >>>>>>> >>>>>>> And looking at the test I think it's not a test issue. >>>>>>> This check >>>>>>> if (unlikely(skb->len < dev->min_header_len)) >>>>>>> is rejecting more than it should. >>>>>>> >>>>>>> So I reverted this patch for now. >>>>>> >>>>>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I >>>>>> move my sanity test in __bpf_tx_skb(), >>>>>> the bpf test program still fails, I am suspecting the test needs to be adjusted. >>>>>> >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c >>>>>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d >>>>>> 100644 >>>>>> --- a/net/core/filter.c >>>>>> +++ b/net/core/filter.c >>>>>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct >>>>>> net_device *dev, struct sk_buff *skb) >>>>>> return -ENETDOWN; >>>>>> } >>>>>> >>>>>> + if (unlikely(skb->len < dev->min_header_len)) { >>>>>> + pr_err_once("__bpf_tx_skb skb->len=%u < >>>>>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name, >>>>>> dev->min_header_len); >>>>>> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); >>>>>> + kfree_skb(skb); >>>>>> + return -ERANGE; >>>>>> + } // Note: this is before we change skb->dev >>>>>> skb->dev = dev; >>>>>> skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); >>>>>> skb_clear_tstamp(skb); >>>>>> >>>>>> >>>>>> --> >>>>>> >>>>>> >>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress >>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress >>>>>> [redirect_egress]: actual -34 != expected 1 >>>>>> >>>>>> [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) >>>>>> [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 >>>>>> mac=(64,14) net=(78,-1) trans=-1 >>>>>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) >>>>>> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) >>>>>> hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 >>>>> >>>>> Hmm. Something is off. >>>>> That test creates 15 byte skb. >>>>> It's not obvious to me how it got reduced to 1. >>>>> Something stripped L2 header and the prog is trying to redirect >>>>> such skb into veth that expects skb with L2 ? >>>>> >>>>> Stan, >>>>> please take a look. >>>>> Since you wrote that test. >>>> >>>> Sure. Daniel wants to take a look on a separate thread, so we can sync >>>> up. Tentatively, seems like the failure is in the lwt path that does >>>> indeed drop the l2. >>> >>> If we'd change the test into the below, the tc and empty_skb tests pass. >>> run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus >>> skb->len is 1 in this test. We do use skb_mac_header_len() also in other >>> tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best >>> way forward.. >>> >>> static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev, >>> u32 flags) >>> { >>> /* Verify that a link layer header is carried */ >>> if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) { >>> kfree_skb(skb); >>> return -ERANGE; >>> } >>> >>> if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) { >> >> Unfortunately this will not prevent frames with skb->len == 1 to reach >> an Ethernet driver ndo_start_xmit() >> >> At ndo_start_xmit(), we do not look where the MAC header supposedly >> starts in the skb, we only use skb->data >> >> I have a syzbot repro using team driver, so I added the following part in team : >> >> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c >> index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c >> 100644 >> --- a/drivers/net/team/team.c >> +++ b/drivers/net/team/team.c >> @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff >> *skb, struct net_device *dev) >> bool tx_success; >> unsigned int len = skb->len; >> >> + if (len < 14) { >> + pr_err_once("team_xmit(len=%u)\n", len); >> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); >> + WARN_ON_ONCE(1); >> + } >> tx_success = team_queue_override_transmit(team, skb); >> if (!tx_success) >> tx_success = team->ops.transmit(team, skb); >> >> >> And I get (with your suggestion instead of skb->len) > > Missing part in my copy/paste : > > [ 41.123829] team_xmit(len=1) > [ 41.124335] skb len=1 headroom=78 headlen=1 tailroom=113 > >> mac=(78,0) net=(78,-1) trans=-1 Interesting. Could you also dump dev->type and/or dev->min_header_len? I suspect this case may not be ARPHRD_ETHER in team. Above says mac=(78,0), so mac len is 0 and the check against the dev->min_header_len should have dropped it if it went that branch. I wonder, is team driver missing sth like : diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c index 0a44bbdcfb7b..6256f0d2f565 100644 --- a/drivers/net/team/team.c +++ b/drivers/net/team/team.c @@ -2124,6 +2124,7 @@ static void team_setup_by_port(struct net_device *dev, dev->type = port_dev->type; dev->hard_header_len = port_dev->hard_header_len; dev->needed_headroom = port_dev->needed_headroom; + dev->min_header_len = port_dev->min_header_len; dev->addr_len = port_dev->addr_len; dev->mtu = port_dev->mtu; memcpy(dev->broadcast, port_dev->broadcast, port_dev->addr_len); >> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) >> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) >> hash(0x0 sw=0 l4=0) proto=0x88a8 pkttype=3 iif=0 >> [ 41.126553] dev name=team0 feat=0x0000e0064fddfbe9 >> [ 41.127132] skb linear: 00000000: 55 >> [ 41.128487] ------------[ cut here ]------------ >> [ 41.128551] WARNING: CPU: 2 PID: 1880 at >> drivers/net/team/team.c:1720 team_xmit (drivers/net/team/team.c:1720 >> (discriminator 1)) >> [ 41.129072] Modules linked in: macsec macvtap macvlan hsr wireguard >> curve25519_x86_64 libcurve25519_generic libchacha20poly1305 >> chacha_x86_64 libchacha poly1305_x86_64 batman_adv dummy bridge sr_mod >> cdrom evdev pcspkr i2c_piix4 9pnet_virtio 9p netfs 9pnet >> [ 41.129613] CPU: 2 PID: 1880 Comm: b330650301 Not tainted 6.8.0-virtme #238 >> [ 41.129664] Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), >> BIOS 1.16.3-debian-1.16.3-2 04/01/2014 >> [ 41.129780] RIP: 0010:team_xmit (drivers/net/team/team.c:1720 >> (discriminator 1)) >> [ 41.129847] Code: 41 54 53 44 8b 7f 70 48 89 fb 41 83 ff 0d 77 1c 80 >> 3d a0 24 8d 01 00 0f 84 0d 01 00 00 80 3d 92 24 8d 01 00 0f 84 e3 00 >> 00 00 <0f> 0b 41 80 be 21 0b 00 00 00 0f 84 9d 00 00 00 0f b7 43 7c 66 >> 85 >> All code >> ======== >> 0: 41 54 push %r12 >> 2: 53 push %rbx >> 3: 44 8b 7f 70 mov 0x70(%rdi),%r15d >> 7: 48 89 fb mov %rdi,%rbx >> a: 41 83 ff 0d cmp $0xd,%r15d >> e: 77 1c ja 0x2c >> 10: 80 3d a0 24 8d 01 00 cmpb $0x0,0x18d24a0(%rip) # 0x18d24b7 >> 17: 0f 84 0d 01 00 00 je 0x12a >> 1d: 80 3d 92 24 8d 01 00 cmpb $0x0,0x18d2492(%rip) # 0x18d24b6 >> 24: 0f 84 e3 00 00 00 je 0x10d >> 2a:* 0f 0b ud2 <-- trapping instruction >> 2c: 41 80 be 21 0b 00 00 cmpb $0x0,0xb21(%r14) >> 33: 00 >> 34: 0f 84 9d 00 00 00 je 0xd7 >> 3a: 0f b7 43 7c movzwl 0x7c(%rbx),%eax >> 3e: 66 data16 >> 3f: 85 .byte 0x85 >> >> Code starting with the faulting instruction >> =========================================== >> 0: 0f 0b ud2 >> 2: 41 80 be 21 0b 00 00 cmpb $0x0,0xb21(%r14) >> 9: 00 >> a: 0f 84 9d 00 00 00 je 0xad >> 10: 0f b7 43 7c movzwl 0x7c(%rbx),%eax >> 14: 66 data16 >> 15: 85 .byte 0x85 >> [ 41.129902] RSP: 0018:ffffa4210433b938 EFLAGS: 00000246 >> [ 41.129945] RAX: 0000000000000000 RBX: ffffa4210858a300 RCX: 0000000000000000 >> [ 41.129961] RDX: 0000000000000000 RSI: 00000000ffff7fff RDI: 0000000000000001 >> [ 41.129975] RBP: ffffa4210433b960 R08: 0000000000000000 R09: ffffa4210433b630 >> [ 41.129989] R10: 0000000000000001 R11: ffffffff8407d340 R12: 0000000000000000 >> [ 41.130004] R13: ffffa4210ecee000 R14: ffffa4210ece4000 R15: 0000000000000001 >> [ 41.130074] FS: 00007f91d9549740(0000) GS:ffffa42fffa80000(0000) >> knlGS:0000000000000000 >> [ 41.130095] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 >> [ 41.130140] CR2: 00007f8953077fb0 CR3: 0000000104f42000 CR4: 00000000000006f0 >> [ 41.130229] Call Trace: >> [ 41.130331] <TASK> >> [ 41.130530] ? show_regs (arch/x86/kernel/dumpstack.c:479) >> [ 41.130598] ? __warn (kernel/panic.c:694) >> [ 41.130611] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) >> [ 41.130625] ? report_bug (lib/bug.c:180 lib/bug.c:219) >> [ 41.130640] ? handle_bug (arch/x86/kernel/traps.c:239) >> [ 41.130653] ? exc_invalid_op (arch/x86/kernel/traps.c:260 (discriminator 1)) >> [ 41.130665] ? asm_exc_invalid_op (./arch/x86/include/asm/idtentry.h:621) >> [ 41.130700] ? team_xmit (drivers/net/team/team.c:1720 (discriminator 1)) >> [ 41.130714] ? team_xmit (drivers/net/team/team.c:1719 (discriminator 6)) >> [ 41.130734] dev_hard_start_xmit (./include/linux/netdevice.h:4903 >> ./include/linux/netdevice.h:4917 net/core/dev.c:3531 >> net/core/dev.c:3547) >> [ 41.130768] __dev_queue_xmit (./include/linux/netdevice.h:3287 >> (discriminator 25) net/core/dev.c:4336 (discriminator 25)) >> [ 41.130780] ? kmalloc_reserve (net/core/skbuff.c:580 (discriminator 4)) >> [ 41.130796] ? pskb_expand_head (net/core/skbuff.c:2292) >> [ 41.130815] __bpf_redirect (./include/linux/netdevice.h:3287 >> (discriminator 25) net/core/filter.c:2143 (discriminator 25) >> net/core/filter.c:2172 (discriminator 25) net/core/filter.c:2196 >> (discriminator 25)) >> [ 41.130825] bpf_clone_redirect (net/core/filter.c:2467 >> (discriminator 1) net/core/filter.c:2439 (discriminator 1)) >> [ 41.130841] bpf_prog_9845f5eee09e82c6+0x61/0x66 >> [ 41.130948] ? bpf_ksym_find (./include/linux/rbtree_latch.h:113 >> ./include/linux/rbtree_latch.h:208 kernel/bpf/core.c:734) >> [ 41.130963] ? is_bpf_text_address >> (./arch/x86/include/asm/preempt.h:84 (discriminator 13) >> ./include/linux/rcupdate.h:97 (discriminator 13) >> ./include/linux/rcupdate.h:813 (discriminator 13) >> kernel/bpf/core.c:769 (discriminator 13)) >> [ 41.130976] ? kernel_text_address (kernel/extable.c:125 >> (discriminator 1) kernel/extable.c:94 (discriminator 1)) >> [ 41.130989] ? __kernel_text_address (kernel/extable.c:79 (discriminator 1)) >> [ 41.131002] ? unwind_get_return_address >> (arch/x86/kernel/unwind_frame.c:19 (discriminator 1)) >> [ 41.131014] ? __pfx_stack_trace_consume_entry (kernel/stacktrace.c:83) >> [ 41.131028] ? arch_stack_walk (arch/x86/kernel/stacktrace.c:26) >> [ 41.131044] ? stack_depot_save_flags (lib/stackdepot.c:675) >> [ 41.131062] ? ktime_get (kernel/time/timekeeping.c:292 >> kernel/time/timekeeping.c:388 kernel/time/timekeeping.c:848) >> [ 41.131076] bpf_test_run (./include/linux/bpf.h:1234 >> ./include/linux/filter.h:657 ./include/linux/filter.h:664 >> net/bpf/test_run.c:425) >> [ 41.131087] ? security_sk_alloc (security/security.c:4662 (discriminator 13))
On Tue, Mar 26, 2024 at 6:57 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 3/26/24 2:38 PM, Eric Dumazet wrote: > > On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@google.com> wrote: > >> On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > >>> On 3/25/24 5:28 PM, Stanislav Fomichev wrote: > >>>> On 03/25, Alexei Starovoitov wrote: > >>>>> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote: > >>>>>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov > >>>>>> <alexei.starovoitov@gmail.com> wrote: > >>>>>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > >>>>>>>> > >>>>>>>> Hello: > >>>>>>>> > >>>>>>>> This patch was applied to bpf/bpf.git (master) > >>>>>>>> by Daniel Borkmann <daniel@iogearbox.net>: > >>>>>>>> > >>>>>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > >>>>>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown > >>>>>>>>> by various syzbot reports [1]. > >>>>>>>>> > >>>>>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > >>>>>>>>> the missing attribute that can be used by upper layers. > >>>>>>>>> > >>>>>>>>> We need to use it in __bpf_redirect_common(). > >>>>>>> > >>>>>>> This patch broke empty_skb test: > >>>>>>> $ test_progs -t empty_skb > >>>>>>> > >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > >>>>>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress > >>>>>>> [redirect_ingress]: actual -34 != expected 0 > >>>>>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec > >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > >>>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > >>>>>>> [redirect_egress]: actual -34 != expected 1 > >>>>>>> > >>>>>>> And looking at the test I think it's not a test issue. > >>>>>>> This check > >>>>>>> if (unlikely(skb->len < dev->min_header_len)) > >>>>>>> is rejecting more than it should. > >>>>>>> > >>>>>>> So I reverted this patch for now. > >>>>>> > >>>>>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I > >>>>>> move my sanity test in __bpf_tx_skb(), > >>>>>> the bpf test program still fails, I am suspecting the test needs to be adjusted. > >>>>>> > >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c > >>>>>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d > >>>>>> 100644 > >>>>>> --- a/net/core/filter.c > >>>>>> +++ b/net/core/filter.c > >>>>>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct > >>>>>> net_device *dev, struct sk_buff *skb) > >>>>>> return -ENETDOWN; > >>>>>> } > >>>>>> > >>>>>> + if (unlikely(skb->len < dev->min_header_len)) { > >>>>>> + pr_err_once("__bpf_tx_skb skb->len=%u < > >>>>>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name, > >>>>>> dev->min_header_len); > >>>>>> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > >>>>>> + kfree_skb(skb); > >>>>>> + return -ERANGE; > >>>>>> + } // Note: this is before we change skb->dev > >>>>>> skb->dev = dev; > >>>>>> skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); > >>>>>> skb_clear_tstamp(skb); > >>>>>> > >>>>>> > >>>>>> --> > >>>>>> > >>>>>> > >>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > >>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > >>>>>> [redirect_egress]: actual -34 != expected 1 > >>>>>> > >>>>>> [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) > >>>>>> [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 > >>>>>> mac=(64,14) net=(78,-1) trans=-1 > >>>>>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > >>>>>> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > >>>>>> hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 > >>>>> > >>>>> Hmm. Something is off. > >>>>> That test creates 15 byte skb. > >>>>> It's not obvious to me how it got reduced to 1. > >>>>> Something stripped L2 header and the prog is trying to redirect > >>>>> such skb into veth that expects skb with L2 ? > >>>>> > >>>>> Stan, > >>>>> please take a look. > >>>>> Since you wrote that test. > >>>> > >>>> Sure. Daniel wants to take a look on a separate thread, so we can sync > >>>> up. Tentatively, seems like the failure is in the lwt path that does > >>>> indeed drop the l2. > >>> > >>> If we'd change the test into the below, the tc and empty_skb tests pass. > >>> run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus > >>> skb->len is 1 in this test. We do use skb_mac_header_len() also in other > >>> tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best > >>> way forward.. > >>> > >>> static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev, > >>> u32 flags) > >>> { > >>> /* Verify that a link layer header is carried */ > >>> if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) { > >>> kfree_skb(skb); > >>> return -ERANGE; > >>> } > >>> > >>> if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) { > >> > >> Unfortunately this will not prevent frames with skb->len == 1 to reach > >> an Ethernet driver ndo_start_xmit() > >> > >> At ndo_start_xmit(), we do not look where the MAC header supposedly > >> starts in the skb, we only use skb->data > >> > >> I have a syzbot repro using team driver, so I added the following part in team : > >> > >> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > >> index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c > >> 100644 > >> --- a/drivers/net/team/team.c > >> +++ b/drivers/net/team/team.c > >> @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff > >> *skb, struct net_device *dev) > >> bool tx_success; > >> unsigned int len = skb->len; > >> > >> + if (len < 14) { > >> + pr_err_once("team_xmit(len=%u)\n", len); > >> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > >> + WARN_ON_ONCE(1); > >> + } > >> tx_success = team_queue_override_transmit(team, skb); > >> if (!tx_success) > >> tx_success = team->ops.transmit(team, skb); > >> > >> > >> And I get (with your suggestion instead of skb->len) > > > > Missing part in my copy/paste : > > > > [ 41.123829] team_xmit(len=1) > > [ 41.124335] skb len=1 headroom=78 headlen=1 tailroom=113 > > > >> mac=(78,0) net=(78,-1) trans=-1 > > Interesting. > > Could you also dump dev->type and/or dev->min_header_len? I suspect > this case may not be ARPHRD_ETHER in team. > > Above says mac=(78,0), so mac len is 0 and the check against the > dev->min_header_len should have dropped it if it went that branch. mac header is reset in __dev_queue_xmit() : skb_reset_mac_header(skb); So when the bpf code ran, skb_mac_header_len(skb) was 14, but later the MAC header was set (to skb->data) > > I wonder, is team driver missing sth like : > > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > index 0a44bbdcfb7b..6256f0d2f565 100644 > --- a/drivers/net/team/team.c > +++ b/drivers/net/team/team.c > @@ -2124,6 +2124,7 @@ static void team_setup_by_port(struct net_device *dev, > dev->type = port_dev->type; > dev->hard_header_len = port_dev->hard_header_len; > dev->needed_headroom = port_dev->needed_headroom; > + dev->min_header_len = port_dev->min_header_len; > dev->addr_len = port_dev->addr_len; > dev->mtu = port_dev->mtu; > memcpy(dev->broadcast, port_dev->broadcast, port_dev->addr_len); > I have confirmed that team min_header_len is 14, nothing seems to be missing I think. team_xmit(dev team0, skb->len=1, dev->min_header_len=14)
On Tue, Mar 26, 2024 at 7:08 PM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Mar 26, 2024 at 6:57 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > > > On 3/26/24 2:38 PM, Eric Dumazet wrote: > > > On Tue, Mar 26, 2024 at 2:37 PM Eric Dumazet <edumazet@google.com> wrote: > > >> On Tue, Mar 26, 2024 at 1:46 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > >>> On 3/25/24 5:28 PM, Stanislav Fomichev wrote: > > >>>> On 03/25, Alexei Starovoitov wrote: > > >>>>> On Mon, Mar 25, 2024 at 6:33 AM Eric Dumazet <edumazet@google.com> wrote: > > >>>>>> On Sat, Mar 23, 2024 at 4:02 AM Alexei Starovoitov > > >>>>>> <alexei.starovoitov@gmail.com> wrote: > > >>>>>>> On Fri, Mar 22, 2024 at 7:10 AM <patchwork-bot+netdevbpf@kernel.org> wrote: > > >>>>>>>> > > >>>>>>>> Hello: > > >>>>>>>> > > >>>>>>>> This patch was applied to bpf/bpf.git (master) > > >>>>>>>> by Daniel Borkmann <daniel@iogearbox.net>: > > >>>>>>>> > > >>>>>>>> On Fri, 22 Mar 2024 12:24:07 +0000 you wrote: > > >>>>>>>>> Some drivers ndo_start_xmit() expect a minimal size, as shown > > >>>>>>>>> by various syzbot reports [1]. > > >>>>>>>>> > > >>>>>>>>> Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") > > >>>>>>>>> the missing attribute that can be used by upper layers. > > >>>>>>>>> > > >>>>>>>>> We need to use it in __bpf_redirect_common(). > > >>>>>>> > > >>>>>>> This patch broke empty_skb test: > > >>>>>>> $ test_progs -t empty_skb > > >>>>>>> > > >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>>>>>> [redirect_ingress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>>>>>> [redirect_ingress]: actual -34 != expected 0 > > >>>>>>> test_empty_skb:PASS:err: veth ETH_HLEN+1 packet ingress [redirect_egress] 0 nsec > > >>>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>>>>>> [redirect_egress]: actual -34 != expected 1 > > >>>>>>> > > >>>>>>> And looking at the test I think it's not a test issue. > > >>>>>>> This check > > >>>>>>> if (unlikely(skb->len < dev->min_header_len)) > > >>>>>>> is rejecting more than it should. > > >>>>>>> > > >>>>>>> So I reverted this patch for now. > > >>>>>> > > >>>>>> OK, it seems I missed __bpf_rx_skb() vs __bpf_tx_skb(), but even if I > > >>>>>> move my sanity test in __bpf_tx_skb(), > > >>>>>> the bpf test program still fails, I am suspecting the test needs to be adjusted. > > >>>>>> > > >>>>>> diff --git a/net/core/filter.c b/net/core/filter.c > > >>>>>> index 745697c08acb3a74721d26ee93389efa81e973a0..e9c0e2087a08f1d8afd2c3e8e7871ddc9231b76d > > >>>>>> 100644 > > >>>>>> --- a/net/core/filter.c > > >>>>>> +++ b/net/core/filter.c > > >>>>>> @@ -2128,6 +2128,12 @@ static inline int __bpf_tx_skb(struct > > >>>>>> net_device *dev, struct sk_buff *skb) > > >>>>>> return -ENETDOWN; > > >>>>>> } > > >>>>>> > > >>>>>> + if (unlikely(skb->len < dev->min_header_len)) { > > >>>>>> + pr_err_once("__bpf_tx_skb skb->len=%u < > > >>>>>> dev(%s)->min_header_len(%u)\n", skb->len, dev->name, > > >>>>>> dev->min_header_len); > > >>>>>> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > > >>>>>> + kfree_skb(skb); > > >>>>>> + return -ERANGE; > > >>>>>> + } // Note: this is before we change skb->dev > > >>>>>> skb->dev = dev; > > >>>>>> skb_set_redirected_noclear(skb, skb_at_tc_ingress(skb)); > > >>>>>> skb_clear_tstamp(skb); > > >>>>>> > > >>>>>> > > >>>>>> --> > > >>>>>> > > >>>>>> > > >>>>>> test_empty_skb:FAIL:ret: veth ETH_HLEN+1 packet ingress > > >>>>>> [redirect_egress] unexpected ret: veth ETH_HLEN+1 packet ingress > > >>>>>> [redirect_egress]: actual -34 != expected 1 > > >>>>>> > > >>>>>> [ 58.382051] __bpf_tx_skb skb->len=1 < dev(veth0)->min_header_len(14) > > >>>>>> [ 58.382778] skb len=1 headroom=78 headlen=1 tailroom=113 > > >>>>>> mac=(64,14) net=(78,-1) trans=-1 > > >>>>>> shinfo(txflags=0 nr_frags=0 gso(size=0 type=0 segs=0)) > > >>>>>> csum(0x0 ip_summed=0 complete_sw=0 valid=0 level=0) > > >>>>>> hash(0x0 sw=0 l4=0) proto=0x7f00 pkttype=0 iif=0 > > >>>>> > > >>>>> Hmm. Something is off. > > >>>>> That test creates 15 byte skb. > > >>>>> It's not obvious to me how it got reduced to 1. > > >>>>> Something stripped L2 header and the prog is trying to redirect > > >>>>> such skb into veth that expects skb with L2 ? > > >>>>> > > >>>>> Stan, > > >>>>> please take a look. > > >>>>> Since you wrote that test. > > >>>> > > >>>> Sure. Daniel wants to take a look on a separate thread, so we can sync > > >>>> up. Tentatively, seems like the failure is in the lwt path that does > > >>>> indeed drop the l2. > > >>> > > >>> If we'd change the test into the below, the tc and empty_skb tests pass. > > >>> run_lwt_bpf() calls into skb_do_redirect() which has L2 stripped, and thus > > >>> skb->len is 1 in this test. We do use skb_mac_header_len() also in other > > >>> tc BPF helpers, so perhaps s/skb->len/skb_mac_header_len(skb)/ is the best > > >>> way forward.. > > >>> > > >>> static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev, > > >>> u32 flags) > > >>> { > > >>> /* Verify that a link layer header is carried */ > > >>> if (unlikely(skb->mac_header >= skb->network_header || skb->len == 0)) { > > >>> kfree_skb(skb); > > >>> return -ERANGE; > > >>> } > > >>> > > >>> if (unlikely(skb_mac_header_len(skb) < dev->min_header_len)) { > > >> > > >> Unfortunately this will not prevent frames with skb->len == 1 to reach > > >> an Ethernet driver ndo_start_xmit() > > >> > > >> At ndo_start_xmit(), we do not look where the MAC header supposedly > > >> starts in the skb, we only use skb->data > > >> > > >> I have a syzbot repro using team driver, so I added the following part in team : > > >> > > >> diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > > >> index 0a44bbdcfb7b9f30a0c27b700246501c5eba322f..75e5ef585a8f05b35cfddbae0bfc377864e6e38c > > >> 100644 > > >> --- a/drivers/net/team/team.c > > >> +++ b/drivers/net/team/team.c > > >> @@ -1714,6 +1714,11 @@ static netdev_tx_t team_xmit(struct sk_buff > > >> *skb, struct net_device *dev) > > >> bool tx_success; > > >> unsigned int len = skb->len; > > >> > > >> + if (len < 14) { > > >> + pr_err_once("team_xmit(len=%u)\n", len); > > >> + DO_ONCE_LITE(skb_dump, KERN_ERR, skb, false); > > >> + WARN_ON_ONCE(1); > > >> + } > > >> tx_success = team_queue_override_transmit(team, skb); > > >> if (!tx_success) > > >> tx_success = team->ops.transmit(team, skb); > > >> > > >> > > >> And I get (with your suggestion instead of skb->len) > > > > > > Missing part in my copy/paste : > > > > > > [ 41.123829] team_xmit(len=1) > > > [ 41.124335] skb len=1 headroom=78 headlen=1 tailroom=113 > > > > > >> mac=(78,0) net=(78,-1) trans=-1 > > > > Interesting. > > > > Could you also dump dev->type and/or dev->min_header_len? I suspect > > this case may not be ARPHRD_ETHER in team. > > > > Above says mac=(78,0), so mac len is 0 and the check against the > > dev->min_header_len should have dropped it if it went that branch. > > mac header is reset in __dev_queue_xmit() : > > skb_reset_mac_header(skb); > > So when the bpf code ran, skb_mac_header_len(skb) was 14, > but later the MAC header was set (to skb->data) > > > > > I wonder, is team driver missing sth like : > > > > diff --git a/drivers/net/team/team.c b/drivers/net/team/team.c > > index 0a44bbdcfb7b..6256f0d2f565 100644 > > --- a/drivers/net/team/team.c > > +++ b/drivers/net/team/team.c > > @@ -2124,6 +2124,7 @@ static void team_setup_by_port(struct net_device *dev, > > dev->type = port_dev->type; > > dev->hard_header_len = port_dev->hard_header_len; > > dev->needed_headroom = port_dev->needed_headroom; > > + dev->min_header_len = port_dev->min_header_len; > > dev->addr_len = port_dev->addr_len; > > dev->mtu = port_dev->mtu; > > memcpy(dev->broadcast, port_dev->broadcast, port_dev->addr_len); > > > > > I have confirmed that team min_header_len is 14, nothing seems to be > missing I think. > > team_xmit(dev team0, skb->len=1, dev->min_header_len=14) FYI, I am releasing today a syzbot report with the same problem.
diff --git a/net/core/filter.c b/net/core/filter.c index 8adf95765cdd967a15b2661dfb454db0ccf350b0..745697c08acb3a74721d26ee93389efa81e973a0 100644 --- a/net/core/filter.c +++ b/net/core/filter.c @@ -2175,6 +2175,11 @@ static int __bpf_redirect_common(struct sk_buff *skb, struct net_device *dev, return -ERANGE; } + if (unlikely(skb->len < dev->min_header_len)) { + kfree_skb(skb); + return -ERANGE; + } + bpf_push_mac_rcsum(skb); return flags & BPF_F_INGRESS ? __bpf_rx_skb(dev, skb) : __bpf_tx_skb(dev, skb);
Some drivers ndo_start_xmit() expect a minimal size, as shown by various syzbot reports [1]. Willem added in commit 217e6fa24ce2 ("net: introduce device min_header_len") the missing attribute that can be used by upper layers. We need to use it in __bpf_redirect_common(). [1] BUG: KMSAN: uninit-value in erspan_build_header+0x170/0x2f0 include/net/erspan.h:197 erspan_build_header+0x170/0x2f0 include/net/erspan.h:197 erspan_xmit+0x128a/0x1ec0 net/ipv4/ip_gre.c:706 __netdev_start_xmit include/linux/netdevice.h:4903 [inline] netdev_start_xmit include/linux/netdevice.h:4917 [inline] xmit_one net/core/dev.c:3531 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547 sch_direct_xmit+0x3c5/0xd50 net/sched/sch_generic.c:343 __dev_xmit_skb net/core/dev.c:3760 [inline] __dev_queue_xmit+0x2e6a/0x52c0 net/core/dev.c:4301 dev_queue_xmit include/linux/netdevice.h:3091 [inline] __bpf_tx_skb net/core/filter.c:2136 [inline] __bpf_redirect_common net/core/filter.c:2180 [inline] __bpf_redirect+0x14a6/0x1620 net/core/filter.c:2187 ____bpf_clone_redirect net/core/filter.c:2460 [inline] bpf_clone_redirect+0x328/0x470 net/core/filter.c:2432 ___bpf_prog_run+0x13fe/0xe0f0 kernel/bpf/core.c:1997 __bpf_prog_run512+0xb5/0xe0 kernel/bpf/core.c:2238 bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline] __bpf_prog_run include/linux/filter.h:657 [inline] bpf_prog_run include/linux/filter.h:664 [inline] bpf_test_run+0x499/0xc30 net/bpf/test_run.c:425 bpf_prog_test_run_skb+0x14ea/0x1f20 net/bpf/test_run.c:1058 bpf_prog_test_run+0x6b7/0xad0 kernel/bpf/syscall.c:4240 __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5649 __do_sys_bpf kernel/bpf/syscall.c:5738 [inline] __se_sys_bpf kernel/bpf/syscall.c:5736 [inline] __x64_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5736 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 Uninit was created at: slab_post_alloc_hook mm/slub.c:3804 [inline] slab_alloc_node mm/slub.c:3845 [inline] kmem_cache_alloc_node+0x613/0xc50 mm/slub.c:3888 kmalloc_reserve+0x13d/0x4a0 net/core/skbuff.c:577 pskb_expand_head+0x222/0x19d0 net/core/skbuff.c:2245 __skb_cow include/linux/skbuff.h:3671 [inline] skb_cow_head include/linux/skbuff.h:3705 [inline] erspan_xmit+0xb08/0x1ec0 net/ipv4/ip_gre.c:692 __netdev_start_xmit include/linux/netdevice.h:4903 [inline] netdev_start_xmit include/linux/netdevice.h:4917 [inline] xmit_one net/core/dev.c:3531 [inline] dev_hard_start_xmit+0x247/0xa20 net/core/dev.c:3547 sch_direct_xmit+0x3c5/0xd50 net/sched/sch_generic.c:343 __dev_xmit_skb net/core/dev.c:3760 [inline] __dev_queue_xmit+0x2e6a/0x52c0 net/core/dev.c:4301 dev_queue_xmit include/linux/netdevice.h:3091 [inline] __bpf_tx_skb net/core/filter.c:2136 [inline] __bpf_redirect_common net/core/filter.c:2180 [inline] __bpf_redirect+0x14a6/0x1620 net/core/filter.c:2187 ____bpf_clone_redirect net/core/filter.c:2460 [inline] bpf_clone_redirect+0x328/0x470 net/core/filter.c:2432 ___bpf_prog_run+0x13fe/0xe0f0 kernel/bpf/core.c:1997 __bpf_prog_run512+0xb5/0xe0 kernel/bpf/core.c:2238 bpf_dispatcher_nop_func include/linux/bpf.h:1234 [inline] __bpf_prog_run include/linux/filter.h:657 [inline] bpf_prog_run include/linux/filter.h:664 [inline] bpf_test_run+0x499/0xc30 net/bpf/test_run.c:425 bpf_prog_test_run_skb+0x14ea/0x1f20 net/bpf/test_run.c:1058 bpf_prog_test_run+0x6b7/0xad0 kernel/bpf/syscall.c:4240 __sys_bpf+0x6aa/0xd90 kernel/bpf/syscall.c:5649 __do_sys_bpf kernel/bpf/syscall.c:5738 [inline] __se_sys_bpf kernel/bpf/syscall.c:5736 [inline] __x64_sys_bpf+0xa0/0xe0 kernel/bpf/syscall.c:5736 do_syscall_64+0xd5/0x1f0 entry_SYSCALL_64_after_hwframe+0x6d/0x75 CPU: 0 PID: 5041 Comm: syz-executor167 Not tainted 6.8.0-syzkaller-11743-ga4145ce1e7bc #0 Hardware name: Google Google Compute Engine/Google Compute Engine, BIOS Google 02/29/2024 Fixes: fd1894224407 ("bpf: Don't redirect packets with invalid pkt_len") Reported-by: syzbot+9e27778c0edc62cb97d8@syzkaller.appspotmail.com Closes: https://lore.kernel.org/netdev/000000000000205af206143ece22@google.com/ Signed-off-by: Eric Dumazet <edumazet@google.com> Cc: Stanislav Fomichev <sdf@google.com> Cc: Willem de Bruijn <willemb@google.com> --- net/core/filter.c | 5 +++++ 1 file changed, 5 insertions(+)