
[net,v5,2/3] net: sched: add check tc_skip_classify in sch egress

Message ID 20211208145459.9590-3-xiangxia.m.yue@gmail.com (mailing list archive)
State Rejected
Delegated to: BPF
Series fix bpf_redirect to ifb netdev

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7 this patch: 7
netdev/cc_maintainers warning 1 maintainers not CCed: bpf@vger.kernel.org
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 9 this patch: 9
netdev/checkpatch warning WARNING: Possible repeated word: 'in'
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-next success VM_Test
bpf/vmtest-bpf-next-PR success PR summary

Commit Message

Tonghao Zhang Dec. 8, 2021, 2:54 p.m. UTC
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

Try to resolve the issues below:
* We look up filters and only then check the tc_skip_classify
  flag in the net sched layer, even when the skb does not want
  to be classified. That case may consume a lot of cpu cycles.
  This patch helps when there are many filters with different
  prios. With ~5 prios in production, we see a ~1% improvement.

  Rules as below:
  $ for id in $(seq 1 5); do
  $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
  $ done

* bpf_redirect may be invoked in the egress path. If we don't
  check the flag and return immediately, the packets will
  loop back.

  $ tc filter add dev eth0 egress bpf direct-action obj ifb.o sec ifb
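
  For context, the flag being tested is consumed by the existing
  skb_skip_tc_classify() helper; a rough, paraphrased sketch of it
  (see the net/sched headers for the exact in-tree definition):

  static inline bool skb_skip_tc_classify(struct sk_buff *skb)
  {
  #ifdef CONFIG_NET_CLS_ACT
  	if (skb->tc_skip_classify) {
  		/* one-shot mark, set e.g. by ifb on re-injected skbs */
  		skb->tc_skip_classify = 0;
  		return true;
  	}
  #endif
  	return false;
  }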

Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Alexei Starovoitov <ast@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Cc: Andrii Nakryiko <andrii@kernel.org>
Cc: Martin KaFai Lau <kafai@fb.com>
Cc: Song Liu <songliubraving@fb.com>
Cc: Yonghong Song <yhs@fb.com>
Cc: John Fastabend <john.fastabend@gmail.com>
Cc: KP Singh <kpsingh@kernel.org>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Alexander Lobakin <alexandr.lobakin@intel.com>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

John Fastabend Dec. 10, 2021, 4:43 p.m. UTC | #1
xiangxia.m.yue@ wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> Try to resolve the issues as below:
> * We look up and then check tc_skip_classify flag in net
>   sched layer, even though skb don't want to be classified.
>   That case may consume a lot of cpu cycles. This patch
>   is useful when there are a lot of filters with different
>   prio. There is ~5 prio in in production, ~1% improvement.
> 
>   Rules as below:
>   $ for id in $(seq 1 5); do
>   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>   $ done
> 
> * bpf_redirect may be invoked in egress path. If we don't
>   check the flags and then return immediately, the packets
>   will loopback.

This would be the naive case right? Meaning the BPF program is
doing a redirect without any logic or is buggy?

Can you map out how this happens for me, I'm not fully sure I
understand the exact concern. Is it possible for BPF programs
that used to see packets no longer see the packet as expected?

Is this the path you are talking about?

 rx ethx  ->
   execute BPF program on ethx with bpf_redirect(ifb0) ->
     __skb_dequeue @ifb tc_skip_classify = 1 ->
       dev_queue_xmit() -> 
          sch_handle_egress() ->
            execute BPF program again

I can't see why you want to skip that second tc BPF program,
or for that matter any tc filter there. In general how do you
know that is the correct/expected behavior? Before the above
change it would have been called; what if it's doing useful
work?

Also it's not clear how your ifb setup is built or used. That
might help understand your use case. I would just remove the
IFB altogether and the above discussion is moot.

Thanks,
John
John Fastabend Dec. 10, 2021, 4:52 p.m. UTC | #2
John Fastabend wrote:
> xiangxia.m.yue@ wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > 
> > Try to resolve the issues as below:
> > * We look up and then check tc_skip_classify flag in net
> >   sched layer, even though skb don't want to be classified.
> >   That case may consume a lot of cpu cycles. This patch
> >   is useful when there are a lot of filters with different
> >   prio. There is ~5 prio in in production, ~1% improvement.
> > 
> >   Rules as below:
> >   $ for id in $(seq 1 5); do
> >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >   $ done
> > 
> > * bpf_redirect may be invoked in egress path. If we don't
> >   check the flags and then return immediately, the packets
> >   will loopback.
> 
> This would be the naive case right? Meaning the BPF program is
> doing a redirect without any logic or is buggy?
> 
> Can you map out how this happens for me, I'm not fully sure I
> understand the exact concern. Is it possible for BPF programs
> that used to see packets no longer see the packet as expected?
> 
> Is this the path you are talking about?
> 
>  rx ethx  ->
>    execute BPF program on ethx with bpf_redirect(ifb0) ->
>      __skb_dequeue @ifb tc_skip_classify = 1 ->
>        dev_queue_xmit() -> 
>           sch_handle_egress() ->
>             execute BPF program again
> 
> I can't see why you want to skip that second tc BPF program,
> or for that matter any tc filter there. In general how do you
> know that is the correct/expected behavior? Before the above
> change it would have been called, what if its doing useful
> work.
> 
> Also its not clear how your ifb setup is built or used. That
> might help understand your use case. I would just remove the
> IFB altogether and the above discussion is mute.
> 
> Thanks,
> John

After a bit further thought (and coffee) I think this will
break some programs that exist today. Consider the case
where I pop a header off and resubmit to the same device
intentionally to reprocess the pkt without the header. I've
used this pattern in BPF a few times.
Tonghao Zhang Dec. 10, 2021, 5:37 p.m. UTC | #3
On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> xiangxia.m.yue@ wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > Try to resolve the issues as below:
> > * We look up and then check tc_skip_classify flag in net
> >   sched layer, even though skb don't want to be classified.
> >   That case may consume a lot of cpu cycles. This patch
> >   is useful when there are a lot of filters with different
> >   prio. There is ~5 prio in in production, ~1% improvement.
> >
> >   Rules as below:
> >   $ for id in $(seq 1 5); do
> >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >   $ done
> >
> > * bpf_redirect may be invoked in egress path. If we don't
> >   check the flags and then return immediately, the packets
> >   will loopback.
>
> This would be the naive case right? Meaning the BPF program is
> doing a redirect without any logic or is buggy?
>
> Can you map out how this happens for me, I'm not fully sure I
> understand the exact concern. Is it possible for BPF programs
> that used to see packets no longer see the packet as expected?
>
> Is this the path you are talking about?
Hi John
Tx ethx -> __dev_queue_xmit -> sch_handle_egress ->
  execute BPF program on ethx with bpf_redirect(ifb0) ->
  ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
The packets loop back, which means bpf_redirect doesn't work with an
ifb netdev, right?
So in sch_handle_egress, I add the check skb_skip_tc_classify().

>  rx ethx  ->
>    execute BPF program on ethx with bpf_redirect(ifb0) ->
>      __skb_dequeue @ifb tc_skip_classify = 1 ->
>        dev_queue_xmit() ->
>           sch_handle_egress() ->
>             execute BPF program again
>
> I can't see why you want to skip that second tc BPF program,
> or for that matter any tc filter there. In general how do you
> know that is the correct/expected behavior? Before the above
> change it would have been called, what if its doing useful
> work.
bpf_redirect works fine on ingress with ifb:
__netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect(ifb0)
-> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
but then
__netif_receive_skb_core --> skb_skip_tc_classify() (so the packets will
execute the BPF program again)

> Also its not clear how your ifb setup is built or used. That
> might help understand your use case. I would just remove the
> IFB altogether and the above discussion is mute.
tc filter add dev veth1 egress bpf direct-action obj
test_bpf_redirect_ifb.o sec redirect_ifb

the test_bpf_redirect_ifb BPF program:
+// SPDX-License-Identifier: GPL-2.0
+/* Copyright (c) 2021 DiDi Global */
+
+#include <linux/bpf.h>
+#include <bpf/bpf_helpers.h>
+
+SEC("redirect_ifb")
+int redirect(struct __sk_buff *skb)
+{
+       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
+}
+
+char __license[] SEC("license") = "GPL";

The 3/3 is selftest:
https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/

> Thanks,
> John
Tonghao Zhang Dec. 10, 2021, 5:43 p.m. UTC | #4
On Sat, Dec 11, 2021 at 12:52 AM John Fastabend
<john.fastabend@gmail.com> wrote:
>
> John Fastabend wrote:
> > xiangxia.m.yue@ wrote:
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > Try to resolve the issues as below:
> > > * We look up and then check tc_skip_classify flag in net
> > >   sched layer, even though skb don't want to be classified.
> > >   That case may consume a lot of cpu cycles. This patch
> > >   is useful when there are a lot of filters with different
> > >   prio. There is ~5 prio in in production, ~1% improvement.
> > >
> > >   Rules as below:
> > >   $ for id in $(seq 1 5); do
> > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >   $ done
> > >
> > > * bpf_redirect may be invoked in egress path. If we don't
> > >   check the flags and then return immediately, the packets
> > >   will loopback.
> >
> > This would be the naive case right? Meaning the BPF program is
> > doing a redirect without any logic or is buggy?
> >
> > Can you map out how this happens for me, I'm not fully sure I
> > understand the exact concern. Is it possible for BPF programs
> > that used to see packets no longer see the packet as expected?
> >
> > Is this the path you are talking about?
> >
> >  rx ethx  ->
> >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> >        dev_queue_xmit() ->
> >           sch_handle_egress() ->
> >             execute BPF program again
> >
> > I can't see why you want to skip that second tc BPF program,
> > or for that matter any tc filter there. In general how do you
> > know that is the correct/expected behavior? Before the above
> > change it would have been called, what if its doing useful
> > work.
> >
> > Also its not clear how your ifb setup is built or used. That
> > might help understand your use case. I would just remove the
> > IFB altogether and the above discussion is mute.
> >
> > Thanks,
> > John
>
> After a bit further thought (and coffee) I think this will
> break some programs that exist today. Consider the case
> where I pop a header off and resubmit to the same device
> intentionally to reprocess the pkt without the header. I've
> used this pattern in BPF a few times.
No, the ifb netdev sets skb->tc_skip_classify = 1, which means we
should not process the skb again, whether on egress or ingress.
If a BPF program doesn't set skb->tc_skip_classify = 1, then we
can process the packets again.
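
For reference, a paraphrased sketch of where ifb sets that mark
(ifb_ri_tasklet() in drivers/net/ifb.c; not verbatim):

	while ((skb = __skb_dequeue(&txp->tq)) != NULL) {
		skb->redirected = 0;
		/* one-shot mark, consumed later by skb_skip_tc_classify() */
		skb->tc_skip_classify = 1;
		/* ... then dev_queue_xmit() or netif_receive_skb() ... */
	}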
Tonghao Zhang Dec. 10, 2021, 5:46 p.m. UTC | #5
On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> <john.fastabend@gmail.com> wrote:
> >
> > xiangxia.m.yue@ wrote:
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > Try to resolve the issues as below:
> > > * We look up and then check tc_skip_classify flag in net
> > >   sched layer, even though skb don't want to be classified.
> > >   That case may consume a lot of cpu cycles. This patch
> > >   is useful when there are a lot of filters with different
> > >   prio. There is ~5 prio in in production, ~1% improvement.
> > >
> > >   Rules as below:
> > >   $ for id in $(seq 1 5); do
> > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >   $ done
> > >
> > > * bpf_redirect may be invoked in egress path. If we don't
> > >   check the flags and then return immediately, the packets
> > >   will loopback.
> >
> > This would be the naive case right? Meaning the BPF program is
> > doing a redirect without any logic or is buggy?
> >
> > Can you map out how this happens for me, I'm not fully sure I
> > understand the exact concern. Is it possible for BPF programs
> > that used to see packets no longer see the packet as expected?
> >
> > Is this the path you are talking about?
> Hi John
> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> netdev, right ?
> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>
> >  rx ethx  ->
> >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> >        dev_queue_xmit() ->
> >           sch_handle_egress() ->
> >             execute BPF program again
> >
> > I can't see why you want to skip that second tc BPF program,
> > or for that matter any tc filter there. In general how do you
> > know that is the correct/expected behavior? Before the above
> > change it would have been called, what if its doing useful
> > work.
> bpf_redirect works fine on ingress with ifb
> __netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect (ifb0)
> -> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
> but
> __netif_receive_skb_core --> skb_skip_tc_classify(so the packets will
> execute the BPF progam again)
so the packets will NOT execute the BPF program again)

> > Also its not clear how your ifb setup is built or used. That
> > might help understand your use case. I would just remove the
> > IFB altogether and the above discussion is mute.
> tc filter add dev veth1 egress bpf direct-action obj
> test_bpf_redirect_ifb.o sec redirect_ifb
>
> the test_bpf_redirect_ifb  bpf progam:
> +// SPDX-License-Identifier: GPL-2.0
> +/* Copyright (c) 2021 DiDi Global */
> +
> +#include <linux/bpf.h>
> +#include <bpf/bpf_helpers.h>
> +
> +SEC("redirect_ifb")
> +int redirect(struct __sk_buff *skb)
> +{
> +       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
> +}
> +
> +char __license[] SEC("license") = "GPL";
>
> The 3/3 is selftest:
> https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
>
> > Thanks,
> > John
>
>
>
> --
> Best regards, Tonghao
Tonghao Zhang Dec. 10, 2021, 7:54 p.m. UTC | #6
On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >
> > On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> > <john.fastabend@gmail.com> wrote:
> > >
> > > xiangxia.m.yue@ wrote:
> > > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >
> > > > Try to resolve the issues as below:
> > > > * We look up and then check tc_skip_classify flag in net
> > > >   sched layer, even though skb don't want to be classified.
> > > >   That case may consume a lot of cpu cycles. This patch
> > > >   is useful when there are a lot of filters with different
> > > >   prio. There is ~5 prio in in production, ~1% improvement.
> > > >
> > > >   Rules as below:
> > > >   $ for id in $(seq 1 5); do
> > > >   $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > > >   $ done
> > > >
> > > > * bpf_redirect may be invoked in egress path. If we don't
> > > >   check the flags and then return immediately, the packets
> > > >   will loopback.
> > >
> > > This would be the naive case right? Meaning the BPF program is
> > > doing a redirect without any logic or is buggy?
> > >
> > > Can you map out how this happens for me, I'm not fully sure I
> > > understand the exact concern. Is it possible for BPF programs
> > > that used to see packets no longer see the packet as expected?
> > >
> > > Is this the path you are talking about?
> > Hi John
> > Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> > ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> > -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> > the packets loopbacks, that means bpf_redirect doesn't work with ifb
> > netdev, right ?
> > so in sch_handle_egress, I add the check skb_skip_tc_classify().
> >
> > >  rx ethx  ->
> > >    execute BPF program on ethx with bpf_redirect(ifb0) ->
> > >      __skb_dequeue @ifb tc_skip_classify = 1 ->
> > >        dev_queue_xmit() ->
> > >           sch_handle_egress() ->
> > >             execute BPF program again
> > >
> > > I can't see why you want to skip that second tc BPF program,
> > > or for that matter any tc filter there. In general how do you
> > > know that is the correct/expected behavior? Before the above
> > > change it would have been called, what if its doing useful
> > > work.
> > bpf_redirect works fine on ingress with ifb
> > __netif_receive_skb_core -> sch_handle_ingress -> bpf_redirect (ifb0)
> > -> ifb_xmit -> netif_receive_skb -> __netif_receive_skb_core
> > but
> > __netif_receive_skb_core --> skb_skip_tc_classify(so the packets will
> > execute the BPF progam again)
> so the packets will NOT execute the BPF progam again)
>
> > > Also its not clear how your ifb setup is built or used. That
> > > might help understand your use case. I would just remove the
> > > IFB altogether and the above discussion is mute.
> > tc filter add dev veth1 egress bpf direct-action obj
> > test_bpf_redirect_ifb.o sec redirect_ifb
> >
> > the test_bpf_redirect_ifb  bpf progam:
> > +// SPDX-License-Identifier: GPL-2.0
> > +/* Copyright (c) 2021 DiDi Global */
> > +
> > +#include <linux/bpf.h>
> > +#include <bpf/bpf_helpers.h>
> > +
> > +SEC("redirect_ifb")
> > +int redirect(struct __sk_buff *skb)
> > +{
> > +       return bpf_redirect(skb->ifindex + 1 /* ifbX */, 0);
> > +}
> > +
> > +char __license[] SEC("license") = "GPL";
> >
> > The 3/3 is selftest:
> > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> >
> > > Thanks,
> > > John
Hi John
Did I answer your question? I hope I explained things clearly.
> >
> >
> > --
> > Best regards, Tonghao
>
>
>
> --
> Best regards, Tonghao
Daniel Borkmann Dec. 10, 2021, 8:11 p.m. UTC | #7
On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
>>> <john.fastabend@gmail.com> wrote:
>>>> xiangxia.m.yue@ wrote:
>>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>>>>
>>>>> Try to resolve the issues as below:
>>>>> * We look up and then check tc_skip_classify flag in net
>>>>>    sched layer, even though skb don't want to be classified.
>>>>>    That case may consume a lot of cpu cycles. This patch
>>>>>    is useful when there are a lot of filters with different
>>>>>    prio. There is ~5 prio in in production, ~1% improvement.
>>>>>
>>>>>    Rules as below:
>>>>>    $ for id in $(seq 1 5); do
>>>>>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>>>>>    $ done
>>>>>
>>>>> * bpf_redirect may be invoked in egress path. If we don't
>>>>>    check the flags and then return immediately, the packets
>>>>>    will loopback.
>>>>
>>>> This would be the naive case right? Meaning the BPF program is
>>>> doing a redirect without any logic or is buggy?
>>>>
>>>> Can you map out how this happens for me, I'm not fully sure I
>>>> understand the exact concern. Is it possible for BPF programs
>>>> that used to see packets no longer see the packet as expected?
>>>>
>>>> Is this the path you are talking about?
>>> Hi John
>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
>>> netdev, right ?
>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().

But why would you do that? Usage like this is just broken by design.
If you need to loop anything back to RX, just use bpf_redirect() with
BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
Tonghao Zhang Dec. 11, 2021, 12:37 a.m. UTC | #8
On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> >>> <john.fastabend@gmail.com> wrote:
> >>>> xiangxia.m.yue@ wrote:
> >>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>>>
> >>>>> Try to resolve the issues as below:
> >>>>> * We look up and then check tc_skip_classify flag in net
> >>>>>    sched layer, even though skb don't want to be classified.
> >>>>>    That case may consume a lot of cpu cycles. This patch
> >>>>>    is useful when there are a lot of filters with different
> >>>>>    prio. There is ~5 prio in in production, ~1% improvement.
> >>>>>
> >>>>>    Rules as below:
> >>>>>    $ for id in $(seq 1 5); do
> >>>>>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >>>>>    $ done
> >>>>>
> >>>>> * bpf_redirect may be invoked in egress path. If we don't
> >>>>>    check the flags and then return immediately, the packets
> >>>>>    will loopback.
> >>>>
> >>>> This would be the naive case right? Meaning the BPF program is
> >>>> doing a redirect without any logic or is buggy?
> >>>>
> >>>> Can you map out how this happens for me, I'm not fully sure I
> >>>> understand the exact concern. Is it possible for BPF programs
> >>>> that used to see packets no longer see the packet as expected?
> >>>>
> >>>> Is this the path you are talking about?
> >>> Hi John
> >>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> >>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> >>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> >>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> >>> netdev, right ?
> >>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>
> But why would you do that? Usage like this is just broken by design..
As I understand, we can redirect packets to a target device either at
ingress or at *egress

The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
Allow eBPF programs attached to classifier/actions to call
bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
redirect the packet by dynamic ifindex selection from within the
program to a target device either at ingress or at egress. Can be used
for various scenarios, for example, to load balance skbs into veths,
split parts of the traffic to local taps, etc.

https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6

But at egress the bpf_redirect doesn't work with ifb.
> If you need to loop anything back to RX, just use bpf_redirect() with
We don't use it to loop packets back; the flags argument of bpf_redirect() is 0. For example:

tc filter add dev veth1 \
egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
We load balance the packets to different ifb netdevices at egress. On
the ifb devices, we install filters and rate-limit (police) actions;
a sketch of the load-balancing step is below.
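
A minimal sketch of that load-balancing step (a hypothetical program,
assuming ifb0..ifb3 were created right after the physical device so
their ifindexes are contiguous; not the program from this series):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define NR_IFB 4

SEC("lb_ifb")
int lb_to_ifb(struct __sk_buff *skb)
{
	/* pick one of NR_IFB ifb devices by flow hash */
	__u32 slot = bpf_get_hash_recalc(skb) % NR_IFB;

	return bpf_redirect(skb->ifindex + 1 + slot, 0);
}

char __license[] SEC("license") = "GPL";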
Tonghao Zhang Dec. 12, 2021, 9:40 a.m. UTC | #9
On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> >>> <john.fastabend@gmail.com> wrote:
> >>>> xiangxia.m.yue@ wrote:
> >>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>>>>
> >>>>> Try to resolve the issues as below:
> >>>>> * We look up and then check tc_skip_classify flag in net
> >>>>>    sched layer, even though skb don't want to be classified.
> >>>>>    That case may consume a lot of cpu cycles. This patch
> >>>>>    is useful when there are a lot of filters with different
> >>>>>    prio. There is ~5 prio in in production, ~1% improvement.
> >>>>>
> >>>>>    Rules as below:
> >>>>>    $ for id in $(seq 1 5); do
> >>>>>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >>>>>    $ done
> >>>>>
> >>>>> * bpf_redirect may be invoked in egress path. If we don't
> >>>>>    check the flags and then return immediately, the packets
> >>>>>    will loopback.
> >>>>
> >>>> This would be the naive case right? Meaning the BPF program is
> >>>> doing a redirect without any logic or is buggy?
> >>>>
> >>>> Can you map out how this happens for me, I'm not fully sure I
> >>>> understand the exact concern. Is it possible for BPF programs
> >>>> that used to see packets no longer see the packet as expected?
> >>>>
> >>>> Is this the path you are talking about?
> >>> Hi John
> >>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> >>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> >>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> >>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> >>> netdev, right ?
> >>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>
> But why would you do that? Usage like this is just broken by design..
> If you need to loop anything back to RX, just use bpf_redirect() with
> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
Hi
Note that ifb_ri_tasklet() can either send the packets out or receive the skb:

ifb_ri_tasklet():
                if (!skb->from_ingress) {
                        /* bpf_redirect to ifb: this is the branch taken in
                         * our case, so ifb invokes dev_queue_xmit() again.
                         */
                        dev_queue_xmit(skb);
                } else {
                        skb_pull_rcsum(skb, skb->mac_len);
                        netif_receive_skb(skb);
                }
Tonghao Zhang Dec. 14, 2021, 2:27 a.m. UTC | #10
On Sun, Dec 12, 2021 at 5:40 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > > On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> > >>> <john.fastabend@gmail.com> wrote:
> > >>>> xiangxia.m.yue@ wrote:
> > >>>>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >>>>>
> > >>>>> Try to resolve the issues as below:
> > >>>>> * We look up and then check tc_skip_classify flag in net
> > >>>>>    sched layer, even though skb don't want to be classified.
> > >>>>>    That case may consume a lot of cpu cycles. This patch
> > >>>>>    is useful when there are a lot of filters with different
> > >>>>>    prio. There is ~5 prio in in production, ~1% improvement.
> > >>>>>
> > >>>>>    Rules as below:
> > >>>>>    $ for id in $(seq 1 5); do
> > >>>>>    $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> > >>>>>    $ done
> > >>>>>
> > >>>>> * bpf_redirect may be invoked in egress path. If we don't
> > >>>>>    check the flags and then return immediately, the packets
> > >>>>>    will loopback.
> > >>>>
> > >>>> This would be the naive case right? Meaning the BPF program is
> > >>>> doing a redirect without any logic or is buggy?
> > >>>>
> > >>>> Can you map out how this happens for me, I'm not fully sure I
> > >>>> understand the exact concern. Is it possible for BPF programs
> > >>>> that used to see packets no longer see the packet as expected?
> > >>>>
> > >>>> Is this the path you are talking about?
> > >>> Hi John
> > >>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> > >>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> > >>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> > >>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> > >>> netdev, right ?
> > >>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
> >
> > But why would you do that? Usage like this is just broken by design..
> > If you need to loop anything back to RX, just use bpf_redirect() with
> > BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> Hi
> note that: ifb_ri_tasklet can send out the packets or receive skb
> ifb_ri_tasklet
>                 if (!skb->from_ingress) {
>                         dev_queue_xmit(skb); // bpf_redirect to ifb
> and ifb invoked the dev_queue_xmit in our case.
>                 } else {
>                         skb_pull_rcsum(skb, skb->mac_len);
>                         netif_receive_skb(skb);
>                 }
Hi
In this thread, I have tried to explain this patch and answer the questions.
What should I do next? v1-v4 are "Changes Requested" in patchwork, but
v5 is "Rejected".
Should I expand the commit message of this patch and send a v6?
Patches 1/3 and 3/3 still need to be reviewed.

> --
> Best regards, Tonghao
Daniel Borkmann Dec. 16, 2021, 12:37 p.m. UTC | #11
On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
>> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
>>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
>>>>> <john.fastabend@gmail.com> wrote:
>>>>>> xiangxia.m.yue@ wrote:
[...]
>>>>> Hi John
>>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
>>>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
>>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
>>>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
>>>>> netdev, right ?
>>>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
>>
>> But why would you do that? Usage like this is just broken by design..
> As I understand, we can redirect packets to a target device either at
> ingress or at *egress
> 
> The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> Allow eBPF programs attached to classifier/actions to call
> bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> redirect the packet by dynamic ifindex selection from within the
> program to a target device either at ingress or at egress. Can be used
> for various scenarios, for example, to load balance skbs into veths,
> split parts of the traffic to local taps, etc.
> 
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> 
> But at egress the bpf_redirect doesn't work with ifb.
>> If you need to loop anything back to RX, just use bpf_redirect() with
> Not use it to loop packets back. the flags of bpf_redirect is 0. for example:
> 
> tc filter add dev veth1 \
> egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
>> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> We load balance the packets to different ifb netdevices at egress. On
> ifb, we install filters, rate limit police,

I guess this part here is what I don't quite follow. Could you walk me through
the packet flow in this case? So you go from bpf@tc-egress@phys-dev to do the
redirect to bpf@tc-egress@ifb, and then again to bpf@tc-egress@phys-dev (same
dev or a different one, I presume)? Why not do the load-balancing, apply the
policy, and do the rate-limiting (e.g. EDT with sch_fq) directly at the initial
bpf@tc-egress@phys-dev location, given bpf is perfectly capable of doing all of it
there w/o the extra detour & overhead through ifb? The issue I see here is adding
extra overhead to support such a narrow case that nobody else is using and that
can be achieved already with existing infra as I understood it; the justification
right now to add the extra checks to the critical fast path is very thin.

Thanks,
Daniel
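
For readers unfamiliar with the EDT approach mentioned above, a minimal,
hypothetical tc-egress sketch (the rate constant and single-flow state
are illustrative only; sch_fq on the device paces by the skb->tstamp set
here):

// SPDX-License-Identifier: GPL-2.0
#include <linux/bpf.h>
#include <linux/pkt_cls.h>
#include <bpf/bpf_helpers.h>

#define RATE_BYTES_PER_SEC (100 * 1000 * 1000 / 8)	/* 100 Mbit/s */
#define NSEC_PER_SEC 1000000000ULL

__u64 next_tstamp;	/* single flow for brevity; real code keys per flow */

SEC("edt_pace")
int edt_pace(struct __sk_buff *skb)
{
	__u64 now = bpf_ktime_get_ns();
	__u64 delay = (__u64)skb->wire_len * NSEC_PER_SEC / RATE_BYTES_PER_SEC;
	__u64 ts = next_tstamp;

	if (ts < now)
		ts = now;
	skb->tstamp = ts;	/* earliest departure time, honored by sch_fq */
	next_tstamp = ts + delay;
	return TC_ACT_OK;
}

char __license[] SEC("license") = "GPL";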
Tonghao Zhang Dec. 17, 2021, 3:21 a.m. UTC | #12
On Thu, Dec 16, 2021 at 8:37 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> > On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> >>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> >>>>> <john.fastabend@gmail.com> wrote:
> >>>>>> xiangxia.m.yue@ wrote:
> [...]
> >>>>> Hi John
> >>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> >>>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> >>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> >>>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> >>>>> netdev, right ?
> >>>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
> >>
> >> But why would you do that? Usage like this is just broken by design..
> > As I understand, we can redirect packets to a target device either at
> > ingress or at *egress
> >
> > The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> > Allow eBPF programs attached to classifier/actions to call
> > bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> > redirect the packet by dynamic ifindex selection from within the
> > program to a target device either at ingress or at egress. Can be used
> > for various scenarios, for example, to load balance skbs into veths,
> > split parts of the traffic to local taps, etc.
> >
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> >
> > But at egress the bpf_redirect doesn't work with ifb.
> >> If you need to loop anything back to RX, just use bpf_redirect() with
> > Not use it to loop packets back. the flags of bpf_redirect is 0. for example:
> >
> > tc filter add dev veth1 \
> > egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> >> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> > We load balance the packets to different ifb netdevices at egress. On
> > ifb, we install filters, rate limit police,
Hi Daniel, I will try to answer your questions. Thanks.
> I guess this part here is what I don't quite follow. Could you walk me through
> the packet flow in this case? So you go from bpf@tc-egress@phys-dev to do the
> redirect to bpf@tc-egress@ifb, and then again to bpf@tc-egress@phys-dev (same
No bpf prog is installed on the ifb; ifb redirects the packets back to the
netdevice they came from:
ifb_ri_tasklet():
1. skb->dev = dev_get_by_index_rcu(dev_net(txp->dev), skb->skb_iif);
   /* skb_iif is the phys-dev */
2. dev_queue_xmit()

> dev or different one I presume)? Why not doing the load-balancing, applying the
> policy, and doing the rate-limiting (e.g. EDT with sch_fq) directly at the initial
> bpf@tc-egress@phys-dev location given bpf is perfectly capable to do all of it
> there w/o the extra detour & overhead through ifb? The issue I see here is adding
This is not easy to explain. We have two solutions in our production:
one is this, the other one is my
other patchset:
https://patchwork.kernel.org/project/netdevbpf/list/?series=593409&archive=both&state=*

For why we need those solutions, please see the commit message of patch 1/2 there:
https://patchwork.kernel.org/project/netdevbpf/patch/20211210023626.20905-2-xiangxia.m.yue@gmail.com/

The difference between the two solutions is that this one uses ifb to do the
rate-limiting/load-balancing for applications whose
key requirement is high throughput. Applications whose key requirement is low
latency of data access can then use the
mq or fifo qdisc on the phys-dev. This is very useful in a k8s
environment. Anyway, my other patchset is better than
this solution, but this solution is easier to use because
we can use the tc cmd (not bpf) to install filters.

This patch tries to fix the bug in bpf.
> extra overhead to support such a narrow case that nobody else is using and that
I don't agree with that; I think it's very useful in k8s. If the k8s pods
simply shared the network resource, this would indeed be a narrow case.
But when some pods want low latency and other pods want high
throughput, this solution is a useful option.
> can be achieved already with existing infra as I understood it; the justification
> right now to add the extra checks to the critical fast path is very thin..
Note that:
For the tc cmd, there is already the "skb_skip_tc_classify" check in
tcf_action_exec; this patch only wants to move the check to
sch_handle_egress to avoid the
unnecessary tc filter lookup.
For bpf_redirect, we don't run tcf_action_exec, so the packets loop back
if there is no check. This patch tries to fix a bug (bpf) and improve
performance (tc).
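
For reference, a paraphrased sketch of that existing check in
tcf_action_exec() (net/sched/act_api.c); this patch moves the same
early exit up into sch_handle_egress():

	int tcf_action_exec(struct sk_buff *skb, struct tc_action **actions,
			    int nr_actions, struct tcf_result *res)
	{
		int i;

		/* early exit, but only after the filters already matched */
		if (skb_skip_tc_classify(skb))
			return TC_ACT_OK;

		for (i = 0; i < nr_actions; i++) {
			/* ... run each action, handle TC_ACT_* results ... */
		}
		return TC_ACT_OK;
	}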

I used netperf/iperf to measure performance; it's not easy to
see any performance drop.
> Thanks,
> Daniel
Tonghao Zhang Jan. 10, 2022, 1:34 a.m. UTC | #13

On Fri, Dec 17, 2021 at 11:21 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Thu, Dec 16, 2021 at 8:37 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
> >
> > On 12/11/21 1:37 AM, Tonghao Zhang wrote:
> > > On Sat, Dec 11, 2021 at 4:11 AM Daniel Borkmann <daniel@iogearbox.net> wrote:
> > >> On 12/10/21 8:54 PM, Tonghao Zhang wrote:
> > >>> On Sat, Dec 11, 2021 at 1:46 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >>>> On Sat, Dec 11, 2021 at 1:37 AM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> > >>>>> On Sat, Dec 11, 2021 at 12:43 AM John Fastabend
> > >>>>> <john.fastabend@gmail.com> wrote:
> > >>>>>> xiangxia.m.yue@ wrote:
> > [...]
> > >>>>> Hi John
> > >>>>> Tx ethx -> __dev_queue_xmit -> sch_handle_egress
> > >>>>> ->  execute BPF program on ethx with bpf_redirect(ifb0) ->
> > >>>>> -> ifb_xmit -> ifb_ri_tasklet -> dev_queue_xmit -> __dev_queue_xmit
> > >>>>> the packets loopbacks, that means bpf_redirect doesn't work with ifb
> > >>>>> netdev, right ?
> > >>>>> so in sch_handle_egress, I add the check skb_skip_tc_classify().
> > >>
> > >> But why would you do that? Usage like this is just broken by design..
> > > As I understand, we can redirect packets to a target device either at
> > > ingress or at *egress
> > >
> > > The commit ID: 3896d655f4d491c67d669a15f275a39f713410f8
> > > Allow eBPF programs attached to classifier/actions to call
> > > bpf_clone_redirect(skb, ifindex, flags) helper which will mirror or
> > > redirect the packet by dynamic ifindex selection from within the
> > > program to a target device either at ingress or at egress. Can be used
> > > for various scenarios, for example, to load balance skbs into veths,
> > > split parts of the traffic to local taps, etc.
> > >
> > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=3896d655f4d491c67d669a15f275a39f713410f8
> > > https://git.kernel.org/pub/scm/linux/kernel/git/netdev/net-next.git/commit/?id=27b29f63058d26c6c1742f1993338280d5a41dc6
> > >
> > > But at egress the bpf_redirect doesn't work with ifb.
> > >> If you need to loop anything back to RX, just use bpf_redirect() with
> > > Not use it to loop packets back. the flags of bpf_redirect is 0. for example:
> > >
> > > tc filter add dev veth1 \
> > > egress bpf direct-action obj test_bpf_redirect_ifb.o sec redirect_ifb
> > > https://patchwork.kernel.org/project/netdevbpf/patch/20211208145459.9590-4-xiangxia.m.yue@gmail.com/
> > >> BPF_F_INGRESS? What is the concrete/actual rationale for ifb here?
> > > We load balance the packets to different ifb netdevices at egress. On
> > > ifb, we install filters, rate limit police,
> Hi Daniel, I try to answer your question. thanks
> > I guess this part here is what I don't quite follow. Could you walk me through
> > the packet flow in this case? So you go from bpf@tc-egress@phys-dev to do the
> > redirect to bpf@tc-egress@ifb, and then again to bpf@tc-egress@phys-dev (same
> ifb doesn't install bpf prog, ifb will redirect packets to netdevice
> which packets from.
> ifb_ri_tasklet
> 1. skb->dev = dev_get_by_index_rcu(dev_net(txp->dev), skb->skb_iif);
> // skb_iif is phys-dev
> 2. dev_queue_xmit
>
> > dev or different one I presume)? Why not doing the load-balancing, applying the
> > policy, and doing the rate-limiting (e.g. EDT with sch_fq) directly at the initial
> > bpf@tc-egress@phys-dev location given bpf is perfectly capable to do all of it
> > there w/o the extra detour & overhead through ifb? The issue I see here is adding
> This is not easy to explain. We have two solution for our production,
> one is this, other one is my
> other patchset:
> https://patchwork.kernel.org/project/netdevbpf/list/?series=593409&archive=both&state=*
>
> Why I need those solutions, please see my patch 1/2 commit message.
> https://patchwork.kernel.org/project/netdevbpf/patch/20211210023626.20905-2-xiangxia.m.yue@gmail.com/
>
> The different of two solution is that this one use ifb to do the
> rate-limiting/load-blance for applications which
> key is high throughput. Then for  applications which key is  the low
> latency of data access, can use the
> mq or fifo qdisc in the phys-dev. This is very useful in k8s
> environment. Anyway my other patchset is better
> for this one sulotion. but this one solution is easy to used because
> we can use the tc cmd(not bpf) to install filters.
>
> This patch try to fix the bug in bpf.
> > extra overhead to support such a narrow case that nobody else is using and that
> I don't agree that, I think it's very useful in k8s. If the pods of
> k8s are sharing the network resource, this is really a narrow case.
> But part of pods want low latency, and other part of pods want high
> throughput, this solution is selectable.
> > can be achieved already with existing infra as I understood it; the justification
> > right now to add the extra checks to the critical fast path is very thin..
> note that:
> For tc cmd, there is the check "skb_skip_tc_classify" in
> tcf_action_exec, This patch only want to move check to
> sch_handle_egress to avoid the
> unessential tc filter lookup.
> For bpf_redirect, we don't run tcf_action_exec, the packets loopback
> if there is no check. This patch try to fix bug(bpf), and improve
> performance(tc)
>
> I use the netperf/iperf to measure performance, It's not easy to
> figure out performance drop.
Hi Daniel
Any progress? Did I answer your questions?
If you think we should not add this check in the fast path (it only moves
skb_skip_tc_classify from tcf_action_exec to sch_handle_egress), I'm OK
with dropping this patch.
We can then wait for other users to report this issue in the future; if
so, we can fix it with this patch. Could you help review the other
patches, 1/3 and 3/3?
> > Thanks,
> > Daniel
>
>
>
> --
> Best regards, Tonghao

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index a64297a4cc89..81ad415b78f9 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3823,6 +3823,9 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	if (!miniq)
 		return skb;
 
+	if (skb_skip_tc_classify(skb))
+		return skb;
+
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	qdisc_skb_cb(skb)->mru = 0;
 	qdisc_skb_cb(skb)->post_ct = false;