
[v2] net: sched: check tc_skip_classify as far as possible

Message ID 20211103143208.41282-1-xiangxia.m.yue@gmail.com (mailing list archive)
State Changes Requested
Delegated to: BPF
Headers show
Series [v2] net: sched: check tc_skip_classify as far as possible | expand

Checks

Context Check Description
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix warning Target tree name not specified in the subject
netdev/cover_letter success Single patches do not need cover letters
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 7 this patch: 7
netdev/cc_maintainers warning 8 maintainers not CCed: alobakin@pm.me edumazet@google.com arnd@arndb.de weiwan@google.com atenart@kernel.org bjorn@kernel.org davem@davemloft.net memxor@gmail.com
netdev/module_param success Was 0 now: 0
netdev/signed warning No signature found. Please sign patches: https://github.com/mricon/patatt
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 11 this patch: 11
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
bpf/vmtest-bpf-PR pending PR summary
bpf/vmtest-bpf pending VM_Test
bpf/vmtest-bpf-next pending VM_Test
bpf/vmtest-bpf-next-PR pending PR summary
netdev/tree_selection success Guessing tree name failed - patch did not apply

Commit Message

Tonghao Zhang Nov. 3, 2021, 2:32 p.m. UTC
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

We look up and then check the tc_skip_classify flag in the net
sched layer even when the skb does not want to be classified.
That can consume a lot of CPU cycles.

Install the rules as below:
$ for id in $(seq 1 100); do
$       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
$ done

netperf:
$ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
$ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32

Before: 10662.33 tps, 108.95 Mbit/s
After:  12434.48 tps, 145.89 Mbit/s

For TCP_RR this is a 16.6% improvement; for TCP_STREAM, 33.9%.

Cc: Willem de Bruijn <willemb@google.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Daniel Borkmann <daniel@iogearbox.net>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
v2: don't delete skb_skip_tc_classify in act_api 
---
 net/core/dev.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Willem de Bruijn Nov. 3, 2021, 2:47 p.m. UTC | #1
On Wed, Nov 3, 2021 at 10:32 AM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> We look up and then check tc_skip_classify flag in net
> sched layer, even though skb don't want to be classified.
> That case may consume a lot of cpu cycles.
>
> Install the rules as below:
> $ for id in $(seq 1 100); do
> $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> $ done
>
> netperf:
> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
>
> Before: 10662.33 tps, 108.95 Mbit/s
> After:  12434.48 tps, 145.89 Mbit/s
>
> For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.
>
> Cc: Willem de Bruijn <willemb@google.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Daniel Borkmann <daniel@iogearbox.net>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> ---
> v2: don't delete skb_skip_tc_classify in act_api
> ---
>  net/core/dev.c | 3 +++
>  1 file changed, 3 insertions(+)
>
> diff --git a/net/core/dev.c b/net/core/dev.c
> index edeb811c454e..fc29a429e9ad 100644
> --- a/net/core/dev.c
> +++ b/net/core/dev.c
> @@ -3940,6 +3940,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>         if (!miniq)
>                 return skb;
>
> +       if (skb_skip_tc_classify(skb))
> +               return skb;
> +

This bit and test exist to make sure that packets redirected through
the ifb device are redirected only once.

My concern was that, on the second loop, this would also short-circuit
other, unrelated tc classifiers and actions that should take place.

The name and the variable's comment make clear that the intention is
indeed to bypass all classification.

However, the current implementation acts at tcf_action_exec, so it
does not stop processing by fused classifier-action objects, notably tc_bpf.

So I agree that this seems to follow the original intent, but it also
has a chance to break existing production configurations.


>         /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
>         qdisc_skb_cb(skb)->mru = 0;
>         qdisc_skb_cb(skb)->post_ct = false;
> --
> 2.27.0
>
Daniel Borkmann Nov. 3, 2021, 2:59 p.m. UTC | #2
On 11/3/21 3:47 PM, Willem de Bruijn wrote:
> On Wed, Nov 3, 2021 at 10:32 AM <xiangxia.m.yue@gmail.com> wrote:
>>
>> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>>
>> We look up and then check tc_skip_classify flag in net
>> sched layer, even though skb don't want to be classified.
>> That case may consume a lot of cpu cycles.
>>
>> Install the rules as below:
>> $ for id in $(seq 1 100); do
>> $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
>> $ done
>>
>> netperf:
>> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
>> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
>>
>> Before: 10662.33 tps, 108.95 Mbit/s
>> After:  12434.48 tps, 145.89 Mbit/s
>>
>> For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.
>>
>> Cc: Willem de Bruijn <willemb@google.com>
>> Cc: Cong Wang <xiyou.wangcong@gmail.com>
>> Cc: Jakub Kicinski <kuba@kernel.org>
>> Cc: Daniel Borkmann <daniel@iogearbox.net>
>> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>> ---
>> v2: don't delete skb_skip_tc_classify in act_api
>> ---
>>   net/core/dev.c | 3 +++
>>   1 file changed, 3 insertions(+)
>>
>> diff --git a/net/core/dev.c b/net/core/dev.c
>> index edeb811c454e..fc29a429e9ad 100644
>> --- a/net/core/dev.c
>> +++ b/net/core/dev.c
>> @@ -3940,6 +3940,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
>>          if (!miniq)
>>                  return skb;
>>
>> +       if (skb_skip_tc_classify(skb))
>> +               return skb;
>> +
> 
> This bit and test exist to make sure that packets redirected through
> the ifb device are redirected only once.
> 
> I was afraid that on second loop, this will also short-circuit other
> unrelated tc classifiers and actions that should take place.
> 
> The name and the variable comment make clear that the intention is
> indeed to bypass all classification.
> 
> However, the current implementation acts at tcf_action_exec. So it
> does not stop processing by fused classifier-action objects, notably tc_bpf.
> 
> So I agree both that this seems to follow the original intent, but also
> has the chance to break existing production configurations.

Agree, I share your concern.
Tonghao Zhang Nov. 4, 2021, 1:23 a.m. UTC | #3
On Wed, Nov 3, 2021 at 10:59 PM Daniel Borkmann <daniel@iogearbox.net> wrote:
>
> On 11/3/21 3:47 PM, Willem de Bruijn wrote:
> > On Wed, Nov 3, 2021 at 10:32 AM <xiangxia.m.yue@gmail.com> wrote:
> >>
> >> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >>
> >> We look up and then check tc_skip_classify flag in net
> >> sched layer, even though skb don't want to be classified.
> >> That case may consume a lot of cpu cycles.
> >>
> >> Install the rules as below:
> >> $ for id in $(seq 1 100); do
> >> $       tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0
> >> $ done
> >>
> >> netperf:
> >> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32
> >> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32
> >>
> >> Before: 10662.33 tps, 108.95 Mbit/s
> >> After:  12434.48 tps, 145.89 Mbit/s
> >>
> >> For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%.
> >>
> >> Cc: Willem de Bruijn <willemb@google.com>
> >> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> >> Cc: Jakub Kicinski <kuba@kernel.org>
> >> Cc: Daniel Borkmann <daniel@iogearbox.net>
> >> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >> ---
> >> v2: don't delete skb_skip_tc_classify in act_api
> >> ---
> >>   net/core/dev.c | 3 +++
> >>   1 file changed, 3 insertions(+)
> >>
> >> diff --git a/net/core/dev.c b/net/core/dev.c
> >> index edeb811c454e..fc29a429e9ad 100644
> >> --- a/net/core/dev.c
> >> +++ b/net/core/dev.c
> >> @@ -3940,6 +3940,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
> >>          if (!miniq)
> >>                  return skb;
> >>
> >> +       if (skb_skip_tc_classify(skb))
> >> +               return skb;
> >> +
> >
> > This bit and test exist to make sure that packets redirected through
> > the ifb device are redirected only once.
> >
> > I was afraid that on second loop, this will also short-circuit other
> > unrelated tc classifiers and actions that should take place.
> >
> > The name and the variable comment make clear that the intention is
> > indeed to bypass all classification.
> >
> > However, the current implementation acts at tcf_action_exec. So it
> > does not stop processing by fused classifier-action objects, notably tc_bpf.
> >
> > So I agree both that this seems to follow the original intent, but also
> > has the chance to break existing production configurations.
>
> Agree, I share your concern.
Got it, thanks Daniel and Willem.
One question: if users use bpf_redirect, instead of tc-mirred, to
redirect packets to ifb on egress, and there is no check of
skb_skip_tc_classify, the packets will loop and be dropped, right?
But we want the packets to be sent out of the Linux box.

Patch

diff --git a/net/core/dev.c b/net/core/dev.c
index edeb811c454e..fc29a429e9ad 100644
--- a/net/core/dev.c
+++ b/net/core/dev.c
@@ -3940,6 +3940,9 @@  sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev)
 	if (!miniq)
 		return skb;
 
+	if (skb_skip_tc_classify(skb))
+		return skb;
+
 	/* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */
 	qdisc_skb_cb(skb)->mru = 0;
 	qdisc_skb_cb(skb)->post_ct = false;