Message ID | 20211103143208.41282-1-xiangxia.m.yue@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | BPF |
Headers | show |
Series | [v2] net: sched: check tc_skip_classify as far as possible | expand |
On Wed, Nov 3, 2021 at 10:32 AM <xiangxia.m.yue@gmail.com> wrote: > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com> > > We look up and then check tc_skip_classify flag in net > sched layer, even though skb don't want to be classified. > That case may consume a lot of cpu cycles. > > Install the rules as below: > $ for id in $(seq 1 100); do > $ tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0 > $ done > > netperf: > $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32 > $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32 > > Before: 10662.33 tps, 108.95 Mbit/s > After: 12434.48 tps, 145.89 Mbit/s > > For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%. > > Cc: Willem de Bruijn <willemb@google.com> > Cc: Cong Wang <xiyou.wangcong@gmail.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Daniel Borkmann <daniel@iogearbox.net> > Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> > --- > v2: don't delete skb_skip_tc_classify in act_api > --- > net/core/dev.c | 3 +++ > 1 file changed, 3 insertions(+) > > diff --git a/net/core/dev.c b/net/core/dev.c > index edeb811c454e..fc29a429e9ad 100644 > --- a/net/core/dev.c > +++ b/net/core/dev.c > @@ -3940,6 +3940,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) > if (!miniq) > return skb; > > + if (skb_skip_tc_classify(skb)) > + return skb; > + This bit and test exist to make sure that packets redirected through the ifb device are redirected only once. I was afraid that on second loop, this will also short-circuit other unrelated tc classifiers and actions that should take place. The name and the variable comment make clear that the intention is indeed to bypass all classification. However, the current implementation acts at tcf_action_exec. So it does not stop processing by fused classifier-action objects, notably tc_bpf. So I agree both that this seems to follow the original intent, but also has the chance to break existing production configurations. > /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */ > qdisc_skb_cb(skb)->mru = 0; > qdisc_skb_cb(skb)->post_ct = false; > -- > 2.27.0 >
On 11/3/21 3:47 PM, Willem de Bruijn wrote: > On Wed, Nov 3, 2021 at 10:32 AM <xiangxia.m.yue@gmail.com> wrote: >> >> From: Tonghao Zhang <xiangxia.m.yue@gmail.com> >> >> We look up and then check tc_skip_classify flag in net >> sched layer, even though skb don't want to be classified. >> That case may consume a lot of cpu cycles. >> >> Install the rules as below: >> $ for id in $(seq 1 100); do >> $ tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0 >> $ done >> >> netperf: >> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32 >> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32 >> >> Before: 10662.33 tps, 108.95 Mbit/s >> After: 12434.48 tps, 145.89 Mbit/s >> >> For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%. >> >> Cc: Willem de Bruijn <willemb@google.com> >> Cc: Cong Wang <xiyou.wangcong@gmail.com> >> Cc: Jakub Kicinski <kuba@kernel.org> >> Cc: Daniel Borkmann <daniel@iogearbox.net> >> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> >> --- >> v2: don't delete skb_skip_tc_classify in act_api >> --- >> net/core/dev.c | 3 +++ >> 1 file changed, 3 insertions(+) >> >> diff --git a/net/core/dev.c b/net/core/dev.c >> index edeb811c454e..fc29a429e9ad 100644 >> --- a/net/core/dev.c >> +++ b/net/core/dev.c >> @@ -3940,6 +3940,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) >> if (!miniq) >> return skb; >> >> + if (skb_skip_tc_classify(skb)) >> + return skb; >> + > > This bit and test exist to make sure that packets redirected through > the ifb device are redirected only once. > > I was afraid that on second loop, this will also short-circuit other > unrelated tc classifiers and actions that should take place. > > The name and the variable comment make clear that the intention is > indeed to bypass all classification. > > However, the current implementation acts at tcf_action_exec. So it > does not stop processing by fused classifier-action objects, notably tc_bpf. > > So I agree both that this seems to follow the original intent, but also > has the chance to break existing production configurations. Agree, I share your concern.
On Wed, Nov 3, 2021 at 10:59 PM Daniel Borkmann <daniel@iogearbox.net> wrote: > > On 11/3/21 3:47 PM, Willem de Bruijn wrote: > > On Wed, Nov 3, 2021 at 10:32 AM <xiangxia.m.yue@gmail.com> wrote: > >> > >> From: Tonghao Zhang <xiangxia.m.yue@gmail.com> > >> > >> We look up and then check tc_skip_classify flag in net > >> sched layer, even though skb don't want to be classified. > >> That case may consume a lot of cpu cycles. > >> > >> Install the rules as below: > >> $ for id in $(seq 1 100); do > >> $ tc filter add ... egress prio $id ... action mirred egress redirect dev ifb0 > >> $ done > >> > >> netperf: > >> $ taskset -c 1 netperf -t TCP_RR -H ip -- -r 32,32 > >> $ taskset -c 1 netperf -t TCP_STREAM -H ip -- -m 32 > >> > >> Before: 10662.33 tps, 108.95 Mbit/s > >> After: 12434.48 tps, 145.89 Mbit/s > >> > >> For TCP_RR, there are 16.6% improvement, TCP_STREAM 33.9%. > >> > >> Cc: Willem de Bruijn <willemb@google.com> > >> Cc: Cong Wang <xiyou.wangcong@gmail.com> > >> Cc: Jakub Kicinski <kuba@kernel.org> > >> Cc: Daniel Borkmann <daniel@iogearbox.net> > >> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com> > >> --- > >> v2: don't delete skb_skip_tc_classify in act_api > >> --- > >> net/core/dev.c | 3 +++ > >> 1 file changed, 3 insertions(+) > >> > >> diff --git a/net/core/dev.c b/net/core/dev.c > >> index edeb811c454e..fc29a429e9ad 100644 > >> --- a/net/core/dev.c > >> +++ b/net/core/dev.c > >> @@ -3940,6 +3940,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) > >> if (!miniq) > >> return skb; > >> > >> + if (skb_skip_tc_classify(skb)) > >> + return skb; > >> + > > > > This bit and test exist to make sure that packets redirected through > > the ifb device are redirected only once. > > > > I was afraid that on second loop, this will also short-circuit other > > unrelated tc classifiers and actions that should take place. > > > > The name and the variable comment make clear that the intention is > > indeed to bypass all classification. > > > > However, the current implementation acts at tcf_action_exec. So it > > does not stop processing by fused classifier-action objects, notably tc_bpf. > > > > So I agree both that this seems to follow the original intent, but also > > has the chance to break existing production configurations. > > Agree, I share your concern. I got it, thanks Daniel, Willem. One question, If users use the bpf_redirect, instead of tc-mirred, to redirect the packets to ifb in egress. If there is not a test with *skb_skip_tc_classify, the packets will loop and be dropped, right ? but we hope the packets sent out the linux box.
diff --git a/net/core/dev.c b/net/core/dev.c index edeb811c454e..fc29a429e9ad 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -3940,6 +3940,9 @@ sch_handle_egress(struct sk_buff *skb, int *ret, struct net_device *dev) if (!miniq) return skb; + if (skb_skip_tc_classify(skb)) + return skb; + /* qdisc_skb_cb(skb)->pkt_len was already set by the caller. */ qdisc_skb_cb(skb)->mru = 0; qdisc_skb_cb(skb)->post_ct = false;