diff mbox series

[net-next,v3,2/2] net: sched: support hash/classid/cpuid selecting tx queue

Message ID 20211210023626.20905-3-xiangxia.m.yue@gmail.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series net: sched: allow user to select txqueue | expand

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success Link
netdev/cover_letter success Series has a cover letter
netdev/patch_count success Link
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 9 this patch: 9
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 20 this patch: 20
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 11 this patch: 11
netdev/checkpatch warning WARNING: line length of 81 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Tonghao Zhang Dec. 10, 2021, 2:36 a.m. UTC
From: Tonghao Zhang <xiangxia.m.yue@gmail.com>

This patch allows users to select a queue_mapping range from
A to B, and to use the skb hash, cgroup classid, or CPU id to
select Tx queues. Packets can then be load-balanced across the
queues in the range A to B. The range bounds are unsigned 16-bit
values in decimal format.
$ tc filter ... action skbedit queue_mapping hash-type normal A B

"skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
enhanced with flags:
* SKBEDIT_F_QUEUE_MAPPING_HASH
* SKBEDIT_F_QUEUE_MAPPING_CLASSID
* SKBEDIT_F_QUEUE_MAPPING_CPUID

Use skb->hash, cgroup classid, or cpuid to distribute packets.
Then same range of tx queues can be shared for different flows,
cgroups, or CPUs in a variety of scenarios.

For example, flow set F1 may share range R1 with flow set F2. The
best way to do that is to set the SKBEDIT_F_QUEUE_MAPPING_HASH flag.
If cgroup C1 shares R1 with cgroups C2 .. Cn, use
SKBEDIT_F_QUEUE_MAPPING_CLASSID. Of course, in other scenarios, C1
can use R1 while Cn uses Rn.

Cc: Jamal Hadi Salim <jhs@mojatatu.com>
Cc: Cong Wang <xiyou.wangcong@gmail.com>
Cc: Jiri Pirko <jiri@resnulli.us>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Alexander Lobakin <alobakin@pm.me>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Talal Ahmad <talalahmad@google.com>
Cc: Kevin Hao <haokexin@gmail.com>
Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
Cc: Kees Cook <keescook@chromium.org>
Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
Cc: Antoine Tenart <atenart@kernel.org>
Cc: Wei Wang <weiwan@google.com>
Cc: Arnd Bergmann <arnd@arndb.de>
Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
---
 include/net/tc_act/tc_skbedit.h        |  1 +
 include/uapi/linux/tc_act/tc_skbedit.h |  8 +++
 net/sched/act_skbedit.c                | 74 ++++++++++++++++++++++++--
 3 files changed, 79 insertions(+), 4 deletions(-)

Comments

Cong Wang Dec. 12, 2021, 2:19 a.m. UTC | #1
On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patch allows users to select queue_mapping, range
> from A to B. And users can use skb-hash, cgroup classid
> and cpuid to select Tx queues. Then we can load balance
> packets from A to B queue. The range is an unsigned 16bit
> value in decimal format.
>
> $ tc filter ... action skbedit queue_mapping hash-type normal A B
>
> "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> enhanced with flags:
> * SKBEDIT_F_QUEUE_MAPPING_HASH
> * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> * SKBEDIT_F_QUEUE_MAPPING_CPUID

With act_bpf you can do all of them... So why do you have to do it
in skbedit?
Tonghao Zhang Dec. 12, 2021, 2:34 a.m. UTC | #2
On Sun, Dec 12, 2021 at 10:19 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
> >
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch allows users to select queue_mapping, range
> > from A to B. And users can use skb-hash, cgroup classid
> > and cpuid to select Tx queues. Then we can load balance
> > packets from A to B queue. The range is an unsigned 16bit
> > value in decimal format.
> >
> > $ tc filter ... action skbedit queue_mapping hash-type normal A B
> >
> > "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> > enhanced with flags:
> > * SKBEDIT_F_QUEUE_MAPPING_HASH
> > * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> > * SKBEDIT_F_QUEUE_MAPPING_CPUID
>
> With act_bpf you can do all of them... So why do you have to do it
> in skbedit?
Hi Cong,
This idea is inspired by skbedit queue_mapping, and this patch
enhances skbedit. We first supported this in skbedit in production.
act_bpf can do more than this; anyway, we can support this in both
act_skbedit and act_bpf. Patch 1/2 changed skip_tx_queue from an skb
field to a per-cpu variable, as suggested by Eric. We need another
patch to make that per-cpu variable changeable from BPF; I will post
that patch later.
Cong Wang Dec. 13, 2021, 10:53 p.m. UTC | #3
On Sat, Dec 11, 2021 at 6:34 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> On Sun, Dec 12, 2021 at 10:19 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> >
> > On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
> > >
> > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > >
> > > This patch allows users to select queue_mapping, range
> > > from A to B. And users can use skb-hash, cgroup classid
> > > and cpuid to select Tx queues. Then we can load balance
> > > packets from A to B queue. The range is an unsigned 16bit
> > > value in decimal format.
> > >
> > > $ tc filter ... action skbedit queue_mapping hash-type normal A B
> > >
> > > "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> > > enhanced with flags:
> > > * SKBEDIT_F_QUEUE_MAPPING_HASH
> > > * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> > > * SKBEDIT_F_QUEUE_MAPPING_CPUID
> >
> > With act_bpf you can do all of them... So why do you have to do it
> > in skbedit?
> Hi Cong
> This idea is inspired by skbedit queue_mapping, and skbedit is
> enhanced by this patch.

This is exactly my question. ;)

> We support this in skbedit firstly in production. act_bpf can do more
> things than this. Anyway we
> can support this in both act_skbedit/acc_bpf. 1/2 is changed from
> skip_tx_queue in skb to per-cpu var suggested-by Eric. We need another
> patch which can change the
> per-cpu var in bpf. I will post this patch later.

The point is if act_bpf can do it, you don't need to bother skbedit at
all. More importantly, you are enforcing policies in kernel, which is
not encouraged. So unless you provide more details, this patch is not
needed at all.

Thanks.
Tonghao Zhang Dec. 14, 2021, 2:19 a.m. UTC | #4
On Tue, Dec 14, 2021 at 6:53 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
>
> On Sat, Dec 11, 2021 at 6:34 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
> >
> > On Sun, Dec 12, 2021 at 10:19 AM Cong Wang <xiyou.wangcong@gmail.com> wrote:
> > >
> > > On Thu, Dec 9, 2021 at 6:36 PM <xiangxia.m.yue@gmail.com> wrote:
> > > >
> > > > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> > > >
> > > > This patch allows users to select queue_mapping, range
> > > > from A to B. And users can use skb-hash, cgroup classid
> > > > and cpuid to select Tx queues. Then we can load balance
> > > > packets from A to B queue. The range is an unsigned 16bit
> > > > value in decimal format.
> > > >
> > > > $ tc filter ... action skbedit queue_mapping hash-type normal A B
> > > >
> > > > "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> > > > enhanced with flags:
> > > > * SKBEDIT_F_QUEUE_MAPPING_HASH
> > > > * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> > > > * SKBEDIT_F_QUEUE_MAPPING_CPUID
> > >
> > > With act_bpf you can do all of them... So why do you have to do it
> > > in skbedit?
> > Hi Cong
> > This idea is inspired by skbedit queue_mapping, and skbedit is
> > enhanced by this patch.
>
> This is exactly my question. ;)
>
> > We support this in skbedit firstly in production. act_bpf can do more
> > things than this. Anyway we
> > can support this in both act_skbedit/acc_bpf. 1/2 is changed from
> > skip_tx_queue in skb to per-cpu var suggested-by Eric. We need another
> > patch which can change the
> > per-cpu var in bpf. I will post this patch later.
>
> The point is if act_bpf can do it, you don't need to bother skbedit at
> all. More importantly, you are enforcing policies in kernel, which is
> not encouraged. So unless you provide more details, this patch is not
> needed at all.
Hi Cong,
1. As I understand it, act_bpf can work with patch 1/2 (but we would
need another patch to change skip_txqueue from BPF). That is easy for
a kernel developer, but not for other users. The tc command is a
choice, and easy to use.
2. BTW, act_bpf can't replace the other act_xxx actions in the future
even though it can do more, right?
3. This patch mainly works together with 1/2 and only enhances the
skbedit queue_mapping function; it is not a big change.
4. "Policies"? If selecting a tx queue from a range (this patch) is
what you mean by "policies", then changing the skb mark, priority,
tc-pedit and other tc actions are "policies" too. We still need the
different tc actions.
>
> Thanks.
Jamal Hadi Salim Dec. 14, 2021, 12:13 p.m. UTC | #5
On 2021-12-13 17:53, Cong Wang wrote:
> On Sat, Dec 11, 2021 at 6:34 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:

>> We support this in skbedit firstly in production. act_bpf can do more
>> things than this. Anyway we
>> can support this in both act_skbedit/acc_bpf. 1/2 is changed from
>> skip_tx_queue in skb to per-cpu var suggested-by Eric. We need another
>> patch which can change the
>> per-cpu var in bpf. I will post this patch later.
> 
> The point is if act_bpf can do it, you don't need to bother skbedit at
> all. 

Just a comment on this general statement:
I know of at least one large data centre that won't allow anything
"compiled" to be installed unless it goes through a very long vetting
process (we are talking months).
"Compiled" includes bpf. This is common practice in a few other
places, some more extreme than others - the reasons are typically
driven by either an overzealous program manager or security people.
Regardless of the reasoning, this is a real issue.
Note: none of these data centres have a long process if what is
installed is a bash script expressing policy.
In such a case, an upstream feature such as this one is more useful.
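For illustration, a policy script of the kind described here could be
as simple as the following (the device name, queue range, and exact
iproute2 syntax are assumptions based on the command shown in the
commit message, since the userspace side is not part of this series):

```shell
#!/bin/bash
# Hypothetical policy script: steer egress IP traffic on eth0 into
# txq range 8-11, load-balanced by cgroup classid, using the
# enhanced skbedit action proposed in this series.
DEV=eth0

tc qdisc add dev "$DEV" clsact
tc filter add dev "$DEV" egress protocol ip matchall \
    action skbedit queue_mapping hash-type classid 8 11
```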


cheers,
jamal
Tonghao Zhang Dec. 15, 2021, 1:34 a.m. UTC | #6
On Fri, Dec 10, 2021 at 10:36 AM <xiangxia.m.yue@gmail.com> wrote:
>
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
>
> This patch allows users to select queue_mapping, range
> from A to B. And users can use skb-hash, cgroup classid
> and cpuid to select Tx queues. Then we can load balance
> packets from A to B queue. The range is an unsigned 16bit
> value in decimal format.
>
> $ tc filter ... action skbedit queue_mapping hash-type normal A B
>
> "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> enhanced with flags:
> * SKBEDIT_F_QUEUE_MAPPING_HASH
> * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> * SKBEDIT_F_QUEUE_MAPPING_CPUID
>
> Use skb->hash, cgroup classid, or cpuid to distribute packets.
> Then same range of tx queues can be shared for different flows,
> cgroups, or CPUs in a variety of scenarios.
>
> For example, flows F1 may share range R1 with flows F2. The best
> way to do that is to set flag to SKBEDIT_F_QUEUE_MAPPING_HASH.
> If cgroup C1 share the R1 with cgroup C2 .. Cn, use the
> SKBEDIT_F_QUEUE_MAPPING_CLASSID. Of course, in some other scenario,
> C1 uses R1, while Cn can use Rn.
>
> Cc: Jamal Hadi Salim <jhs@mojatatu.com>
> Cc: Cong Wang <xiyou.wangcong@gmail.com>
> Cc: Jiri Pirko <jiri@resnulli.us>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Jonathan Lemon <jonathan.lemon@gmail.com>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Alexander Lobakin <alobakin@pm.me>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Talal Ahmad <talalahmad@google.com>
> Cc: Kevin Hao <haokexin@gmail.com>
> Cc: Ilias Apalodimas <ilias.apalodimas@linaro.org>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Kumar Kartikeya Dwivedi <memxor@gmail.com>
> Cc: Antoine Tenart <atenart@kernel.org>
> Cc: Wei Wang <weiwan@google.com>
> Cc: Arnd Bergmann <arnd@arndb.de>
> Signed-off-by: Tonghao Zhang <xiangxia.m.yue@gmail.com>
Hi Jakub, Eric,
Do you have any comments? We talked about this patchset in this
thread. Should I resend the patchset with a more detailed commit
message?
> ---
>  include/net/tc_act/tc_skbedit.h        |  1 +
>  include/uapi/linux/tc_act/tc_skbedit.h |  8 +++
>  net/sched/act_skbedit.c                | 74 ++++++++++++++++++++++++--
>  3 files changed, 79 insertions(+), 4 deletions(-)
>
> diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
> index 00bfee70609e..ee96e0fa6566 100644
> --- a/include/net/tc_act/tc_skbedit.h
> +++ b/include/net/tc_act/tc_skbedit.h
> @@ -17,6 +17,7 @@ struct tcf_skbedit_params {
>         u32 mark;
>         u32 mask;
>         u16 queue_mapping;
> +       u16 mapping_mod;
>         u16 ptype;
>         struct rcu_head rcu;
>  };
> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> index 800e93377218..5642b095d206 100644
> --- a/include/uapi/linux/tc_act/tc_skbedit.h
> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> @@ -29,6 +29,13 @@
>  #define SKBEDIT_F_PTYPE                        0x8
>  #define SKBEDIT_F_MASK                 0x10
>  #define SKBEDIT_F_INHERITDSFIELD       0x20
> +#define SKBEDIT_F_QUEUE_MAPPING_HASH   0x40
> +#define SKBEDIT_F_QUEUE_MAPPING_CLASSID        0x80
> +#define SKBEDIT_F_QUEUE_MAPPING_CPUID  0x100
> +
> +#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
> +                                          SKBEDIT_F_QUEUE_MAPPING_CLASSID | \
> +                                          SKBEDIT_F_QUEUE_MAPPING_CPUID)
>
>  struct tc_skbedit {
>         tc_gen;
> @@ -45,6 +52,7 @@ enum {
>         TCA_SKBEDIT_PTYPE,
>         TCA_SKBEDIT_MASK,
>         TCA_SKBEDIT_FLAGS,
> +       TCA_SKBEDIT_QUEUE_MAPPING_MAX,
>         __TCA_SKBEDIT_MAX
>  };
>  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index 498feedad70a..0b0d65d7112e 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -10,6 +10,7 @@
>  #include <linux/kernel.h>
>  #include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
> +#include <net/cls_cgroup.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
>  #include <net/ip.h>
> @@ -23,6 +24,37 @@
>  static unsigned int skbedit_net_id;
>  static struct tc_action_ops act_skbedit_ops;
>
> +static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
> +                           struct sk_buff *skb)
> +{
> +       u16 queue_mapping = params->queue_mapping;
> +       u16 mapping_mod = params->mapping_mod;
> +       u32 mapping_hash_type = params->flags &
> +                               SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
> +       u32 hash = 0;
> +
> +       if (!mapping_hash_type)
> +               return netdev_cap_txqueue(skb->dev, queue_mapping);
> +
> +       switch (mapping_hash_type) {
> +       case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
> +               hash = jhash_1word(task_get_classid(skb), 0);
> +               break;
> +       case SKBEDIT_F_QUEUE_MAPPING_HASH:
> +               hash = skb_get_hash(skb);
> +               break;
> +       case SKBEDIT_F_QUEUE_MAPPING_CPUID:
> +               hash = raw_smp_processor_id();
> +               break;
> +       default:
> +               net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
> +                                    mapping_hash_type);
> +       }
> +
> +       queue_mapping = queue_mapping + hash % mapping_mod;
> +       return netdev_cap_txqueue(skb->dev, queue_mapping);
> +}
> +
>  static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>                            struct tcf_result *res)
>  {
> @@ -57,10 +89,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>                         break;
>                 }
>         }
> -       if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
> -           skb->dev->real_num_tx_queues > params->queue_mapping) {
> +       if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
>                 netdev_xmit_skip_txqueue();
> -               skb_set_queue_mapping(skb, params->queue_mapping);
> +               skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
>         }
>         if (params->flags & SKBEDIT_F_MARK) {
>                 skb->mark &= ~params->mask;
> @@ -94,6 +125,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
>         [TCA_SKBEDIT_PTYPE]             = { .len = sizeof(u16) },
>         [TCA_SKBEDIT_MASK]              = { .len = sizeof(u32) },
>         [TCA_SKBEDIT_FLAGS]             = { .len = sizeof(u64) },
> +       [TCA_SKBEDIT_QUEUE_MAPPING_MAX] = { .len = sizeof(u16) },
>  };
>
>  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> @@ -110,6 +142,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>         struct tcf_skbedit *d;
>         u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
>         u16 *queue_mapping = NULL, *ptype = NULL;
> +       u16 mapping_mod = 0;
>         bool exists = false;
>         int ret = 0, err;
>         u32 index;
> @@ -154,7 +187,30 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>
>         if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
>                 u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
> +               u64 mapping_hash_type = *pure_flags &
> +                                       SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
> +               if (mapping_hash_type) {
> +                       u16 *queue_mapping_max;
> +
> +                       /* Hash types are mutually exclusive. */
> +                       if (mapping_hash_type & (mapping_hash_type - 1))
> +                               return -EINVAL;
> +
> +                       if (!tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX])
> +                               return -EINVAL;
>
> +                       if (!tb[TCA_SKBEDIT_QUEUE_MAPPING])
> +                               return -EINVAL;
> +
> +                       queue_mapping_max =
> +                               nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);
> +
> +                       if (*queue_mapping_max < *queue_mapping)
> +                               return -EINVAL;
> +
> +                       mapping_mod = *queue_mapping_max - *queue_mapping + 1;
> +                       flags |= mapping_hash_type;
> +               }
>                 if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
>                         flags |= SKBEDIT_F_INHERITDSFIELD;
>         }
> @@ -206,8 +262,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>         params_new->flags = flags;
>         if (flags & SKBEDIT_F_PRIORITY)
>                 params_new->priority = *priority;
> -       if (flags & SKBEDIT_F_QUEUE_MAPPING)
> +       if (flags & SKBEDIT_F_QUEUE_MAPPING) {
>                 params_new->queue_mapping = *queue_mapping;
> +               params_new->mapping_mod = mapping_mod;
> +       }
>         if (flags & SKBEDIT_F_MARK)
>                 params_new->mark = *mark;
>         if (flags & SKBEDIT_F_PTYPE)
> @@ -274,6 +332,14 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
>                 goto nla_put_failure;
>         if (params->flags & SKBEDIT_F_INHERITDSFIELD)
>                 pure_flags |= SKBEDIT_F_INHERITDSFIELD;
> +       if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
> +               if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> +                               params->queue_mapping + params->mapping_mod - 1))
> +                       goto nla_put_failure;
> +
> +               pure_flags |= params->flags &
> +                             SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
> +       }
>         if (pure_flags != 0 &&
>             nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
>                 goto nla_put_failure;
> --
> 2.27.0
>
Tonghao Zhang Dec. 15, 2021, 1:41 a.m. UTC | #7
On Tue, Dec 14, 2021 at 8:13 PM Jamal Hadi Salim <jhs@mojatatu.com> wrote:
>
> On 2021-12-13 17:53, Cong Wang wrote:
> > On Sat, Dec 11, 2021 at 6:34 PM Tonghao Zhang <xiangxia.m.yue@gmail.com> wrote:
>
> >> We support this in skbedit firstly in production. act_bpf can do more
> >> things than this. Anyway we
> >> can support this in both act_skbedit/acc_bpf. 1/2 is changed from
> >> skip_tx_queue in skb to per-cpu var suggested-by Eric. We need another
> >> patch which can change the
> >> per-cpu var in bpf. I will post this patch later.
> >
> > The point is if act_bpf can do it, you don't need to bother skbedit at
> > all.
>
> Just a comment on this general statement:
> I know of at least one large data centre that wont allow anything
> "compiled" to be installed unless it goes through a very long vetting
> process(we are talking months).
> "Compiled" includes bpf. This is common practise in a few other places
> some extreme more than others - the reasons are typically driven by
> either some overzelous program manager or security people. Regardless
> of the reasoning, this is a real issue.
Yes, I agree with you.
> Note: None of these data centres have a long process if what is
> installed is a bash script expressing policy.
> In such a case an upstream of a feature such as this is more useful.
>
>
> cheers,
> jamal
>
Jakub Kicinski Dec. 15, 2021, 4:08 p.m. UTC | #8
On Fri, 10 Dec 2021 10:36:26 +0800 xiangxia.m.yue@gmail.com wrote:
> From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> 
> This patch allows users to select queue_mapping, range
> from A to B. And users can use skb-hash, cgroup classid
> and cpuid to select Tx queues. Then we can load balance
> packets from A to B queue. The range is an unsigned 16bit
> value in decimal format.
> 
> $ tc filter ... action skbedit queue_mapping hash-type normal A B
> 
> "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> enhanced with flags:
> * SKBEDIT_F_QUEUE_MAPPING_HASH
> * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> * SKBEDIT_F_QUEUE_MAPPING_CPUID
> 
> Use skb->hash, cgroup classid, or cpuid to distribute packets.
> Then same range of tx queues can be shared for different flows,
> cgroups, or CPUs in a variety of scenarios.
> 
> For example, flows F1 may share range R1 with flows F2. The best
> way to do that is to set flag to SKBEDIT_F_QUEUE_MAPPING_HASH.
> If cgroup C1 share the R1 with cgroup C2 .. Cn, use the
> SKBEDIT_F_QUEUE_MAPPING_CLASSID. Of course, in some other scenario,
> C1 uses R1, while Cn can use Rn.

> diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
> index 00bfee70609e..ee96e0fa6566 100644
> --- a/include/net/tc_act/tc_skbedit.h
> +++ b/include/net/tc_act/tc_skbedit.h
> @@ -17,6 +17,7 @@ struct tcf_skbedit_params {
>  	u32 mark;
>  	u32 mask;
>  	u16 queue_mapping;
> +	u16 mapping_mod;
>  	u16 ptype;
>  	struct rcu_head rcu;
>  };
> diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> index 800e93377218..5642b095d206 100644
> --- a/include/uapi/linux/tc_act/tc_skbedit.h
> +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> @@ -29,6 +29,13 @@
>  #define SKBEDIT_F_PTYPE			0x8
>  #define SKBEDIT_F_MASK			0x10
>  #define SKBEDIT_F_INHERITDSFIELD	0x20
> +#define SKBEDIT_F_QUEUE_MAPPING_HASH	0x40
> +#define SKBEDIT_F_QUEUE_MAPPING_CLASSID	0x80
> +#define SKBEDIT_F_QUEUE_MAPPING_CPUID	0x100
> +
> +#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
> +					   SKBEDIT_F_QUEUE_MAPPING_CLASSID | \
> +					   SKBEDIT_F_QUEUE_MAPPING_CPUID)

Any way we can make these defines shorter?

s/QUEUE_MAPPING_/TXQ_/ ?

>  struct tc_skbedit {
>  	tc_gen;
> @@ -45,6 +52,7 @@ enum {
>  	TCA_SKBEDIT_PTYPE,
>  	TCA_SKBEDIT_MASK,
>  	TCA_SKBEDIT_FLAGS,
> +	TCA_SKBEDIT_QUEUE_MAPPING_MAX,
>  	__TCA_SKBEDIT_MAX
>  };
>  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> index 498feedad70a..0b0d65d7112e 100644
> --- a/net/sched/act_skbedit.c
> +++ b/net/sched/act_skbedit.c
> @@ -10,6 +10,7 @@
>  #include <linux/kernel.h>
>  #include <linux/skbuff.h>
>  #include <linux/rtnetlink.h>
> +#include <net/cls_cgroup.h>
>  #include <net/netlink.h>
>  #include <net/pkt_sched.h>
>  #include <net/ip.h>
> @@ -23,6 +24,37 @@
>  static unsigned int skbedit_net_id;
>  static struct tc_action_ops act_skbedit_ops;
>  
> +static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
> +			    struct sk_buff *skb)
> +{
> +	u16 queue_mapping = params->queue_mapping;
> +	u16 mapping_mod = params->mapping_mod;
> +	u32 mapping_hash_type = params->flags &
> +				SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;

Don't do inline init if the init doesn't fit on a line.

> +	u32 hash = 0;
> +
> +	if (!mapping_hash_type)
> +		return netdev_cap_txqueue(skb->dev, queue_mapping);

Make it "case 0:" below?

> +	switch (mapping_hash_type) {
> +	case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
> +		hash = jhash_1word(task_get_classid(skb), 0);
> +		break;
> +	case SKBEDIT_F_QUEUE_MAPPING_HASH:
> +		hash = skb_get_hash(skb);
> +		break;
> +	case SKBEDIT_F_QUEUE_MAPPING_CPUID:
> +		hash = raw_smp_processor_id();
> +		break;
> +	default:
> +		net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
> +				     mapping_hash_type);
> +	}
> +
> +	queue_mapping = queue_mapping + hash % mapping_mod;
> +	return netdev_cap_txqueue(skb->dev, queue_mapping);
> +}
> +
>  static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>  			   struct tcf_result *res)
>  {
> @@ -57,10 +89,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
>  			break;
>  		}
>  	}
> -	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
> -	    skb->dev->real_num_tx_queues > params->queue_mapping) {
> +	if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
>  		netdev_xmit_skip_txqueue();
> -		skb_set_queue_mapping(skb, params->queue_mapping);
> +		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
>  	}
>  	if (params->flags & SKBEDIT_F_MARK) {
>  		skb->mark &= ~params->mask;
> @@ -94,6 +125,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
>  	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
>  	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
>  	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
> +	[TCA_SKBEDIT_QUEUE_MAPPING_MAX]	= { .len = sizeof(u16) },

.type = NLA_U16 ? I think it's okay to use the modern stuff here.

>  };
>  
>  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> @@ -110,6 +142,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  	struct tcf_skbedit *d;
>  	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
>  	u16 *queue_mapping = NULL, *ptype = NULL;
> +	u16 mapping_mod = 0;
>  	bool exists = false;
>  	int ret = 0, err;
>  	u32 index;
> @@ -154,7 +187,30 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  
>  	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
>  		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
> +		u64 mapping_hash_type = *pure_flags &
> +					SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;

Again, doing init inline and new line between variables and code

> +		if (mapping_hash_type) {
> +			u16 *queue_mapping_max;
> +
> +			/* Hash types are mutually exclusive. */
> +			if (mapping_hash_type & (mapping_hash_type - 1))
> +				return -EINVAL;
> +
> +			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX])
> +				return -EINVAL;
>  
> +			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING])
> +				return -EINVAL;

Can be merged with the condition above, unless you add extack, which
perhaps you should..

> +			queue_mapping_max =
> +				nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);

nla_get_u16()

> +			if (*queue_mapping_max < *queue_mapping)
> +				return -EINVAL;
> +
> +			mapping_mod = *queue_mapping_max - *queue_mapping + 1;
> +			flags |= mapping_hash_type;
> +		}
>  		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
>  			flags |= SKBEDIT_F_INHERITDSFIELD;
>  	}
> @@ -206,8 +262,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
>  	params_new->flags = flags;
>  	if (flags & SKBEDIT_F_PRIORITY)
>  		params_new->priority = *priority;
> -	if (flags & SKBEDIT_F_QUEUE_MAPPING)
> +	if (flags & SKBEDIT_F_QUEUE_MAPPING) {
>  		params_new->queue_mapping = *queue_mapping;
> +		params_new->mapping_mod = mapping_mod;
> +	}
>  	if (flags & SKBEDIT_F_MARK)
>  		params_new->mark = *mark;
>  	if (flags & SKBEDIT_F_PTYPE)
> @@ -274,6 +332,14 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
>  		goto nla_put_failure;
>  	if (params->flags & SKBEDIT_F_INHERITDSFIELD)
>  		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
> +	if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
> +		if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> +				params->queue_mapping + params->mapping_mod - 1))
> +			goto nla_put_failure;
> +
> +		pure_flags |= params->flags &
> +			      SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;

Why "params->flags &" ? Given the if above the flag must be set.

> +	}
>  	if (pure_flags != 0 &&
>  	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
>  		goto nla_put_failure;
Jakub Kicinski Dec. 15, 2021, 4:12 p.m. UTC | #9
On Wed, 15 Dec 2021 09:34:15 +0800 Tonghao Zhang wrote:
> Do you have any comment?We talk about this patchset in this thread.
> Should I resend this patchset with more commit message ?

Patch 1 LGTM, minor nits on patch 2, I'd side with Jamal 
in the discussion.
Tonghao Zhang Dec. 16, 2021, 11:29 a.m. UTC | #10
On Thu, Dec 16, 2021 at 12:09 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 10 Dec 2021 10:36:26 +0800 xiangxia.m.yue@gmail.com wrote:
> > From: Tonghao Zhang <xiangxia.m.yue@gmail.com>
> >
> > This patch allows users to select queue_mapping, range
> > from A to B. And users can use skb-hash, cgroup classid
> > and cpuid to select Tx queues. Then we can load balance
> > packets from A to B queue. The range is an unsigned 16bit
> > value in decimal format.
> >
> > $ tc filter ... action skbedit queue_mapping hash-type normal A B
> >
> > "skbedit queue_mapping QUEUE_MAPPING" (from "man 8 tc-skbedit") is
> > enhanced with flags:
> > * SKBEDIT_F_QUEUE_MAPPING_HASH
> > * SKBEDIT_F_QUEUE_MAPPING_CLASSID
> > * SKBEDIT_F_QUEUE_MAPPING_CPUID
> >
> > Use skb->hash, cgroup classid, or cpuid to distribute packets.
> > Then same range of tx queues can be shared for different flows,
> > cgroups, or CPUs in a variety of scenarios.
> >
> > For example, flows F1 may share range R1 with flows F2. The best
> > way to do that is to set flag to SKBEDIT_F_QUEUE_MAPPING_HASH.
> > If cgroup C1 share the R1 with cgroup C2 .. Cn, use the
> > SKBEDIT_F_QUEUE_MAPPING_CLASSID. Of course, in some other scenario,
> > C1 uses R1, while Cn can use Rn.
>
> > diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
> > index 00bfee70609e..ee96e0fa6566 100644
> > --- a/include/net/tc_act/tc_skbedit.h
> > +++ b/include/net/tc_act/tc_skbedit.h
> > @@ -17,6 +17,7 @@ struct tcf_skbedit_params {
> >       u32 mark;
> >       u32 mask;
> >       u16 queue_mapping;
> > +     u16 mapping_mod;
> >       u16 ptype;
> >       struct rcu_head rcu;
> >  };
> > diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
> > index 800e93377218..5642b095d206 100644
> > --- a/include/uapi/linux/tc_act/tc_skbedit.h
> > +++ b/include/uapi/linux/tc_act/tc_skbedit.h
> > @@ -29,6 +29,13 @@
> >  #define SKBEDIT_F_PTYPE                      0x8
> >  #define SKBEDIT_F_MASK                       0x10
> >  #define SKBEDIT_F_INHERITDSFIELD     0x20
> > +#define SKBEDIT_F_QUEUE_MAPPING_HASH 0x40
> > +#define SKBEDIT_F_QUEUE_MAPPING_CLASSID      0x80
> > +#define SKBEDIT_F_QUEUE_MAPPING_CPUID        0x100
> > +
> > +#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
> > +                                        SKBEDIT_F_QUEUE_MAPPING_CLASSID | \
> > +                                        SKBEDIT_F_QUEUE_MAPPING_CPUID)
>
> Any way we can make these defines shorter?
>
> s/QUEUE_MAPPING_/TXQ_/ ?
Yes
>
> >  struct tc_skbedit {
> >       tc_gen;
> > @@ -45,6 +52,7 @@ enum {
> >       TCA_SKBEDIT_PTYPE,
> >       TCA_SKBEDIT_MASK,
> >       TCA_SKBEDIT_FLAGS,
> > +     TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> >       __TCA_SKBEDIT_MAX
> >  };
> >  #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
> > diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
> > index 498feedad70a..0b0d65d7112e 100644
> > --- a/net/sched/act_skbedit.c
> > +++ b/net/sched/act_skbedit.c
> > @@ -10,6 +10,7 @@
> >  #include <linux/kernel.h>
> >  #include <linux/skbuff.h>
> >  #include <linux/rtnetlink.h>
> > +#include <net/cls_cgroup.h>
> >  #include <net/netlink.h>
> >  #include <net/pkt_sched.h>
> >  #include <net/ip.h>
> > @@ -23,6 +24,37 @@
> >  static unsigned int skbedit_net_id;
> >  static struct tc_action_ops act_skbedit_ops;
> >
> > +static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
> > +                         struct sk_buff *skb)
> > +{
> > +     u16 queue_mapping = params->queue_mapping;
> > +     u16 mapping_mod = params->mapping_mod;
> > +     u32 mapping_hash_type = params->flags &
> > +                             SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
>
> Don't do inline init if the init doesn't fit on a line.
OK
> > +     u32 hash = 0;
> > +
> > +     if (!mapping_hash_type)
> > +             return netdev_cap_txqueue(skb->dev, queue_mapping);
>
> Make it "case 0:" below?
Yes
>
> > +     switch (mapping_hash_type) {
> > +     case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
> > +             hash = jhash_1word(task_get_classid(skb), 0);
> > +             break;
> > +     case SKBEDIT_F_QUEUE_MAPPING_HASH:
> > +             hash = skb_get_hash(skb);
> > +             break;
> > +     case SKBEDIT_F_QUEUE_MAPPING_CPUID:
> > +             hash = raw_smp_processor_id();
> > +             break;
> > +     default:
> > +             net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
> > +                                  mapping_hash_type);
> > +     }
> > +
> > +     queue_mapping = queue_mapping + hash % mapping_mod;
> > +     return netdev_cap_txqueue(skb->dev, queue_mapping);
> > +}
> > +
> >  static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
> >                          struct tcf_result *res)
> >  {
> > @@ -57,10 +89,9 @@ static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
> >                       break;
> >               }
> >       }
> > -     if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
> > -         skb->dev->real_num_tx_queues > params->queue_mapping) {
> > +     if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
> >               netdev_xmit_skip_txqueue();
> > -             skb_set_queue_mapping(skb, params->queue_mapping);
> > +             skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
> >       }
> >       if (params->flags & SKBEDIT_F_MARK) {
> >               skb->mark &= ~params->mask;
> > @@ -94,6 +125,7 @@ static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
> >       [TCA_SKBEDIT_PTYPE]             = { .len = sizeof(u16) },
> >       [TCA_SKBEDIT_MASK]              = { .len = sizeof(u32) },
> >       [TCA_SKBEDIT_FLAGS]             = { .len = sizeof(u64) },
> > +     [TCA_SKBEDIT_QUEUE_MAPPING_MAX] = { .len = sizeof(u16) },
>
> .type = NLA_U16 ? I think it's okay to use the modern stuff here.
TCA_SKBEDIT_QUEUE_MAPPING_MAX and TCA_SKBEDIT_QUEUE_MAPPING carry the
same type of value, so I used the same u16 length policy as
TCA_SKBEDIT_QUEUE_MAPPING. I think that is better.
> >  };
> >
> >  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> > @@ -110,6 +142,7 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> >       struct tcf_skbedit *d;
> >       u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
> >       u16 *queue_mapping = NULL, *ptype = NULL;
> > +     u16 mapping_mod = 0;
> >       bool exists = false;
> >       int ret = 0, err;
> >       u32 index;
> > @@ -154,7 +187,30 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> >
> >       if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
> >               u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
> > +             u64 mapping_hash_type = *pure_flags &
> > +                                     SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
>
> Again, doing init inline and new line between variables and code
Ok
> > +             if (mapping_hash_type) {
> > +                     u16 *queue_mapping_max;
> > +
> > +                     /* Hash types are mutually exclusive. */
> > +                     if (mapping_hash_type & (mapping_hash_type - 1))
> > +                             return -EINVAL;
> > +
> > +                     if (!tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX])
> > +                             return -EINVAL;
> >
> > +                     if (!tb[TCA_SKBEDIT_QUEUE_MAPPING])
> > +                             return -EINVAL;
>
> Can be merged with the condition above, unless you add extack, which
> perhaps you should..
Yes, I will merge them and add extack.
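For reference, the mutual-exclusivity check quoted above relies on the classic bit trick: `x & (x - 1)` clears the lowest set bit, so the result is zero exactly when at most one bit is set. A minimal stand-alone sketch (the shortened `F_TXQ_*` names are hypothetical, following Jakub's renaming suggestion, not the current UAPI):

```c
#include <assert.h>
#include <stdint.h>

#define F_TXQ_HASH	0x40
#define F_TXQ_CLASSID	0x80
#define F_TXQ_CPUID	0x100
#define F_TXQ_MASK	(F_TXQ_HASH | F_TXQ_CLASSID | F_TXQ_CPUID)

/* Returns 1 when the flags select at most one hash type, as the
 * patch requires. type & (type - 1) clears the lowest set bit,
 * so it is zero iff type has zero or one bits set.
 */
static int txq_hash_type_valid(uint64_t flags)
{
	uint64_t type = flags & F_TXQ_MASK;

	return (type & (type - 1)) == 0;
}
```

In the patch itself the zero-bits case never reaches this check (it returns early when no hash-type flag is set), but the expression handles it correctly anyway.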
> > +                     queue_mapping_max =
> > +                             nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);
>
> nla_get_u16()
>
> > +                     if (*queue_mapping_max < *queue_mapping)
> > +                             return -EINVAL;
> > +
> > +                     mapping_mod = *queue_mapping_max - *queue_mapping + 1;
> > +                     flags |= mapping_hash_type;
> > +             }
> >               if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
> >                       flags |= SKBEDIT_F_INHERITDSFIELD;
> >       }
> > @@ -206,8 +262,10 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
> >       params_new->flags = flags;
> >       if (flags & SKBEDIT_F_PRIORITY)
> >               params_new->priority = *priority;
> > -     if (flags & SKBEDIT_F_QUEUE_MAPPING)
> > +     if (flags & SKBEDIT_F_QUEUE_MAPPING) {
> >               params_new->queue_mapping = *queue_mapping;
> > +             params_new->mapping_mod = mapping_mod;
> > +     }
> >       if (flags & SKBEDIT_F_MARK)
> >               params_new->mark = *mark;
> >       if (flags & SKBEDIT_F_PTYPE)
> > @@ -274,6 +332,14 @@ static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
> >               goto nla_put_failure;
> >       if (params->flags & SKBEDIT_F_INHERITDSFIELD)
> >               pure_flags |= SKBEDIT_F_INHERITDSFIELD;
> > +     if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
> > +             if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
> > +                             params->queue_mapping + params->mapping_mod - 1))
> > +                     goto nla_put_failure;
> > +
> > +             pure_flags |= params->flags &
> > +                           SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
>
> Why "params->flags &" ? Given the if above the flag must be set.
We have already checked "params->flags" above, so we can use it directly in this block. Thanks!
> > +     }
> >       if (pure_flags != 0 &&
> >           nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
> >               goto nla_put_failure;
>
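The queue selection discussed in this thread boils down to `txq = queue_mapping + hash % mapping_mod`, clamped by netdev_cap_txqueue() to the device's real_num_tx_queues (out-of-range queues fall back to queue 0). A minimal user-space sketch of that arithmetic (the `cap_txqueue`/`pick_txqueue` helpers are hypothetical stand-ins, not kernel API):

```c
#include <assert.h>
#include <stdint.h>

/* Stand-in for netdev_cap_txqueue(): out-of-range queues fall
 * back to queue 0, mirroring the kernel's warn-and-reset behavior.
 */
static uint16_t cap_txqueue(uint16_t txq, uint16_t real_num_tx_queues)
{
	return txq < real_num_tx_queues ? txq : 0;
}

/* Map a hash (skb->hash, jhash of the cgroup classid, or the CPU id,
 * depending on the configured flag) into the user-selected inclusive
 * range [queue_mapping, queue_mapping_max].
 */
static uint16_t pick_txqueue(uint32_t hash, uint16_t queue_mapping,
			     uint16_t queue_mapping_max,
			     uint16_t real_num_tx_queues)
{
	/* mapping_mod is the size of the range, as computed in init. */
	uint16_t mapping_mod = queue_mapping_max - queue_mapping + 1;

	return cap_txqueue(queue_mapping + hash % mapping_mod,
			   real_num_tx_queues);
}
```

With a range of 4..7 on a 16-queue device, any hash value lands on one of queues 4-7; a range that exceeds real_num_tx_queues degrades to queue 0 rather than an invalid queue.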
diff mbox series

Patch

diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index 00bfee70609e..ee96e0fa6566 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -17,6 +17,7 @@  struct tcf_skbedit_params {
 	u32 mark;
 	u32 mask;
 	u16 queue_mapping;
+	u16 mapping_mod;
 	u16 ptype;
 	struct rcu_head rcu;
 };
diff --git a/include/uapi/linux/tc_act/tc_skbedit.h b/include/uapi/linux/tc_act/tc_skbedit.h
index 800e93377218..5642b095d206 100644
--- a/include/uapi/linux/tc_act/tc_skbedit.h
+++ b/include/uapi/linux/tc_act/tc_skbedit.h
@@ -29,6 +29,13 @@ 
 #define SKBEDIT_F_PTYPE			0x8
 #define SKBEDIT_F_MASK			0x10
 #define SKBEDIT_F_INHERITDSFIELD	0x20
+#define SKBEDIT_F_QUEUE_MAPPING_HASH	0x40
+#define SKBEDIT_F_QUEUE_MAPPING_CLASSID	0x80
+#define SKBEDIT_F_QUEUE_MAPPING_CPUID	0x100
+
+#define SKBEDIT_F_QUEUE_MAPPING_HASH_MASK (SKBEDIT_F_QUEUE_MAPPING_HASH | \
+					   SKBEDIT_F_QUEUE_MAPPING_CLASSID | \
+					   SKBEDIT_F_QUEUE_MAPPING_CPUID)
 
 struct tc_skbedit {
 	tc_gen;
@@ -45,6 +52,7 @@  enum {
 	TCA_SKBEDIT_PTYPE,
 	TCA_SKBEDIT_MASK,
 	TCA_SKBEDIT_FLAGS,
+	TCA_SKBEDIT_QUEUE_MAPPING_MAX,
 	__TCA_SKBEDIT_MAX
 };
 #define TCA_SKBEDIT_MAX (__TCA_SKBEDIT_MAX - 1)
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 498feedad70a..0b0d65d7112e 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -10,6 +10,7 @@ 
 #include <linux/kernel.h>
 #include <linux/skbuff.h>
 #include <linux/rtnetlink.h>
+#include <net/cls_cgroup.h>
 #include <net/netlink.h>
 #include <net/pkt_sched.h>
 #include <net/ip.h>
@@ -23,6 +24,37 @@ 
 static unsigned int skbedit_net_id;
 static struct tc_action_ops act_skbedit_ops;
 
+static u16 tcf_skbedit_hash(struct tcf_skbedit_params *params,
+			    struct sk_buff *skb)
+{
+	u16 queue_mapping = params->queue_mapping;
+	u16 mapping_mod = params->mapping_mod;
+	u32 mapping_hash_type = params->flags &
+				SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
+	u32 hash = 0;
+
+	if (!mapping_hash_type)
+		return netdev_cap_txqueue(skb->dev, queue_mapping);
+
+	switch (mapping_hash_type) {
+	case SKBEDIT_F_QUEUE_MAPPING_CLASSID:
+		hash = jhash_1word(task_get_classid(skb), 0);
+		break;
+	case SKBEDIT_F_QUEUE_MAPPING_HASH:
+		hash = skb_get_hash(skb);
+		break;
+	case SKBEDIT_F_QUEUE_MAPPING_CPUID:
+		hash = raw_smp_processor_id();
+		break;
+	default:
+		net_warn_ratelimited("The type of queue_mapping hash is not supported. 0x%x\n",
+				     mapping_hash_type);
+	}
+
+	queue_mapping = queue_mapping + hash % mapping_mod;
+	return netdev_cap_txqueue(skb->dev, queue_mapping);
+}
+
 static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			   struct tcf_result *res)
 {
@@ -57,10 +89,9 @@  static int tcf_skbedit_act(struct sk_buff *skb, const struct tc_action *a,
 			break;
 		}
 	}
-	if (params->flags & SKBEDIT_F_QUEUE_MAPPING &&
-	    skb->dev->real_num_tx_queues > params->queue_mapping) {
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING) {
 		netdev_xmit_skip_txqueue();
-		skb_set_queue_mapping(skb, params->queue_mapping);
+		skb_set_queue_mapping(skb, tcf_skbedit_hash(params, skb));
 	}
 	if (params->flags & SKBEDIT_F_MARK) {
 		skb->mark &= ~params->mask;
@@ -94,6 +125,7 @@  static const struct nla_policy skbedit_policy[TCA_SKBEDIT_MAX + 1] = {
 	[TCA_SKBEDIT_PTYPE]		= { .len = sizeof(u16) },
 	[TCA_SKBEDIT_MASK]		= { .len = sizeof(u32) },
 	[TCA_SKBEDIT_FLAGS]		= { .len = sizeof(u64) },
+	[TCA_SKBEDIT_QUEUE_MAPPING_MAX]	= { .len = sizeof(u16) },
 };
 
 static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
@@ -110,6 +142,7 @@  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	struct tcf_skbedit *d;
 	u32 flags = 0, *priority = NULL, *mark = NULL, *mask = NULL;
 	u16 *queue_mapping = NULL, *ptype = NULL;
+	u16 mapping_mod = 0;
 	bool exists = false;
 	int ret = 0, err;
 	u32 index;
@@ -154,7 +187,30 @@  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 
 	if (tb[TCA_SKBEDIT_FLAGS] != NULL) {
 		u64 *pure_flags = nla_data(tb[TCA_SKBEDIT_FLAGS]);
+		u64 mapping_hash_type = *pure_flags &
+					SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
+		if (mapping_hash_type) {
+			u16 *queue_mapping_max;
+
+			/* Hash types are mutually exclusive. */
+			if (mapping_hash_type & (mapping_hash_type - 1))
+				return -EINVAL;
+
+			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX])
+				return -EINVAL;
 
+			if (!tb[TCA_SKBEDIT_QUEUE_MAPPING])
+				return -EINVAL;
+
+			queue_mapping_max =
+				nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING_MAX]);
+
+			if (*queue_mapping_max < *queue_mapping)
+				return -EINVAL;
+
+			mapping_mod = *queue_mapping_max - *queue_mapping + 1;
+			flags |= mapping_hash_type;
+		}
 		if (*pure_flags & SKBEDIT_F_INHERITDSFIELD)
 			flags |= SKBEDIT_F_INHERITDSFIELD;
 	}
@@ -206,8 +262,10 @@  static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	params_new->flags = flags;
 	if (flags & SKBEDIT_F_PRIORITY)
 		params_new->priority = *priority;
-	if (flags & SKBEDIT_F_QUEUE_MAPPING)
+	if (flags & SKBEDIT_F_QUEUE_MAPPING) {
 		params_new->queue_mapping = *queue_mapping;
+		params_new->mapping_mod = mapping_mod;
+	}
 	if (flags & SKBEDIT_F_MARK)
 		params_new->mark = *mark;
 	if (flags & SKBEDIT_F_PTYPE)
@@ -274,6 +332,14 @@  static int tcf_skbedit_dump(struct sk_buff *skb, struct tc_action *a,
 		goto nla_put_failure;
 	if (params->flags & SKBEDIT_F_INHERITDSFIELD)
 		pure_flags |= SKBEDIT_F_INHERITDSFIELD;
+	if (params->flags & SKBEDIT_F_QUEUE_MAPPING_HASH_MASK) {
+		if (nla_put_u16(skb, TCA_SKBEDIT_QUEUE_MAPPING_MAX,
+				params->queue_mapping + params->mapping_mod - 1))
+			goto nla_put_failure;
+
+		pure_flags |= params->flags &
+			      SKBEDIT_F_QUEUE_MAPPING_HASH_MASK;
+	}
 	if (pure_flags != 0 &&
 	    nla_put(skb, TCA_SKBEDIT_FLAGS, sizeof(pure_flags), &pure_flags))
 		goto nla_put_failure;