
[net-next,v3,1/3] act_skbedit: skbedit queue mapping for receive queue

Message ID 166633911976.52141.3907831602027668289.stgit@anambiarhost.jf.intel.com (mailing list archive)
State Accepted
Commit 4a6a676f8c16ec17d2f8d69ce3b5d680277ed0d2
Delegated to: Netdev Maintainers
Series Extend action skbedit to RX queue mapping

Checks

Context Check Description
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/subject_prefix success
netdev/cover_letter success Series has a cover letter
netdev/patch_count success
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1770 this patch: 1770
netdev/cc_maintainers warning 3 maintainers not CCed: pabeni@redhat.com edumazet@google.com davem@davemloft.net
netdev/build_clang success Errors and warnings before: 168 this patch: 168
netdev/module_param success Was 0 now: 0
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1917 this patch: 1917
netdev/checkpatch warning WARNING: line length of 125 exceeds 80 columns; WARNING: line length of 131 exceeds 80 columns; WARNING: line length of 90 exceeds 80 columns
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Nambiar, Amritha Oct. 21, 2022, 7:58 a.m. UTC
Add support for the skbedit queue mapping action on the receive
side. This is supported only in hardware, so the skip_sw flag is
enforced. This enables offloading filters for receive queue
selection in the hardware using the skbedit action. Traffic
arrives on the Rx queue requested in the skbedit action
parameter. A new tc action flag, TCA_ACT_FLAGS_AT_INGRESS, is
introduced to identify, at filter-add time, the traffic direction
on which the queue_mapping action is requested. This is used to
disallow offloading the skbedit queue mapping action on the
transmit side.

Example:
$tc filter add dev $IFACE ingress protocol ip flower dst_ip $DST_IP\
 action skbedit queue_mapping $rxq_id skip_sw

Reviewed-by: Sridhar Samudrala <sridhar.samudrala@intel.com>
Signed-off-by: Amritha Nambiar <amritha.nambiar@intel.com>
---
 include/net/act_api.h           |    1 +
 include/net/flow_offload.h      |    2 ++
 include/net/tc_act/tc_skbedit.h |   29 +++++++++++++++++++++++++++++
 net/sched/act_skbedit.c         |   14 ++++++++++++--
 net/sched/cls_api.c             |    7 +++++++
 5 files changed, 51 insertions(+), 2 deletions(-)
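
As a rough illustration of the consumer side, here is a hypothetical sketch
of how a NIC driver's flow-offload parser could pick up the new action id.
The function name and the rxq out-parameter are invented for the example;
flow_action_for_each(), FLOW_ACTION_RX_QUEUE_MAPPING, and entry->rx_queue
are the kernel API this patch extends.

/* Hypothetical driver-side sketch, not part of this series: pick up
 * the Rx queue selected by "skbedit queue_mapping ... skip_sw".
 */
static int example_parse_flow_actions(struct flow_rule *rule, u16 *rxq,
				      struct netlink_ext_ack *extack)
{
	struct flow_action_entry *entry;
	int i;

	flow_action_for_each(i, entry, &rule->action) {
		switch (entry->id) {
		case FLOW_ACTION_RX_QUEUE_MAPPING:
			/* Rx queue chosen via "skbedit queue_mapping" */
			*rxq = entry->rx_queue;
			break;
		default:
			NL_SET_ERR_MSG_MOD(extack, "Unsupported action");
			return -EOPNOTSUPP;
		}
	}
	return 0;
}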

Comments

Roi Dayan Oct. 26, 2022, 11:40 a.m. UTC | #1
> diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
> index e343f9f8363e..7a60bc6d72c9 100644
> --- a/include/net/flow_offload.h
> +++ b/include/net/flow_offload.h
> @@ -155,6 +155,7 @@ enum flow_action_id {
>  	FLOW_ACTION_MARK,
>  	FLOW_ACTION_PTYPE,
>  	FLOW_ACTION_PRIORITY,
> +	FLOW_ACTION_RX_QUEUE_MAPPING,
>  	FLOW_ACTION_WAKE,
>  	FLOW_ACTION_QUEUE,
>  	FLOW_ACTION_SAMPLE,
> @@ -247,6 +248,7 @@ struct flow_action_entry {
>  		u32			csum_flags;	/* FLOW_ACTION_CSUM */
>  		u32			mark;		/* FLOW_ACTION_MARK */
>  		u16                     ptype;          /* FLOW_ACTION_PTYPE */
> +		u16			rx_queue;	/* FLOW_ACTION_RX_QUEUE_MAPPING */
>  		u32			priority;	/* FLOW_ACTION_PRIORITY */
>  		struct {				/* FLOW_ACTION_QUEUE */
>  			u32		ctx;


Hi,

This patch broke mlx5_core TC offloads.
We have generic code that walks the enum values with a list of action
handler pointers, so each action can be parsed without the caller
knowing which action it is. That list depends on staying aligned with
the order of the enum values, which is why I think new values should
usually go at the end of the list. I'm not sure whether other code
paths are broken by this change, but at least one is. Until now, new
values were always inserted at the end.

Can you make a fixup patch to move FLOW_ACTION_RX_QUEUE_MAPPING to
the end of the enum list, i.e. right before NUM_FLOW_ACTIONS?

Thanks,
Roi
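
To make the breakage concrete before the replies below, here is a standalone
toy (not mlx5 code; the enum subset and handler names are invented): a
handler table initialized positionally, written before the mid-enum
insertion, misdispatches the new value and loses the displaced one.

#include <stdio.h>

/* Toy flow_action_id mirroring the mid-enum insertion in this patch. */
enum flow_action_id {
	FLOW_ACTION_MARK,
	FLOW_ACTION_PRIORITY,
	FLOW_ACTION_RX_QUEUE_MAPPING,	/* newly inserted here */
	FLOW_ACTION_QUEUE,
	NUM_FLOW_ACTIONS,
};

/* Positional table written before the insertion: slot 2 used to be
 * FLOW_ACTION_QUEUE.  Unlisted trailing slots are implicitly NULL.
 */
static const char *handler_by_position[NUM_FLOW_ACTIONS] = {
	"parse_mark", "parse_priority", "parse_queue",
};

int main(void)
{
	const char *h = handler_by_position[FLOW_ACTION_QUEUE];

	/* The new id silently inherits the old slot's handler... */
	printf("RX_QUEUE_MAPPING -> %s\n",
	       handler_by_position[FLOW_ACTION_RX_QUEUE_MAPPING]);
	/* ...while the displaced id now has no handler at all. */
	printf("QUEUE            -> %s\n", h ? h : "(none)");
	return 0;
}
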
Jakub Kicinski Oct. 26, 2022, 4:17 p.m. UTC | #2
On Wed, 26 Oct 2022 14:40:39 +0300 Roi Dayan wrote:
> This patch broke mlx5_core TC offloads.
> [...]
> Can you make a fixup patch to move FLOW_ACTION_RX_QUEUE_MAPPING to
> the end of the enum list, i.e. right before NUM_FLOW_ACTIONS?

Odd, can you point us to the exact code that got broken?
There are no guarantees on the ordering of kernel-internal enums,
and I think it's a bad idea to set such a precedent.
Roi Dayan Oct. 27, 2022, 7:12 a.m. UTC | #3
On 26/10/2022 19:17, Jakub Kicinski wrote:
> On Wed, 26 Oct 2022 14:40:39 +0300 Roi Dayan wrote:
>> This patch broke mlx5_core TC offloads.
>> [...]
> 
> Odd, can you point us to the exact code that got broken?
> There are no guarantees on the ordering of kernel-internal enums,
> and I think it's a bad idea to set such a precedent.


OK, I was under the impression the order is kept.

You can see our usage in drivers/net/ethernet/mellanox/mlx5/core/en_tc.c,
in parse_tc_actions(): we loop over the actions and fetch a struct of
function pointers representing each flow action, then use those pointers
to parse what's needed without parse_tc_actions() knowing the action.

The function pointers are in
drivers/net/ethernet/mellanox/mlx5/core/en/tc/act/act.c;
see static struct mlx5e_tc_act *tc_acts_fdb[NUM_FLOW_ACTIONS].
The handling code for each action is in a different file under that
subfolder.

If the order is not important, I guess I'll add a function that returns
the ops I need per enum value.
Please let me know whether to continue down this road.
Thanks
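
A sketch of the lookup-function idea floated here — hypothetical, and not
necessarily the change that actually landed in mlx5: resolving the ops per
enum value explicitly makes the order of the enum irrelevant. The
mlx5e_tc_act_* ops structs exist in the driver; this wrapper and its name
are invented for illustration.

/* Hypothetical helper: explicit id-to-ops mapping, order-independent. */
static struct mlx5e_tc_act *example_tc_act_get(enum flow_action_id act_id)
{
	switch (act_id) {
	case FLOW_ACTION_ACCEPT:
		return &mlx5e_tc_act_accept;
	case FLOW_ACTION_DROP:
		return &mlx5e_tc_act_drop;
	case FLOW_ACTION_TRAP:
		return &mlx5e_tc_act_trap;
	case FLOW_ACTION_GOTO:
		return &mlx5e_tc_act_goto;
	default:
		return NULL;	/* action not supported for offload */
	}
}
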
Roi Dayan Oct. 27, 2022, 8:10 a.m. UTC | #4
On 27/10/2022 10:12, Roi Dayan wrote:
> [...]
> If the order is not important, I guess I'll add a function that returns
> the ops I need per enum value.

Hi,

I'm going to make this change instead, which I should have done from the start.

static struct mlx5e_tc_act *tc_acts_fdb[NUM_FLOW_ACTIONS] = {
        [FLOW_ACTION_ACCEPT] = &mlx5e_tc_act_accept,
        [FLOW_ACTION_DROP] = &mlx5e_tc_act_drop,
        [FLOW_ACTION_TRAP] = &mlx5e_tc_act_trap,
        [FLOW_ACTION_GOTO] = &mlx5e_tc_act_goto,
        ...


Then the ordering is no longer important.

Thanks,
Roi
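
The designated initializers in the snippet above tie each slot to the enum
name rather than its position, and unlisted slots stay NULL, so an insertion
anywhere in the enum leaves existing entries correct. In the toy example
from earlier in the thread, the fixed table would read:

/* Same toy enum as the earlier standalone example: keyed by name,
 * the mid-enum insertion no longer shifts any entry.
 */
static const char *handler_by_name[NUM_FLOW_ACTIONS] = {
	[FLOW_ACTION_MARK]     = "parse_mark",
	[FLOW_ACTION_PRIORITY] = "parse_priority",
	[FLOW_ACTION_QUEUE]    = "parse_queue",
	/* [FLOW_ACTION_RX_QUEUE_MAPPING] deliberately NULL: not offloaded */
};
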
Saeed Mahameed Oct. 27, 2022, 8:10 a.m. UTC | #5
On 27 Oct 10:12, Roi Dayan wrote:
>[...]
>If the order is not important, I guess I'll add a function that returns
>the ops I need per enum value. Please let me know whether to continue
>down this road.

The order is not guaranteed, so let's have a robust solution.
You can define an explicit static mapping in the driver instead of relying on a for loop.
Jakub Kicinski Oct. 28, 2022, 2:48 a.m. UTC | #6
On Thu, 27 Oct 2022 11:10:21 +0300 Roi Dayan wrote:
> I'm going to make this change instead, which I should have done from the start.
> 
> static struct mlx5e_tc_act *tc_acts_fdb[NUM_FLOW_ACTIONS] = {
>         [FLOW_ACTION_ACCEPT] = &mlx5e_tc_act_accept,
>         [FLOW_ACTION_DROP] = &mlx5e_tc_act_drop,
>         [FLOW_ACTION_TRAP] = &mlx5e_tc_act_trap,
>         [FLOW_ACTION_GOTO] = &mlx5e_tc_act_goto, 


Patch

diff --git a/include/net/act_api.h b/include/net/act_api.h
index 61f2ceb3939e..c94ea1a306e0 100644
--- a/include/net/act_api.h
+++ b/include/net/act_api.h
@@ -67,6 +67,7 @@ struct tc_action {
 #define TCA_ACT_FLAGS_BIND	(1U << (TCA_ACT_FLAGS_USER_BITS + 1))
 #define TCA_ACT_FLAGS_REPLACE	(1U << (TCA_ACT_FLAGS_USER_BITS + 2))
 #define TCA_ACT_FLAGS_NO_RTNL	(1U << (TCA_ACT_FLAGS_USER_BITS + 3))
+#define TCA_ACT_FLAGS_AT_INGRESS	(1U << (TCA_ACT_FLAGS_USER_BITS + 4))
 
 /* Update lastuse only if needed, to avoid dirtying a cache line.
  * We use a temp variable to avoid fetching jiffies twice.
diff --git a/include/net/flow_offload.h b/include/net/flow_offload.h
index e343f9f8363e..7a60bc6d72c9 100644
--- a/include/net/flow_offload.h
+++ b/include/net/flow_offload.h
@@ -155,6 +155,7 @@ enum flow_action_id {
 	FLOW_ACTION_MARK,
 	FLOW_ACTION_PTYPE,
 	FLOW_ACTION_PRIORITY,
+	FLOW_ACTION_RX_QUEUE_MAPPING,
 	FLOW_ACTION_WAKE,
 	FLOW_ACTION_QUEUE,
 	FLOW_ACTION_SAMPLE,
@@ -247,6 +248,7 @@ struct flow_action_entry {
 		u32			csum_flags;	/* FLOW_ACTION_CSUM */
 		u32			mark;		/* FLOW_ACTION_MARK */
 		u16                     ptype;          /* FLOW_ACTION_PTYPE */
+		u16			rx_queue;	/* FLOW_ACTION_RX_QUEUE_MAPPING */
 		u32			priority;	/* FLOW_ACTION_PRIORITY */
 		struct {				/* FLOW_ACTION_QUEUE */
 			u32		ctx;
diff --git a/include/net/tc_act/tc_skbedit.h b/include/net/tc_act/tc_skbedit.h
index dc1079f28e13..9649600fb3dc 100644
--- a/include/net/tc_act/tc_skbedit.h
+++ b/include/net/tc_act/tc_skbedit.h
@@ -95,12 +95,41 @@ static inline u32 tcf_skbedit_priority(const struct tc_action *a)
 	return priority;
 }
 
+static inline u16 tcf_skbedit_rx_queue_mapping(const struct tc_action *a)
+{
+	u16 rx_queue;
+
+	rcu_read_lock();
+	rx_queue = rcu_dereference(to_skbedit(a)->params)->queue_mapping;
+	rcu_read_unlock();
+
+	return rx_queue;
+}
+
 /* Return true iff action is queue_mapping */
 static inline bool is_tcf_skbedit_queue_mapping(const struct tc_action *a)
 {
 	return is_tcf_skbedit_with_flag(a, SKBEDIT_F_QUEUE_MAPPING);
 }
 
+/* Return true if action is on ingress traffic */
+static inline bool is_tcf_skbedit_ingress(u32 flags)
+{
+	return flags & TCA_ACT_FLAGS_AT_INGRESS;
+}
+
+static inline bool is_tcf_skbedit_tx_queue_mapping(const struct tc_action *a)
+{
+	return is_tcf_skbedit_queue_mapping(a) &&
+	       !is_tcf_skbedit_ingress(a->tcfa_flags);
+}
+
+static inline bool is_tcf_skbedit_rx_queue_mapping(const struct tc_action *a)
+{
+	return is_tcf_skbedit_queue_mapping(a) &&
+	       is_tcf_skbedit_ingress(a->tcfa_flags);
+}
+
 /* Return true iff action is inheritdsfield */
 static inline bool is_tcf_skbedit_inheritdsfield(const struct tc_action *a)
 {
diff --git a/net/sched/act_skbedit.c b/net/sched/act_skbedit.c
index 7f598784fd30..1710780c908a 100644
--- a/net/sched/act_skbedit.c
+++ b/net/sched/act_skbedit.c
@@ -148,6 +148,11 @@ static int tcf_skbedit_init(struct net *net, struct nlattr *nla,
 	}
 
 	if (tb[TCA_SKBEDIT_QUEUE_MAPPING] != NULL) {
+		if (is_tcf_skbedit_ingress(act_flags) &&
+		    !(act_flags & TCA_ACT_FLAGS_SKIP_SW)) {
+			NL_SET_ERR_MSG_MOD(extack, "\"queue_mapping\" option on receive side is hardware only, use skip_sw");
+			return -EOPNOTSUPP;
+		}
 		flags |= SKBEDIT_F_QUEUE_MAPPING;
 		queue_mapping = nla_data(tb[TCA_SKBEDIT_QUEUE_MAPPING]);
 	}
@@ -374,9 +379,12 @@ static int tcf_skbedit_offload_act_setup(struct tc_action *act, void *entry_data
 		} else if (is_tcf_skbedit_priority(act)) {
 			entry->id = FLOW_ACTION_PRIORITY;
 			entry->priority = tcf_skbedit_priority(act);
-		} else if (is_tcf_skbedit_queue_mapping(act)) {
-			NL_SET_ERR_MSG_MOD(extack, "Offload not supported when \"queue_mapping\" option is used");
+		} else if (is_tcf_skbedit_tx_queue_mapping(act)) {
+			NL_SET_ERR_MSG_MOD(extack, "Offload not supported when \"queue_mapping\" option is used on transmit side");
 			return -EOPNOTSUPP;
+		} else if (is_tcf_skbedit_rx_queue_mapping(act)) {
+			entry->id = FLOW_ACTION_RX_QUEUE_MAPPING;
+			entry->rx_queue = tcf_skbedit_rx_queue_mapping(act);
 		} else if (is_tcf_skbedit_inheritdsfield(act)) {
 			NL_SET_ERR_MSG_MOD(extack, "Offload not supported when \"inheritdsfield\" option is used");
 			return -EOPNOTSUPP;
@@ -394,6 +402,8 @@ static int tcf_skbedit_offload_act_setup(struct tc_action *act, void *entry_data
 			fl_action->id = FLOW_ACTION_PTYPE;
 		else if (is_tcf_skbedit_priority(act))
 			fl_action->id = FLOW_ACTION_PRIORITY;
+		else if (is_tcf_skbedit_rx_queue_mapping(act))
+			fl_action->id = FLOW_ACTION_RX_QUEUE_MAPPING;
 		else
 			return -EOPNOTSUPP;
 	}
diff --git a/net/sched/cls_api.c b/net/sched/cls_api.c
index 50566db45949..23d1cfa4f58c 100644
--- a/net/sched/cls_api.c
+++ b/net/sched/cls_api.c
@@ -1953,6 +1953,11 @@ static void tfilter_put(struct tcf_proto *tp, void *fh)
 		tp->ops->put(tp, fh);
 }
 
+static bool is_qdisc_ingress(__u32 classid)
+{
+	return (TC_H_MIN(classid) == TC_H_MIN(TC_H_MIN_INGRESS));
+}
+
 static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 			  struct netlink_ext_ack *extack)
 {
@@ -2144,6 +2149,8 @@ static int tc_new_tfilter(struct sk_buff *skb, struct nlmsghdr *n,
 		flags |= TCA_ACT_FLAGS_REPLACE;
 	if (!rtnl_held)
 		flags |= TCA_ACT_FLAGS_NO_RTNL;
+	if (is_qdisc_ingress(parent))
+		flags |= TCA_ACT_FLAGS_AT_INGRESS;
 	err = tp->ops->change(net, skb, tp, cl, t->tcm_handle, tca, &fh,
 			      flags, extack);
 	if (err == 0) {