diff mbox series

[net-next,1/5] netlink: spec: add shaper YAML spec

Message ID 75cb77aa91040829e55c5cae73e79349f3988e06.1719518113.git.pabeni@redhat.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series net: introduce TX shaping H/W offload API | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl fail Generated files up to date; build failed; build has 3 warnings/errors; GEN HAS DIFF 2 files changed, 1025 insertions(+);
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 839 this patch: 839
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 2 maintainers not CCed: edumazet@google.com donald.hunter@gmail.com
netdev/build_clang success Errors and warnings before: 846 this patch: 846
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 851 this patch: 851
netdev/checkpatch warning WARNING: added, moved or deleted file(s), does MAINTAINERS need updating? WARNING: line length of 81 exceeds 80 columns WARNING: line length of 82 exceeds 80 columns WARNING: line length of 85 exceeds 80 columns WARNING: line length of 87 exceeds 80 columns WARNING: line length of 88 exceeds 80 columns WARNING: line length of 90 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Paolo Abeni June 27, 2024, 8:17 p.m. UTC
Define the user-space visible interface to query, configure and delete
network shapers via yaml definition.

Add dummy implementations for the relevant NL callbacks.

set() and delete() operations allows touching multiple shapers with a
single operation, atomically.

Signed-off-by: Paolo Abeni <pabeni@redhat.com>
---
v4 -> v5:
 - fixed a few typos
 - set() and get() ops reply with the number of affected handles
 - re-ordered bps and pps
 - added 'unspec' scope
v3 -> v4:
 - dropped 'major'
 - renamed 'minor' as 'id'
 - rename 'bw_{max,min} as 'bw-{max,min}'
v2 -> v3:
 - dropped 'move' op, use parent in 'set' instead
 - expand 'handle' in 'scope', 'major', 'minor'
 - rename 'queue_group' scope to 'detached'
 - rename 'info' attr to 'shapers'
 - added pad attribute (for 64 bits' sake)
v1 -> v2:
 - reset -> delete
 - added 'parent' and 'burst'
---
 Documentation/netlink/specs/shaper.yaml | 202 ++++++++++++++++++++++++
 include/uapi/linux/net_shaper.h         |  73 +++++++++
 net/Kconfig                             |   3 +
 net/Makefile                            |   1 +
 net/shaper/Makefile                     |   9 ++
 net/shaper/shaper.c                     |  34 ++++
 net/shaper/shaper_nl_gen.c              |  93 +++++++++++
 net/shaper/shaper_nl_gen.h              |  25 +++
 8 files changed, 440 insertions(+)
 create mode 100644 Documentation/netlink/specs/shaper.yaml
 create mode 100644 include/uapi/linux/net_shaper.h
 create mode 100644 net/shaper/Makefile
 create mode 100644 net/shaper/shaper.c
 create mode 100644 net/shaper/shaper_nl_gen.c
 create mode 100644 net/shaper/shaper_nl_gen.h

Comments

Jakub Kicinski June 29, 2024, 2:12 a.m. UTC | #1
On Thu, 27 Jun 2024 22:17:18 +0200 Paolo Abeni wrote:
> +      -
> +        name: port
> +        doc: The root shaper for the whole H/W.
> +      -
> +        name: netdev
> +        doc: The main shaper for the given network device.
> +      -
> +        name: queue
> +        doc: The shaper is attached to the given device queue.
> +      -
> +        name: detached
> +        doc: |
> +             The shaper can be attached to port, netdev or other
> +             detached shapers, allowing nesting and grouping of
> +             netdev or queues.
> +    render-max: true

nit: move other properties before the list

> +attribute-sets:
> +  -
> +    name: net_shaper

s/_/-/ on all names

> +    attributes:
> +      -
> +        name: ifindex
> +        type: u32
> +      -
> +        name: parent
> +        type: nest
> +        nested-attributes: handle
> +      -
> +        name: handle
> +        type: nest
> +        nested-attributes: handle
> +      -
> +        name: metric
> +        type: u32
> +        enum: metric
> +      -
> +        name: bw-min
> +        type: u64

s/u64/uint/

> +      -
> +         name: handles
> +         type: nest
> +         multi-attr: true
> +         nested-attributes: handle

oh both handle and handles...
it'd be good to have more doc: for the attributes, all these things get
auto-rendered on docs.kernel.org

> +      -
> +        name: shapers
> +        type: nest
> +        multi-attr: true
> +        nested-attributes: ns-info

How do shapers differ from shaping attrs in this scope? :S

> +      -
> +        name: modified
> +        type: u32
> +      -
> +        name: pad
> +        type: pad

after using uint this can go

> +operations:
> +  list:
> +    -
> +      name: get
> +      doc: |
> +        Get / Dump information about a/all the shaper for a given device
> +      attribute-set: net_shaper
> +      flags: [ admin-perm ]

Any reason why get is admin-perm ?
Paolo Abeni July 1, 2024, 10:14 a.m. UTC | #2
On Fri, 2024-06-28 at 19:12 -0700, Jakub Kicinski wrote:
> On Thu, 27 Jun 2024 22:17:18 +0200 Paolo Abeni wrote:
> 
> > +      -
> > +        name: shapers
> > +        type: nest
> > +        multi-attr: true
> > +        nested-attributes: ns-info
> 
> How do shapers differ from shaping attrs in this scope? :S

the set() operation must configure multiple shapers with a single
command - to allow the 'atomic configuration changes' need for Andrew's
use-case.

Out-of-sheer ignorance on my side, the above was the most straight-
forward way to provide set() with an array of shapers.

Do you mean there are better way to achieve the goal, or "just" that
the documentation here is missing and _necessary_?

> > +operations:
> > +  list:
> > +    -
> > +      name: get
> > +      doc: |
> > +        Get / Dump information about a/all the shaper for a given device
> > +      attribute-set: net_shaper
> > +      flags: [ admin-perm ]
> 
> Any reason why get is admin-perm ?

Mostly a "better safe then sorry" approach and cargo-cult form other
recent yaml changes the hard reasons. Fine to drop it, if there is
agreement.

Side note: ack to everything else noted in your reply.

Thanks,

Paolo
Paolo Abeni July 1, 2024, 2:50 p.m. UTC | #3
On Fri, 2024-06-28 at 19:12 -0700, Jakub Kicinski wrote:
> On Thu, 27 Jun 2024 22:17:18 +0200 Paolo Abeni wrote:
> 
> > +attribute-sets:
> > +  -
> > +    name: net_shaper
> 
> s/_/-/ on all names

It looks like the initial

name: net_shaper

would require some patching to ynl-gen-c.py to handle the s/_/-/
conversion in the generated file names, too.

Cheers,

Paolo
Paolo Abeni July 1, 2024, 3:50 p.m. UTC | #4
On Mon, 2024-07-01 at 16:50 +0200, Paolo Abeni wrote:
> On Fri, 2024-06-28 at 19:12 -0700, Jakub Kicinski wrote:
> > On Thu, 27 Jun 2024 22:17:18 +0200 Paolo Abeni wrote:
> > 
> > > +attribute-sets:
> > > +  -
> > > +    name: net_shaper
> > 
> > s/_/-/ on all names
> 
> It looks like the initial
> 
> name: net_shaper
> 
> would require some patching to ynl-gen-c.py to handle the s/_/-/
> conversion in the generated file names, too.

I mean something alike the following, to be explicit:
---
diff --git a/tools/net/ynl/ynl-gen-c.py b/tools/net/ynl/ynl-gen-c.py
index 374ca5e86e24..51529fabd517 100755
--- a/tools/net/ynl/ynl-gen-c.py
+++ b/tools/net/ynl/ynl-gen-c.py
@@ -59,9 +59,9 @@ class Type(SpecAttr):
         if 'nested-attributes' in attr:
             self.nested_attrs = attr['nested-attributes']
             if self.nested_attrs == family.name:
-                self.nested_render_name = c_lower(f"{family.name}")
+                self.nested_render_name = c_lower(f"{family.ident_name}")
             else:
-                self.nested_render_name = c_lower(f"{family.name}_{self.nested_attrs}")
+                self.nested_render_name = c_lower(f"{family.ident_name}_{self.nested_attrs}")
 
             if self.nested_attrs in self.family.consts:
                 self.nested_struct_type = 'struct ' + self.nested_render_name + '_'
@@ -693,9 +693,9 @@ class Struct:
 
         self.nested = type_list is None
         if family.name == c_lower(space_name):
-            self.render_name = c_lower(family.name)
+            self.render_name = c_lower(family.ident_name)
         else:
-            self.render_name = c_lower(family.name + '-' + space_name)
+            self.render_name = c_lower(family.ident_name + '-' + space_name)
         self.struct_name = 'struct ' + self.render_name
         if self.nested and space_name in family.consts:
             self.struct_name += '_'
@@ -761,7 +761,7 @@ class EnumEntry(SpecEnumEntry):
 
 class EnumSet(SpecEnumSet):
     def __init__(self, family, yaml):
-        self.render_name = c_lower(family.name + '-' + yaml['name'])
+        self.render_name = c_lower(family.ident_name + '-' + yaml['name'])
 
         if 'enum-name' in yaml:
             if yaml['enum-name']:
@@ -777,7 +777,7 @@ class EnumSet(SpecEnumSet):
         else:
             self.user_type = 'int'
 
-        self.value_pfx = yaml.get('name-prefix', f"{family.name}-{yaml['name']}-")
+        self.value_pfx = yaml.get('name-prefix', f"{family.ident_name}-{yaml['name']}-")
 
         super().__init__(family, yaml)
 
@@ -802,9 +802,9 @@ class AttrSet(SpecAttrSet):
             if 'name-prefix' in yaml:
                 pfx = yaml['name-prefix']
             elif self.name == family.name:
-                pfx = family.name + '-a-'
+                pfx = family.ident_name + '-a-'
             else:
-                pfx = f"{family.name}-a-{self.name}-"
+                pfx = f"{family.ident_name}-a-{self.name}-"
             self.name_prefix = c_upper(pfx)
             self.max_name = c_upper(self.yaml.get('attr-max-name', f"{self.name_prefix}max"))
             self.cnt_name = c_upper(self.yaml.get('attr-cnt-name', f"__{self.name_prefix}max"))
@@ -861,7 +861,7 @@ class Operation(SpecOperation):
     def __init__(self, family, yaml, req_value, rsp_value):
         super().__init__(family, yaml, req_value, rsp_value)
 
-        self.render_name = c_lower(family.name + '_' + self.name)
+        self.render_name = c_lower(family.ident_name + '_' + self.name)
 
         self.dual_policy = ('do' in yaml and 'request' in yaml['do']) and \
                          ('dump' in yaml and 'request' in yaml['dump'])
@@ -911,11 +911,11 @@ class Family(SpecFamily):
         if 'uapi-header' in self.yaml:
             self.uapi_header = self.yaml['uapi-header']
         else:
-            self.uapi_header = f"linux/{self.name}.h"
+            self.uapi_header = f"linux/{self.ident_name}.h"
         if self.uapi_header.startswith("linux/") and self.uapi_header.endswith('.h'):
             self.uapi_header_name = self.uapi_header[6:-2]
         else:
-            self.uapi_header_name = self.name
+            self.uapi_header_name = self.ident_name
 
     def resolve(self):
         self.resolve_up(super())
@@ -923,7 +923,7 @@ class Family(SpecFamily):
         if self.yaml.get('protocol', 'genetlink') not in {'genetlink', 'genetlink-c', 'genetlink-legacy'}:
             raise Exception("Codegen only supported for genetlink")
 
-        self.c_name = c_lower(self.name)
+        self.c_name = c_lower(self.ident_name)
         if 'name-prefix' in self.yaml['operations']:
             self.op_prefix = c_upper(self.yaml['operations']['name-prefix'])
         else:
@@ -2173,7 +2173,7 @@ def print_kernel_op_table_fwd(family, cw, terminate):
     exported = not kernel_can_gen_family_struct(family)
 
     if not terminate or exported:
-        cw.p(f"/* Ops table for {family.name} */")
+        cw.p(f"/* Ops table for {family.ident_name} */")
 
         pol_to_struct = {'global': 'genl_small_ops',
                          'per-op': 'genl_ops',
@@ -2225,12 +2225,12 @@ def print_kernel_op_table_fwd(family, cw, terminate):
             continue
 
         if 'do' in op:
-            name = c_lower(f"{family.name}-nl-{op_name}-doit")
+            name = c_lower(f"{family.ident_name}-nl-{op_name}-doit")
             cw.write_func_prot('int', name,
                                ['struct sk_buff *skb', 'struct genl_info *info'], suffix=';')
 
         if 'dump' in op:
-            name = c_lower(f"{family.name}-nl-{op_name}-dumpit")
+            name = c_lower(f"{family.ident_name}-nl-{op_name}-dumpit")
             cw.write_func_prot('int', name,
                                ['struct sk_buff *skb', 'struct netlink_callback *cb'], suffix=';')
     cw.nl()
@@ -2256,13 +2256,13 @@ def print_kernel_op_table(family, cw):
                                             for x in op['dont-validate']])), )
             for op_mode in ['do', 'dump']:
                 if op_mode in op:
-                    name = c_lower(f"{family.name}-nl-{op_name}-{op_mode}it")
+                    name = c_lower(f"{family.ident_name}-nl-{op_name}-{op_mode}it")
                     members.append((op_mode + 'it', name))
             if family.kernel_policy == 'per-op':
                 struct = Struct(family, op['attribute-set'],
                                 type_list=op['do']['request']['attributes'])
 
-                name = c_lower(f"{family.name}-{op_name}-nl-policy")
+                name = c_lower(f"{family.ident_name}-{op_name}-nl-policy")
                 members.append(('policy', name))
                 members.append(('maxattr', struct.attr_max_val.enum_name))
             if 'flags' in op:
@@ -2294,7 +2294,7 @@ def print_kernel_op_table(family, cw):
                         members.append(('validate',
                                         ' | '.join([c_upper('genl-dont-validate-' + x)
                                                     for x in dont_validate])), )
-                name = c_lower(f"{family.name}-nl-{op_name}-{op_mode}it")
+                name = c_lower(f"{family.ident_name}-nl-{op_name}-{op_mode}it")
                 if 'pre' in op[op_mode]:
                     members.append((cb_names[op_mode]['pre'], c_lower(op[op_mode]['pre'])))
                 members.append((op_mode + 'it', name))
@@ -2305,9 +2305,9 @@ def print_kernel_op_table(family, cw):
                                     type_list=op[op_mode]['request']['attributes'])
 
                     if op.dual_policy:
-                        name = c_lower(f"{family.name}-{op_name}-{op_mode}-nl-policy")
+                        name = c_lower(f"{family.ident_name}-{op_name}-{op_mode}-nl-policy")
                     else:
-                        name = c_lower(f"{family.name}-{op_name}-nl-policy")
+                        name = c_lower(f"{family.ident_name}-{op_name}-nl-policy")
                     members.append(('policy', name))
                     members.append(('maxattr', struct.attr_max_val.enum_name))
                 flags = (op['flags'] if 'flags' in op else []) + ['cmd-cap-' + op_mode]
@@ -2326,7 +2326,7 @@ def print_kernel_mcgrp_hdr(family, cw):
 
     cw.block_start('enum')
     for grp in family.mcgrps['list']:
-        grp_id = c_upper(f"{family.name}-nlgrp-{grp['name']},")
+        grp_id = c_upper(f"{family.ident_name}-nlgrp-{grp['name']},")
         cw.p(grp_id)
     cw.block_end(';')
     cw.nl()
@@ -2339,7 +2339,7 @@ def print_kernel_mcgrp_src(family, cw):
     cw.block_start('static const struct genl_multicast_group ' + family.c_name + '_nl_mcgrps[] =')
     for grp in family.mcgrps['list']:
         name = grp['name']
-        grp_id = c_upper(f"{family.name}-nlgrp-{name}")
+        grp_id = c_upper(f"{family.ident_name}-nlgrp-{name}")
         cw.p('[' + grp_id + '] = { "' + name + '", },')
     cw.block_end(';')
     cw.nl()
@@ -2361,7 +2361,7 @@ def print_kernel_family_struct_src(family, cw):
     if not kernel_can_gen_family_struct(family):
         return
 
-    cw.block_start(f"struct genl_family {family.name}_nl_family __ro_after_init =")
+    cw.block_start(f"struct genl_family {family.ident_name}_nl_family __ro_after_init =")
     cw.p('.name\t\t= ' + family.fam_key + ',')
     cw.p('.version\t= ' + family.ver_key + ',')
     cw.p('.netnsok\t= true,')
@@ -2429,7 +2429,7 @@ def render_uapi(family, cw):
                 cw.p(' */')
 
             uapi_enum_start(family, cw, const, 'name')
-            name_pfx = const.get('name-prefix', f"{family.name}-{const['name']}-")
+            name_pfx = const.get('name-prefix', f"{family.ident_name}-{const['name']}-")
             for entry in enum.entries.values():
                 suffix = ','
                 if entry.value_change:
@@ -2451,7 +2451,7 @@ def render_uapi(family, cw):
             cw.nl()
         elif const['type'] == 'const':
             defines.append([c_upper(family.get('c-define-name',
-                                               f"{family.name}-{const['name']}")),
+                                               f"{family.ident_name}-{const['name']}")),
                             const['value']])
 
     if defines:
@@ -2529,7 +2529,7 @@ def render_uapi(family, cw):
     defines = []
     for grp in family.mcgrps['list']:
         name = grp['name']
-        defines.append([c_upper(grp.get('c-define-name', f"{family.name}-mcgrp-{name}")),
+        defines.append([c_upper(grp.get('c-define-name', f"{family.ident_name}-mcgrp-{name}")),
                         f'{name}'])
     cw.nl()
     if defines:
Jakub Kicinski July 2, 2024, 12:37 a.m. UTC | #5
On Mon, 01 Jul 2024 17:50:17 +0200 Paolo Abeni wrote:
> > would require some patching to ynl-gen-c.py to handle the s/_/-/
> > conversion in the generated file names, too.  
> 
> I mean something alike the following, to be explicit:

Great, thanks! I wasn't sure we're actually ready to make the family
name use dashes, since we don't have any precedent. But since you have
the patch - please submit!
Jakub Kicinski July 2, 2024, 2:54 a.m. UTC | #6
On Mon, 01 Jul 2024 12:14:32 +0200 Paolo Abeni wrote:
> > > +      -
> > > +        name: shapers
> > > +        type: nest
> > > +        multi-attr: true
> > > +        nested-attributes: ns-info  
> > 
> > How do shapers differ from shaping attrs in this scope? :S  
> 
> the set() operation must configure multiple shapers with a single
> command - to allow the 'atomic configuration changes' need for Andrew's
> use-case.
> 
> Out-of-sheer ignorance on my side, the above was the most straight-
> forward way to provide set() with an array of shapers.
> 
> Do you mean there are better way to achieve the goal, or "just" that
> the documentation here is missing and _necessary_?

I see, I had a look at patch 2 now.
But that's really "Andrew's use-case" it doesn't cover deletion, right?
Sorry that I don't have a perfect suggestion either but it seems like
a half-measure. It's a partial support for transactions. If we want
transactions we should group ops like nftables. Have normal ops (add,
delete, modify) and control ops (start, commit) which clone the entire
tree, then ops change it, and commit presents new tree to the device.

Alternative would be to, instead of supporting transactions have some
form of a "complex instruction set". Most transformations will take a
set of inputs (+weights / prios), shaping params, and where to attach.

> > > +operations:
> > > +  list:
> > > +    -
> > > +      name: get
> > > +      doc: |
> > > +        Get / Dump information about a/all the shaper for a given device
> > > +      attribute-set: net_shaper
> > > +      flags: [ admin-perm ]  
> > 
> > Any reason why get is admin-perm ?  
> 
> Mostly a "better safe then sorry" approach and cargo-cult form other
> recent yaml changes the hard reasons. Fine to drop it, if there is
> agreement.

I thought we default to GET being non-privileged.
I think that's better, monitoring shouldn't require admin perm
and presumably those shapers may grow stats at some stage.
But no strong feelings.
Paolo Abeni July 2, 2024, 2:21 p.m. UTC | #7
On Mon, 2024-07-01 at 19:54 -0700, Jakub Kicinski wrote:
> On Mon, 01 Jul 2024 12:14:32 +0200 Paolo Abeni wrote:
> > > > +      -
> > > > +        name: shapers
> > > > +        type: nest
> > > > +        multi-attr: true
> > > > +        nested-attributes: ns-info  
> > > 
> > > How do shapers differ from shaping attrs in this scope? :S  
> > 
> > the set() operation must configure multiple shapers with a single
> > command - to allow the 'atomic configuration changes' need for Andrew's
> > use-case.
> > 
> > Out-of-sheer ignorance on my side, the above was the most straight-
> > forward way to provide set() with an array of shapers.
> > 
> > Do you mean there are better way to achieve the goal, or "just" that
> > the documentation here is missing and _necessary_?
> 
> I see, I had a look at patch 2 now.
> But that's really "Andrew's use-case" it doesn't cover deletion, right?
> Sorry that I don't have a perfect suggestion either but it seems like
> a half-measure. It's a partial support for transactions. If we want
> transactions we should group ops like nftables. Have normal ops (add,
> delete, modify) and control ops (start, commit) which clone the entire
> tree, then ops change it, and commit presents new tree to the device.

Yes, it does not cover deletion _and_ update/add/move within the same
atomic operation.

Still any configuration could be reached from default/initial state
with set(<possibly many shapers>). Additionally, given any arbitrary
configuration, the default/initial state could be restored with a
single delete(<possibly many handlers>).

The above covers any possible limitation enforced by the H/W, not just
the DSA use-case.

Do you have a strong feeling for atomic transactions from any arbitrary
state towards any other? If so, I’d like to understand why?

Dealing with transactions allowing arbitrary any state <> any state
atomic changes will involve some complex logic that seems better
assigned to user-space.

Thanks,

Paolo
Jakub Kicinski July 2, 2024, 3:04 p.m. UTC | #8
On Tue, 02 Jul 2024 16:21:38 +0200 Paolo Abeni wrote:
> > I see, I had a look at patch 2 now.
> > But that's really "Andrew's use-case" it doesn't cover deletion, right?
> > Sorry that I don't have a perfect suggestion either but it seems like
> > a half-measure. It's a partial support for transactions. If we want
> > transactions we should group ops like nftables. Have normal ops (add,
> > delete, modify) and control ops (start, commit) which clone the entire
> > tree, then ops change it, and commit presents new tree to the device.  
> 
> Yes, it does not cover deletion _and_ update/add/move within the same
> atomic operation.
> 
> Still any configuration could be reached from default/initial state
> with set(<possibly many shapers>). Additionally, given any arbitrary
> configuration, the default/initial state could be restored with a
> single delete(<possibly many handlers>).

From:

q0 -. RR \
q1 /      > SP
q2 -. RR /
q3 /

To:

q0 ------\
q1 -------> SP
q2 -. RR /
q3 /

You have to both delete an RR node, and set SP params on Q0 and Q1.

> The above covers any possible limitation enforced by the H/W, not just
> the DSA use-case.
> 
> Do you have a strong feeling for atomic transactions from any arbitrary
> state towards any other? If so, I’d like to understand why?

I don't believe this is covers all cases.

> Dealing with transactions allowing arbitrary any state <> any state
> atomic changes will involve some complex logic that seems better
> assigned to user-space.

Complex logic in which part of the code?
It's just a full clone of the xarray, then do whatever ops user is
asking to do, then tree walk to render diff as a set of ops.
If you mean the tree walk to convert tree diff into ops, I think we
need that anyway, otherwise we may get into a situation where there's
a dependency between the user space implementation and driver
expectations.
Paolo Abeni July 2, 2024, 7:51 p.m. UTC | #9
Oops, I unintentionally dropped most recipients, re-adding them, sorry
for the duplicate.

On Tue, Jul 2, 2024 at 5:05 PM Jakub Kicinski <kuba@kernel.org> wrote:
> On Tue, 02 Jul 2024 16:21:38 +0200 Paolo Abeni wrote:
> > > I see, I had a look at patch 2 now.
> > > But that's really "Andrew's use-case" it doesn't cover deletion, right?
> > > Sorry that I don't have a perfect suggestion either but it seems like
> > > a half-measure. It's a partial support for transactions. If we want
> > > transactions we should group ops like nftables. Have normal ops (add,
> > > delete, modify) and control ops (start, commit) which clone the entire
> > > tree, then ops change it, and commit presents new tree to the device.
> >
> > Yes, it does not cover deletion _and_ update/add/move within the same
> > atomic operation.
> >
> > Still any configuration could be reached from default/initial state
> > with set(<possibly many shapers>). Additionally, given any arbitrary
> > configuration, the default/initial state could be restored with a
> > single delete(<possibly many handlers>).
>
> From:
>
> q0 -. RR \
> q1 /      > SP
> q2 -. RR /
> q3 /

Call this C1

> To:
>
> q0 ------\
> q1 -------> SP
> q2 -. RR /
> q3 /

Call this C2

> You have to both delete an RR node, and set SP params on Q0 and Q1.

default -> C1:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
   --do set --json '{ "ifindex":2, "shapers": [ \
                         { "parent": { "scope": "netdev"}, "handle": {
"scope": "detached", "id": 1 }, "priority": 1 },
                         { "parent": { "scope": "netdev"}, "handle": {
"scope": "detached", "id": 2 }, "priority": 2 },
                         { "parent": { "scope": "detached", "id":1},
"handle": { "scope": "queue", "id": 1 }, "weight": 1 },
                         { "parent": { "scope": "detached", "id":1},
"handle": { "scope": "queue", "id": 2 }, "weight": 2 },
                         { "parent": { "scope": "detached" "id":2},
"handle": { "scope": "queue", "id": 3 }, "weight": 1 },
                         { "parent": { "scope": "detached" "id":2},
"handle": { "scope": "queue", "id": 4 }, "weight": 2 }]}
C1 -> C2:

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
   --do delete --json '{ "ifindex":2, "handles": [ \
                         { "scope": "queue", "id": 1 },
                         { "scope": "queue", "id": 2 },
                         { "scope": "queue", "id": 3 },
                         { "scope": "queue", "id": 4 },
                         {  "scope": "detached", "id": 1 },
                         {  "scope": "detached", "id": 2 }]}

./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
   --do set --json '{ "ifindex":2, "shapers": [ \
                         { "parent": { "scope": "netdev"}, "handle": {
"scope": "detached", "id": 1 }, "priority": 1 },
                         { "parent": { "scope": "netdev"}, "handle": {
"scope": "queue", "id": 1 }, "priority": 2 },
                         { "parent": { "scope": "netdev"}, "handle": {
"scope": "queue", "id": 2 }, "priorirty": 3 },
                         { "parent": { "scope": "detached" "id":1},
"handle": { "scope": "queue", "id": 3 }, "weight": 1 },
                         { "parent": { "scope": "detached" "id":1},
"handle": { "scope": "queue", "id": 4 }, "weight": 2 },

The goal is to allow the system to reach the final configuration, even
with the assumption the H/W does not support any configuration except
the starting one and the final one.
But the infra requires that the system _must_ support at least a 3rd
configuration, the default one.

> > The above covers any possible limitation enforced by the H/W, not just
> > the DSA use-case.
> >
> > Do you have a strong feeling for atomic transactions from any arbitrary
> > state towards any other? If so, I’d like to understand why?
>
> I don't believe this is covers all cases.

Given any pair of configurations C1, C2 we can reach C2 via C1 ->
default, default -> C2. respecting any H/W constraint.

> > Dealing with transactions allowing arbitrary any state <> any state
> > atomic changes will involve some complex logic that seems better
> > assigned to user-space.
>
> Complex logic in which part of the code?

IIRC in a past iteration you pointed out that the complexity of
computing the delta between 2 arbitrary configurations is
significantly higher than going through the empty/default
configuration.

In any case I think that the larger complexity to implement a full
transactional model. nft had proven that to be very hard and bug
prone. I really would avoid that option, if possible.

Thanks,

Paolo
Paolo Abeni July 3, 2024, 2:53 p.m. UTC | #10
On Tue, 2024-07-02 at 14:08 -0700, Jakub Kicinski wrote:
> Not sure if dropping CCs was on purpose ?

Nope, just PEBKAC here. Re-adding them and include your reply verbatim,
to avoid losing track on the ML

> On Tue, 2 Jul 2024 21:49:09 +0200 Paolo Abeni wrote:
> > On Tue, Jul 2, 2024 at 5:05 PM Jakub Kicinski <kuba@kernel.org> wrote:
> > > On Tue, 02 Jul 2024 16:21:38 +0200 Paolo Abeni wrote:  
> > > > Yes, it does not cover deletion _and_ update/add/move within the same
> > > > atomic operation.
> > > > 
> > > > Still any configuration could be reached from default/initial state
> > > > with set(<possibly many shapers>). Additionally, given any arbitrary
> > > > configuration, the default/initial state could be restored with a
> > > > single delete(<possibly many handlers>).  
> > > 
> > > From:
> > > 
> > > q0 -. RR \
> > > q1 /      > SP
> > > q2 -. RR /
> > > q3 /  
> > 
> > Call this C1
> > 
> > > To:
> > > 
> > > q0 ------\
> > > q1 -------> SP
> > > q2 -. RR /
> > > q3 /  
> > 
> > Call this C2
> > 
> > > You have to both delete an RR node, and set SP params on Q0 and Q1.  
> > 
> > default -> C1:
> > 
> > ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
> >    --do set --json '{ "ifindex":2, "shapers": [ \
> >                          { "parent": { "scope": "netdev"}, "handle": {
> > "scope": "detached", "id": 1 }, "priority": 1 },
> >                          { "parent": { "scope": "netdev"}, "handle": {
> > "scope": "detached", "id": 2 }, "priority": 2 },
> >                          { "parent": { "scope": "detached", "id":1},
> > "handle": { "scope": "queue", "id": 1 }, "weight": 1 },
> >                          { "parent": { "scope": "detached", "id":1},
> > "handle": { "scope": "queue", "id": 2 }, "weight": 2 },
> >                          { "parent": { "scope": "detached" "id":2},
> > "handle": { "scope": "queue", "id": 3 }, "weight": 1 },
> >                          { "parent": { "scope": "detached" "id":2},
> > "handle": { "scope": "queue", "id": 4 }, "weight": 2 }]}
> > C1 -> C2:
> > 
> > ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
> >    --do delete --json '{ "ifindex":2, "handles": [ \
> >                          { "scope": "queue", "id": 1 },
> >                          { "scope": "queue", "id": 2 },
> >                          { "scope": "queue", "id": 3 },
> >                          { "scope": "queue", "id": 4 },
> >                          {  "scope": "detached", "id": 1 },
> >                          {  "scope": "detached", "id": 2 }]}
> > 
> > ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/shaper.yaml \
> >    --do set --json '{ "ifindex":2, "shapers": [ \
> >                          { "parent": { "scope": "netdev"}, "handle": {
> > "scope": "detached", "id": 1 }, "priority": 1 },
> >                          { "parent": { "scope": "netdev"}, "handle": {
> > "scope": "queue", "id": 1 }, "priority": 2 },
> >                          { "parent": { "scope": "netdev"}, "handle": {
> > "scope": "queue", "id": 2 }, "priorirty": 3 },
> >                          { "parent": { "scope": "detached" "id":1},
> > "handle": { "scope": "queue", "id": 3 }, "weight": 1 },
> >                          { "parent": { "scope": "detached" "id":1},
> > "handle": { "scope": "queue", "id": 4 }, "weight": 2 },
> > 
> > The goal is to allow the system to reach the final configuration, even
> > with the assumption the H/W does not support any configuration except
> > the starting one and the final one.
> > But the infra requires that the system _must_ support at least a 3rd
> > configuration, the default one.
> 
> This assumes there is a single daemon responsible for control of all of
> the shaping, which I think the endless BPF multi-program back and forth
> proves to be incorrect.
> 
> We'd also lose stats each time, and make reconfiguration disruptive to
> running workloads.

Note there is no stats support for the shapers, nor is planned, nor the
H/W I know of have any support for it.

The destructive operations will be needed only when the configuration
change is inherently destructive. Nothing prevents the user-space to
push a direct configuration change when possible - which should be the
most frequent case, in practice.

Regarding the entity responsible for control, I had in mind a single
one, yes. I read the above as you are looking forward to e.g. different
applications configuring their own shaping accessing directly the NL
interface, am I correct? Why can’t such applications talk to that
daemon, instead? 

Anyway different applications must touch disjoint resources (e.g.
disjoint queues sets) right? In such a case even multiple destructive
configuration changes (on disjoint resources set) will not be
problematic.

Still if we want to allow somewhat consistent, concurrent, destructive
configuration changes on shared resource (why? It sounds a bit of
overdesign at this point), we could extend the set() operation to
additional support shapers deletion, e.g. adding an additional ‘delete’
flag attribute to the ‘handle’ sub-set.

> > > > The above covers any possible limitation enforced by the H/W, not just
> > > > the DSA use-case.
> > > > 
> > > > Do you have a strong feeling for atomic transactions from any arbitrary
> > > > state towards any other? If so, I’d like to understand why?  
> > > 
> > > I don't believe this is covers all cases.  
> > 
> > Given any pair of configurations C1, C2 we can reach C2 via C1 ->
> > default, default -> C2. respecting any H/W constraint.
> > 
> > > > Dealing with transactions allowing arbitrary any state <> any state
> > > > atomic changes will involve some complex logic that seems better
> > > > assigned to user-space.  
> > > 
> > > Complex logic in which part of the code?  
> > 
> > IIRC in a past iteration you pointed out that the complexity of
> > computing the delta between 2 arbitrary configurations is
> > significantly higher than going through the empty/default
> > configuration.
> 
> For a fixed-layout scheduler where HW blocks have a fixed mapping 
> with user's hierarchy - it's easier to program the shapers from 
> the hierarchy directly. Since node maps to the same set of registers
> reprogramming will be writing to a register a value it already has
> - a noop. That's different than doing a full reset and reprogram 
> each time, as two separate calls from user space.
> 
> For Intel's cases OTOH, when each command is a separate FW call
> we can't actually enforce the atomic transitions, right?
> Your code seems to handle returns smaller the number of commands,
> from which I infer that we may execute half of the modification?

Yes, the code and the NL API allows the NIC to do the update
incrementally, and AFAICS Intel ICE has no support for complex
transactions.
Somewhat enforcing atomic transitions will be quite complex at best and
is not need to accomplish the stated goal - allow reconfiguration even
when the H/W does not support intermediate states.

Do we need to enforce atomicity? Why? NFT has proven that a
transational model implementation is hard, and should be avoided if
possible. 

> IOW for Andrew's HW - he'd probably prefer to look at the resulting
> tree, no matter what previous state we were in. For Intel we _can't_
> support atomic commands, if they span multiple cycles of FW exchanges?

My understanding is that we can’t have atomic updates on Intel, from
firmare perspective. As said, I don’t think it’s necessary to support
them.

WRT the DSA H/W, do you mean the core should always set() the whole
known tree to the driver, regardless of the specific changes asked by
the user-space? If so, what about letting the driver expose some
capability (or private flag) asking the core for such behavior? So that
the driver will do the largish set() only with the H/W requiring that.

Anyway I'm not sure the mentioned DSA H/W would benefit from always
receiving the whole configuration. e.g. changing the weight for a
single queue shaper would not need the whole data-set.

> > In any case I think that the larger complexity to implement a full
> > transactional model. nft had proven that to be very hard and bug
> > prone. I really would avoid that option, if possible.
> 
> Maybe instead of discussing the user space API it'd be more beneficial
> to figure out a solid way of translating the existing APIs into the new
> model?

Could you please rephrase? I think all the arguments discussed here are
related to the model - at some point that impact the user space API,
too.

Thanks,

Paolo
Jakub Kicinski July 3, 2024, 9:20 p.m. UTC | #11
On Wed, 03 Jul 2024 16:53:38 +0200 Paolo Abeni wrote:
> Note there is no stats support for the shapers, nor is planned, nor the
> H/W I know of have any support for it.
> 
> The destructive operations will be needed only when the configuration
> change is inherently destructive. Nothing prevents the user-space to
> push a direct configuration change when possible - which should be the
> most frequent case, in practice.
> 
> Regarding the entity responsible for control, I had in mind a single
> one, yes. I read the above as you are looking forward to e.g. different
> applications configuring their own shaping accessing directly the NL
> interface, am I correct? Why can’t such applications talk to that
> daemon, instead? 

We know such daemon did not materialize in other cases.
We can assume there is a daemon if we think that's a good design.
We shouldn't use a mirage of a third-party daemon to wave away
problems we didn't manage to solve.

> Anyway different applications must touch disjoint resources (e.g.
> disjoint queues sets) right? In such a case even multiple destructive
> configuration changes (on disjoint resources set) will not be
> problematic.
> 
> Still if we want to allow somewhat consistent, concurrent, destructive
> configuration changes on shared resource (why? It sounds a bit of
> overdesign at this point), we could extend the set() operation to
> additional support shapers deletion, e.g. adding an additional ‘delete’
> flag attribute to the ‘handle’ sub-set.

To judge whether it's an over-design we'd need to know what the user
scenarios are, those are not included in the series AFAICS.

> > For a fixed-layout scheduler where HW blocks have a fixed mapping 
> > with user's hierarchy - it's easier to program the shapers from 
> > the hierarchy directly. Since node maps to the same set of registers
> > reprogramming will be writing to a register a value it already has
> > - a noop. That's different than doing a full reset and reprogram 
> > each time, as two separate calls from user space.
> > 
> > For Intel's cases OTOH, when each command is a separate FW call
> > we can't actually enforce the atomic transitions, right?
> > Your code seems to handle returns smaller the number of commands,
> > from which I infer that we may execute half of the modification?  
> 
> Yes, the code and the NL API allows the NIC to do the update
> incrementally, and AFAICS Intel ICE has no support for complex
> transactions.
> Somewhat enforcing atomic transitions will be quite complex at best and
> is not need to accomplish the stated goal - allow reconfiguration even
> when the H/W does not support intermediate states.
> 
> Do we need to enforce atomicity? Why? NFT has proven that a
> transational model implementation is hard, and should be avoided if
> possible. 
> 
> > IOW for Andrew's HW - he'd probably prefer to look at the resulting
> > tree, no matter what previous state we were in. For Intel we _can't_
> > support atomic commands, if they span multiple cycles of FW exchanges?  
> 
> My understanding is that we can’t have atomic updates on Intel, from
> firmare perspective. As said, I don’t think it’s necessary to support
> them.
> 
> WRT the DSA H/W, do you mean the core should always set() the whole
> known tree to the driver, regardless of the specific changes asked by
> the user-space? If so, what about letting the driver expose some
> capability (or private flag) asking the core for such behavior? So that
> the driver will do the largish set() only with the H/W requiring that.
> 
> Anyway I'm not sure the mentioned DSA H/W would benefit from always
> receiving the whole configuration. e.g. changing the weight for a
> single queue shaper would not need the whole data-set.

To be blunt - what I'm getting at is that the API mirrors Intel's FW
API with an extra kludge to support the DSA H/W - in the end matching
neither what the DSA wants nor what Intel can do.

> > > In any case I think that the larger complexity to implement a full
> > > transactional model. nft had proven that to be very hard and bug
> > > prone. I really would avoid that option, if possible.  
> > 
> > Maybe instead of discussing the user space API it'd be more beneficial
> > to figure out a solid way of translating the existing APIs into the new
> > model?  
> 
> Could you please rephrase? I think all the arguments discussed here are
> related to the model - at some point that impact the user space API,
> too.

I was hoping that implementing the code that mirrors the existing APIs
into tree operations would teach us something about the primitives 
that operate on the tree. The proposed primitives are really low level,
which is why we need to "fuse" them into multi-change operations.

More specifically what I described a few emails up was a
group+schedule+limit paradigm. Instead of describing single moves 
and parameter setting - transformation would describe inputs,
scheduling across them, and rate limit to apply. But one transformation
would always operate on at most one MUX node.

IOW I'm trying to explore whether we can find a language of
transformations which will be more complex than single micro-operations
on the tree, but sufficiently expressive to provide atomic
transformations without transactions of micro-ops.
Paolo Abeni July 8, 2024, 7:42 p.m. UTC | #12
On Wed, 2024-07-03 at 14:20 -0700, Jakub Kicinski wrote:
> On Wed, 03 Jul 2024 16:53:38 +0200 Paolo Abeni wrote:
> 
> > Anyway different applications must touch disjoint resources (e.g.
> > disjoint queues sets) right? In such a case even multiple destructive
> > configuration changes (on disjoint resources set) will not be
> > problematic.
> > 
> > Still if we want to allow somewhat consistent, concurrent, destructive
> > configuration changes on shared resource (why? It sounds a bit of
> > overdesign at this point), we could extend the set() operation to
> > additional support shapers deletion, e.g. adding an additional ‘delete’
> > flag attribute to the ‘handle’ sub-set.
> 
> To judge whether it's an over-design we'd need to know what the user
> scenarios are, those are not included in the series AFAICS.

My bad, in the cover I referred to the prior discussions without
explicitly quoting the contents.

The initial goal here was to allow the user-space to configure per-
queue, H/W offloaded, TX shaping.

That later evolved in introducing an in-kernel H/W offload TX shaping
API capable of replacing the many existing similar in-kernel APIs and
supporting the foreseeable H/W capabilities.

> To be blunt - what I'm getting at is that the API mirrors Intel's FW
> API with an extra kludge to support the DSA H/W - in the end matching
> neither what the DSA wants nor what Intel can do.

The API is similar to Intel’s FW API because to my understanding the
underlying design - an arbitrary tree - is the most complete
representation possible for shaping H/W. AFAICT is also similar to what
other NIC vendors’ offer.

I don’t see why the APIs don’t match what Intel can do, could you
please elaborate on that?

> IOW I'm trying to explore whether we can find a language of
> transformations which will be more complex than single micro-operations
> on the tree, but sufficiently expressive to provide atomic
> transformations without transactions of micro-ops.

I personally find it straight-forward describing the scenario you
proposed in terms of the simple operations as allowed by this series. I
also think it’s easier to build arbitrarily complex scenarios in terms
of simple operations instead of trying to put enough complexity in the
language to describe everything. It will easily lose flexibility or
increase complexity for unclear gain. Why do you think that would be a
better approach?

Also the simple building blocks approach is IMHO closer to the original
use-case.

Are there any other reasons for atomic operations, beyond addressing
low-end H/W? Note that WRT the specific limitations we are aware of,
the APIs allows atomic conf changes without resorting to transition
into the default configuration.

My biggest concern is that from my perspective we are incrementally
adding requisites, with increasing complexity, further and further away
from the initial use-case. 

Thanks,

Paolo
Jakub Kicinski July 9, 2024, 2:54 a.m. UTC | #13
On Mon, 08 Jul 2024 21:42:00 +0200 Paolo Abeni wrote:
> > To judge whether it's an over-design we'd need to know what the user
> > scenarios are, those are not included in the series AFAICS.  
> 
> My bad, in the cover I referred to the prior discussions without
> explicitly quoting the contents.
> 
> The initial goal here was to allow the user-space to configure per-
> queue, H/W offloaded, TX shaping.

Per-queue Tx shaping is already supported.

I mean "user scenarios" in the agile programming sense as in something
that gets closer to production use cases. "Set rate limit on a queue"
is a suggested solution not a statement of a problem.

> That later evolved in introducing an in-kernel H/W offload TX shaping
> API capable of replacing the many existing similar in-kernel APIs and
> supporting the foreseeable H/W capabilities.
> 
> > To be blunt - what I'm getting at is that the API mirrors Intel's FW
> > API with an extra kludge to support the DSA H/W - in the end matching
> > neither what the DSA wants nor what Intel can do.  
> 
> The API is similar to Intel’s FW API because to my understanding the
> underlying design - an arbitrary tree - is the most complete
> representation possible for shaping H/W. AFAICT is also similar to what
> other NIC vendors’ offer.
> 
> I don’t see why the APIs don’t match what Intel can do, could you
> please elaborate on that?

That's not the main point I was making, I was complaining about how 
the "extension" to support DSA HW was bolted onto this API.

Undeniably the implementation must be stored as a tree (with some
max depth). That doesn't imply that, for example, arbitrary re-parenting
of non-leaf nodes is an operation that makes sense for all devices.
From memory devlink rate doesn't allow mixing some node types after one
parent, too.

> > IOW I'm trying to explore whether we can find a language of
> > transformations which will be more complex than single micro-operations
> > on the tree, but sufficiently expressive to provide atomic
> > transformations without transactions of micro-ops.  
> 
> I personally find it straight-forward describing the scenario you
> proposed in terms of the simple operations as allowed by this series. I
> also think it’s easier to build arbitrarily complex scenarios in terms
> of simple operations instead of trying to put enough complexity in the
> language to describe everything. It will easily lose flexibility or
> increase complexity for unclear gain. Why do you think that would be a
> better approach?

I strongly dislike the operation grouping/batching as it stands in v1.
Do you think that part of the design is clean?

I also disagree with the assertion that having a language of
transformations more advanced that "add/move/delete" increases
complexity. Language gives you properties you can reason about.
That's why rbtree, B-tree and other algos define a language of
transformations. Naive ops make arbitrary trees easy but I put
it to you that allowing arbitrary trees without any enforced 
invariants will breed far more complexity and bugs than a properly
designed language :)


The primary operation of interest is, in fact: given a set of resources
(queues or netdevs) which currently feed one mux node - build 
a sub-hierarchy.

Use case 1 - container b/w sharing. Container manager will want to
group a set of queues, and feed a higher layer RR node (so that number
of queues doesn't impact load sharing).

Say we have two containers (c1, c2 represent queues assigned to them);
before container 3 starts:

c1 - \
c1 -  >RR
c1 - /    > RR(root)
c2 - \ RR
c2 - /

allocate 2 queues:

c1 - \
c1 - ->RR
c1 - /    > RR(root)
c2 - \ RR
c2 - /     '
       qX /
       qY /

hierarchize ("group([qX, qY], type="rr")):

c1 - \
c1 -  >RR
c1 - /    \ 
c2 - \ RR -> RR(root)
c2 - /    /
c3 - \ RR 
c3 - /

The container manager just wants so say "take the new queues (X, Y),
put them under an RR node". If the language is build around creating
mux nodes - that's a single call.

Note that the RR(root) node is implicit (in your API it's not visible
but it is implied).

Users case 2 - delegation - the neat thing about using such construct is
that as you can see we never referred to output, i.e. RR(root).
The output can be implied by whatever node the queues already output to.
So say container 1 (c1) wants to set a b/w limit on two of it's queues
(let's call its queues A B C):

 rr_id = group([B C], type="rr")
 rate_limit(rr_id, 1Gbps)

and end up with:

cA ----- \
cB - \RR*/ RR
cC - /        \ 
    c2 - \ RR -> RR(root)
    c2 - /    /
    c3 - \ RR 
    c3 - /

* new node, also has rate limit set

You don't have to worry about parentage permissions. Container can only
add nodes (which is always safe) or delete nodes (and we can trivially
enforce it only deletes node it has created itself).

> Also the simple building blocks approach is IMHO closer to the original
> use-case.
> 
> Are there any other reasons for atomic operations, beyond addressing
> low-end H/W?

I think atomic changes are convenient to match what the user wants to
do. And the second use case is that I do believe there's a real need
to allow uncoordinated agents to modify sections on the hierarchy.

Neither of those are hard requirements, but I think any application
driven requirement should come before "that's the FW API for vendor X"
I hope this we can agree on.


Thinking about it (for longer than I care to admit), one concern I have
about the "mux creation" API I described above is that it forces
existence of leaf and non-leaf nodes at the same parent, at least
transiently.

Can we go back to an API with an explicit create/modify/delete?
All we need for Andrew's use case, I believe, is to be able to
"somewhat atomically" move leaf nodes.
diff mbox series

Patch

diff --git a/Documentation/netlink/specs/shaper.yaml b/Documentation/netlink/specs/shaper.yaml
new file mode 100644
index 000000000000..8563c85de68d
--- /dev/null
+++ b/Documentation/netlink/specs/shaper.yaml
@@ -0,0 +1,202 @@ 
+# SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+
+name: net_shaper
+
+doc: Network HW Rate Limiting offload
+
+definitions:
+  -
+    type: enum
+    name: scope
+    doc: the different scopes where a shaper can be attached
+    entries:
+      - name: unspec
+        doc: The scope is not specified
+      -
+        name: port
+        doc: The root shaper for the whole H/W.
+      -
+        name: netdev
+        doc: The main shaper for the given network device.
+      -
+        name: queue
+        doc: The shaper is attached to the given device queue.
+      -
+        name: detached
+        doc: |
+             The shaper can be attached to port, netdev or other
+             detached shapers, allowing nesting and grouping of
+             netdev or queues.
+    render-max: true
+  -
+    type: enum
+    name: metric
+    doc: different metric each shaper can support
+    entries:
+      -
+        name: bps
+        doc: Shaper operates on a bits per second basis
+      -
+        name: pps
+        doc: Shaper operates on a packets per second basis
+
+attribute-sets:
+  -
+    name: net_shaper
+    attributes:
+      -
+        name: ifindex
+        type: u32
+      -
+        name: parent
+        type: nest
+        nested-attributes: handle
+      -
+        name: handle
+        type: nest
+        nested-attributes: handle
+      -
+        name: metric
+        type: u32
+        enum: metric
+      -
+        name: bw-min
+        type: u64
+      -
+        name: bw-max
+        type: u64
+      -
+        name: burst
+        type: u64
+      -
+        name: priority
+        type: u32
+      -
+        name: weight
+        type: u32
+      -
+        name: scope
+        type: u32
+        enum: scope
+      -
+        name: id
+        type: u32
+      -
+         name: handles
+         type: nest
+         multi-attr: true
+         nested-attributes: handle
+      -
+        name: shapers
+        type: nest
+        multi-attr: true
+        nested-attributes: ns-info
+      -
+        name: modified
+        type: u32
+      -
+        name: pad
+        type: pad
+  -
+    name: handle
+    subset-of: net_shaper
+    attributes:
+      -
+        name: scope
+      -
+        name: id
+  -
+    name: ns-info
+    subset-of: net_shaper
+    attributes:
+      -
+        name: parent
+      -
+        name: handle
+      -
+        name: metric
+      -
+        name: bw-min
+      -
+        name: bw-max
+      -
+        name: burst
+      -
+        name: priority
+      -
+        name: weight
+
+operations:
+  list:
+    -
+      name: get
+      doc: |
+        Get / Dump information about a/all the shaper for a given device
+      attribute-set: net_shaper
+      flags: [ admin-perm ]
+
+      do:
+        request:
+          attributes:
+            - ifindex
+            - handle
+        reply:
+          attributes: &ns-attrs
+            - parent
+            - handle
+            - metric
+            - bw-min
+            - bw-max
+            - burst
+            - priority
+            - weight
+
+      dump:
+        request:
+          attributes:
+            - ifindex
+        reply:
+          attributes: *ns-attrs
+    -
+      name: set
+      doc: |
+        Create or configures the specified shapers.
+        The update is atomic with respect to all shaper
+        affected by a single command, and is allowed to
+        affect a subset of the specified shapers, e.g.
+        due to H/W resources exhaustion. In such case
+        the update stops at the first failure, the extack
+        is set accordingly.
+      attribute-set: net_shaper
+      flags: [ admin-perm ]
+
+      do:
+        request:
+          attributes:
+            - ifindex
+            - shapers
+        reply:
+          attributes:
+            - modified
+
+    -
+      name: delete
+      doc: |
+        Clear (remove) the specified shaper.
+        The update is atomic with respect to all shaper
+        affected by a single command, and is allowed to
+        affect a subset of the specified shapers, e.g.
+        due to H/W resources exhaustion. In such case
+        the update stops at the first failure, the extack
+        is set accordingly.
+      attribute-set: net_shaper
+      flags: [ admin-perm ]
+
+      do:
+        request:
+          attributes:
+            - ifindex
+            - handles
+        reply:
+          attributes:
+            - modified
diff --git a/include/uapi/linux/net_shaper.h b/include/uapi/linux/net_shaper.h
new file mode 100644
index 000000000000..7e6b655e6c6d
--- /dev/null
+++ b/include/uapi/linux/net_shaper.h
@@ -0,0 +1,73 @@ 
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/shaper.yaml */
+/* YNL-GEN uapi header */
+
+#ifndef _UAPI_LINUX_NET_SHAPER_H
+#define _UAPI_LINUX_NET_SHAPER_H
+
+#define NET_SHAPER_FAMILY_NAME		"net_shaper"
+#define NET_SHAPER_FAMILY_VERSION	1
+
+/**
+ * enum net_shaper_scope - the different scopes where a shaper can be attached
+ * @NET_SHAPER_SCOPE_UNSPEC: The scope is not specified
+ * @NET_SHAPER_SCOPE_PORT: The root shaper for the whole H/W.
+ * @NET_SHAPER_SCOPE_NETDEV: The main shaper for the given network device.
+ * @NET_SHAPER_SCOPE_QUEUE: The shaper is attached to the given device queue.
+ * @NET_SHAPER_SCOPE_DETACHED: The shaper can be attached to port, netdev or
+ *   other detached shapers, allowing nesting and grouping of netdev or queues.
+ */
+enum net_shaper_scope {
+	NET_SHAPER_SCOPE_UNSPEC,
+	NET_SHAPER_SCOPE_PORT,
+	NET_SHAPER_SCOPE_NETDEV,
+	NET_SHAPER_SCOPE_QUEUE,
+	NET_SHAPER_SCOPE_DETACHED,
+
+	/* private: */
+	__NET_SHAPER_SCOPE_MAX,
+	NET_SHAPER_SCOPE_MAX = (__NET_SHAPER_SCOPE_MAX - 1)
+};
+
+/**
+ * enum net_shaper_metric - different metric each shaper can support
+ * @NET_SHAPER_METRIC_BPS: Shaper operates on a bits per second basis
+ * @NET_SHAPER_METRIC_PPS: Shaper operates on a packets per second basis
+ */
+enum net_shaper_metric {
+	NET_SHAPER_METRIC_BPS,
+	NET_SHAPER_METRIC_PPS,
+};
+
+enum {
+	NET_SHAPER_A_IFINDEX = 1,
+	NET_SHAPER_A_PARENT,
+	NET_SHAPER_A_HANDLE,
+	NET_SHAPER_A_METRIC,
+	NET_SHAPER_A_BW_MIN,
+	NET_SHAPER_A_BW_MAX,
+	NET_SHAPER_A_BURST,
+	NET_SHAPER_A_PRIORITY,
+	NET_SHAPER_A_WEIGHT,
+	NET_SHAPER_A_SCOPE,
+	NET_SHAPER_A_ID,
+	NET_SHAPER_A_HANDLES,
+	NET_SHAPER_A_SHAPERS,
+	NET_SHAPER_A_MODIFIED,
+	NET_SHAPER_A_PAD,
+
+	__NET_SHAPER_A_MAX,
+	NET_SHAPER_A_MAX = (__NET_SHAPER_A_MAX - 1)
+};
+
+enum {
+	NET_SHAPER_CMD_GET = 1,
+	NET_SHAPER_CMD_SET,
+	NET_SHAPER_CMD_DELETE,
+
+	__NET_SHAPER_CMD_MAX,
+	NET_SHAPER_CMD_MAX = (__NET_SHAPER_CMD_MAX - 1)
+};
+
+#endif /* _UAPI_LINUX_NET_SHAPER_H */
diff --git a/net/Kconfig b/net/Kconfig
index d27d0deac0bf..31fccfed04f7 100644
--- a/net/Kconfig
+++ b/net/Kconfig
@@ -66,6 +66,9 @@  config SKB_DECRYPTED
 config SKB_EXTENSIONS
 	bool
 
+config NET_SHAPER
+	bool
+
 menu "Networking options"
 
 source "net/packet/Kconfig"
diff --git a/net/Makefile b/net/Makefile
index 65bb8c72a35e..60ed5190eda8 100644
--- a/net/Makefile
+++ b/net/Makefile
@@ -79,3 +79,4 @@  obj-$(CONFIG_XDP_SOCKETS)	+= xdp/
 obj-$(CONFIG_MPTCP)		+= mptcp/
 obj-$(CONFIG_MCTP)		+= mctp/
 obj-$(CONFIG_NET_HANDSHAKE)	+= handshake/
+obj-$(CONFIG_NET_SHAPER)	+= shaper/
diff --git a/net/shaper/Makefile b/net/shaper/Makefile
new file mode 100644
index 000000000000..13375884d60e
--- /dev/null
+++ b/net/shaper/Makefile
@@ -0,0 +1,9 @@ 
+# SPDX-License-Identifier: GPL-2.0-only
+#
+# Makefile for the Generic HANDSHAKE service
+#
+# Copyright (c) 2024, Red Hat, Inc.
+#
+
+obj-y += shaper.o shaper_nl_gen.o
+
diff --git a/net/shaper/shaper.c b/net/shaper/shaper.c
new file mode 100644
index 000000000000..49de88c68e2f
--- /dev/null
+++ b/net/shaper/shaper.c
@@ -0,0 +1,34 @@ 
+// SPDX-License-Identifier: GPL-2.0-or-later
+
+#include <linux/kernel.h>
+#include <linux/skbuff.h>
+
+#include "shaper_nl_gen.h"
+
+int net_shaper_nl_get_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int net_shaper_nl_get_dumpit(struct sk_buff *skb,
+			     struct netlink_callback *cb)
+{
+	return -EOPNOTSUPP;
+}
+
+int net_shaper_nl_set_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+int net_shaper_nl_delete_doit(struct sk_buff *skb, struct genl_info *info)
+{
+	return -EOPNOTSUPP;
+}
+
+static int __init shaper_init(void)
+{
+	return genl_register_family(&net_shaper_nl_family);
+}
+
+subsys_initcall(shaper_init);
diff --git a/net/shaper/shaper_nl_gen.c b/net/shaper/shaper_nl_gen.c
new file mode 100644
index 000000000000..159b4cb6d2b8
--- /dev/null
+++ b/net/shaper/shaper_nl_gen.c
@@ -0,0 +1,93 @@ 
+// SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause)
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/shaper.yaml */
+/* YNL-GEN kernel source */
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include "shaper_nl_gen.h"
+
+#include <uapi/linux/net_shaper.h>
+
+/* Common nested types */
+const struct nla_policy net_shaper_handle_nl_policy[NET_SHAPER_A_ID + 1] = {
+	[NET_SHAPER_A_SCOPE] = NLA_POLICY_MAX(NLA_U32, 4),
+	[NET_SHAPER_A_ID] = { .type = NLA_U32, },
+};
+
+const struct nla_policy net_shaper_ns_info_nl_policy[NET_SHAPER_A_WEIGHT + 1] = {
+	[NET_SHAPER_A_PARENT] = NLA_POLICY_NESTED(net_shaper_handle_nl_policy),
+	[NET_SHAPER_A_HANDLE] = NLA_POLICY_NESTED(net_shaper_handle_nl_policy),
+	[NET_SHAPER_A_METRIC] = NLA_POLICY_MAX(NLA_U32, 1),
+	[NET_SHAPER_A_BW_MIN] = { .type = NLA_U64, },
+	[NET_SHAPER_A_BW_MAX] = { .type = NLA_U64, },
+	[NET_SHAPER_A_BURST] = { .type = NLA_U64, },
+	[NET_SHAPER_A_PRIORITY] = { .type = NLA_U32, },
+	[NET_SHAPER_A_WEIGHT] = { .type = NLA_U32, },
+};
+
+/* NET_SHAPER_CMD_GET - do */
+static const struct nla_policy net_shaper_get_do_nl_policy[NET_SHAPER_A_HANDLE + 1] = {
+	[NET_SHAPER_A_IFINDEX] = { .type = NLA_U32, },
+	[NET_SHAPER_A_HANDLE] = NLA_POLICY_NESTED(net_shaper_handle_nl_policy),
+};
+
+/* NET_SHAPER_CMD_GET - dump */
+static const struct nla_policy net_shaper_get_dump_nl_policy[NET_SHAPER_A_IFINDEX + 1] = {
+	[NET_SHAPER_A_IFINDEX] = { .type = NLA_U32, },
+};
+
+/* NET_SHAPER_CMD_SET - do */
+static const struct nla_policy net_shaper_set_nl_policy[NET_SHAPER_A_SHAPERS + 1] = {
+	[NET_SHAPER_A_IFINDEX] = { .type = NLA_U32, },
+	[NET_SHAPER_A_SHAPERS] = NLA_POLICY_NESTED(net_shaper_ns_info_nl_policy),
+};
+
+/* NET_SHAPER_CMD_DELETE - do */
+static const struct nla_policy net_shaper_delete_nl_policy[NET_SHAPER_A_HANDLES + 1] = {
+	[NET_SHAPER_A_IFINDEX] = { .type = NLA_U32, },
+	[NET_SHAPER_A_HANDLES] = NLA_POLICY_NESTED(net_shaper_handle_nl_policy),
+};
+
+/* Ops table for net_shaper */
+static const struct genl_split_ops net_shaper_nl_ops[] = {
+	{
+		.cmd		= NET_SHAPER_CMD_GET,
+		.doit		= net_shaper_nl_get_doit,
+		.policy		= net_shaper_get_do_nl_policy,
+		.maxattr	= NET_SHAPER_A_HANDLE,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= NET_SHAPER_CMD_GET,
+		.dumpit		= net_shaper_nl_get_dumpit,
+		.policy		= net_shaper_get_dump_nl_policy,
+		.maxattr	= NET_SHAPER_A_IFINDEX,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DUMP,
+	},
+	{
+		.cmd		= NET_SHAPER_CMD_SET,
+		.doit		= net_shaper_nl_set_doit,
+		.policy		= net_shaper_set_nl_policy,
+		.maxattr	= NET_SHAPER_A_SHAPERS,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+	{
+		.cmd		= NET_SHAPER_CMD_DELETE,
+		.doit		= net_shaper_nl_delete_doit,
+		.policy		= net_shaper_delete_nl_policy,
+		.maxattr	= NET_SHAPER_A_HANDLES,
+		.flags		= GENL_ADMIN_PERM | GENL_CMD_CAP_DO,
+	},
+};
+
+struct genl_family net_shaper_nl_family __ro_after_init = {
+	.name		= NET_SHAPER_FAMILY_NAME,
+	.version	= NET_SHAPER_FAMILY_VERSION,
+	.netnsok	= true,
+	.parallel_ops	= true,
+	.module		= THIS_MODULE,
+	.split_ops	= net_shaper_nl_ops,
+	.n_split_ops	= ARRAY_SIZE(net_shaper_nl_ops),
+};
diff --git a/net/shaper/shaper_nl_gen.h b/net/shaper/shaper_nl_gen.h
new file mode 100644
index 000000000000..663406224d62
--- /dev/null
+++ b/net/shaper/shaper_nl_gen.h
@@ -0,0 +1,25 @@ 
+/* SPDX-License-Identifier: ((GPL-2.0 WITH Linux-syscall-note) OR BSD-3-Clause) */
+/* Do not edit directly, auto-generated from: */
+/*	Documentation/netlink/specs/shaper.yaml */
+/* YNL-GEN kernel header */
+
+#ifndef _LINUX_NET_SHAPER_GEN_H
+#define _LINUX_NET_SHAPER_GEN_H
+
+#include <net/netlink.h>
+#include <net/genetlink.h>
+
+#include <uapi/linux/net_shaper.h>
+
+/* Common nested types */
+extern const struct nla_policy net_shaper_handle_nl_policy[NET_SHAPER_A_ID + 1];
+extern const struct nla_policy net_shaper_ns_info_nl_policy[NET_SHAPER_A_WEIGHT + 1];
+
+int net_shaper_nl_get_doit(struct sk_buff *skb, struct genl_info *info);
+int net_shaper_nl_get_dumpit(struct sk_buff *skb, struct netlink_callback *cb);
+int net_shaper_nl_set_doit(struct sk_buff *skb, struct genl_info *info);
+int net_shaper_nl_delete_doit(struct sk_buff *skb, struct genl_info *info);
+
+extern struct genl_family net_shaper_nl_family;
+
+#endif /* _LINUX_NET_SHAPER_GEN_H */