diff mbox series

[RFC,net-next,v2,1/2] devlink: add whole device devlink instance

Message ID 20250219164410.35665-2-przemyslaw.kitszel@intel.com (mailing list archive)
State RFC
Delegated to: Netdev Maintainers
Headers show
Series devlink: whole-device, resource .occ_set() | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit fail Errors and warnings before: 0 this patch: 1
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers warning 1 maintainers not CCed: horms@kernel.org
netdev/build_clang fail Errors and warnings before: 0 this patch: 2
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn fail Errors and warnings before: 0 this patch: 2
netdev/checkpatch warning WARNING: 'Intented' may be misspelled - perhaps 'Intended'? WARNING: line length of 88 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc fail Errors and warnings before: 6 this patch: 7
netdev/source_inline success Was 0 now: 0

Commit Message

Przemek Kitszel Feb. 19, 2025, 4:32 p.m. UTC
Add a support for whole device devlink instance. Intented as a entity
over all PF devices on given physical device.

In case of ice driver we have multiple PF devices (with their devlink
dev representation), that have separate drivers loaded. However those
still do share lots of resources due to being the on same HW. Examples
include PTP clock and RSS LUT. Historically such stuff was assigned to
PF0, but that was both not clear and not working well. Now such stuff
is moved to be covered into struct ice_adapter, there is just one instance
of such per HW.

This patch adds a devlink instance that corresponds to that ice_adapter,
to allow arbitrage over resources (as RSS LUT) via it (further in the
series (RFC NOTE: stripped out so far)).

Thanks to Wojciech Drewek for very nice naming of the devlink instance:
PF0:		pci/0000:00:18.0
whole-dev:	pci/0000:00:18
But I made this a param for now (driver is free to pass just "whole-dev").

$ devlink dev # (Interesting part of output only)
pci/0000:af:00:
  nested_devlink:
    pci/0000:af:00.0
    pci/0000:af:00.1
    pci/0000:af:00.2
    pci/0000:af:00.3
    pci/0000:af:00.4
    pci/0000:af:00.5
    pci/0000:af:00.6
    pci/0000:af:00.7

Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>
---
 net/devlink/devl_internal.h | 14 +++++----
 net/devlink/core.c          | 58 +++++++++++++++++++++++++++++--------
 net/devlink/netlink.c       |  4 +--
 net/devlink/port.c          |  4 +--
 4 files changed, 58 insertions(+), 22 deletions(-)

Comments

Jacob Keller Feb. 19, 2025, 10:11 p.m. UTC | #1
On 2/19/2025 8:32 AM, Przemek Kitszel wrote:
> Add a support for whole device devlink instance. Intented as a entity
> over all PF devices on given physical device.
> 
> In case of ice driver we have multiple PF devices (with their devlink
> dev representation), that have separate drivers loaded. However those
> still do share lots of resources due to being the on same HW. Examples
> include PTP clock and RSS LUT. Historically such stuff was assigned to
> PF0, but that was both not clear and not working well. Now such stuff
> is moved to be covered into struct ice_adapter, there is just one instance
> of such per HW.
> 
> This patch adds a devlink instance that corresponds to that ice_adapter,
> to allow arbitrage over resources (as RSS LUT) via it (further in the
> series (RFC NOTE: stripped out so far)).
> 
> Thanks to Wojciech Drewek for very nice naming of the devlink instance:
> PF0:		pci/0000:00:18.0
> whole-dev:	pci/0000:00:18
> But I made this a param for now (driver is free to pass just "whole-dev").
> 
> $ devlink dev # (Interesting part of output only)
> pci/0000:af:00:
>   nested_devlink:
>     pci/0000:af:00.0
>     pci/0000:af:00.1
>     pci/0000:af:00.2
>     pci/0000:af:00.3
>     pci/0000:af:00.4
>     pci/0000:af:00.5
>     pci/0000:af:00.6
>     pci/0000:af:00.7
> 

This adds an additional devlink interface instead of replacing the
existing scheme? Seems reasonable to avoid compatibility issues with
older driver versions. I had wanted to use a single instance pretty much
from my initial attempts at flash update, but ended up giving up at the
time.

I do like that we can see the nesting so its clear which ones are connected.

One downside to this approach is in dealing with something like direct
assignment with virtualization. In practice, I think that already exists
because of HW limitations, and I would expect most such setups want to
assign the entire device rather than just one of its functions.
Jakub Kicinski Feb. 21, 2025, 1:45 a.m. UTC | #2
On Wed, 19 Feb 2025 17:32:54 +0100 Przemek Kitszel wrote:
> Add a support for whole device devlink instance. Intented as a entity
> over all PF devices on given physical device.
> 
> In case of ice driver we have multiple PF devices (with their devlink
> dev representation), that have separate drivers loaded. However those
> still do share lots of resources due to being the on same HW. Examples
> include PTP clock and RSS LUT. Historically such stuff was assigned to
> PF0, but that was both not clear and not working well. Now such stuff
> is moved to be covered into struct ice_adapter, there is just one instance
> of such per HW.
> 
> This patch adds a devlink instance that corresponds to that ice_adapter,
> to allow arbitrage over resources (as RSS LUT) via it (further in the
> series (RFC NOTE: stripped out so far)).
> 
> Thanks to Wojciech Drewek for very nice naming of the devlink instance:
> PF0:		pci/0000:00:18.0
> whole-dev:	pci/0000:00:18
> But I made this a param for now (driver is free to pass just "whole-dev").

Which only works nicely if you're talking about functions not full
separate links :) When I was thinking about it a while back my
intuition was that we should have a single instance, just accessible
under multiple names. But I'm not married to that direction if there
are problems with it.

> $ devlink dev # (Interesting part of output only)
> pci/0000:af:00:
>   nested_devlink:
>     pci/0000:af:00.0
>     pci/0000:af:00.1
>     pci/0000:af:00.2
>     pci/0000:af:00.3
>     pci/0000:af:00.4
>     pci/0000:af:00.5
>     pci/0000:af:00.6
>     pci/0000:af:00.7

Could you go into more details on what stays on the "nested" instances
and what moves to the "whole-dev"? Jiri recently pointed out to y'all
cases where stuff that should be a port attribute was an instance
attribute.
diff mbox series

Patch

diff --git a/net/devlink/devl_internal.h b/net/devlink/devl_internal.h
index 14eaad9cfe35..073afe02ce2f 100644
--- a/net/devlink/devl_internal.h
+++ b/net/devlink/devl_internal.h
@@ -49,6 +49,8 @@  struct devlink {
 	struct xarray snapshot_ids;
 	struct devlink_dev_stats stats;
 	struct device *dev;
+	const char *dev_name;
+	const char *bus_name;
 	possible_net_t _net;
 	/* Serializes access to devlink instance specific objects such as
 	 * port, sb, dpipe, resource, params, region, traps and more.
@@ -104,15 +106,15 @@  static inline bool devl_is_registered(struct devlink *devlink)
 
 static inline void devl_dev_lock(struct devlink *devlink, bool dev_lock)
 {
-	if (dev_lock)
+	if (dev_lock && devlink->dev)
 		device_lock(devlink->dev);
 	devl_lock(devlink);
 }
 
 static inline void devl_dev_unlock(struct devlink *devlink, bool dev_lock)
 {
 	devl_unlock(devlink);
-	if (dev_lock)
+	if (dev_lock && devlink->dev)
 		device_unlock(devlink->dev);
 }
 
@@ -174,9 +176,9 @@  devlink_dump_state(struct netlink_callback *cb)
 static inline int
 devlink_nl_put_handle(struct sk_buff *msg, struct devlink *devlink)
 {
-	if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, devlink->dev->bus->name))
+	if (nla_put_string(msg, DEVLINK_ATTR_BUS_NAME, devlink->bus_name))
 		return -EMSGSIZE;
-	if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, dev_name(devlink->dev)))
+	if (nla_put_string(msg, DEVLINK_ATTR_DEV_NAME, devlink->dev_name))
 		return -EMSGSIZE;
 	return 0;
 }
@@ -209,8 +211,8 @@  static inline void devlink_nl_obj_desc_init(struct devlink_obj_desc *desc,
 					    struct devlink *devlink)
 {
 	memset(desc, 0, sizeof(*desc));
-	desc->bus_name = devlink->dev->bus->name;
-	desc->dev_name = dev_name(devlink->dev);
+	desc->bus_name = devlink->bus_name;
+	desc->dev_name = devlink->dev_name;
 }
 
 static inline void devlink_nl_obj_desc_port_set(struct devlink_obj_desc *desc,
diff --git a/net/devlink/core.c b/net/devlink/core.c
index f49cd83f1955..f4960074b845 100644
--- a/net/devlink/core.c
+++ b/net/devlink/core.c
@@ -397,26 +397,25 @@  void devlink_unregister(struct devlink *devlink)
 EXPORT_SYMBOL_GPL(devlink_unregister);
 
 /**
- *	devlink_alloc_ns - Allocate new devlink instance resources
- *	in specific namespace
+ *	devlink_alloc_wrapper - Allocate a new devlink instance resources
+ *	for a SW wrapper over multiple HW devlink instances
  *
  *	@ops: ops
  *	@priv_size: size of user private data
- *	@net: net namespace
- *	@dev: parent device
+ *	@bus_name: user visible bus name
+ *	@dev_name: user visible device name
  *
- *	Allocate new devlink instance resources, including devlink index
- *	and name.
+ *	Allocate new devlink instance resources, including devlink index.
  */
-struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
-				 size_t priv_size, struct net *net,
-				 struct device *dev)
+struct devlink *devlink_alloc_wrapper(const struct devlink_ops *ops,
+				      size_t priv_size, const char *bus_name,
+				      const char *dev_name)
 {
 	struct devlink *devlink;
 	static u32 last_id;
 	int ret;
 
-	WARN_ON(!ops || !dev);
+	WARN_ON(!ops);
 	if (!devlink_reload_actions_valid(ops))
 		return NULL;
 
@@ -429,13 +428,14 @@  struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	if (ret < 0)
 		goto err_xa_alloc;
 
-	devlink->dev = get_device(dev);
 	devlink->ops = ops;
+	devlink->bus_name = bus_name;
+	devlink->dev_name = dev_name;
 	xa_init_flags(&devlink->ports, XA_FLAGS_ALLOC);
 	xa_init_flags(&devlink->params, XA_FLAGS_ALLOC);
 	xa_init_flags(&devlink->snapshot_ids, XA_FLAGS_ALLOC);
 	xa_init_flags(&devlink->nested_rels, XA_FLAGS_ALLOC);
-	write_pnet(&devlink->_net, net);
+	write_pnet(&devlink->_net, &init_net);
 	INIT_LIST_HEAD(&devlink->rate_list);
 	INIT_LIST_HEAD(&devlink->linecard_list);
 	INIT_LIST_HEAD(&devlink->sb_list);
@@ -458,6 +458,40 @@  struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
 	kvfree(devlink);
 	return NULL;
 }
+EXPORT_SYMBOL_GPL(devlink_alloc_wrapper);
+
+/**
+ *	devlink_alloc_ns - Allocate new devlink instance resources
+ *	in specific namespace
+ *
+ *	@ops: ops
+ *	@priv_size: size of user private data
+ *	@net: net namespace
+ *	@dev: parent device
+ *
+ *	Allocate new devlink instance resources, including devlink index
+ *	and name.
+ */
+struct devlink *devlink_alloc_ns(const struct devlink_ops *ops,
+				 size_t priv_size, struct net *net,
+				 struct device *dev)
+{
+	struct devlink *devlink;
+
+	if (WARN_ON(!dev))
+		return NULL;
+
+	dev = get_device(dev);
+	devlink = devlink_alloc_wrapper(ops, priv_size, dev->bus->name,
+					dev_name(dev));
+	if (!devlink) {
+		put_device(dev);
+		return NULL;
+	}
+	devlink->dev = dev;
+	write_pnet(&devlink->_net, net);
+	return devlink;
+}
 EXPORT_SYMBOL_GPL(devlink_alloc_ns);
 
 /**
diff --git a/net/devlink/netlink.c b/net/devlink/netlink.c
index 593605c1b1ef..3f73ced2d879 100644
--- a/net/devlink/netlink.c
+++ b/net/devlink/netlink.c
@@ -193,8 +193,8 @@  devlink_get_from_attrs_lock(struct net *net, struct nlattr **attrs,
 	devname = nla_data(attrs[DEVLINK_ATTR_DEV_NAME]);
 
 	devlinks_xa_for_each_registered_get(net, index, devlink) {
-		if (strcmp(devlink->dev->bus->name, busname) == 0 &&
-		    strcmp(dev_name(devlink->dev), devname) == 0) {
+		if (strcmp(devlink->bus_name, busname) == 0 &&
+		    strcmp(devlink->dev_name, devname) == 0) {
 			devl_dev_lock(devlink, dev_lock);
 			if (devl_is_registered(devlink))
 				return devlink;
diff --git a/net/devlink/port.c b/net/devlink/port.c
index 939081a0e615..508ecf34d41a 100644
--- a/net/devlink/port.c
+++ b/net/devlink/port.c
@@ -220,8 +220,8 @@  size_t devlink_nl_port_handle_size(struct devlink_port *devlink_port)
 {
 	struct devlink *devlink = devlink_port->devlink;
 
-	return nla_total_size(strlen(devlink->dev->bus->name) + 1) /* DEVLINK_ATTR_BUS_NAME */
-	     + nla_total_size(strlen(dev_name(devlink->dev)) + 1) /* DEVLINK_ATTR_DEV_NAME */
+	return nla_total_size(strlen(devlink->bus_name) + 1) /* DEVLINK_ATTR_BUS_NAME */
+	     + nla_total_size(strlen(devlink->dev_name) + 1) /* DEVLINK_ATTR_DEV_NAME */
 	     + nla_total_size(4); /* DEVLINK_ATTR_PORT_INDEX */
 }