Message ID | 20221123203834.738606-7-jacob.e.keller@intel.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | support direct read from region | expand |
Wed, Nov 23, 2022 at 09:38:31PM CET, jacob.e.keller@intel.com wrote: >To read from a region, user space must currently request a new snapshot of >the region and then read from that snapshot. This can sometimes be overkill >if user space only reads a tiny portion. They first create the snapshot, >then request a read, then destroy the snapshot. > >For regions which have a single underlying "contents", it makes sense to >allow supporting direct reading of the region data. > >Extend the DEVLINK_CMD_REGION_READ to allow direct reading from a region if >supported. Instead of reporting a missing snapshot id as invalid, check if >the region supports direct read and if so switch to the direct access >method for reading the region data. > >Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> >--- >Changes since v1: >* Update documentation to explain that direct reads are not atomic > > .../networking/devlink/devlink-region.rst | 11 ++++ > include/net/devlink.h | 16 +++++ > net/core/devlink.c | 66 ++++++++++++++----- > 3 files changed, 76 insertions(+), 17 deletions(-) > >diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst >index f06dca9a1eb6..6525a9120a4e 100644 >--- a/Documentation/networking/devlink/devlink-region.rst >+++ b/Documentation/networking/devlink/devlink-region.rst >@@ -31,6 +31,13 @@ in its ``devlink_region_ops`` structure. If snapshot id is not set in > the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send > the snapshot information to user space. > >+Regions may optionally allow directly reading from their contents without a >+snapshot. Direct read requests are not atomic. In particular a read request >+of size 256 bytes or larger will be split into multiple chunks. If atomic >+access is required, use a snapshot. A driver wishing to enable this for a >+region should implement the ``.read`` callback in the ``devlink_region_ops`` >+structure. >+ > example usage > ------------- > >@@ -65,6 +72,10 @@ example usage > $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16 > 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 > >+ # Read from the region without a snapshot >+ $ devlink region read pci/0000:00:05.0/fw-health address 16 length 16 >+ 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 >+ > As regions are likely very device or driver specific, no generic regions are > defined. See the driver-specific documentation files for information on the > specific regions a driver supports. >diff --git a/include/net/devlink.h b/include/net/devlink.h >index 074a79b8933f..02528f736f65 100644 >--- a/include/net/devlink.h >+++ b/include/net/devlink.h >@@ -650,6 +650,10 @@ struct devlink_info_req; > * the data variable must be updated to point to the snapshot data. > * The function will be called while the devlink instance lock is > * held. >+ * @read: callback to directly read a portion of the region. On success, >+ * the data pointer will be updated with the contents of the >+ * requested portion of the region. The function will be called >+ * while the devlink instance lock is held. > * @priv: Pointer to driver private data for the region operation > */ > struct devlink_region_ops { >@@ -659,6 +663,10 @@ struct devlink_region_ops { > const struct devlink_region_ops *ops, > struct netlink_ext_ack *extack, > u8 **data); >+ int (*read)(struct devlink *devlink, >+ const struct devlink_region_ops *ops, >+ struct netlink_ext_ack *extack, >+ u64 offset, u32 size, u8 *data); > void *priv; > }; > >@@ -670,6 +678,10 @@ struct devlink_region_ops { > * the data variable must be updated to point to the snapshot data. > * The function will be called while the devlink instance lock is > * held. >+ * @read: callback to directly read a portion of the region. On success, >+ * the data pointer will be updated with the contents of the >+ * requested portion of the region. The function will be called >+ * while the devlink instance lock is held. > * @priv: Pointer to driver private data for the region operation > */ > struct devlink_port_region_ops { >@@ -679,6 +691,10 @@ struct devlink_port_region_ops { > const struct devlink_port_region_ops *ops, > struct netlink_ext_ack *extack, > u8 **data); >+ int (*read)(struct devlink_port *port, >+ const struct devlink_port_region_ops *ops, >+ struct netlink_ext_ack *extack, >+ u64 offset, u32 size, u8 *data); > void *priv; > }; > >diff --git a/net/core/devlink.c b/net/core/devlink.c >index 729e2162a4db..f249b5b57677 100644 >--- a/net/core/devlink.c >+++ b/net/core/devlink.c >@@ -6515,6 +6515,26 @@ devlink_region_snapshot_fill(void *cb_priv, u8 *chunk, u32 chunk_size, > return 0; > } > >+static int >+devlink_region_port_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size, >+ u64 curr_offset, struct netlink_ext_ack *extack) >+{ >+ struct devlink_region *region = cb_priv; >+ >+ return region->port_ops->read(region->port, region->port_ops, extack, >+ curr_offset, chunk_size, chunk); >+} >+ >+static int >+devlink_region_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size, >+ u64 curr_offset, struct netlink_ext_ack *extack) >+{ >+ struct devlink_region *region = cb_priv; >+ >+ return region->ops->read(region->devlink, region->ops, extack, >+ curr_offset, chunk_size, chunk); >+} >+ > static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, > struct netlink_callback *cb) > { >@@ -6523,12 +6543,12 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, > u64 ret_offset, start_offset, end_offset = U64_MAX; > struct nlattr **attrs = info->attrs; > struct devlink_port *port = NULL; >- struct devlink_snapshot *snapshot; >+ devlink_chunk_fill_t *region_cb; > struct devlink_region *region; > const char *region_name; > struct devlink *devlink; > unsigned int index; >- u32 snapshot_id; >+ void *region_cb_priv; > void *hdr; > int err; > >@@ -6546,12 +6566,6 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, > goto out_unlock; > } > >- if (!attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) { >- NL_SET_ERR_MSG(cb->extack, "No snapshot id provided"); >- err = -EINVAL; >- goto out_unlock; >- } >- > if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { > index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); > >@@ -6577,12 +6591,30 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, > } > > snapshot_attr = attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]; >- snapshot_id = nla_get_u32(snapshot_attr); >- snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); >- if (!snapshot) { >- NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "requested snapshot does not exist"); >- err = -EINVAL; >- goto out_unlock; >+ if (!snapshot_attr) { Hmm. Being pedantic here, it changes userspace expectations: current - message without snapshotid is errored out new - message without snapshotid is happily processed I can imagine some obscure userspace app depending on this behaviour, in theory. Safe would be to add new NLA_FLAG attr DEVLINK_ATTR_REGION_DIRECT or something that would indicate userspace is interested in direct read. Also, the userspace would know right away it this new functionality is supported or not by the kernel. >+ if (!region->ops->read) { >+ NL_SET_ERR_MSG(cb->extack, "requested region does not support direct read"); >+ err = -EOPNOTSUPP; >+ goto out_unlock; >+ } >+ if (port) >+ region_cb = &devlink_region_port_direct_fill; >+ else >+ region_cb = &devlink_region_direct_fill; >+ region_cb_priv = region; >+ } else { >+ struct devlink_snapshot *snapshot; >+ u32 snapshot_id; >+ >+ snapshot_id = nla_get_u32(snapshot_attr); >+ snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); >+ if (!snapshot) { >+ NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "requested snapshot does not exist"); >+ err = -EINVAL; >+ goto out_unlock; >+ } >+ region_cb = &devlink_region_snapshot_fill; >+ region_cb_priv = snapshot; > } > > if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] && >@@ -6633,9 +6665,9 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, > goto nla_put_failure; > } > >- err = devlink_nl_region_read_fill(skb, &devlink_region_snapshot_fill, >- snapshot, start_offset, end_offset, >- &ret_offset, cb->extack); >+ err = devlink_nl_region_read_fill(skb, region_cb, region_cb_priv, >+ start_offset, end_offset, &ret_offset, >+ cb->extack); > > if (err && err != -EMSGSIZE) > goto nla_put_failure; >-- >2.38.1.420.g319605f8f00e >
On 11/24/2022 1:05 AM, Jiri Pirko wrote: > Wed, Nov 23, 2022 at 09:38:31PM CET, jacob.e.keller@intel.com wrote: > > Hmm. Being pedantic here, it changes userspace expectations: > current - message without snapshotid is errored out > new - message without snapshotid is happily processed > Yea. I guess I'm thinking more from an interactive application standpoint this wouldn't be a problem but from a scripted setup it might do something weird. > I can imagine some obscure userspace app depending on this behaviour, > in theory. Hmm. > > Safe would be to add new NLA_FLAG attr DEVLINK_ATTR_REGION_DIRECT or > something that would indicate userspace is interested in direct read. > Also, the userspace would know right away it this new functionality > is supported or not by the kernel. > The advantage of being able to see that such a feature exists is good. I can rework this to add an attribute. > > >> + if (!region->ops->read) { >> + NL_SET_ERR_MSG(cb->extack, "requested region does not support direct read"); >> + err = -EOPNOTSUPP; >> + goto out_unlock; >> + } >> + if (port) >> + region_cb = &devlink_region_port_direct_fill; >> + else >> + region_cb = &devlink_region_direct_fill; >> + region_cb_priv = region; >> + } else { >> + struct devlink_snapshot *snapshot; >> + u32 snapshot_id; >> + >> + snapshot_id = nla_get_u32(snapshot_attr); >> + snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); >> + if (!snapshot) { >> + NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "requested snapshot does not exist"); >> + err = -EINVAL; >> + goto out_unlock; >> + } >> + region_cb = &devlink_region_snapshot_fill; >> + region_cb_priv = snapshot; >> } >> >> if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] && >> @@ -6633,9 +6665,9 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, >> goto nla_put_failure; >> } >> >> - err = devlink_nl_region_read_fill(skb, &devlink_region_snapshot_fill, >> - snapshot, start_offset, end_offset, >> - &ret_offset, cb->extack); >> + err = devlink_nl_region_read_fill(skb, region_cb, region_cb_priv, >> + start_offset, end_offset, &ret_offset, >> + cb->extack); >> >> if (err && err != -EMSGSIZE) >> goto nla_put_failure; >> -- >> 2.38.1.420.g319605f8f00e >>
diff --git a/Documentation/networking/devlink/devlink-region.rst b/Documentation/networking/devlink/devlink-region.rst index f06dca9a1eb6..6525a9120a4e 100644 --- a/Documentation/networking/devlink/devlink-region.rst +++ b/Documentation/networking/devlink/devlink-region.rst @@ -31,6 +31,13 @@ in its ``devlink_region_ops`` structure. If snapshot id is not set in the ``DEVLINK_CMD_REGION_NEW`` request kernel will allocate one and send the snapshot information to user space. +Regions may optionally allow directly reading from their contents without a +snapshot. Direct read requests are not atomic. In particular a read request +of size 256 bytes or larger will be split into multiple chunks. If atomic +access is required, use a snapshot. A driver wishing to enable this for a +region should implement the ``.read`` callback in the ``devlink_region_ops`` +structure. + example usage ------------- @@ -65,6 +72,10 @@ example usage $ devlink region read pci/0000:00:05.0/fw-health snapshot 1 address 0 length 16 0000000000000000 0014 95dc 0014 9514 0035 1670 0034 db30 + # Read from the region without a snapshot + $ devlink region read pci/0000:00:05.0/fw-health address 16 length 16 + 0000000000000010 0000 0000 ffff ff04 0029 8c00 0028 8cc8 + As regions are likely very device or driver specific, no generic regions are defined. See the driver-specific documentation files for information on the specific regions a driver supports. diff --git a/include/net/devlink.h b/include/net/devlink.h index 074a79b8933f..02528f736f65 100644 --- a/include/net/devlink.h +++ b/include/net/devlink.h @@ -650,6 +650,10 @@ struct devlink_info_req; * the data variable must be updated to point to the snapshot data. * The function will be called while the devlink instance lock is * held. + * @read: callback to directly read a portion of the region. On success, + * the data pointer will be updated with the contents of the + * requested portion of the region. The function will be called + * while the devlink instance lock is held. * @priv: Pointer to driver private data for the region operation */ struct devlink_region_ops { @@ -659,6 +663,10 @@ struct devlink_region_ops { const struct devlink_region_ops *ops, struct netlink_ext_ack *extack, u8 **data); + int (*read)(struct devlink *devlink, + const struct devlink_region_ops *ops, + struct netlink_ext_ack *extack, + u64 offset, u32 size, u8 *data); void *priv; }; @@ -670,6 +678,10 @@ struct devlink_region_ops { * the data variable must be updated to point to the snapshot data. * The function will be called while the devlink instance lock is * held. + * @read: callback to directly read a portion of the region. On success, + * the data pointer will be updated with the contents of the + * requested portion of the region. The function will be called + * while the devlink instance lock is held. * @priv: Pointer to driver private data for the region operation */ struct devlink_port_region_ops { @@ -679,6 +691,10 @@ struct devlink_port_region_ops { const struct devlink_port_region_ops *ops, struct netlink_ext_ack *extack, u8 **data); + int (*read)(struct devlink_port *port, + const struct devlink_port_region_ops *ops, + struct netlink_ext_ack *extack, + u64 offset, u32 size, u8 *data); void *priv; }; diff --git a/net/core/devlink.c b/net/core/devlink.c index 729e2162a4db..f249b5b57677 100644 --- a/net/core/devlink.c +++ b/net/core/devlink.c @@ -6515,6 +6515,26 @@ devlink_region_snapshot_fill(void *cb_priv, u8 *chunk, u32 chunk_size, return 0; } +static int +devlink_region_port_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size, + u64 curr_offset, struct netlink_ext_ack *extack) +{ + struct devlink_region *region = cb_priv; + + return region->port_ops->read(region->port, region->port_ops, extack, + curr_offset, chunk_size, chunk); +} + +static int +devlink_region_direct_fill(void *cb_priv, u8 *chunk, u32 chunk_size, + u64 curr_offset, struct netlink_ext_ack *extack) +{ + struct devlink_region *region = cb_priv; + + return region->ops->read(region->devlink, region->ops, extack, + curr_offset, chunk_size, chunk); +} + static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, struct netlink_callback *cb) { @@ -6523,12 +6543,12 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, u64 ret_offset, start_offset, end_offset = U64_MAX; struct nlattr **attrs = info->attrs; struct devlink_port *port = NULL; - struct devlink_snapshot *snapshot; + devlink_chunk_fill_t *region_cb; struct devlink_region *region; const char *region_name; struct devlink *devlink; unsigned int index; - u32 snapshot_id; + void *region_cb_priv; void *hdr; int err; @@ -6546,12 +6566,6 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, goto out_unlock; } - if (!attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]) { - NL_SET_ERR_MSG(cb->extack, "No snapshot id provided"); - err = -EINVAL; - goto out_unlock; - } - if (info->attrs[DEVLINK_ATTR_PORT_INDEX]) { index = nla_get_u32(info->attrs[DEVLINK_ATTR_PORT_INDEX]); @@ -6577,12 +6591,30 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, } snapshot_attr = attrs[DEVLINK_ATTR_REGION_SNAPSHOT_ID]; - snapshot_id = nla_get_u32(snapshot_attr); - snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); - if (!snapshot) { - NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "requested snapshot does not exist"); - err = -EINVAL; - goto out_unlock; + if (!snapshot_attr) { + if (!region->ops->read) { + NL_SET_ERR_MSG(cb->extack, "requested region does not support direct read"); + err = -EOPNOTSUPP; + goto out_unlock; + } + if (port) + region_cb = &devlink_region_port_direct_fill; + else + region_cb = &devlink_region_direct_fill; + region_cb_priv = region; + } else { + struct devlink_snapshot *snapshot; + u32 snapshot_id; + + snapshot_id = nla_get_u32(snapshot_attr); + snapshot = devlink_region_snapshot_get_by_id(region, snapshot_id); + if (!snapshot) { + NL_SET_ERR_MSG_ATTR(cb->extack, snapshot_attr, "requested snapshot does not exist"); + err = -EINVAL; + goto out_unlock; + } + region_cb = &devlink_region_snapshot_fill; + region_cb_priv = snapshot; } if (attrs[DEVLINK_ATTR_REGION_CHUNK_ADDR] && @@ -6633,9 +6665,9 @@ static int devlink_nl_cmd_region_read_dumpit(struct sk_buff *skb, goto nla_put_failure; } - err = devlink_nl_region_read_fill(skb, &devlink_region_snapshot_fill, - snapshot, start_offset, end_offset, - &ret_offset, cb->extack); + err = devlink_nl_region_read_fill(skb, region_cb, region_cb_priv, + start_offset, end_offset, &ret_offset, + cb->extack); if (err && err != -EMSGSIZE) goto nla_put_failure;
To read from a region, user space must currently request a new snapshot of the region and then read from that snapshot. This can sometimes be overkill if user space only reads a tiny portion. They first create the snapshot, then request a read, then destroy the snapshot. For regions which have a single underlying "contents", it makes sense to allow supporting direct reading of the region data. Extend the DEVLINK_CMD_REGION_READ to allow direct reading from a region if supported. Instead of reporting a missing snapshot id as invalid, check if the region supports direct read and if so switch to the direct access method for reading the region data. Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> --- Changes since v1: * Update documentation to explain that direct reads are not atomic .../networking/devlink/devlink-region.rst | 11 ++++ include/net/devlink.h | 16 +++++ net/core/devlink.c | 66 ++++++++++++++----- 3 files changed, 76 insertions(+), 17 deletions(-)