Message ID: 20230108103533.10104-1-darinzon@amazon.com (mailing list archive)
Series: Add devlink support to ena
On Sun, 8 Jan 2023 10:35:28 +0000 David Arinzon wrote:
> This patchset adds devlink support to the ena driver.
Wrong place, please take a look at
struct kernel_ethtool_ringparam::tx_push
and ETHTOOL_A_RINGS_TX_PUSH. I think you just want to configure
the max size of the TX push, right?
The reload is also an overkill, reload should re-register all driver
objects but the devlink instance, IIRC. You're not even unregistering
the netdev. You should handle this change the same way you handle any
ring size changes.
For future reference - if you ever _actually_ need devlink please use
the devl_* APIs and take the instance locks explicitly. There has not
been a single devlink reload implementation which would get locking
right using the devlink_* APIs
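For readers who are not familiar with the interface being pointed at, the
following is a minimal sketch of how a driver can expose a boolean "push
mode" through the ethtool ring API. kernel_ethtool_ringparam::tx_push,
ETHTOOL_RING_USE_TX_PUSH and the ethtool_ops hooks are real kernel
interfaces; the foo_* names and the ring-rebuild helper are hypothetical,
for illustration only, and are not ena code.

/* Illustrative only - hypothetical driver, not ena code. */
#include <linux/ethtool.h>
#include <linux/netdevice.h>

#define FOO_MAX_RING_SIZE	4096

struct foo_priv {
	u32 tx_ring_size;
	bool tx_push_enabled;
};

/* Stand-in for the driver's real "stop, resize, restart" path. */
static int foo_reconfigure_rings(struct foo_priv *priv, u32 tx_pending)
{
	priv->tx_ring_size = tx_pending;
	return 0;
}

static void foo_get_ringparam(struct net_device *dev,
			      struct ethtool_ringparam *ring,
			      struct kernel_ethtool_ringparam *kring,
			      struct netlink_ext_ack *extack)
{
	struct foo_priv *priv = netdev_priv(dev);

	ring->tx_max_pending = FOO_MAX_RING_SIZE;
	ring->tx_pending = priv->tx_ring_size;
	/* Reported to user space as ETHTOOL_A_RINGS_TX_PUSH. */
	kring->tx_push = priv->tx_push_enabled;
}

static int foo_set_ringparam(struct net_device *dev,
			     struct ethtool_ringparam *ring,
			     struct kernel_ethtool_ringparam *kring,
			     struct netlink_ext_ack *extack)
{
	struct foo_priv *priv = netdev_priv(dev);

	priv->tx_push_enabled = kring->tx_push;
	return foo_reconfigure_rings(priv, ring->tx_pending);
}

static const struct ethtool_ops foo_ethtool_ops = {
	/* Opting in: the ethtool core rejects ETHTOOL_A_RINGS_TX_PUSH for
	 * drivers that do not set this flag.
	 */
	.supported_ring_params	= ETHTOOL_RING_USE_TX_PUSH,
	.get_ringparam		= foo_get_ringparam,
	.set_ringparam		= foo_set_ringparam,
};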
> On Sun, 8 Jan 2023 10:35:28 +0000 David Arinzon wrote:
> > This patchset adds devlink support to the ena driver.
>
> Wrong place, please take a look at
>   struct kernel_ethtool_ringparam::tx_push
> and ETHTOOL_A_RINGS_TX_PUSH. I think you just want to configure
> the max size of the TX push, right?

Hi Jakub,

Thank you for the feedback. We're not configuring the max size of the
TX push, but effectively the maximal packet header size to be pushed to
the device. This is noted in the documentation on patch 5/5 in this
patchset. AFAIK, there's no relevant ethtool parameter for this
configuration.

> The reload is also an overkill, reload should re-register all driver
> objects but the devlink instance, IIRC. You're not even unregistering
> the netdev. You should handle this change the same way you handle any
> ring size changes.

The LLQ configuration is different from other configurations set via
ethtool (like queue size and number of queues). LLQ requires
re-negotiation with the device and requires a reset, which is not
performed in the ethtool configuration case. It may be possible to
unregister/register the netdev, but it is unnecessary in this case, as
most of the changes are reflected in the interface and structures
between the driver and the device.

> For future reference - if you ever _actually_ need devlink please use
> the devl_* APIs and take the instance locks explicitly. There has not
> been a single devlink reload implementation which would get locking
> right using the devlink_* APIs

On Tue, 10 Jan 2023 20:11:23 +0000 Arinzon, David wrote:
> > On Sun, 8 Jan 2023 10:35:28 +0000 David Arinzon wrote:
> > > This patchset adds devlink support to the ena driver.
> >
> > Wrong place, please take a look at
> >   struct kernel_ethtool_ringparam::tx_push
> > and ETHTOOL_A_RINGS_TX_PUSH. I think you just want to configure
> > the max size of the TX push, right?
>
> We're not configuring the max size of the TX push, but effectively the
> maximal packet header size to be pushed to the device.
> This is noted in the documentation on patch 5/5 in this patchset.
> AFAIK, there's no relevant ethtool parameter for this configuration.

Perhaps I should have complained about the low quality of that
documentation to make it clear that I have in fact read it :/

I read it again - and I still don't know what you're doing.
It sounds like inline header length configuration yet you also use LLQ
all over the place. And LLQ for ENA is documented as basically tx_push:

  - **Low Latency Queue (LLQ) mode or "push-mode":**

Please explain this in a way which assumes zero Amazon-specific
knowledge :(

> > The reload is also an overkill, reload should re-register all driver
> > objects but the devlink instance, IIRC. You're not even unregistering
> > the netdev. You should handle this change the same way you handle any
> > ring size changes.
>
> The LLQ configuration is different from other configurations set via
> ethtool (like queue size and number of queues). LLQ requires
> re-negotiation with the device and requires a reset, which is not
> performed in the ethtool configuration case.

What do you mean when you say that reset is not required in the ethtool
configuration case?

AFAIK ethtool config should not (although sadly very often it does)
cause any loss of unrelated configuration. But you can certainly reset
HW blocks or reneg features with FW or whatever else...

> It may be possible to unregister/register the netdev, but it is
> unnecessary in this case, as most of the changes are reflected in the
> interface and structures between the driver and the device.
>
> > For future reference - if you ever _actually_ need devlink please use
> > the devl_* APIs and take the instance locks explicitly. There has not
> > been a single devlink reload implementation which would get locking
> > right using the devlink_* APIs

> On Tue, 10 Jan 2023 20:11:23 +0000 Arinzon, David wrote:
> > > On Sun, 8 Jan 2023 10:35:28 +0000 David Arinzon wrote:
> > > > This patchset adds devlink support to the ena driver.
> > >
> > > Wrong place, please take a look at
> > >   struct kernel_ethtool_ringparam::tx_push
> > > and ETHTOOL_A_RINGS_TX_PUSH. I think you just want to configure
> > > the max size of the TX push, right?
> >
> > We're not configuring the max size of the TX push, but effectively the
> > maximal packet header size to be pushed to the device.
> > This is noted in the documentation on patch 5/5 in this patchset.
> > AFAIK, there's no relevant ethtool parameter for this configuration.
>
> Perhaps I should have complained about the low quality of that
> documentation to make it clear that I have in fact read it :/

Noted, we will see how to improve the documentation going forward.

> I read it again - and I still don't know what you're doing.
> It sounds like inline header length configuration yet you also use LLQ
> all over the place. And LLQ for ENA is documented as basically tx_push:
>
>   - **Low Latency Queue (LLQ) mode or "push-mode":**
>
> Please explain this in a way which assumes zero Amazon-specific
> knowledge :(

Low Latency Queue (LLQ) is a mode of operation in which the packet
headers (up to a defined length) are written directly to device memory.
Therefore, you are right, the description is similar to tx_push.
However, this is not a configurable option, while
ETHTOOL_A_RINGS_TX_PUSH configures whether to work in that mode or not.
If I'm understanding the intent behind ETHTOOL_A_RINGS_TX_PUSH and the
implementation in the driver that introduced the feature, it refers to a
push of the whole packet and not just the headers, which is not what the
ena driver does.

In this patchset, we allow the configuration of an extended size of the
Low Latency Queue, meaning we allow enabling another, larger,
pre-defined size to be used as the maximum size of the packet headers
pushed directly to device memory. The value itself is not configurable,
therefore it was defined as "large LLQ".

I hope this provides more clarification; if not, I'll be happy to
elaborate further.

> > > The reload is also an overkill, reload should re-register all driver
> > > objects but the devlink instance, IIRC. You're not even unregistering
> > > the netdev. You should handle this change the same way you handle any
> > > ring size changes.
> >
> > The LLQ configuration is different from other configurations set via
> > ethtool (like queue size and number of queues). LLQ requires
> > re-negotiation with the device and requires a reset, which is not
> > performed in the ethtool configuration case.
>
> What do you mean when you say that reset is not required in the ethtool
> configuration case?
>
> AFAIK ethtool config should not (although sadly very often it does)
> cause any loss of unrelated configuration. But you can certainly reset
> HW blocks or reneg features with FW or whatever else...

The ena driver currently supports various configurations via ethtool,
for example, changing the number of queues/channels and the
queue/channel size. These options require changes in the netdev and the
interfaces with the kernel, therefore we perform a reconfiguration at
that level. But these configurations do not require the interface
between the driver and the device to be reset. Changing the Low Latency
Queue mode from standard to large (what's being configured in this
patchset) is a more significant functionality change and requires this
reset.

> > It may be possible to unregister/register the netdev, but it is
> > unnecessary in this case, as most of the changes are reflected in the
> > interface and structures between the driver and the device.
> >
> > > For future reference - if you ever _actually_ need devlink please
> > > use the devl_* APIs and take the instance locks explicitly. There
> > > has not been a single devlink reload implementation which would get
> > > locking right using the devlink_* APIs

On Wed, 11 Jan 2023 08:58:46 +0000 Arinzon, David wrote:
> > I read it again - and I still don't know what you're doing.
> > It sounds like inline header length configuration yet you also use LLQ
> > all over the place. And LLQ for ENA is documented as basically tx_push:
> >
> >   - **Low Latency Queue (LLQ) mode or "push-mode":**
> >
> > Please explain this in a way which assumes zero Amazon-specific
> > knowledge :(
>
> Low Latency Queue (LLQ) is a mode of operation in which the packet
> headers (up to a defined length) are written directly to device memory.
> Therefore, you are right, the description is similar to tx_push.
> However, this is not a configurable option, while
> ETHTOOL_A_RINGS_TX_PUSH configures whether to work in that mode or not.
> If I'm understanding the intent behind ETHTOOL_A_RINGS_TX_PUSH and the
> implementation in the driver that introduced the feature, it refers to a
> push of the whole packet and not just the headers, which is not what the
> ena driver does.
>
> In this patchset, we allow the configuration of an extended size of the
> Low Latency Queue, meaning we allow enabling another, larger,
> pre-defined size to be used as the maximum size of the packet headers
> pushed directly to device memory. The value itself is not configurable,
> therefore it was defined as "large LLQ".
>
> I hope this provides more clarification; if not, I'll be happy to
> elaborate further.

Thanks, the large missing piece in my understanding is still what the
user-visible impact of this change is. Without increasing the LLQ entry
size, a user who sends packets with long headers will:

 a) see higher latency thru the NIC, but everything else is the same
 b) see higher latency and lower overall throughput in terms of PPS
 c) have limited access to offloads, because the device requires
    full access to headers via LLQ for some offloads

Which one of the three is the closest?

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, January 11, 2023 9:01 PM
> To: Arinzon, David <darinzon@amazon.com>
> Cc: David Miller <davem@davemloft.net>; netdev@vger.kernel.org;
> Machulsky, Zorik <zorik@amazon.com>; Matushevsky, Alexander
> <matua@amazon.com>; Bshara, Saeed <saeedb@amazon.com>; Bshara,
> Nafea <nafea@amazon.com>; Saidi, Ali <alisaidi@amazon.com>;
> Kiyanovski, Arthur <akiyano@amazon.com>; Dagan, Noam
> <ndagan@amazon.com>; Agroskin, Shay <shayagr@amazon.com>; Itzko,
> Shahar <itzko@amazon.com>; Abboud, Osama <osamaabb@amazon.com>
> Subject: RE: [EXTERNAL][PATCH V1 net-next 0/5] Add devlink support to ena
>
> On Wed, 11 Jan 2023 08:58:46 +0000 Arinzon, David wrote:
> > > I read it again - and I still don't know what you're doing.
> > > It sounds like inline header length configuration yet you also use
> > > LLQ all over the place. And LLQ for ENA is documented as basically
> > > tx_push:
> > >
> > >   - **Low Latency Queue (LLQ) mode or "push-mode":**
> > >
> > > Please explain this in a way which assumes zero Amazon-specific
> > > knowledge :(
> >
> > Low Latency Queue (LLQ) is a mode of operation in which the packet
> > headers (up to a defined length) are written directly to device memory.
> > Therefore, you are right, the description is similar to tx_push.
> > However, this is not a configurable option, while
> > ETHTOOL_A_RINGS_TX_PUSH configures whether to work in that mode or not.
> > If I'm understanding the intent behind ETHTOOL_A_RINGS_TX_PUSH and the
> > implementation in the driver that introduced the feature, it refers to
> > a push of the whole packet and not just the headers, which is not what
> > the ena driver does.
> >
> > In this patchset, we allow the configuration of an extended size of
> > the Low Latency Queue, meaning we allow enabling another, larger,
> > pre-defined size to be used as the maximum size of the packet headers
> > pushed directly to device memory. The value itself is not configurable,
> > therefore it was defined as "large LLQ".
> >
> > I hope this provides more clarification; if not, I'll be happy to
> > elaborate further.
>
> Thanks, the large missing piece in my understanding is still what the
> user-visible impact of this change is. Without increasing the LLQ entry
> size, a user who sends packets with long headers will:
>
>  a) see higher latency thru the NIC, but everything else is the same
>  b) see higher latency and lower overall throughput in terms of PPS
>  c) have limited access to offloads, because the device requires
>     full access to headers via LLQ for some offloads
>
> Which one of the three is the closest?

You're right, I went through the documentation again, and I see that the
implications of a case where packet headers are longer than the LLQ entry
size are not mentioned properly. We'll rework it to explain the motivation
for turning on this mode in the relevant use cases.

If the packet network headers do not fit within the LLQ entry, the packet
will be dropped. So I'd say that c) describes the impact best, given that
certain types of traffic will not succeed or will have disruptions due to
dropped TX packets.

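To make the failure mode above concrete, here is a rough sketch of the size
check being described, assuming a hypothetical driver (not ena code);
whether the drop happens in the driver or in the device is not spelled out
in the thread. The 128B/256B entry sizes come from the later discussion;
the exact header budget within an entry is an assumption.

/* Hypothetical illustration, not ena code. */
#include <linux/skbuff.h>

struct foo_llq_ring {
	/* Derived from one of the two fixed entry sizes discussed in this
	 * thread: 128B (standard LLQ) or 256B (large LLQ). Part of the
	 * entry is used by descriptor metadata, so the header budget is
	 * smaller than the entry itself.
	 */
	unsigned int max_push_hdr_len;
};

static bool foo_headers_fit_llq(const struct sk_buff *skb,
				const struct foo_llq_ring *ring)
{
	/* Approximate "packet network headers" with the skb linear part;
	 * headers that do not fit cannot be pushed, and per the reply
	 * above the packet ends up dropped rather than sent another way.
	 */
	return skb_headlen(skb) <= ring->max_push_hdr_len;
}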
On Wed, 11 Jan 2023 19:31:39 +0000 Arinzon, David wrote:
> If the packet network headers do not fit within the LLQ entry, the packet
> will be dropped. So I'd say that c) describes the impact best, given that
> certain types of traffic will not succeed or will have disruptions due to
> dropped TX packets.

I see. Sounds like it could be a fit for
DEVLINK_ATTR_ESWITCH_INLINE_MODE? But that one configures the depth of
the headers copied inline, rather than bytes. We could add a value for
"tunneled" and have that imply 256B LLQ in case of ena.

The other option is to introduce the concept of "max length of inline
data" to ethtool, and add a new netlink attribute to ethtool -g.

Either way - the length of "inline|push" data is not an ena-specific
concept, so using private knobs (devlink params or ethtool flags) is not
appropriate upstream. We should add a bona fide uAPI for it.

> -----Original Message-----
> From: Jakub Kicinski <kuba@kernel.org>
> Sent: Wednesday, January 11, 2023 10:00 PM
> To: Arinzon, David <darinzon@amazon.com>
> Cc: David Miller <davem@davemloft.net>; netdev@vger.kernel.org;
> Machulsky, Zorik <zorik@amazon.com>; Matushevsky, Alexander
> <matua@amazon.com>; Bshara, Saeed <saeedb@amazon.com>; Bshara,
> Nafea <nafea@amazon.com>; Saidi, Ali <alisaidi@amazon.com>;
> Kiyanovski, Arthur <akiyano@amazon.com>; Dagan, Noam
> <ndagan@amazon.com>; Agroskin, Shay <shayagr@amazon.com>; Itzko,
> Shahar <itzko@amazon.com>; Abboud, Osama <osamaabb@amazon.com>
> Subject: RE: [EXTERNAL][PATCH V1 net-next 0/5] Add devlink support to ena
>
> On Wed, 11 Jan 2023 19:31:39 +0000 Arinzon, David wrote:
> > If the packet network headers do not fit within the LLQ entry, the
> > packet will be dropped. So I'd say that c) describes the impact best,
> > given that certain types of traffic will not succeed or will have
> > disruptions due to dropped TX packets.
>
> I see. Sounds like it could be a fit for
> DEVLINK_ATTR_ESWITCH_INLINE_MODE? But that one configures the depth of
> the headers copied inline, rather than bytes. We could add a value for
> "tunneled" and have that imply 256B LLQ in case of ena.
>
> The other option is to introduce the concept of "max length of inline
> data" to ethtool, and add a new netlink attribute to ethtool -g.
>
> Either way - the length of "inline|push" data is not an ena-specific
> concept, so using private knobs (devlink params or ethtool flags) is not
> appropriate upstream. We should add a bona fide uAPI for it.

I've looked into the original commit of DEVLINK_ATTR_ESWITCH_INLINE_MODE
and it seems to be more related to encapsulation; adding a tunneling
concept here would be misleading, as it doesn't cover all the use cases
affected here. It is true that in the case of tunneling, if there's a need
to access the encapsulated headers, the length of the headers might be
larger than the default LLQ value, but this may also be the case for
non-tunneled traffic, for example, IPv6 with the TCP SACK option.

I'll note again that this is not a configurable value, meaning the only
options are a 128B (standard) or a 256B (large) LLQ. There's no option to
set other values, only to choose between the modes. Therefore, I don't see
how having such an option through ethtool, as you suggested (setting the
max length), can be beneficial in our use case (it might be good overall,
as you noted, since it's a more generic concept). Will put more thought
into it.

On Wed, 11 Jan 2023 21:21:16 +0000 Arinzon, David wrote:
> I'll note again that this is not a configurable value, meaning the only
> options are a 128B (standard) or a 256B (large) LLQ. There's no option to
> set other values, only to choose between the modes. Therefore, I don't
> see how having such an option through ethtool, as you suggested (setting
> the max length), can be beneficial in our use case (it might be good
> overall, as you noted, since it's a more generic concept). Will put more
> thought into it.

FWIW you can just return an error via extack if the user tries to set an
unsupported value, like:

	NL_SET_ERR_MSG(extack, "only X and Y are supported");

and it will pop up as part of the error in the ethtool CLI. Or do some
rounding up - a lot of drivers do that for ring params already.

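Applied to the two LLQ sizes mentioned earlier, the extack pattern
described above would look roughly like this; the helper and constants are
hypothetical, only NL_SET_ERR_MSG is the real kernel macro.

/* Hypothetical validation helper, not ena code. */
#include <linux/netlink.h>
#include <linux/errno.h>
#include <linux/types.h>

static int foo_validate_llq_entry_size(u32 requested,
				       struct netlink_ext_ack *extack)
{
	/* Only the two entry sizes discussed in this thread exist;
	 * the message below is what the ethtool CLI would print.
	 */
	if (requested != 128 && requested != 256) {
		NL_SET_ERR_MSG(extack,
			       "only 128B and 256B LLQ entries are supported");
		return -EINVAL;
	}
	return 0;
}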
On 11/01/2023 22:00, Jakub Kicinski wrote:
> On Wed, 11 Jan 2023 19:31:39 +0000 Arinzon, David wrote:
>> If the packet network headers do not fit within the LLQ entry, the packet
>> will be dropped. So I'd say that c) describes the impact best, given that
>> certain types of traffic will not succeed or will have disruptions due to
>> dropped TX packets.
>
> I see. Sounds like it could be a fit for
> DEVLINK_ATTR_ESWITCH_INLINE_MODE? But that one configures the depth of
> the headers copied inline, rather than bytes. We could add a value for
> "tunneled" and have that imply 256B LLQ in case of ena.
>
> The other option is to introduce the concept of "max length of inline
> data" to ethtool, and add a new netlink attribute to ethtool -g.

TX copybreak? When the user sets it to > 96 bytes, use the large LLQ.

BTW, shouldn't ethtool's tx_push reflect the fact that LLQs are being
used? I don't see it used in ena.

Gal Pressman <gal@nvidia.com> writes:
> On 11/01/2023 22:00, Jakub Kicinski wrote:
>> On Wed, 11 Jan 2023 19:31:39 +0000 Arinzon, David wrote:
>>> If the packet network headers do not fit within the LLQ entry, the
>>> packet will be dropped. So I'd say that c) describes the impact best,
>>> given that certain types of traffic will not succeed or will have
>>> disruptions due to dropped TX packets.
>>
>> I see. Sounds like it could be a fit for
>> DEVLINK_ATTR_ESWITCH_INLINE_MODE? But that one configures the depth of
>> the headers copied inline, rather than bytes. We could add a value for
>> "tunneled" and have that imply 256B LLQ in case of ena.
>>
>> The other option is to introduce the concept of "max length of inline
>> data" to ethtool, and add a new netlink attribute to ethtool -g.
>
> TX copybreak? When the user sets it to > 96 bytes, use the large LLQ.
>
> BTW, shouldn't ethtool's tx_push reflect the fact that LLQs are being
> used? I don't see it used in ena.

Using tx copybreak does sound like it can work for our use case. Thanks
for the tip Gal (:

Jakub, do you see an issue with utilizing the tx_copybreak ethtool
parameter instead of the devlink param in this patchset?

Thanks,
Shay

On Thu, 12 Jan 2023 15:47:13 +0200 Shay Agroskin wrote:
> Gal Pressman <gal@nvidia.com> writes:
> > TX copybreak? When the user sets it to > 96 bytes, use the large LLQ.
> >
> > BTW, shouldn't ethtool's tx_push reflect the fact that LLQs are being
> > used? I don't see it used in ena.
>
> Using tx copybreak does sound like it can work for our use case. Thanks
> for the tip Gal (:
>
> Jakub, do you see an issue with utilizing the tx_copybreak ethtool
> parameter instead of the devlink param in this patchset?

IDK, the semantics don't feel close enough.

As a user I'd set tx_copybreak only on systems which have IOMMU enabled
(or otherwise have high cost of DMA mapping), to save CPU cycles.

The ena feature does not seem to be about CPU cycle saving (likely the
opposite, in fact), and does not operate on full segments AFAIU.

Hence my preference to expose it as a new tx_push_buf_len, combining the
semantics of tx_push and rx_buf_len.

On 12/01/2023 21:56, Jakub Kicinski wrote:
> On Thu, 12 Jan 2023 15:47:13 +0200 Shay Agroskin wrote:
>> Gal Pressman <gal@nvidia.com> writes:
>>> TX copybreak? When the user sets it to > 96 bytes, use the large LLQ.
>>>
>>> BTW, shouldn't ethtool's tx_push reflect the fact that LLQs are being
>>> used? I don't see it used in ena.
>>
>> Using tx copybreak does sound like it can work for our use case. Thanks
>> for the tip Gal (:
>>
>> Jakub, do you see an issue with utilizing the tx_copybreak ethtool
>> parameter instead of the devlink param in this patchset?
>
> IDK, the semantics don't feel close enough.
>
> As a user I'd set tx_copybreak only on systems which have IOMMU enabled
> (or otherwise have high cost of DMA mapping), to save CPU cycles.
>
> The ena feature does not seem to be about CPU cycle saving (likely
> the opposite, in fact), and does not operate on full segments AFAIU.

Segments?

> Hence my preference to expose it as a new tx_push_buf_len, combining
> the semantics of tx_push and rx_buf_len.

Sounds like a good idea.
To clarify, buf_len here refers to the size of the inline'd part, not
the WQE itself, correct? The driver will use whatever WQE size it needs
in order to accommodate the requested inline size?

Jakub Kicinski <kuba@kernel.org> writes:
> On Thu, 12 Jan 2023 15:47:13 +0200 Shay Agroskin wrote:
>> Gal Pressman <gal@nvidia.com> writes:
>>> TX copybreak? When the user sets it to > 96 bytes, use the large LLQ.
>>>
>>> BTW, shouldn't ethtool's tx_push reflect the fact that LLQs are being
>>> used? I don't see it used in ena.
>>
>> Using tx copybreak does sound like it can work for our use case. Thanks
>> for the tip Gal (:
>>
>> Jakub, do you see an issue with utilizing the tx_copybreak ethtool
>> parameter instead of the devlink param in this patchset?
>
> IDK, the semantics don't feel close enough.
>
> As a user I'd set tx_copybreak only on systems which have IOMMU enabled
> (or otherwise have high cost of DMA mapping), to save CPU cycles.
>
> The ena feature does not seem to be about CPU cycle saving (likely
> the opposite, in fact), and does not operate on full segments AFAIU.
>
> Hence my preference to expose it as a new tx_push_buf_len, combining
> the semantics of tx_push and rx_buf_len.

We'll proceed with working on an RFC that adds the new param to ethtool.

Going forward, can I ask what's the community's stand on adding sysfs or
procfs entries to the driver as a means to tweak custom device
attributes?

Thanks,
Shay

On Sun, 15 Jan 2023 12:05:33 +0200 Gal Pressman wrote:
> > IDK, the semantics don't feel close enough.
> >
> > As a user I'd set tx_copybreak only on systems which have IOMMU enabled
> > (or otherwise have high cost of DMA mapping), to save CPU cycles.
> >
> > The ena feature does not seem to be about CPU cycle saving (likely
> > the opposite, in fact), and does not operate on full segments AFAIU.
>
> Segments?

Complete DMA buffers. Basically whether the optimization only kicks in
if skb->len < configured_len or skb_headlen() < configured_len.

> > Hence my preference to expose it as a new tx_push_buf_len, combining
> > the semantics of tx_push and rx_buf_len.
>
> Sounds like a good idea.
> To clarify, buf_len here refers to the size of the inline'd part, not
> the WQE itself, correct? The driver will use whatever WQE size it needs
> in order to accommodate the requested inline size?

We can decide either way, but I _think_ rx_buf_len refers to the size as
allocated, not necessarily the usable size (in case the first buffer has
padding / headroom). But as long as we clearly document it - either way
is fine.

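The distinction drawn here can be summarized in a short sketch
(hypothetical names, not taken from any driver): tx_copybreak gates on the
length of the whole frame, while a push/inline buffer limit gates only on
the header (linear) part.

/* Hypothetical illustration of the two semantics being contrasted. */
#include <linux/skbuff.h>

struct foo_tx_limits {
	unsigned int tx_copybreak;	/* copy whole small frames, skip DMA mapping */
	unsigned int tx_push_buf_len;	/* header bytes pushed to device memory */
};

/* tx_copybreak: the optimization applies only if the whole frame is small. */
static bool foo_use_copybreak(const struct sk_buff *skb,
			      const struct foo_tx_limits *l)
{
	return skb->len <= l->tx_copybreak;
}

/* push/inline limit: only the linear (header) part has to fit;
 * the rest of the payload still goes out via regular DMA descriptors.
 */
static bool foo_headers_fit_push_buf(const struct sk_buff *skb,
				     const struct foo_tx_limits *l)
{
	return skb_headlen(skb) <= l->tx_push_buf_len;
}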
On Mon, 16 Jan 2023 16:23:56 +0200 Shay Agroskin wrote:
> Going forward, can I ask what's the community's stand on adding sysfs or
> procfs entries to the driver as a means to tweak custom device
> attributes?

We prefer to avoid both of those. If you're worried about the user having
a hard time changing the parameter because the user space ethtool command
is out of date - our solution to that will likely be the "Netlink protocol
descriptions in YAML". With that thing you don't really need a custom CLI
tool to issue Netlink commands. Until it gets merged you may need to
hand-craft the right netlink message in python or another scripting
language of choice :(