Message ID: 20240812145633.52911-1-jdamato@fastly.com
Series:     Cleanup IRQ affinity checks in several drivers
On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> Several drivers make a check in their napi poll functions to determine
> if the CPU affinity of the IRQ has changed. If it has, the napi poll
> function returns a value less than the budget to force polling mode to
> be disabled, so that it can be rescheduled on the correct CPU next time
> the softirq is raised.

Any reason not to use the irq number already stored in napi_struct?
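For context, the per-driver check being described looks roughly like the
simplified sketch below. It is an illustration, not code from any one
driver; the example_* names and the driver-cached affinity_mask field are
placeholders.

#include <linux/cpumask.h>
#include <linux/minmax.h>
#include <linux/netdevice.h>
#include <linux/smp.h>

/* Placeholder driver types and helpers; only the affinity check matters here. */
struct example_q_vector {
	struct napi_struct napi;
	struct cpumask affinity_mask;		/* cached by the driver today */
};

int example_clean_rings(struct example_q_vector *q, int budget);
void example_reenable_irq(struct example_q_vector *q);

static int example_napi_poll(struct napi_struct *napi, int budget)
{
	struct example_q_vector *q =
		container_of(napi, struct example_q_vector, napi);
	int work_done = example_clean_rings(q, budget);

	/* IRQ was moved to another CPU: return less than budget so NAPI
	 * drops out of polling mode and gets rescheduled from the softirq
	 * on the new CPU next time.
	 */
	if (!cpumask_test_cpu(smp_processor_id(), &q->affinity_mask))
		return min(work_done, budget - 1);

	if (work_done < budget && napi_complete_done(napi, work_done))
		example_reenable_irq(q);

	return work_done;
}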
On Tue, Aug 13, 2024 at 05:17:10PM -0700, Jakub Kicinski wrote:
> On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> > Several drivers make a check in their napi poll functions to determine
> > if the CPU affinity of the IRQ has changed. If it has, the napi poll
> > function returns a value less than the budget to force polling mode to
> > be disabled, so that it can be rescheduled on the correct CPU next time
> > the softirq is raised.
>
> Any reason not to use the irq number already stored in napi_struct?

Thanks for taking a look.

IIUC, that's possible if i40e, iavf, and gve are updated to call
netif_napi_set_irq first, which I could certainly do.

But as Stanislav points out, I would be adding a call to
irq_get_effective_affinity_mask in the hot path where one did not
exist before for 4 of 5 drivers.

In that case, it might make more sense to introduce:

  bool napi_affinity_no_change(const struct cpumask *aff_mask)

instead, and the drivers which have a cached mask can pass it in and
gve can be updated later to cache it.

Not sure how crucial avoiding the irq_get_effective_affinity_mask
call is; I would guess maybe some driver owners would object to
adding a new call in the hot path where one didn't exist before.

What do you think?
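A minimal sketch of that proposed helper might look like the following;
its placement and the treatment of a NULL mask are assumptions, not part
of the proposal as stated.

#include <linux/cpumask.h>
#include <linux/smp.h>

/* Proposed helper (sketch): the driver passes whatever affinity mask it
 * already caches. Treating a NULL mask as "no change" is an assumption.
 */
static inline bool napi_affinity_no_change(const struct cpumask *aff_mask)
{
	if (!aff_mask)
		return true;

	return cpumask_test_cpu(smp_processor_id(), aff_mask);
}

A driver's poll function would then replace its open-coded check with
something like: if (!napi_affinity_no_change(&q_vector->affinity_mask))
return budget - 1;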
On Wed, Aug 14, 2024 at 08:14:48AM +0100, Joe Damato wrote:
> On Tue, Aug 13, 2024 at 05:17:10PM -0700, Jakub Kicinski wrote:
> > On Mon, 12 Aug 2024 14:56:21 +0000 Joe Damato wrote:
> > > Several drivers make a check in their napi poll functions to determine
> > > if the CPU affinity of the IRQ has changed. If it has, the napi poll
> > > function returns a value less than the budget to force polling mode to
> > > be disabled, so that it can be rescheduled on the correct CPU next time
> > > the softirq is raised.
> >
> > Any reason not to use the irq number already stored in napi_struct?
>
> Thanks for taking a look.
>
> IIUC, that's possible if i40e, iavf, and gve are updated to call
> netif_napi_set_irq first, which I could certainly do.
>
> But as Stanislav points out, I would be adding a call to
> irq_get_effective_affinity_mask in the hot path where one did not
> exist before for 4 of 5 drivers.
>
> In that case, it might make more sense to introduce:
>
>   bool napi_affinity_no_change(const struct cpumask *aff_mask)
>
> instead, and the drivers which have a cached mask can pass it in and
> gve can be updated later to cache it.
>
> Not sure how crucial avoiding the irq_get_effective_affinity_mask
> call is; I would guess maybe some driver owners would object to
> adding a new call in the hot path where one didn't exist before.
>
> What do you think?

Actually... how about a slightly different approach, which caches
the affinity mask in the core?

0. Extend napi struct to have a struct cpumask * field

1. extend netif_napi_set_irq to:
     a. store the IRQ number in the napi struct (as you suggested)
     b. call irq_get_effective_affinity_mask to store the mask in the
        napi struct
     c. set up generic affinity_notify.notify and
        affinity_notify.release callbacks to update the in-core mask
        when it changes

2. add napi_affinity_no_change which now takes a napi_struct

3. clean up all 5 drivers:
     a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
        is needed, so I think this would be straightforward?)
     b. remove all affinity_mask caching code in 4 of 5 drivers
     c. update all 5 drivers to call napi_affinity_no_change in poll

Then ... anyone who adds support for netif_napi_set_irq to their
driver in the future gets automatic support in-core for
caching/updating of the mask? And in the future netdev-genl could
dump the mask since it's in-core?

I'll mess around with that locally to see how it looks, but let me
know if that sounds like a better overall approach.

- Joe
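A rough sketch of what steps 0-2 could look like follows. It assumes new
napi_struct members (affinity_mask, embedded here for brevity rather than
the pointer mentioned in step 0, and affinity_notify); napi->irq already
exists. Teardown, locking and error handling are omitted, and this is not
the actual implementation.

#include <linux/interrupt.h>
#include <linux/irq.h>
#include <linux/netdevice.h>
#include <linux/smp.h>

static void napi_irq_affinity_notify(struct irq_affinity_notify *notify,
				     const cpumask_t *mask)
{
	/* assumes a new 'affinity_notify' member embedded in napi_struct */
	struct napi_struct *napi =
		container_of(notify, struct napi_struct, affinity_notify);

	cpumask_copy(&napi->affinity_mask, mask);
}

static void napi_irq_affinity_release(struct kref *ref)
{
	/* notifier is embedded in napi_struct, nothing to free */
}

void netif_napi_set_irq(struct napi_struct *napi, int irq)
{
	const struct cpumask *mask;

	napi->irq = irq;				/* step 1.a */
	if (irq < 0)
		return;

	mask = irq_get_effective_affinity_mask(irq);	/* step 1.b */
	if (mask)
		cpumask_copy(&napi->affinity_mask, mask);

	napi->affinity_notify.notify = napi_irq_affinity_notify;
	napi->affinity_notify.release = napi_irq_affinity_release;
	irq_set_affinity_notifier(irq, &napi->affinity_notify);	/* step 1.c */
}

/* step 2: the helper takes the napi_struct and uses the cached mask */
bool napi_affinity_no_change(const struct napi_struct *napi)
{
	return cpumask_test_cpu(smp_processor_id(), &napi->affinity_mask);
}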
On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> Actually... how about a slightly different approach, which caches
> the affinity mask in the core?

I was gonna say :)

> 0. Extend napi struct to have a struct cpumask * field
>
> 1. extend netif_napi_set_irq to:
>      a. store the IRQ number in the napi struct (as you suggested)
>      b. call irq_get_effective_affinity_mask to store the mask in the
>         napi struct
>      c. set up generic affinity_notify.notify and
>         affinity_notify.release callbacks to update the in-core mask
>         when it changes

This part I'm not an expert on.

> 2. add napi_affinity_no_change which now takes a napi_struct
>
> 3. clean up all 5 drivers:
>      a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
>         is needed, so I think this would be straightforward?)
>      b. remove all affinity_mask caching code in 4 of 5 drivers
>      c. update all 5 drivers to call napi_affinity_no_change in poll
>
> Then ... anyone who adds support for netif_napi_set_irq to their
> driver in the future gets automatic support in-core for
> caching/updating of the mask? And in the future netdev-genl could
> dump the mask since it's in-core?
>
> I'll mess around with that locally to see how it looks, but let me
> know if that sounds like a better overall approach.

Could we even handle this directly as part of __napi_poll(),
once the driver gives core all of the relevant pieces of information?
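If the check did move into the core, the shape might be something like the
sketch below. It is heavily simplified, is not the real __napi_poll(), and
reuses the hypothetical napi_affinity_no_change() from the sketch above.

/* Heavily simplified; the real __napi_poll() does much more. This only
 * shows where an in-core affinity check could feed the repoll decision.
 */
static int __napi_poll_sketch(struct napi_struct *n, bool *repoll)
{
	int weight = n->weight;
	int work = n->poll(n, weight);

	if (work < weight)
		return work;	/* driver finished; NAPI will be completed */

	/* Driver consumed the full budget and would normally be polled
	 * again; only keep polling if the IRQ affinity still matches the
	 * CPU we are running on.
	 */
	*repoll = napi_affinity_no_change(n);

	return work;
}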
On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
> On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> > Actually... how about a slightly different approach, which caches
> > the affinity mask in the core?
>
> I was gonna say :)
>
> > 0. Extend napi struct to have a struct cpumask * field
> >
> > 1. extend netif_napi_set_irq to:
> >      a. store the IRQ number in the napi struct (as you suggested)
> >      b. call irq_get_effective_affinity_mask to store the mask in the
> >         napi struct
> >      c. set up generic affinity_notify.notify and
> >         affinity_notify.release callbacks to update the in-core mask
> >         when it changes
>
> This part I'm not an expert on.
>
> > 2. add napi_affinity_no_change which now takes a napi_struct
> >
> > 3. clean up all 5 drivers:
> >      a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
> >         is needed, so I think this would be straightforward?)
> >      b. remove all affinity_mask caching code in 4 of 5 drivers
> >      c. update all 5 drivers to call napi_affinity_no_change in poll
> >
> > Then ... anyone who adds support for netif_napi_set_irq to their
> > driver in the future gets automatic support in-core for
> > caching/updating of the mask? And in the future netdev-genl could
> > dump the mask since it's in-core?
> >
> > I'll mess around with that locally to see how it looks, but let me
> > know if that sounds like a better overall approach.

I ended up going with the approach laid out above; moving the IRQ
affinity mask updating code into the core (which adds that ability
to gve/mlx4/mlx5... it seems mlx4/5 cached but didn't have notifiers
set up to update the cached copy?) and adding calls to
netif_napi_set_irq in i40e/iavf and deleting their custom notifier
code.

It's almost ready for rfcv2; I think this approach is probably
better?

> Could we even handle this directly as part of __napi_poll(),
> once the driver gives core all of the relevant pieces of information?

I had been thinking the same thing, too, but it seems like at least
one driver (mlx5) counts the number of affinity changes to export as
a stat, so moving all of this to core would break that.

So, I may avoid attempting that for this series.

I'm still messing around with this but will send an rfcv2 in a bit.
On 14/08/2024 18:19, Joe Damato wrote:
> On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
>> On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
>>> Actually... how about a slightly different approach, which caches
>>> the affinity mask in the core?
>>
>> I was gonna say :)
>>
>>> 0. Extend napi struct to have a struct cpumask * field
>>>
>>> 1. extend netif_napi_set_irq to:
>>>      a. store the IRQ number in the napi struct (as you suggested)
>>>      b. call irq_get_effective_affinity_mask to store the mask in the
>>>         napi struct
>>>      c. set up generic affinity_notify.notify and
>>>         affinity_notify.release callbacks to update the in-core mask
>>>         when it changes
>>
>> This part I'm not an expert on.

Several net drivers (mlx5, mlx4, ice, ena and more) use a feature
called ARFS (rmap)[1], and this feature uses the affinity notifier
mechanism. Also, the affinity notifier infra supports only a single
notifier per IRQ.

Hence, your suggestion (1.c) will break the ARFS feature.

[1] see irq_cpu_rmap_add()

>>
>>> 2. add napi_affinity_no_change which now takes a napi_struct
>>>
>>> 3. clean up all 5 drivers:
>>>      a. add calls to netif_napi_set_irq for all 5 (I think no RTNL
>>>         is needed, so I think this would be straightforward?)
>>>      b. remove all affinity_mask caching code in 4 of 5 drivers
>>>      c. update all 5 drivers to call napi_affinity_no_change in poll
>>>
>>> Then ... anyone who adds support for netif_napi_set_irq to their
>>> driver in the future gets automatic support in-core for
>>> caching/updating of the mask? And in the future netdev-genl could
>>> dump the mask since it's in-core?
>>>
>>> I'll mess around with that locally to see how it looks, but let me
>>> know if that sounds like a better overall approach.
>
> I ended up going with the approach laid out above; moving the IRQ
> affinity mask updating code into the core (which adds that ability
> to gve/mlx4/mlx5... it seems mlx4/5 cached but didn't have notifiers
> set up to update the cached copy?)

Maybe. This is probably due to what I wrote above.

> and adding calls to
> netif_napi_set_irq in i40e/iavf and deleting their custom notifier
> code.
>
> It's almost ready for rfcv2; I think this approach is probably
> better?
>
>> Could we even handle this directly as part of __napi_poll(),
>> once the driver gives core all of the relevant pieces of information?
>
> I had been thinking the same thing, too, but it seems like at least
> one driver (mlx5) counts the number of affinity changes to export as
> a stat, so moving all of this to core would break that.
>
> So, I may avoid attempting that for this series.
>
> I'm still messing around with this but will send an rfcv2 in a bit.
>
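For reference, the driver-side ARFS wiring Shay refers to looks roughly
like the sketch below (illustrative only; the example_* function is
hypothetical). irq_cpu_rmap_add() installs its own irq_affinity_notify on
each IRQ, so it occupies the single per-IRQ notifier slot that step 1.c
above would also need.

#include <linux/cpu_rmap.h>
#include <linux/interrupt.h>
#include <linux/netdevice.h>

static int example_setup_arfs_rmap(struct net_device *netdev,
				   const int *irqs, int n_irqs)
{
	int i, err;

	/* rx_cpu_rmap exists in struct net_device under CONFIG_RFS_ACCEL */
	netdev->rx_cpu_rmap = alloc_irq_cpu_rmap(n_irqs);
	if (!netdev->rx_cpu_rmap)
		return -ENOMEM;

	for (i = 0; i < n_irqs; i++) {
		/* registers an irq_affinity_notify for irqs[i] internally,
		 * taking the single per-IRQ notifier slot
		 */
		err = irq_cpu_rmap_add(netdev->rx_cpu_rmap, irqs[i]);
		if (err)
			return err;
	}

	return 0;
}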
On Wed, Aug 14, 2024 at 07:03:35PM +0300, Shay Drori wrote:
> On 14/08/2024 18:19, Joe Damato wrote:
> > On Wed, Aug 14, 2024 at 08:09:15AM -0700, Jakub Kicinski wrote:
> > > On Wed, 14 Aug 2024 13:12:08 +0100 Joe Damato wrote:
> > > > Actually... how about a slightly different approach, which caches
> > > > the affinity mask in the core?
> > >
> > > I was gonna say :)
> > >
> > > > 0. Extend napi struct to have a struct cpumask * field
> > > >
> > > > 1. extend netif_napi_set_irq to:
> > > >      a. store the IRQ number in the napi struct (as you suggested)
> > > >      b. call irq_get_effective_affinity_mask to store the mask in the
> > > >         napi struct
> > > >      c. set up generic affinity_notify.notify and
> > > >         affinity_notify.release callbacks to update the in-core mask
> > > >         when it changes
> > >
> > > This part I'm not an expert on.
>
> Several net drivers (mlx5, mlx4, ice, ena and more) use a feature
> called ARFS (rmap)[1], and this feature uses the affinity notifier
> mechanism. Also, the affinity notifier infra supports only a single
> notifier per IRQ.
>
> Hence, your suggestion (1.c) will break the ARFS feature.
>
> [1] see irq_cpu_rmap_add()

Thanks for taking a look and your reply.

I did notice ARFS use by some drivers and figured that might be why
the notifiers were being used in some cases.

I guess the question comes down to whether adding a call to
irq_get_effective_affinity_mask in the hot path is a bad idea.

If it is, then the only option is to have the drivers pass in their
IRQ affinity masks, as Stanislav suggested, to avoid adding that
call to the hot path.

If not, then the IRQ from napi_struct can be used and the affinity
mask can be generated on every napi poll. i40e/gve/iavf would need
calls to netif_napi_set_irq to set the IRQ mapping, which seems to
be straightforward.

In both cases: the IRQ notifier stuff would be left as is so that it
wouldn't break ARFS.

I suspect that the preferred solution would be to avoid adding that
call to the hot path, right?
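The second option, using the IRQ stored in napi_struct and generating the
mask on each poll, could look roughly like this sketch; whether the extra
lookup is acceptable in the hot path is exactly the open question here.

#include <linux/irq.h>
#include <linux/netdevice.h>
#include <linux/smp.h>

/* Sketch of the "no caching" variant: look up the effective affinity on
 * every poll via the IRQ number already stored in napi_struct. The per-IRQ
 * notifier slot is left alone, so ARFS keeps working, at the cost of an
 * extra lookup in the hot path.
 */
static bool napi_affinity_no_change(const struct napi_struct *napi)
{
	const struct cpumask *mask;

	if (napi->irq < 0)
		return true;

	mask = irq_get_effective_affinity_mask(napi->irq);
	if (!mask)
		return true;

	return cpumask_test_cpu(smp_processor_id(), mask);
}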
On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> If it is, then the only option is to have the drivers pass in their
> IRQ affinity masks, as Stanislav suggested, to avoid adding that
> call to the hot path.
>
> If not, then the IRQ from napi_struct can be used and the affinity
> mask can be generated on every napi poll. i40e/gve/iavf would need
> calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> be straightforward.

It's a bit sad to have the generic solution blocked.
cpu_rmap_update() is exported. Maybe we can call it from our notifier?
rmap lives in struct net_device.
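One possible shape for that suggestion, sketched under the assumption that
the core notifier from the earlier sketch also knows the rmap index for its
IRQ; that index is currently private to lib/cpu_rmap.c, so napi->rmap_index
below is an assumed new field, not an existing one.

#include <linux/cpu_rmap.h>
#include <linux/interrupt.h>
#include <linux/netdevice.h>

/* Sketch only: the core notifier refreshes the in-core cache and forwards
 * the new mask to the device's ARFS rmap. napi->affinity_mask,
 * napi->affinity_notify and napi->rmap_index are assumed new fields.
 */
static void napi_irq_affinity_notify(struct irq_affinity_notify *notify,
				     const cpumask_t *mask)
{
	struct napi_struct *napi =
		container_of(notify, struct napi_struct, affinity_notify);
	struct net_device *dev = napi->dev;

	cpumask_copy(&napi->affinity_mask, mask);

#ifdef CONFIG_RFS_ACCEL
	/* keep ARFS steering in sync, instead of the rmap's own notifier
	 * which a core-registered notifier would displace
	 */
	if (dev->rx_cpu_rmap)
		cpu_rmap_update(dev->rx_cpu_rmap, napi->rmap_index, mask);
#endif
}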
On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
> On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> > If it is, then the only option is to have the drivers pass in their
> > IRQ affinity masks, as Stanislav suggested, to avoid adding that
> > call to the hot path.
> >
> > If not, then the IRQ from napi_struct can be used and the affinity
> > mask can be generated on every napi poll. i40e/gve/iavf would need
> > calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> > be straightforward.
>
> It's a bit sad to have the generic solution blocked.
> cpu_rmap_update() is exported. Maybe we can call it from our notifier?
> rmap lives in struct net_device.

I agree on the sadness. I will take a look today.

I guess if we were being really ambitious, we'd try to move ARFS
stuff into the core (as RSS was moved into the core).
On 15/08/2024 13:22, Joe Damato wrote:
> On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
>> On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
>>> If it is, then the only option is to have the drivers pass in their
>>> IRQ affinity masks, as Stanislav suggested, to avoid adding that
>>> call to the hot path.
>>>
>>> If not, then the IRQ from napi_struct can be used and the affinity
>>> mask can be generated on every napi poll. i40e/gve/iavf would need
>>> calls to netif_napi_set_irq to set the IRQ mapping, which seems to
>>> be straightforward.
>>
>> It's a bit sad to have the generic solution blocked.
>> cpu_rmap_update() is exported. Maybe we can call it from our notifier?
>> rmap lives in struct net_device.
>
> I agree on the sadness. I will take a look today.
>
> I guess if we were being really ambitious, we'd try to move ARFS
> stuff into the core (as RSS was moved into the core).

Sorry for the late reply. Maybe we can modify the affinity notifier
infra to support more than a single notifier per IRQ.

@Thomas, do you know why only a single notifier per IRQ is supported?
On Tue, Aug 20, 2024 at 09:40:31AM +0300, Shay Drori wrote:
> On 15/08/2024 13:22, Joe Damato wrote:
> > On Wed, Aug 14, 2024 at 05:20:46PM -0700, Jakub Kicinski wrote:
> > > On Wed, 14 Aug 2024 19:01:40 +0100 Joe Damato wrote:
> > > > If it is, then the only option is to have the drivers pass in their
> > > > IRQ affinity masks, as Stanislav suggested, to avoid adding that
> > > > call to the hot path.
> > > >
> > > > If not, then the IRQ from napi_struct can be used and the affinity
> > > > mask can be generated on every napi poll. i40e/gve/iavf would need
> > > > calls to netif_napi_set_irq to set the IRQ mapping, which seems to
> > > > be straightforward.
> > >
> > > It's a bit sad to have the generic solution blocked.
> > > cpu_rmap_update() is exported. Maybe we can call it from our notifier?
> > > rmap lives in struct net_device.
> >
> > I agree on the sadness. I will take a look today.
> >
> > I guess if we were being really ambitious, we'd try to move ARFS
> > stuff into the core (as RSS was moved into the core).
>
> Sorry for the late reply. Maybe we can modify the affinity notifier
> infra to support more than a single notifier per IRQ.
>
> @Thomas, do you know why only a single notifier per IRQ is supported?

Sorry for the delayed response as well on my side; I've been in
between lots of different kernel RFCs :)

Jakub: the issue seems to be that the internals in lib/cpu_rmap.c are
needed to call cpu_rmap_update. It's probably possible to expose them
somehow so that a generic IRQ notifier could call cpu_rmap_update, as
you mentioned, but some rewiring is going to be needed, I think.

I had a couple of ideas for rewiring things, but I haven't had time to
context switch back onto this work as I've been busy with a few other
things (the IRQ suspension stuff and another mlx5 thing I have yet to
send upstream).

I hope to take another look at it this week, but I welcome any
suggestions from Shay/Thomas in the meantime.

- Joe