
[net-next,v8,0/6] net: napi: add CPU affinity to napi->config

Message ID: 20250211210657.428439-1-ahmed.zaki@intel.com

Message

Ahmed Zaki Feb. 11, 2025, 9:06 p.m. UTC
Drivers usually need to re-apply the user-set IRQ affinity to their IRQs
after reset. However, since there can be only one IRQ affinity notifier
for each IRQ, registering IRQ notifiers conflicts with the ARFS rmap
management in the core (which also registers separate IRQ affinity
notifiers).   

Move the IRQ affinity management to the napi struct. This way we can have
a unified IRQ notifier to re-apply the user-set affinity and also manage
the ARFS rmaps.
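
To illustrate the direction (a sketch only, not the exact driver code in
patches 4-6): with the new API a driver opts in once and stops managing
IRQ notifiers and the ARFS rmap itself. The example_* names below are
hypothetical, and the use of netif_set_affinity_auto(),
netif_enable_cpu_rmap() and netif_napi_set_irq() is inferred from the
names in this cover letter, so the exact signatures may differ from the
final patches:

#include <linux/netdevice.h>

/* Hypothetical driver structure, for illustration only. */
struct example_q_vector {
        struct napi_struct napi;
        int irq;
};

static int example_request_irqs(struct net_device *netdev,
                                struct example_q_vector *qv, int nvecs)
{
        int i, err;

        /* Ask the core to remember and re-apply per-NAPI IRQ affinity. */
        netif_set_affinity_auto(netdev);

        /* If ARFS is used, let the core manage the rx cpu_rmap as well. */
        err = netif_enable_cpu_rmap(netdev, nvecs);
        if (err)
                return err;

        for (i = 0; i < nvecs; i++) {
                /* The core attaches its unified affinity notifier to this
                 * IRQ; the NAPIs are assumed to carry per-queue config
                 * (see the napi_add_config() note in the V8 changelog).
                 */
                netif_napi_set_irq(&qv[i].napi, qv[i].irq);
        }

        return 0;
}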

Ice does not always delete the NAPIs before releasing the IRQs. The first
patch makes sure the driver removes the IRQ number along with the queue
when the NAPIs are disabled. Without this, the later patches in this
series would free the IRQ before releasing its IRQ affinity notifier,
which generates warnings.

The second patch moves the ARFS rmap management to the core. Patch 3 adds
the IRQ affinity mask to napi_config and re-applies the mask after reset.
Patches 4-6 convert the bnxt, ice and idpf drivers to the new API.
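
Conceptually, the unified notifier boils down to something like the sketch
below. This is not the exact code from patches 2 and 3; the field names
napi->notify, napi->config->affinity_mask and napi->napi_rmap_idx are
assumptions inferred from this cover letter:

#include <linux/cpu_rmap.h>
#include <linux/interrupt.h>
#include <linux/netdevice.h>

static void example_napi_irq_notify(struct irq_affinity_notify *notify,
                                    const cpumask_t *mask)
{
        /* Assumes the series embeds an irq_affinity_notify in napi_struct. */
        struct napi_struct *napi =
                container_of(notify, struct napi_struct, notify);

        /* Remember the user-set mask so it can be re-applied after reset. */
        if (napi->config)
                cpumask_copy(&napi->config->affinity_mask, mask);

#ifdef CONFIG_RFS_ACCEL
        /* Keep the ARFS reverse map pointing at the new CPUs. */
        if (napi->dev->rx_cpu_rmap &&
            cpu_rmap_update(napi->dev->rx_cpu_rmap, napi->napi_rmap_idx, mask))
                netdev_warn(napi->dev, "failed to update ARFS rmap\n");
#endif
}

A single per-NAPI callback like this replaces both the driver's own
affinity notifier and the separate one registered for the ARFS rmap.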

Tested on bnxt, ice and idpf.

V8:
    - Add a new patch in "ice" that releases the IRQs and their notifiers
      when clearing the NAPI queues (see the third paragraph above).
    - Add a new NAPI flag "NAPI_STATE_HAS_NOTIFIER" that simplifies the
      code for IRQ notifier detection (Patch 2).
    - Move the IRQ notifier auto-removal to napi_delete() instead of
      napi_disable(). This is the reason for the new ice patch. (Jakub)
    - Add a WARN_ON_ONCE(!napi->config) in netif_napi_set_irq_locked().
      This would detect drivers that asked for irq_affinity_auto but did
      not use napi_add_config(). (Patch 3) (Joe)
    - Rename netif_enable_irq_affinity() to netif_set_affinity_auto()
      (Patch 3) (Jakub).
V7:
    - https://lore.kernel.org/netdev/20250204220622.156061-1-ahmed.zaki@intel.com/
    - P1: add documentation for netif_enable_cpu_rmap()
    - P1: move a couple of "if (rx_cpu_rmap_auto)" checks from patch 1 to
      patch 2, where they are actually needed.
    - P1: remove a defensive "if (!rmap)" check
    - P1: in netif_disable_cpu_rmap(), remove the for loop that frees the
      notifiers, since this is already done in napi_disable_locked().
      Also rename it to netif_del_cpu_rmap().
    - P1 and P2: simplify the if conditions in netif_napi_set_irq_locked()
    - Other nits

V6:
    - https://lore.kernel.org/netdev/20250118003335.155379-1-ahmed.zaki@intel.com/
    - Modifications to have fewer #ifdef CONFIG_RFS_ACCEL guards
    - Remove rmap entry in napi_disable
    - Rebase on rc7 and use netif_napi_set_irq_locked()
    - Assume IRQ can be -1 and free resources if an old valid IRQ was
      associated with the napi. For this, I had to merge the first 2
      patches to use the new rmap API.

V5:
    - https://lore.kernel.org/netdev/20250113171042.158123-1-ahmed.zaki@intel.com/
    - Add kernel doc for new netdev flags (Simon).
    - Remove the defensive "if (!napi)" check in napi_irq_cpu_rmap_add()
      (patch 2) since the caller already dereferences the pointer (Simon).
    - Fix build error when CONFIG_RFS_ACCEL is not defined (patch 3).

v4:
    - https://lore.kernel.org/netdev/20250109233107.17519-1-ahmed.zaki@intel.com/
    - Better introduction in the cover letter.
    - Fix kernel build errors in ena_init_rx_cpu_rmap() (Patch 1)
    - Fix kernel test robot warnings reported by Dan Carpenter:
      https://lore.kernel.org/all/202501050625.nY1c97EX-lkp@intel.com/
    - Remove unrelated empty line in patch 4 (Kalesh Anakkur Purayil)
    - Fix a memleak (rmap was not freed) by calling cpu_rmap_put() in
      netif_napi_affinity_release() (patch 2).

v3:
    - https://lore.kernel.org/netdev/20250104004314.208259-1-ahmed.zaki@intel.com/
    - Assign one cpu per mask starting from local NUMA node (Shay Drori).
    - Keep the new ARFS and affinity flags per netdev (Jakub).

v2:
    - https://lore.kernel.org/netdev/202412190454.nwvp3hU2-lkp@intel.com/T/
    - Also move the ARFS IRQ affinity management from drivers to core. Via
      netif_napi_set_irq(), drivers can ask the core to add the IRQ to the
      ARFS rmap (already allocated by the driver).

RFC -> v1:
    - https://lore.kernel.org/netdev/20241210002626.366878-1-ahmed.zaki@intel.com/
    - move static inline affinity functions to net/core/dev.c
    - add the new napi->irq_flags (patch 1)
    - add code changes to bnxt, mlx4 and ice.

Ahmed Zaki (6):
  ice: clear NAPI's IRQ numbers in ice_vsi_clear_napi_queues()
  net: move ARFS rmap management to core
  net: napi: add CPU affinity to napi_config
  bnxt: use napi's irq affinity
  ice: use napi's irq affinity
  idpf: use napi's irq affinity

 Documentation/networking/scaling.rst         |   6 +-
 drivers/net/ethernet/amazon/ena/ena_netdev.c |  43 +----
 drivers/net/ethernet/broadcom/bnxt/bnxt.c    |  54 +-----
 drivers/net/ethernet/broadcom/bnxt/bnxt.h    |   2 -
 drivers/net/ethernet/intel/ice/ice.h         |   3 -
 drivers/net/ethernet/intel/ice/ice_arfs.c    |  33 +---
 drivers/net/ethernet/intel/ice/ice_arfs.h    |   2 -
 drivers/net/ethernet/intel/ice/ice_base.c    |   7 +-
 drivers/net/ethernet/intel/ice/ice_lib.c     |  16 +-
 drivers/net/ethernet/intel/ice/ice_main.c    |  47 +----
 drivers/net/ethernet/intel/idpf/idpf_lib.c   |   1 +
 drivers/net/ethernet/intel/idpf/idpf_txrx.c  |  22 +--
 drivers/net/ethernet/intel/idpf/idpf_txrx.h  |   6 +-
 include/linux/cpu_rmap.h                     |   1 +
 include/linux/netdevice.h                    |  28 ++-
 lib/cpu_rmap.c                               |   2 +-
 net/core/dev.c                               | 172 ++++++++++++++++++-
 17 files changed, 233 insertions(+), 212 deletions(-)

Comments

Jakub Kicinski Feb. 15, 2025, 5:41 p.m. UTC | #1
On Tue, 11 Feb 2025 14:06:51 -0700 Ahmed Zaki wrote:
> Drivers usually need to re-apply the user-set IRQ affinity to their IRQs
> after reset. However, since there can be only one IRQ affinity notifier
> for each IRQ, registering IRQ notifiers conflicts with the ARFS rmap
> management in the core (which also registers separate IRQ affinity
> notifiers).   

Could you extract all the core changes as a first patch of the series
(rmap and affinity together). And then have the driver conversion
patches follow? Obviously don't do it if it'd introduce transient
breakage. But I don't think it should, since core changes should
be a noop before any driver opts in.

The way it's split now makes the logic quite hard to review.
Jakub Kicinski Feb. 15, 2025, 5:43 p.m. UTC | #2
On Sat, 15 Feb 2025 09:41:54 -0800 Jakub Kicinski wrote:
> On Tue, 11 Feb 2025 14:06:51 -0700 Ahmed Zaki wrote:
> > Drivers usually need to re-apply the user-set IRQ affinity to their IRQs
> > after reset. However, since there can be only one IRQ affinity notifier
> > for each IRQ, registering IRQ notifiers conflicts with the ARFS rmap
> > management in the core (which also registers separate IRQ affinity
> > notifiers).     
> 
> Could you extract all the core changes as a first patch of the series
> (rmap and affinity together). And then have the driver conversion
> patches follow? Obviously don't do it if it'd introduce transient
> breakage. But I don't think it should, since core changes should
> be a noop before any driver opts in.
> 
> The way it's split now makes the logic quite hard to review.

Ah, and please add the patch with the ksft test I shared earlier to
your series:
https://github.com/kuba-moo/linux/commit/de7d2475750ac05b6e414d7e5201e354b05cf146
it just needs a commit message, I think. The prereq patches are 
in the tree now.