mbox series

[RFC,net-next,v2,0/9] Add support for per-NAPI config via netlink

Message ID 20240908160702.56618-1-jdamato@fastly.com (mailing list archive)
Headers show
Series Add support for per-NAPI config via netlink | expand

Message

Joe Damato Sept. 8, 2024, 4:06 p.m. UTC
Greetings:

Welcome to v2, converted to RFC.... which is definitely incorrect, but
hopefully can serve as a basis for discussion to get to something
better.

This implementation allocates "struct napi_storage" and each NAPI
instance is assigned an index into the storage array.

It seemed like this storage area should persist even if different HW
queues are created, but should be cleared when queue counts are resized
(ethtool -L).

What I did is flat out incorrect: memset the struct to 0
on napi_enable.

I am not totally clear if I understand the part of the previous
conversation about mapping rings to NAPIs and so on, but I wanted to
make sure the rest of the implementation was starting to vaguely look
like what was discussed in the previous thread.

To help illustrate how this would end up working, I've added patches for
3 drivers, of which I have access to only 1:
  - mlx5 which is the basis of the examples below
  - mlx4 which has TX only NAPIs, just to highlight that case. I have
    only compile tested this patch; I don't have this hardware.
  - bnxt which I have only compiled tested.

Zeroing on napi_enable is incorrect because, at the very least, it
breaks sysfs (the sysfs settings should be inherited). It's not clear
to me how we achieve persistence with the zero-ing unless we assume the
drivers are changed somehow? IIRC, there was a suggestion in the
previous thread to memset the napi_storage to 0 on queue resize, but
I've definitely gotten the nuance in what was desired wrong.

Anyway, sending what I have before iterating further to see if this is
even remotely the right direction before going too deep down this path.

I hope that's OK.

Here's an example of how it works on my mlx5 as is:

# start with 4 queues

$ ethtool -l eth4 | grep Combined | tail -1
Combined:	4

First, output the current NAPI settings:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 0,
  'gro-flush-timeout': 0,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Now, set the global sysfs parameters:

$ sudo bash -c 'echo 20000 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 100 >/sys/class/net/eth4/napi_defer_hard_irqs'

Output current NAPI settings again:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Now set NAPI ID 927, via its ifindex and index to specific values:

$ sudo ./tools/net/ynl/cli.py \
          --spec Documentation/netlink/specs/netdev.yaml \
          --do napi-set \
          --json='{"ifindex": 7, "index": 2,
                   "defer-hard-irqs": 111,
                   "gro-flush-timeout": 11111}'
None

Now output current NAPI settings again to ensure only 927 changed:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 11111,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Now, increase gro-flush-timeout only:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"ifindex": 7, "index": 2,
                              "gro-flush-timeout": 44444}
None

Now output the current NAPI settings once more:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'
[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 44444,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Now set NAPI ID 927, via its ifindex and index, to have
gro_flush_timeout of 0:

$ sudo ./tools/net/ynl/cli.py \
       --spec Documentation/netlink/specs/netdev.yaml \
       --do napi-set --json='{"ifindex": 7, "index": 2,
                              "gro-flush-timeout": 0}'
None

Check that NAPI ID 927 has a value of 0:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 100,
  'gro-flush-timeout': 20000,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 111,
  'gro-flush-timeout': 0,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Last, but not least, let's try writing the sysfs parameters to ensure
all NAPIs are rewritten:

$ sudo bash -c 'echo 33333 >/sys/class/net/eth4/gro_flush_timeout'
$ sudo bash -c 'echo 222 >/sys/class/net/eth4/napi_defer_hard_irqs'

Check that worked:

$ ./tools/net/ynl/cli.py --spec Documentation/netlink/specs/netdev.yaml \
                         --dump napi-get --json='{"ifindex": 7}'

[{'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 928,
  'ifindex': 7,
  'index': 3,
  'irq': 529},
 {'defer-hard-irqs': 222,
  'gro-flush-timeout': 33333,
  'id': 927,
  'ifindex': 7,
  'index': 2,
  'irq': 528},
[...]

Resizing the queues (ethtool -L) resets everything to 0, which is wrong
because it breaks sysfs and it breaks the persistence goal.

I hope though that his code can still be discussed to ensure that I am
moving in the correct direction toward solving these issues before going
too far down the rabbit hole :)

Thanks,
Joe

Joe Damato (9):
  net: napi: Add napi_storage
  netdev-genl: Export NAPI index
  net: napi: Make napi_defer_hard_irqs per-NAPI
  netdev-genl: Dump napi_defer_hard_irqs
  net: napi: Make gro_flush_timeout per-NAPI
  netdev-genl: Support setting per-NAPI config values
  bnxt: Add support for napi storage
  mlx5: Add support for napi storage
  mlx4: Add support for napi storage to RX CQs

 Documentation/netlink/specs/netdev.yaml       | 35 ++++++++
 .../networking/net_cachelines/net_device.rst  |  3 +-
 drivers/net/ethernet/broadcom/bnxt/bnxt.c     |  3 +-
 drivers/net/ethernet/mellanox/mlx4/en_cq.c    |  3 +-
 .../net/ethernet/mellanox/mlx5/core/en_main.c |  2 +-
 include/linux/netdevice.h                     | 38 ++++++++-
 include/uapi/linux/netdev.h                   |  4 +
 net/core/dev.c                                | 36 +++++---
 net/core/dev.h                                | 83 +++++++++++++++++++
 net/core/net-sysfs.c                          |  4 +-
 net/core/netdev-genl-gen.c                    | 15 ++++
 net/core/netdev-genl-gen.h                    |  1 +
 net/core/netdev-genl.c                        | 65 +++++++++++++++
 tools/include/uapi/linux/netdev.h             |  3 +
 14 files changed, 276 insertions(+), 19 deletions(-)