diff mbox series

[net-next] ethtool: add tunable api to disable various firmware offloads

Message ID 20240813223325.3522113-1-maze@google.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series [net-next] ethtool: add tunable api to disable various firmware offloads | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 35 this patch: 35
netdev/build_tools success Errors and warnings before: 0 this patch: 0
netdev/cc_maintainers warning 1 maintainers not CCed: andrew@lunn.ch
netdev/build_clang success Errors and warnings before: 90 this patch: 90
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1738 this patch: 1738
netdev/checkpatch warning WARNING: line length of 99 exceeds 80 columns
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 2 this patch: 2
netdev/source_inline success Was 0 now: 0
netdev/contest warning net-next-2024-08-15--00-00 (tests: 707)

Commit Message

Maciej Żenczykowski Aug. 13, 2024, 10:33 p.m. UTC
In order to save power (battery), most network hardware
designed for low power environments (ie. battery powered
devices) supports varying types of hardware/firmware offload
(filtering and/or generating replies) of incoming packets.

The goal being to prevent device wakeups caused by ingress 'spam'.

This is particularly true for wifi (especially phones/tablets),
but isn't actually wifi specific.  It can also be implemented
in wired nics (TV) or usb ethernet dongles.

For examples TVs require this to keep power consumption
under (the EU mandated) 2 Watts while idle (display off),
while still being discoverable on the network.

This may include things like: ARP/IPv6 ND, IGMP/MLD, ping, mdns,
but various other possibilities are also possible,
for example:
  ethertype filtering (discarding non-IP ethertypes),
  nat-t keepalive (discarding ingress, automating periodic egress),
  tcp keepalive (generation/processing/filtering),
  tethering (forwarding) offload

In many ways, in its goals, it is somewhat similar to the
relatively standard destination mac filtering most wired nics
have supported for ages: reduce the amount of traffic the host
must handle.

While working on Android we've discovered that there is
no device/driver agnostic way to disable these offloads.

This patch is an attempt to rectify this.

It does not add an API to configure these offloads, as usually
this isn't needed as the driver will just fetch the required
configuration information straight from the kernel.

What it does do is add a simple API to allow disabling (masking)
them, either for debugging or for test purposes.

Cc: "Kory Maincent (Dent Project)" <kory.maincent@bootlin.com>
Cc: Ahmed Zaki <ahmed.zaki@intel.com>
Cc: Edward Cree <ecree.xilinx@gmail.com>
Cc: "David S. Miller" <davem@davemloft.net>
Cc: Eric Dumazet <edumazet@google.com>
Cc: Jakub Kicinski <kuba@kernel.org>
Cc: Paolo Abeni <pabeni@redhat.com>
Cc: Yuyang Huang <yuyanghuang@google.com>
Cc: Lorenzo Colitti <lorenzo@google.com>
Signed-off-by: Maciej Żenczykowski <maze@google.com>
---
 include/uapi/linux/ethtool.h | 15 +++++++++++++++
 net/ethtool/common.c         |  1 +
 net/ethtool/ioctl.c          |  5 +++++
 3 files changed, 21 insertions(+)

Comments

Jakub Kicinski Aug. 15, 2024, 12:32 a.m. UTC | #1
On Tue, 13 Aug 2024 15:33:25 -0700 Maciej Żenczykowski wrote:
> In order to save power (battery), most network hardware
> designed for low power environments (ie. battery powered
> devices) supports varying types of hardware/firmware offload
> (filtering and/or generating replies) of incoming packets.
> 
> The goal being to prevent device wakeups caused by ingress 'spam'.
> 
> This is particularly true for wifi (especially phones/tablets),
> but isn't actually wifi specific.  It can also be implemented
> in wired nics (TV) or usb ethernet dongles.
> 
> For examples TVs require this to keep power consumption
> under (the EU mandated) 2 Watts while idle (display off),
> while still being discoverable on the network.

Sounds sane, adding Florian, he mentioned MDNS at last netconf.
Tho, wasn't there supposed to be a more granular API in Android
to control such protocol offloads?

You gotta find an upstream driver which implements this for us to merge.
If Florian doesn't have any quick uses -- I think Intel ethernet drivers
have private flags for enabling/disabling an LLDP agent. That could be
another way..
Florian Fainelli Aug. 15, 2024, 3:08 a.m. UTC | #2
CC Doug, Justin,

On 8/14/2024 5:32 PM, Jakub Kicinski wrote:
> On Tue, 13 Aug 2024 15:33:25 -0700 Maciej Żenczykowski wrote:
>> In order to save power (battery), most network hardware
>> designed for low power environments (ie. battery powered
>> devices) supports varying types of hardware/firmware offload
>> (filtering and/or generating replies) of incoming packets.
>>
>> The goal being to prevent device wakeups caused by ingress 'spam'.
>>
>> This is particularly true for wifi (especially phones/tablets),
>> but isn't actually wifi specific.  It can also be implemented
>> in wired nics (TV) or usb ethernet dongles.
>>
>> For examples TVs require this to keep power consumption
>> under (the EU mandated) 2 Watts while idle (display off),
>> while still being discoverable on the network.
> 
> Sounds sane, adding Florian, he mentioned MDNS at last netconf.

Yes this looks fine to me:

Acked-by: Florian Fainelli <florian.fainelli@broadcom.com>

> Tho, wasn't there supposed to be a more granular API in Android
> to control such protocol offloads?

I still am unable to find a public link for the 
com.google.tv.mdnsoffload package that describes how we can offload mDNS 
records in ATV, but that was what I mentioned during netconf that we 
needed to program into the hardware. Ideally using an ethtool API TBD, 
or a cfg80211 one, rather than some custom ioctls() and what not.

> 
> You gotta find an upstream driver which implements this for us to merge.
> If Florian doesn't have any quick uses -- I think Intel ethernet drivers
> have private flags for enabling/disabling an LLDP agent. That could be
> another way..

Currently we have both bcmgenet and bcmasp support the WAKE_FILTER 
Wake-on-LAN specifier. Our configuration is typically done in user-space 
for mDNS with something like:

ethtool -N eth0 flow-type ether dst 33:33:00:00:00:fb action 
0xfffffffffffffffe user-def 0x320000 m 0xffffffffff000fff
ethtool -N eth0 flow-type ether dst 01:00:5e:00:00:fb action 
0xfffffffffffffffe user-def 0x1e0000 m 0xffffffffff000fff
ethtool -s eth0 wol f

I would offer that we wire up the tunable into bcmgenet and bcmasp and 
we'd make sure on our side that the respective firmware implementations 
behave accordingly, but the respective firmware implementations 
currently look at whether any network filter have been programmed into 
the hardware, and if so, they are using those for offload. So we do not 
really need the tunable in a way, but if we were to add it, then we 
would need to find a way to tell the firmware not to use the network 
filters. We liked our design because there is no kernel <=> firmware 
communication.

Hummm
Jakub Kicinski Aug. 15, 2024, 3:45 p.m. UTC | #3
On Wed, 14 Aug 2024 20:08:05 -0700 Florian Fainelli wrote:
> > You gotta find an upstream driver which implements this for us to merge.
> > If Florian doesn't have any quick uses -- I think Intel ethernet drivers
> > have private flags for enabling/disabling an LLDP agent. That could be
> > another way..  
> 
> Currently we have both bcmgenet and bcmasp support the WAKE_FILTER 
> Wake-on-LAN specifier. Our configuration is typically done in user-space 
> for mDNS with something like:
> 
> ethtool -N eth0 flow-type ether dst 33:33:00:00:00:fb action 
> 0xfffffffffffffffe user-def 0x320000 m 0xffffffffff000fff
> ethtool -N eth0 flow-type ether dst 01:00:5e:00:00:fb action 
> 0xfffffffffffffffe user-def 0x1e0000 m 0xffffffffff000fff
> ethtool -s eth0 wol f
> 
> I would offer that we wire up the tunable into bcmgenet and bcmasp and 
> we'd make sure on our side that the respective firmware implementations 
> behave accordingly, but the respective firmware implementations 
> currently look at whether any network filter have been programmed into 
> the hardware, and if so, they are using those for offload. So we do not 
> really need the tunable in a way, but if we were to add it, then we 
> would need to find a way to tell the firmware not to use the network 
> filters. We liked our design because there is no kernel <=> firmware 
> communication.

Hm, I may be lacking the practical exposure to such systems to say
anything sensible, but when has that ever stopped me..
IIUC you're saying that FW enables MLD offload if the wake filter is 
in place. But for ping/arp/igmp offload there's no need to wake, ever,
so there wouldn't be a filter?
Nelson, Shannon Aug. 15, 2024, 4:13 p.m. UTC | #4
On 8/13/2024 3:33 PM, Maciej Żenczykowski wrote:
> 
> In order to save power (battery), most network hardware
> designed for low power environments (ie. battery powered
> devices) supports varying types of hardware/firmware offload
> (filtering and/or generating replies) of incoming packets.
> 
> The goal being to prevent device wakeups caused by ingress 'spam'.
> 
> This is particularly true for wifi (especially phones/tablets),
> but isn't actually wifi specific.  It can also be implemented
> in wired nics (TV) or usb ethernet dongles.
> 
> For examples TVs require this to keep power consumption
> under (the EU mandated) 2 Watts while idle (display off),
> while still being discoverable on the network.
> 
> This may include things like: ARP/IPv6 ND, IGMP/MLD, ping, mdns,
> but various other possibilities are also possible,
> for example:
>    ethertype filtering (discarding non-IP ethertypes),
>    nat-t keepalive (discarding ingress, automating periodic egress),
>    tcp keepalive (generation/processing/filtering),
>    tethering (forwarding) offload
> 
> In many ways, in its goals, it is somewhat similar to the
> relatively standard destination mac filtering most wired nics
> have supported for ages: reduce the amount of traffic the host
> must handle.
> 
> While working on Android we've discovered that there is
> no device/driver agnostic way to disable these offloads.
> 
> This patch is an attempt to rectify this.
> 
> It does not add an API to configure these offloads, as usually
> this isn't needed as the driver will just fetch the required
> configuration information straight from the kernel.
> 
> What it does do is add a simple API to allow disabling (masking)
> them, either for debugging or for test purposes.

I can see how this would be useful for test/debug, but it seems to me it 
is only half of a solution.  Wouldn't there also be a need to re-enable 
the offloads without having to reboot/restart the gizmo being tested? 
Or even query the current status?

sln

> 
> Cc: "Kory Maincent (Dent Project)" <kory.maincent@bootlin.com>
> Cc: Ahmed Zaki <ahmed.zaki@intel.com>
> Cc: Edward Cree <ecree.xilinx@gmail.com>
> Cc: "David S. Miller" <davem@davemloft.net>
> Cc: Eric Dumazet <edumazet@google.com>
> Cc: Jakub Kicinski <kuba@kernel.org>
> Cc: Paolo Abeni <pabeni@redhat.com>
> Cc: Yuyang Huang <yuyanghuang@google.com>
> Cc: Lorenzo Colitti <lorenzo@google.com>
> Signed-off-by: Maciej Żenczykowski <maze@google.com>
> ---
>   include/uapi/linux/ethtool.h | 15 +++++++++++++++
>   net/ethtool/common.c         |  1 +
>   net/ethtool/ioctl.c          |  5 +++++
>   3 files changed, 21 insertions(+)
> 
> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> index 4a0a6e703483..7b58860c3731 100644
> --- a/include/uapi/linux/ethtool.h
> +++ b/include/uapi/linux/ethtool.h
> @@ -224,12 +224,27 @@ struct ethtool_value {
>   #define PFC_STORM_PREVENTION_AUTO      0xffff
>   #define PFC_STORM_PREVENTION_DISABLE   0
> 
> +/* For power/wakeup (*not* performance) related offloads */
> +enum tunable_fw_offload_disable {
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_ALL,
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_ARP,
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_ND,
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_PING,
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_PING,
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_IGMP,
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MLD,
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_MDNS,
> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MDNS,
> +       /* 55 bits remaining for future use */
> +};
> +
>   enum tunable_id {
>          ETHTOOL_ID_UNSPEC,
>          ETHTOOL_RX_COPYBREAK,
>          ETHTOOL_TX_COPYBREAK,
>          ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
>          ETHTOOL_TX_COPYBREAK_BUF_SIZE,
> +       ETHTOOL_FW_OFFLOAD_DISABLE, /* u64 bits numbered from LSB per tunable_fw_offload_disable */
>          /*
>           * Add your fresh new tunable attribute above and remember to update
>           * tunable_strings[] in net/ethtool/common.c
> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> index 7257ae272296..8dc0c084fce5 100644
> --- a/net/ethtool/common.c
> +++ b/net/ethtool/common.c
> @@ -91,6 +91,7 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
>          [ETHTOOL_TX_COPYBREAK]  = "tx-copybreak",
>          [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout",
>          [ETHTOOL_TX_COPYBREAK_BUF_SIZE] = "tx-copybreak-buf-size",
> +       [ETHTOOL_FW_OFFLOAD_DISABLE] = "fw-offload-disable",
>   };
> 
>   const char
> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
> index 18cf9fa32ae3..31318ded17a7 100644
> --- a/net/ethtool/ioctl.c
> +++ b/net/ethtool/ioctl.c
> @@ -2733,6 +2733,11 @@ static int ethtool_get_module_eeprom(struct net_device *dev,
>   static int ethtool_tunable_valid(const struct ethtool_tunable *tuna)
>   {
>          switch (tuna->id) {
> +       case ETHTOOL_FW_OFFLOAD_DISABLE:
> +               if (tuna->len != sizeof(u64) ||
> +                   tuna->type_id != ETHTOOL_TUNABLE_U64)
> +                       return -EINVAL;
> +               break;
>          case ETHTOOL_RX_COPYBREAK:
>          case ETHTOOL_TX_COPYBREAK:
>          case ETHTOOL_TX_COPYBREAK_BUF_SIZE:
> --
> 2.46.0.76.ge559c4bf1a-goog
> 
>
Florian Fainelli Aug. 15, 2024, 4:38 p.m. UTC | #5
On 8/15/24 08:45, Jakub Kicinski wrote:
> On Wed, 14 Aug 2024 20:08:05 -0700 Florian Fainelli wrote:
>>> You gotta find an upstream driver which implements this for us to merge.
>>> If Florian doesn't have any quick uses -- I think Intel ethernet drivers
>>> have private flags for enabling/disabling an LLDP agent. That could be
>>> another way..
>>
>> Currently we have both bcmgenet and bcmasp support the WAKE_FILTER
>> Wake-on-LAN specifier. Our configuration is typically done in user-space
>> for mDNS with something like:
>>
>> ethtool -N eth0 flow-type ether dst 33:33:00:00:00:fb action
>> 0xfffffffffffffffe user-def 0x320000 m 0xffffffffff000fff
>> ethtool -N eth0 flow-type ether dst 01:00:5e:00:00:fb action
>> 0xfffffffffffffffe user-def 0x1e0000 m 0xffffffffff000fff
>> ethtool -s eth0 wol f
>>
>> I would offer that we wire up the tunable into bcmgenet and bcmasp and
>> we'd make sure on our side that the respective firmware implementations
>> behave accordingly, but the respective firmware implementations
>> currently look at whether any network filter have been programmed into
>> the hardware, and if so, they are using those for offload. So we do not
>> really need the tunable in a way, but if we were to add it, then we
>> would need to find a way to tell the firmware not to use the network
>> filters. We liked our design because there is no kernel <=> firmware
>> communication.
> 
> Hm, I may be lacking the practical exposure to such systems to say
> anything sensible, but when has that ever stopped me..
> IIUC you're saying that FW enables MLD offload if the wake filter is
> in place. But for ping/arp/igmp offload there's no need to wake, ever,
> so there wouldn't be a filter?

That is right, we only want to wake-up on mDNS in our case. There are 
two cases deployed, at least the first one is, the second one might have 
been more of a "to be added in the future" improvement:

- a simplistic one where we use the hardware filters to trigger a 
wake-up event, and then some piece of firmware will look at the mDNS 
query contents and figure out whether the query was for one of the 
services in the local database (typically _googlecast._tcp.local is of 
particular interest). If that is the case, we trigger a system wake-up 
and we let the Host CPU process the mDNS query and we stay awake for a 
few seconds in case a streaming operation happens

- a more sophisticated one where after the mDNS query wake-up event has 
been identified we wait until we get a 3-way TCP handshake targeting the 
_googlecast._tcp.local service before waking up the Host CPU. This is 
more reflective of an actual intent to use the device that was asleep

Hope this helps.
--
Florian
Maciej Żenczykowski Aug. 16, 2024, 12:49 a.m. UTC | #6
On Wed, Aug 14, 2024 at 5:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
> Sounds sane, adding Florian, he mentioned MDNS at last netconf.
> Tho, wasn't there supposed to be a more granular API in Android
> to control such protocol offloads?

We're deprecating the 'new' but already old mdns offload logic in
favour of APFv6+ (unfortunately there was a race here, we weren't
ready with APFv6 in time...).

APF [Android Packet Filter] is a bytecode interpreter built into
[wifi] firmware, currently the bytecode is fed in via Android Wifi HAL
(we'd like to change this to ethtool or something as well) from a java
bytecode generator that knows the state of the system.

APFv2 (ingress packet filtering to prevent spurious wakeups) and APFv4
(adds drop counters) have been in use on Pixel for many many years now
(going back to Pixel 3? somewhere 2016-ish).
We require all new (launched on Android 14/U+) devices to have a
working APFv4 implementation on their wifi client/sta interface
(enforced via Android VTS).
APFv6 (only available on non publicly available hardware atm) adds
support for sending back replies (ie. arp offload & the like).  We
hope to require it for newly launched devices on (the future) Android
16/W (or whatever it ends up being) and to extend it to non-wifi (ie.
wired) as well.

The APFv6 interpreter source is here:
https://android.googlesource.com/platform/hardware/google/apf/+/refs/heads/main/v6/
for the curious.

APF is *not* eBPF simply because eBPF is *far* too complex (there's
not enough space for it in the firmware/ram of many low end wifi
chipsets).  The latest APFv6 interpreter is only ~5kB of compiled
code, and another ~2kB of bytecode (though space for 4kB is
recommended).  Some wifi chipset vendors are finding it very hard to
find room for even that.

eBPF bytecode is very space inefficient compared to APF.

> You gotta find an upstream driver which implements this for us to merge.

This came about due to a request from a major wifi chipset vendor to
notify them which offloads could/should be disabled due to being
reimplemented via APFv6 (Android Packet Filter).
So I believe (note: we're still discussing this) we likely have folks
willing to implement this for at least 1 wifi driver (not sure which
one, and if it is merged upstream or lives out-of-tree).

I am of course in a very hard position here, as I don't own any
drivers/firmware - AFAIK even Pixel's wifi/cell drivers aren't Google
owned/maintained, but rather Broadcom/Synaptics/Qualcomm/Mediatek/etc
as appropriate...

I do know there is a need for an api of this sort (not necessarily
exactly this patch though), and if we merge something (sane :-) ) into
Linux, we can then backport that (either to LTS or just to ACK), and
then we (as in Google) can require implementations (for new
hardware/drivers) in future versions of Android...

Presumably that would result in implementations in many drivers,
though not necessarily any in-tree ones (I have no idea what the
current state of in-vs-out-of-tree drivers is wrt. Android wifi/cell
hardware)

This is very much a chicken-and-egg problem though.  As long as there
is no 'public' API, the default approach is for per-vendor or even
per-chip / per-driver custom apis, hidden behind Android HALs.  For
example we have such an Android HAL api for disabling ND offload on at
least one of our devices.  Of course the HAL itself is backed by
calling into the driver - just over some driver specific netlink...

Similarly for APF bytecode installation: it all goes via Android HALs,
which then in practice immediately turn around and talk straight over
driver/vendor-specific netlink to the driver.

I'd *love* to simplify all this.  Indeed in an ideal world, we'd have
'native' Linux apis into drivers to disable offloads, load APF, and
could even teach the kernel to actually build the APF bytecode to load
in in the first place (this would fix lots of races we currently have,
where we need to fetch state out of the kernel to build the bytecode
to pass back into the kernel driver/firmware...).  Perhaps a BPF
program to build the APF bytecode during the suspend sequence --
seriously, this isn't even a joke.

That said... the problem is simply too large to bite all at once, so
I'm trying to find a way out of the logjam wrt. upstream Linux that
APF has been in for years.

If we can't settle on a 'public' Linux API for this (not saying *this*
patch is necessarily the right way) we'll likely end up implementing
this either via ethtool privflags (or something) or simply via an
Android HAL... again :-( Be aware HALs are what most of the org loves
and it's hard enough to explain why we don't want to go via a HAL....

This isn't meant to be a threat ;-) it's just an unavoidable statement
of fact... :-(

What I could offer to provide is some sort of tun/tap/veth/virtio-net
based APF implementation for testing (but that's not exactly relevant
to *this* patch, since they don't support offloads to disable in the
first place) - as I would love to have that available in cuttlefish
(Android VM) for integration testing.

> If Florian doesn't have any quick uses -- I think Intel ethernet drivers
> have private flags for enabling/disabling an LLDP agent. That could be
> another way..
> --
> pw-bot: cr

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Maciej Żenczykowski Aug. 16, 2024, 12:55 a.m. UTC | #7
On Thu, Aug 15, 2024 at 9:13 AM Nelson, Shannon <shannon.nelson@amd.com> wrote:
>
> On 8/13/2024 3:33 PM, Maciej Żenczykowski wrote:
> >
> > In order to save power (battery), most network hardware
> > designed for low power environments (ie. battery powered
> > devices) supports varying types of hardware/firmware offload
> > (filtering and/or generating replies) of incoming packets.
> >
> > The goal being to prevent device wakeups caused by ingress 'spam'.
> >
> > This is particularly true for wifi (especially phones/tablets),
> > but isn't actually wifi specific.  It can also be implemented
> > in wired nics (TV) or usb ethernet dongles.
> >
> > For examples TVs require this to keep power consumption
> > under (the EU mandated) 2 Watts while idle (display off),
> > while still being discoverable on the network.
> >
> > This may include things like: ARP/IPv6 ND, IGMP/MLD, ping, mdns,
> > but various other possibilities are also possible,
> > for example:
> >    ethertype filtering (discarding non-IP ethertypes),
> >    nat-t keepalive (discarding ingress, automating periodic egress),
> >    tcp keepalive (generation/processing/filtering),
> >    tethering (forwarding) offload
> >
> > In many ways, in its goals, it is somewhat similar to the
> > relatively standard destination mac filtering most wired nics
> > have supported for ages: reduce the amount of traffic the host
> > must handle.
> >
> > While working on Android we've discovered that there is
> > no device/driver agnostic way to disable these offloads.
> >
> > This patch is an attempt to rectify this.
> >
> > It does not add an API to configure these offloads, as usually
> > this isn't needed as the driver will just fetch the required
> > configuration information straight from the kernel.
> >
> > What it does do is add a simple API to allow disabling (masking)
> > them, either for debugging or for test purposes.
>
> I can see how this would be useful for test/debug, but it seems to me it
> is only half of a solution.  Wouldn't there also be a need to re-enable
> the offloads without having to reboot/restart the gizmo being tested?
> Or even query the current status?

Since it's a mask of things to disable, setting the mask to 0, or the
relevant bit of the mask to 0 would reenable (assuming there was
anything to enable).

>
> sln
>
> >
> > Cc: "Kory Maincent (Dent Project)" <kory.maincent@bootlin.com>
> > Cc: Ahmed Zaki <ahmed.zaki@intel.com>
> > Cc: Edward Cree <ecree.xilinx@gmail.com>
> > Cc: "David S. Miller" <davem@davemloft.net>
> > Cc: Eric Dumazet <edumazet@google.com>
> > Cc: Jakub Kicinski <kuba@kernel.org>
> > Cc: Paolo Abeni <pabeni@redhat.com>
> > Cc: Yuyang Huang <yuyanghuang@google.com>
> > Cc: Lorenzo Colitti <lorenzo@google.com>
> > Signed-off-by: Maciej Żenczykowski <maze@google.com>
> > ---
> >   include/uapi/linux/ethtool.h | 15 +++++++++++++++
> >   net/ethtool/common.c         |  1 +
> >   net/ethtool/ioctl.c          |  5 +++++
> >   3 files changed, 21 insertions(+)
> >
> > diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
> > index 4a0a6e703483..7b58860c3731 100644
> > --- a/include/uapi/linux/ethtool.h
> > +++ b/include/uapi/linux/ethtool.h
> > @@ -224,12 +224,27 @@ struct ethtool_value {
> >   #define PFC_STORM_PREVENTION_AUTO      0xffff
> >   #define PFC_STORM_PREVENTION_DISABLE   0
> >
> > +/* For power/wakeup (*not* performance) related offloads */
> > +enum tunable_fw_offload_disable {
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_ALL,
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_ARP,
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_ND,
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_PING,
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_PING,
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_IGMP,
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MLD,
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_MDNS,
> > +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MDNS,
> > +       /* 55 bits remaining for future use */
> > +};
> > +
> >   enum tunable_id {
> >          ETHTOOL_ID_UNSPEC,
> >          ETHTOOL_RX_COPYBREAK,
> >          ETHTOOL_TX_COPYBREAK,
> >          ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
> >          ETHTOOL_TX_COPYBREAK_BUF_SIZE,
> > +       ETHTOOL_FW_OFFLOAD_DISABLE, /* u64 bits numbered from LSB per tunable_fw_offload_disable */
> >          /*
> >           * Add your fresh new tunable attribute above and remember to update
> >           * tunable_strings[] in net/ethtool/common.c
> > diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> > index 7257ae272296..8dc0c084fce5 100644
> > --- a/net/ethtool/common.c
> > +++ b/net/ethtool/common.c
> > @@ -91,6 +91,7 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
> >          [ETHTOOL_TX_COPYBREAK]  = "tx-copybreak",
> >          [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout",
> >          [ETHTOOL_TX_COPYBREAK_BUF_SIZE] = "tx-copybreak-buf-size",
> > +       [ETHTOOL_FW_OFFLOAD_DISABLE] = "fw-offload-disable",
> >   };
> >
> >   const char
> > diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
> > index 18cf9fa32ae3..31318ded17a7 100644
> > --- a/net/ethtool/ioctl.c
> > +++ b/net/ethtool/ioctl.c
> > @@ -2733,6 +2733,11 @@ static int ethtool_get_module_eeprom(struct net_device *dev,
> >   static int ethtool_tunable_valid(const struct ethtool_tunable *tuna)
> >   {
> >          switch (tuna->id) {
> > +       case ETHTOOL_FW_OFFLOAD_DISABLE:
> > +               if (tuna->len != sizeof(u64) ||
> > +                   tuna->type_id != ETHTOOL_TUNABLE_U64)
> > +                       return -EINVAL;
> > +               break;
> >          case ETHTOOL_RX_COPYBREAK:
> >          case ETHTOOL_TX_COPYBREAK:
> >          case ETHTOOL_TX_COPYBREAK_BUF_SIZE:
> > --
> > 2.46.0.76.ge559c4bf1a-goog
> >
> >

--
Maciej Żenczykowski, Kernel Networking Developer @ Google
Nelson, Shannon Aug. 16, 2024, 1:03 a.m. UTC | #8
On 8/15/2024 5:55 PM, Maciej Żenczykowski wrote:
> 
> On Thu, Aug 15, 2024 at 9:13 AM Nelson, Shannon <shannon.nelson@amd.com> wrote:
>>
>> On 8/13/2024 3:33 PM, Maciej Żenczykowski wrote:
>>>
>>> In order to save power (battery), most network hardware
>>> designed for low power environments (ie. battery powered
>>> devices) supports varying types of hardware/firmware offload
>>> (filtering and/or generating replies) of incoming packets.
>>>
>>> The goal being to prevent device wakeups caused by ingress 'spam'.
>>>
>>> This is particularly true for wifi (especially phones/tablets),
>>> but isn't actually wifi specific.  It can also be implemented
>>> in wired nics (TV) or usb ethernet dongles.
>>>
>>> For examples TVs require this to keep power consumption
>>> under (the EU mandated) 2 Watts while idle (display off),
>>> while still being discoverable on the network.
>>>
>>> This may include things like: ARP/IPv6 ND, IGMP/MLD, ping, mdns,
>>> but various other possibilities are also possible,
>>> for example:
>>>     ethertype filtering (discarding non-IP ethertypes),
>>>     nat-t keepalive (discarding ingress, automating periodic egress),
>>>     tcp keepalive (generation/processing/filtering),
>>>     tethering (forwarding) offload
>>>
>>> In many ways, in its goals, it is somewhat similar to the
>>> relatively standard destination mac filtering most wired nics
>>> have supported for ages: reduce the amount of traffic the host
>>> must handle.
>>>
>>> While working on Android we've discovered that there is
>>> no device/driver agnostic way to disable these offloads.
>>>
>>> This patch is an attempt to rectify this.
>>>
>>> It does not add an API to configure these offloads, as usually
>>> this isn't needed as the driver will just fetch the required
>>> configuration information straight from the kernel.
>>>
>>> What it does do is add a simple API to allow disabling (masking)
>>> them, either for debugging or for test purposes.
>>
>> I can see how this would be useful for test/debug, but it seems to me it
>> is only half of a solution.  Wouldn't there also be a need to re-enable
>> the offloads without having to reboot/restart the gizmo being tested?
>> Or even query the current status?
> 
> Since it's a mask of things to disable, setting the mask to 0, or the
> relevant bit of the mask to 0 would reenable (assuming there was
> anything to enable).

That assumption is not clear in this patch, it only talks about things 
getting disabled.  Let's be sure it is clear that 0 will actively enable 
the features.

I think part of the issue is the use of "DISABLE" and "disable" in all 
the values - it tends to give a one-way meaning.  Maybe using "MASK" and 
"mask" instead would be a better way to imply both enable and disable.

sln

> 
>>
>> sln
>>
>>>
>>> Cc: "Kory Maincent (Dent Project)" <kory.maincent@bootlin.com>
>>> Cc: Ahmed Zaki <ahmed.zaki@intel.com>
>>> Cc: Edward Cree <ecree.xilinx@gmail.com>
>>> Cc: "David S. Miller" <davem@davemloft.net>
>>> Cc: Eric Dumazet <edumazet@google.com>
>>> Cc: Jakub Kicinski <kuba@kernel.org>
>>> Cc: Paolo Abeni <pabeni@redhat.com>
>>> Cc: Yuyang Huang <yuyanghuang@google.com>
>>> Cc: Lorenzo Colitti <lorenzo@google.com>
>>> Signed-off-by: Maciej Żenczykowski <maze@google.com>
>>> ---
>>>    include/uapi/linux/ethtool.h | 15 +++++++++++++++
>>>    net/ethtool/common.c         |  1 +
>>>    net/ethtool/ioctl.c          |  5 +++++
>>>    3 files changed, 21 insertions(+)
>>>
>>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
>>> index 4a0a6e703483..7b58860c3731 100644
>>> --- a/include/uapi/linux/ethtool.h
>>> +++ b/include/uapi/linux/ethtool.h
>>> @@ -224,12 +224,27 @@ struct ethtool_value {
>>>    #define PFC_STORM_PREVENTION_AUTO      0xffff
>>>    #define PFC_STORM_PREVENTION_DISABLE   0
>>>
>>> +/* For power/wakeup (*not* performance) related offloads */
>>> +enum tunable_fw_offload_disable {
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_ALL,
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_ARP,
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_ND,
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_PING,
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_PING,
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_IGMP,
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MLD,
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_MDNS,
>>> +       ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MDNS,
>>> +       /* 55 bits remaining for future use */
>>> +};
>>> +
>>>    enum tunable_id {
>>>           ETHTOOL_ID_UNSPEC,
>>>           ETHTOOL_RX_COPYBREAK,
>>>           ETHTOOL_TX_COPYBREAK,
>>>           ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
>>>           ETHTOOL_TX_COPYBREAK_BUF_SIZE,
>>> +       ETHTOOL_FW_OFFLOAD_DISABLE, /* u64 bits numbered from LSB per tunable_fw_offload_disable */
>>>           /*
>>>            * Add your fresh new tunable attribute above and remember to update
>>>            * tunable_strings[] in net/ethtool/common.c
>>> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
>>> index 7257ae272296..8dc0c084fce5 100644
>>> --- a/net/ethtool/common.c
>>> +++ b/net/ethtool/common.c
>>> @@ -91,6 +91,7 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
>>>           [ETHTOOL_TX_COPYBREAK]  = "tx-copybreak",
>>>           [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout",
>>>           [ETHTOOL_TX_COPYBREAK_BUF_SIZE] = "tx-copybreak-buf-size",
>>> +       [ETHTOOL_FW_OFFLOAD_DISABLE] = "fw-offload-disable",
>>>    };
>>>
>>>    const char
>>> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
>>> index 18cf9fa32ae3..31318ded17a7 100644
>>> --- a/net/ethtool/ioctl.c
>>> +++ b/net/ethtool/ioctl.c
>>> @@ -2733,6 +2733,11 @@ static int ethtool_get_module_eeprom(struct net_device *dev,
>>>    static int ethtool_tunable_valid(const struct ethtool_tunable *tuna)
>>>    {
>>>           switch (tuna->id) {
>>> +       case ETHTOOL_FW_OFFLOAD_DISABLE:
>>> +               if (tuna->len != sizeof(u64) ||
>>> +                   tuna->type_id != ETHTOOL_TUNABLE_U64)
>>> +                       return -EINVAL;
>>> +               break;
>>>           case ETHTOOL_RX_COPYBREAK:
>>>           case ETHTOOL_TX_COPYBREAK:
>>>           case ETHTOOL_TX_COPYBREAK_BUF_SIZE:
>>> --
>>> 2.46.0.76.ge559c4bf1a-goog
>>>
>>>
> 
> --
> Maciej Żenczykowski, Kernel Networking Developer @ Google
Jakub Kicinski Aug. 16, 2024, 5:51 p.m. UTC | #9
On Thu, 15 Aug 2024 17:49:42 -0700 Maciej Żenczykowski wrote:
> I am of course in a very hard position here, as I don't own any
> drivers/firmware - AFAIK even Pixel's wifi/cell drivers aren't Google
> owned/maintained, but rather Broadcom/Synaptics/Qualcomm/Mediatek/etc
> as appropriate...
> 
> I do know there is a need for an api of this sort (not necessarily
> exactly this patch though), and if we merge something (sane :-) ) into
> Linux, we can then backport that (either to LTS or just to ACK), and
> then we (as in Google) can require implementations (for new
> hardware/drivers) in future versions of Android...

That's why I'm suggesting the LLDP in the Intel Ethernet driver.
Others may disagree but for me it's close enough to merge a "enable
L2 protocol agent" sort of an API. We don't need to have upstream users
for each proto. Bigger cause of sadness is that the API IIUC is a part
of a deprecation path, IOW once APF comes, it will become dead weight.
Luckily it's not a lot of code.

> Presumably that would result in implementations in many drivers,
> though not necessarily any in-tree ones (I have no idea what the
> current state of in-vs-out-of-tree drivers is wrt. Android wifi/cell
> hardware)
> 
> This is very much a chicken-and-egg problem though.  As long as there
> is no 'public' API, the default approach is for per-vendor or even
> per-chip / per-driver custom apis, hidden behind Android HALs.  For
> example we have such an Android HAL api for disabling ND offload on at
> least one of our devices.  Of course the HAL itself is backed by
> calling into the driver - just over some driver specific netlink...

I wonder if there's anything we can share between APF style offloads
and Jamal's P4 work, if it materializes.
diff mbox series

Patch

diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h
index 4a0a6e703483..7b58860c3731 100644
--- a/include/uapi/linux/ethtool.h
+++ b/include/uapi/linux/ethtool.h
@@ -224,12 +224,27 @@  struct ethtool_value {
 #define PFC_STORM_PREVENTION_AUTO	0xffff
 #define PFC_STORM_PREVENTION_DISABLE	0
 
+/* For power/wakeup (*not* performance) related offloads */
+enum tunable_fw_offload_disable {
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_ALL,
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_ARP,
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_ND,
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_PING,
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_PING,
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_IGMP,
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MLD,
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_MDNS,
+	ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MDNS,
+	/* 55 bits remaining for future use */
+};
+
 enum tunable_id {
 	ETHTOOL_ID_UNSPEC,
 	ETHTOOL_RX_COPYBREAK,
 	ETHTOOL_TX_COPYBREAK,
 	ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */
 	ETHTOOL_TX_COPYBREAK_BUF_SIZE,
+	ETHTOOL_FW_OFFLOAD_DISABLE, /* u64 bits numbered from LSB per tunable_fw_offload_disable */
 	/*
 	 * Add your fresh new tunable attribute above and remember to update
 	 * tunable_strings[] in net/ethtool/common.c
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 7257ae272296..8dc0c084fce5 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -91,6 +91,7 @@  tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = {
 	[ETHTOOL_TX_COPYBREAK]	= "tx-copybreak",
 	[ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout",
 	[ETHTOOL_TX_COPYBREAK_BUF_SIZE] = "tx-copybreak-buf-size",
+	[ETHTOOL_FW_OFFLOAD_DISABLE] = "fw-offload-disable",
 };
 
 const char
diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c
index 18cf9fa32ae3..31318ded17a7 100644
--- a/net/ethtool/ioctl.c
+++ b/net/ethtool/ioctl.c
@@ -2733,6 +2733,11 @@  static int ethtool_get_module_eeprom(struct net_device *dev,
 static int ethtool_tunable_valid(const struct ethtool_tunable *tuna)
 {
 	switch (tuna->id) {
+	case ETHTOOL_FW_OFFLOAD_DISABLE:
+		if (tuna->len != sizeof(u64) ||
+		    tuna->type_id != ETHTOOL_TUNABLE_U64)
+			return -EINVAL;
+		break;
 	case ETHTOOL_RX_COPYBREAK:
 	case ETHTOOL_TX_COPYBREAK:
 	case ETHTOOL_TX_COPYBREAK_BUF_SIZE: