Message ID | 20240813223325.3522113-1-maze@google.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next] ethtool: add tunable api to disable various firmware offloads | expand |
On Tue, 13 Aug 2024 15:33:25 -0700 Maciej Żenczykowski wrote: > In order to save power (battery), most network hardware > designed for low power environments (ie. battery powered > devices) supports varying types of hardware/firmware offload > (filtering and/or generating replies) of incoming packets. > > The goal being to prevent device wakeups caused by ingress 'spam'. > > This is particularly true for wifi (especially phones/tablets), > but isn't actually wifi specific. It can also be implemented > in wired nics (TV) or usb ethernet dongles. > > For examples TVs require this to keep power consumption > under (the EU mandated) 2 Watts while idle (display off), > while still being discoverable on the network. Sounds sane, adding Florian, he mentioned MDNS at last netconf. Tho, wasn't there supposed to be a more granular API in Android to control such protocol offloads? You gotta find an upstream driver which implements this for us to merge. If Florian doesn't have any quick uses -- I think Intel ethernet drivers have private flags for enabling/disabling an LLDP agent. That could be another way..
CC Doug, Justin, On 8/14/2024 5:32 PM, Jakub Kicinski wrote: > On Tue, 13 Aug 2024 15:33:25 -0700 Maciej Żenczykowski wrote: >> In order to save power (battery), most network hardware >> designed for low power environments (ie. battery powered >> devices) supports varying types of hardware/firmware offload >> (filtering and/or generating replies) of incoming packets. >> >> The goal being to prevent device wakeups caused by ingress 'spam'. >> >> This is particularly true for wifi (especially phones/tablets), >> but isn't actually wifi specific. It can also be implemented >> in wired nics (TV) or usb ethernet dongles. >> >> For examples TVs require this to keep power consumption >> under (the EU mandated) 2 Watts while idle (display off), >> while still being discoverable on the network. > > Sounds sane, adding Florian, he mentioned MDNS at last netconf. Yes this looks fine to me: Acked-by: Florian Fainelli <florian.fainelli@broadcom.com> > Tho, wasn't there supposed to be a more granular API in Android > to control such protocol offloads? I still am unable to find a public link for the com.google.tv.mdnsoffload package that describes how we can offload mDNS records in ATV, but that was what I mentioned during netconf that we needed to program into the hardware. Ideally using an ethtool API TBD, or a cfg80211 one, rather than some custom ioctls() and what not. > > You gotta find an upstream driver which implements this for us to merge. > If Florian doesn't have any quick uses -- I think Intel ethernet drivers > have private flags for enabling/disabling an LLDP agent. That could be > another way.. Currently we have both bcmgenet and bcmasp support the WAKE_FILTER Wake-on-LAN specifier. Our configuration is typically done in user-space for mDNS with something like: ethtool -N eth0 flow-type ether dst 33:33:00:00:00:fb action 0xfffffffffffffffe user-def 0x320000 m 0xffffffffff000fff ethtool -N eth0 flow-type ether dst 01:00:5e:00:00:fb action 0xfffffffffffffffe user-def 0x1e0000 m 0xffffffffff000fff ethtool -s eth0 wol f I would offer that we wire up the tunable into bcmgenet and bcmasp and we'd make sure on our side that the respective firmware implementations behave accordingly, but the respective firmware implementations currently look at whether any network filter have been programmed into the hardware, and if so, they are using those for offload. So we do not really need the tunable in a way, but if we were to add it, then we would need to find a way to tell the firmware not to use the network filters. We liked our design because there is no kernel <=> firmware communication. Hummm
On Wed, 14 Aug 2024 20:08:05 -0700 Florian Fainelli wrote: > > You gotta find an upstream driver which implements this for us to merge. > > If Florian doesn't have any quick uses -- I think Intel ethernet drivers > > have private flags for enabling/disabling an LLDP agent. That could be > > another way.. > > Currently we have both bcmgenet and bcmasp support the WAKE_FILTER > Wake-on-LAN specifier. Our configuration is typically done in user-space > for mDNS with something like: > > ethtool -N eth0 flow-type ether dst 33:33:00:00:00:fb action > 0xfffffffffffffffe user-def 0x320000 m 0xffffffffff000fff > ethtool -N eth0 flow-type ether dst 01:00:5e:00:00:fb action > 0xfffffffffffffffe user-def 0x1e0000 m 0xffffffffff000fff > ethtool -s eth0 wol f > > I would offer that we wire up the tunable into bcmgenet and bcmasp and > we'd make sure on our side that the respective firmware implementations > behave accordingly, but the respective firmware implementations > currently look at whether any network filter have been programmed into > the hardware, and if so, they are using those for offload. So we do not > really need the tunable in a way, but if we were to add it, then we > would need to find a way to tell the firmware not to use the network > filters. We liked our design because there is no kernel <=> firmware > communication. Hm, I may be lacking the practical exposure to such systems to say anything sensible, but when has that ever stopped me.. IIUC you're saying that FW enables MLD offload if the wake filter is in place. But for ping/arp/igmp offload there's no need to wake, ever, so there wouldn't be a filter?
On 8/13/2024 3:33 PM, Maciej Żenczykowski wrote: > > In order to save power (battery), most network hardware > designed for low power environments (ie. battery powered > devices) supports varying types of hardware/firmware offload > (filtering and/or generating replies) of incoming packets. > > The goal being to prevent device wakeups caused by ingress 'spam'. > > This is particularly true for wifi (especially phones/tablets), > but isn't actually wifi specific. It can also be implemented > in wired nics (TV) or usb ethernet dongles. > > For examples TVs require this to keep power consumption > under (the EU mandated) 2 Watts while idle (display off), > while still being discoverable on the network. > > This may include things like: ARP/IPv6 ND, IGMP/MLD, ping, mdns, > but various other possibilities are also possible, > for example: > ethertype filtering (discarding non-IP ethertypes), > nat-t keepalive (discarding ingress, automating periodic egress), > tcp keepalive (generation/processing/filtering), > tethering (forwarding) offload > > In many ways, in its goals, it is somewhat similar to the > relatively standard destination mac filtering most wired nics > have supported for ages: reduce the amount of traffic the host > must handle. > > While working on Android we've discovered that there is > no device/driver agnostic way to disable these offloads. > > This patch is an attempt to rectify this. > > It does not add an API to configure these offloads, as usually > this isn't needed as the driver will just fetch the required > configuration information straight from the kernel. > > What it does do is add a simple API to allow disabling (masking) > them, either for debugging or for test purposes. I can see how this would be useful for test/debug, but it seems to me it is only half of a solution. Wouldn't there also be a need to re-enable the offloads without having to reboot/restart the gizmo being tested? Or even query the current status? sln > > Cc: "Kory Maincent (Dent Project)" <kory.maincent@bootlin.com> > Cc: Ahmed Zaki <ahmed.zaki@intel.com> > Cc: Edward Cree <ecree.xilinx@gmail.com> > Cc: "David S. Miller" <davem@davemloft.net> > Cc: Eric Dumazet <edumazet@google.com> > Cc: Jakub Kicinski <kuba@kernel.org> > Cc: Paolo Abeni <pabeni@redhat.com> > Cc: Yuyang Huang <yuyanghuang@google.com> > Cc: Lorenzo Colitti <lorenzo@google.com> > Signed-off-by: Maciej Żenczykowski <maze@google.com> > --- > include/uapi/linux/ethtool.h | 15 +++++++++++++++ > net/ethtool/common.c | 1 + > net/ethtool/ioctl.c | 5 +++++ > 3 files changed, 21 insertions(+) > > diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h > index 4a0a6e703483..7b58860c3731 100644 > --- a/include/uapi/linux/ethtool.h > +++ b/include/uapi/linux/ethtool.h > @@ -224,12 +224,27 @@ struct ethtool_value { > #define PFC_STORM_PREVENTION_AUTO 0xffff > #define PFC_STORM_PREVENTION_DISABLE 0 > > +/* For power/wakeup (*not* performance) related offloads */ > +enum tunable_fw_offload_disable { > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_ALL, > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_ARP, > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_ND, > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_PING, > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_PING, > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_IGMP, > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MLD, > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_MDNS, > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MDNS, > + /* 55 bits remaining for future use */ > +}; > + > enum tunable_id { > ETHTOOL_ID_UNSPEC, > ETHTOOL_RX_COPYBREAK, > ETHTOOL_TX_COPYBREAK, > ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */ > ETHTOOL_TX_COPYBREAK_BUF_SIZE, > + ETHTOOL_FW_OFFLOAD_DISABLE, /* u64 bits numbered from LSB per tunable_fw_offload_disable */ > /* > * Add your fresh new tunable attribute above and remember to update > * tunable_strings[] in net/ethtool/common.c > diff --git a/net/ethtool/common.c b/net/ethtool/common.c > index 7257ae272296..8dc0c084fce5 100644 > --- a/net/ethtool/common.c > +++ b/net/ethtool/common.c > @@ -91,6 +91,7 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = { > [ETHTOOL_TX_COPYBREAK] = "tx-copybreak", > [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout", > [ETHTOOL_TX_COPYBREAK_BUF_SIZE] = "tx-copybreak-buf-size", > + [ETHTOOL_FW_OFFLOAD_DISABLE] = "fw-offload-disable", > }; > > const char > diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c > index 18cf9fa32ae3..31318ded17a7 100644 > --- a/net/ethtool/ioctl.c > +++ b/net/ethtool/ioctl.c > @@ -2733,6 +2733,11 @@ static int ethtool_get_module_eeprom(struct net_device *dev, > static int ethtool_tunable_valid(const struct ethtool_tunable *tuna) > { > switch (tuna->id) { > + case ETHTOOL_FW_OFFLOAD_DISABLE: > + if (tuna->len != sizeof(u64) || > + tuna->type_id != ETHTOOL_TUNABLE_U64) > + return -EINVAL; > + break; > case ETHTOOL_RX_COPYBREAK: > case ETHTOOL_TX_COPYBREAK: > case ETHTOOL_TX_COPYBREAK_BUF_SIZE: > -- > 2.46.0.76.ge559c4bf1a-goog > >
On 8/15/24 08:45, Jakub Kicinski wrote: > On Wed, 14 Aug 2024 20:08:05 -0700 Florian Fainelli wrote: >>> You gotta find an upstream driver which implements this for us to merge. >>> If Florian doesn't have any quick uses -- I think Intel ethernet drivers >>> have private flags for enabling/disabling an LLDP agent. That could be >>> another way.. >> >> Currently we have both bcmgenet and bcmasp support the WAKE_FILTER >> Wake-on-LAN specifier. Our configuration is typically done in user-space >> for mDNS with something like: >> >> ethtool -N eth0 flow-type ether dst 33:33:00:00:00:fb action >> 0xfffffffffffffffe user-def 0x320000 m 0xffffffffff000fff >> ethtool -N eth0 flow-type ether dst 01:00:5e:00:00:fb action >> 0xfffffffffffffffe user-def 0x1e0000 m 0xffffffffff000fff >> ethtool -s eth0 wol f >> >> I would offer that we wire up the tunable into bcmgenet and bcmasp and >> we'd make sure on our side that the respective firmware implementations >> behave accordingly, but the respective firmware implementations >> currently look at whether any network filter have been programmed into >> the hardware, and if so, they are using those for offload. So we do not >> really need the tunable in a way, but if we were to add it, then we >> would need to find a way to tell the firmware not to use the network >> filters. We liked our design because there is no kernel <=> firmware >> communication. > > Hm, I may be lacking the practical exposure to such systems to say > anything sensible, but when has that ever stopped me.. > IIUC you're saying that FW enables MLD offload if the wake filter is > in place. But for ping/arp/igmp offload there's no need to wake, ever, > so there wouldn't be a filter? That is right, we only want to wake-up on mDNS in our case. There are two cases deployed, at least the first one is, the second one might have been more of a "to be added in the future" improvement: - a simplistic one where we use the hardware filters to trigger a wake-up event, and then some piece of firmware will look at the mDNS query contents and figure out whether the query was for one of the services in the local database (typically _googlecast._tcp.local is of particular interest). If that is the case, we trigger a system wake-up and we let the Host CPU process the mDNS query and we stay awake for a few seconds in case a streaming operation happens - a more sophisticated one where after the mDNS query wake-up event has been identified we wait until we get a 3-way TCP handshake targeting the _googlecast._tcp.local service before waking up the Host CPU. This is more reflective of an actual intent to use the device that was asleep Hope this helps. -- Florian
On Wed, Aug 14, 2024 at 5:32 PM Jakub Kicinski <kuba@kernel.org> wrote: > Sounds sane, adding Florian, he mentioned MDNS at last netconf. > Tho, wasn't there supposed to be a more granular API in Android > to control such protocol offloads? We're deprecating the 'new' but already old mdns offload logic in favour of APFv6+ (unfortunately there was a race here, we weren't ready with APFv6 in time...). APF [Android Packet Filter] is a bytecode interpreter built into [wifi] firmware, currently the bytecode is fed in via Android Wifi HAL (we'd like to change this to ethtool or something as well) from a java bytecode generator that knows the state of the system. APFv2 (ingress packet filtering to prevent spurious wakeups) and APFv4 (adds drop counters) have been in use on Pixel for many many years now (going back to Pixel 3? somewhere 2016-ish). We require all new (launched on Android 14/U+) devices to have a working APFv4 implementation on their wifi client/sta interface (enforced via Android VTS). APFv6 (only available on non publicly available hardware atm) adds support for sending back replies (ie. arp offload & the like). We hope to require it for newly launched devices on (the future) Android 16/W (or whatever it ends up being) and to extend it to non-wifi (ie. wired) as well. The APFv6 interpreter source is here: https://android.googlesource.com/platform/hardware/google/apf/+/refs/heads/main/v6/ for the curious. APF is *not* eBPF simply because eBPF is *far* too complex (there's not enough space for it in the firmware/ram of many low end wifi chipsets). The latest APFv6 interpreter is only ~5kB of compiled code, and another ~2kB of bytecode (though space for 4kB is recommended). Some wifi chipset vendors are finding it very hard to find room for even that. eBPF bytecode is very space inefficient compared to APF. > You gotta find an upstream driver which implements this for us to merge. This came about due to a request from a major wifi chipset vendor to notify them which offloads could/should be disabled due to being reimplemented via APFv6 (Android Packet Filter). So I believe (note: we're still discussing this) we likely have folks willing to implement this for at least 1 wifi driver (not sure which one, and if it is merged upstream or lives out-of-tree). I am of course in a very hard position here, as I don't own any drivers/firmware - AFAIK even Pixel's wifi/cell drivers aren't Google owned/maintained, but rather Broadcom/Synaptics/Qualcomm/Mediatek/etc as appropriate... I do know there is a need for an api of this sort (not necessarily exactly this patch though), and if we merge something (sane :-) ) into Linux, we can then backport that (either to LTS or just to ACK), and then we (as in Google) can require implementations (for new hardware/drivers) in future versions of Android... Presumably that would result in implementations in many drivers, though not necessarily any in-tree ones (I have no idea what the current state of in-vs-out-of-tree drivers is wrt. Android wifi/cell hardware) This is very much a chicken-and-egg problem though. As long as there is no 'public' API, the default approach is for per-vendor or even per-chip / per-driver custom apis, hidden behind Android HALs. For example we have such an Android HAL api for disabling ND offload on at least one of our devices. Of course the HAL itself is backed by calling into the driver - just over some driver specific netlink... Similarly for APF bytecode installation: it all goes via Android HALs, which then in practice immediately turn around and talk straight over driver/vendor-specific netlink to the driver. I'd *love* to simplify all this. Indeed in an ideal world, we'd have 'native' Linux apis into drivers to disable offloads, load APF, and could even teach the kernel to actually build the APF bytecode to load in in the first place (this would fix lots of races we currently have, where we need to fetch state out of the kernel to build the bytecode to pass back into the kernel driver/firmware...). Perhaps a BPF program to build the APF bytecode during the suspend sequence -- seriously, this isn't even a joke. That said... the problem is simply too large to bite all at once, so I'm trying to find a way out of the logjam wrt. upstream Linux that APF has been in for years. If we can't settle on a 'public' Linux API for this (not saying *this* patch is necessarily the right way) we'll likely end up implementing this either via ethtool privflags (or something) or simply via an Android HAL... again :-( Be aware HALs are what most of the org loves and it's hard enough to explain why we don't want to go via a HAL.... This isn't meant to be a threat ;-) it's just an unavoidable statement of fact... :-( What I could offer to provide is some sort of tun/tap/veth/virtio-net based APF implementation for testing (but that's not exactly relevant to *this* patch, since they don't support offloads to disable in the first place) - as I would love to have that available in cuttlefish (Android VM) for integration testing. > If Florian doesn't have any quick uses -- I think Intel ethernet drivers > have private flags for enabling/disabling an LLDP agent. That could be > another way.. > -- > pw-bot: cr -- Maciej Żenczykowski, Kernel Networking Developer @ Google
On Thu, Aug 15, 2024 at 9:13 AM Nelson, Shannon <shannon.nelson@amd.com> wrote: > > On 8/13/2024 3:33 PM, Maciej Żenczykowski wrote: > > > > In order to save power (battery), most network hardware > > designed for low power environments (ie. battery powered > > devices) supports varying types of hardware/firmware offload > > (filtering and/or generating replies) of incoming packets. > > > > The goal being to prevent device wakeups caused by ingress 'spam'. > > > > This is particularly true for wifi (especially phones/tablets), > > but isn't actually wifi specific. It can also be implemented > > in wired nics (TV) or usb ethernet dongles. > > > > For examples TVs require this to keep power consumption > > under (the EU mandated) 2 Watts while idle (display off), > > while still being discoverable on the network. > > > > This may include things like: ARP/IPv6 ND, IGMP/MLD, ping, mdns, > > but various other possibilities are also possible, > > for example: > > ethertype filtering (discarding non-IP ethertypes), > > nat-t keepalive (discarding ingress, automating periodic egress), > > tcp keepalive (generation/processing/filtering), > > tethering (forwarding) offload > > > > In many ways, in its goals, it is somewhat similar to the > > relatively standard destination mac filtering most wired nics > > have supported for ages: reduce the amount of traffic the host > > must handle. > > > > While working on Android we've discovered that there is > > no device/driver agnostic way to disable these offloads. > > > > This patch is an attempt to rectify this. > > > > It does not add an API to configure these offloads, as usually > > this isn't needed as the driver will just fetch the required > > configuration information straight from the kernel. > > > > What it does do is add a simple API to allow disabling (masking) > > them, either for debugging or for test purposes. > > I can see how this would be useful for test/debug, but it seems to me it > is only half of a solution. Wouldn't there also be a need to re-enable > the offloads without having to reboot/restart the gizmo being tested? > Or even query the current status? Since it's a mask of things to disable, setting the mask to 0, or the relevant bit of the mask to 0 would reenable (assuming there was anything to enable). > > sln > > > > > Cc: "Kory Maincent (Dent Project)" <kory.maincent@bootlin.com> > > Cc: Ahmed Zaki <ahmed.zaki@intel.com> > > Cc: Edward Cree <ecree.xilinx@gmail.com> > > Cc: "David S. Miller" <davem@davemloft.net> > > Cc: Eric Dumazet <edumazet@google.com> > > Cc: Jakub Kicinski <kuba@kernel.org> > > Cc: Paolo Abeni <pabeni@redhat.com> > > Cc: Yuyang Huang <yuyanghuang@google.com> > > Cc: Lorenzo Colitti <lorenzo@google.com> > > Signed-off-by: Maciej Żenczykowski <maze@google.com> > > --- > > include/uapi/linux/ethtool.h | 15 +++++++++++++++ > > net/ethtool/common.c | 1 + > > net/ethtool/ioctl.c | 5 +++++ > > 3 files changed, 21 insertions(+) > > > > diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h > > index 4a0a6e703483..7b58860c3731 100644 > > --- a/include/uapi/linux/ethtool.h > > +++ b/include/uapi/linux/ethtool.h > > @@ -224,12 +224,27 @@ struct ethtool_value { > > #define PFC_STORM_PREVENTION_AUTO 0xffff > > #define PFC_STORM_PREVENTION_DISABLE 0 > > > > +/* For power/wakeup (*not* performance) related offloads */ > > +enum tunable_fw_offload_disable { > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_ALL, > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_ARP, > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_ND, > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_PING, > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_PING, > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_IGMP, > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MLD, > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_MDNS, > > + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MDNS, > > + /* 55 bits remaining for future use */ > > +}; > > + > > enum tunable_id { > > ETHTOOL_ID_UNSPEC, > > ETHTOOL_RX_COPYBREAK, > > ETHTOOL_TX_COPYBREAK, > > ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */ > > ETHTOOL_TX_COPYBREAK_BUF_SIZE, > > + ETHTOOL_FW_OFFLOAD_DISABLE, /* u64 bits numbered from LSB per tunable_fw_offload_disable */ > > /* > > * Add your fresh new tunable attribute above and remember to update > > * tunable_strings[] in net/ethtool/common.c > > diff --git a/net/ethtool/common.c b/net/ethtool/common.c > > index 7257ae272296..8dc0c084fce5 100644 > > --- a/net/ethtool/common.c > > +++ b/net/ethtool/common.c > > @@ -91,6 +91,7 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = { > > [ETHTOOL_TX_COPYBREAK] = "tx-copybreak", > > [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout", > > [ETHTOOL_TX_COPYBREAK_BUF_SIZE] = "tx-copybreak-buf-size", > > + [ETHTOOL_FW_OFFLOAD_DISABLE] = "fw-offload-disable", > > }; > > > > const char > > diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c > > index 18cf9fa32ae3..31318ded17a7 100644 > > --- a/net/ethtool/ioctl.c > > +++ b/net/ethtool/ioctl.c > > @@ -2733,6 +2733,11 @@ static int ethtool_get_module_eeprom(struct net_device *dev, > > static int ethtool_tunable_valid(const struct ethtool_tunable *tuna) > > { > > switch (tuna->id) { > > + case ETHTOOL_FW_OFFLOAD_DISABLE: > > + if (tuna->len != sizeof(u64) || > > + tuna->type_id != ETHTOOL_TUNABLE_U64) > > + return -EINVAL; > > + break; > > case ETHTOOL_RX_COPYBREAK: > > case ETHTOOL_TX_COPYBREAK: > > case ETHTOOL_TX_COPYBREAK_BUF_SIZE: > > -- > > 2.46.0.76.ge559c4bf1a-goog > > > > -- Maciej Żenczykowski, Kernel Networking Developer @ Google
On 8/15/2024 5:55 PM, Maciej Żenczykowski wrote: > > On Thu, Aug 15, 2024 at 9:13 AM Nelson, Shannon <shannon.nelson@amd.com> wrote: >> >> On 8/13/2024 3:33 PM, Maciej Żenczykowski wrote: >>> >>> In order to save power (battery), most network hardware >>> designed for low power environments (ie. battery powered >>> devices) supports varying types of hardware/firmware offload >>> (filtering and/or generating replies) of incoming packets. >>> >>> The goal being to prevent device wakeups caused by ingress 'spam'. >>> >>> This is particularly true for wifi (especially phones/tablets), >>> but isn't actually wifi specific. It can also be implemented >>> in wired nics (TV) or usb ethernet dongles. >>> >>> For examples TVs require this to keep power consumption >>> under (the EU mandated) 2 Watts while idle (display off), >>> while still being discoverable on the network. >>> >>> This may include things like: ARP/IPv6 ND, IGMP/MLD, ping, mdns, >>> but various other possibilities are also possible, >>> for example: >>> ethertype filtering (discarding non-IP ethertypes), >>> nat-t keepalive (discarding ingress, automating periodic egress), >>> tcp keepalive (generation/processing/filtering), >>> tethering (forwarding) offload >>> >>> In many ways, in its goals, it is somewhat similar to the >>> relatively standard destination mac filtering most wired nics >>> have supported for ages: reduce the amount of traffic the host >>> must handle. >>> >>> While working on Android we've discovered that there is >>> no device/driver agnostic way to disable these offloads. >>> >>> This patch is an attempt to rectify this. >>> >>> It does not add an API to configure these offloads, as usually >>> this isn't needed as the driver will just fetch the required >>> configuration information straight from the kernel. >>> >>> What it does do is add a simple API to allow disabling (masking) >>> them, either for debugging or for test purposes. >> >> I can see how this would be useful for test/debug, but it seems to me it >> is only half of a solution. Wouldn't there also be a need to re-enable >> the offloads without having to reboot/restart the gizmo being tested? >> Or even query the current status? > > Since it's a mask of things to disable, setting the mask to 0, or the > relevant bit of the mask to 0 would reenable (assuming there was > anything to enable). That assumption is not clear in this patch, it only talks about things getting disabled. Let's be sure it is clear that 0 will actively enable the features. I think part of the issue is the use of "DISABLE" and "disable" in all the values - it tends to give a one-way meaning. Maybe using "MASK" and "mask" instead would be a better way to imply both enable and disable. sln > >> >> sln >> >>> >>> Cc: "Kory Maincent (Dent Project)" <kory.maincent@bootlin.com> >>> Cc: Ahmed Zaki <ahmed.zaki@intel.com> >>> Cc: Edward Cree <ecree.xilinx@gmail.com> >>> Cc: "David S. Miller" <davem@davemloft.net> >>> Cc: Eric Dumazet <edumazet@google.com> >>> Cc: Jakub Kicinski <kuba@kernel.org> >>> Cc: Paolo Abeni <pabeni@redhat.com> >>> Cc: Yuyang Huang <yuyanghuang@google.com> >>> Cc: Lorenzo Colitti <lorenzo@google.com> >>> Signed-off-by: Maciej Żenczykowski <maze@google.com> >>> --- >>> include/uapi/linux/ethtool.h | 15 +++++++++++++++ >>> net/ethtool/common.c | 1 + >>> net/ethtool/ioctl.c | 5 +++++ >>> 3 files changed, 21 insertions(+) >>> >>> diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h >>> index 4a0a6e703483..7b58860c3731 100644 >>> --- a/include/uapi/linux/ethtool.h >>> +++ b/include/uapi/linux/ethtool.h >>> @@ -224,12 +224,27 @@ struct ethtool_value { >>> #define PFC_STORM_PREVENTION_AUTO 0xffff >>> #define PFC_STORM_PREVENTION_DISABLE 0 >>> >>> +/* For power/wakeup (*not* performance) related offloads */ >>> +enum tunable_fw_offload_disable { >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_ALL, >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_ARP, >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_ND, >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_PING, >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_PING, >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_IGMP, >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MLD, >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_MDNS, >>> + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MDNS, >>> + /* 55 bits remaining for future use */ >>> +}; >>> + >>> enum tunable_id { >>> ETHTOOL_ID_UNSPEC, >>> ETHTOOL_RX_COPYBREAK, >>> ETHTOOL_TX_COPYBREAK, >>> ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */ >>> ETHTOOL_TX_COPYBREAK_BUF_SIZE, >>> + ETHTOOL_FW_OFFLOAD_DISABLE, /* u64 bits numbered from LSB per tunable_fw_offload_disable */ >>> /* >>> * Add your fresh new tunable attribute above and remember to update >>> * tunable_strings[] in net/ethtool/common.c >>> diff --git a/net/ethtool/common.c b/net/ethtool/common.c >>> index 7257ae272296..8dc0c084fce5 100644 >>> --- a/net/ethtool/common.c >>> +++ b/net/ethtool/common.c >>> @@ -91,6 +91,7 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = { >>> [ETHTOOL_TX_COPYBREAK] = "tx-copybreak", >>> [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout", >>> [ETHTOOL_TX_COPYBREAK_BUF_SIZE] = "tx-copybreak-buf-size", >>> + [ETHTOOL_FW_OFFLOAD_DISABLE] = "fw-offload-disable", >>> }; >>> >>> const char >>> diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c >>> index 18cf9fa32ae3..31318ded17a7 100644 >>> --- a/net/ethtool/ioctl.c >>> +++ b/net/ethtool/ioctl.c >>> @@ -2733,6 +2733,11 @@ static int ethtool_get_module_eeprom(struct net_device *dev, >>> static int ethtool_tunable_valid(const struct ethtool_tunable *tuna) >>> { >>> switch (tuna->id) { >>> + case ETHTOOL_FW_OFFLOAD_DISABLE: >>> + if (tuna->len != sizeof(u64) || >>> + tuna->type_id != ETHTOOL_TUNABLE_U64) >>> + return -EINVAL; >>> + break; >>> case ETHTOOL_RX_COPYBREAK: >>> case ETHTOOL_TX_COPYBREAK: >>> case ETHTOOL_TX_COPYBREAK_BUF_SIZE: >>> -- >>> 2.46.0.76.ge559c4bf1a-goog >>> >>> > > -- > Maciej Żenczykowski, Kernel Networking Developer @ Google
On Thu, 15 Aug 2024 17:49:42 -0700 Maciej Żenczykowski wrote: > I am of course in a very hard position here, as I don't own any > drivers/firmware - AFAIK even Pixel's wifi/cell drivers aren't Google > owned/maintained, but rather Broadcom/Synaptics/Qualcomm/Mediatek/etc > as appropriate... > > I do know there is a need for an api of this sort (not necessarily > exactly this patch though), and if we merge something (sane :-) ) into > Linux, we can then backport that (either to LTS or just to ACK), and > then we (as in Google) can require implementations (for new > hardware/drivers) in future versions of Android... That's why I'm suggesting the LLDP in the Intel Ethernet driver. Others may disagree but for me it's close enough to merge a "enable L2 protocol agent" sort of an API. We don't need to have upstream users for each proto. Bigger cause of sadness is that the API IIUC is a part of a deprecation path, IOW once APF comes, it will become dead weight. Luckily it's not a lot of code. > Presumably that would result in implementations in many drivers, > though not necessarily any in-tree ones (I have no idea what the > current state of in-vs-out-of-tree drivers is wrt. Android wifi/cell > hardware) > > This is very much a chicken-and-egg problem though. As long as there > is no 'public' API, the default approach is for per-vendor or even > per-chip / per-driver custom apis, hidden behind Android HALs. For > example we have such an Android HAL api for disabling ND offload on at > least one of our devices. Of course the HAL itself is backed by > calling into the driver - just over some driver specific netlink... I wonder if there's anything we can share between APF style offloads and Jamal's P4 work, if it materializes.
diff --git a/include/uapi/linux/ethtool.h b/include/uapi/linux/ethtool.h index 4a0a6e703483..7b58860c3731 100644 --- a/include/uapi/linux/ethtool.h +++ b/include/uapi/linux/ethtool.h @@ -224,12 +224,27 @@ struct ethtool_value { #define PFC_STORM_PREVENTION_AUTO 0xffff #define PFC_STORM_PREVENTION_DISABLE 0 +/* For power/wakeup (*not* performance) related offloads */ +enum tunable_fw_offload_disable { + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_ALL, + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_ARP, + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_ND, + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_PING, + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_PING, + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_IGMP, + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MLD, + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV4_MDNS, + ETHTOOL_TUNABLE_FW_OFFLOAD_DISABLE_IPV6_MDNS, + /* 55 bits remaining for future use */ +}; + enum tunable_id { ETHTOOL_ID_UNSPEC, ETHTOOL_RX_COPYBREAK, ETHTOOL_TX_COPYBREAK, ETHTOOL_PFC_PREVENTION_TOUT, /* timeout in msecs */ ETHTOOL_TX_COPYBREAK_BUF_SIZE, + ETHTOOL_FW_OFFLOAD_DISABLE, /* u64 bits numbered from LSB per tunable_fw_offload_disable */ /* * Add your fresh new tunable attribute above and remember to update * tunable_strings[] in net/ethtool/common.c diff --git a/net/ethtool/common.c b/net/ethtool/common.c index 7257ae272296..8dc0c084fce5 100644 --- a/net/ethtool/common.c +++ b/net/ethtool/common.c @@ -91,6 +91,7 @@ tunable_strings[__ETHTOOL_TUNABLE_COUNT][ETH_GSTRING_LEN] = { [ETHTOOL_TX_COPYBREAK] = "tx-copybreak", [ETHTOOL_PFC_PREVENTION_TOUT] = "pfc-prevention-tout", [ETHTOOL_TX_COPYBREAK_BUF_SIZE] = "tx-copybreak-buf-size", + [ETHTOOL_FW_OFFLOAD_DISABLE] = "fw-offload-disable", }; const char diff --git a/net/ethtool/ioctl.c b/net/ethtool/ioctl.c index 18cf9fa32ae3..31318ded17a7 100644 --- a/net/ethtool/ioctl.c +++ b/net/ethtool/ioctl.c @@ -2733,6 +2733,11 @@ static int ethtool_get_module_eeprom(struct net_device *dev, static int ethtool_tunable_valid(const struct ethtool_tunable *tuna) { switch (tuna->id) { + case ETHTOOL_FW_OFFLOAD_DISABLE: + if (tuna->len != sizeof(u64) || + tuna->type_id != ETHTOOL_TUNABLE_U64) + return -EINVAL; + break; case ETHTOOL_RX_COPYBREAK: case ETHTOOL_TX_COPYBREAK: case ETHTOOL_TX_COPYBREAK_BUF_SIZE:
In order to save power (battery), most network hardware designed for low power environments (ie. battery powered devices) supports varying types of hardware/firmware offload (filtering and/or generating replies) of incoming packets. The goal being to prevent device wakeups caused by ingress 'spam'. This is particularly true for wifi (especially phones/tablets), but isn't actually wifi specific. It can also be implemented in wired nics (TV) or usb ethernet dongles. For examples TVs require this to keep power consumption under (the EU mandated) 2 Watts while idle (display off), while still being discoverable on the network. This may include things like: ARP/IPv6 ND, IGMP/MLD, ping, mdns, but various other possibilities are also possible, for example: ethertype filtering (discarding non-IP ethertypes), nat-t keepalive (discarding ingress, automating periodic egress), tcp keepalive (generation/processing/filtering), tethering (forwarding) offload In many ways, in its goals, it is somewhat similar to the relatively standard destination mac filtering most wired nics have supported for ages: reduce the amount of traffic the host must handle. While working on Android we've discovered that there is no device/driver agnostic way to disable these offloads. This patch is an attempt to rectify this. It does not add an API to configure these offloads, as usually this isn't needed as the driver will just fetch the required configuration information straight from the kernel. What it does do is add a simple API to allow disabling (masking) them, either for debugging or for test purposes. Cc: "Kory Maincent (Dent Project)" <kory.maincent@bootlin.com> Cc: Ahmed Zaki <ahmed.zaki@intel.com> Cc: Edward Cree <ecree.xilinx@gmail.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: Eric Dumazet <edumazet@google.com> Cc: Jakub Kicinski <kuba@kernel.org> Cc: Paolo Abeni <pabeni@redhat.com> Cc: Yuyang Huang <yuyanghuang@google.com> Cc: Lorenzo Colitti <lorenzo@google.com> Signed-off-by: Maciej Żenczykowski <maze@google.com> --- include/uapi/linux/ethtool.h | 15 +++++++++++++++ net/ethtool/common.c | 1 + net/ethtool/ioctl.c | 5 +++++ 3 files changed, 21 insertions(+)