Message ID | 20220411230305.28951-1-luizluca@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net-next,v2] docs: net: dsa: describe issues with checksum offload | expand |
I've copied a bunch of new people to this email. TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060, does the DSA master declare any of the following features as "on"? ethtool -k eth0 | grep tx-checksum-ip I would expect not. Otherwise, we've either found a bug, or discovered the Sasquatch. On Mon, Apr 11, 2022 at 08:03:06PM -0300, Luiz Angelo Daros de Luca wrote: > DSA tags before IP header (categories 1 and 2) or after the payload (3) > might introduce offload checksum issues. > > Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> > --- Reviewed-by: Vladimir Oltean <vladimir.oltean@nxp.com> > Documentation/networking/dsa/dsa.rst | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst > index ddc1dd039337..ed7fa76e7a40 100644 > --- a/Documentation/networking/dsa/dsa.rst > +++ b/Documentation/networking/dsa/dsa.rst > @@ -193,6 +193,23 @@ protocol. If not all packets are of equal size, the tagger can implement the > default behavior by specifying the correct offset incurred by each individual > RX packet. Tail taggers do not cause issues to the flow dissector. > > +Checksum offload should work with category 1 and 2 taggers when the DSA master > +driver declares NETIF_F_HW_CSUM in vlan_features and looks at csum_start and > +csum_offset. For those cases, DSA will shift the checksum start and offset by > +the tag size. If the DSA master driver still uses the legacy NETIF_F_IP_CSUM > +or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the > +offload hardware already expects that specific tag (perhaps due to matching > +vendors). DSA slaves inherit those flags from the master port, and it is up to > +the driver to correctly fall back to software checksum when the IP header is not > +where the hardware expects. If that check is ineffective, the packets might go > +to the network without a proper checksum (the checksum field will have the > +pseudo IP header sum). For category 3, when the offload hardware does not > +already expect the switch tag in use, the checksum must be calculated before any > +tag is inserted (i.e. inside the tagger). Otherwise, the DSA master would > +include the tail tag in the (software or hardware) checksum calculation. Then, > +when the tag gets stripped by the switch during transmission, it will leave an > +incorrect IP checksum in place. > + While what you're describing here is truthful to what is currently being done, I'm re-reading this conversation: https://lore.kernel.org/netdev/20210715114908.ripblpevmdujkf2m@skbuf/T/#m13a2e3a78d22b82f14bcdf85d988844053b1e8f9 and trying to remember why I didn't point out what now seems obvious. It was said that inheriting master->vlan_features & NETIF_F_HW_CSUM is counter-productive for tail taggers, since now we have to patch all of them to do that "skb->ip_summed == CHECKSUM_PARTIAL && skb_checksum_help(skb))" dance. And that is most definitely true. It was also said that some systems where the DSA master vendor coincides with the DSA switch vendor rely on the switch inheriting NETIF_F_HW_CSUM from master->vlan_features, for a boost in performance. That is also most definitely true. But none of the examples given was for a tail tagger, which is what the discussion was about. With the exception of the obsolete tag_trailer.c used by mv88e6060, Marvell use Ethertype headers, and Broadcom either use an Ethertype header or a header prepended to the Ethernet header. Of all tagging protocol drivers which declare a non-zero needed_tailroom: - tag_ksz.c also calls skb_checksum_help() so I don't have doubts that there aren't masters which offload checksumming for it - tag_trailer.c doesn't call skb_checksum_help(), but it's orphan and probably super broken anyway - tag_hellcreek.c doesn't call skb_checksum_help() and is therefore probably broken with checksum offloading masters. But it was probably only tested on FPGA (or at least I assume "hirschmann,hellcreek-de1soc-r1" stands for "Altera DE1") and it happens to work there. - tag_xrs700x.c doesn't call skb_checksum_help() either, so there are probably breakages waiting to happen - tag_sja1105.c (actually only SJA1110) uses a tail tag only for link-local traffic, which is non-IP so there is no checksum offload breakage there - tag_rtl8_4.c (the tail tagging version) has been added by yourself with a call to skb_checksum_help(). In any case, we give this advice to driver writers so off-hand that it's comical (I'm not attacking you, Luiz, for merely writing it down): | For category 3, when the offload hardware does not already expect the | switch tag in use, the checksum must be calculated before any tag is | inserted (i.e. inside the tagger). As if the tagging protocol driver has any crystal ball to guess whether the offload hardware of the DSA master in current use is going to expect the tail tag or not. BS. A tail tagging protocol concerned with correctness and portability is always going to call skb_checksum_help(), hence the absurdity of allowing tail taggers to inherit NETIF_F_HW_CSUM | NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM from the master in the first place. > Due to various reasons (most common being category 1 taggers being associated > with DSA-unaware masters, mangling what the master perceives as MAC DA), the > tagging protocol may require the DSA master to operate in promiscuous mode, to > -- > 2.35.1 >
On Wed, Apr 13, 2022 at 08:08:41PM +0000, Vladimir Oltean wrote: > I've copied a bunch of new people to this email. > > TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060, > does the DSA master declare any of the following features as "on"? > > ethtool -k eth0 | grep tx-checksum-ip Zii-devel-c, which uses a FEC as master: root@zii-devel-c:~# ethtool -k eth1 | grep tx-checksum-ip tx-checksum-ipv4: off [fixed] tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: off [fixed] 370RD is a Marvell reference design, using mvneta as the master andrew@370rd:~$ /usr/sbin/ethtool -k eth1 | grep tx-checksum-ip tx-checksum-ipv4: on tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on WRT1900AC is a WiFi access point, also mvneta root@wrt1900ac:~# ethtool -k eth0 | grep tx-checksum-ip tx-checksum-ipv4: on tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: on I have one more system i can check, using a Marvell Kirkwood SoC using the mv643xx as master. I need to blow the dust off it first, i've not booted it in years. Andrew
On Wed, Apr 13, 2022 at 10:49:53PM +0200, Andrew Lunn wrote: > On Wed, Apr 13, 2022 at 08:08:41PM +0000, Vladimir Oltean wrote: > > I've copied a bunch of new people to this email. > > > > TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060, > > does the DSA master declare any of the following features as "on"? > > > > ethtool -k eth0 | grep tx-checksum-ip > > Zii-devel-c, which uses a FEC as master: > > root@zii-devel-c:~# ethtool -k eth1 | grep tx-checksum-ip > tx-checksum-ipv4: off [fixed] > tx-checksum-ip-generic: off [fixed] > tx-checksum-ipv6: off [fixed] > > 370RD is a Marvell reference design, using mvneta as the master > > andrew@370rd:~$ /usr/sbin/ethtool -k eth1 | grep tx-checksum-ip > tx-checksum-ipv4: on > tx-checksum-ip-generic: off [fixed] > tx-checksum-ipv6: on > > WRT1900AC is a WiFi access point, also mvneta > > root@wrt1900ac:~# ethtool -k eth0 | grep tx-checksum-ip > tx-checksum-ipv4: on > tx-checksum-ip-generic: off [fixed] > tx-checksum-ipv6: on > > I have one more system i can check, using a Marvell Kirkwood SoC using > the mv643xx as master. I need to blow the dust off it first, i've not > booted it in years. > > Andrew I meant to ask about the actual mv88e6060 driver, the one that uses tag_trailer.c, not mv88e6xxx.
On Wed, Apr 13, 2022 at 08:08:41PM +0000, Vladimir Oltean wrote: > I've copied a bunch of new people to this email. > > TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060, > does the DSA master declare any of the following features as "on"? > > ethtool -k eth0 | grep tx-checksum-ip > > I would expect not. Otherwise, we've either found a bug, or discovered the Sasquatch. Easier than i was expecting. As i said, Marvell Kirkwood with a mv643xx as master: root@dir665:~# ethtool -k eth0 | grep tx-checksum-ip tx-checksum-ipv4: on tx-checksum-ip-generic: off [fixed] tx-checksum-ipv6: off [fixed] Andrew
> I meant to ask about the actual mv88e6060 driver, the one that uses > tag_trailer.c, not mv88e6xxx. Ah, sorry, i don't have one of those. And i've no idea of anybody who does. It is a long long time since i heard of anybody with one. Andrew
On Wed, Apr 13, 2022 at 10:57:42PM +0200, Andrew Lunn wrote: > > I meant to ask about the actual mv88e6060 driver, the one that uses > > tag_trailer.c, not mv88e6xxx. > > Ah, sorry, i don't have one of those. And i've no idea of anybody who > does. It is a long long time since i heard of anybody with one. > > Andrew Ok, I'll go with "no checksum offload for its trailer tag, and bugs never fixed because no one uses it", in any case no Sasquatch. Thanks.
> Ok, I'll go with "no checksum offload for its trailer tag, and bugs > never fixed because no one uses it", in any case no Sasquatch. Thanks. Vladimir, so the DSA switch will not copy the offload flags when a tag requests tail room? At least it will work. Now, if the offload HW does support that tag, what would be the options? Set the slave port checksum flag from userland? It would be nice to have some type of "magic trick" to have it enabled by default. I'm already expecting a no, but wouldn't it be a nice case for a DSA property in the device tree? Regards, Luiz
On Wed Apr 13 2022, Vladimir Oltean wrote: > I've copied a bunch of new people to this email. > > TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060, > does the DSA master declare any of the following features as "on"? > > ethtool -k eth0 | grep tx-checksum-ip It's a Cyclone V with stmmac as master: |root@tsn:~# ethtool -k eth0 | grep tx-checksum-ip | tx-checksum-ipv4: on | tx-checksum-ip-generic: off [fixed] | tx-checksum-ipv6: on > > I would expect not. Otherwise, we've either found a bug, or discovered > the Sasquatch. Now, I'm wondering how this actually works. Anyway, I'll send a patch to add the missing skb_checksum_help(). Thanks, Kurt
On Wed, Apr 13, 2022 at 11:48:46PM -0300, Luiz Angelo Daros de Luca wrote: > > Ok, I'll go with "no checksum offload for its trailer tag, and bugs > > never fixed because no one uses it", in any case no Sasquatch. Thanks. > > Vladimir, so the DSA switch will not copy the offload flags when a tag > requests tail room? At least it will work. > > Now, if the offload HW does support that tag, what would be the > options? Set the slave port checksum flag from userland? > It would be nice to have some type of "magic trick" to have it enabled > by default. I'm already expecting a no, but wouldn't it be a nice case > for a DSA property in the device tree? > > Regards, > > Luiz DSA calls netdev_upper_dev_link(master, slave_dev, NULL) to establish a relationship with its master and the master driver can detect this by monitoring NETDEV_CHANGEUPPER. If we look a bit closer at the implementation of netdev_upper_dev_link we see it calls __netdev_upper_dev_link() which contains an interesting pair of arguments "void *upper_priv, void *upper_info". These are accessible to netdev_master_upper_dev_link(), and the bonding driver (for example) makes use of them, see bond_master_upper_dev_link(). I'm thinking DSA could create a struct netdev_dsa_upper_info too, and certain DSA masters could populate things in it. Then DSA could look at what the DSA master has said (or not said) and fix up features accordingly. One information that could get populated by the master is a bit field of whether checksumming is supported for a certain tagging protocol. I'd rather pass a full bit field of all tagging protocols, rather than just the protocol in current use by the slave, because: (a) it's less gory compared to having the master look at dsa_port_from_netdev(info->upper_dev)->cpu_dp->tag_ops->proto (b) the DSA tagging protocol can change at runtime, and when it does, no NETDEV_CHANGEUPPER will be emitted, so the master won't have a chance to inform us whether it can offload checksumming for the new protocol. So it's better to have this information from the get go. We'd also need the DSA master to populate a "bool acked=true" from this new struct netdev_dsa_upper_info. The reason is to be able to distinguish between an empty bit mask that means "yup, I really can't offload checksumming for anything", and a bit mask that means "DSA who?" (where checksum offloading is expected to work under the normal circumstances described by you, no special code required).
On Thu, Apr 14, 2022 at 08:29:12AM +0200, Kurt Kanzenbach wrote: > On Wed Apr 13 2022, Vladimir Oltean wrote: > > I've copied a bunch of new people to this email. > > > > TL;DR: Kurt/George/Andrew, on your systems with hellcreek/xrs700x/mv88e6060, > > does the DSA master declare any of the following features as "on"? > > > > ethtool -k eth0 | grep tx-checksum-ip > > It's a Cyclone V with stmmac as master: > > |root@tsn:~# ethtool -k eth0 | grep tx-checksum-ip > | tx-checksum-ipv4: on > | tx-checksum-ip-generic: off [fixed] > | tx-checksum-ipv6: on > > > > > I would expect not. Otherwise, we've either found a bug, or discovered > > the Sasquatch. > > Now, I'm wondering how this actually works. Anyway, I'll send a patch to > add the missing skb_checksum_help(). > > Thanks, > Kurt Thanks, Kurt. It works because stmmac declares NETIF_F_IP_CSUM | NETIF_F_IPV6_CSUM in ndev->hw_features but not in ndev->vlan_features. Whereas DSA inherits what it inherits from ndev->vlan_features. It's good to fix this in hellcreek anyway.
> On Wed, Apr 13, 2022 at 11:48:46PM -0300, Luiz Angelo Daros de Luca wrote: > > > Ok, I'll go with "no checksum offload for its trailer tag, and bugs > > > never fixed because no one uses it", in any case no Sasquatch. Thanks. > > > > Vladimir, so the DSA switch will not copy the offload flags when a tag > > requests tail room? At least it will work. > > > > Now, if the offload HW does support that tag, what would be the > > options? Set the slave port checksum flag from userland? > > It would be nice to have some type of "magic trick" to have it enabled > > by default. I'm already expecting a no, but wouldn't it be a nice case > > for a DSA property in the device tree? > > > > Regards, > > > > Luiz > > DSA calls netdev_upper_dev_link(master, slave_dev, NULL) to establish a > relationship with its master and the master driver can detect this by > monitoring NETDEV_CHANGEUPPER. > > If we look a bit closer at the implementation of netdev_upper_dev_link > we see it calls __netdev_upper_dev_link() which contains an interesting > pair of arguments "void *upper_priv, void *upper_info". These are > accessible to netdev_master_upper_dev_link(), and the bonding driver > (for example) makes use of them, see bond_master_upper_dev_link(). > > I'm thinking DSA could create a struct netdev_dsa_upper_info too, and > certain DSA masters could populate things in it. Then DSA could look at > what the DSA master has said (or not said) and fix up features > accordingly. > > One information that could get populated by the master is a bit field of > whether checksumming is supported for a certain tagging protocol. > I'd rather pass a full bit field of all tagging protocols, rather than > just the protocol in current use by the slave, because: > (a) it's less gory compared to having the master look at > dsa_port_from_netdev(info->upper_dev)->cpu_dp->tag_ops->proto > (b) the DSA tagging protocol can change at runtime, and when it does, no > NETDEV_CHANGEUPPER will be emitted, so the master won't have a > chance to inform us whether it can offload checksumming for the new > protocol. So it's better to have this information from the get go. > > We'd also need the DSA master to populate a "bool acked=true" from this > new struct netdev_dsa_upper_info. The reason is to be able to > distinguish between an empty bit mask that means "yup, I really can't > offload checksumming for anything", and a bit mask that means "DSA who?" > (where checksum offloading is expected to work under the normal > circumstances described by you, no special code required). That looks like a larger change. Should we put this patch on hold waiting for the code refactor or we merge it as is (as it tells no lies about current code).
On Fri, Apr 15, 2022 at 05:01:46AM -0300, Luiz Angelo Daros de Luca wrote: > That looks like a larger change. Well, you asked what are the options in the unlikely event that we'll see offloading DSA masters for tail tags. > Should we put this patch on hold waiting for the code refactor or we > merge it as is (as it tells no lies about current code). It looks like the patch was marked as "changes requested", my side comments other than the Reviewed-by tag were probably misinterpreted: https://patchwork.kernel.org/project/netdevbpf/patch/20220411230305.28951-1-luizluca@gmail.com/ So please repost.
> > Should we put this patch on hold waiting for the code refactor or we > > merge it as is (as it tells no lies about current code). > > It looks like the patch was marked as "changes requested", my side > comments other than the Reviewed-by tag were probably misinterpreted: > https://patchwork.kernel.org/project/netdevbpf/patch/20220411230305.28951-1-luizluca@gmail.com/ > > So please repost. Done as v3 (unchanged except for you review tag). Thanks!
diff --git a/Documentation/networking/dsa/dsa.rst b/Documentation/networking/dsa/dsa.rst index ddc1dd039337..ed7fa76e7a40 100644 --- a/Documentation/networking/dsa/dsa.rst +++ b/Documentation/networking/dsa/dsa.rst @@ -193,6 +193,23 @@ protocol. If not all packets are of equal size, the tagger can implement the default behavior by specifying the correct offset incurred by each individual RX packet. Tail taggers do not cause issues to the flow dissector. +Checksum offload should work with category 1 and 2 taggers when the DSA master +driver declares NETIF_F_HW_CSUM in vlan_features and looks at csum_start and +csum_offset. For those cases, DSA will shift the checksum start and offset by +the tag size. If the DSA master driver still uses the legacy NETIF_F_IP_CSUM +or NETIF_F_IPV6_CSUM in vlan_features, the offload might only work if the +offload hardware already expects that specific tag (perhaps due to matching +vendors). DSA slaves inherit those flags from the master port, and it is up to +the driver to correctly fall back to software checksum when the IP header is not +where the hardware expects. If that check is ineffective, the packets might go +to the network without a proper checksum (the checksum field will have the +pseudo IP header sum). For category 3, when the offload hardware does not +already expect the switch tag in use, the checksum must be calculated before any +tag is inserted (i.e. inside the tagger). Otherwise, the DSA master would +include the tail tag in the (software or hardware) checksum calculation. Then, +when the tag gets stripped by the switch during transmission, it will leave an +incorrect IP checksum in place. + Due to various reasons (most common being category 1 taggers being associated with DSA-unaware masters, mangling what the master perceives as MAC DA), the tagging protocol may require the DSA master to operate in promiscuous mode, to
DSA tags before IP header (categories 1 and 2) or after the payload (3) might introduce offload checksum issues. Signed-off-by: Luiz Angelo Daros de Luca <luizluca@gmail.com> --- Documentation/networking/dsa/dsa.rst | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)