Message ID | 20240701195507.256374-1-tom@herbertland.com (mailing list archive) |
---|---|
Headers | show |
Series | drivers: Fix drivers doing TX csum offload with EH | expand |
On 7/1/24 21:55, Tom Herbert wrote: > Several NICs would seem to support protocol specific TX checksum offload > and allow for cases where an IPv6 packet contains extension headers. > When deciding whether to offload a packet, ipv6_skip_exthdr is called > to skip extension headers. The problem is that if a packet contains an > IPv6 Routing Header then protocol specific checksum offload can't work, > the destination IP address in the IPv6 header is not the same one that > is used in the pseudo header for TCP or UDP. The correct address is > derived from the last segment in the routing list (which itself might > be obfuscated so that a device could even read it). feels like there is a missing "not" after "could" - with it added, reads fine (not a request to change, just being verbose about assumptions) > > This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be > called in lieu of ipv6_skip_exthdr. If a routing header is present in > a packet then ipv6_skip_exthdr_no_rthdr returns a value less than > zero, this is an indication to the driver that TX checksum offload > is not viable and it should call skb_checksum_help instead of > offloading the checksum. > > The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly > to call ipv6_skip_exthdr_no_rthdr. > > Testing: The code compiles, but is otherwise untested due to lack of > NIC hardware. It would be appreciated if someone with access to the > hardware could test. we could test intel ones (except fm10k) via @Tony's tree > > v2: Fixed uninitialized variable in exthdrs_core.c > > Tom Herbert (7): > ipv6: Add ipv6_skip_exthdr_no_rthdr > i40e: Don't do TX csum offload with routing header present > iavf: Don't do TX csum offload with routing header present > ice: Don't do TX csum offload with routing header present sidenote: our HW is supporting (among others) a GCO check-summing mode described as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not yet provided patches for that, and I don't even know if this mode will be used (CC @Paul) > idpf: Don't do TX csum offload with routing header present > hinic: Don't do TX csum offload with routing header present > fm10k: Don't do TX csum offload with routing header present > > drivers/net/ethernet/huawei/hinic/hinic_tx.c | 23 +++++++++++---- > drivers/net/ethernet/intel/fm10k/fm10k_main.c | 9 ++++-- > drivers/net/ethernet/intel/i40e/i40e_txrx.c | 22 ++++++--------- > drivers/net/ethernet/intel/iavf/iavf_txrx.c | 20 ++++++------- > drivers/net/ethernet/intel/ice/ice_txrx.c | 22 ++++++--------- > .../ethernet/intel/idpf/idpf_singleq_txrx.c | 28 +++++++++---------- > include/net/ipv6.h | 17 +++++++++-- > net/ipv6/exthdrs_core.c | 25 ++++++++++++----- > 8 files changed, 98 insertions(+), 68 deletions(-) > I have reviewed the patches and they conform to commit message/intent, Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> (for the series)
On Mon, 1 Jul 2024 12:55:00 -0700 Tom Herbert wrote: > Testing: The code compiles, but is otherwise untested due to lack of > NIC hardware. It would be appreciated if someone with access to the > hardware could test. Could you pop a script under tools/testing/selftests/drivers/net/ that'd exercise this? This will hopefully guarantee good coverage soon, due to: https://netdev.bots.linux.dev/devices.html
On 7/2/2024 3:31 AM, Przemek Kitszel wrote: > On 7/1/24 21:55, Tom Herbert wrote: >> Several NICs would seem to support protocol specific TX checksum offload >> and allow for cases where an IPv6 packet contains extension headers. >> When deciding whether to offload a packet, ipv6_skip_exthdr is called >> to skip extension headers. The problem is that if a packet contains an >> IPv6 Routing Header then protocol specific checksum offload can't work, >> the destination IP address in the IPv6 header is not the same one that >> is used in the pseudo header for TCP or UDP. The correct address is >> derived from the last segment in the routing list (which itself might >> be obfuscated so that a device could even read it). > > feels like there is a missing "not" after "could" - with it added, reads > fine (not a request to change, just being verbose about assumptions) > >> >> This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be >> called in lieu of ipv6_skip_exthdr. If a routing header is present in >> a packet then ipv6_skip_exthdr_no_rthdr returns a value less than >> zero, this is an indication to the driver that TX checksum offload >> is not viable and it should call skb_checksum_help instead of >> offloading the checksum. >> >> The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly >> to call ipv6_skip_exthdr_no_rthdr. >> >> Testing: The code compiles, but is otherwise untested due to lack of >> NIC hardware. It would be appreciated if someone with access to the >> hardware could test. > > we could test intel ones (except fm10k) via @Tony's tree > >> >> v2: Fixed uninitialized variable in exthdrs_core.c >> >> Tom Herbert (7): >> ipv6: Add ipv6_skip_exthdr_no_rthdr >> i40e: Don't do TX csum offload with routing header present >> iavf: Don't do TX csum offload with routing header present >> ice: Don't do TX csum offload with routing header present > > sidenote: > our HW is supporting (among others) a GCO check-summing mode described > as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not > yet provided patches for that, and I don't even know if this mode > will be used (CC @Paul) > We will be adding support for GCO "Checksum 16 with pseudo Headers" to the ice driver. It will be off by default. >> idpf: Don't do TX csum offload with routing header present >> hinic: Don't do TX csum offload with routing header present >> fm10k: Don't do TX csum offload with routing header present >> >> drivers/net/ethernet/huawei/hinic/hinic_tx.c | 23 +++++++++++---- >> drivers/net/ethernet/intel/fm10k/fm10k_main.c | 9 ++++-- >> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 22 ++++++--------- >> drivers/net/ethernet/intel/iavf/iavf_txrx.c | 20 ++++++------- >> drivers/net/ethernet/intel/ice/ice_txrx.c | 22 ++++++--------- >> .../ethernet/intel/idpf/idpf_singleq_txrx.c | 28 +++++++++---------- >> include/net/ipv6.h | 17 +++++++++-- >> net/ipv6/exthdrs_core.c | 25 ++++++++++++----- >> 8 files changed, 98 insertions(+), 68 deletions(-) >> > > I have reviewed the patches and they conform to commit message/intent, > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com> > (for the series)
On Tue, Jul 2, 2024 at 6:46 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Mon, 1 Jul 2024 12:55:00 -0700 Tom Herbert wrote: > > Testing: The code compiles, but is otherwise untested due to lack of > > NIC hardware. It would be appreciated if someone with access to the > > hardware could test. > > Could you pop a script under tools/testing/selftests/drivers/net/ > that'd exercise this? > > This will hopefully guarantee good coverage soon, due to: > https://netdev.bots.linux.dev/devices.html Sure. Thanks for the pointer. Tom
On 7/3/24 16:38, Tom Herbert wrote: > > > On Wed, Jul 3, 2024, 7:20 AM Greenwalt, Paul <paul.greenwalt@intel.com > <mailto:paul.greenwalt@intel.com>> wrote: > > > > On 7/2/2024 3:31 AM, Przemek Kitszel wrote: > > On 7/1/24 21:55, Tom Herbert wrote: > >> Several NICs would seem to support protocol specific TX checksum > offload > >> and allow for cases where an IPv6 packet contains extension headers. > >> When deciding whether to offload a packet, ipv6_skip_exthdr is > called > >> to skip extension headers. The problem is that if a packet > contains an > >> IPv6 Routing Header then protocol specific checksum offload > can't work, > >> the destination IP address in the IPv6 header is not the same > one that > >> is used in the pseudo header for TCP or UDP. The correct address is > >> derived from the last segment in the routing list (which itself > might > >> be obfuscated so that a device could even read it). > > > > feels like there is a missing "not" after "could" - with it > added, reads > > fine (not a request to change, just being verbose about assumptions) > > > >> > >> This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be > >> called in lieu of ipv6_skip_exthdr. If a routing header is > present in > >> a packet then ipv6_skip_exthdr_no_rthdr returns a value less than > >> zero, this is an indication to the driver that TX checksum offload > >> is not viable and it should call skb_checksum_help instead of > >> offloading the checksum. > >> > >> The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly > >> to call ipv6_skip_exthdr_no_rthdr. > >> > >> Testing: The code compiles, but is otherwise untested due to lack of > >> NIC hardware. It would be appreciated if someone with access to the > >> hardware could test. > > > > we could test intel ones (except fm10k) via @Tony's tree > > > Awesome! If you need any help let me know. > > > > >> > >> v2: Fixed uninitialized variable in exthdrs_core.c > >> > >> Tom Herbert (7): > >> ipv6: Add ipv6_skip_exthdr_no_rthdr > >> i40e: Don't do TX csum offload with routing header present > >> iavf: Don't do TX csum offload with routing header present > >> ice: Don't do TX csum offload with routing header present > > > > sidenote: > > our HW is supporting (among others) a GCO check-summing mode > described > > as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not > > yet provided patches for that, and I don't even know if this mode > > will be used (CC @Paul) > > > > We will be adding support for GCO "Checksum 16 with pseudo Headers" to > the ice driver. It will be off by default. > > > I'm not sure what that means. IPv6 Routing Headers render (simple?) HW-offloaded checksumming wrong, but not for the "no pseudo Header"-checksum. I have no idea how such checksum will be useful, and we don't have plans to implement it, so this is not much relevant. But that is just one mode that you could config our (new) HW. > Can ICE just provide checksum-complete? > It's by far the simplest, most robust method with the most flexibility > for users. :-) sorry, could you please elaborate? Paul will implement GCO for ice and otherwise my understanding was that our checksum is fine. Is there a room for improvement? > > Tom > > > >> idpf: Don't do TX csum offload with routing header present > >> hinic: Don't do TX csum offload with routing header present > >> fm10k: Don't do TX csum offload with routing header present > >> > >> drivers/net/ethernet/huawei/hinic/hinic_tx.c | 23 +++++++++++---- > >> drivers/net/ethernet/intel/fm10k/fm10k_main.c | 9 ++++-- > >> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 22 ++++++--------- > >> drivers/net/ethernet/intel/iavf/iavf_txrx.c | 20 ++++++------- > >> drivers/net/ethernet/intel/ice/ice_txrx.c | 22 ++++++--------- > >> .../ethernet/intel/idpf/idpf_singleq_txrx.c | 28 > +++++++++---------- > >> include/net/ipv6.h | 17 +++++++++-- > >> net/ipv6/exthdrs_core.c | 25 > ++++++++++++----- > >> 8 files changed, 98 insertions(+), 68 deletions(-) > >> > > > > I have reviewed the patches and they conform to commit > message/intent, > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com > <mailto:przemyslaw.kitszel@intel.com>> > > (for the series) >
On Wed, Jul 3, 2024 at 8:03 AM Przemek Kitszel <przemyslaw.kitszel@intel.com> wrote: > > On 7/3/24 16:38, Tom Herbert wrote: > > > > > > On Wed, Jul 3, 2024, 7:20 AM Greenwalt, Paul <paul.greenwalt@intel.com > > <mailto:paul.greenwalt@intel.com>> wrote: > > > > > > > > On 7/2/2024 3:31 AM, Przemek Kitszel wrote: > > > On 7/1/24 21:55, Tom Herbert wrote: > > >> Several NICs would seem to support protocol specific TX checksum > > offload > > >> and allow for cases where an IPv6 packet contains extension headers. > > >> When deciding whether to offload a packet, ipv6_skip_exthdr is > > called > > >> to skip extension headers. The problem is that if a packet > > contains an > > >> IPv6 Routing Header then protocol specific checksum offload > > can't work, > > >> the destination IP address in the IPv6 header is not the same > > one that > > >> is used in the pseudo header for TCP or UDP. The correct address is > > >> derived from the last segment in the routing list (which itself > > might > > >> be obfuscated so that a device could even read it). > > > > > > feels like there is a missing "not" after "could" - with it > > added, reads > > > fine (not a request to change, just being verbose about assumptions) > > > > > >> > > >> This patch set adds a new function ipv6_skip_exthdr_no_rthdr to be > > >> called in lieu of ipv6_skip_exthdr. If a routing header is > > present in > > >> a packet then ipv6_skip_exthdr_no_rthdr returns a value less than > > >> zero, this is an indication to the driver that TX checksum offload > > >> is not viable and it should call skb_checksum_help instead of > > >> offloading the checksum. > > >> > > >> The i40e, iavf, ice, idpf, hinic, and fm10k are updated accordingly > > >> to call ipv6_skip_exthdr_no_rthdr. > > >> > > >> Testing: The code compiles, but is otherwise untested due to lack of > > >> NIC hardware. It would be appreciated if someone with access to the > > >> hardware could test. > > > > > > we could test intel ones (except fm10k) via @Tony's tree > > > > > > Awesome! If you need any help let me know. > > > > > > > >> > > >> v2: Fixed uninitialized variable in exthdrs_core.c > > >> > > >> Tom Herbert (7): > > >> ipv6: Add ipv6_skip_exthdr_no_rthdr > > >> i40e: Don't do TX csum offload with routing header present > > >> iavf: Don't do TX csum offload with routing header present > > >> ice: Don't do TX csum offload with routing header present > > > > > > sidenote: > > > our HW is supporting (among others) a GCO check-summing mode > > described > > > as: "Checksum 16bit (TCP/UDP) with no pseudo Header", but we have not > > > yet provided patches for that, and I don't even know if this mode > > > will be used (CC @Paul) > > > > > > > We will be adding support for GCO "Checksum 16 with pseudo Headers" to > > the ice driver. It will be off by default. > > > > > > I'm not sure what that means. > > IPv6 Routing Headers render (simple?) HW-offloaded checksumming wrong, > but not for the "no pseudo Header"-checksum. I have no idea how such > checksum will be useful, and we don't have plans to implement it, so > this is not much relevant. But that is just one mode that you could > config our (new) HW. > > > Can ICE just provide checksum-complete? > > It's by far the simplest, most robust method with the most flexibility > > for users. :-) > > sorry, could you please elaborate? > > Paul will implement GCO for ice and otherwise my understanding was that > our checksum is fine. Is there a room for improvement? Przemek, No, there's plenty of room for improvement :-). This is protocol specific checksum offload versus protocol agnostic checksum offload, and the opinion of the community has been clear for a long time: protocol agnostic checksum offload is the preferred method and protocol specific checksum offload is deprecated. This is true for both transmit and receive checksum offload. For background see Dave Miller's presentation on this from 2016: https://www.youtube.com/watch?v=6VgmazGwL_Y. Protocol agnostic checksum offload isn't just a to have, it addresses many bugs in protocol specific checksum offload. This patch set addresses one obvious bug, but there are others. For instance, in IETF there is a proposal in spring WG to do SRv6 without a routing header that would make the checksum incorrect on the wire. This will break protocol specific TX checksum offload and there's nothing to key on in the packet like an RH so that a driver would know the offload will fail. I'm really not sure how we could fix this without major surgery in the stack. Use protocol agnostic checksum offload and it "just works" (the proposal to purposely send a bad checksum on the wire without a RH is a bad idea in general, but I'm not sure we'll be able to stop it in IETF). And not to pick on the ICE driver, but please take a look at the function ice_tx_csum. This function is called on every packet just to evaluate whether the device is going to be able to offload the packet. Basically, it parses the packet on transmit to make sure that the device will be able to parse the packet (this is where we need to call ipv6_skip_exthdr_no_rthdr). This function is 180 LOC! If the device properly supports protocol agnostic checksum offload all that is needed is to write the start offset and checksum offset into the receive descriptor. Maybe there's some checks on the offset values, but I can't see that needing more than ten LOC-- it's much less susceptible to bugs, more robust, and works with a much wider set of protocol combinations. BTW, this patch set is the first in a series to formally deprecate and remove protocol specific checksum offloads from the core kernel. IMO, we've waited long enough! My intent is to remove CHECKSUM_UNNECESSARY, NETIF_F_IP_CSUM, and NETIF_F_IPV6_CSUM. (note comment in skbuff.h "New devices should use %NETIF_F_HW_CSUM to indicate checksum offload capability."). Helper functions will be provided to support legacy devices. Tom > > > > > Tom > > > > > > >> idpf: Don't do TX csum offload with routing header present > > >> hinic: Don't do TX csum offload with routing header present > > >> fm10k: Don't do TX csum offload with routing header present > > >> > > >> drivers/net/ethernet/huawei/hinic/hinic_tx.c | 23 +++++++++++---- > > >> drivers/net/ethernet/intel/fm10k/fm10k_main.c | 9 ++++-- > > >> drivers/net/ethernet/intel/i40e/i40e_txrx.c | 22 ++++++--------- > > >> drivers/net/ethernet/intel/iavf/iavf_txrx.c | 20 ++++++------- > > >> drivers/net/ethernet/intel/ice/ice_txrx.c | 22 ++++++--------- > > >> .../ethernet/intel/idpf/idpf_singleq_txrx.c | 28 > > +++++++++---------- > > >> include/net/ipv6.h | 17 +++++++++-- > > >> net/ipv6/exthdrs_core.c | 25 > > ++++++++++++----- > > >> 8 files changed, 98 insertions(+), 68 deletions(-) > > >> > > > > > > I have reviewed the patches and they conform to commit > > message/intent, > > > Reviewed-by: Przemek Kitszel <przemyslaw.kitszel@intel.com > > <mailto:przemyslaw.kitszel@intel.com>> > > > (for the series) > > >