diff mbox series

[v5,net-next,08/13] gso: AccECN support

Message ID 20241105100647.117346-9-chia-yu.chang@nokia-bell-labs.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series AccECN protocol preparation patch series | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 42 this patch: 42
netdev/build_tools success Errors and warnings before: 2 (+0) this patch: 2 (+0)
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 78 this patch: 78
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 5008 this patch: 5008
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 77 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 82 this patch: 82
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-11-07--12-00 (tests: 787)

Commit Message

Chia-Yu Chang (Nokia) Nov. 5, 2024, 10:06 a.m. UTC
From: Ilpo Järvinen <ij@kernel.org>

Handling the CWR flag differs between RFC 3168 ECN and AccECN.
With RFC 3168 ECN aware TSO (NETIF_F_TSO_ECN) CWR flag is cleared
starting from 2nd segment which is incompatible how AccECN handles
the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN.
With AccECN, CWR flag (or more accurately, the ACE field that also
includes ECE & AE flags) changes only when new packet(s) with CE
mark arrives so the flag should not be changed within a super-skb.
The new skb/feature flags are necessary to prevent such TSO engines
corrupting AccECN ACE counters by clearing the CWR flag (if the
CWR handling feature cannot be turned off).

If NIC is completely unaware of RFC3168 ECN (doesn't support
NETIF_F_TSO_ECN) or its TSO engine can be set to not touch CWR flag
despite supporting also NETIF_F_TSO_ECN, TSO could be safely used
with AccECN on such NIC. This should be evaluated per NIC basis
(not done in this patch series for any NICs).

For the cases, where TSO cannot keep its hands off the CWR flag,
a GSO fallback is provided by this patch.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 include/linux/netdev_features.h | 8 +++++---
 include/linux/netdevice.h       | 2 ++
 include/linux/skbuff.h          | 2 ++
 net/ethtool/common.c            | 1 +
 net/ipv4/tcp_offload.c          | 6 +++++-
 5 files changed, 15 insertions(+), 4 deletions(-)

Comments

Eric Dumazet Nov. 7, 2024, 12:41 p.m. UTC | #1
On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@nokia-bell-labs.com> wrote:
>
> From: Ilpo Järvinen <ij@kernel.org>
>
> Handling the CWR flag differs between RFC 3168 ECN and AccECN.
> With RFC 3168 ECN aware TSO (NETIF_F_TSO_ECN) CWR flag is cleared
> starting from 2nd segment which is incompatible how AccECN handles
> the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN.
> With AccECN, CWR flag (or more accurately, the ACE field that also
> includes ECE & AE flags) changes only when new packet(s) with CE
> mark arrives so the flag should not be changed within a super-skb.
> The new skb/feature flags are necessary to prevent such TSO engines
> corrupting AccECN ACE counters by clearing the CWR flag (if the
> CWR handling feature cannot be turned off).
>
> If NIC is completely unaware of RFC3168 ECN (doesn't support
> NETIF_F_TSO_ECN) or its TSO engine can be set to not touch CWR flag
> despite supporting also NETIF_F_TSO_ECN, TSO could be safely used
> with AccECN on such NIC. This should be evaluated per NIC basis
> (not done in this patch series for any NICs).
>
> For the cases, where TSO cannot keep its hands off the CWR flag,
> a GSO fallback is provided by this patch.
>
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>

Reviewed-by: Eric Dumazet <edumazet@google.com>
Ilpo Järvinen Nov. 9, 2024, 10:09 a.m. UTC | #2
On Tue, 5 Nov 2024, chia-yu.chang@nokia-bell-labs.com wrote:

> From: Ilpo Järvinen <ij@kernel.org>
> 
> Handling the CWR flag differs between RFC 3168 ECN and AccECN.
> With RFC 3168 ECN aware TSO (NETIF_F_TSO_ECN) CWR flag is cleared
> starting from 2nd segment which is incompatible how AccECN handles
> the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN.
> With AccECN, CWR flag (or more accurately, the ACE field that also
> includes ECE & AE flags) changes only when new packet(s) with CE
> mark arrives so the flag should not be changed within a super-skb.
> The new skb/feature flags are necessary to prevent such TSO engines
> corrupting AccECN ACE counters by clearing the CWR flag (if the
> CWR handling feature cannot be turned off).
> 
> If NIC is completely unaware of RFC3168 ECN (doesn't support
> NETIF_F_TSO_ECN) or its TSO engine can be set to not touch CWR flag
> despite supporting also NETIF_F_TSO_ECN, TSO could be safely used
> with AccECN on such NIC. This should be evaluated per NIC basis
> (not done in this patch series for any NICs).
> 
> For the cases, where TSO cannot keep its hands off the CWR flag,
> a GSO fallback is provided by this patch.
> 
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> ---
>  include/linux/netdev_features.h | 8 +++++---
>  include/linux/netdevice.h       | 2 ++
>  include/linux/skbuff.h          | 2 ++
>  net/ethtool/common.c            | 1 +
>  net/ipv4/tcp_offload.c          | 6 +++++-
>  5 files changed, 15 insertions(+), 4 deletions(-)
> 
> diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> index 66e7d26b70a4..c59db449bcf0 100644
> --- a/include/linux/netdev_features.h
> +++ b/include/linux/netdev_features.h
> @@ -53,12 +53,12 @@ enum {
>  	NETIF_F_GSO_UDP_BIT,		/* ... UFO, deprecated except tuntap */
>  	NETIF_F_GSO_UDP_L4_BIT,		/* ... UDP payload GSO (not UFO) */
>  	NETIF_F_GSO_FRAGLIST_BIT,		/* ... Fraglist GSO */
> +	NETIF_F_GSO_ACCECN_BIT,         /* TCP AccECN w/ TSO (no clear CWR) */
>  	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
> -		NETIF_F_GSO_FRAGLIST_BIT,
> +		NETIF_F_GSO_ACCECN_BIT,
>  
>  	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
>  	NETIF_F_SCTP_CRC_BIT,		/* SCTP checksum offload */
> -	__UNUSED_NETIF_F_37,
>  	NETIF_F_NTUPLE_BIT,		/* N-tuple filters supported */
>  	NETIF_F_RXHASH_BIT,		/* Receive hashing offload */
>  	NETIF_F_RXCSUM_BIT,		/* Receive checksumming offload */
> @@ -128,6 +128,7 @@ enum {
>  #define NETIF_F_SG		__NETIF_F(SG)
>  #define NETIF_F_TSO6		__NETIF_F(TSO6)
>  #define NETIF_F_TSO_ECN		__NETIF_F(TSO_ECN)
> +#define NETIF_F_GSO_ACCECN	__NETIF_F(GSO_ACCECN)
>  #define NETIF_F_TSO		__NETIF_F(TSO)
>  #define NETIF_F_VLAN_CHALLENGED	__NETIF_F(VLAN_CHALLENGED)
>  #define NETIF_F_RXFCS		__NETIF_F(RXFCS)
> @@ -210,7 +211,8 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start)
>  				 NETIF_F_TSO_ECN | NETIF_F_TSO_MANGLEID)
>  
>  /* List of features with software fallbacks. */
> -#define NETIF_F_GSO_SOFTWARE	(NETIF_F_ALL_TSO | NETIF_F_GSO_SCTP |	     \
> +#define NETIF_F_GSO_SOFTWARE	(NETIF_F_ALL_TSO | \
> +				 NETIF_F_GSO_ACCECN | NETIF_F_GSO_SCTP | \
>  				 NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST)
>  
>  /*
> diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> index 6e0f8e4aeb14..480d915b3bdb 100644
> --- a/include/linux/netdevice.h
> +++ b/include/linux/netdevice.h
> @@ -5076,6 +5076,8 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
>  	BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
>  	BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT));
> +	BUILD_BUG_ON(SKB_GSO_TCP_ACCECN !=
> +		     (NETIF_F_GSO_ACCECN >> NETIF_F_GSO_SHIFT));
>  
>  	return (features & feature) == feature;
>  }
> diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> index 48f1e0fa2a13..530cb325fb86 100644
> --- a/include/linux/skbuff.h
> +++ b/include/linux/skbuff.h
> @@ -694,6 +694,8 @@ enum {
>  	SKB_GSO_UDP_L4 = 1 << 17,
>  
>  	SKB_GSO_FRAGLIST = 1 << 18,
> +
> +	SKB_GSO_TCP_ACCECN = 1 << 19,
>  };
>  
>  #if BITS_PER_LONG > 32
> diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> index 0d62363dbd9d..5c3ba2dfaa74 100644
> --- a/net/ethtool/common.c
> +++ b/net/ethtool/common.c
> @@ -32,6 +32,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
>  	[NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",
>  	[NETIF_F_GSO_ROBUST_BIT] =       "tx-gso-robust",
>  	[NETIF_F_TSO_ECN_BIT] =          "tx-tcp-ecn-segmentation",
> +	[NETIF_F_GSO_ACCECN_BIT] =	 "tx-tcp-accecn-segmentation",
>  	[NETIF_F_TSO_MANGLEID_BIT] =	 "tx-tcp-mangleid-segmentation",
>  	[NETIF_F_TSO6_BIT] =             "tx-tcp6-segmentation",
>  	[NETIF_F_FSO_BIT] =              "tx-fcoe-segmentation",
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 2308665b51c5..0b05f30e9e5f 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -139,6 +139,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  	struct sk_buff *gso_skb = skb;
>  	__sum16 newcheck;
>  	bool ooo_okay, copy_destructor;
> +	bool ecn_cwr_mask;
>  	__wsum delta;
>  
>  	th = tcp_hdr(skb);
> @@ -198,6 +199,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  
>  	newcheck = ~csum_fold(csum_add(csum_unfold(th->check), delta));
>  
> +	ecn_cwr_mask = !!(skb_shinfo(gso_skb)->gso_type & SKB_GSO_TCP_ACCECN);
> +
>  	while (skb->next) {
>  		th->fin = th->psh = 0;
>  		th->check = newcheck;
> @@ -217,7 +220,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
>  		th = tcp_hdr(skb);
>  
>  		th->seq = htonl(seq);
> -		th->cwr = 0;
> +
> +		th->cwr &= ecn_cwr_mask;

Hi all,

I started to wonder if this approach would avoid CWR corruption without
need to introduce the SKB_GSO_TCP_ACCECN flag at all:

- Never set SKB_GSO_TCP_ECN for any skb
- Split CWR segment on TCP level before sending. While this causes need to 
split the super skb, it's once per RTT thing so does not sound very 
significant (compare e.g. with SACK that cause split on every ACK)
- Remove th->cwr cleaning from GSO (the above line)
- Likely needed: don't negotiate HOST/GUEST_ECN in virtio

I think that would prevent offloading treating CWR flag as RFC3168 and CWR
would be just copied as is like a normal flag which is required to not 
corrupt it. In practice, RFC3168 style traffic would just not combine 
anything with the CWR'ed packet as CWRs are singletons with RFC3168.

Would that work? It sure sounds too simple to work but I cannot 
immediately see why it would not.
Neal Cardwell Nov. 9, 2024, 7:42 p.m. UTC | #3
On Sat, Nov 9, 2024 at 5:09 AM Ilpo Järvinen <ij@kernel.org> wrote:
>
> On Tue, 5 Nov 2024, chia-yu.chang@nokia-bell-labs.com wrote:
>
> > From: Ilpo Järvinen <ij@kernel.org>
> >
> > Handling the CWR flag differs between RFC 3168 ECN and AccECN.
> > With RFC 3168 ECN aware TSO (NETIF_F_TSO_ECN) CWR flag is cleared
> > starting from 2nd segment which is incompatible how AccECN handles
> > the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN.
> > With AccECN, CWR flag (or more accurately, the ACE field that also
> > includes ECE & AE flags) changes only when new packet(s) with CE
> > mark arrives so the flag should not be changed within a super-skb.
> > The new skb/feature flags are necessary to prevent such TSO engines
> > corrupting AccECN ACE counters by clearing the CWR flag (if the
> > CWR handling feature cannot be turned off).
> >
> > If NIC is completely unaware of RFC3168 ECN (doesn't support
> > NETIF_F_TSO_ECN) or its TSO engine can be set to not touch CWR flag
> > despite supporting also NETIF_F_TSO_ECN, TSO could be safely used
> > with AccECN on such NIC. This should be evaluated per NIC basis
> > (not done in this patch series for any NICs).
> >
> > For the cases, where TSO cannot keep its hands off the CWR flag,
> > a GSO fallback is provided by this patch.
> >
> > Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > ---
> >  include/linux/netdev_features.h | 8 +++++---
> >  include/linux/netdevice.h       | 2 ++
> >  include/linux/skbuff.h          | 2 ++
> >  net/ethtool/common.c            | 1 +
> >  net/ipv4/tcp_offload.c          | 6 +++++-
> >  5 files changed, 15 insertions(+), 4 deletions(-)
> >
> > diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> > index 66e7d26b70a4..c59db449bcf0 100644
> > --- a/include/linux/netdev_features.h
> > +++ b/include/linux/netdev_features.h
> > @@ -53,12 +53,12 @@ enum {
> >       NETIF_F_GSO_UDP_BIT,            /* ... UFO, deprecated except tuntap */
> >       NETIF_F_GSO_UDP_L4_BIT,         /* ... UDP payload GSO (not UFO) */
> >       NETIF_F_GSO_FRAGLIST_BIT,               /* ... Fraglist GSO */
> > +     NETIF_F_GSO_ACCECN_BIT,         /* TCP AccECN w/ TSO (no clear CWR) */
> >       /**/NETIF_F_GSO_LAST =          /* last bit, see GSO_MASK */
> > -             NETIF_F_GSO_FRAGLIST_BIT,
> > +             NETIF_F_GSO_ACCECN_BIT,
> >
> >       NETIF_F_FCOE_CRC_BIT,           /* FCoE CRC32 */
> >       NETIF_F_SCTP_CRC_BIT,           /* SCTP checksum offload */
> > -     __UNUSED_NETIF_F_37,
> >       NETIF_F_NTUPLE_BIT,             /* N-tuple filters supported */
> >       NETIF_F_RXHASH_BIT,             /* Receive hashing offload */
> >       NETIF_F_RXCSUM_BIT,             /* Receive checksumming offload */
> > @@ -128,6 +128,7 @@ enum {
> >  #define NETIF_F_SG           __NETIF_F(SG)
> >  #define NETIF_F_TSO6         __NETIF_F(TSO6)
> >  #define NETIF_F_TSO_ECN              __NETIF_F(TSO_ECN)
> > +#define NETIF_F_GSO_ACCECN   __NETIF_F(GSO_ACCECN)
> >  #define NETIF_F_TSO          __NETIF_F(TSO)
> >  #define NETIF_F_VLAN_CHALLENGED      __NETIF_F(VLAN_CHALLENGED)
> >  #define NETIF_F_RXFCS                __NETIF_F(RXFCS)
> > @@ -210,7 +211,8 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start)
> >                                NETIF_F_TSO_ECN | NETIF_F_TSO_MANGLEID)
> >
> >  /* List of features with software fallbacks. */
> > -#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | NETIF_F_GSO_SCTP |        \
> > +#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | \
> > +                              NETIF_F_GSO_ACCECN | NETIF_F_GSO_SCTP | \
> >                                NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST)
> >
> >  /*
> > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > index 6e0f8e4aeb14..480d915b3bdb 100644
> > --- a/include/linux/netdevice.h
> > +++ b/include/linux/netdevice.h
> > @@ -5076,6 +5076,8 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
> >       BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
> >       BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
> >       BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT));
> > +     BUILD_BUG_ON(SKB_GSO_TCP_ACCECN !=
> > +                  (NETIF_F_GSO_ACCECN >> NETIF_F_GSO_SHIFT));
> >
> >       return (features & feature) == feature;
> >  }
> > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > index 48f1e0fa2a13..530cb325fb86 100644
> > --- a/include/linux/skbuff.h
> > +++ b/include/linux/skbuff.h
> > @@ -694,6 +694,8 @@ enum {
> >       SKB_GSO_UDP_L4 = 1 << 17,
> >
> >       SKB_GSO_FRAGLIST = 1 << 18,
> > +
> > +     SKB_GSO_TCP_ACCECN = 1 << 19,
> >  };
> >
> >  #if BITS_PER_LONG > 32
> > diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> > index 0d62363dbd9d..5c3ba2dfaa74 100644
> > --- a/net/ethtool/common.c
> > +++ b/net/ethtool/common.c
> > @@ -32,6 +32,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
> >       [NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",
> >       [NETIF_F_GSO_ROBUST_BIT] =       "tx-gso-robust",
> >       [NETIF_F_TSO_ECN_BIT] =          "tx-tcp-ecn-segmentation",
> > +     [NETIF_F_GSO_ACCECN_BIT] =       "tx-tcp-accecn-segmentation",
> >       [NETIF_F_TSO_MANGLEID_BIT] =     "tx-tcp-mangleid-segmentation",
> >       [NETIF_F_TSO6_BIT] =             "tx-tcp6-segmentation",
> >       [NETIF_F_FSO_BIT] =              "tx-fcoe-segmentation",
> > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > index 2308665b51c5..0b05f30e9e5f 100644
> > --- a/net/ipv4/tcp_offload.c
> > +++ b/net/ipv4/tcp_offload.c
> > @@ -139,6 +139,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
> >       struct sk_buff *gso_skb = skb;
> >       __sum16 newcheck;
> >       bool ooo_okay, copy_destructor;
> > +     bool ecn_cwr_mask;
> >       __wsum delta;
> >
> >       th = tcp_hdr(skb);
> > @@ -198,6 +199,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
> >
> >       newcheck = ~csum_fold(csum_add(csum_unfold(th->check), delta));
> >
> > +     ecn_cwr_mask = !!(skb_shinfo(gso_skb)->gso_type & SKB_GSO_TCP_ACCECN);
> > +
> >       while (skb->next) {
> >               th->fin = th->psh = 0;
> >               th->check = newcheck;
> > @@ -217,7 +220,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
> >               th = tcp_hdr(skb);
> >
> >               th->seq = htonl(seq);
> > -             th->cwr = 0;
> > +
> > +             th->cwr &= ecn_cwr_mask;
>
> Hi all,
>
> I started to wonder if this approach would avoid CWR corruption without
> need to introduce the SKB_GSO_TCP_ACCECN flag at all:
>
> - Never set SKB_GSO_TCP_ECN for any skb
> - Split CWR segment on TCP level before sending. While this causes need to
> split the super skb, it's once per RTT thing so does not sound very
> significant (compare e.g. with SACK that cause split on every ACK)
> - Remove th->cwr cleaning from GSO (the above line)
> - Likely needed: don't negotiate HOST/GUEST_ECN in virtio
>
> I think that would prevent offloading treating CWR flag as RFC3168 and CWR
> would be just copied as is like a normal flag which is required to not
> corrupt it. In practice, RFC3168 style traffic would just not combine
> anything with the CWR'ed packet as CWRs are singletons with RFC3168.
>
> Would that work? It sure sounds too simple to work but I cannot
> immediately see why it would not.

The above description just seems to describe the dynamics for the
RFC3168 case (since it mentions the CWR bit being set as just a "once
per RTT thing").

Ilpo, can you please describe how AccECN traffic would be handled in
your design proposal? (Since with established AccECN connections the
CWR bit is part of the ACE counter and may be set on roughly half of
outgoing data skbs...)

Sorry if I'm misunderstanding something...

Thanks,
neal
Ilpo Järvinen Nov. 9, 2024, 9:38 p.m. UTC | #4
On Sat, 9 Nov 2024, Neal Cardwell wrote:

> On Sat, Nov 9, 2024 at 5:09 AM Ilpo Järvinen <ij@kernel.org> wrote:
> >
> > On Tue, 5 Nov 2024, chia-yu.chang@nokia-bell-labs.com wrote:
> >
> > > From: Ilpo Järvinen <ij@kernel.org>
> > >
> > > Handling the CWR flag differs between RFC 3168 ECN and AccECN.
> > > With RFC 3168 ECN aware TSO (NETIF_F_TSO_ECN) CWR flag is cleared
> > > starting from 2nd segment which is incompatible how AccECN handles
> > > the CWR flag. Such super-segments are indicated by SKB_GSO_TCP_ECN.
> > > With AccECN, CWR flag (or more accurately, the ACE field that also
> > > includes ECE & AE flags) changes only when new packet(s) with CE
> > > mark arrives so the flag should not be changed within a super-skb.
> > > The new skb/feature flags are necessary to prevent such TSO engines
> > > corrupting AccECN ACE counters by clearing the CWR flag (if the
> > > CWR handling feature cannot be turned off).
> > >
> > > If NIC is completely unaware of RFC3168 ECN (doesn't support
> > > NETIF_F_TSO_ECN) or its TSO engine can be set to not touch CWR flag
> > > despite supporting also NETIF_F_TSO_ECN, TSO could be safely used
> > > with AccECN on such NIC. This should be evaluated per NIC basis
> > > (not done in this patch series for any NICs).
> > >
> > > For the cases, where TSO cannot keep its hands off the CWR flag,
> > > a GSO fallback is provided by this patch.
> > >
> > > Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> > > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > > ---
> > >  include/linux/netdev_features.h | 8 +++++---
> > >  include/linux/netdevice.h       | 2 ++
> > >  include/linux/skbuff.h          | 2 ++
> > >  net/ethtool/common.c            | 1 +
> > >  net/ipv4/tcp_offload.c          | 6 +++++-
> > >  5 files changed, 15 insertions(+), 4 deletions(-)
> > >
> > > diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
> > > index 66e7d26b70a4..c59db449bcf0 100644
> > > --- a/include/linux/netdev_features.h
> > > +++ b/include/linux/netdev_features.h
> > > @@ -53,12 +53,12 @@ enum {
> > >       NETIF_F_GSO_UDP_BIT,            /* ... UFO, deprecated except tuntap */
> > >       NETIF_F_GSO_UDP_L4_BIT,         /* ... UDP payload GSO (not UFO) */
> > >       NETIF_F_GSO_FRAGLIST_BIT,               /* ... Fraglist GSO */
> > > +     NETIF_F_GSO_ACCECN_BIT,         /* TCP AccECN w/ TSO (no clear CWR) */
> > >       /**/NETIF_F_GSO_LAST =          /* last bit, see GSO_MASK */
> > > -             NETIF_F_GSO_FRAGLIST_BIT,
> > > +             NETIF_F_GSO_ACCECN_BIT,
> > >
> > >       NETIF_F_FCOE_CRC_BIT,           /* FCoE CRC32 */
> > >       NETIF_F_SCTP_CRC_BIT,           /* SCTP checksum offload */
> > > -     __UNUSED_NETIF_F_37,
> > >       NETIF_F_NTUPLE_BIT,             /* N-tuple filters supported */
> > >       NETIF_F_RXHASH_BIT,             /* Receive hashing offload */
> > >       NETIF_F_RXCSUM_BIT,             /* Receive checksumming offload */
> > > @@ -128,6 +128,7 @@ enum {
> > >  #define NETIF_F_SG           __NETIF_F(SG)
> > >  #define NETIF_F_TSO6         __NETIF_F(TSO6)
> > >  #define NETIF_F_TSO_ECN              __NETIF_F(TSO_ECN)
> > > +#define NETIF_F_GSO_ACCECN   __NETIF_F(GSO_ACCECN)
> > >  #define NETIF_F_TSO          __NETIF_F(TSO)
> > >  #define NETIF_F_VLAN_CHALLENGED      __NETIF_F(VLAN_CHALLENGED)
> > >  #define NETIF_F_RXFCS                __NETIF_F(RXFCS)
> > > @@ -210,7 +211,8 @@ static inline int find_next_netdev_feature(u64 feature, unsigned long start)
> > >                                NETIF_F_TSO_ECN | NETIF_F_TSO_MANGLEID)
> > >
> > >  /* List of features with software fallbacks. */
> > > -#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | NETIF_F_GSO_SCTP |        \
> > > +#define NETIF_F_GSO_SOFTWARE (NETIF_F_ALL_TSO | \
> > > +                              NETIF_F_GSO_ACCECN | NETIF_F_GSO_SCTP | \
> > >                                NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST)
> > >
> > >  /*
> > > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
> > > index 6e0f8e4aeb14..480d915b3bdb 100644
> > > --- a/include/linux/netdevice.h
> > > +++ b/include/linux/netdevice.h
> > > @@ -5076,6 +5076,8 @@ static inline bool net_gso_ok(netdev_features_t features, int gso_type)
> > >       BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
> > >       BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
> > >       BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT));
> > > +     BUILD_BUG_ON(SKB_GSO_TCP_ACCECN !=
> > > +                  (NETIF_F_GSO_ACCECN >> NETIF_F_GSO_SHIFT));
> > >
> > >       return (features & feature) == feature;
> > >  }
> > > diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
> > > index 48f1e0fa2a13..530cb325fb86 100644
> > > --- a/include/linux/skbuff.h
> > > +++ b/include/linux/skbuff.h
> > > @@ -694,6 +694,8 @@ enum {
> > >       SKB_GSO_UDP_L4 = 1 << 17,
> > >
> > >       SKB_GSO_FRAGLIST = 1 << 18,
> > > +
> > > +     SKB_GSO_TCP_ACCECN = 1 << 19,
> > >  };
> > >
> > >  #if BITS_PER_LONG > 32
> > > diff --git a/net/ethtool/common.c b/net/ethtool/common.c
> > > index 0d62363dbd9d..5c3ba2dfaa74 100644
> > > --- a/net/ethtool/common.c
> > > +++ b/net/ethtool/common.c
> > > @@ -32,6 +32,7 @@ const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
> > >       [NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",
> > >       [NETIF_F_GSO_ROBUST_BIT] =       "tx-gso-robust",
> > >       [NETIF_F_TSO_ECN_BIT] =          "tx-tcp-ecn-segmentation",
> > > +     [NETIF_F_GSO_ACCECN_BIT] =       "tx-tcp-accecn-segmentation",
> > >       [NETIF_F_TSO_MANGLEID_BIT] =     "tx-tcp-mangleid-segmentation",
> > >       [NETIF_F_TSO6_BIT] =             "tx-tcp6-segmentation",
> > >       [NETIF_F_FSO_BIT] =              "tx-fcoe-segmentation",
> > > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > > index 2308665b51c5..0b05f30e9e5f 100644
> > > --- a/net/ipv4/tcp_offload.c
> > > +++ b/net/ipv4/tcp_offload.c
> > > @@ -139,6 +139,7 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
> > >       struct sk_buff *gso_skb = skb;
> > >       __sum16 newcheck;
> > >       bool ooo_okay, copy_destructor;
> > > +     bool ecn_cwr_mask;
> > >       __wsum delta;
> > >
> > >       th = tcp_hdr(skb);
> > > @@ -198,6 +199,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
> > >
> > >       newcheck = ~csum_fold(csum_add(csum_unfold(th->check), delta));
> > >
> > > +     ecn_cwr_mask = !!(skb_shinfo(gso_skb)->gso_type & SKB_GSO_TCP_ACCECN);
> > > +
> > >       while (skb->next) {
> > >               th->fin = th->psh = 0;
> > >               th->check = newcheck;
> > > @@ -217,7 +220,8 @@ struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
> > >               th = tcp_hdr(skb);
> > >
> > >               th->seq = htonl(seq);
> > > -             th->cwr = 0;
> > > +
> > > +             th->cwr &= ecn_cwr_mask;
> >
> > Hi all,
> >
> > I started to wonder if this approach would avoid CWR corruption without
> > need to introduce the SKB_GSO_TCP_ACCECN flag at all:
> >
> > - Never set SKB_GSO_TCP_ECN for any skb
> > - Split CWR segment on TCP level before sending. While this causes need to
> > split the super skb, it's once per RTT thing so does not sound very
> > significant (compare e.g. with SACK that cause split on every ACK)
> > - Remove th->cwr cleaning from GSO (the above line)
> > - Likely needed: don't negotiate HOST/GUEST_ECN in virtio
> >
> > I think that would prevent offloading treating CWR flag as RFC3168 and CWR
> > would be just copied as is like a normal flag which is required to not
> > corrupt it. In practice, RFC3168 style traffic would just not combine
> > anything with the CWR'ed packet as CWRs are singletons with RFC3168.
> >
> > Would that work? It sure sounds too simple to work but I cannot
> > immediately see why it would not.
> 
> The above description just seems to describe the dynamics for the
> RFC3168 case (since it mentions the CWR bit being set as just a "once
> per RTT thing").

For RFC3168 ECN case, that will still hold.

I think the misunderstanding was from this, I meant to only briefly say
how this would impact TCP using RFC3168 ECN. It needs to split CWR segment 
to own skb to not share skb with non-CWR segments and that even is 
infrequent enough (one per RTT) that it doesn't sound a bad compromise.

> Ilpo, can you please describe how AccECN traffic would be handled in
> your design proposal? (Since with established AccECN connections the
> CWR bit is part of the ACE counter and may be set on roughly half of
> outgoing data skbs...)

TSO (and GSO) RFC3168 ECN offloading takes an skb, puts CWR only into 
the first segment and the rest of the skb are sent without it.

What we want for AccECN is to never combine CWR and non-CWR skbs. The 
question is just how to achieve this.

My idea is to instead of this SKB_GSO_TCP_ACCECN flag split those skbs 
always apart on TCP level (RFC3168 case) and to adapt GRO to not set 
SKB_GSO_TCP_ECN either (the latter should already be done by this series). 

When SKB_GSO_TCP_ECN is not set by TCP nor GRO anymore, is that enough to 
prevent HW offloading CWR in the RFC3168 way? That is, will the HW offload 
still mess with CWR flag or does HW offloading honor the lack of 
SKB_GSO_TCP_ECN (I'm not sure about the answer as I've not dealt with NICs 
on that level)? If the HW offloading keeps its hands off from CWR, then 
there would be no need for this new flag.

GSO case is handled by simply removing CWR mangling from the code.


Well, I now went to read some random NIC drivers and I suspect the answer 
is that HW offloading will still mess with CWR, given the lack of code 
that would be depending on that gso_type flag... So I guess my idea is a 
dead-end and SKB_GSO_TCP_ACCECN flag is needed to prevent HW offloads from 
seeing the CWR'ed skb.

> Sorry if I'm misunderstanding something...
diff mbox series

Patch

diff --git a/include/linux/netdev_features.h b/include/linux/netdev_features.h
index 66e7d26b70a4..c59db449bcf0 100644
--- a/include/linux/netdev_features.h
+++ b/include/linux/netdev_features.h
@@ -53,12 +53,12 @@  enum {
 	NETIF_F_GSO_UDP_BIT,		/* ... UFO, deprecated except tuntap */
 	NETIF_F_GSO_UDP_L4_BIT,		/* ... UDP payload GSO (not UFO) */
 	NETIF_F_GSO_FRAGLIST_BIT,		/* ... Fraglist GSO */
+	NETIF_F_GSO_ACCECN_BIT,         /* TCP AccECN w/ TSO (no clear CWR) */
 	/**/NETIF_F_GSO_LAST =		/* last bit, see GSO_MASK */
-		NETIF_F_GSO_FRAGLIST_BIT,
+		NETIF_F_GSO_ACCECN_BIT,
 
 	NETIF_F_FCOE_CRC_BIT,		/* FCoE CRC32 */
 	NETIF_F_SCTP_CRC_BIT,		/* SCTP checksum offload */
-	__UNUSED_NETIF_F_37,
 	NETIF_F_NTUPLE_BIT,		/* N-tuple filters supported */
 	NETIF_F_RXHASH_BIT,		/* Receive hashing offload */
 	NETIF_F_RXCSUM_BIT,		/* Receive checksumming offload */
@@ -128,6 +128,7 @@  enum {
 #define NETIF_F_SG		__NETIF_F(SG)
 #define NETIF_F_TSO6		__NETIF_F(TSO6)
 #define NETIF_F_TSO_ECN		__NETIF_F(TSO_ECN)
+#define NETIF_F_GSO_ACCECN	__NETIF_F(GSO_ACCECN)
 #define NETIF_F_TSO		__NETIF_F(TSO)
 #define NETIF_F_VLAN_CHALLENGED	__NETIF_F(VLAN_CHALLENGED)
 #define NETIF_F_RXFCS		__NETIF_F(RXFCS)
@@ -210,7 +211,8 @@  static inline int find_next_netdev_feature(u64 feature, unsigned long start)
 				 NETIF_F_TSO_ECN | NETIF_F_TSO_MANGLEID)
 
 /* List of features with software fallbacks. */
-#define NETIF_F_GSO_SOFTWARE	(NETIF_F_ALL_TSO | NETIF_F_GSO_SCTP |	     \
+#define NETIF_F_GSO_SOFTWARE	(NETIF_F_ALL_TSO | \
+				 NETIF_F_GSO_ACCECN | NETIF_F_GSO_SCTP | \
 				 NETIF_F_GSO_UDP_L4 | NETIF_F_GSO_FRAGLIST)
 
 /*
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h
index 6e0f8e4aeb14..480d915b3bdb 100644
--- a/include/linux/netdevice.h
+++ b/include/linux/netdevice.h
@@ -5076,6 +5076,8 @@  static inline bool net_gso_ok(netdev_features_t features, int gso_type)
 	BUILD_BUG_ON(SKB_GSO_UDP != (NETIF_F_GSO_UDP >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_UDP_L4 != (NETIF_F_GSO_UDP_L4 >> NETIF_F_GSO_SHIFT));
 	BUILD_BUG_ON(SKB_GSO_FRAGLIST != (NETIF_F_GSO_FRAGLIST >> NETIF_F_GSO_SHIFT));
+	BUILD_BUG_ON(SKB_GSO_TCP_ACCECN !=
+		     (NETIF_F_GSO_ACCECN >> NETIF_F_GSO_SHIFT));
 
 	return (features & feature) == feature;
 }
diff --git a/include/linux/skbuff.h b/include/linux/skbuff.h
index 48f1e0fa2a13..530cb325fb86 100644
--- a/include/linux/skbuff.h
+++ b/include/linux/skbuff.h
@@ -694,6 +694,8 @@  enum {
 	SKB_GSO_UDP_L4 = 1 << 17,
 
 	SKB_GSO_FRAGLIST = 1 << 18,
+
+	SKB_GSO_TCP_ACCECN = 1 << 19,
 };
 
 #if BITS_PER_LONG > 32
diff --git a/net/ethtool/common.c b/net/ethtool/common.c
index 0d62363dbd9d..5c3ba2dfaa74 100644
--- a/net/ethtool/common.c
+++ b/net/ethtool/common.c
@@ -32,6 +32,7 @@  const char netdev_features_strings[NETDEV_FEATURE_COUNT][ETH_GSTRING_LEN] = {
 	[NETIF_F_TSO_BIT] =              "tx-tcp-segmentation",
 	[NETIF_F_GSO_ROBUST_BIT] =       "tx-gso-robust",
 	[NETIF_F_TSO_ECN_BIT] =          "tx-tcp-ecn-segmentation",
+	[NETIF_F_GSO_ACCECN_BIT] =	 "tx-tcp-accecn-segmentation",
 	[NETIF_F_TSO_MANGLEID_BIT] =	 "tx-tcp-mangleid-segmentation",
 	[NETIF_F_TSO6_BIT] =             "tx-tcp6-segmentation",
 	[NETIF_F_FSO_BIT] =              "tx-fcoe-segmentation",
diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 2308665b51c5..0b05f30e9e5f 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -139,6 +139,7 @@  struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 	struct sk_buff *gso_skb = skb;
 	__sum16 newcheck;
 	bool ooo_okay, copy_destructor;
+	bool ecn_cwr_mask;
 	__wsum delta;
 
 	th = tcp_hdr(skb);
@@ -198,6 +199,8 @@  struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 
 	newcheck = ~csum_fold(csum_add(csum_unfold(th->check), delta));
 
+	ecn_cwr_mask = !!(skb_shinfo(gso_skb)->gso_type & SKB_GSO_TCP_ACCECN);
+
 	while (skb->next) {
 		th->fin = th->psh = 0;
 		th->check = newcheck;
@@ -217,7 +220,8 @@  struct sk_buff *tcp_gso_segment(struct sk_buff *skb,
 		th = tcp_hdr(skb);
 
 		th->seq = htonl(seq);
-		th->cwr = 0;
+
+		th->cwr &= ecn_cwr_mask;
 	}
 
 	/* Following permits TCP Small Queues to work well with GSO :