diff mbox series

[v5,net-next,09/13] gro: prevent ACE field corruption & better AccECN handling

Message ID 20241105100647.117346-10-chia-yu.chang@nokia-bell-labs.com (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series AccECN protocol preparation patch series | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next, async
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 3 this patch: 3
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 6 of 6 maintainers
netdev/build_clang success Errors and warnings before: 3 this patch: 3
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 16 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-11-07--12-00 (tests: 787)

Commit Message

Chia-Yu Chang (Nokia) Nov. 5, 2024, 10:06 a.m. UTC
From: Ilpo Järvinen <ij@kernel.org>

There are important differences in how the CWR field behaves
in RFC3168 and AccECN. With AccECN, CWR flag is part of the
ACE counter and its changes are important so adjust the flags
changed mask accordingly.

Also, if CWR is there, set the Accurate ECN GSO flag to avoid
corrupting CWR flag somewhere.

Signed-off-by: Ilpo Järvinen <ij@kernel.org>
Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
---
 net/ipv4/tcp_offload.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Eric Dumazet Nov. 7, 2024, 12:40 p.m. UTC | #1
On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@nokia-bell-labs.com> wrote:
>
> From: Ilpo Järvinen <ij@kernel.org>
>
> There are important differences in how the CWR field behaves
> in RFC3168 and AccECN. With AccECN, CWR flag is part of the
> ACE counter and its changes are important so adjust the flags
> changed mask accordingly.
>
> Also, if CWR is there, set the Accurate ECN GSO flag to avoid
> corrupting CWR flag somewhere.
>
> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> ---
>  net/ipv4/tcp_offload.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> index 0b05f30e9e5f..f59762d88c38 100644
> --- a/net/ipv4/tcp_offload.c
> +++ b/net/ipv4/tcp_offload.c
> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
>         th2 = tcp_hdr(p);
>         flush = (__force int)(flags & TCP_FLAG_CWR);
>         flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
> -                 ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
> +                 ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
>         flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
>         for (i = sizeof(*th); i < thlen; i += 4)
>                 flush |= *(u32 *)((u8 *)th + i) ^
> @@ -405,7 +405,7 @@ void tcp_gro_complete(struct sk_buff *skb)
>         shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
>
>         if (th->cwr)
> -               shinfo->gso_type |= SKB_GSO_TCP_ECN;
> +               shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
>  }
>  EXPORT_SYMBOL(tcp_gro_complete);
>

I do not really understand this patch. How a GRO engine can know which
ECN variant the peers are using ?

SKB_GSO_TCP_ECN is also used from other points, what is your plan ?

git grep -n SKB_GSO_TCP_ECN
drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888:
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291:
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312:
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN !=
(NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
include/linux/skbuff.h:664:     SKB_GSO_TCP_ECN = 1 << 2,
include/linux/virtio_net.h:88:                  gso_type |= SKB_GSO_TCP_ECN;
include/linux/virtio_net.h:161:         switch (gso_type & ~SKB_GSO_TCP_ECN) {
include/linux/virtio_net.h:226:         if (sinfo->gso_type & SKB_GSO_TCP_ECN)
net/ipv4/tcp_offload.c:404:             shinfo->gso_type |= SKB_GSO_TCP_ECN;
net/ipv4/tcp_output.c:389:
skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
Ilpo Järvinen Nov. 7, 2024, 7:28 p.m. UTC | #2
On Thu, 7 Nov 2024, Eric Dumazet wrote:

> On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@nokia-bell-labs.com> wrote:
> >
> > From: Ilpo Järvinen <ij@kernel.org>
> >
> > There are important differences in how the CWR field behaves
> > in RFC3168 and AccECN. With AccECN, CWR flag is part of the
> > ACE counter and its changes are important so adjust the flags
> > changed mask accordingly.
> >
> > Also, if CWR is there, set the Accurate ECN GSO flag to avoid
> > corrupting CWR flag somewhere.
> >
> > Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> > Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > ---
> >  net/ipv4/tcp_offload.c | 4 ++--
> >  1 file changed, 2 insertions(+), 2 deletions(-)
> >
> > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
> > index 0b05f30e9e5f..f59762d88c38 100644
> > --- a/net/ipv4/tcp_offload.c
> > +++ b/net/ipv4/tcp_offload.c
> > @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
> >         th2 = tcp_hdr(p);
> >         flush = (__force int)(flags & TCP_FLAG_CWR);
> >         flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
> > -                 ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
> > +                 ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
> >         flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
> >         for (i = sizeof(*th); i < thlen; i += 4)
> >                 flush |= *(u32 *)((u8 *)th + i) ^
> > @@ -405,7 +405,7 @@ void tcp_gro_complete(struct sk_buff *skb)
> >         shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
> >
> >         if (th->cwr)
> > -               shinfo->gso_type |= SKB_GSO_TCP_ECN;
> > +               shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
> >  }
> >  EXPORT_SYMBOL(tcp_gro_complete);
> >
> 
> I do not really understand this patch. How a GRO engine can know which
> ECN variant the peers are using ?

Hi Eric,

Thanks for taking a look.

GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can 
result in header change that corrupts ACE field. Thus, GRO has to assume 
the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks 
the connection and knows the bits are used for RFC3168 which is something 
nobody is going to do).

The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or 
NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE 
field value.

SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but 
the same CWR flag should be repeated for all segments. In a sense, it's 
simpler than RFC3168 offloading.

> SKB_GSO_TCP_ECN is also used from other points, what is your plan ?
> 
> git grep -n SKB_GSO_TCP_ECN
> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888:
> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291:
> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312:
> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;

These looked like they should be just changed to set SKB_GSO_TCP_ACCECN 
instead.

I don't anymore recall why I didn't change those when I made this patch 
long time ago, perhaps it was just an oversight or things have changed 
somehow since then.

> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN !=
> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
> include/linux/skbuff.h:664:     SKB_GSO_TCP_ECN = 1 << 2,

Not relevant.

> include/linux/virtio_net.h:88:                  gso_type |= SKB_GSO_TCP_ECN;
> include/linux/virtio_net.h:161:         switch (gso_type & ~SKB_GSO_TCP_ECN) {
> include/linux/virtio_net.h:226:         if (sinfo->gso_type & SKB_GSO_TCP_ECN)

These need to be looked further what's going on as UAPI is also involved 
here.

> net/ipv4/tcp_offload.c:404:             shinfo->gso_type |= SKB_GSO_TCP_ECN;

This was changed above. :-)

> net/ipv4/tcp_output.c:389: skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;

I'm pretty sure this relates to sending side which will be taken care by 
a later patch in the full series (not among these preparatory patches).


FYI, these TSO/GSO/GRO changes are what I was most unsure myself if I 
got everything right when I was working with this series a few years back 
so please keep that in mind while reviewing so my lack of knowledge 
doesn't end up breaking something. :-)
Chia-Yu Chang (Nokia) Nov. 12, 2024, 2:09 a.m. UTC | #3
>From: Ilpo Järvinen <ij@kernel.org> 
>Sent: Thursday, November 7, 2024 8:28 PM
>To: Eric Dumazet <edumazet@google.com>
>Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; netdev@vger.kernel.org; dsahern@gmail.com; davem@davemloft.net; dsahern@kernel.org; pabeni@redhat.com; joel.granados@kernel.org; kuba@kernel.org; andrew+netdev@lunn.ch; horms@kernel.org; pablo@netfilter.org; kadlec@netfilter.org; netfilter-devel@vger.kernel.org; coreteam@netfilter.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
>Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & better AccECN handling
>
>
>CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
>On Thu, 7 Nov 2024, Eric Dumazet wrote:
>
>>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@nokia-bell-labs.com> wrote:
>>>
>>> From: Ilpo Järvinen <ij@kernel.org>
>>>
>>> There are important differences in how the CWR field behaves in 
>>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE counter 
>>> and its changes are important so adjust the flags changed mask 
>>> accordingly.
>>>
>>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid 
>>> corrupting CWR flag somewhere.
>>>
>>> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
>>> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>>> ---
>>>  net/ipv4/tcp_offload.c | 4 ++--
>>>  1 file changed, 2 insertions(+), 2 deletions(-)
>>>
>>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 
>>> 0b05f30e9e5f..f59762d88c38 100644
>>> --- a/net/ipv4/tcp_offload.c
>>> +++ b/net/ipv4/tcp_offload.c
>>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
>>>         th2 = tcp_hdr(p);
>>>         flush = (__force int)(flags & TCP_FLAG_CWR);
>>>         flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
>>> -                 ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
>>> +                 ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
>>>         flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
>>>         for (i = sizeof(*th); i < thlen; i += 4)
>>>                 flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 +405,7 
>>> @@ void tcp_gro_complete(struct sk_buff *skb)
>>>         shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
>>>
>>>         if (th->cwr)
>>> -               shinfo->gso_type |= SKB_GSO_TCP_ECN;
>>> +               shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
>>>  }
>>>  EXPORT_SYMBOL(tcp_gro_complete);
>>>
>>
>> I do not really understand this patch. How a GRO engine can know which 
>> ECN variant the peers are using ?
>
>Hi Eric,
>
>Thanks for taking a look.
>
>GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can result in header change that corrupts ACE field. Thus, GRO has to assume the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks the connection and knows the bits are used for RFC3168 which is something nobody is going to do).
>
>The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE field value.

Hi Eric and Ilpo,

From my understanding of another email thread (patch 08/13), it seems the conclusions that SKB_GSO_TCP_ACCECN will still be needed.

>
>SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but the same CWR flag should be repeated for all segments. In a sense, it's simpler than RFC3168 offloading.
>
>> SKB_GSO_TCP_ECN is also used from other points, what is your plan ?
>>
>> git grep -n SKB_GSO_TCP_ECN
>> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888:
>> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291:
>> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312:
>> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>
>These looked like they should be just changed to set SKB_GSO_TCP_ACCECN instead.

I agree with these changes and will apply them in the next version.

>
>I don't anymore recall why I didn't change those when I made this patch long time ago, perhaps it was just an oversight or things have changed somehow since then.
>
>> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN != 
>> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>> include/linux/skbuff.h:664:     SKB_GSO_TCP_ECN = 1 << 2,
>
>Not relevant.
>
>> include/linux/virtio_net.h:88:                  gso_type |= SKB_GSO_TCP_ECN;
>> include/linux/virtio_net.h:161:         switch (gso_type & ~SKB_GSO_TCP_ECN) {
>> include/linux/virtio_net.h:226:         if (sinfo->gso_type & SKB_GSO_TCP_ECN)
>
>These need to be looked further what's going on as UAPI is also involved here.

I had a look at these parts, and only the 1st and 3rd ones are relevant.
Related to the 1st one, I propose to change
from

                if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
                        gso_type |= SKB_GSO_TCP_ECN;

to

                if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
                        gso_type |= SKB_GSO_TCP_ACCECN;

The reason behind this proposed change is similar as the above changes in en_rx.c.

For the 3rd one, I suggest to change from

                if (sinfo->gso_type & SKB_GSO_TCP_ECN)
                        hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;

to

                if (sinfo->gso_type & (SKB_GSO_TCP_ECN | SKB_GSO_TCP_ACCECN))
                        hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;

This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set to allow TCP packets requiring segmentation offload which have ECN bit set.
So, no matter whether skb gso_type have GSO_TCP_ECN flag or GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set.

But, I wonder what would the driver do when with VIRTIO_NET_HDR_GSO_ECN flag. Will it corrupts CWR or not?

--
Chia-Yu

>
>> net/ipv4/tcp_offload.c:404:             shinfo->gso_type |= SKB_GSO_TCP_ECN;
>
>This was changed above. :-)
>
>> net/ipv4/tcp_output.c:389: skb_shinfo(skb)->gso_type |= 
>> SKB_GSO_TCP_ECN;
>
>I'm pretty sure this relates to sending side which will be taken care by a later patch in the full series (not among these preparatory patches).
>
>
>FYI, these TSO/GSO/GRO changes are what I was most unsure myself if I got everything right when I was working with this series a few years back so please keep that in mind while reviewing so my lack of knowledge doesn't end up breaking something. :-)
Ilpo Järvinen Nov. 12, 2024, 9:19 p.m. UTC | #4
Adding a few virtio people. Please see the virtio spec/flag question 
below.

On Tue, 12 Nov 2024, Chia-Yu Chang (Nokia) wrote:

> >From: Ilpo Järvinen <ij@kernel.org> 
> >Sent: Thursday, November 7, 2024 8:28 PM
> >To: Eric Dumazet <edumazet@google.com>
> >Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; netdev@vger.kernel.org; dsahern@gmail.com; davem@davemloft.net; dsahern@kernel.org; pabeni@redhat.com; joel.granados@kernel.org; kuba@kernel.org; andrew+netdev@lunn.ch; horms@kernel.org; pablo@netfilter.org; kadlec@netfilter.org; netfilter-devel@vger.kernel.org; coreteam@netfilter.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
> >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & better AccECN handling
> >
> >
> >CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> >
> >
> >
> >On Thu, 7 Nov 2024, Eric Dumazet wrote:
> >
> >>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@nokia-bell-labs.com> wrote:
> >>>
> >>> From: Ilpo Järvinen <ij@kernel.org>
> >>>
> >>> There are important differences in how the CWR field behaves in 
> >>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE counter 
> >>> and its changes are important so adjust the flags changed mask 
> >>> accordingly.
> >>>
> >>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid 
> >>> corrupting CWR flag somewhere.
> >>>
> >>> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> >>> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> >>> ---
> >>>  net/ipv4/tcp_offload.c | 4 ++--
> >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> >>>
> >>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index 
> >>> 0b05f30e9e5f..f59762d88c38 100644
> >>> --- a/net/ipv4/tcp_offload.c
> >>> +++ b/net/ipv4/tcp_offload.c
> >>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
> >>>         th2 = tcp_hdr(p);
> >>>         flush = (__force int)(flags & TCP_FLAG_CWR);
> >>>         flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
> >>> -                 ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
> >>> +                 ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
> >>>         flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
> >>>         for (i = sizeof(*th); i < thlen; i += 4)
> >>>                 flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 +405,7 
> >>> @@ void tcp_gro_complete(struct sk_buff *skb)
> >>>         shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
> >>>
> >>>         if (th->cwr)
> >>> -               shinfo->gso_type |= SKB_GSO_TCP_ECN;
> >>> +               shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
> >>>  }
> >>>  EXPORT_SYMBOL(tcp_gro_complete);
> >>>
> >>
> >> I do not really understand this patch. How a GRO engine can know which 
> >> ECN variant the peers are using ?
> >
> >Hi Eric,
> >
> >Thanks for taking a look.
> >
> >GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can result in header change that corrupts ACE field. Thus, GRO has to assume the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks the connection and knows the bits are used for RFC3168 which is something nobody is going to do).
> >
> >The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE field value.
> 
> Hi Eric and Ilpo,
> 
> From my understanding of another email thread (patch 08/13), it seems the conclusions that SKB_GSO_TCP_ACCECN will still be needed.
> 
> >
> >SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but the same CWR flag should be repeated for all segments. In a sense, it's simpler than RFC3168 offloading.
> >
> >> SKB_GSO_TCP_ECN is also used from other points, what is your plan ?
> >>
> >> git grep -n SKB_GSO_TCP_ECN
> >> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888:
> >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291:
> >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312:
> >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> >
> >These looked like they should be just changed to set SKB_GSO_TCP_ACCECN instead.
> 
> I agree with these changes and will apply them in the next version.
> 
> >
> >I don't anymore recall why I didn't change those when I made this patch long time ago, perhaps it was just an oversight or things have changed somehow since then.
> >
> >> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN != 
> >> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
> >> include/linux/skbuff.h:664:     SKB_GSO_TCP_ECN = 1 << 2,
> >
> >Not relevant.
> >
> >> include/linux/virtio_net.h:88:                  gso_type |= SKB_GSO_TCP_ECN;
> >> include/linux/virtio_net.h:161:         switch (gso_type & ~SKB_GSO_TCP_ECN) {
> >> include/linux/virtio_net.h:226:         if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> >
> >These need to be looked further what's going on as UAPI is also 
> > involved here. 
> 
> I had a look at these parts, and only the 1st and 3rd ones are relevant.
> Related to the 1st one, I propose to change
> from
> 
>                 if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
>                         gso_type |= SKB_GSO_TCP_ECN;
> 
> to
> 
>                 if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
>                         gso_type |= SKB_GSO_TCP_ACCECN;
> 
> The reason behind this proposed change is similar as the above changes 
> in en_rx.c.

But en_rx.c is based one CWR flag, there's no similar check here.

> For the 3rd one, I suggest to change from
> 
>                 if (sinfo->gso_type & SKB_GSO_TCP_ECN)
>                         hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>
> to
> 
>                 if (sinfo->gso_type & (SKB_GSO_TCP_ECN | SKB_GSO_TCP_ACCECN))
>                         hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
> 
> This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set to 
> allow TCP packets requiring segmentation offload which have ECN bit set.
> So, no matter whether skb gso_type have GSO_TCP_ECN flag or 
> GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set.
> 
> But, I wonder what would the driver do when with VIRTIO_NET_HDR_GSO_ECN 
> flag. Will it corrupts CWR or not?

I'm starting to heavily question what is the meaning of that 
VIRTIO_NET_HDR_GSO_ECN flag and if it's even consistent...

https://github.com/qemu/qemu/blob/master/net/eth.c

That sets VIRTIO_NET_HDR_GSO_ECN based on CE?? (Whereas kernel associates 
the related SKB_GSO_TCP_ECN to CWR offloading.)

The virtio spec is way too vague to be useful so it would not be 
surprising if there are diverging interpretations from implementers:

"If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the 
VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet has 
the ECN bit set."

What is "the ECN bit" in terms used by TCP (or IP)? Could some virtio 
folks please explain?
Jason Wang Nov. 13, 2024, 1:42 a.m. UTC | #5
On Wed, Nov 13, 2024 at 5:19 AM Ilpo Järvinen <ij@kernel.org> wrote:
>
> Adding a few virtio people. Please see the virtio spec/flag question
> below.
>
> On Tue, 12 Nov 2024, Chia-Yu Chang (Nokia) wrote:
>
> > >From: Ilpo Järvinen <ij@kernel.org>
> > >Sent: Thursday, November 7, 2024 8:28 PM
> > >To: Eric Dumazet <edumazet@google.com>
> > >Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; netdev@vger.kernel.org; dsahern@gmail.com; davem@davemloft.net; dsahern@kernel.org; pabeni@redhat.com; joel.granados@kernel.org; kuba@kernel.org; andrew+netdev@lunn.ch; horms@kernel.org; pablo@netfilter.org; kadlec@netfilter.org; netfilter-devel@vger.kernel.org; coreteam@netfilter.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
> > >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & better AccECN handling
> > >
> > >
> > >CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
> > >
> > >
> > >
> > >On Thu, 7 Nov 2024, Eric Dumazet wrote:
> > >
> > >>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@nokia-bell-labs.com> wrote:
> > >>>
> > >>> From: Ilpo Järvinen <ij@kernel.org>
> > >>>
> > >>> There are important differences in how the CWR field behaves in
> > >>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE counter
> > >>> and its changes are important so adjust the flags changed mask
> > >>> accordingly.
> > >>>
> > >>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid
> > >>> corrupting CWR flag somewhere.
> > >>>
> > >>> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
> > >>> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
> > >>> ---
> > >>>  net/ipv4/tcp_offload.c | 4 ++--
> > >>>  1 file changed, 2 insertions(+), 2 deletions(-)
> > >>>
> > >>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index
> > >>> 0b05f30e9e5f..f59762d88c38 100644
> > >>> --- a/net/ipv4/tcp_offload.c
> > >>> +++ b/net/ipv4/tcp_offload.c
> > >>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
> > >>>         th2 = tcp_hdr(p);
> > >>>         flush = (__force int)(flags & TCP_FLAG_CWR);
> > >>>         flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
> > >>> -                 ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
> > >>> +                 ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
> > >>>         flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
> > >>>         for (i = sizeof(*th); i < thlen; i += 4)
> > >>>                 flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 +405,7
> > >>> @@ void tcp_gro_complete(struct sk_buff *skb)
> > >>>         shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
> > >>>
> > >>>         if (th->cwr)
> > >>> -               shinfo->gso_type |= SKB_GSO_TCP_ECN;
> > >>> +               shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
> > >>>  }
> > >>>  EXPORT_SYMBOL(tcp_gro_complete);
> > >>>
> > >>
> > >> I do not really understand this patch. How a GRO engine can know which
> > >> ECN variant the peers are using ?
> > >
> > >Hi Eric,
> > >
> > >Thanks for taking a look.
> > >
> > >GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can result in header change that corrupts ACE field. Thus, GRO has to assume the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks the connection and knows the bits are used for RFC3168 which is something nobody is going to do).
> > >
> > >The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE field value.
> >
> > Hi Eric and Ilpo,
> >
> > From my understanding of another email thread (patch 08/13), it seems the conclusions that SKB_GSO_TCP_ACCECN will still be needed.
> >
> > >
> > >SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but the same CWR flag should be repeated for all segments. In a sense, it's simpler than RFC3168 offloading.
> > >
> > >> SKB_GSO_TCP_ECN is also used from other points, what is your plan ?
> > >>
> > >> git grep -n SKB_GSO_TCP_ECN
> > >> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888:
> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291:
> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312:
> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
> > >
> > >These looked like they should be just changed to set SKB_GSO_TCP_ACCECN instead.
> >
> > I agree with these changes and will apply them in the next version.
> >
> > >
> > >I don't anymore recall why I didn't change those when I made this patch long time ago, perhaps it was just an oversight or things have changed somehow since then.
> > >
> > >> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN !=
> > >> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
> > >> include/linux/skbuff.h:664:     SKB_GSO_TCP_ECN = 1 << 2,
> > >
> > >Not relevant.
> > >
> > >> include/linux/virtio_net.h:88:                  gso_type |= SKB_GSO_TCP_ECN;
> > >> include/linux/virtio_net.h:161:         switch (gso_type & ~SKB_GSO_TCP_ECN) {
> > >> include/linux/virtio_net.h:226:         if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> > >
> > >These need to be looked further what's going on as UAPI is also
> > > involved here.
> >
> > I had a look at these parts, and only the 1st and 3rd ones are relevant.
> > Related to the 1st one, I propose to change
> > from
> >
> >                 if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
> >                         gso_type |= SKB_GSO_TCP_ECN;
> >
> > to
> >
> >                 if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
> >                         gso_type |= SKB_GSO_TCP_ACCECN;
> >
> > The reason behind this proposed change is similar as the above changes
> > in en_rx.c.
>
> But en_rx.c is based one CWR flag, there's no similar check here.
>
> > For the 3rd one, I suggest to change from
> >
> >                 if (sinfo->gso_type & SKB_GSO_TCP_ECN)
> >                         hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
> >
> > to
> >
> >                 if (sinfo->gso_type & (SKB_GSO_TCP_ECN | SKB_GSO_TCP_ACCECN))
> >                         hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
> >
> > This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set to
> > allow TCP packets requiring segmentation offload which have ECN bit set.
> > So, no matter whether skb gso_type have GSO_TCP_ECN flag or
> > GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set.
> >
> > But, I wonder what would the driver do when with VIRTIO_NET_HDR_GSO_ECN
> > flag. Will it corrupts CWR or not?
>
> I'm starting to heavily question what is the meaning of that
> VIRTIO_NET_HDR_GSO_ECN flag and if it's even consistent...
>
> https://github.com/qemu/qemu/blob/master/net/eth.c
>
> That sets VIRTIO_NET_HDR_GSO_ECN based on CE?? (Whereas kernel associates
> the related SKB_GSO_TCP_ECN to CWR offloading.)
>
> The virtio spec is way too vague to be useful so it would not be
> surprising if there are diverging interpretations from implementers:
>
> "If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the
> VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet has
> the ECN bit set."
>
> What is "the ECN bit" in terms used by TCP (or IP)? Could some virtio
> folks please explain?

According to the current Linux implementation in virtio_net_hdr_to_skb():

                if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
                        gso_type |= SKB_GSO_TCP_ECN;

It mapps to SKB_GSO_TCP_ECN which is:

        /* This indicates the tcp segment has CWR set. */
        SKB_GSO_TCP_ECN = 1 << 2,

Thanks

>
>
> --
>  i.
Chia-Yu Chang (Nokia) Nov. 14, 2024, 1:04 a.m. UTC | #6
>From: Jason Wang <jasowang@redhat.com> 
>Sent: Wednesday, November 13, 2024 2:43 AM
>To: Ilpo Järvinen <ij@kernel.org>
>Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; Michael S. Tsirkin <mst@redhat.com>; virtualization@lists.linux.dev; Eric Dumazet <edumazet@google.com>; netdev@vger.kernel.org; dsahern@gmail.com; davem@davemloft.net; dsahern@kernel.org; pabeni@redhat.com; joel.granados@kernel.org; kuba@kernel.org; andrew+netdev@lunn.ch; horms@kernel.org; pablo@netfilter.org; kadlec@netfilter.org; netfilter-devel@vger.kernel.org; coreteam@netfilter.org; ncardwell@google.com; Koen De Schepper (Nokia) <koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; vidhi_goel@apple.com
>Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field corruption & better AccECN handling
>
>[You don't often get email from jasowang@redhat.com. Learn why this is important at https://aka.ms/LearnAboutSenderIdentification ]
>
>CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>
>
>
>On Wed, Nov 13, 2024 at 5:19 AM Ilpo Järvinen <ij@kernel.org> wrote:
>>
>> Adding a few virtio people. Please see the virtio spec/flag question 
>> below.
>>
>> On Tue, 12 Nov 2024, Chia-Yu Chang (Nokia) wrote:
>>
>> > >From: Ilpo Järvinen <ij@kernel.org>
>> > >Sent: Thursday, November 7, 2024 8:28 PM
>> > >To: Eric Dumazet <edumazet@google.com>
>> > >Cc: Chia-Yu Chang (Nokia) <chia-yu.chang@nokia-bell-labs.com>; 
>> > >netdev@vger.kernel.org; dsahern@gmail.com; davem@davemloft.net; 
>> > >dsahern@kernel.org; pabeni@redhat.com; joel.granados@kernel.org; 
>> > >kuba@kernel.org; andrew+netdev@lunn.ch; horms@kernel.org; 
>> > >pablo@netfilter.org; kadlec@netfilter.org; 
>> > >netfilter-devel@vger.kernel.org; coreteam@netfilter.org; 
>> > >ncardwell@google.com; Koen De Schepper (Nokia) 
>> > ><koen.de_schepper@nokia-bell-labs.com>; g.white@cablelabs.com; 
>> > >ingemar.s.johansson@ericsson.com; mirja.kuehlewind@ericsson.com; 
>> > >cheshire@apple.com; rs.ietf@gmx.at; Jason_Livingood@comcast.com; 
>> > >vidhi_goel@apple.com
>> > >Subject: Re: [PATCH v5 net-next 09/13] gro: prevent ACE field 
>> > >corruption & better AccECN handling
>> > >
>> > >
>> > >CAUTION: This is an external email. Please be very careful when clicking links or opening attachments. See the URL nok.it/ext for additional information.
>> > >
>> > >
>> > >
>> > >On Thu, 7 Nov 2024, Eric Dumazet wrote:
>> > >
>> > >>On Tue, Nov 5, 2024 at 11:07 AM <chia-yu.chang@nokia-bell-labs.com> wrote:
>> > >>>
>> > >>> From: Ilpo Järvinen <ij@kernel.org>
>> > >>>
>> > >>> There are important differences in how the CWR field behaves in
>> > >>> RFC3168 and AccECN. With AccECN, CWR flag is part of the ACE 
>> > >>> counter and its changes are important so adjust the flags 
>> > >>> changed mask accordingly.
>> > >>>
>> > >>> Also, if CWR is there, set the Accurate ECN GSO flag to avoid 
>> > >>> corrupting CWR flag somewhere.
>> > >>>
>> > >>> Signed-off-by: Ilpo Järvinen <ij@kernel.org>
>> > >>> Signed-off-by: Chia-Yu Chang <chia-yu.chang@nokia-bell-labs.com>
>> > >>> ---
>> > >>>  net/ipv4/tcp_offload.c | 4 ++--
>> > >>>  1 file changed, 2 insertions(+), 2 deletions(-)
>> > >>>
>> > >>> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c 
>> > >>> index
>> > >>> 0b05f30e9e5f..f59762d88c38 100644
>> > >>> --- a/net/ipv4/tcp_offload.c
>> > >>> +++ b/net/ipv4/tcp_offload.c
>> > >>> @@ -329,7 +329,7 @@ struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
>> > >>>         th2 = tcp_hdr(p);
>> > >>>         flush = (__force int)(flags & TCP_FLAG_CWR);
>> > >>>         flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
>> > >>> -                 ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
>> > >>> +                 ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
>> > >>>         flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
>> > >>>         for (i = sizeof(*th); i < thlen; i += 4)
>> > >>>                 flush |= *(u32 *)((u8 *)th + i) ^ @@ -405,7 
>> > >>> +405,7 @@ void tcp_gro_complete(struct sk_buff *skb)
>> > >>>         shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
>> > >>>
>> > >>>         if (th->cwr)
>> > >>> -               shinfo->gso_type |= SKB_GSO_TCP_ECN;
>> > >>> +               shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
>> > >>>  }
>> > >>>  EXPORT_SYMBOL(tcp_gro_complete);
>> > >>>
>> > >>
>> > >> I do not really understand this patch. How a GRO engine can know 
>> > >> which ECN variant the peers are using ?
>> > >
>> > >Hi Eric,
>> > >
>> > >Thanks for taking a look.
>> > >
>> > >GRO doesn't know. Setting SKB_GSO_TCP_ECN in case of not knowing can result in header change that corrupts ACE field. Thus, GRO has to assume the RFC3168 CWR offloading trick cannot be used anymore (unless it tracks the connection and knows the bits are used for RFC3168 which is something nobody is going to do).
>> > >
>> > >The main point of SKB_GSO_TCP_ACCECN is to prevent SKB_GSO_TCP_ECN or NETIF_F_TSO_ECN offloading to be used for the skb as it would corrupt ACE field value.
>> >
>> > Hi Eric and Ilpo,
>> >
>> > From my understanding of another email thread (patch 08/13), it seems the conclusions that SKB_GSO_TCP_ACCECN will still be needed.
>> >
>> > >
>> > >SKB_GSO_TCP_ACCECN doesn't allow CWR bits change within a super-skb but the same CWR flag should be repeated for all segments. In a sense, it's simpler than RFC3168 offloading.
>> > >
>> > >> SKB_GSO_TCP_ECN is also used from other points, what is your plan ?
>> > >>
>> > >> git grep -n SKB_GSO_TCP_ECN
>> > >> drivers/net/ethernet/hisilicon/hns3/hns3_enet.c:3888:
>> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>> > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1291:
>> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>> > >> drivers/net/ethernet/mellanox/mlx5/core/en_rx.c:1312:
>> > >> skb_shinfo(skb)->gso_type |= SKB_GSO_TCP_ECN;
>> > >
>> > >These looked like they should be just changed to set SKB_GSO_TCP_ACCECN instead.
>> >
>> > I agree with these changes and will apply them in the next version.
>> >
>> > >
>> > >I don't anymore recall why I didn't change those when I made this patch long time ago, perhaps it was just an oversight or things have changed somehow since then.
>> > >
>> > >> include/linux/netdevice.h:5061: BUILD_BUG_ON(SKB_GSO_TCP_ECN != 
>> > >> (NETIF_F_TSO_ECN >> NETIF_F_GSO_SHIFT));
>> > >> include/linux/skbuff.h:664:     SKB_GSO_TCP_ECN = 1 << 2,
>> > >
>> > >Not relevant.
>> > >
>> > >> include/linux/virtio_net.h:88:                  gso_type |= SKB_GSO_TCP_ECN;
>> > >> include/linux/virtio_net.h:161:         switch (gso_type & ~SKB_GSO_TCP_ECN) {
>> > >> include/linux/virtio_net.h:226:         if (sinfo->gso_type & SKB_GSO_TCP_ECN)
>> > >
>> > >These need to be looked further what's going on as UAPI is also  
>> > >involved here.
>> >
>> > I had a look at these parts, and only the 1st and 3rd ones are relevant.
>> > Related to the 1st one, I propose to change from
>> >
>> >                 if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
>> >                         gso_type |= SKB_GSO_TCP_ECN;
>> >
>> > to
>> >
>> >                 if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
>> >                         gso_type |= SKB_GSO_TCP_ACCECN;
>> >
>> > The reason behind this proposed change is similar as the above 
>> > changes in en_rx.c.
>>
>> But en_rx.c is based one CWR flag, there's no similar check here.
>>
>> > For the 3rd one, I suggest to change from
>> >
>> >                 if (sinfo->gso_type & SKB_GSO_TCP_ECN)
>> >                         hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>> >
>> > to
>> >
>> >                 if (sinfo->gso_type & (SKB_GSO_TCP_ECN | SKB_GSO_TCP_ACCECN))
>> >                         hdr->gso_type |= VIRTIO_NET_HDR_GSO_ECN;
>> >
>> > This proposed change is because VIRTIO_NET_HDR_GSO_ECN must be set 
>> > to allow TCP packets requiring segmentation offload which have ECN bit set.
>> > So, no matter whether skb gso_type have GSO_TCP_ECN flag or 
>> > GSO_TCP_ACCECN flag, the corresponding hdr gso_type shall be set.
>> >
>> > But, I wonder what would the driver do when with 
>> > VIRTIO_NET_HDR_GSO_ECN flag. Will it corrupts CWR or not?
>>
>> I'm starting to heavily question what is the meaning of that 
>> VIRTIO_NET_HDR_GSO_ECN flag and if it's even consistent...
>>
>> https://eur03.safelinks.protection.outlook.com/?url=https%3A%2F%2Fgith
>> ub.com%2Fqemu%2Fqemu%2Fblob%2Fmaster%2Fnet%2Feth.c&data=05%7C02%7Cchia
>> -yu.chang%40nokia-bell-labs.com%7C64e73bacd5eb4647905208dd03848651%7C5
>> d4717519675428d917b70f44f9630b0%7C0%7C0%7C638670589949336838%7CUnknown
>> %7CTWFpbGZsb3d8eyJFbXB0eU1hcGkiOnRydWUsIlYiOiIwLjAuMDAwMCIsIlAiOiJXaW4
>> zMiIsIkFOIjoiTWFpbCIsIldUIjoyfQ%3D%3D%7C0%7C%7C%7C&sdata=iEeGdA4mcq1WC
>> unKWvxNZ6yt213xKWKOpjyYdBzLzTk%3D&reserved=0
>>
>> That sets VIRTIO_NET_HDR_GSO_ECN based on CE?? (Whereas kernel 
>> associates the related SKB_GSO_TCP_ECN to CWR offloading.)
>>
>> The virtio spec is way too vague to be useful so it would not be 
>> surprising if there are diverging interpretations from implementers:
>>
>> "If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the 
>> VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet 
>> has the ECN bit set."
>>
>> What is "the ECN bit" in terms used by TCP (or IP)? Could some virtio 
>> folks please explain?
>
>According to the current Linux implementation in virtio_net_hdr_to_skb():
>
>                if (hdr->gso_type & VIRTIO_NET_HDR_GSO_ECN)
>                        gso_type |= SKB_GSO_TCP_ECN;
>
>It mapps to SKB_GSO_TCP_ECN which is:
>
>        /* This indicates the tcp segment has CWR set. */
>        SKB_GSO_TCP_ECN = 1 << 2,
>
>Thanks

This patch series is related to adding new SKB_GSO_TCP_ACCECN.
The reason behind is because how the CWR field handling is different between RFC3168 and AccECN.
You can to find the original patches in below:
https://lore.kernel.org/netdev/20241105100647.117346-9-chia-yu.chang@nokia-bell-labs.com/
https://lore.kernel.org/netdev/20241105100647.117346-10-chia-yu.chang@nokia-bell-labs.com/

When reviewing all occurrence of SKB_GSO_TCP_ECN, we see it's related to VIRTIO_NET_HDR_GSO_ECN.
But after checking the virtio spec, it's quite unclear about VIRTIO_NET_HDR_GSO_ECN:
"If the driver negotiated the VIRTIO_NET_F_HOST_ECN feature, the VIRTIO_NET_HDR_GSO_ECN bit in gso_type indicates that the TCP packet has the ECN bit set."

In virtnet_probe(), it looks like VIRTIO_NET_F_HOST_ECN is used to enable NETIF_F_TSO_ECN support:
                if (virtio_has_feature(vdev, VIRTIO_NET_F_HOST_ECN))
                        dev->hw_features |= NETIF_F_TSO_ECN;

So the question is that is its TSO engine can be set to not touch CWR flag even with NETIF_F_TSO_ECN?
diff mbox series

Patch

diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c
index 0b05f30e9e5f..f59762d88c38 100644
--- a/net/ipv4/tcp_offload.c
+++ b/net/ipv4/tcp_offload.c
@@ -329,7 +329,7 @@  struct sk_buff *tcp_gro_receive(struct list_head *head, struct sk_buff *skb,
 	th2 = tcp_hdr(p);
 	flush = (__force int)(flags & TCP_FLAG_CWR);
 	flush |= (__force int)((flags ^ tcp_flag_word(th2)) &
-		  ~(TCP_FLAG_CWR | TCP_FLAG_FIN | TCP_FLAG_PSH));
+		  ~(TCP_FLAG_FIN | TCP_FLAG_PSH));
 	flush |= (__force int)(th->ack_seq ^ th2->ack_seq);
 	for (i = sizeof(*th); i < thlen; i += 4)
 		flush |= *(u32 *)((u8 *)th + i) ^
@@ -405,7 +405,7 @@  void tcp_gro_complete(struct sk_buff *skb)
 	shinfo->gso_segs = NAPI_GRO_CB(skb)->count;
 
 	if (th->cwr)
-		shinfo->gso_type |= SKB_GSO_TCP_ECN;
+		shinfo->gso_type |= SKB_GSO_TCP_ACCECN;
 }
 EXPORT_SYMBOL(tcp_gro_complete);