diff mbox series

[v2,net] net: udp: fix Fast/frag0 UDP GRO

Message ID 0eaG8xtbtKY1dEKCTKUBubGiC9QawGgB3tVZtNqVdY@cp4-web-030.plabs.ch (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series [v2,net] net: udp: fix Fast/frag0 UDP GRO | expand

Checks

Context Check Description
netdev/cover_letter success Link
netdev/fixes_present success Link
netdev/patch_count success Link
netdev/tree_selection success Clearly marked for net
netdev/subject_prefix success Link
netdev/source_inline success Was 0 now: 0
netdev/verify_signedoff success Link
netdev/module_param success Was 0 now: 0
netdev/build_32bit success Errors and warnings before: 8 this patch: 8
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/verify_fixes success Link
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 19 lines checked
netdev/build_allmodconfig_warn success Errors and warnings before: 8 this patch: 8
netdev/header_inline success Link
netdev/stable success Stable not CCed

Commit Message

Alexander Lobakin Nov. 9, 2020, 4:56 p.m. UTC
While testing UDP GSO fraglists forwarding through driver that uses
Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
iperf packets:

[ ID] Interval           Transfer     Bitrate         Jitter
[SUM]  0.0-40.0 sec  12106 datagrams received out-of-order

Simple switch to napi_gro_receive() any other method without frag0
shortcut completely resolved them.

I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive()
callback. While it's probably OK for non-frag0 paths (when all
headers or even the entire frame are already in skb->data), this
inline points to junk when using Fast GRO (napi_gro_frags() or
napi_gro_receive() with only Ethernet header in skb->data and all
the rest in shinfo->frags) and breaks GRO packet compilation and
the packet flow itself.
To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
are typically used. UDP even has an inline helper that makes use of
them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
to get rid of the out-of-order delivers.

Present since the introduction of plain UDP GRO in 5.0-rc1.

Since v1 [1]:
 - added a NULL pointer check for "uh" as suggested by Willem.

[1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch

Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
Signed-off-by: Alexander Lobakin <alobakin@pm.me>
---
 net/ipv4/udp_offload.c | 7 ++++++-
 1 file changed, 6 insertions(+), 1 deletion(-)

Comments

Willem de Bruijn Nov. 9, 2020, 5:08 p.m. UTC | #1
On Mon, Nov 9, 2020 at 11:57 AM Alexander Lobakin <alobakin@pm.me> wrote:
>
> While testing UDP GSO fraglists forwarding through driver that uses
> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
> iperf packets:
>
> [ ID] Interval           Transfer     Bitrate         Jitter
> [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
>
> Simple switch to napi_gro_receive() any other method without frag0
> shortcut completely resolved them.
>
> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive()
> callback. While it's probably OK for non-frag0 paths (when all
> headers or even the entire frame are already in skb->data), this
> inline points to junk when using Fast GRO (napi_gro_frags() or
> napi_gro_receive() with only Ethernet header in skb->data and all
> the rest in shinfo->frags) and breaks GRO packet compilation and
> the packet flow itself.
> To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
> are typically used. UDP even has an inline helper that makes use of
> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
> to get rid of the out-of-order delivers.
>
> Present since the introduction of plain UDP GRO in 5.0-rc1.
>
> Since v1 [1]:
>  - added a NULL pointer check for "uh" as suggested by Willem.
>
> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
>
> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
> Signed-off-by: Alexander Lobakin <alobakin@pm.me>

Acked-by: Willem de Bruijn <willemb@google.com>

Thanks
Eric Dumazet Nov. 9, 2020, 5:37 p.m. UTC | #2
On 11/9/20 5:56 PM, Alexander Lobakin wrote:
> While testing UDP GSO fraglists forwarding through driver that uses
> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
> iperf packets:
> 
> [ ID] Interval           Transfer     Bitrate         Jitter
> [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
> 
> Simple switch to napi_gro_receive() any other method without frag0
> shortcut completely resolved them.
> 
> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive()
> callback. While it's probably OK for non-frag0 paths (when all
> headers or even the entire frame are already in skb->data), this
> inline points to junk when using Fast GRO (napi_gro_frags() or
> napi_gro_receive() with only Ethernet header in skb->data and all
> the rest in shinfo->frags) and breaks GRO packet compilation and
> the packet flow itself.
> To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
> are typically used. UDP even has an inline helper that makes use of
> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
> to get rid of the out-of-order delivers.
> 
> Present since the introduction of plain UDP GRO in 5.0-rc1.
> 
> Since v1 [1]:
>  - added a NULL pointer check for "uh" as suggested by Willem.
> 
> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
> 
> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> ---
>  net/ipv4/udp_offload.c | 7 ++++++-
>  1 file changed, 6 insertions(+), 1 deletion(-)
> 
> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> index e67a66fbf27b..7f6bd221880a 100644
> --- a/net/ipv4/udp_offload.c
> +++ b/net/ipv4/udp_offload.c
> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
>  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
>  					       struct sk_buff *skb)
>  {
> -	struct udphdr *uh = udp_hdr(skb);
> +	struct udphdr *uh = udp_gro_udphdr(skb);
>  	struct sk_buff *pp = NULL;
>  	struct udphdr *uh2;
>  	struct sk_buff *p;
>  	unsigned int ulen;
>  	int ret = 0;
>  
> +	if (unlikely(!uh)) {

How uh could be NULL here ?

My understanding is that udp_gro_receive() is called
only after udp4_gro_receive() or udp6_gro_receive()
validated that udp_gro_udphdr(skb) was not NULL.


> +		NAPI_GRO_CB(skb)->flush = 1;
> +		return NULL;
> +	}
> +
>  	/* requires non zero csum, for symmetry with GSO */
>  	if (!uh->check) {
>  		NAPI_GRO_CB(skb)->flush = 1;
> 

Why uh2 is left unchanged ?

    uh2 = udp_hdr(p);

...
Alexander Lobakin Nov. 9, 2020, 6:26 p.m. UTC | #3
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 9 Nov 2020 18:37:36 +0100

> On 11/9/20 5:56 PM, Alexander Lobakin wrote:
>> While testing UDP GSO fraglists forwarding through driver that uses
>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
>> iperf packets:
>>
>> [ ID] Interval           Transfer     Bitrate         Jitter
>> [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
>>
>> Simple switch to napi_gro_receive() any other method without frag0
>> shortcut completely resolved them.
>>
>> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive()
>> callback. While it's probably OK for non-frag0 paths (when all
>> headers or even the entire frame are already in skb->data), this
>> inline points to junk when using Fast GRO (napi_gro_frags() or
>> napi_gro_receive() with only Ethernet header in skb->data and all
>> the rest in shinfo->frags) and breaks GRO packet compilation and
>> the packet flow itself.
>> To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
>> are typically used. UDP even has an inline helper that makes use of
>> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
>> to get rid of the out-of-order delivers.
>>
>> Present since the introduction of plain UDP GRO in 5.0-rc1.
>>
>> Since v1 [1]:
>>  - added a NULL pointer check for "uh" as suggested by Willem.
>>
>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
>>
>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
>> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
>> ---
>>  net/ipv4/udp_offload.c | 7 ++++++-
>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>
>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>> index e67a66fbf27b..7f6bd221880a 100644
>> --- a/net/ipv4/udp_offload.c
>> +++ b/net/ipv4/udp_offload.c
>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
>>  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
>>  					       struct sk_buff *skb)
>>  {
>> -	struct udphdr *uh = udp_hdr(skb);
>> +	struct udphdr *uh = udp_gro_udphdr(skb);
>>  	struct sk_buff *pp = NULL;
>>  	struct udphdr *uh2;
>>  	struct sk_buff *p;
>>  	unsigned int ulen;
>>  	int ret = 0;
>>
>> +	if (unlikely(!uh)) {
>
> How uh could be NULL here ?
>
> My understanding is that udp_gro_receive() is called
> only after udp4_gro_receive() or udp6_gro_receive()
> validated that udp_gro_udphdr(skb) was not NULL.

Right, but only after udp{4,6}_lib_lookup_skb() in certain cases.
I don't know for sure if their logic can actually edit skb->data,
so it's better to check from my point of view.

>> +		NAPI_GRO_CB(skb)->flush = 1;
>> +		return NULL;
>> +	}
>> +
>>  	/* requires non zero csum, for symmetry with GSO */
>>  	if (!uh->check) {
>>  		NAPI_GRO_CB(skb)->flush = 1;
>>
>
>Why uh2 is left unchanged ?
>
>    uh2 = udp_hdr(p);
>
>...

Packets from list_head *head have their headers already pulled to
skb->data in 100% cases, no need to change anything here.
I double-checked that udp_hdr(p) always returns the same pointer as
"p->data + network offset" and left it as it is.

Thanks,
Al
Willem de Bruijn Nov. 9, 2020, 6:28 p.m. UTC | #4
On Mon, Nov 9, 2020 at 12:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 11/9/20 5:56 PM, Alexander Lobakin wrote:
> > While testing UDP GSO fraglists forwarding through driver that uses
> > Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
> > iperf packets:
> >
> > [ ID] Interval           Transfer     Bitrate         Jitter
> > [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
> >
> > Simple switch to napi_gro_receive() any other method without frag0
> > shortcut completely resolved them.
> >
> > I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive()
> > callback. While it's probably OK for non-frag0 paths (when all
> > headers or even the entire frame are already in skb->data), this
> > inline points to junk when using Fast GRO (napi_gro_frags() or
> > napi_gro_receive() with only Ethernet header in skb->data and all
> > the rest in shinfo->frags) and breaks GRO packet compilation and
> > the packet flow itself.
> > To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
> > are typically used. UDP even has an inline helper that makes use of
> > them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
> > to get rid of the out-of-order delivers.
> >
> > Present since the introduction of plain UDP GRO in 5.0-rc1.
> >
> > Since v1 [1]:
> >  - added a NULL pointer check for "uh" as suggested by Willem.
> >
> > [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
> >
> > Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
> > Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> > ---
> >  net/ipv4/udp_offload.c | 7 ++++++-
> >  1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> > index e67a66fbf27b..7f6bd221880a 100644
> > --- a/net/ipv4/udp_offload.c
> > +++ b/net/ipv4/udp_offload.c
> > @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
> >  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> >                                              struct sk_buff *skb)
> >  {
> > -     struct udphdr *uh = udp_hdr(skb);
> > +     struct udphdr *uh = udp_gro_udphdr(skb);
> >       struct sk_buff *pp = NULL;
> >       struct udphdr *uh2;
> >       struct sk_buff *p;
> >       unsigned int ulen;
> >       int ret = 0;
> >
> > +     if (unlikely(!uh)) {
>
> How uh could be NULL here ?
>
> My understanding is that udp_gro_receive() is called
> only after udp4_gro_receive() or udp6_gro_receive()
> validated that udp_gro_udphdr(skb) was not NULL.

Oh indeed. This has already been checked before.

> > +             NAPI_GRO_CB(skb)->flush = 1;
> > +             return NULL;
> > +     }
> > +
> >       /* requires non zero csum, for symmetry with GSO */
> >       if (!uh->check) {
> >               NAPI_GRO_CB(skb)->flush = 1;
> >
>
> Why uh2 is left unchanged ?
>
>     uh2 = udp_hdr(p);

Isn't that the same as th2 = tcp_hdr(p) in tcp_gro_receive? no frag0
optimization to worry about for packets on the list.


> ...
>
Eric Dumazet Nov. 9, 2020, 6:54 p.m. UTC | #5
On 11/9/20 7:28 PM, Willem de Bruijn wrote:
> On Mon, Nov 9, 2020 at 12:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 11/9/20 5:56 PM, Alexander Lobakin wrote:
>>> While testing UDP GSO fraglists forwarding through driver that uses
>>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
>>> iperf packets:
>>>
>>> [ ID] Interval           Transfer     Bitrate         Jitter
>>> [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
>>>
>>> Simple switch to napi_gro_receive() any other method without frag0
>>> shortcut completely resolved them.
>>>
>>> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive()
>>> callback. While it's probably OK for non-frag0 paths (when all
>>> headers or even the entire frame are already in skb->data), this
>>> inline points to junk when using Fast GRO (napi_gro_frags() or
>>> napi_gro_receive() with only Ethernet header in skb->data and all
>>> the rest in shinfo->frags) and breaks GRO packet compilation and
>>> the packet flow itself.
>>> To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
>>> are typically used. UDP even has an inline helper that makes use of
>>> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
>>> to get rid of the out-of-order delivers.
>>>
>>> Present since the introduction of plain UDP GRO in 5.0-rc1.
>>>
>>> Since v1 [1]:
>>>  - added a NULL pointer check for "uh" as suggested by Willem.
>>>
>>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
>>>
>>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
>>> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
>>> ---
>>>  net/ipv4/udp_offload.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>>> index e67a66fbf27b..7f6bd221880a 100644
>>> --- a/net/ipv4/udp_offload.c
>>> +++ b/net/ipv4/udp_offload.c
>>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
>>>  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
>>>                                              struct sk_buff *skb)
>>>  {
>>> -     struct udphdr *uh = udp_hdr(skb);
>>> +     struct udphdr *uh = udp_gro_udphdr(skb);
>>>       struct sk_buff *pp = NULL;
>>>       struct udphdr *uh2;
>>>       struct sk_buff *p;
>>>       unsigned int ulen;
>>>       int ret = 0;
>>>
>>> +     if (unlikely(!uh)) {
>>
>> How uh could be NULL here ?
>>
>> My understanding is that udp_gro_receive() is called
>> only after udp4_gro_receive() or udp6_gro_receive()
>> validated that udp_gro_udphdr(skb) was not NULL.
> 
> Oh indeed. This has already been checked before.
> 
>>> +             NAPI_GRO_CB(skb)->flush = 1;
>>> +             return NULL;
>>> +     }
>>> +
>>>       /* requires non zero csum, for symmetry with GSO */
>>>       if (!uh->check) {
>>>               NAPI_GRO_CB(skb)->flush = 1;
>>>
>>
>> Why uh2 is left unchanged ?
>>
>>     uh2 = udp_hdr(p);
> 
> Isn't that the same as th2 = tcp_hdr(p) in tcp_gro_receive? no frag0
> optimization to worry about for packets on the list.

My feeling was that tcp_gro_receive() is terminal in the GRO stack.

While UDP could be encapsulated in UDP :)

I guess we do not support this yet.

Years ago we made sure to propagate the current header offset into GRO stack
(when we added SIT/IPIP/GRE support to GRO)
299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support")


udp_hdr() is using transport header, which is unique for skb "on the list"
Willem de Bruijn Nov. 9, 2020, 7:06 p.m. UTC | #6
On Mon, Nov 9, 2020 at 1:54 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>
>
>
> On 11/9/20 7:28 PM, Willem de Bruijn wrote:
> > On Mon, Nov 9, 2020 at 12:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
> >>
> >>
> >>
> >> On 11/9/20 5:56 PM, Alexander Lobakin wrote:
> >>> While testing UDP GSO fraglists forwarding through driver that uses
> >>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
> >>> iperf packets:
> >>>
> >>> [ ID] Interval           Transfer     Bitrate         Jitter
> >>> [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
> >>>
> >>> Simple switch to napi_gro_receive() any other method without frag0
> >>> shortcut completely resolved them.
> >>>
> >>> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive()
> >>> callback. While it's probably OK for non-frag0 paths (when all
> >>> headers or even the entire frame are already in skb->data), this
> >>> inline points to junk when using Fast GRO (napi_gro_frags() or
> >>> napi_gro_receive() with only Ethernet header in skb->data and all
> >>> the rest in shinfo->frags) and breaks GRO packet compilation and
> >>> the packet flow itself.
> >>> To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
> >>> are typically used. UDP even has an inline helper that makes use of
> >>> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
> >>> to get rid of the out-of-order delivers.
> >>>
> >>> Present since the introduction of plain UDP GRO in 5.0-rc1.
> >>>
> >>> Since v1 [1]:
> >>>  - added a NULL pointer check for "uh" as suggested by Willem.
> >>>
> >>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
> >>>
> >>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
> >>> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
> >>> ---
> >>>  net/ipv4/udp_offload.c | 7 ++++++-
> >>>  1 file changed, 6 insertions(+), 1 deletion(-)
> >>>
> >>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
> >>> index e67a66fbf27b..7f6bd221880a 100644
> >>> --- a/net/ipv4/udp_offload.c
> >>> +++ b/net/ipv4/udp_offload.c
> >>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
> >>>  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
> >>>                                              struct sk_buff *skb)
> >>>  {
> >>> -     struct udphdr *uh = udp_hdr(skb);
> >>> +     struct udphdr *uh = udp_gro_udphdr(skb);
> >>>       struct sk_buff *pp = NULL;
> >>>       struct udphdr *uh2;
> >>>       struct sk_buff *p;
> >>>       unsigned int ulen;
> >>>       int ret = 0;
> >>>
> >>> +     if (unlikely(!uh)) {
> >>
> >> How uh could be NULL here ?
> >>
> >> My understanding is that udp_gro_receive() is called
> >> only after udp4_gro_receive() or udp6_gro_receive()
> >> validated that udp_gro_udphdr(skb) was not NULL.
> >
> > Oh indeed. This has already been checked before.
> >
> >>> +             NAPI_GRO_CB(skb)->flush = 1;
> >>> +             return NULL;
> >>> +     }
> >>> +
> >>>       /* requires non zero csum, for symmetry with GSO */
> >>>       if (!uh->check) {
> >>>               NAPI_GRO_CB(skb)->flush = 1;
> >>>
> >>
> >> Why uh2 is left unchanged ?
> >>
> >>     uh2 = udp_hdr(p);
> >
> > Isn't that the same as th2 = tcp_hdr(p) in tcp_gro_receive? no frag0
> > optimization to worry about for packets on the list.
>
> My feeling was that tcp_gro_receive() is terminal in the GRO stack.
>
> While UDP could be encapsulated in UDP :)
>
> I guess we do not support this yet.
>
> Years ago we made sure to propagate the current header offset into GRO stack
> (when we added SIT/IPIP/GRE support to GRO)
> 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support")

On which note, and Alexander's mention of udp4_lib_lookup_skb(), that
performs a standard ip_hdr on the possibly frag0 optimized incoming
packet:

struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
                                 __be16 sport, __be16 dport)
{
        const struct iphdr *iph = ip_hdr(skb);

        return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport,
                                 iph->daddr, dport, inet_iif(skb),
                                 inet_sdif(skb), &udp_table, NULL);
}

This should use skb_gro_header_.. too, then?
Eric Dumazet Nov. 9, 2020, 7:29 p.m. UTC | #7
On 11/9/20 7:26 PM, Alexander Lobakin wrote:
> From: Eric Dumazet <eric.dumazet@gmail.com>
> Date: Mon, 9 Nov 2020 18:37:36 +0100
> 
>> On 11/9/20 5:56 PM, Alexander Lobakin wrote:
>>> While testing UDP GSO fraglists forwarding through driver that uses
>>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
>>> iperf packets:
>>>
.
>>>
>>> Since v1 [1]:
>>>  - added a NULL pointer check for "uh" as suggested by Willem.
>>>
>>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
>>>
>>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
>>> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
>>> ---
>>>  net/ipv4/udp_offload.c | 7 ++++++-
>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>>> index e67a66fbf27b..7f6bd221880a 100644
>>> --- a/net/ipv4/udp_offload.c
>>> +++ b/net/ipv4/udp_offload.c
>>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
>>>  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
>>>  					       struct sk_buff *skb)
>>>  {
>>> -	struct udphdr *uh = udp_hdr(skb);
>>> +	struct udphdr *uh = udp_gro_udphdr(skb);
>>>  	struct sk_buff *pp = NULL;
>>>  	struct udphdr *uh2;
>>>  	struct sk_buff *p;
>>>  	unsigned int ulen;
>>>  	int ret = 0;
>>>
>>> +	if (unlikely(!uh)) {
>>
>> How uh could be NULL here ?
>>
>> My understanding is that udp_gro_receive() is called
>> only after udp4_gro_receive() or udp6_gro_receive()
>> validated that udp_gro_udphdr(skb) was not NULL.
> 
> Right, but only after udp{4,6}_lib_lookup_skb() in certain cases.
> I don't know for sure if their logic can actually edit skb->data,
> so it's better to check from my point of view.

Not really. This would send a wrong signal to readers of this code.

I do not think these functions can mess with GRO internals.

This would be a nightmare, GRO is already way too complex.

In fact these functions should use a const qualifier
for their " struct sk_buff *skb" argument to prevent future bugs.

I will test and submit this patch :

diff --git a/include/net/ip.h b/include/net/ip.h
index 2d6b985d11ccaa75827b3a15ac3f898d7a193242..e20874059f826eb0f9e899aed556bfbc9c9d71e8 100644
--- a/include/net/ip.h
+++ b/include/net/ip.h
@@ -99,7 +99,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
 #define PKTINFO_SKB_CB(skb) ((struct in_pktinfo *)((skb)->cb))
 
 /* return enslaved device index if relevant */
-static inline int inet_sdif(struct sk_buff *skb)
+static inline int inet_sdif(const struct sk_buff *skb)
 {
 #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
        if (skb && ipv4_l3mdev_skb(IPCB(skb)->flags))
diff --git a/include/net/udp.h b/include/net/udp.h
index 295d52a73598277dc5071536f777d1a87e7df1d1..877832bed4713a011a514a2f6f522728c8c89e20 100644
--- a/include/net/udp.h
+++ b/include/net/udp.h
@@ -164,7 +164,7 @@ static inline void udp_csum_pull_header(struct sk_buff *skb)
        UDP_SKB_CB(skb)->cscov -= sizeof(struct udphdr);
 }
 
-typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport,
+typedef struct sock *(*udp_lookup_t)(const struct sk_buff *skb, __be16 sport,
                                     __be16 dport);
 
 INDIRECT_CALLABLE_DECLARE(struct sk_buff *udp4_gro_receive(struct list_head *,
@@ -313,7 +313,7 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
 struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
                               __be32 daddr, __be16 dport, int dif, int sdif,
                               struct udp_table *tbl, struct sk_buff *skb);
-struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
+struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
                                 __be16 sport, __be16 dport);
 struct sock *udp6_lib_lookup(struct net *net,
                             const struct in6_addr *saddr, __be16 sport,
@@ -324,7 +324,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
                               const struct in6_addr *daddr, __be16 dport,
                               int dif, int sdif, struct udp_table *tbl,
                               struct sk_buff *skb);
-struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
+struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
                                 __be16 sport, __be16 dport);
 
 /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
index 09f0a23d1a01741d335ce45f25fe70a4e00698c7..8b8dadfea6c9854e6bfaa0fabcb774db39976da3 100644
--- a/net/ipv4/udp.c
+++ b/net/ipv4/udp.c
@@ -541,7 +541,7 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
                                 inet_sdif(skb), udptable, skb);
 }
 
-struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
+struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
                                 __be16 sport, __be16 dport)
 {
        const struct iphdr *iph = ip_hdr(skb);
diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
index 29d9691359b9c49ccb56a11f79e3658b1a76700d..adfe9ca6f516612b5aad6d6c654c7da1dd56a50e 100644
--- a/net/ipv6/udp.c
+++ b/net/ipv6/udp.c
@@ -276,7 +276,7 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
                                 inet6_sdif(skb), udptable, skb);
 }
 
-struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
+struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
                                 __be16 sport, __be16 dport)
 {
        const struct ipv6hdr *iph = ipv6_hdr(skb);
Alexander Lobakin Nov. 9, 2020, 7:48 p.m. UTC | #8
From: Willem de Bruijn <willemdebruijn.kernel@gmail.com>
Date: Mon, 9 Nov 2020 14:06:07 -0500

> On Mon, Nov 9, 2020 at 1:54 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>
>>
>>
>> On 11/9/20 7:28 PM, Willem de Bruijn wrote:
>>> On Mon, Nov 9, 2020 at 12:37 PM Eric Dumazet <eric.dumazet@gmail.com> wrote:
>>>>
>>>>
>>>>
>>>> On 11/9/20 5:56 PM, Alexander Lobakin wrote:
>>>>> While testing UDP GSO fraglists forwarding through driver that uses
>>>>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
>>>>> iperf packets:
>>>>>
>>>>> [ ID] Interval           Transfer     Bitrate         Jitter
>>>>> [SUM]  0.0-40.0 sec  12106 datagrams received out-of-order
>>>>>
>>>>> Simple switch to napi_gro_receive() any other method without frag0
>>>>> shortcut completely resolved them.
>>>>>
>>>>> I've found that UDP GRO uses udp_hdr(skb) in its .gro_receive()
>>>>> callback. While it's probably OK for non-frag0 paths (when all
>>>>> headers or even the entire frame are already in skb->data), this
>>>>> inline points to junk when using Fast GRO (napi_gro_frags() or
>>>>> napi_gro_receive() with only Ethernet header in skb->data and all
>>>>> the rest in shinfo->frags) and breaks GRO packet compilation and
>>>>> the packet flow itself.
>>>>> To support both modes, skb_gro_header_fast() + skb_gro_header_slow()
>>>>> are typically used. UDP even has an inline helper that makes use of
>>>>> them, udp_gro_udphdr(). Use that instead of troublemaking udp_hdr()
>>>>> to get rid of the out-of-order delivers.
>>>>>
>>>>> Present since the introduction of plain UDP GRO in 5.0-rc1.
>>>>>
>>>>> Since v1 [1]:
>>>>>  - added a NULL pointer check for "uh" as suggested by Willem.
>>>>>
>>>>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
>>>>>
>>>>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
>>>>> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
>>>>> ---
>>>>>  net/ipv4/udp_offload.c | 7 ++++++-
>>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>>>>> index e67a66fbf27b..7f6bd221880a 100644
>>>>> --- a/net/ipv4/udp_offload.c
>>>>> +++ b/net/ipv4/udp_offload.c
>>>>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
>>>>>  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
>>>>>                                              struct sk_buff *skb)
>>>>>  {
>>>>> -     struct udphdr *uh = udp_hdr(skb);
>>>>> +     struct udphdr *uh = udp_gro_udphdr(skb);
>>>>>       struct sk_buff *pp = NULL;
>>>>>       struct udphdr *uh2;
>>>>>       struct sk_buff *p;
>>>>>       unsigned int ulen;
>>>>>       int ret = 0;
>>>>>
>>>>> +     if (unlikely(!uh)) {
>>>>
>>>> How uh could be NULL here ?
>>>>
>>>> My understanding is that udp_gro_receive() is called
>>>> only after udp4_gro_receive() or udp6_gro_receive()
>>>> validated that udp_gro_udphdr(skb) was not NULL.
>>>
>>> Oh indeed. This has already been checked before.
>>>
>>>>> +             NAPI_GRO_CB(skb)->flush = 1;
>>>>> +             return NULL;
>>>>> +     }
>>>>> +
>>>>>       /* requires non zero csum, for symmetry with GSO */
>>>>>       if (!uh->check) {
>>>>>               NAPI_GRO_CB(skb)->flush = 1;
>>>>>
>>>>
>>>> Why uh2 is left unchanged ?
>>>>
>>>>     uh2 = udp_hdr(p);
>>>
>>> Isn't that the same as th2 = tcp_hdr(p) in tcp_gro_receive? no frag0
>>> optimization to worry about for packets on the list.
>>
>> My feeling was that tcp_gro_receive() is terminal in the GRO stack.
>>
>> While UDP could be encapsulated in UDP :)
>>
>> I guess we do not support this yet.
>>
>> Years ago we made sure to propagate the current header offset into GRO stack
>> (when we added SIT/IPIP/GRE support to GRO)
>> 299603e8370a93dd5d8e8d800f0dff1ce2c53d36 ("net-gro: Prepare GRO stack for the upcoming tunneling support")
>
> On which note, and Alexander's mention of udp4_lib_lookup_skb(), that
> performs a standard ip_hdr on the possibly frag0 optimized incoming
> packet:
>
> struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
>                                  __be16 sport, __be16 dport)
> {
>         const struct iphdr *iph = ip_hdr(skb);
>
>         return __udp4_lib_lookup(dev_net(skb->dev), iph->saddr, sport,
>                                  iph->daddr, dport, inet_iif(skb),
>                                  inet_sdif(skb), &udp_table, NULL);
> }
>
> This should use skb_gro_header_.. too, then?

Well, both of udp{4,6}_lib_lookup_skb() and a couple of neighbour
functions use ip_hdr(skb). I didn't check this path before as my
test case was NAT/forwarding with no sk.
I suspect they should use skb_gro_network_header() to obtain an IP
header. I'll fix them in v3, thanks for pointing this out!

Al
Alexander Lobakin Nov. 9, 2020, 7:56 p.m. UTC | #9
From: Eric Dumazet <eric.dumazet@gmail.com>
Date: Mon, 9 Nov 2020 20:29:03 +0100

> On 11/9/20 7:26 PM, Alexander Lobakin wrote:
>> From: Eric Dumazet <eric.dumazet@gmail.com>
>> Date: Mon, 9 Nov 2020 18:37:36 +0100
>> 
>>> On 11/9/20 5:56 PM, Alexander Lobakin wrote:
>>>> While testing UDP GSO fraglists forwarding through driver that uses
>>>> Fast GRO (via napi_gro_frags()), I was observing lots of out-of-order
>>>> iperf packets:
>>>>
>.
>>>>
>>>> Since v1 [1]:
>>>>  - added a NULL pointer check for "uh" as suggested by Willem.
>>>>
>>>> [1] https://lore.kernel.org/netdev/YazU6GEzBdpyZMDMwJirxDX7B4sualpDG68ADZYvJI@cp4-web-034.plabs.ch
>>>>
>>>> Fixes: e20cf8d3f1f7 ("udp: implement GRO for plain UDP sockets.")
>>>> Signed-off-by: Alexander Lobakin <alobakin@pm.me>
>>>> ---
>>>>  net/ipv4/udp_offload.c | 7 ++++++-
>>>>  1 file changed, 6 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
>>>> index e67a66fbf27b..7f6bd221880a 100644
>>>> --- a/net/ipv4/udp_offload.c
>>>> +++ b/net/ipv4/udp_offload.c
>>>> @@ -366,13 +366,18 @@ static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
>>>>  static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
>>>>  					       struct sk_buff *skb)
>>>>  {
>>>> -	struct udphdr *uh = udp_hdr(skb);
>>>> +	struct udphdr *uh = udp_gro_udphdr(skb);
>>>>  	struct sk_buff *pp = NULL;
>>>>  	struct udphdr *uh2;
>>>>  	struct sk_buff *p;
>>>>  	unsigned int ulen;
>>>>  	int ret = 0;
>>>>
>>>> +	if (unlikely(!uh)) {
>>>
>>> How uh could be NULL here ?
>>>
>>> My understanding is that udp_gro_receive() is called
>>> only after udp4_gro_receive() or udp6_gro_receive()
>>> validated that udp_gro_udphdr(skb) was not NULL.
>> 
>> Right, but only after udp{4,6}_lib_lookup_skb() in certain cases.
>> I don't know for sure if their logic can actually edit skb->data,
>> so it's better to check from my point of view.
>
> Not really. This would send a wrong signal to readers of this code.
>
> I do not think these functions can mess with GRO internals.
>
> This would be a nightmare, GRO is already way too complex.
>
> In fact these functions should use a const qualifier
> for their " struct sk_buff *skb" argument to prevent future bugs.

Agree, you're right. Lack of const qualifiers gave me a doubt that
these functions can't edit skbs. They really should use it if they
really can't.
I'll omit the check in v3.

> I will test and submit this patch :

That would be a very nice one, thanks.

> diff --git a/include/net/ip.h b/include/net/ip.h
> index 2d6b985d11ccaa75827b3a15ac3f898d7a193242..e20874059f826eb0f9e899aed556bfbc9c9d71e8 100644
> --- a/include/net/ip.h
> +++ b/include/net/ip.h
> @@ -99,7 +99,7 @@ static inline void ipcm_init_sk(struct ipcm_cookie *ipcm,
>  #define PKTINFO_SKB_CB(skb) ((struct in_pktinfo *)((skb)->cb))
>  
>  /* return enslaved device index if relevant */
> -static inline int inet_sdif(struct sk_buff *skb)
> +static inline int inet_sdif(const struct sk_buff *skb)
>  {
>  #if IS_ENABLED(CONFIG_NET_L3_MASTER_DEV)
>         if (skb && ipv4_l3mdev_skb(IPCB(skb)->flags))
> diff --git a/include/net/udp.h b/include/net/udp.h
> index 295d52a73598277dc5071536f777d1a87e7df1d1..877832bed4713a011a514a2f6f522728c8c89e20 100644
> --- a/include/net/udp.h
> +++ b/include/net/udp.h
> @@ -164,7 +164,7 @@ static inline void udp_csum_pull_header(struct sk_buff *skb)
>         UDP_SKB_CB(skb)->cscov -= sizeof(struct udphdr);
>  }
>  
> -typedef struct sock *(*udp_lookup_t)(struct sk_buff *skb, __be16 sport,
> +typedef struct sock *(*udp_lookup_t)(const struct sk_buff *skb, __be16 sport,
>                                      __be16 dport);
>  
>  INDIRECT_CALLABLE_DECLARE(struct sk_buff *udp4_gro_receive(struct list_head *,
> @@ -313,7 +313,7 @@ struct sock *udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
>  struct sock *__udp4_lib_lookup(struct net *net, __be32 saddr, __be16 sport,
>                                __be32 daddr, __be16 dport, int dif, int sdif,
>                                struct udp_table *tbl, struct sk_buff *skb);
> -struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
> +struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
>                                  __be16 sport, __be16 dport);
>  struct sock *udp6_lib_lookup(struct net *net,
>                              const struct in6_addr *saddr, __be16 sport,
> @@ -324,7 +324,7 @@ struct sock *__udp6_lib_lookup(struct net *net,
>                                const struct in6_addr *daddr, __be16 dport,
>                                int dif, int sdif, struct udp_table *tbl,
>                                struct sk_buff *skb);
> -struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
> +struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
>                                  __be16 sport, __be16 dport);
>  
>  /* UDP uses skb->dev_scratch to cache as much information as possible and avoid
> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c
> index 09f0a23d1a01741d335ce45f25fe70a4e00698c7..8b8dadfea6c9854e6bfaa0fabcb774db39976da3 100644
> --- a/net/ipv4/udp.c
> +++ b/net/ipv4/udp.c
> @@ -541,7 +541,7 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb,
>                                  inet_sdif(skb), udptable, skb);
>  }
>  
> -struct sock *udp4_lib_lookup_skb(struct sk_buff *skb,
> +struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb,
>                                  __be16 sport, __be16 dport)
>  {
>         const struct iphdr *iph = ip_hdr(skb);
> diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c
> index 29d9691359b9c49ccb56a11f79e3658b1a76700d..adfe9ca6f516612b5aad6d6c654c7da1dd56a50e 100644
> --- a/net/ipv6/udp.c
> +++ b/net/ipv6/udp.c
> @@ -276,7 +276,7 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb,
>                                  inet6_sdif(skb), udptable, skb);
>  }
>  
> -struct sock *udp6_lib_lookup_skb(struct sk_buff *skb,
> +struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb,
>                                  __be16 sport, __be16 dport)
>  {
>         const struct ipv6hdr *iph = ipv6_hdr(skb);
>

Al
diff mbox series

Patch

diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c
index e67a66fbf27b..7f6bd221880a 100644
--- a/net/ipv4/udp_offload.c
+++ b/net/ipv4/udp_offload.c
@@ -366,13 +366,18 @@  static struct sk_buff *udp4_ufo_fragment(struct sk_buff *skb,
 static struct sk_buff *udp_gro_receive_segment(struct list_head *head,
 					       struct sk_buff *skb)
 {
-	struct udphdr *uh = udp_hdr(skb);
+	struct udphdr *uh = udp_gro_udphdr(skb);
 	struct sk_buff *pp = NULL;
 	struct udphdr *uh2;
 	struct sk_buff *p;
 	unsigned int ulen;
 	int ret = 0;
 
+	if (unlikely(!uh)) {
+		NAPI_GRO_CB(skb)->flush = 1;
+		return NULL;
+	}
+
 	/* requires non zero csum, for symmetry with GSO */
 	if (!uh->check) {
 		NAPI_GRO_CB(skb)->flush = 1;