Message ID | 20240424163045.123528-2-richardbgobert@gmail.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | net: gro: add flush/flush_id checks and fix wrong offset in udp | expand |
Richard Gobert wrote: > Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp: > additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the > complete phase of gro. The functions always return skb->network_header, > which in the case of encapsulated packets at the gro complete phase, is > always set to the innermost L3 of the packet. That means that calling > {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in > gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ > L3/L4 may return an unexpected value. > > This incorrect usage leads to a bug in GRO's UDP socket lookup. > udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These > *_hdr functions return network_header which will point to the innermost L3, > resulting in the wrong offset being used in __udp{4,6}_lib_lookup with > encapsulated packets. > > This patch adds network_offset and inner_network_offset to napi_gro_cb, and > makes sure both are set correctly. > > To fix the issue, network_offsets union is used inside napi_gro_cb, in > which both the outer and the inner network offsets are saved. > > Reproduction example: > > Endpoint configuration example (fou + local address bind) > > # ip fou add port 6666 ipproto 4 > # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip > # ip link set tun1 up > # ip a add 1.1.1.2/24 dev tun1 > > Netperf TCP_STREAM result on net-next before patch is applied: > > net-next main, GRO enabled: > $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 > Recv Send Send > Socket Socket Message Elapsed > Size Size Size Time Throughput > bytes bytes bytes secs. 10^6bits/sec > > 131072 16384 16384 5.28 2.37 > > net-next main, GRO disabled: > $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 > Recv Send Send > Socket Socket Message Elapsed > Size Size Size Time Throughput > bytes bytes bytes secs. 10^6bits/sec > > 131072 16384 16384 5.01 2745.06 > > patch applied, GRO enabled: > $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 > Recv Send Send > Socket Socket Message Elapsed > Size Size Size Time Throughput > bytes bytes bytes secs. 10^6bits/sec > > 131072 16384 16384 5.01 2877.38 > > Fixes: 57c67ff4bd92 ("udp: additional GRO support") > Signed-off-by: Richard Gobert <richardbgobert@gmail.com> > --- > include/net/gro.h | 18 ++++++++++++++++-- > net/8021q/vlan_core.c | 2 ++ > net/core/gro.c | 1 + > net/ipv4/af_inet.c | 5 +---- > net/ipv4/tcp_offload.c | 3 ++- > net/ipv4/udp.c | 3 ++- > net/ipv4/udp_offload.c | 3 ++- > net/ipv6/ip6_offload.c | 6 +++--- > net/ipv6/tcpv6_offload.c | 3 ++- > net/ipv6/udp.c | 3 ++- > net/ipv6/udp_offload.c | 3 ++- > 11 files changed, 35 insertions(+), 15 deletions(-) > > diff --git a/include/net/gro.h b/include/net/gro.h > index 50f1e403dbbb..1faff23b66e8 100644 > --- a/include/net/gro.h > +++ b/include/net/gro.h > @@ -87,6 +87,15 @@ struct napi_gro_cb { > > /* used to support CHECKSUM_COMPLETE for tunneling protocols */ > __wsum csum; > + > + /* L3 offsets */ > + union { > + struct { > + u16 network_offset; > + u16 inner_network_offset; > + }; > + u16 network_offsets[2]; > + }; > }; > > #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb) > @@ -172,12 +181,17 @@ static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen, > return ptr; > } > > +static inline int skb_gro_network_offset(const struct sk_buff *skb) > +{ > + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark]; > +} > + > static inline void *skb_gro_network_header(const struct sk_buff *skb) > { > if (skb_gro_may_pull(skb, skb_gro_offset(skb))) > - return skb_gro_header_fast(skb, skb_network_offset(skb)); > + return skb_gro_header_fast(skb, skb_gro_network_offset(skb)); > > - return skb_network_header(skb); > + return skb->data + skb_gro_network_offset(skb); > } > > static inline __wsum inet_gro_compute_pseudo(const struct sk_buff *skb, > diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c > index f00158234505..9404dd551dfd 100644 > --- a/net/8021q/vlan_core.c > +++ b/net/8021q/vlan_core.c > @@ -478,6 +478,8 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head, > if (unlikely(!vhdr)) > goto out; > > + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = hlen; > + > type = vhdr->h_vlan_encapsulated_proto; > > ptype = gro_find_receive_by_type(type); > diff --git a/net/core/gro.c b/net/core/gro.c > index 83f35d99a682..c7901253a1a8 100644 > --- a/net/core/gro.c > +++ b/net/core/gro.c > @@ -371,6 +371,7 @@ static inline void skb_gro_reset_offset(struct sk_buff *skb, u32 nhoff) > const skb_frag_t *frag0; > unsigned int headlen; > > + NAPI_GRO_CB(skb)->network_offset = 0; > NAPI_GRO_CB(skb)->data_offset = 0; > headlen = skb_headlen(skb); > NAPI_GRO_CB(skb)->frag0 = skb->data; > diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c > index 55bd72997b31..2daf635ab99e 100644 > --- a/net/ipv4/af_inet.c > +++ b/net/ipv4/af_inet.c > @@ -1568,10 +1568,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb) > > NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF)); > NAPI_GRO_CB(skb)->flush |= flush; > - skb_set_network_header(skb, off); > - /* The above will be needed by the transport layer if there is one > - * immediately following this IP hdr. > - */ > + NAPI_GRO_CB(skb)->inner_network_offset = off; > > /* Note : No need to call skb_gro_postpull_rcsum() here, > * as we already checked checksum over ipv4 header was 0 > diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c > index ebe4722bb020..be8ddf6da88c 100644 > --- a/net/ipv4/tcp_offload.c > +++ b/net/ipv4/tcp_offload.c > @@ -332,7 +332,8 @@ struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb) > > INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff) > { > - const struct iphdr *iph = ip_hdr(skb); > + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; > + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); > struct tcphdr *th = tcp_hdr(skb); > > th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr, > diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c > index c02bf011d4a6..1399fce82b3f 100644 > --- a/net/ipv4/udp.c > +++ b/net/ipv4/udp.c > @@ -532,7 +532,8 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb, > struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb, > __be16 sport, __be16 dport) > { > - const struct iphdr *iph = ip_hdr(skb); > + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; > + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); > struct net *net = dev_net(skb->dev); > int iif, sdif; > > diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c > index 3498dd1d0694..fd29d21d579c 100644 > --- a/net/ipv4/udp_offload.c > +++ b/net/ipv4/udp_offload.c > @@ -718,7 +718,8 @@ EXPORT_SYMBOL(udp_gro_complete); > > INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff) > { > - const struct iphdr *iph = ip_hdr(skb); > + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; > + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); > struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); > > /* do fraglist only if there is no outer UDP encap (or we already processed it) */ > diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c > index b41e35af69ea..5d6b875a4638 100644 > --- a/net/ipv6/ip6_offload.c > +++ b/net/ipv6/ip6_offload.c > @@ -67,7 +67,7 @@ static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto) > off += len; > } > > - skb_gro_pull(skb, off - skb_network_offset(skb)); > + skb_gro_pull(skb, off - skb_gro_network_offset(skb)); > return proto; > } > > @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, > if (unlikely(!iph)) > goto out; > > - skb_set_network_header(skb, off); > + NAPI_GRO_CB(skb)->inner_network_offset = off; > > flush += ntohs(iph->payload_len) != skb->len - hlen; > > @@ -259,7 +259,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, > NAPI_GRO_CB(skb)->proto = proto; > > flush--; > - nlen = skb_network_header_len(skb); > + nlen = skb_gro_offset(skb) - off; > > list_for_each_entry(p, head, list) { > const struct ipv6hdr *iph2; > diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c > index 4b07d1e6c952..e70d60e0f86f 100644 > --- a/net/ipv6/tcpv6_offload.c > +++ b/net/ipv6/tcpv6_offload.c > @@ -29,7 +29,8 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) > > INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff) > { > - const struct ipv6hdr *iph = ipv6_hdr(skb); > + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; > + const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset); > struct tcphdr *th = tcp_hdr(skb); > Only udp code is affected, as only that can be used as tunnel. For bug fixes, let's try to avoid touching other code. Also that vlan. As a minimal patch all that is needed is the following, right? - add the fields - store in inet_gro_receive + ipv6_gro_receive - read in udp[46]_gro_complete and udp[46]_lib_lookup_skb +++ b/include/net/gro.h @@ -87,6 +87,8 @@ struct napi_gro_cb { /* used to support CHECKSUM_COMPLETE for tunneling protocols */ __wsum csum; + + u16 network_offsets[2]; }; #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb) @@ -180,6 +182,11 @@ static inline void *skb_gro_network_header(const struct sk_buff *skb) return skb_network_header(skb); } +static inline void *skb_gro_complete_network_header(const struct sk_buff *skb) +{ + return skb->data + NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; +} + diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index a7cfeda28bb2..4294efdd5613 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1569,6 +1569,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb) NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF)); NAPI_GRO_CB(skb)->flush |= flush; + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off; skb_set_network_header(skb, off); /* The above will be needed by the transport layer if there is one * immediately following this IP hdr. diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index 944922172c98..1469e19bafd5 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -543,7 +543,7 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb, struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport) { - const struct iphdr *iph = ip_hdr(skb); + const struct iphdr *iph = skb_gro_complete_network_header(skb); struct net *net = dev_net(skb->dev); int iif, sdif; diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 3498dd1d0694..e93ddd42a091 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -718,7 +718,7 @@ EXPORT_SYMBOL(udp_gro_complete); INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff) { - const struct iphdr *iph = ip_hdr(skb); + const struct iphdr *iph = skb_gro_complete_network_header(skb); struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); /* do fraglist only if there is no outer UDP encap (or we already processed it) */ diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index b41e35af69ea..5033ebfd5e92 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -236,6 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, if (unlikely(!iph)) goto out; + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = off; skb_set_network_header(skb, off); flush += ntohs(iph->payload_len) != skb->len - hlen; diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 9914e73f4785..c012f6c41444 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -285,7 +285,7 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb, struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport) { - const struct ipv6hdr *iph = ipv6_hdr(skb); + const struct ipv6hdr *iph = skb_gro_complete_network_header(skb); struct net *net = dev_net(skb->dev); int iif, sdif; diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index bbd347de00b4..274dd2983216 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -164,7 +164,7 @@ struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb) INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff) { - const struct ipv6hdr *ipv6h = ipv6_hdr(skb); + const struct ipv6hdr *ipv6h = skb_gro_complete_network_header(skb); struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); The part in udp_gro_complete that clears encap_mark and sets encapsulation is definitely something to watch out for.
Willem de Bruijn wrote: > Richard Gobert wrote: >> Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp: >> additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the >> complete phase of gro. The functions always return skb->network_header, >> which in the case of encapsulated packets at the gro complete phase, is >> always set to the innermost L3 of the packet. That means that calling >> {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in >> gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ >> L3/L4 may return an unexpected value. >> >> This incorrect usage leads to a bug in GRO's UDP socket lookup. >> udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These >> *_hdr functions return network_header which will point to the innermost L3, >> resulting in the wrong offset being used in __udp{4,6}_lib_lookup with >> encapsulated packets. >> >> This patch adds network_offset and inner_network_offset to napi_gro_cb, and >> makes sure both are set correctly. >> >> To fix the issue, network_offsets union is used inside napi_gro_cb, in >> which both the outer and the inner network offsets are saved. >> >> Reproduction example: >> >> Endpoint configuration example (fou + local address bind) >> >> # ip fou add port 6666 ipproto 4 >> # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip >> # ip link set tun1 up >> # ip a add 1.1.1.2/24 dev tun1 >> >> Netperf TCP_STREAM result on net-next before patch is applied: >> >> net-next main, GRO enabled: >> $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 >> Recv Send Send >> Socket Socket Message Elapsed >> Size Size Size Time Throughput >> bytes bytes bytes secs. 10^6bits/sec >> >> 131072 16384 16384 5.28 2.37 >> >> net-next main, GRO disabled: >> $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 >> Recv Send Send >> Socket Socket Message Elapsed >> Size Size Size Time Throughput >> bytes bytes bytes secs. 10^6bits/sec >> >> 131072 16384 16384 5.01 2745.06 >> >> patch applied, GRO enabled: >> $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 >> Recv Send Send >> Socket Socket Message Elapsed >> Size Size Size Time Throughput >> bytes bytes bytes secs. 10^6bits/sec >> >> 131072 16384 16384 5.01 2877.38 >> >> Fixes: 57c67ff4bd92 ("udp: additional GRO support") >> Signed-off-by: Richard Gobert <richardbgobert@gmail.com> >> --- >> include/net/gro.h | 18 ++++++++++++++++-- >> net/8021q/vlan_core.c | 2 ++ >> net/core/gro.c | 1 + >> net/ipv4/af_inet.c | 5 +---- >> net/ipv4/tcp_offload.c | 3 ++- >> net/ipv4/udp.c | 3 ++- >> net/ipv4/udp_offload.c | 3 ++- >> net/ipv6/ip6_offload.c | 6 +++--- >> net/ipv6/tcpv6_offload.c | 3 ++- >> net/ipv6/udp.c | 3 ++- >> net/ipv6/udp_offload.c | 3 ++- >> 11 files changed, 35 insertions(+), 15 deletions(-) >> >> diff --git a/include/net/gro.h b/include/net/gro.h >> index 50f1e403dbbb..1faff23b66e8 100644 >> --- a/include/net/gro.h >> +++ b/include/net/gro.h >> @@ -87,6 +87,15 @@ struct napi_gro_cb { >> >> /* used to support CHECKSUM_COMPLETE for tunneling protocols */ >> __wsum csum; >> + >> + /* L3 offsets */ >> + union { >> + struct { >> + u16 network_offset; >> + u16 inner_network_offset; >> + }; >> + u16 network_offsets[2]; >> + }; >> }; >> >> #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb) >> @@ -172,12 +181,17 @@ static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen, >> return ptr; >> } >> >> +static inline int skb_gro_network_offset(const struct sk_buff *skb) >> +{ >> + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark]; >> +} >> + >> static inline void *skb_gro_network_header(const struct sk_buff *skb) >> { >> if (skb_gro_may_pull(skb, skb_gro_offset(skb))) >> - return skb_gro_header_fast(skb, skb_network_offset(skb)); >> + return skb_gro_header_fast(skb, skb_gro_network_offset(skb)); >> >> - return skb_network_header(skb); >> + return skb->data + skb_gro_network_offset(skb); >> } >> >> static inline __wsum inet_gro_compute_pseudo(const struct sk_buff *skb, >> diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c >> index f00158234505..9404dd551dfd 100644 >> --- a/net/8021q/vlan_core.c >> +++ b/net/8021q/vlan_core.c >> @@ -478,6 +478,8 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head, >> if (unlikely(!vhdr)) >> goto out; >> >> + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = hlen; >> + >> type = vhdr->h_vlan_encapsulated_proto; >> >> ptype = gro_find_receive_by_type(type); >> diff --git a/net/core/gro.c b/net/core/gro.c >> index 83f35d99a682..c7901253a1a8 100644 >> --- a/net/core/gro.c >> +++ b/net/core/gro.c >> @@ -371,6 +371,7 @@ static inline void skb_gro_reset_offset(struct sk_buff *skb, u32 nhoff) >> const skb_frag_t *frag0; >> unsigned int headlen; >> >> + NAPI_GRO_CB(skb)->network_offset = 0; >> NAPI_GRO_CB(skb)->data_offset = 0; >> headlen = skb_headlen(skb); >> NAPI_GRO_CB(skb)->frag0 = skb->data; >> diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c >> index 55bd72997b31..2daf635ab99e 100644 >> --- a/net/ipv4/af_inet.c >> +++ b/net/ipv4/af_inet.c >> @@ -1568,10 +1568,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb) >> >> NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF)); >> NAPI_GRO_CB(skb)->flush |= flush; >> - skb_set_network_header(skb, off); >> - /* The above will be needed by the transport layer if there is one >> - * immediately following this IP hdr. >> - */ >> + NAPI_GRO_CB(skb)->inner_network_offset = off; >> >> /* Note : No need to call skb_gro_postpull_rcsum() here, >> * as we already checked checksum over ipv4 header was 0 >> diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c >> index ebe4722bb020..be8ddf6da88c 100644 >> --- a/net/ipv4/tcp_offload.c >> +++ b/net/ipv4/tcp_offload.c >> @@ -332,7 +332,8 @@ struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb) >> >> INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff) >> { >> - const struct iphdr *iph = ip_hdr(skb); >> + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; >> + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); >> struct tcphdr *th = tcp_hdr(skb); >> >> th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr, >> diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c >> index c02bf011d4a6..1399fce82b3f 100644 >> --- a/net/ipv4/udp.c >> +++ b/net/ipv4/udp.c >> @@ -532,7 +532,8 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb, >> struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb, >> __be16 sport, __be16 dport) >> { >> - const struct iphdr *iph = ip_hdr(skb); >> + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; >> + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); >> struct net *net = dev_net(skb->dev); >> int iif, sdif; >> >> diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c >> index 3498dd1d0694..fd29d21d579c 100644 >> --- a/net/ipv4/udp_offload.c >> +++ b/net/ipv4/udp_offload.c >> @@ -718,7 +718,8 @@ EXPORT_SYMBOL(udp_gro_complete); >> >> INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff) >> { >> - const struct iphdr *iph = ip_hdr(skb); >> + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; >> + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); >> struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); >> >> /* do fraglist only if there is no outer UDP encap (or we already processed it) */ >> diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c >> index b41e35af69ea..5d6b875a4638 100644 >> --- a/net/ipv6/ip6_offload.c >> +++ b/net/ipv6/ip6_offload.c >> @@ -67,7 +67,7 @@ static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto) >> off += len; >> } >> >> - skb_gro_pull(skb, off - skb_network_offset(skb)); >> + skb_gro_pull(skb, off - skb_gro_network_offset(skb)); >> return proto; >> } >> >> @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, >> if (unlikely(!iph)) >> goto out; >> >> - skb_set_network_header(skb, off); >> + NAPI_GRO_CB(skb)->inner_network_offset = off; >> >> flush += ntohs(iph->payload_len) != skb->len - hlen; >> >> @@ -259,7 +259,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, >> NAPI_GRO_CB(skb)->proto = proto; >> >> flush--; >> - nlen = skb_network_header_len(skb); >> + nlen = skb_gro_offset(skb) - off; >> >> list_for_each_entry(p, head, list) { >> const struct ipv6hdr *iph2; >> diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c >> index 4b07d1e6c952..e70d60e0f86f 100644 >> --- a/net/ipv6/tcpv6_offload.c >> +++ b/net/ipv6/tcpv6_offload.c >> @@ -29,7 +29,8 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) >> >> INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff) >> { >> - const struct ipv6hdr *iph = ipv6_hdr(skb); >> + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; >> + const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset); >> struct tcphdr *th = tcp_hdr(skb); >> > > Only udp code is affected, as only that can be used as tunnel. > > For bug fixes, let's try to avoid touching other code. Also that vlan. > > As a minimal patch all that is needed is the following, right? > > - add the fields > - store in inet_gro_receive + ipv6_gro_receive > - read in udp[46]_gro_complete and udp[46]_lib_lookup_skb > This approach is smaller, thanks for writing it down. What do you think about doing this and still writing to inner_network_offset exclusively in {inet,ipv6}_gro_receive? I still prefer it for reasons discussed in the previous series. The code line in vlan_gro_receive will still be there, but that will be the only addition to your snippet.
> >> --- a/net/ipv6/tcpv6_offload.c > >> +++ b/net/ipv6/tcpv6_offload.c > >> @@ -29,7 +29,8 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) > >> > >> INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff) > >> { > >> - const struct ipv6hdr *iph = ipv6_hdr(skb); > >> + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; > >> + const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset); > >> struct tcphdr *th = tcp_hdr(skb); > >> > > > > Only udp code is affected, as only that can be used as tunnel. > > > > For bug fixes, let's try to avoid touching other code. Also that vlan. > > > > As a minimal patch all that is needed is the following, right? > > > > - add the fields > > - store in inet_gro_receive + ipv6_gro_receive > > - read in udp[46]_gro_complete and udp[46]_lib_lookup_skb > > > > This approach is smaller, thanks for writing it down. > > What do you think about doing this and still writing to > inner_network_offset exclusively in {inet,ipv6}_gro_receive? I still > prefer it for reasons discussed in the previous series. The code line > in vlan_gro_receive will still be there, but that will be the only > addition to your snippet. That sounds fine, thanks.
diff --git a/include/net/gro.h b/include/net/gro.h index 50f1e403dbbb..1faff23b66e8 100644 --- a/include/net/gro.h +++ b/include/net/gro.h @@ -87,6 +87,15 @@ struct napi_gro_cb { /* used to support CHECKSUM_COMPLETE for tunneling protocols */ __wsum csum; + + /* L3 offsets */ + union { + struct { + u16 network_offset; + u16 inner_network_offset; + }; + u16 network_offsets[2]; + }; }; #define NAPI_GRO_CB(skb) ((struct napi_gro_cb *)(skb)->cb) @@ -172,12 +181,17 @@ static inline void *skb_gro_header(struct sk_buff *skb, unsigned int hlen, return ptr; } +static inline int skb_gro_network_offset(const struct sk_buff *skb) +{ + return NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark]; +} + static inline void *skb_gro_network_header(const struct sk_buff *skb) { if (skb_gro_may_pull(skb, skb_gro_offset(skb))) - return skb_gro_header_fast(skb, skb_network_offset(skb)); + return skb_gro_header_fast(skb, skb_gro_network_offset(skb)); - return skb_network_header(skb); + return skb->data + skb_gro_network_offset(skb); } static inline __wsum inet_gro_compute_pseudo(const struct sk_buff *skb, diff --git a/net/8021q/vlan_core.c b/net/8021q/vlan_core.c index f00158234505..9404dd551dfd 100644 --- a/net/8021q/vlan_core.c +++ b/net/8021q/vlan_core.c @@ -478,6 +478,8 @@ static struct sk_buff *vlan_gro_receive(struct list_head *head, if (unlikely(!vhdr)) goto out; + NAPI_GRO_CB(skb)->network_offsets[NAPI_GRO_CB(skb)->encap_mark] = hlen; + type = vhdr->h_vlan_encapsulated_proto; ptype = gro_find_receive_by_type(type); diff --git a/net/core/gro.c b/net/core/gro.c index 83f35d99a682..c7901253a1a8 100644 --- a/net/core/gro.c +++ b/net/core/gro.c @@ -371,6 +371,7 @@ static inline void skb_gro_reset_offset(struct sk_buff *skb, u32 nhoff) const skb_frag_t *frag0; unsigned int headlen; + NAPI_GRO_CB(skb)->network_offset = 0; NAPI_GRO_CB(skb)->data_offset = 0; headlen = skb_headlen(skb); NAPI_GRO_CB(skb)->frag0 = skb->data; diff --git a/net/ipv4/af_inet.c b/net/ipv4/af_inet.c index 55bd72997b31..2daf635ab99e 100644 --- a/net/ipv4/af_inet.c +++ b/net/ipv4/af_inet.c @@ -1568,10 +1568,7 @@ struct sk_buff *inet_gro_receive(struct list_head *head, struct sk_buff *skb) NAPI_GRO_CB(skb)->is_atomic = !!(iph->frag_off & htons(IP_DF)); NAPI_GRO_CB(skb)->flush |= flush; - skb_set_network_header(skb, off); - /* The above will be needed by the transport layer if there is one - * immediately following this IP hdr. - */ + NAPI_GRO_CB(skb)->inner_network_offset = off; /* Note : No need to call skb_gro_postpull_rcsum() here, * as we already checked checksum over ipv4 header was 0 diff --git a/net/ipv4/tcp_offload.c b/net/ipv4/tcp_offload.c index ebe4722bb020..be8ddf6da88c 100644 --- a/net/ipv4/tcp_offload.c +++ b/net/ipv4/tcp_offload.c @@ -332,7 +332,8 @@ struct sk_buff *tcp4_gro_receive(struct list_head *head, struct sk_buff *skb) INDIRECT_CALLABLE_SCOPE int tcp4_gro_complete(struct sk_buff *skb, int thoff) { - const struct iphdr *iph = ip_hdr(skb); + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); struct tcphdr *th = tcp_hdr(skb); th->check = ~tcp_v4_check(skb->len - thoff, iph->saddr, diff --git a/net/ipv4/udp.c b/net/ipv4/udp.c index c02bf011d4a6..1399fce82b3f 100644 --- a/net/ipv4/udp.c +++ b/net/ipv4/udp.c @@ -532,7 +532,8 @@ static inline struct sock *__udp4_lib_lookup_skb(struct sk_buff *skb, struct sock *udp4_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport) { - const struct iphdr *iph = ip_hdr(skb); + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); struct net *net = dev_net(skb->dev); int iif, sdif; diff --git a/net/ipv4/udp_offload.c b/net/ipv4/udp_offload.c index 3498dd1d0694..fd29d21d579c 100644 --- a/net/ipv4/udp_offload.c +++ b/net/ipv4/udp_offload.c @@ -718,7 +718,8 @@ EXPORT_SYMBOL(udp_gro_complete); INDIRECT_CALLABLE_SCOPE int udp4_gro_complete(struct sk_buff *skb, int nhoff) { - const struct iphdr *iph = ip_hdr(skb); + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; + const struct iphdr *iph = (struct iphdr *)(skb->data + offset); struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); /* do fraglist only if there is no outer UDP encap (or we already processed it) */ diff --git a/net/ipv6/ip6_offload.c b/net/ipv6/ip6_offload.c index b41e35af69ea..5d6b875a4638 100644 --- a/net/ipv6/ip6_offload.c +++ b/net/ipv6/ip6_offload.c @@ -67,7 +67,7 @@ static int ipv6_gro_pull_exthdrs(struct sk_buff *skb, int off, int proto) off += len; } - skb_gro_pull(skb, off - skb_network_offset(skb)); + skb_gro_pull(skb, off - skb_gro_network_offset(skb)); return proto; } @@ -236,7 +236,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, if (unlikely(!iph)) goto out; - skb_set_network_header(skb, off); + NAPI_GRO_CB(skb)->inner_network_offset = off; flush += ntohs(iph->payload_len) != skb->len - hlen; @@ -259,7 +259,7 @@ INDIRECT_CALLABLE_SCOPE struct sk_buff *ipv6_gro_receive(struct list_head *head, NAPI_GRO_CB(skb)->proto = proto; flush--; - nlen = skb_network_header_len(skb); + nlen = skb_gro_offset(skb) - off; list_for_each_entry(p, head, list) { const struct ipv6hdr *iph2; diff --git a/net/ipv6/tcpv6_offload.c b/net/ipv6/tcpv6_offload.c index 4b07d1e6c952..e70d60e0f86f 100644 --- a/net/ipv6/tcpv6_offload.c +++ b/net/ipv6/tcpv6_offload.c @@ -29,7 +29,8 @@ struct sk_buff *tcp6_gro_receive(struct list_head *head, struct sk_buff *skb) INDIRECT_CALLABLE_SCOPE int tcp6_gro_complete(struct sk_buff *skb, int thoff) { - const struct ipv6hdr *iph = ipv6_hdr(skb); + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; + const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset); struct tcphdr *th = tcp_hdr(skb); th->check = ~tcp_v6_check(skb->len - thoff, &iph->saddr, diff --git a/net/ipv6/udp.c b/net/ipv6/udp.c index 8b1dd7f51249..f7880e306410 100644 --- a/net/ipv6/udp.c +++ b/net/ipv6/udp.c @@ -272,7 +272,8 @@ static struct sock *__udp6_lib_lookup_skb(struct sk_buff *skb, struct sock *udp6_lib_lookup_skb(const struct sk_buff *skb, __be16 sport, __be16 dport) { - const struct ipv6hdr *iph = ipv6_hdr(skb); + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; + const struct ipv6hdr *iph = (struct ipv6hdr *)(skb->data + offset); struct net *net = dev_net(skb->dev); int iif, sdif; diff --git a/net/ipv6/udp_offload.c b/net/ipv6/udp_offload.c index bbd347de00b4..b41152dd4246 100644 --- a/net/ipv6/udp_offload.c +++ b/net/ipv6/udp_offload.c @@ -164,7 +164,8 @@ struct sk_buff *udp6_gro_receive(struct list_head *head, struct sk_buff *skb) INDIRECT_CALLABLE_SCOPE int udp6_gro_complete(struct sk_buff *skb, int nhoff) { - const struct ipv6hdr *ipv6h = ipv6_hdr(skb); + const u16 offset = NAPI_GRO_CB(skb)->network_offsets[skb->encapsulation]; + const struct ipv6hdr *ipv6h = (struct ipv6hdr *)(skb->data + offset); struct udphdr *uh = (struct udphdr *)(skb->data + nhoff); /* do fraglist only if there is no outer UDP encap (or we already processed it) */
Commits a602456 ("udp: Add GRO functions to UDP socket") and 57c67ff ("udp: additional GRO support") introduce incorrect usage of {ip,ipv6}_hdr in the complete phase of gro. The functions always return skb->network_header, which in the case of encapsulated packets at the gro complete phase, is always set to the innermost L3 of the packet. That means that calling {ip,ipv6}_hdr for skbs which completed the GRO receive phase (both in gro_list and *_gro_complete) when parsing an encapsulated packet's _outer_ L3/L4 may return an unexpected value. This incorrect usage leads to a bug in GRO's UDP socket lookup. udp{4,6}_lib_lookup_skb functions use ip_hdr/ipv6_hdr respectively. These *_hdr functions return network_header which will point to the innermost L3, resulting in the wrong offset being used in __udp{4,6}_lib_lookup with encapsulated packets. This patch adds network_offset and inner_network_offset to napi_gro_cb, and makes sure both are set correctly. To fix the issue, network_offsets union is used inside napi_gro_cb, in which both the outer and the inner network offsets are saved. Reproduction example: Endpoint configuration example (fou + local address bind) # ip fou add port 6666 ipproto 4 # ip link add name tun1 type ipip remote 2.2.2.1 local 2.2.2.2 encap fou encap-dport 5555 encap-sport 6666 mode ipip # ip link set tun1 up # ip a add 1.1.1.2/24 dev tun1 Netperf TCP_STREAM result on net-next before patch is applied: net-next main, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.28 2.37 net-next main, GRO disabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2745.06 patch applied, GRO enabled: $ netperf -H 1.1.1.2 -t TCP_STREAM -l 5 Recv Send Send Socket Socket Message Elapsed Size Size Size Time Throughput bytes bytes bytes secs. 10^6bits/sec 131072 16384 16384 5.01 2877.38 Fixes: 57c67ff4bd92 ("udp: additional GRO support") Signed-off-by: Richard Gobert <richardbgobert@gmail.com> --- include/net/gro.h | 18 ++++++++++++++++-- net/8021q/vlan_core.c | 2 ++ net/core/gro.c | 1 + net/ipv4/af_inet.c | 5 +---- net/ipv4/tcp_offload.c | 3 ++- net/ipv4/udp.c | 3 ++- net/ipv4/udp_offload.c | 3 ++- net/ipv6/ip6_offload.c | 6 +++--- net/ipv6/tcpv6_offload.c | 3 ++- net/ipv6/udp.c | 3 ++- net/ipv6/udp_offload.c | 3 ++- 11 files changed, 35 insertions(+), 15 deletions(-)