Message ID | cover.1674526718.git.lucien.xin@gmail.com (mailing list archive) |
---|---|
Headers | show |
Series | net: support ipv4 big tcp | expand |
On Tue, Jan 24, 2023 at 3:20 AM Xin Long <lucien.xin@gmail.com> wrote: > > This is similar to the BIG TCP patchset added by Eric for IPv6: > > https://lwn.net/Articles/895398/ > > Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header > doesn't have exthdrs(options) for the BIG TCP packets' length. To make > it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to > indicate this might be a BIG TCP packet and use skb->len as the real > IPv4 total length. > > This will work safely, as all BIG TCP packets are GSO/GRO packets and > processed on the same host as they were created; There is no padding > in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4 > packet total length; Also, before implementing the feature, all those > places that may get iph tot_len from BIG TCP packets are taken care > with some new APIs: > > Patch 1 adds some APIs for iph tot_len setting and getting, which are > used in all these places where IPv4 BIG TCP packets may reach in Patch > 2-8, and Patch 9 implements this feature and Patch 10 adds a selftest > for it. > > Note that the similar change as in Patch 2-6 are also needed for IPv6 > BIG TCP packets, and will be addressed in another patchset. > > The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC > and 1.5K MTU: > > No BIG TCP: > for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done > 168 322 337 3776.49 > 143 236 277 4654.67 > 128 258 288 4772.83 > 171 229 278 4645.77 > 175 228 243 4678.93 > 149 239 279 4599.86 > 164 234 268 4606.94 > 155 276 289 4235.82 > 180 255 268 4418.95 > 168 241 249 4417.82 > NACK again You have not addressed my feedback. Given the experimental nature of BIG TCP, we need separate netlink attributes, so that we can selectively enable BIG TCP for IPV6, and not for IPV4. Some of us do not have time to risk breaking IPV4, even if IPV4 for our networks is less than 0.01 % of the traffic.
On Tue, Jan 24, 2023 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jan 24, 2023 at 3:20 AM Xin Long <lucien.xin@gmail.com> wrote: > > > > This is similar to the BIG TCP patchset added by Eric for IPv6: > > > > https://lwn.net/Articles/895398/ > > > > Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header > > doesn't have exthdrs(options) for the BIG TCP packets' length. To make > > it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to > > indicate this might be a BIG TCP packet and use skb->len as the real > > IPv4 total length. > > > > This will work safely, as all BIG TCP packets are GSO/GRO packets and > > processed on the same host as they were created; There is no padding > > in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4 > > packet total length; Also, before implementing the feature, all those > > places that may get iph tot_len from BIG TCP packets are taken care > > with some new APIs: > > > > Patch 1 adds some APIs for iph tot_len setting and getting, which are > > used in all these places where IPv4 BIG TCP packets may reach in Patch > > 2-8, and Patch 9 implements this feature and Patch 10 adds a selftest > > for it. > > > > Note that the similar change as in Patch 2-6 are also needed for IPv6 > > BIG TCP packets, and will be addressed in another patchset. > > > > The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC > > and 1.5K MTU: > > > > No BIG TCP: > > for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done > > 168 322 337 3776.49 > > 143 236 277 4654.67 > > 128 258 288 4772.83 > > 171 229 278 4645.77 > > 175 228 243 4678.93 > > 149 239 279 4599.86 > > 164 234 268 4606.94 > > 155 276 289 4235.82 > > 180 255 268 4418.95 > > 168 241 249 4417.82 > > > > NACK again > > You have not addressed my feedback. > > Given the experimental nature of BIG TCP, we need separate netlink attributes, > so that we can selectively enable BIG TCP for IPV6, and not for IPV4. > That will be some change, and I will try to work on it. While at it, just try to be clearer, about the fixes for IPv6 BIG TCP I mentioned in this patchset. Since skb->len is trustable for GSO TCP packets. Are you still not okay with checking the skb_ipv6_pktlen() API added to fix them in netfilter/tc/bridge/openvswitch? Thanks.
On Tue, Jan 24, 2023 at 4:51 PM Xin Long <lucien.xin@gmail.com> wrote: > > On Tue, Jan 24, 2023 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Tue, Jan 24, 2023 at 3:20 AM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > This is similar to the BIG TCP patchset added by Eric for IPv6: > > > > > > https://lwn.net/Articles/895398/ > > > > > > Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header > > > doesn't have exthdrs(options) for the BIG TCP packets' length. To make > > > it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to > > > indicate this might be a BIG TCP packet and use skb->len as the real > > > IPv4 total length. > > > > > > This will work safely, as all BIG TCP packets are GSO/GRO packets and > > > processed on the same host as they were created; There is no padding > > > in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4 > > > packet total length; Also, before implementing the feature, all those > > > places that may get iph tot_len from BIG TCP packets are taken care > > > with some new APIs: > > > > > > Patch 1 adds some APIs for iph tot_len setting and getting, which are > > > used in all these places where IPv4 BIG TCP packets may reach in Patch > > > 2-8, and Patch 9 implements this feature and Patch 10 adds a selftest > > > for it. > > > > > > Note that the similar change as in Patch 2-6 are also needed for IPv6 > > > BIG TCP packets, and will be addressed in another patchset. > > > > > > The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC > > > and 1.5K MTU: > > > > > > No BIG TCP: > > > for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done > > > 168 322 337 3776.49 > > > 143 236 277 4654.67 > > > 128 258 288 4772.83 > > > 171 229 278 4645.77 > > > 175 228 243 4678.93 > > > 149 239 279 4599.86 > > > 164 234 268 4606.94 > > > 155 276 289 4235.82 > > > 180 255 268 4418.95 > > > 168 241 249 4417.82 > > > > > > > NACK again > > > > You have not addressed my feedback. > > > > Given the experimental nature of BIG TCP, we need separate netlink attributes, > > so that we can selectively enable BIG TCP for IPV6, and not for IPV4. > > > That will be some change, and I will try to work on it. > > While at it, just try to be clearer, about the fixes for IPv6 BIG TCP > I mentioned in this patchset. Since skb->len is trustable for GSO TCP > packets. Are you still not okay with checking the skb_ipv6_pktlen() > API added to fix them in netfilter/tc/bridge/openvswitch? > Are you speaking of length_mt6() ? Quite frankly I do not think its implementation should care of GSO or anything. Considering the definition of this thing clearly never thought of having big packets, and an overflow was already possible, I do not see how you can fix it without some hack... struct xt_length_info { __u16 min, max; __u8 invert; }; Something like: diff --git a/net/netfilter/xt_length.c b/net/netfilter/xt_length.c index 1873da3a945abbc6e8849e4555b42acdd34cff2d..90eba619cbe1d11f0fdd394f6dfda2b03fa573cd 100644 --- a/net/netfilter/xt_length.c +++ b/net/netfilter/xt_length.c @@ -30,8 +30,7 @@ static bool length_mt6(const struct sk_buff *skb, struct xt_action_param *par) { const struct xt_length_info *info = par->matchinfo; - const u_int16_t pktlen = ntohs(ipv6_hdr(skb)->payload_len) + - sizeof(struct ipv6hdr); + u32 pktlen = min_t(u32, skb->len, 65535); return (pktlen >= info->min && pktlen <= info->max) ^ info->invert; }
On Tue, Jan 24, 2023 at 11:35 AM Eric Dumazet <edumazet@google.com> wrote: > > On Tue, Jan 24, 2023 at 4:51 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > On Tue, Jan 24, 2023 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > On Tue, Jan 24, 2023 at 3:20 AM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > > > This is similar to the BIG TCP patchset added by Eric for IPv6: > > > > > > > > https://lwn.net/Articles/895398/ > > > > > > > > Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header > > > > doesn't have exthdrs(options) for the BIG TCP packets' length. To make > > > > it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to > > > > indicate this might be a BIG TCP packet and use skb->len as the real > > > > IPv4 total length. > > > > > > > > This will work safely, as all BIG TCP packets are GSO/GRO packets and > > > > processed on the same host as they were created; There is no padding > > > > in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4 > > > > packet total length; Also, before implementing the feature, all those > > > > places that may get iph tot_len from BIG TCP packets are taken care > > > > with some new APIs: > > > > > > > > Patch 1 adds some APIs for iph tot_len setting and getting, which are > > > > used in all these places where IPv4 BIG TCP packets may reach in Patch > > > > 2-8, and Patch 9 implements this feature and Patch 10 adds a selftest > > > > for it. > > > > > > > > Note that the similar change as in Patch 2-6 are also needed for IPv6 > > > > BIG TCP packets, and will be addressed in another patchset. > > > > > > > > The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC > > > > and 1.5K MTU: > > > > > > > > No BIG TCP: > > > > for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done > > > > 168 322 337 3776.49 > > > > 143 236 277 4654.67 > > > > 128 258 288 4772.83 > > > > 171 229 278 4645.77 > > > > 175 228 243 4678.93 > > > > 149 239 279 4599.86 > > > > 164 234 268 4606.94 > > > > 155 276 289 4235.82 > > > > 180 255 268 4418.95 > > > > 168 241 249 4417.82 > > > > > > > > > > NACK again > > > > > > You have not addressed my feedback. > > > > > > Given the experimental nature of BIG TCP, we need separate netlink attributes, > > > so that we can selectively enable BIG TCP for IPV6, and not for IPV4. > > > > > That will be some change, and I will try to work on it. > > > > While at it, just try to be clearer, about the fixes for IPv6 BIG TCP > > I mentioned in this patchset. Since skb->len is trustable for GSO TCP > > packets. Are you still not okay with checking the skb_ipv6_pktlen() > > API added to fix them in netfilter/tc/bridge/openvswitch? > > > > Are you speaking of length_mt6() ? Yes, but not only there, see also: [1]: dump_ipv6_packet() nf_ct_bridge_pre() ovs_skb_network_trim()/tcf_ct_skb_network_trim() cake_ack_filter() These places get pktlen directly from ipv6_hdr(skb)->payload_len. > > Quite frankly I do not think its implementation should care of GSO or anything. Agree, and IPv6 jumbo packets don't only include IPv6 BIG TCP. But, > > Considering the definition of this thing clearly never thought of > having big packets, > and an overflow was already possible, I do not see how you can fix it > without some hack... > > struct xt_length_info { > __u16 min, max; > __u8 invert; > }; > > Something like: > > diff --git a/net/netfilter/xt_length.c b/net/netfilter/xt_length.c > index 1873da3a945abbc6e8849e4555b42acdd34cff2d..90eba619cbe1d11f0fdd394f6dfda2b03fa573cd > 100644 > --- a/net/netfilter/xt_length.c > +++ b/net/netfilter/xt_length.c > @@ -30,8 +30,7 @@ static bool > length_mt6(const struct sk_buff *skb, struct xt_action_param *par) > { > const struct xt_length_info *info = par->matchinfo; > - const u_int16_t pktlen = ntohs(ipv6_hdr(skb)->payload_len) + > - sizeof(struct ipv6hdr); > + u32 pktlen = min_t(u32, skb->len, 65535); do you also expect this to be applied to other places in [1] above? I was just thinking since skb->len is trustable for GSO TCP (not all jumbo packets), we should use skb->len as the pktlen only when it's a GSO TCP packet in such places. Thanks.
On Tue, Jan 24, 2023 at 6:14 PM Xin Long <lucien.xin@gmail.com> wrote: > > On Tue, Jan 24, 2023 at 11:35 AM Eric Dumazet <edumazet@google.com> wrote: > > > > On Tue, Jan 24, 2023 at 4:51 PM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > On Tue, Jan 24, 2023 at 3:27 AM Eric Dumazet <edumazet@google.com> wrote: > > > > > > > > On Tue, Jan 24, 2023 at 3:20 AM Xin Long <lucien.xin@gmail.com> wrote: > > > > > > > > > > This is similar to the BIG TCP patchset added by Eric for IPv6: > > > > > > > > > > https://lwn.net/Articles/895398/ > > > > > > > > > > Different from IPv6, IPv4 tot_len is 16-bit long only, and IPv4 header > > > > > doesn't have exthdrs(options) for the BIG TCP packets' length. To make > > > > > it simple, as David and Paolo suggested, we set IPv4 tot_len to 0 to > > > > > indicate this might be a BIG TCP packet and use skb->len as the real > > > > > IPv4 total length. > > > > > > > > > > This will work safely, as all BIG TCP packets are GSO/GRO packets and > > > > > processed on the same host as they were created; There is no padding > > > > > in GSO/GRO packets, and skb->len - network_offset is exactly the IPv4 > > > > > packet total length; Also, before implementing the feature, all those > > > > > places that may get iph tot_len from BIG TCP packets are taken care > > > > > with some new APIs: > > > > > > > > > > Patch 1 adds some APIs for iph tot_len setting and getting, which are > > > > > used in all these places where IPv4 BIG TCP packets may reach in Patch > > > > > 2-8, and Patch 9 implements this feature and Patch 10 adds a selftest > > > > > for it. > > > > > > > > > > Note that the similar change as in Patch 2-6 are also needed for IPv6 > > > > > BIG TCP packets, and will be addressed in another patchset. > > > > > > > > > > The similar performance test is done for IPv4 BIG TCP with 25Gbit NIC > > > > > and 1.5K MTU: > > > > > > > > > > No BIG TCP: > > > > > for i in {1..10}; do netperf -t TCP_RR -H 192.168.100.1 -- -r80000,80000 -O MIN_LATENCY,P90_LATENCY,P99_LATENCY,THROUGHPUT|tail -1; done > > > > > 168 322 337 3776.49 > > > > > 143 236 277 4654.67 > > > > > 128 258 288 4772.83 > > > > > 171 229 278 4645.77 > > > > > 175 228 243 4678.93 > > > > > 149 239 279 4599.86 > > > > > 164 234 268 4606.94 > > > > > 155 276 289 4235.82 > > > > > 180 255 268 4418.95 > > > > > 168 241 249 4417.82 > > > > > > > > > > > > > NACK again > > > > > > > > You have not addressed my feedback. > > > > > > > > Given the experimental nature of BIG TCP, we need separate netlink attributes, > > > > so that we can selectively enable BIG TCP for IPV6, and not for IPV4. > > > > > > > That will be some change, and I will try to work on it. > > > > > > While at it, just try to be clearer, about the fixes for IPv6 BIG TCP > > > I mentioned in this patchset. Since skb->len is trustable for GSO TCP > > > packets. Are you still not okay with checking the skb_ipv6_pktlen() > > > API added to fix them in netfilter/tc/bridge/openvswitch? > > > > > > > Are you speaking of length_mt6() ? > Yes, but not only there, see also: > > [1]: > dump_ipv6_packet() > nf_ct_bridge_pre() > ovs_skb_network_trim()/tcf_ct_skb_network_trim() > cake_ack_filter() > > These places get pktlen directly from ipv6_hdr(skb)->payload_len. dump_ipv6_packet() is right if its intent is to 'dump header values' If jumbo is not yet parsed, this should be added. cake_ack_filter() depends on receiving non malicious packets, which is not guaranteed. Special casing GSO seems the wrong way to fix buggy spots like these.