Message ID | 20220203015140.3022854-2-eric.dumazet@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | tcp: BIG TCP implementation | expand |
On Wed, 2 Feb 2022 17:51:26 -0800 Eric Dumazet wrote: > From: Eric Dumazet <edumazet@google.com> > > Some NIC (or virtual devices) are LSOv2 compatible. > > BIG TCP plans using the large LSOv2 feature for IPv6. > > New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined. > > Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit. > > Unchanged drivers are not allowing big TSO packets to be sent. Many drivers will have a limit on how many buffer descriptors they can chain, not the size of the super frame, I'd think. Is that not the case? We can't assume all pages but the first and last are full, right? > diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h > index e490b84732d1654bf067b30f2bb0b0825f88dea9..b1f68df2b37bc4b623f61cc2c6f0c02ba2afbe02 100644 > --- a/include/linux/netdevice.h > +++ b/include/linux/netdevice.h > @@ -1948,6 +1948,7 @@ enum netdev_ml_priv_type { > * @dev_addr_shadow: Copy of @dev_addr to catch direct writes. > * @linkwatch_dev_tracker: refcount tracker used by linkwatch. > * @watchdog_dev_tracker: refcount tracker used by watchdog. > + * @tso_ipv6_max_size: Maximum size of IPv6 TSO packets (driver/NIC limit) > * > * FIXME: cleanup struct net_device such that network protocol info > * moves out. > @@ -2282,6 +2283,7 @@ struct net_device { > u8 dev_addr_shadow[MAX_ADDR_LEN]; > netdevice_tracker linkwatch_dev_tracker; > netdevice_tracker watchdog_dev_tracker; > + unsigned int tso_ipv6_max_size; > }; > #define to_net_dev(d) container_of(d, struct net_device, dev) > > @@ -4818,6 +4820,14 @@ static inline void netif_set_gro_max_size(struct net_device *dev, > WRITE_ONCE(dev->gro_max_size, size); > } > > +/* Used by drivers to give their hardware/firmware limit for LSOv2 packets */ > +static inline void netif_set_tso_ipv6_max_size(struct net_device *dev, > + unsigned int size) > +{ > + dev->tso_ipv6_max_size = size; > +} > + > + nit: double new line
On Thu, Feb 3, 2022 at 8:34 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 2 Feb 2022 17:51:26 -0800 Eric Dumazet wrote: > > From: Eric Dumazet <edumazet@google.com> > > > > Some NIC (or virtual devices) are LSOv2 compatible. > > > > BIG TCP plans using the large LSOv2 feature for IPv6. > > > > New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined. > > > > Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit. > > > > Unchanged drivers are not allowing big TSO packets to be sent. > > Many drivers will have a limit on how many buffer descriptors they > can chain, not the size of the super frame, I'd think. Is that not > the case? We can't assume all pages but the first and last are full, > right? In our case, we have a 100Gbit Google NIC which has these limits: - TX descriptor has a 16bit field filled with skb->len - No more than 21 frags per 'packet' In order to support BIG TCP on it, we had to split the bigger TCP packets into smaller chunks, to satisfy both constraints (even if the second constraint is hardly hit once you chop to ~60KB packets, given our 4K MTU) ndo_features_check() might help to take care of small oddities. For instance I will insert the following in the next version of the series: commit 26644be08edc2f14f6ec79f650cc4a5d380df498 Author: Eric Dumazet <edumazet@google.com> Date: Wed Feb 2 23:22:01 2022 -0800 net: typhoon: implement ndo_features_check method Instead of disabling TSO if MAX_SKB_FRAGS > 32, implement ndo_features_check() method for this driver. If skb has more than 32 frags, use the following heuristic: 1) force GSO for gso packets 2) Otherwise force linearization. Most locally generated TCP packets will use a small number of fragments anyway. Signed-off-by: Eric Dumazet <edumazet@google.com> diff --git a/drivers/net/ethernet/3com/typhoon.c b/drivers/net/ethernet/3com/typhoon.c index 8aec5d9fbfef2803c181387537300502a937caf0..216e26a49e9c272ba7483bfa06941ff11ea40e3c 100644 --- a/drivers/net/ethernet/3com/typhoon.c +++ b/drivers/net/ethernet/3com/typhoon.c @@ -138,11 +138,6 @@ MODULE_PARM_DESC(use_mmio, "Use MMIO (1) or PIO(0) to access the NIC. " module_param(rx_copybreak, int, 0); module_param(use_mmio, int, 0); -#if defined(NETIF_F_TSO) && MAX_SKB_FRAGS > 32 -#warning Typhoon only supports 32 entries in its SG list for TSO, disabling TSO -#undef NETIF_F_TSO -#endif - #if TXLO_ENTRIES <= (2 * MAX_SKB_FRAGS) #error TX ring too small! #endif @@ -2261,9 +2256,23 @@ typhoon_test_mmio(struct pci_dev *pdev) return mode; } +static netdev_features_t typhoon_features_check(struct sk_buff *skb, + struct net_device *dev, + netdev_features_t features) +{ + if (skb_shinfo(skb)->nr_frags > 32) { + if (skb_is_gso(skb)) + features &= ~NETIF_F_GSO_MASK; + else + features &= ~NETIF_F_SG; + } + return features; +} + static const struct net_device_ops typhoon_netdev_ops = { .ndo_open = typhoon_open, .ndo_stop = typhoon_close, + .ndo_features_check = typhoon_features_check, .ndo_start_xmit = typhoon_start_tx, .ndo_set_rx_mode = typhoon_set_rx_mode, .ndo_tx_timeout = typhoon_tx_timeout,
On Thu, 3 Feb 2022 08:56:56 -0800 Eric Dumazet wrote: > On Thu, Feb 3, 2022 at 8:34 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Wed, 2 Feb 2022 17:51:26 -0800 Eric Dumazet wrote: > > > From: Eric Dumazet <edumazet@google.com> > > > > > > Some NIC (or virtual devices) are LSOv2 compatible. > > > > > > BIG TCP plans using the large LSOv2 feature for IPv6. > > > > > > New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined. > > > > > > Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit. > > > > > > Unchanged drivers are not allowing big TSO packets to be sent. > > > > Many drivers will have a limit on how many buffer descriptors they > > can chain, not the size of the super frame, I'd think. Is that not > > the case? We can't assume all pages but the first and last are full, > > right? > > In our case, we have a 100Gbit Google NIC which has these limits: > > - TX descriptor has a 16bit field filled with skb->len > - No more than 21 frags per 'packet' > > In order to support BIG TCP on it, we had to split the bigger TCP packets > into smaller chunks, to satisfy both constraints (even if the second > constraint is hardly hit once you chop to ~60KB packets, given our 4K > MTU) > > ndo_features_check() might help to take care of small oddities. Makes sense, I was curious if we can do more in the core so that fewer changes are required in the drivers. Both so that drivers don't have to strip the header and so that drivers with limitations can be served pre-cooked smaller skbs. > For instance I will insert the following in the next version of the series: > > commit 26644be08edc2f14f6ec79f650cc4a5d380df498 > Author: Eric Dumazet <edumazet@google.com> > Date: Wed Feb 2 23:22:01 2022 -0800 > > net: typhoon: implement ndo_features_check method > > Instead of disabling TSO if MAX_SKB_FRAGS > 32, implement > ndo_features_check() method for this driver. > > If skb has more than 32 frags, use the following heuristic: > > 1) force GSO for gso packets > 2) Otherwise force linearization. > > Most locally generated TCP packets will use a small number of fragments > anyway. > > Signed-off-by: Eric Dumazet <edumazet@google.com> > > diff --git a/drivers/net/ethernet/3com/typhoon.c > b/drivers/net/ethernet/3com/typhoon.c > index 8aec5d9fbfef2803c181387537300502a937caf0..216e26a49e9c272ba7483bfa06941ff11ea40e3c > 100644 > --- a/drivers/net/ethernet/3com/typhoon.c > +++ b/drivers/net/ethernet/3com/typhoon.c > @@ -138,11 +138,6 @@ MODULE_PARM_DESC(use_mmio, "Use MMIO (1) or > PIO(0) to access the NIC. " > module_param(rx_copybreak, int, 0); > module_param(use_mmio, int, 0); > > -#if defined(NETIF_F_TSO) && MAX_SKB_FRAGS > 32 > -#warning Typhoon only supports 32 entries in its SG list for TSO, disabling TSO > -#undef NETIF_F_TSO > -#endif I wonder how many drivers just assumed MAX_SKB_FRAGS will never change :S What do you think about a device-level check in the core for number of frags?
On Thu, Feb 3, 2022 at 10:58 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 3 Feb 2022 08:56:56 -0800 Eric Dumazet wrote: > > On Thu, Feb 3, 2022 at 8:34 AM Jakub Kicinski <kuba@kernel.org> wrote: > > > On Wed, 2 Feb 2022 17:51:26 -0800 Eric Dumazet wrote: > > > > From: Eric Dumazet <edumazet@google.com> > > > > > > > > Some NIC (or virtual devices) are LSOv2 compatible. > > > > > > > > BIG TCP plans using the large LSOv2 feature for IPv6. > > > > > > > > New netlink attribute IFLA_TSO_IPV6_MAX_SIZE is defined. > > > > > > > > Drivers should use netif_set_tso_ipv6_max_size() to advertize their limit. > > > > > > > > Unchanged drivers are not allowing big TSO packets to be sent. > > > > > > Many drivers will have a limit on how many buffer descriptors they > > > can chain, not the size of the super frame, I'd think. Is that not > > > the case? We can't assume all pages but the first and last are full, > > > right? > > > > In our case, we have a 100Gbit Google NIC which has these limits: > > > > - TX descriptor has a 16bit field filled with skb->len > > - No more than 21 frags per 'packet' > > > > In order to support BIG TCP on it, we had to split the bigger TCP packets > > into smaller chunks, to satisfy both constraints (even if the second > > constraint is hardly hit once you chop to ~60KB packets, given our 4K > > MTU) > > > > ndo_features_check() might help to take care of small oddities. > > Makes sense, I was curious if we can do more in the core so that fewer > changes are required in the drivers. Both so that drivers don't have to > strip the header and so that drivers with limitations can be served > pre-cooked smaller skbs. I have on my plate to implement a helper to split 'big GRO/TSO' packets into smaller chunks. I have avoided doing it in our Google NIC driver, to avoid extra sk_buff/skb->head allocations for each BIG TCP packet. Yes, core networking stack could use it. > I wonder how many drivers just assumed MAX_SKB_FRAGS will never > change :S What do you think about a device-level check in the core > for number of frags? I guess we could do this if the CONFIG_MAX_SKB_FRAGS > 17
diff --git a/include/linux/netdevice.h b/include/linux/netdevice.h index e490b84732d1654bf067b30f2bb0b0825f88dea9..b1f68df2b37bc4b623f61cc2c6f0c02ba2afbe02 100644 --- a/include/linux/netdevice.h +++ b/include/linux/netdevice.h @@ -1948,6 +1948,7 @@ enum netdev_ml_priv_type { * @dev_addr_shadow: Copy of @dev_addr to catch direct writes. * @linkwatch_dev_tracker: refcount tracker used by linkwatch. * @watchdog_dev_tracker: refcount tracker used by watchdog. + * @tso_ipv6_max_size: Maximum size of IPv6 TSO packets (driver/NIC limit) * * FIXME: cleanup struct net_device such that network protocol info * moves out. @@ -2282,6 +2283,7 @@ struct net_device { u8 dev_addr_shadow[MAX_ADDR_LEN]; netdevice_tracker linkwatch_dev_tracker; netdevice_tracker watchdog_dev_tracker; + unsigned int tso_ipv6_max_size; }; #define to_net_dev(d) container_of(d, struct net_device, dev) @@ -4818,6 +4820,14 @@ static inline void netif_set_gro_max_size(struct net_device *dev, WRITE_ONCE(dev->gro_max_size, size); } +/* Used by drivers to give their hardware/firmware limit for LSOv2 packets */ +static inline void netif_set_tso_ipv6_max_size(struct net_device *dev, + unsigned int size) +{ + dev->tso_ipv6_max_size = size; +} + + static inline void skb_gso_error_unwind(struct sk_buff *skb, __be16 protocol, int pulled_hlen, u16 mac_offset, int mac_len) diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 6218f93f5c1a92b5765bc19dfb9d7583c3b9369b..79b9d399cd297a1f79dca5ce89762800c38ed4a8 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -348,6 +348,7 @@ enum { IFLA_PARENT_DEV_NAME, IFLA_PARENT_DEV_BUS_NAME, IFLA_GRO_MAX_SIZE, + IFLA_TSO_IPV6_MAX_SIZE, __IFLA_MAX }; diff --git a/net/core/dev.c b/net/core/dev.c index 1baab07820f65f9bcf88a6d73e2c9ff741d33c18..b6ca3c348d41a097baf210f2a5d966b71308c69b 100644 --- a/net/core/dev.c +++ b/net/core/dev.c @@ -10188,6 +10188,8 @@ struct net_device *alloc_netdev_mqs(int sizeof_priv, const char *name, dev->gso_max_size = GSO_MAX_SIZE; dev->gso_max_segs = GSO_MAX_SEGS; dev->gro_max_size = GRO_MAX_SIZE; + dev->tso_ipv6_max_size = GSO_MAX_SIZE; + dev->upper_level = 1; dev->lower_level = 1; #ifdef CONFIG_LOCKDEP diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c index e476403231f00053e1a261f31a8760325c75c941..4cefa07195ba3b67e7b724194b5d729d395ba466 100644 --- a/net/core/rtnetlink.c +++ b/net/core/rtnetlink.c @@ -1027,6 +1027,7 @@ static noinline size_t if_nlmsg_size(const struct net_device *dev, + nla_total_size(4) /* IFLA_GSO_MAX_SEGS */ + nla_total_size(4) /* IFLA_GSO_MAX_SIZE */ + nla_total_size(4) /* IFLA_GRO_MAX_SIZE */ + + nla_total_size(4) /* IFLA_TSO_IPV6_MAX_SIZE */ + nla_total_size(1) /* IFLA_OPERSTATE */ + nla_total_size(1) /* IFLA_LINKMODE */ + nla_total_size(4) /* IFLA_CARRIER_CHANGES */ @@ -1730,6 +1731,7 @@ static int rtnl_fill_ifinfo(struct sk_buff *skb, nla_put_u32(skb, IFLA_GSO_MAX_SEGS, dev->gso_max_segs) || nla_put_u32(skb, IFLA_GSO_MAX_SIZE, dev->gso_max_size) || nla_put_u32(skb, IFLA_GRO_MAX_SIZE, dev->gro_max_size) || + nla_put_u32(skb, IFLA_TSO_IPV6_MAX_SIZE, dev->tso_ipv6_max_size) || #ifdef CONFIG_RPS nla_put_u32(skb, IFLA_NUM_RX_QUEUES, dev->num_rx_queues) || #endif @@ -1883,6 +1885,7 @@ static const struct nla_policy ifla_policy[IFLA_MAX+1] = { [IFLA_NEW_IFINDEX] = NLA_POLICY_MIN(NLA_S32, 1), [IFLA_PARENT_DEV_NAME] = { .type = NLA_NUL_STRING }, [IFLA_GRO_MAX_SIZE] = { .type = NLA_U32 }, + [IFLA_TSO_IPV6_MAX_SIZE] = { .type = NLA_U32 }, }; static const struct nla_policy ifla_info_policy[IFLA_INFO_MAX+1] = { diff --git a/tools/include/uapi/linux/if_link.h b/tools/include/uapi/linux/if_link.h index 6218f93f5c1a92b5765bc19dfb9d7583c3b9369b..79b9d399cd297a1f79dca5ce89762800c38ed4a8 100644 --- a/tools/include/uapi/linux/if_link.h +++ b/tools/include/uapi/linux/if_link.h @@ -348,6 +348,7 @@ enum { IFLA_PARENT_DEV_NAME, IFLA_PARENT_DEV_BUS_NAME, IFLA_GRO_MAX_SIZE, + IFLA_TSO_IPV6_MAX_SIZE, __IFLA_MAX };