Message ID | 20230611105108.122586-1-gal@nvidia.com
---|---
State | Accepted
Commit | fa0e21fa44438a0e856d42224bfa24641d37b979
Delegated to | Netdev Maintainers
Series | [net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
On Sun, 11 Jun 2023 13:51:08 +0300 Gal Pressman <gal@nvidia.com> wrote:

> From: Edwin Peer <edwin.peer@broadcom.com>
>
> This filter already exists for excluding IPv6 SNMP stats. Extend its
> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
>
> This patch constitutes a partial fix for a netlink attribute nesting
> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
> requester doesn't need them, the truncation of the VF list is avoided.
>
> While it was technically only the stats added in commit c5a9f6f0ab40
> ("net/core: Add drop counters to VF statistics") breaking the camel's
> back, the appreciable size of the stats data should never have been
> included without due consideration for the maximum number of VFs
> supported by PCI.
>
> Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
> Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
> Cc: Edwin Peer <espeer@gmail.com>
> Signed-off-by: Gal Pressman <gal@nvidia.com>
> ---

Better, but it is still possible to create so many VFs that the response
won't fit.
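For context on what the flag selects: a requester opts out of the per-VF
stats from userspace by setting RTEXT_FILTER_SKIP_STATS in the IFLA_EXT_MASK
attribute of its RTM_GETLINK request. The snippet below is a minimal sketch
of such a dump request; it is not part of the patch or this thread, and the
function name and the lack of error handling are illustrative only.

/*
 * Illustrative sketch: request a link dump that still includes VF info
 * (RTEXT_FILTER_VF) but skips the per-VF stats nests
 * (RTEXT_FILTER_SKIP_STATS) via IFLA_EXT_MASK.
 */
#include <string.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

static int request_vf_info_without_stats(int nlsk)
{
	struct {
		struct nlmsghdr nlh;
		struct ifinfomsg ifm;
		struct nlattr ext;	/* IFLA_EXT_MASK attribute header */
		__u32 mask;		/* IFLA_EXT_MASK payload */
	} req;

	memset(&req, 0, sizeof(req));
	req.nlh.nlmsg_len = sizeof(req);
	req.nlh.nlmsg_type = RTM_GETLINK;
	req.nlh.nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP;
	req.ifm.ifi_family = AF_UNSPEC;

	req.ext.nla_type = IFLA_EXT_MASK;
	req.ext.nla_len = NLA_HDRLEN + sizeof(__u32);
	req.mask = RTEXT_FILTER_VF | RTEXT_FILTER_SKIP_STATS;

	/* The caller then reads the multi-part dump reply as usual. */
	return send(nlsk, &req, sizeof(req), 0) < 0 ? -1 : 0;
}

A requester that does want the per-VF stats simply leaves
RTEXT_FILTER_SKIP_STATS out of the mask and gets the old behaviour.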
On 11/06/2023 18:06, Stephen Hemminger wrote:
> On Sun, 11 Jun 2023 13:51:08 +0300
> Gal Pressman <gal@nvidia.com> wrote:
>
>> From: Edwin Peer <edwin.peer@broadcom.com>
>>
>> This filter already exists for excluding IPv6 SNMP stats. Extend its
>> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
>>
>> This patch constitutes a partial fix for a netlink attribute nesting
>> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
>> requester doesn't need them, the truncation of the VF list is avoided.
>>
>> While it was technically only the stats added in commit c5a9f6f0ab40
>> ("net/core: Add drop counters to VF statistics") breaking the camel's
>> back, the appreciable size of the stats data should never have been
>> included without due consideration for the maximum number of VFs
>> supported by PCI.
>>
>> Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
>> Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
>> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
>> Cc: Edwin Peer <espeer@gmail.com>
>> Signed-off-by: Gal Pressman <gal@nvidia.com>
>> ---
>
> Better, but it is still possible to create so many VFs that the response
> won't fit.

Correct, no arguments here.
It allowed me to see ~200 VFs instead of ~70, a step in the right direction.
> -----Original Message-----
> From: Gal Pressman <gal@nvidia.com>
> Sent: Sunday, June 11, 2023 10:59 AM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>;
> David Ahern <dsahern@gmail.com>; Michal Kubecek <mkubecek@suse.cz>;
> netdev@vger.kernel.org; Edwin Peer <edwin.peer@broadcom.com>; Edwin Peer
> <espeer@gmail.com>
> Subject: Re: [PATCH net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to
> IFLA_VF_INFO
>
> On 11/06/2023 18:06, Stephen Hemminger wrote:
> > On Sun, 11 Jun 2023 13:51:08 +0300
> > Gal Pressman <gal@nvidia.com> wrote:
> >
> >> From: Edwin Peer <edwin.peer@broadcom.com>
> >>
> >> This filter already exists for excluding IPv6 SNMP stats. Extend its
> >> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
> >>
> >> This patch constitutes a partial fix for a netlink attribute nesting
> >> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
> >> requester doesn't need them, the truncation of the VF list is avoided.
> >>
> >> While it was technically only the stats added in commit c5a9f6f0ab40
> >> ("net/core: Add drop counters to VF statistics") breaking the camel's
> >> back, the appreciable size of the stats data should never have been
> >> included without due consideration for the maximum number of VFs
> >> supported by PCI.
> >>
> >> Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
> >> Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
> >> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
> >> Cc: Edwin Peer <espeer@gmail.com>
> >> Signed-off-by: Gal Pressman <gal@nvidia.com>
> >> ---
> >
> > Better, but it is still possible to create so many VFs that the response
> > won't fit.
>
> Correct, no arguments here.
> It allowed me to see ~200 VFs instead of ~70, a step in the right direction.

I remember investigating this a few years ago; we hit limits of ~200 VFs that
were essentially unfixable without creating a new API that can split the
reply over more than one message. The VF info data was not designed in the
current op to allow processing over multiple messages. It also
(unfortunately) doesn't report errors, so it ends up just truncating instead
of producing an error.

Fixing this completely is non-trivial.

Thanks,
Jake
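Because the kernel gives no explicit truncation signal, a reader can only
infer it. Below is a hypothetical helper, not taken from this thread, that
counts the IFLA_VF_INFO nests actually present in a received IFLA_VFINFO_LIST
attribute; a caller could compare that count against the VF total it expects
(for example, the sriov_numvfs value from sysfs) to detect a truncated dump.

/*
 * Hypothetical helper: count IFLA_VF_INFO nests inside IFLA_VFINFO_LIST.
 * A shortfall versus the expected VF count indicates the reply was
 * silently truncated.
 */
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

static int count_vf_info_nests(struct rtattr *vfinfo_list)
{
	struct rtattr *vf = RTA_DATA(vfinfo_list);
	int remaining = RTA_PAYLOAD(vfinfo_list);
	int count = 0;

	/* Walk the nested attributes with the standard rtattr iteration macros. */
	for (; RTA_OK(vf, remaining); vf = RTA_NEXT(vf, remaining))
		if (vf->rta_type == IFLA_VF_INFO)
			count++;

	return count;
}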
On Mon, 2023-06-12 at 05:34 +0000, Keller, Jacob E wrote:
> > -----Original Message-----
> > From: Gal Pressman <gal@nvidia.com>
> > Sent: Sunday, June 11, 2023 10:59 AM
> > To: Stephen Hemminger <stephen@networkplumber.org>
> > Cc: David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>;
> > David Ahern <dsahern@gmail.com>; Michal Kubecek <mkubecek@suse.cz>;
> > netdev@vger.kernel.org; Edwin Peer <edwin.peer@broadcom.com>; Edwin Peer
> > <espeer@gmail.com>
> > Subject: Re: [PATCH net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to
> > IFLA_VF_INFO
> >
> > On 11/06/2023 18:06, Stephen Hemminger wrote:
> >
> > > Better, but it is still possible to create so many VFs that the response
> > > won't fit.
> >
> > Correct, no arguments here.
> > It allowed me to see ~200 VFs instead of ~70, a step in the right direction.
>
> I remember investigating this a few years ago; we hit limits of ~200 VFs
> that were essentially unfixable without creating a new API that can split
> the reply over more than one message. The VF info data was not designed in
> the current op to allow processing over multiple messages. It also
> (unfortunately) doesn't report errors, so it ends up just truncating
> instead of producing an error.
>
> Fixing this completely is non-trivial.

As it looks like there is substantial agreement on this approach being a
step in the right direction, and I can't think of anything better, I suggest
merging this as is unless someone voices concerns very soon, very loudly.

Thanks!

Paolo
Hello:

This patch was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sun, 11 Jun 2023 13:51:08 +0300 you wrote:
> From: Edwin Peer <edwin.peer@broadcom.com>
>
> This filter already exists for excluding IPv6 SNMP stats. Extend its
> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
>
> This patch constitutes a partial fix for a netlink attribute nesting
> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
> requester doesn't need them, the truncation of the VF list is avoided.
>
> [...]

Here is the summary with links:
  - [net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
    https://git.kernel.org/netdev/net-next/c/fa0e21fa4443

You are awesome, thank you!
On 6/14/23 3:03 AM, Paolo Abeni wrote:
> On Mon, 2023-06-12 at 05:34 +0000, Keller, Jacob E wrote:
>>> -----Original Message-----
>>> From: Gal Pressman <gal@nvidia.com>
>>> Sent: Sunday, June 11, 2023 10:59 AM
>>> To: Stephen Hemminger <stephen@networkplumber.org>
>>> Cc: David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>;
>>> David Ahern <dsahern@gmail.com>; Michal Kubecek <mkubecek@suse.cz>;
>>> netdev@vger.kernel.org; Edwin Peer <edwin.peer@broadcom.com>; Edwin Peer
>>> <espeer@gmail.com>
>>> Subject: Re: [PATCH net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to
>>> IFLA_VF_INFO
>>>
>>> On 11/06/2023 18:06, Stephen Hemminger wrote:
>>>
>>>> Better, but it is still possible to create so many VFs that the response
>>>> won't fit.
>>>
>>> Correct, no arguments here.
>>> It allowed me to see ~200 VFs instead of ~70, a step in the right direction.
>>
>> I remember investigating this a few years ago; we hit limits of ~200 VFs
>> that were essentially unfixable without creating a new API that can split
>> the reply over more than one message. The VF info data was not designed in
>> the current op to allow processing over multiple messages. It also
>> (unfortunately) doesn't report errors, so it ends up just truncating
>> instead of producing an error.
>>
>> Fixing this completely is non-trivial.
>
> As it looks like there is substantial agreement on this approach being a
> step in the right direction, and I can't think of anything better, I
> suggest merging this as is unless someone voices concerns very soon, very
> loudly.
>

My only concern is that this "hot potato" stays in the air longer. This
problem has been around for years, and someone needs to step up and
propose an API.
On Wed, 14 Jun 2023 08:17:44 -0700 David Ahern wrote:
> > As it looks like there is substantial agreement on this approach being a
> > step in the right direction, and I can't think of anything better, I
> > suggest merging this as is unless someone voices concerns very soon, very
> > loudly.
>
> My only concern is that this "hot potato" stays in the air longer. This
> problem has been around for years, and someone needs to step up and
> propose an API.

The functionality is fully covered by devlink and tc. We're not accepting
any new drivers/implementations of the old SR-IOV NDOs at all. We keep
trying to put this potato in the grave and it keeps trying to grow :(
diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 41de3a2f29e1..d1815122298d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -961,24 +961,27 @@ static inline int rtnl_vfinfo_size(const struct net_device *dev,
			 nla_total_size(sizeof(struct ifla_vf_rate)) +
			 nla_total_size(sizeof(struct ifla_vf_link_state)) +
			 nla_total_size(sizeof(struct ifla_vf_rss_query_en)) +
-			 nla_total_size(0) + /* nest IFLA_VF_STATS */
-			 /* IFLA_VF_STATS_RX_PACKETS */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_PACKETS */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_RX_BYTES */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_BYTES */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_BROADCAST */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_MULTICAST */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_RX_DROPPED */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_DROPPED */
-			 nla_total_size_64bit(sizeof(__u64)) +
			 nla_total_size(sizeof(struct ifla_vf_trust)));
+		if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
+			size += num_vfs *
+				(nla_total_size(0) + /* nest IFLA_VF_STATS */
+				 /* IFLA_VF_STATS_RX_PACKETS */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_PACKETS */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_RX_BYTES */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_BYTES */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_BROADCAST */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_MULTICAST */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_RX_DROPPED */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_DROPPED */
+				 nla_total_size_64bit(sizeof(__u64)));
+		}
		return size;
	} else
		return 0;
@@ -1270,7 +1273,8 @@ static noinline_for_stack int rtnl_fill_stats(struct sk_buff *skb,
 static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
					       struct net_device *dev,
					       int vfs_num,
-					       struct nlattr *vfinfo)
+					       struct nlattr *vfinfo,
+					       u32 ext_filter_mask)
 {
	struct ifla_vf_rss_query_en vf_rss_query_en;
	struct nlattr *vf, *vfstats, *vfvlanlist;
@@ -1376,33 +1380,35 @@ static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
		goto nla_put_vf_failure;
	}
	nla_nest_end(skb, vfvlanlist);
-	memset(&vf_stats, 0, sizeof(vf_stats));
-	if (dev->netdev_ops->ndo_get_vf_stats)
-		dev->netdev_ops->ndo_get_vf_stats(dev, vfs_num,
-						  &vf_stats);
-	vfstats = nla_nest_start_noflag(skb, IFLA_VF_STATS);
-	if (!vfstats)
-		goto nla_put_vf_failure;
-	if (nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_PACKETS,
-			      vf_stats.rx_packets, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_PACKETS,
-			      vf_stats.tx_packets, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_BYTES,
-			      vf_stats.rx_bytes, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_BYTES,
-			      vf_stats.tx_bytes, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
-			      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
-			      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
-			      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
-			      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
-		nla_nest_cancel(skb, vfstats);
-		goto nla_put_vf_failure;
+	if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
+		memset(&vf_stats, 0, sizeof(vf_stats));
+		if (dev->netdev_ops->ndo_get_vf_stats)
+			dev->netdev_ops->ndo_get_vf_stats(dev, vfs_num,
+							  &vf_stats);
+		vfstats = nla_nest_start_noflag(skb, IFLA_VF_STATS);
+		if (!vfstats)
+			goto nla_put_vf_failure;
+		if (nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_PACKETS,
+				      vf_stats.rx_packets, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_PACKETS,
+				      vf_stats.tx_packets, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_BYTES,
+				      vf_stats.rx_bytes, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_BYTES,
+				      vf_stats.tx_bytes, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
+				      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
+				      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
+				      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
+				      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
+			nla_nest_cancel(skb, vfstats);
+			goto nla_put_vf_failure;
+		}
+		nla_nest_end(skb, vfstats);
	}
-	nla_nest_end(skb, vfstats);
	nla_nest_end(skb, vf);
	return 0;
 
@@ -1435,7 +1441,7 @@ static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
		return -EMSGSIZE;
 
	for (i = 0; i < num_vfs; i++) {
-		if (rtnl_fill_vfinfo(skb, dev, i, vfinfo))
+		if (rtnl_fill_vfinfo(skb, dev, i, vfinfo, ext_filter_mask))
			return -EMSGSIZE;
	}
 