diff mbox series

[net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO

Message ID 20230611105108.122586-1-gal@nvidia.com (mailing list archive)
State Accepted
Commit fa0e21fa44438a0e856d42224bfa24641d37b979
Delegated to: Netdev Maintainers
Headers show
Series [net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO | expand

Checks

Context Check Description
netdev/series_format success Single patches do not need cover letters
netdev/tree_selection success Clearly marked for net-next
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 10 this patch: 10
netdev/cc_maintainers fail 5 blamed authors not CCed: ogerlitz@mellanox.com eugenia@mellanox.com saeedm@mellanox.com hadarh@mellanox.com eranbe@mellanox.com; 9 maintainers not CCed: razor@blackwall.org ogerlitz@mellanox.com idosch@nvidia.com eugenia@mellanox.com saeedm@mellanox.com hadarh@mellanox.com pabeni@redhat.com eranbe@mellanox.com edumazet@google.com
netdev/build_clang success Errors and warnings before: 8 this patch: 8
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success Fixes tag looks correct
netdev/build_allmodconfig_warn success Errors and warnings before: 10 this patch: 10
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 122 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Gal Pressman June 11, 2023, 10:51 a.m. UTC
From: Edwin Peer <edwin.peer@broadcom.com>

This filter already exists for excluding IPv6 SNMP stats. Extend its
definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.

This patch constitutes a partial fix for a netlink attribute nesting
overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
requester doesn't need them, the truncation of the VF list is avoided.

While it was technically only the stats added in commit c5a9f6f0ab40
("net/core: Add drop counters to VF statistics") breaking the camel's
back, the appreciable size of the stats data should never have been
included without due consideration for the maximum number of VFs
supported by PCI.

Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
Cc: Edwin Peer <espeer@gmail.com>
Signed-off-by: Gal Pressman <gal@nvidia.com>
---
Following our discussion in:
https://lore.kernel.org/netdev/20230607093324.2b7712d9@kernel.org/

I've rebased and testesd this single patch by Edwin.
It is only a partial "fix", but increases the number of VFs presented
before the truncation occurs.
---
 net/core/rtnetlink.c | 96 +++++++++++++++++++++++---------------------
 1 file changed, 51 insertions(+), 45 deletions(-)

Comments

Stephen Hemminger June 11, 2023, 3:06 p.m. UTC | #1
On Sun, 11 Jun 2023 13:51:08 +0300
Gal Pressman <gal@nvidia.com> wrote:

> From: Edwin Peer <edwin.peer@broadcom.com>
> 
> This filter already exists for excluding IPv6 SNMP stats. Extend its
> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
> 
> This patch constitutes a partial fix for a netlink attribute nesting
> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
> requester doesn't need them, the truncation of the VF list is avoided.
> 
> While it was technically only the stats added in commit c5a9f6f0ab40
> ("net/core: Add drop counters to VF statistics") breaking the camel's
> back, the appreciable size of the stats data should never have been
> included without due consideration for the maximum number of VFs
> supported by PCI.
> 
> Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
> Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
> Cc: Edwin Peer <espeer@gmail.com>
> Signed-off-by: Gal Pressman <gal@nvidia.com>
> ---

Better but it is still possible to create too many VF's that the response
won't fit.
Gal Pressman June 11, 2023, 5:58 p.m. UTC | #2
On 11/06/2023 18:06, Stephen Hemminger wrote:
> On Sun, 11 Jun 2023 13:51:08 +0300
> Gal Pressman <gal@nvidia.com> wrote:
> 
>> From: Edwin Peer <edwin.peer@broadcom.com>
>>
>> This filter already exists for excluding IPv6 SNMP stats. Extend its
>> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
>>
>> This patch constitutes a partial fix for a netlink attribute nesting
>> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
>> requester doesn't need them, the truncation of the VF list is avoided.
>>
>> While it was technically only the stats added in commit c5a9f6f0ab40
>> ("net/core: Add drop counters to VF statistics") breaking the camel's
>> back, the appreciable size of the stats data should never have been
>> included without due consideration for the maximum number of VFs
>> supported by PCI.
>>
>> Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF netdevice")
>> Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
>> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
>> Cc: Edwin Peer <espeer@gmail.com>
>> Signed-off-by: Gal Pressman <gal@nvidia.com>
>> ---
> 
> Better but it is still possible to create too many VF's that the response
> won't fit.

Correct, no argues here.
It allowed me to see around ~200 VFs instead of ~70, a step in the right
direction.
Jacob Keller June 12, 2023, 5:34 a.m. UTC | #3
> -----Original Message-----
> From: Gal Pressman <gal@nvidia.com>
> Sent: Sunday, June 11, 2023 10:59 AM
> To: Stephen Hemminger <stephen@networkplumber.org>
> Cc: David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>;
> David Ahern <dsahern@gmail.com>; Michal Kubecek <mkubecek@suse.cz>;
> netdev@vger.kernel.org; Edwin Peer <edwin.peer@broadcom.com>; Edwin Peer
> <espeer@gmail.com>
> Subject: Re: [PATCH net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to
> IFLA_VF_INFO
> 
> On 11/06/2023 18:06, Stephen Hemminger wrote:
> > On Sun, 11 Jun 2023 13:51:08 +0300
> > Gal Pressman <gal@nvidia.com> wrote:
> >
> >> From: Edwin Peer <edwin.peer@broadcom.com>
> >>
> >> This filter already exists for excluding IPv6 SNMP stats. Extend its
> >> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
> >>
> >> This patch constitutes a partial fix for a netlink attribute nesting
> >> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
> >> requester doesn't need them, the truncation of the VF list is avoided.
> >>
> >> While it was technically only the stats added in commit c5a9f6f0ab40
> >> ("net/core: Add drop counters to VF statistics") breaking the camel's
> >> back, the appreciable size of the stats data should never have been
> >> included without due consideration for the maximum number of VFs
> >> supported by PCI.
> >>
> >> Fixes: 3b766cd83232 ("net/core: Add reading VF statistics through the PF
> netdevice")
> >> Fixes: c5a9f6f0ab40 ("net/core: Add drop counters to VF statistics")
> >> Signed-off-by: Edwin Peer <edwin.peer@broadcom.com>
> >> Cc: Edwin Peer <espeer@gmail.com>
> >> Signed-off-by: Gal Pressman <gal@nvidia.com>
> >> ---
> >
> > Better but it is still possible to create too many VF's that the response
> > won't fit.
> 
> Correct, no argues here.
> It allowed me to see around ~200 VFs instead of ~70, a step in the right
> direction.

I remember investigating this a few years ago and we hit limits of ~200 that were essentially unfixable without creating a new API that can separate the reply over more than one message. The VF info data was not designed in the current op to allow processing over multiple messages. It also (unfortunately) doesn't report errors so it ends up just truncating instead of producing an error.

Fixing this completely is non-trivial.

Thanks,
Jake
Paolo Abeni June 14, 2023, 9:03 a.m. UTC | #4
On Mon, 2023-06-12 at 05:34 +0000, Keller, Jacob E wrote:
> > -----Original Message-----
> > From: Gal Pressman <gal@nvidia.com>
> > Sent: Sunday, June 11, 2023 10:59 AM
> > To: Stephen Hemminger <stephen@networkplumber.org>
> > Cc: David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>;
> > David Ahern <dsahern@gmail.com>; Michal Kubecek <mkubecek@suse.cz>;
> > netdev@vger.kernel.org; Edwin Peer <edwin.peer@broadcom.com>; Edwin Peer
> > <espeer@gmail.com>
> > Subject: Re: [PATCH net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to
> > IFLA_VF_INFO
> > 
> > On 11/06/2023 18:06, Stephen Hemminger wrote:
> > 
> > > Better but it is still possible to create too many VF's that the response
> > > won't fit.
> > 
> > Correct, no argues here.
> > It allowed me to see around ~200 VFs instead of ~70, a step in the right
> > direction.
> 
> I remember investigating this a few years ago and we hit limits of ~200 that were essentially unfixable without creating a new API that can separate the reply over more than one message. The VF info data was not designed in the current op to allow processing over multiple messages. It also (unfortunately) doesn't report errors so it ends up just truncating instead of producing an error.
> 
> Fixing this completely is non-trivial.

As it looks like the is substantial agreement on this approach being a
step in the right direction and I can't think of anything better, I
suggest to merge this as is, unless someone voices concerns very soon,
very loudly.

Thanks!

Paolo
patchwork-bot+netdevbpf@kernel.org June 14, 2023, 1:20 p.m. UTC | #5
Hello:

This patch was applied to netdev/net-next.git (main)
by Paolo Abeni <pabeni@redhat.com>:

On Sun, 11 Jun 2023 13:51:08 +0300 you wrote:
> From: Edwin Peer <edwin.peer@broadcom.com>
> 
> This filter already exists for excluding IPv6 SNMP stats. Extend its
> definition to also exclude IFLA_VF_INFO stats in RTM_GETLINK.
> 
> This patch constitutes a partial fix for a netlink attribute nesting
> overflow bug in IFLA_VFINFO_LIST. By excluding the stats when the
> requester doesn't need them, the truncation of the VF list is avoided.
> 
> [...]

Here is the summary with links:
  - [net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to IFLA_VF_INFO
    https://git.kernel.org/netdev/net-next/c/fa0e21fa4443

You are awesome, thank you!
David Ahern June 14, 2023, 3:17 p.m. UTC | #6
On 6/14/23 3:03 AM, Paolo Abeni wrote:
> On Mon, 2023-06-12 at 05:34 +0000, Keller, Jacob E wrote:
>>> -----Original Message-----
>>> From: Gal Pressman <gal@nvidia.com>
>>> Sent: Sunday, June 11, 2023 10:59 AM
>>> To: Stephen Hemminger <stephen@networkplumber.org>
>>> Cc: David S. Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>;
>>> David Ahern <dsahern@gmail.com>; Michal Kubecek <mkubecek@suse.cz>;
>>> netdev@vger.kernel.org; Edwin Peer <edwin.peer@broadcom.com>; Edwin Peer
>>> <espeer@gmail.com>
>>> Subject: Re: [PATCH net-next] rtnetlink: extend RTEXT_FILTER_SKIP_STATS to
>>> IFLA_VF_INFO
>>>
>>> On 11/06/2023 18:06, Stephen Hemminger wrote:
>>>
>>>> Better but it is still possible to create too many VF's that the response
>>>> won't fit.
>>>
>>> Correct, no argues here.
>>> It allowed me to see around ~200 VFs instead of ~70, a step in the right
>>> direction.
>>
>> I remember investigating this a few years ago and we hit limits of ~200 that were essentially unfixable without creating a new API that can separate the reply over more than one message. The VF info data was not designed in the current op to allow processing over multiple messages. It also (unfortunately) doesn't report errors so it ends up just truncating instead of producing an error.
>>
>> Fixing this completely is non-trivial.
> 
> As it looks like the is substantial agreement on this approach being a
> step in the right direction and I can't think of anything better, I
> suggest to merge this as is, unless someone voices concerns very soon,
> very loudly.
> 

My only concern is that this "hot potato" stays in the air longer. This
problem has been around for years, and someone needs to step up and
propose an API.
Jakub Kicinski June 14, 2023, 4:39 p.m. UTC | #7
On Wed, 14 Jun 2023 08:17:44 -0700 David Ahern wrote:
> > As it looks like the is substantial agreement on this approach being a
> > step in the right direction and I can't think of anything better, I
> > suggest to merge this as is, unless someone voices concerns very soon,
> > very loudly.
> 
> My only concern is that this "hot potato" stays in the air longer. This
> problem has been around for years, and someone needs to step up and
> propose an API.

The functionality is fully covered by devlink and tc. We're not
accepting any new drivers/implementations of the old SR-IOV NDOs
at all.

We try to put this potato in the grave and it keeps trying to grow :(
diff mbox series

Patch

diff --git a/net/core/rtnetlink.c b/net/core/rtnetlink.c
index 41de3a2f29e1..d1815122298d 100644
--- a/net/core/rtnetlink.c
+++ b/net/core/rtnetlink.c
@@ -961,24 +961,27 @@  static inline int rtnl_vfinfo_size(const struct net_device *dev,
 			 nla_total_size(sizeof(struct ifla_vf_rate)) +
 			 nla_total_size(sizeof(struct ifla_vf_link_state)) +
 			 nla_total_size(sizeof(struct ifla_vf_rss_query_en)) +
-			 nla_total_size(0) + /* nest IFLA_VF_STATS */
-			 /* IFLA_VF_STATS_RX_PACKETS */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_PACKETS */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_RX_BYTES */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_BYTES */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_BROADCAST */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_MULTICAST */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_RX_DROPPED */
-			 nla_total_size_64bit(sizeof(__u64)) +
-			 /* IFLA_VF_STATS_TX_DROPPED */
-			 nla_total_size_64bit(sizeof(__u64)) +
 			 nla_total_size(sizeof(struct ifla_vf_trust)));
+		if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
+			size += num_vfs *
+				(nla_total_size(0) + /* nest IFLA_VF_STATS */
+				 /* IFLA_VF_STATS_RX_PACKETS */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_PACKETS */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_RX_BYTES */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_BYTES */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_BROADCAST */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_MULTICAST */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_RX_DROPPED */
+				 nla_total_size_64bit(sizeof(__u64)) +
+				 /* IFLA_VF_STATS_TX_DROPPED */
+				 nla_total_size_64bit(sizeof(__u64)));
+		}
 		return size;
 	} else
 		return 0;
@@ -1270,7 +1273,8 @@  static noinline_for_stack int rtnl_fill_stats(struct sk_buff *skb,
 static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 					       struct net_device *dev,
 					       int vfs_num,
-					       struct nlattr *vfinfo)
+					       struct nlattr *vfinfo,
+					       u32 ext_filter_mask)
 {
 	struct ifla_vf_rss_query_en vf_rss_query_en;
 	struct nlattr *vf, *vfstats, *vfvlanlist;
@@ -1376,33 +1380,35 @@  static noinline_for_stack int rtnl_fill_vfinfo(struct sk_buff *skb,
 		goto nla_put_vf_failure;
 	}
 	nla_nest_end(skb, vfvlanlist);
-	memset(&vf_stats, 0, sizeof(vf_stats));
-	if (dev->netdev_ops->ndo_get_vf_stats)
-		dev->netdev_ops->ndo_get_vf_stats(dev, vfs_num,
-						&vf_stats);
-	vfstats = nla_nest_start_noflag(skb, IFLA_VF_STATS);
-	if (!vfstats)
-		goto nla_put_vf_failure;
-	if (nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_PACKETS,
-			      vf_stats.rx_packets, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_PACKETS,
-			      vf_stats.tx_packets, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_BYTES,
-			      vf_stats.rx_bytes, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_BYTES,
-			      vf_stats.tx_bytes, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
-			      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
-			      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
-			      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
-	    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
-			      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
-		nla_nest_cancel(skb, vfstats);
-		goto nla_put_vf_failure;
+	if (~ext_filter_mask & RTEXT_FILTER_SKIP_STATS) {
+		memset(&vf_stats, 0, sizeof(vf_stats));
+		if (dev->netdev_ops->ndo_get_vf_stats)
+			dev->netdev_ops->ndo_get_vf_stats(dev, vfs_num,
+							  &vf_stats);
+		vfstats = nla_nest_start_noflag(skb, IFLA_VF_STATS);
+		if (!vfstats)
+			goto nla_put_vf_failure;
+		if (nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_PACKETS,
+				      vf_stats.rx_packets, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_PACKETS,
+				      vf_stats.tx_packets, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_BYTES,
+				      vf_stats.rx_bytes, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_BYTES,
+				      vf_stats.tx_bytes, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_BROADCAST,
+				      vf_stats.broadcast, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_MULTICAST,
+				      vf_stats.multicast, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_RX_DROPPED,
+				      vf_stats.rx_dropped, IFLA_VF_STATS_PAD) ||
+		    nla_put_u64_64bit(skb, IFLA_VF_STATS_TX_DROPPED,
+				      vf_stats.tx_dropped, IFLA_VF_STATS_PAD)) {
+			nla_nest_cancel(skb, vfstats);
+			goto nla_put_vf_failure;
+		}
+		nla_nest_end(skb, vfstats);
 	}
-	nla_nest_end(skb, vfstats);
 	nla_nest_end(skb, vf);
 	return 0;
 
@@ -1435,7 +1441,7 @@  static noinline_for_stack int rtnl_fill_vf(struct sk_buff *skb,
 		return -EMSGSIZE;
 
 	for (i = 0; i < num_vfs; i++) {
-		if (rtnl_fill_vfinfo(skb, dev, i, vfinfo))
+		if (rtnl_fill_vfinfo(skb, dev, i, vfinfo, ext_filter_mask))
 			return -EMSGSIZE;
 	}