Message ID | 20200807212916.2883031-1-jwadams@google.com (mailing list archive) |
---|---|
Series | metricfs metric file system and examples |

> net/dev/stats/tx_bytes/annotations
> DESCRIPTION net\ device\ transmited\ bytes\ count
> CUMULATIVE
> net/dev/stats/tx_bytes/fields
> interface value
> str int
> net/dev/stats/tx_bytes/values
> lo 4394430608
> eth0 33353183843
> eth1 16228847091

This is a rather small system. Have you tested it at scale? An
Ethernet switch with 64 physical interfaces, and say 32 VLAN
interfaces stack on top. So this one file will contain 2048 entries?

And generally, you are not interested in one statistic, but many
statistics. So you will need to cat each file, not just one file.

And the way this is implemented:

+static void dev_stats_emit(struct metric_emitter *e,
+                           struct net_device *dev,
+                           struct metric_def *metricd)
+{
+        struct rtnl_link_stats64 temp;
+        const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
+
+        if (stats) {
+                __u8 *ptr = (((__u8 *)stats) + metricd->off);
+
+                METRIC_EMIT_INT(e, *(__u64 *)ptr, dev->name, NULL);
+        }
+}

means you are going to be calling dev_get_stats() for each file, and
there are 23 files if i counted correctly. So dev_get_stats() will be
called 47104 times, in this made up example. And this is not always
cheap, these counts can be atomic.

So i personally don't think netdev statistics is a good idea, i doubt
it scales.

I also think you are looking at the wrong set of netdev counters. I
would be more interested in ethtool -S counters. But it appears you
make the assumption that each object you are collecting metrics for
has the same set of counters. This is untrue for network interfaces,
where each driver can export whatever counters it wants, and they can
be dynamic.

        Andrew

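The call-count arithmetic in the message above can be sketched in userspace C. This is only an illustration under stated assumptions, not kernel code: `fake_get_stats()` is a hypothetical stand-in for `dev_get_stats()` that merely counts invocations, and `calls_per_file_model()`, `calls_per_device_model()` and `print_comparison()` are names invented here. The per-device variant is included purely as a contrast case; nobody in the thread proposes it.

```c
#include <inttypes.h>
#include <stdint.h>
#include <stdio.h>

/* Figures taken from the message above; every function name is hypothetical. */
enum {
    PHYS_PORTS     = 64, /* physical switch interfaces */
    VLANS_PER_PORT = 32, /* VLAN interfaces stacked on each port */
    STAT_FILES     = 23, /* stats files, as counted in the message */
};

/* How many times the stand-in collector has run since the last reset. */
static uint64_t stats_calls;

/* Hypothetical stand-in for dev_get_stats(): it only records the call. */
void fake_get_stats(void)
{
    stats_calls++;
}

/* Per-file model: reading each file walks every device and collects
 * that device's full stats just to emit a single field. */
uint64_t calls_per_file_model(unsigned files, unsigned devices)
{
    stats_calls = 0;
    for (unsigned f = 0; f < files; f++)
        for (unsigned d = 0; d < devices; d++)
            fake_get_stats();
    return stats_calls;
}

/* Contrast case (an assumption for illustration): collect once per
 * device and emit every field from that single snapshot. */
uint64_t calls_per_device_model(unsigned devices)
{
    stats_calls = 0;
    for (unsigned d = 0; d < devices; d++)
        fake_get_stats();
    return stats_calls;
}

/* Prints both totals for the made-up 64-port, 32-VLAN switch. */
void print_comparison(void)
{
    unsigned devices = PHYS_PORTS * VLANS_PER_PORT;

    printf("entries per file:        %u\n", devices);
    printf("calls, one per file:     %" PRIu64 "\n",
           calls_per_file_model(STAT_FILES, devices));
    printf("calls, one per device:   %" PRIu64 "\n",
           calls_per_device_model(devices));
}
```

Calling `print_comparison()` from a `main()` reports 2048 entries per file and 47104 collector calls for the per-file model, matching the figures in the message, against 2048 calls when each device is collected once.
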
On 8/7/20 8:06 PM, Andrew Lunn wrote:
> So i personally don't think netdev statistics is a good idea, i doubt
> it scales.

+1

On Fri 2020-08-07 14:29:09, Jonathan Adams wrote:
> [resending to widen the CC lists per rdunlap@infradead.org's suggestion
> original posting to lkml here: https://lkml.org/lkml/2020/8/5/1009]
>
> To try to restart the discussion of kernel statistics started by the
> statsfs patchsets (https://lkml.org/lkml/2020/5/26/332), I wanted
> to share the following set of patches which are Google's 'metricfs'
> implementation and some example uses. Google has been using metricfs
> internally since 2012 as a way to export various statistics to our
> telemetry systems (similar to OpenTelemetry), and we have over 200
> statistics exported on a typical machine.
>
> These patches have been cleaned up and modernized v.s. the versions
> in production; I've included notes under the fold in the patches.
> They're based on v5.8-rc6.
>
> The statistics live under debugfs, in a tree rooted at:
>
> /sys/kernel/debug/metricfs

Is debugfs right place for this? It looks like something where people
would expect compatibility guarantees...

        Pavel

On Sat, 8 Aug 2020 09:59:34 -0600 David Ahern wrote:
> On 8/7/20 8:06 PM, Andrew Lunn wrote:
> > So i personally don't think netdev statistics is a good idea, i doubt
> > it scales.
>
> +1

+1

Please stop using networking as the example for this. We don't want
file interfaces for stats, and we already made that very clear last time.