Message ID | 20241110045221.4959-1-00107082@163.com (mailing list archive)
---|---
State | Rejected
Delegated to | Netdev Maintainers
Series | net/core/net-procfs: use seq_put_decimal_ull_width() for decimal values in /proc/net/dev
On 11/10/24 05:52, David Wang wrote:
> seq_printf() is costly; when reading /proc/net/dev, profiling indicates
> about 13% of the samples fall in seq_printf():
>
> dev_seq_show(98.350% 428046/435229)
>     dev_seq_printf_stats(99.777% 427092/428046)
>         dev_get_stats(86.121% 367814/427092)
>             rtl8169_get_stats64(98.519% 362365/367814)
>             dev_fetch_sw_netstats(0.554% 2038/367814)
>             loopback_get_stats64(0.250% 919/367814)
>             dev_get_tstats64(0.077% 284/367814)
>             netdev_stats_to_stats64(0.051% 189/367814)
>             _find_next_bit(0.029% 106/367814)
>         seq_printf(13.719% 58594/427092)
>
> And on a system with one wireless interface, the timing for 1 million
> rounds of stress reading /proc/net/dev is:
>
> real	0m51.828s
> user	0m0.225s
> sys	0m51.671s
>
> On average, reading /proc/net/dev takes ~0.051ms.
>
> With this patch, the extra cost of format-string parsing in seq_printf()
> is optimized out, and the timing for 1 million rounds of reads is:
>
> real	0m49.127s
> user	0m0.295s
> sys	0m48.552s
>
> On average, ~0.048ms per read of /proc/net/dev, a ~6% improvement.
>
> Even though dev_get_stats() takes up the majority of the reading process,
> the improvement is still significant, and it may vary with the physical
> interface on the system.
>
> Signed-off-by: David Wang <00107082@163.com>

If user-space is concerned with performance, it must use netlink.
Optimizing a legacy interface sends, IMHO, a very wrong message.

I'm sorry, I think we should not accept this change.

/P
At 2024-11-14 17:17:32, "Paolo Abeni" <pabeni@redhat.com> wrote:
>
>On 11/10/24 05:52, David Wang wrote:
>> seq_printf() is costly; when reading /proc/net/dev, profiling indicates
>> about 13% of the samples fall in seq_printf():
>>
>> dev_seq_show(98.350% 428046/435229)
>>     dev_seq_printf_stats(99.777% 427092/428046)
>>         dev_get_stats(86.121% 367814/427092)
>>             rtl8169_get_stats64(98.519% 362365/367814)
>>             dev_fetch_sw_netstats(0.554% 2038/367814)
>>             loopback_get_stats64(0.250% 919/367814)
>>             dev_get_tstats64(0.077% 284/367814)
>>             netdev_stats_to_stats64(0.051% 189/367814)
>>             _find_next_bit(0.029% 106/367814)
>>         seq_printf(13.719% 58594/427092)
>>
>> And on a system with one wireless interface, the timing for 1 million
>> rounds of stress reading /proc/net/dev is:
>>
>> real	0m51.828s
>> user	0m0.225s
>> sys	0m51.671s
>>
>> On average, reading /proc/net/dev takes ~0.051ms.
>>
>> With this patch, the extra cost of format-string parsing in seq_printf()
>> is optimized out, and the timing for 1 million rounds of reads is:
>>
>> real	0m49.127s
>> user	0m0.295s
>> sys	0m48.552s
>>
>> On average, ~0.048ms per read of /proc/net/dev, a ~6% improvement.
>>
>> Even though dev_get_stats() takes up the majority of the reading process,
>> the improvement is still significant, and it may vary with the physical
>> interface on the system.
>>
>> Signed-off-by: David Wang <00107082@163.com>
>
>If user-space is concerned with performance, it must use netlink.
>Optimizing a legacy interface sends, IMHO, a very wrong message.
>
>I'm sorry, I think we should not accept this change.

It's OK. I have been using /proc/net/dev to gauge the transmit/receive
rate of each interface, and /proc/net/netstat to spot abnormalities, in
my monitoring tools. I guess my knowledge is quite out of date now; I
will look into netlink. Thanks for the information.

>
>/P

Thanks
David
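For reference, a minimal sketch (not part of the thread, error handling
mostly trimmed) of the netlink approach Paolo points to: one RTM_GETLINK
dump request over an NETLINK_ROUTE socket, reading the IFLA_STATS64
attribute from each reply, which carries the same rtnl_link_stats64
counters that /proc/net/dev formats:

/* Minimal sketch: dump per-interface stats over rtnetlink. */
#include <stdio.h>
#include <string.h>
#include <unistd.h>
#include <sys/socket.h>
#include <linux/netlink.h>
#include <linux/rtnetlink.h>
#include <linux/if_link.h>

int main(void)
{
        struct {
                struct nlmsghdr nlh;
                struct ifinfomsg ifm;
        } req = {
                .nlh = {
                        .nlmsg_len   = NLMSG_LENGTH(sizeof(struct ifinfomsg)),
                        .nlmsg_type  = RTM_GETLINK,
                        .nlmsg_flags = NLM_F_REQUEST | NLM_F_DUMP,
                },
                .ifm = { .ifi_family = AF_UNSPEC },
        };
        char buf[16384];
        int fd = socket(AF_NETLINK, SOCK_RAW, NETLINK_ROUTE);

        if (fd < 0 || send(fd, &req, req.nlh.nlmsg_len, 0) < 0)
                return 1;

        for (;;) {
                ssize_t len = recv(fd, buf, sizeof(buf), 0);
                struct nlmsghdr *nlh = (struct nlmsghdr *)buf;

                if (len <= 0)
                        break;
                for (; NLMSG_OK(nlh, len); nlh = NLMSG_NEXT(nlh, len)) {
                        struct ifinfomsg *ifm;
                        struct rtattr *rta;
                        struct rtnl_link_stats64 s;
                        const char *name = "?";
                        int alen, have_stats = 0;

                        if (nlh->nlmsg_type == NLMSG_DONE)
                                goto out;
                        if (nlh->nlmsg_type != RTM_NEWLINK)
                                continue;
                        ifm  = NLMSG_DATA(nlh);
                        rta  = IFLA_RTA(ifm);
                        alen = nlh->nlmsg_len - NLMSG_LENGTH(sizeof(*ifm));
                        for (; RTA_OK(rta, alen); rta = RTA_NEXT(rta, alen)) {
                                if (rta->rta_type == IFLA_IFNAME) {
                                        name = RTA_DATA(rta);
                                } else if (rta->rta_type == IFLA_STATS64 &&
                                           RTA_PAYLOAD(rta) >= sizeof(s)) {
                                        /* Attribute may be unaligned: copy out. */
                                        memcpy(&s, RTA_DATA(rta), sizeof(s));
                                        have_stats = 1;
                                }
                        }
                        if (have_stats)
                                printf("%s: rx %llu bytes, tx %llu bytes\n",
                                       name,
                                       (unsigned long long)s.rx_bytes,
                                       (unsigned long long)s.tx_bytes);
                }
        }
out:
        close(fd);
        return 0;
}

Unlike parsing /proc/net/dev text, this hands user-space the 64-bit
counters in binary form and skips the seq_file formatting that the patch
below was trying to speed up.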
diff --git a/net/core/net-procfs.c b/net/core/net-procfs.c
index fa6d3969734a..a0d6c5b32b58 100644
--- a/net/core/net-procfs.c
+++ b/net/core/net-procfs.c
@@ -46,23 +46,26 @@ static void dev_seq_printf_stats(struct seq_file *seq, struct net_device *dev)
 	struct rtnl_link_stats64 temp;
 	const struct rtnl_link_stats64 *stats = dev_get_stats(dev, &temp);
 
-	seq_printf(seq, "%6s: %7llu %7llu %4llu %4llu %4llu %5llu %10llu %9llu "
-		   "%8llu %7llu %4llu %4llu %4llu %5llu %7llu %10llu\n",
-		   dev->name, stats->rx_bytes, stats->rx_packets,
-		   stats->rx_errors,
-		   stats->rx_dropped + stats->rx_missed_errors,
-		   stats->rx_fifo_errors,
-		   stats->rx_length_errors + stats->rx_over_errors +
-		   stats->rx_crc_errors + stats->rx_frame_errors,
-		   stats->rx_compressed, stats->multicast,
-		   stats->tx_bytes, stats->tx_packets,
-		   stats->tx_errors, stats->tx_dropped,
-		   stats->tx_fifo_errors, stats->collisions,
-		   stats->tx_carrier_errors +
-		   stats->tx_aborted_errors +
-		   stats->tx_window_errors +
-		   stats->tx_heartbeat_errors,
-		   stats->tx_compressed);
+	seq_printf(seq, "%6s:", dev->name);
+	seq_put_decimal_ull_width(seq, " ", stats->rx_bytes, 7);
+	seq_put_decimal_ull_width(seq, " ", stats->rx_packets, 7);
+	seq_put_decimal_ull_width(seq, " ", stats->rx_errors, 4);
+	seq_put_decimal_ull_width(seq, " ", stats->rx_dropped + stats->rx_missed_errors, 4);
+	seq_put_decimal_ull_width(seq, " ", stats->rx_fifo_errors, 4);
+	seq_put_decimal_ull_width(seq, " ", stats->rx_length_errors + stats->rx_over_errors +
+				  stats->rx_crc_errors + stats->rx_frame_errors, 5);
+	seq_put_decimal_ull_width(seq, " ", stats->rx_compressed, 10);
+	seq_put_decimal_ull_width(seq, " ", stats->multicast, 9);
+	seq_put_decimal_ull_width(seq, " ", stats->tx_bytes, 8);
+	seq_put_decimal_ull_width(seq, " ", stats->tx_packets, 7);
+	seq_put_decimal_ull_width(seq, " ", stats->tx_errors, 4);
+	seq_put_decimal_ull_width(seq, " ", stats->tx_dropped, 4);
+	seq_put_decimal_ull_width(seq, " ", stats->tx_fifo_errors, 4);
+	seq_put_decimal_ull_width(seq, " ", stats->collisions, 5);
+	seq_put_decimal_ull_width(seq, " ", stats->tx_carrier_errors + stats->tx_aborted_errors +
+				  stats->tx_window_errors + stats->tx_heartbeat_errors, 7);
+	seq_put_decimal_ull_width(seq, " ", stats->tx_compressed, 10);
+	seq_putc(seq, '\n');
 }
 
 /*
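For context, the helper the patch switches to is declared in
include/linux/seq_file.h; it emits the delimiter, then num right-aligned
(space-padded) to at least width characters, without any format-string
parsing:

/* include/linux/seq_file.h */
void seq_put_decimal_ull_width(struct seq_file *m, const char *delimiter,
			       unsigned long long num, unsigned int width);

So each seq_put_decimal_ull_width(seq, " ", v, 7) call above reproduces
what a " %7llu" conversion produced in the old seq_printf() format.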
seq_printf() is costly; when reading /proc/net/dev, profiling indicates
about 13% of the samples fall in seq_printf():

dev_seq_show(98.350% 428046/435229)
    dev_seq_printf_stats(99.777% 427092/428046)
        dev_get_stats(86.121% 367814/427092)
            rtl8169_get_stats64(98.519% 362365/367814)
            dev_fetch_sw_netstats(0.554% 2038/367814)
            loopback_get_stats64(0.250% 919/367814)
            dev_get_tstats64(0.077% 284/367814)
            netdev_stats_to_stats64(0.051% 189/367814)
            _find_next_bit(0.029% 106/367814)
        seq_printf(13.719% 58594/427092)

And on a system with one wireless interface, the timing for 1 million
rounds of stress reading /proc/net/dev is:

real	0m51.828s
user	0m0.225s
sys	0m51.671s

On average, reading /proc/net/dev takes ~0.051ms.

With this patch, the extra cost of format-string parsing in seq_printf()
is optimized out, and the timing for 1 million rounds of reads is:

real	0m49.127s
user	0m0.295s
sys	0m48.552s

On average, ~0.048ms per read of /proc/net/dev, a ~6% improvement.

Even though dev_get_stats() takes up the majority of the reading process,
the improvement is still significant, and it may vary with the physical
interface on the system.

Signed-off-by: David Wang <00107082@163.com>
---
 net/core/net-procfs.c | 37 ++++++++++++++++++++-----------------
 1 file changed, 20 insertions(+), 17 deletions(-)
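The stress-read harness behind those numbers is not shown in the thread;
a plausible minimal reproduction (a hypothetical sketch, not the author's
actual code) is a tight open/read/close loop run under time(1):

/* Hypothetical stress reader for /proc/net/dev; build and run as
 * `time ./stress` to obtain real/user/sys figures like those above. */
#include <fcntl.h>
#include <unistd.h>

int main(void)
{
        char buf[65536];

        for (int i = 0; i < 1000000; i++) {     /* 1 million rounds */
                int fd = open("/proc/net/dev", O_RDONLY);

                if (fd < 0)
                        return 1;
                /* Drain the file so every interface line is formatted. */
                while (read(fd, buf, sizeof(buf)) > 0)
                        ;
                close(fd);
        }
        return 0;
}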