Message ID | 20240227143124.21015-1-przemyslaw.kitszel@intel.com |
---|---|
State | Awaiting Upstream |
Delegated to: | Netdev Maintainers |
Series | [iwl-net] ice: fix stats being updated by way too large values |
On Tue, Feb 27, 2024 at 03:31:06PM +0100, Przemek Kitszel wrote:
> Simplify the stats accumulation logic to fix the case where we don't take
> the previous stat value into account; we should always respect it.
>
> Main netdev stats of our PF (Tx/Rx packets/bytes) were reported orders of
> magnitude too big during OpenStack reconfiguration events, and possibly
> during other reconfiguration cases too.
>
> The regression was reported to be between 6.1 and 6.2, so I was almost
> certain that one of the two "preserve stats over reset" commits was the
> culprit. While reading the code, it was found that in some cases we
> increase the stats by an arbitrarily large number (because the "-prev"
> term is effectively dropped once prev has been zeroed).
>
> Note that this also fixes the case where we were near the limits of u64,
> but that was not the regression reported.
>
> Full disclosure: I remember suggesting this particular piece of code to
> Ben a few years ago, so the blame is on me.
>
> Fixes: 2fd5e433cd26 ("ice: Accumulate HW and Netdev statistics over reset")
> Reported-by: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>
> Link: https://lore.kernel.org/intel-wired-lan/VI1PR02MB439744DEDAA7B59B9A2833FE912EA@VI1PR02MB4397.eurprd02.prod.outlook.com
> Reported-by: Christian Rohmann <christian.rohmann@inovex.de>
> Link: https://lore.kernel.org/intel-wired-lan/f38a6ca4-af05-48b1-a3e6-17ef2054e525@inovex.de
> Reviewed-by: Jacob Keller <jacob.e.keller@intel.com>
> Signed-off-by: Przemek Kitszel <przemyslaw.kitszel@intel.com>

Reviewed-by: Simon Horman <horms@kernel.org>
> -----Original Message-----
> From: Intel-wired-lan <intel-wired-lan-bounces@osuosl.org> On Behalf Of Przemek Kitszel
> Sent: Tuesday, February 27, 2024 8:01 PM
> To: intel-wired-lan@lists.osuosl.org
> Cc: Nebojsa Stevanovic <nebojsa.stevanovic@gcore.com>; netdev@vger.kernel.org; Czapnik, Lukasz <lukasz.czapnik@intel.com>; Lobakin, Aleksander <aleksander.lobakin@intel.com>; Nguyen, Anthony L <anthony.l.nguyen@intel.com>; Kitszel, Przemyslaw <przemyslaw.kitszel@intel.com>; Keller, Jacob E <jacob.e.keller@intel.com>; Christian Rohmann <christian.rohmann@inovex.de>
> Subject: [Intel-wired-lan] [PATCH iwl-net] ice: fix stats being updated by way too large values
>
> [... commit message and tags snipped ...]
> ---
>  drivers/net/ethernet/intel/ice/ice_main.c | 24 +++++++++++------------
>  1 file changed, 11 insertions(+), 13 deletions(-)
>

Tested-by: Pucha Himasekhar Reddy <himasekharx.reddy.pucha@intel.com> (A Contingent worker at Intel)
```diff
diff --git a/drivers/net/ethernet/intel/ice/ice_main.c b/drivers/net/ethernet/intel/ice/ice_main.c
index dd4a9bc0dfdc..a7c7b1b633a5 100644
--- a/drivers/net/ethernet/intel/ice/ice_main.c
+++ b/drivers/net/ethernet/intel/ice/ice_main.c
@@ -6736,6 +6736,7 @@ static void ice_update_vsi_ring_stats(struct ice_vsi *vsi)
 {
 	struct rtnl_link_stats64 *net_stats, *stats_prev;
 	struct rtnl_link_stats64 *vsi_stats;
+	struct ice_pf *pf = vsi->back;
 	u64 pkts, bytes;
 	int i;
 
@@ -6781,21 +6782,18 @@ static void ice_update_vsi_ring_stats(struct ice_vsi *vsi)
 	net_stats = &vsi->net_stats;
 	stats_prev = &vsi->net_stats_prev;
 
-	/* clear prev counters after reset */
-	if (vsi_stats->tx_packets < stats_prev->tx_packets ||
-	    vsi_stats->rx_packets < stats_prev->rx_packets) {
-		stats_prev->tx_packets = 0;
-		stats_prev->tx_bytes = 0;
-		stats_prev->rx_packets = 0;
-		stats_prev->rx_bytes = 0;
+	/* Update netdev counters, but keep in mind that values could start at
+	 * random value after PF reset. And as we increase the reported stat by
+	 * diff of Prev-Cur, we need to be sure that Prev is valid. If it's not,
+	 * let's skip this round.
+	 */
+	if (likely(pf->stat_prev_loaded)) {
+		net_stats->tx_packets += vsi_stats->tx_packets - stats_prev->tx_packets;
+		net_stats->tx_bytes += vsi_stats->tx_bytes - stats_prev->tx_bytes;
+		net_stats->rx_packets += vsi_stats->rx_packets - stats_prev->rx_packets;
+		net_stats->rx_bytes += vsi_stats->rx_bytes - stats_prev->rx_bytes;
 	}
 
-	/* update netdev counters */
-	net_stats->tx_packets += vsi_stats->tx_packets - stats_prev->tx_packets;
-	net_stats->tx_bytes += vsi_stats->tx_bytes - stats_prev->tx_bytes;
-	net_stats->rx_packets += vsi_stats->rx_packets - stats_prev->rx_packets;
-	net_stats->rx_bytes += vsi_stats->rx_bytes - stats_prev->rx_bytes;
-
 	stats_prev->tx_packets = vsi_stats->tx_packets;
 	stats_prev->tx_bytes = vsi_stats->tx_bytes;
 	stats_prev->rx_packets = vsi_stats->rx_packets;
```