[v3,net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors

Message ID 20241206090328.4758-1-laoar.shao@gmail.com (mailing list archive)
State Not Applicable

Commit Message

Yafang Shao Dec. 6, 2024, 9:03 a.m. UTC
We observed a high number of rx_discards_phy events on some servers when
running `ethtool -S`. However, this important counter is not currently
reflected in the /proc/net/dev statistics file, making it challenging to
monitor effectively.

Since rx_fifo_errors represents receive FIFO errors on this network
device, it makes sense to include rx_discards_phy in this counter to
enhance monitoring visibility. This change will help administrators track
these events more effectively through standard interfaces.

I have also checked the documentation of the mlx5 ethtool counters [0]; it
seems that rx_discards_phy and rx_fifo_errors have the same meaning:

  rx_discards_phy: The number of received packets dropped due to lack of
                   buffers on a physical port. If this counter is
                   increasing, it implies that the adapter is congested and
                   cannot absorb the traffic coming from the network.

                   ConnectX-3 naming : rx_fifo_errors

Link: https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters [0]
Suggested-by: Tariq Toukan <ttoukan.linux@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Tariq Toukan <ttoukan.linux@gmail.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Gal Pressman <gal@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
 1 file changed, 1 insertion(+)

Changes:
v2->v3:
- Drop the documentation changes

v1->v2: https://lore.kernel.org/netdev/20241114021711.5691-1-laoar.shao@gmail.com/
- Use rx_fifo_errors instead (Tariq)
- Update the if_link.h accordingly

v1: https://lore.kernel.org/netdev/20241106064015.4118-1-laoar.shao@gmail.com/

Comments

Jakub Kicinski Dec. 8, 2024, 1:38 a.m. UTC | #1
On Fri,  6 Dec 2024 17:03:28 +0800 Yafang Shao wrote:
> We observed a high number of rx_discards_phy events on some servers when
> running `ethtool -S`. However, this important counter is not currently
> reflected in the /proc/net/dev statistics file, making it challenging to
> monitor effectively.
> 
> Since rx_fifo_errors represents receive FIFO errors on this network
> device, it makes sense to include rx_discards_phy in this counter to
> enhance monitoring visibility. This change will help administrators track
> these events more effectively through standard interfaces.

It's not a standard if there is no definition applicable across vendors.
Count it as generic rx_dropped. If you disagree with me please carry
this tag on future versions:

Nacked-by: Jakub Kicinski <kuba@kernel.org>

Yafang Shao Dec. 8, 2024, 6:01 a.m. UTC | #2
On Sun, Dec 8, 2024 at 9:38 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri,  6 Dec 2024 17:03:28 +0800 Yafang Shao wrote:
> > We observed a high number of rx_discards_phy events on some servers when
> > running `ethtool -S`. However, this important counter is not currently
> > reflected in the /proc/net/dev statistics file, making it challenging to
> > monitor effectively.
> >
> > Since rx_fifo_errors represents receive FIFO errors on this network
> > device, it makes sense to include rx_discards_phy in this counter to
> > enhance monitoring visibility. This change will help administrators track
> > these events more effectively through standard interfaces.
>
> It's not a standard if there is no definition applicable across vendors.
> Count it as generic rx_dropped.

Thank you for your suggestion. I'm okay with counting it as generic
rx_dropped as long as we have a metric to track it.
I will send a new version.

--
Regards
Yafang

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e601324a690a..15b1a3e6e641 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3916,6 +3916,7 @@  mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	}
 
 	stats->rx_missed_errors = priv->stats.qcnt.rx_out_of_buffer;
+	stats->rx_fifo_errors = PPORT_2863_GET(pstats, if_in_discards);
 
 	stats->rx_length_errors =
 		PPORT_802_3_GET(pstats, a_in_range_length_errors) +
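
For monitoring purposes, the visible effect of a change like this is which
column of /proc/net/dev the discards land in. The following is a minimal
userspace sketch, not part of the patch, of how a tool might read those
columns. The names struct rx_counters, parse_proc_net_dev_line() and
read_rx_counters() are hypothetical. It assumes the usual /proc/net/dev
layout, in which the first five receive fields after the interface name are
bytes, packets, errs, drop and fifo.

```c
#include <stdio.h>
#include <string.h>

/* First five RX fields of a /proc/net/dev line, in file order. */
struct rx_counters {
	unsigned long long bytes, packets, errs, drop, fifo;
};

/*
 * Parse one interface line from /proc/net/dev, for example:
 *   "  eth0: 1000 10 0 2 7 0 0 0  2000 20 0 0 0 0 0 0"
 * Returns 0 on success, -1 if this is not an interface line.
 */
int parse_proc_net_dev_line(const char *line, char *ifname,
			    size_t ifname_len, struct rx_counters *rx)
{
	const char *colon = strchr(line, ':');
	const char *start = line;
	size_t len;

	/* The two header lines contain no ':' and are rejected here. */
	if (!colon || ifname_len == 0)
		return -1;

	/* Interface names are right-aligned, so skip leading spaces. */
	while (*start == ' ')
		start++;

	len = (size_t)(colon - start);
	if (len == 0 || len >= ifname_len)
		return -1;
	memcpy(ifname, start, len);
	ifname[len] = '\0';

	if (sscanf(colon + 1, "%llu %llu %llu %llu %llu",
		   &rx->bytes, &rx->packets, &rx->errs,
		   &rx->drop, &rx->fifo) != 5)
		return -1;

	return 0;
}

/* Look up one interface in /proc/net/dev; returns 0 if found. */
int read_rx_counters(const char *want, struct rx_counters *rx)
{
	char line[512], name[64];
	FILE *f = fopen("/proc/net/dev", "r");
	int ret = -1;

	if (!f)
		return -1;

	while (fgets(line, sizeof(line), f)) {
		if (parse_proc_net_dev_line(line, name, sizeof(name), rx) == 0 &&
		    strcmp(name, want) == 0) {
			ret = 0;
			break;
		}
	}

	fclose(f);
	return ret;
}
```

With the patch as posted, the discards would be expected to appear in the
fifo field. With the rx_dropped approach suggested in the review, they would
be expected in the drop field instead.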