diff mbox series

[v2,net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors

Message ID 20241114021711.5691-1-laoar.shao@gmail.com (mailing list archive)
State Not Applicable
Headers show
Series [v2,net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors | expand

Commit Message

Yafang Shao Nov. 14, 2024, 2:17 a.m. UTC
We observed a high number of rx_discards_phy events on some servers when
running `ethtool -S`. However, this important counter is not currently
reflected in the /proc/net/dev statistics file, making it challenging to
monitor effectively.

Since rx_fifo_errors represents receive FIFO errors on this network
deivice, it makes sense to include rx_discards_phy in this counter to
enhance monitoring visibility. This change will help administrators track
these events more effectively through standard interfaces.

I’ve also reviewed the manual for ethtool counters on mlx5 [0], and it
appears that rx_discards_phy and rx_fifo_errors have the same meaning.

  rx_discards_phy: The number of received packets dropped due to lack of
                   buffers on a physical port. If this counter is
                   increasing, it implies that the adapter is congested and
                   cannot absorb the traffic coming from the network.

                   ConnectX-3 naming : rx_fifo_errors

The documentation in if_link.h has been updated accordingly.

Link: https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters [0]
Suggested-by: Tariq Toukan <ttoukan.linux@gmail.com>
Signed-off-by: Yafang Shao <laoar.shao@gmail.com>
Cc: Tariq Toukan <ttoukan.linux@gmail.com>
Cc: Saeed Mahameed <saeedm@nvidia.com>
Cc: Leon Romanovsky <leon@kernel.org>
Cc: Gal Pressman <gal@nvidia.com>
Cc: Jakub Kicinski <kuba@kernel.org>
---
 drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 +
 include/uapi/linux/if_link.h                      | 4 ----
 2 files changed, 1 insertion(+), 4 deletions(-)

Changes:
v1->v2:
- Use rx_fifo_errors instead (Tariq)
- Update the if_link.h accordingly

v1: https://lore.kernel.org/netdev/20241106064015.4118-1-laoar.shao@gmail.com/

Comments

Jakub Kicinski Nov. 15, 2024, 2:27 a.m. UTC | #1
On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:
> - *   Not recommended for use in drivers for high speed interfaces.

I thought I suggested we provide clear guidance on this counter being
related to processing pipeline being to slow, vs host backpressure.
Just deleting the line that says "don't use" is not going to cut it :|
Yafang Shao Nov. 15, 2024, 3:56 a.m. UTC | #2
On Fri, Nov 15, 2024 at 10:27 AM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:
> > - *   Not recommended for use in drivers for high speed interfaces.
>
> I thought I suggested we provide clear guidance on this counter being
> related to processing pipeline being to slow, vs host backpressure.
> Just deleting the line that says "don't use" is not going to cut it :|

Hello Jakub,

After investigating other network drivers, I found that they all
report this metric to rx_missed_errors:

- i40e
  The corresponding ethtool metric is port.rx_discards, which was
mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add
rx_missed_errors for buffer exhaustion").

- broadcom
  The equivalent metric is rx_total_discard_pkts, reported as
rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom
ethernet driver")

Given this, it seems we should align with the standard practice and
report this metric to rx_missed_errors.

Tariq, what are your thoughts?

--
Regards

Yafang
Jakub Kicinski Nov. 15, 2024, 4:32 a.m. UTC | #3
On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote:
> > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:  
> > > - *   Not recommended for use in drivers for high speed interfaces.  
> >
> > I thought I suggested we provide clear guidance on this counter being
> > related to processing pipeline being to slow, vs host backpressure.
> > Just deleting the line that says "don't use" is not going to cut it :|  
> 
> Hello Jakub,
> 
> After investigating other network drivers, I found that they all
> report this metric to rx_missed_errors:
> 
> - i40e
>   The corresponding ethtool metric is port.rx_discards, which was
> mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add
> rx_missed_errors for buffer exhaustion").
> 
> - broadcom
>   The equivalent metric is rx_total_discard_pkts, reported as
> rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom
> ethernet driver")
> 
> Given this, it seems we should align with the standard practice and
> report this metric to rx_missed_errors.
> 
> Tariq, what are your thoughts?

mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very
different kind of drops than the drops reported as 'missed'.
The distinction is useful in production in my experience working with
mlx5 devices.
Yafang Shao Nov. 15, 2024, 5:50 a.m. UTC | #4
On Fri, Nov 15, 2024 at 12:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
>
> On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote:
> > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:
> > > > - *   Not recommended for use in drivers for high speed interfaces.
> > >
> > > I thought I suggested we provide clear guidance on this counter being
> > > related to processing pipeline being to slow, vs host backpressure.
> > > Just deleting the line that says "don't use" is not going to cut it :|
> >
> > Hello Jakub,
> >
> > After investigating other network drivers, I found that they all
> > report this metric to rx_missed_errors:
> >
> > - i40e
> >   The corresponding ethtool metric is port.rx_discards, which was
> > mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add
> > rx_missed_errors for buffer exhaustion").
> >
> > - broadcom
> >   The equivalent metric is rx_total_discard_pkts, reported as
> > rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom
> > ethernet driver")
> >
> > Given this, it seems we should align with the standard practice and
> > report this metric to rx_missed_errors.
> >
> > Tariq, what are your thoughts?
>
> mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very
> different kind of drops than the drops reported as 'missed'.
> The distinction is useful in production in my experience working with
> mlx5 devices.

From the manual [0], it says :

The number of received packets dropped due to lack of buffers on a
physical port. If this counter is increasing, it implies that the
adapter is congested and cannot absorb the traffic coming from the
network.

Would it be possible to add this description to if_link.h?

Frankly, it doesn’t make much difference to end users like me whether
this is reported to rx_missed_errors or rx_fifo_errors; the main goal
is simply to monitor this metric to flag any issues...

[0]. https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters


--
Regards
Yafang
Saeed Mahameed Nov. 15, 2024, 8:01 a.m. UTC | #5
On 15 Nov 13:50, Yafang Shao wrote:
>On Fri, Nov 15, 2024 at 12:32 PM Jakub Kicinski <kuba@kernel.org> wrote:
>>
>> On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote:
>> > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:
>> > > > - *   Not recommended for use in drivers for high speed interfaces.
>> > >
>> > > I thought I suggested we provide clear guidance on this counter being
>> > > related to processing pipeline being to slow, vs host backpressure.
>> > > Just deleting the line that says "don't use" is not going to cut it :|
>> >
>> > Hello Jakub,
>> >
>> > After investigating other network drivers, I found that they all
>> > report this metric to rx_missed_errors:
>> >
>> > - i40e
>> >   The corresponding ethtool metric is port.rx_discards, which was
>> > mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add
>> > rx_missed_errors for buffer exhaustion").
>> >
>> > - broadcom
>> >   The equivalent metric is rx_total_discard_pkts, reported as
>> > rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom
>> > ethernet driver")
>> >
>> > Given this, it seems we should align with the standard practice and
>> > report this metric to rx_missed_errors.
>> >
>> > Tariq, what are your thoughts?
>>
>> mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very
>> different kind of drops than the drops reported as 'missed'.
>> The distinction is useful in production in my experience working with
>> mlx5 devices.

Yes rx_missed_errors is lack of software descriptors, please don't mix it
with hardware pipeline FIFO discards.

FYI: mlx5 reports more discards related to pipeline see below, 
especially for per PF/VF buffers. When these are advancing, usually they 
indicate congestion control issues, for example pause frames is off.


rx_prio[p]_buf_discard	
The number of packets discarded by device due to lack of per host receive buffers.

rx_prio[p]_cong_discard	
The number of packets discarded by device due to per host congestion.

rx_prio[p]_discard (rx_discard_phy is the sum of all rx_prio[p]_discard)
The number of packets discarded by device due to lack of receive buffers.

That being said, these are not errors, so reporting them via rx_xyz_error
is very misleading, rx_missed_errors is a special case though, and let's
keep it that way.

>
>From the manual [0], it says :
>
>The number of received packets dropped due to lack of buffers on a
>physical port. If this counter is increasing, it implies that the
>adapter is congested and cannot absorb the traffic coming from the
>network.
>
>Would it be possible to add this description to if_link.h?
>
>Frankly, it doesn’t make much difference to end users like me whether
>this is reported to rx_missed_errors or rx_fifo_errors; the main goal
>is simply to monitor this metric to flag any issues...
>

not rx_missed_errors please, it is exclusive for software lack of buffers.

Please have a look at thtool_eth_XXX_stats IEEE ethnl_stats, if you need to
extend, this is the place. 

RFC2863[1] defines this type of discards as ifInDiscards. So let's add
it to ehttool std stats. mlx5 reports most of them already to driver custom
ethtool -S 

[1] https://datatracker.ietf.org/doc/html/rfc2863

- Saeed

>[0]. https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters
>
>
>--
>Regards
>Yafang
Jakub Kicinski Nov. 15, 2024, 7:24 p.m. UTC | #6
On Fri, 15 Nov 2024 00:01:50 -0800 Saeed Mahameed wrote:
> not rx_missed_errors please, it is exclusive for software lack of buffers.
> 
> Please have a look at thtool_eth_XXX_stats IEEE ethnl_stats, if you need to
> extend, this is the place. 
> 
> RFC2863[1] defines this type of discards as ifInDiscards. So let's add
> it to ehttool std stats. mlx5 reports most of them already to driver custom
> ethtool -S 

We can, but honestly I'd just make sure they are counted in rx_dropped
and leave the detailed breakdown in ethtool -S. The value of the common
stats kicks in when we have multiple NICs with reasonably similar
interpretations. Hopefully for missed we do have that interpretation.
Anything further down in the pipeline will be device specific.
Or at least I haven't figured out sufficient commonalities among 
the devices I deal with in production..
Saeed Mahameed Nov. 15, 2024, 7:54 p.m. UTC | #7
On 15 Nov 11:24, Jakub Kicinski wrote:
>On Fri, 15 Nov 2024 00:01:50 -0800 Saeed Mahameed wrote:
>> not rx_missed_errors please, it is exclusive for software lack of buffers.
>>
>> Please have a look at thtool_eth_XXX_stats IEEE ethnl_stats, if you need to
>> extend, this is the place.
>>
>> RFC2863[1] defines this type of discards as ifInDiscards. So let's add
>> it to ehttool std stats. mlx5 reports most of them already to driver custom
>> ethtool -S
>
>We can, but honestly I'd just make sure they are counted in rx_dropped

rx_dropped: Number of packets received but not processed,
  *   e.g. due to lack of resources or unsupported protocol.
  *   For hardware interfaces this counter may include packets discarded
  *   due to L2 address filtering but should not include packets dropped
                                  ^^^^^^^^^^^^^^
  *   by the device due to buffer exhaustion which are counted separately in
                           ^^^^^^^^^^^^^^^^^
  *   @rx_missed_errors (since procfs folds those two counters together).
      ^^^^^^^^^^^^^^^^^

I think we should use rx_fifo_errors for this and update documentation:

rx_missed_errors --> host buffers
rx_fifo_errors   --> device buffers
rx_dropped       --> unsupported portocols, filter drops, link down, etc..

rx_dropped doesn't reflect a performance issue, but a configuration mishap
"lack of resources" should be removed from the doc or improved
since I believe it meant "allocation failure of resources" such as skbs,
which is the common use case.

>and leave the detailed breakdown in ethtool -S. The value of the common
>stats kicks in when we have multiple NICs with reasonably similar
>interpretations. Hopefully for missed we do have that interpretation.
>Anything further down in the pipeline will be device specific.
>Or at least I haven't figured out sufficient commonalities among
>the devices I deal with in production..
>
Jakub Kicinski Nov. 15, 2024, 9:25 p.m. UTC | #8
On Fri, 15 Nov 2024 11:54:38 -0800 Saeed Mahameed wrote:
> >We can, but honestly I'd just make sure they are counted in rx_dropped  
> 
> rx_dropped: Number of packets received but not processed,
>   *   e.g. due to lack of resources or unsupported protocol.
>   *   For hardware interfaces this counter may include packets discarded
>   *   due to L2 address filtering but should not include packets dropped
>                                   ^^^^^^^^^^^^^^
>   *   by the device due to buffer exhaustion which are counted separately in
>                            ^^^^^^^^^^^^^^^^^
>   *   @rx_missed_errors (since procfs folds those two counters together).
>       ^^^^^^^^^^^^^^^^^

I presume you quote this comment to indicate the rx_dropped should
count packets dropped due to buffer exhaustion? If yes then you don't
understand the comment. If no then I don't understand why you're
quoting it.

> I think we should use rx_fifo_errors for this and update documentation:
> 
> rx_missed_errors --> host buffers
> rx_fifo_errors   --> device buffers

In theory I'd love to use fifo errors to mean device buffer drops.
In practice devices can backpressure due to host slowness, so the
device drops are hard to categorize. The vendors themselves have
limited understanding of how their devices will behave under real
workloads. And once devices are deployed it may be too late to change
definitions.

> rx_dropped       --> unsupported portocols, filter drops, link down, etc..
Saeed Mahameed Nov. 15, 2024, 10:09 p.m. UTC | #9
On 15 Nov 13:25, Jakub Kicinski wrote:
>On Fri, 15 Nov 2024 11:54:38 -0800 Saeed Mahameed wrote:
>> >We can, but honestly I'd just make sure they are counted in rx_dropped
>>
>> rx_dropped: Number of packets received but not processed,
>>   *   e.g. due to lack of resources or unsupported protocol.
>>   *   For hardware interfaces this counter may include packets discarded
>>   *   due to L2 address filtering but should not include packets dropped
>>                                   ^^^^^^^^^^^^^^
>>   *   by the device due to buffer exhaustion which are counted separately in
>>                            ^^^^^^^^^^^^^^^^^
>>   *   @rx_missed_errors (since procfs folds those two counters together).
>>       ^^^^^^^^^^^^^^^^^
>
>I presume you quote this comment to indicate the rx_dropped should
>count packets dropped due to buffer exhaustion? If yes then you don't
>understand the comment. If no then I don't understand why you're
>quoting it.
>

I quoted this because you suggested to use rx_dropped. It's not a good fit.
In your previous reply you said: 
"but honestly I'd just make sure they are counted in rx_dropped"

>> I think we should use rx_fifo_errors for this and update documentation:
>>
>> rx_missed_errors --> host buffers
>> rx_fifo_errors   --> device buffers
>
>In theory I'd love to use fifo errors to mean device buffer drops.
>In practice devices can backpressure due to host slowness, so the

So what? host slowness will always be counted in rx_missed_errors.
if you see both rx_missed_erros and fifo_errors progressing, you can make
the connection.. With CX devices out_of_buffer "missed_errors" can never cause
fifo drops "fifo_errors".

>device drops are hard to categorize. The vendors themselves have
>limited understanding of how their devices will behave under real
>workloads. And once devices are deployed it may be too late to change
>definitions.
>

Forget about vendors, here is my simplified categorization that we could
align with easily among all vendors and users

// delivered to SW but dropped in SW
1) seen by sw dropped by sw. (rx_dropped?)

// couldn't deliver to SW (back-pressure)
2) slow SW: passed pipeline/fifo: but SW lack of descriptors (rx_missed_errors)
3) overflow: pipeline/fifo overflow at any point before SW queue (rx_fifo_errors)
4) expected drops: steering/flow filters, configuration, carrier
    down, etc ).. no counter in rtnl_stats exists for this
  
// couldn't deliver to SW, errors
5) errors, dropped due to packet related errors or HW errors

If all vendors agree, with some repurposing and renaming of the counters
maybe we can achieve the above with minimal backward compatibility breakage.
To solve this once and for all, we need the documentation to reflect strong
and clear definitions even if it renders existing implementation/interpretation 
to be wrong. Otherwise this will never be solved.

>> rx_dropped       --> unsupported portocols, filter drops, link down, etc..
Jakub Kicinski Nov. 15, 2024, 10:42 p.m. UTC | #10
On Fri, 15 Nov 2024 14:09:02 -0800 Saeed Mahameed wrote:
> >> rx_dropped: Number of packets received but not processed,
> >>   *   e.g. due to lack of resources or unsupported protocol.
> >>   *   For hardware interfaces this counter may include packets discarded
> >>   *   due to L2 address filtering but should not include packets dropped
> >>                                   ^^^^^^^^^^^^^^
> >>   *   by the device due to buffer exhaustion which are counted separately in
> >>                            ^^^^^^^^^^^^^^^^^
> >>   *   @rx_missed_errors (since procfs folds those two counters together).
> >>       ^^^^^^^^^^^^^^^^^  
> >
> >I presume you quote this comment to indicate the rx_dropped should
> >count packets dropped due to buffer exhaustion? If yes then you don't
> >understand the comment. If no then I don't understand why you're
> >quoting it.
> 
> I quoted this because you suggested to use rx_dropped. It's not a good fit.
> In your previous reply you said: 
> "but honestly I'd just make sure they are counted in rx_dropped"

The comment just says not to add what's already counted in missed,
because profcs adds the two and we'd end up double counting.
Yafang Shao Nov. 17, 2024, 6:33 a.m. UTC | #11
On Sat, Nov 16, 2024 at 3:54 AM Saeed Mahameed <saeed@kernel.org> wrote:
>
> On 15 Nov 11:24, Jakub Kicinski wrote:
> >On Fri, 15 Nov 2024 00:01:50 -0800 Saeed Mahameed wrote:
> >> not rx_missed_errors please, it is exclusive for software lack of buffers.
> >>
> >> Please have a look at thtool_eth_XXX_stats IEEE ethnl_stats, if you need to
> >> extend, this is the place.
> >>
> >> RFC2863[1] defines this type of discards as ifInDiscards. So let's add
> >> it to ehttool std stats. mlx5 reports most of them already to driver custom
> >> ethtool -S
> >
> >We can, but honestly I'd just make sure they are counted in rx_dropped
>
> rx_dropped: Number of packets received but not processed,
>   *   e.g. due to lack of resources or unsupported protocol.
>   *   For hardware interfaces this counter may include packets discarded
>   *   due to L2 address filtering but should not include packets dropped
>                                   ^^^^^^^^^^^^^^
>   *   by the device due to buffer exhaustion which are counted separately in
>                            ^^^^^^^^^^^^^^^^^
>   *   @rx_missed_errors (since procfs folds those two counters together).
>       ^^^^^^^^^^^^^^^^^
>
> I think we should use rx_fifo_errors for this and update documentation:

Hello Saeed,

Could we apply the change to
`drivers/net/ethernet/mellanox/mlx5/core/en_main.c` first and update
the documentation in a future patch? Updating the documentation
perfectly seems like a challenging task, and I’m not the best person
for that job ;)


--
Regards
Yafang
Gal Pressman Nov. 20, 2024, 6:04 a.m. UTC | #12
On 16/11/2024 0:42, Jakub Kicinski wrote:
> On Fri, 15 Nov 2024 14:09:02 -0800 Saeed Mahameed wrote:
>>>> rx_dropped: Number of packets received but not processed,
>>>>   *   e.g. due to lack of resources or unsupported protocol.
>>>>   *   For hardware interfaces this counter may include packets discarded
>>>>   *   due to L2 address filtering but should not include packets dropped
>>>>                                   ^^^^^^^^^^^^^^
>>>>   *   by the device due to buffer exhaustion which are counted separately in
>>>>                            ^^^^^^^^^^^^^^^^^
>>>>   *   @rx_missed_errors (since procfs folds those two counters together).
>>>>       ^^^^^^^^^^^^^^^^^  
>>>
>>> I presume you quote this comment to indicate the rx_dropped should
>>> count packets dropped due to buffer exhaustion? If yes then you don't
>>> understand the comment. If no then I don't understand why you're
>>> quoting it.
>>
>> I quoted this because you suggested to use rx_dropped. It's not a good fit.
>> In your previous reply you said: 
>> "but honestly I'd just make sure they are counted in rx_dropped"
> 
> The comment just says not to add what's already counted in missed,
> because profcs adds the two and we'd end up double counting.

So this is a procfs thing only?
Does that mean that netlink's rx_dropped might be different than procfs'
rx_dropped?
diff mbox series

Patch

diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
index e601324a690a..15b1a3e6e641 100644
--- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
+++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c
@@ -3916,6 +3916,7 @@  mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats)
 	}
 
 	stats->rx_missed_errors = priv->stats.qcnt.rx_out_of_buffer;
+	stats->rx_fifo_errors = PPORT_2863_GET(pstats, if_in_discards);
 
 	stats->rx_length_errors =
 		PPORT_802_3_GET(pstats, a_in_range_length_errors) +
diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h
index 6dc258993b17..16dfaf5f47ca 100644
--- a/include/uapi/linux/if_link.h
+++ b/include/uapi/linux/if_link.h
@@ -144,10 +144,6 @@  struct rtnl_link_stats {
  *   not correspond one-to-one with dropped packets.
  *
  *   This statistics was used interchangeably with @rx_over_errors.
- *   Not recommended for use in drivers for high speed interfaces.
- *
- *   This statistic is used on software devices, e.g. to count software
- *   packet queue overflow (can) or sequencing errors (GRE).
  *
  * @rx_missed_errors: Count of packets missed by the host.
  *   Folded into the "drop" counter in `/proc/net/dev`.