Message ID | 20241114021711.5691-1-laoar.shao@gmail.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [v2,net-next] net/mlx5e: Report rx_discards_phy via rx_fifo_errors | expand |
On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote:
> - * Not recommended for use in drivers for high speed interfaces.
I thought I suggested we provide clear guidance on this counter being
related to processing pipeline being to slow, vs host backpressure.
Just deleting the line that says "don't use" is not going to cut it :|
On Fri, Nov 15, 2024 at 10:27 AM Jakub Kicinski <kuba@kernel.org> wrote: > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote: > > - * Not recommended for use in drivers for high speed interfaces. > > I thought I suggested we provide clear guidance on this counter being > related to processing pipeline being to slow, vs host backpressure. > Just deleting the line that says "don't use" is not going to cut it :| Hello Jakub, After investigating other network drivers, I found that they all report this metric to rx_missed_errors: - i40e The corresponding ethtool metric is port.rx_discards, which was mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add rx_missed_errors for buffer exhaustion"). - broadcom The equivalent metric is rx_total_discard_pkts, reported as rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom ethernet driver") Given this, it seems we should align with the standard practice and report this metric to rx_missed_errors. Tariq, what are your thoughts? -- Regards Yafang
On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote: > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote: > > > - * Not recommended for use in drivers for high speed interfaces. > > > > I thought I suggested we provide clear guidance on this counter being > > related to processing pipeline being to slow, vs host backpressure. > > Just deleting the line that says "don't use" is not going to cut it :| > > Hello Jakub, > > After investigating other network drivers, I found that they all > report this metric to rx_missed_errors: > > - i40e > The corresponding ethtool metric is port.rx_discards, which was > mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add > rx_missed_errors for buffer exhaustion"). > > - broadcom > The equivalent metric is rx_total_discard_pkts, reported as > rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom > ethernet driver") > > Given this, it seems we should align with the standard practice and > report this metric to rx_missed_errors. > > Tariq, what are your thoughts? mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very different kind of drops than the drops reported as 'missed'. The distinction is useful in production in my experience working with mlx5 devices.
On Fri, Nov 15, 2024 at 12:32 PM Jakub Kicinski <kuba@kernel.org> wrote: > > On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote: > > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote: > > > > - * Not recommended for use in drivers for high speed interfaces. > > > > > > I thought I suggested we provide clear guidance on this counter being > > > related to processing pipeline being to slow, vs host backpressure. > > > Just deleting the line that says "don't use" is not going to cut it :| > > > > Hello Jakub, > > > > After investigating other network drivers, I found that they all > > report this metric to rx_missed_errors: > > > > - i40e > > The corresponding ethtool metric is port.rx_discards, which was > > mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add > > rx_missed_errors for buffer exhaustion"). > > > > - broadcom > > The equivalent metric is rx_total_discard_pkts, reported as > > rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom > > ethernet driver") > > > > Given this, it seems we should align with the standard practice and > > report this metric to rx_missed_errors. > > > > Tariq, what are your thoughts? > > mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very > different kind of drops than the drops reported as 'missed'. > The distinction is useful in production in my experience working with > mlx5 devices. From the manual [0], it says : The number of received packets dropped due to lack of buffers on a physical port. If this counter is increasing, it implies that the adapter is congested and cannot absorb the traffic coming from the network. Would it be possible to add this description to if_link.h? Frankly, it doesn’t make much difference to end users like me whether this is reported to rx_missed_errors or rx_fifo_errors; the main goal is simply to monitor this metric to flag any issues... [0]. https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters -- Regards Yafang
On 15 Nov 13:50, Yafang Shao wrote: >On Fri, Nov 15, 2024 at 12:32 PM Jakub Kicinski <kuba@kernel.org> wrote: >> >> On Fri, 15 Nov 2024 11:56:38 +0800 Yafang Shao wrote: >> > > On Thu, 14 Nov 2024 10:17:11 +0800 Yafang Shao wrote: >> > > > - * Not recommended for use in drivers for high speed interfaces. >> > > >> > > I thought I suggested we provide clear guidance on this counter being >> > > related to processing pipeline being to slow, vs host backpressure. >> > > Just deleting the line that says "don't use" is not going to cut it :| >> > >> > Hello Jakub, >> > >> > After investigating other network drivers, I found that they all >> > report this metric to rx_missed_errors: >> > >> > - i40e >> > The corresponding ethtool metric is port.rx_discards, which was >> > mapped to rx_missed_errors in commit 5337d2949733 ("i40e: Add >> > rx_missed_errors for buffer exhaustion"). >> > >> > - broadcom >> > The equivalent metric is rx_total_discard_pkts, reported as >> > rx_missed_errors in commit c0c050c58d84 ("bnxt_en: New Broadcom >> > ethernet driver") >> > >> > Given this, it seems we should align with the standard practice and >> > report this metric to rx_missed_errors. >> > >> > Tariq, what are your thoughts? >> >> mlx5 already reports rx_missed_errors and AFAIU rx_discards_phy are very >> different kind of drops than the drops reported as 'missed'. >> The distinction is useful in production in my experience working with >> mlx5 devices. Yes rx_missed_errors is lack of software descriptors, please don't mix it with hardware pipeline FIFO discards. FYI: mlx5 reports more discards related to pipeline see below, especially for per PF/VF buffers. When these are advancing, usually they indicate congestion control issues, for example pause frames is off. rx_prio[p]_buf_discard The number of packets discarded by device due to lack of per host receive buffers. rx_prio[p]_cong_discard The number of packets discarded by device due to per host congestion. rx_prio[p]_discard (rx_discard_phy is the sum of all rx_prio[p]_discard) The number of packets discarded by device due to lack of receive buffers. That being said, these are not errors, so reporting them via rx_xyz_error is very misleading, rx_missed_errors is a special case though, and let's keep it that way. > >From the manual [0], it says : > >The number of received packets dropped due to lack of buffers on a >physical port. If this counter is increasing, it implies that the >adapter is congested and cannot absorb the traffic coming from the >network. > >Would it be possible to add this description to if_link.h? > >Frankly, it doesn’t make much difference to end users like me whether >this is reported to rx_missed_errors or rx_fifo_errors; the main goal >is simply to monitor this metric to flag any issues... > not rx_missed_errors please, it is exclusive for software lack of buffers. Please have a look at thtool_eth_XXX_stats IEEE ethnl_stats, if you need to extend, this is the place. RFC2863[1] defines this type of discards as ifInDiscards. So let's add it to ehttool std stats. mlx5 reports most of them already to driver custom ethtool -S [1] https://datatracker.ietf.org/doc/html/rfc2863 - Saeed >[0]. https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters > > >-- >Regards >Yafang
On Fri, 15 Nov 2024 00:01:50 -0800 Saeed Mahameed wrote: > not rx_missed_errors please, it is exclusive for software lack of buffers. > > Please have a look at thtool_eth_XXX_stats IEEE ethnl_stats, if you need to > extend, this is the place. > > RFC2863[1] defines this type of discards as ifInDiscards. So let's add > it to ehttool std stats. mlx5 reports most of them already to driver custom > ethtool -S We can, but honestly I'd just make sure they are counted in rx_dropped and leave the detailed breakdown in ethtool -S. The value of the common stats kicks in when we have multiple NICs with reasonably similar interpretations. Hopefully for missed we do have that interpretation. Anything further down in the pipeline will be device specific. Or at least I haven't figured out sufficient commonalities among the devices I deal with in production..
On 15 Nov 11:24, Jakub Kicinski wrote: >On Fri, 15 Nov 2024 00:01:50 -0800 Saeed Mahameed wrote: >> not rx_missed_errors please, it is exclusive for software lack of buffers. >> >> Please have a look at thtool_eth_XXX_stats IEEE ethnl_stats, if you need to >> extend, this is the place. >> >> RFC2863[1] defines this type of discards as ifInDiscards. So let's add >> it to ehttool std stats. mlx5 reports most of them already to driver custom >> ethtool -S > >We can, but honestly I'd just make sure they are counted in rx_dropped rx_dropped: Number of packets received but not processed, * e.g. due to lack of resources or unsupported protocol. * For hardware interfaces this counter may include packets discarded * due to L2 address filtering but should not include packets dropped ^^^^^^^^^^^^^^ * by the device due to buffer exhaustion which are counted separately in ^^^^^^^^^^^^^^^^^ * @rx_missed_errors (since procfs folds those two counters together). ^^^^^^^^^^^^^^^^^ I think we should use rx_fifo_errors for this and update documentation: rx_missed_errors --> host buffers rx_fifo_errors --> device buffers rx_dropped --> unsupported portocols, filter drops, link down, etc.. rx_dropped doesn't reflect a performance issue, but a configuration mishap "lack of resources" should be removed from the doc or improved since I believe it meant "allocation failure of resources" such as skbs, which is the common use case. >and leave the detailed breakdown in ethtool -S. The value of the common >stats kicks in when we have multiple NICs with reasonably similar >interpretations. Hopefully for missed we do have that interpretation. >Anything further down in the pipeline will be device specific. >Or at least I haven't figured out sufficient commonalities among >the devices I deal with in production.. >
On Fri, 15 Nov 2024 11:54:38 -0800 Saeed Mahameed wrote: > >We can, but honestly I'd just make sure they are counted in rx_dropped > > rx_dropped: Number of packets received but not processed, > * e.g. due to lack of resources or unsupported protocol. > * For hardware interfaces this counter may include packets discarded > * due to L2 address filtering but should not include packets dropped > ^^^^^^^^^^^^^^ > * by the device due to buffer exhaustion which are counted separately in > ^^^^^^^^^^^^^^^^^ > * @rx_missed_errors (since procfs folds those two counters together). > ^^^^^^^^^^^^^^^^^ I presume you quote this comment to indicate the rx_dropped should count packets dropped due to buffer exhaustion? If yes then you don't understand the comment. If no then I don't understand why you're quoting it. > I think we should use rx_fifo_errors for this and update documentation: > > rx_missed_errors --> host buffers > rx_fifo_errors --> device buffers In theory I'd love to use fifo errors to mean device buffer drops. In practice devices can backpressure due to host slowness, so the device drops are hard to categorize. The vendors themselves have limited understanding of how their devices will behave under real workloads. And once devices are deployed it may be too late to change definitions. > rx_dropped --> unsupported portocols, filter drops, link down, etc..
On 15 Nov 13:25, Jakub Kicinski wrote: >On Fri, 15 Nov 2024 11:54:38 -0800 Saeed Mahameed wrote: >> >We can, but honestly I'd just make sure they are counted in rx_dropped >> >> rx_dropped: Number of packets received but not processed, >> * e.g. due to lack of resources or unsupported protocol. >> * For hardware interfaces this counter may include packets discarded >> * due to L2 address filtering but should not include packets dropped >> ^^^^^^^^^^^^^^ >> * by the device due to buffer exhaustion which are counted separately in >> ^^^^^^^^^^^^^^^^^ >> * @rx_missed_errors (since procfs folds those two counters together). >> ^^^^^^^^^^^^^^^^^ > >I presume you quote this comment to indicate the rx_dropped should >count packets dropped due to buffer exhaustion? If yes then you don't >understand the comment. If no then I don't understand why you're >quoting it. > I quoted this because you suggested to use rx_dropped. It's not a good fit. In your previous reply you said: "but honestly I'd just make sure they are counted in rx_dropped" >> I think we should use rx_fifo_errors for this and update documentation: >> >> rx_missed_errors --> host buffers >> rx_fifo_errors --> device buffers > >In theory I'd love to use fifo errors to mean device buffer drops. >In practice devices can backpressure due to host slowness, so the So what? host slowness will always be counted in rx_missed_errors. if you see both rx_missed_erros and fifo_errors progressing, you can make the connection.. With CX devices out_of_buffer "missed_errors" can never cause fifo drops "fifo_errors". >device drops are hard to categorize. The vendors themselves have >limited understanding of how their devices will behave under real >workloads. And once devices are deployed it may be too late to change >definitions. > Forget about vendors, here is my simplified categorization that we could align with easily among all vendors and users // delivered to SW but dropped in SW 1) seen by sw dropped by sw. (rx_dropped?) // couldn't deliver to SW (back-pressure) 2) slow SW: passed pipeline/fifo: but SW lack of descriptors (rx_missed_errors) 3) overflow: pipeline/fifo overflow at any point before SW queue (rx_fifo_errors) 4) expected drops: steering/flow filters, configuration, carrier down, etc ).. no counter in rtnl_stats exists for this // couldn't deliver to SW, errors 5) errors, dropped due to packet related errors or HW errors If all vendors agree, with some repurposing and renaming of the counters maybe we can achieve the above with minimal backward compatibility breakage. To solve this once and for all, we need the documentation to reflect strong and clear definitions even if it renders existing implementation/interpretation to be wrong. Otherwise this will never be solved. >> rx_dropped --> unsupported portocols, filter drops, link down, etc..
On Fri, 15 Nov 2024 14:09:02 -0800 Saeed Mahameed wrote: > >> rx_dropped: Number of packets received but not processed, > >> * e.g. due to lack of resources or unsupported protocol. > >> * For hardware interfaces this counter may include packets discarded > >> * due to L2 address filtering but should not include packets dropped > >> ^^^^^^^^^^^^^^ > >> * by the device due to buffer exhaustion which are counted separately in > >> ^^^^^^^^^^^^^^^^^ > >> * @rx_missed_errors (since procfs folds those two counters together). > >> ^^^^^^^^^^^^^^^^^ > > > >I presume you quote this comment to indicate the rx_dropped should > >count packets dropped due to buffer exhaustion? If yes then you don't > >understand the comment. If no then I don't understand why you're > >quoting it. > > I quoted this because you suggested to use rx_dropped. It's not a good fit. > In your previous reply you said: > "but honestly I'd just make sure they are counted in rx_dropped" The comment just says not to add what's already counted in missed, because profcs adds the two and we'd end up double counting.
On Sat, Nov 16, 2024 at 3:54 AM Saeed Mahameed <saeed@kernel.org> wrote: > > On 15 Nov 11:24, Jakub Kicinski wrote: > >On Fri, 15 Nov 2024 00:01:50 -0800 Saeed Mahameed wrote: > >> not rx_missed_errors please, it is exclusive for software lack of buffers. > >> > >> Please have a look at thtool_eth_XXX_stats IEEE ethnl_stats, if you need to > >> extend, this is the place. > >> > >> RFC2863[1] defines this type of discards as ifInDiscards. So let's add > >> it to ehttool std stats. mlx5 reports most of them already to driver custom > >> ethtool -S > > > >We can, but honestly I'd just make sure they are counted in rx_dropped > > rx_dropped: Number of packets received but not processed, > * e.g. due to lack of resources or unsupported protocol. > * For hardware interfaces this counter may include packets discarded > * due to L2 address filtering but should not include packets dropped > ^^^^^^^^^^^^^^ > * by the device due to buffer exhaustion which are counted separately in > ^^^^^^^^^^^^^^^^^ > * @rx_missed_errors (since procfs folds those two counters together). > ^^^^^^^^^^^^^^^^^ > > I think we should use rx_fifo_errors for this and update documentation: Hello Saeed, Could we apply the change to `drivers/net/ethernet/mellanox/mlx5/core/en_main.c` first and update the documentation in a future patch? Updating the documentation perfectly seems like a challenging task, and I’m not the best person for that job ;) -- Regards Yafang
On 16/11/2024 0:42, Jakub Kicinski wrote: > On Fri, 15 Nov 2024 14:09:02 -0800 Saeed Mahameed wrote: >>>> rx_dropped: Number of packets received but not processed, >>>> * e.g. due to lack of resources or unsupported protocol. >>>> * For hardware interfaces this counter may include packets discarded >>>> * due to L2 address filtering but should not include packets dropped >>>> ^^^^^^^^^^^^^^ >>>> * by the device due to buffer exhaustion which are counted separately in >>>> ^^^^^^^^^^^^^^^^^ >>>> * @rx_missed_errors (since procfs folds those two counters together). >>>> ^^^^^^^^^^^^^^^^^ >>> >>> I presume you quote this comment to indicate the rx_dropped should >>> count packets dropped due to buffer exhaustion? If yes then you don't >>> understand the comment. If no then I don't understand why you're >>> quoting it. >> >> I quoted this because you suggested to use rx_dropped. It's not a good fit. >> In your previous reply you said: >> "but honestly I'd just make sure they are counted in rx_dropped" > > The comment just says not to add what's already counted in missed, > because profcs adds the two and we'd end up double counting. So this is a procfs thing only? Does that mean that netlink's rx_dropped might be different than procfs' rx_dropped?
On Wed, 20 Nov 2024 08:04:35 +0200 Gal Pressman wrote: > > The comment just says not to add what's already counted in missed, > > because profcs adds the two and we'd end up double counting. > > So this is a procfs thing only? > Does that mean that netlink's rx_dropped might be different than procfs' > rx_dropped? Yes, procfs and rtnl show "different" stats. For more context on why I put the comment there -- some stats the drivers are supposed to fold (error stats from memory). Legacy stats are tricky, it'd be a major review time investment to try to improve them..
diff --git a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c index e601324a690a..15b1a3e6e641 100644 --- a/drivers/net/ethernet/mellanox/mlx5/core/en_main.c +++ b/drivers/net/ethernet/mellanox/mlx5/core/en_main.c @@ -3916,6 +3916,7 @@ mlx5e_get_stats(struct net_device *dev, struct rtnl_link_stats64 *stats) } stats->rx_missed_errors = priv->stats.qcnt.rx_out_of_buffer; + stats->rx_fifo_errors = PPORT_2863_GET(pstats, if_in_discards); stats->rx_length_errors = PPORT_802_3_GET(pstats, a_in_range_length_errors) + diff --git a/include/uapi/linux/if_link.h b/include/uapi/linux/if_link.h index 6dc258993b17..16dfaf5f47ca 100644 --- a/include/uapi/linux/if_link.h +++ b/include/uapi/linux/if_link.h @@ -144,10 +144,6 @@ struct rtnl_link_stats { * not correspond one-to-one with dropped packets. * * This statistics was used interchangeably with @rx_over_errors. - * Not recommended for use in drivers for high speed interfaces. - * - * This statistic is used on software devices, e.g. to count software - * packet queue overflow (can) or sequencing errors (GRE). * * @rx_missed_errors: Count of packets missed by the host. * Folded into the "drop" counter in `/proc/net/dev`.
We observed a high number of rx_discards_phy events on some servers when running `ethtool -S`. However, this important counter is not currently reflected in the /proc/net/dev statistics file, making it challenging to monitor effectively. Since rx_fifo_errors represents receive FIFO errors on this network deivice, it makes sense to include rx_discards_phy in this counter to enhance monitoring visibility. This change will help administrators track these events more effectively through standard interfaces. I’ve also reviewed the manual for ethtool counters on mlx5 [0], and it appears that rx_discards_phy and rx_fifo_errors have the same meaning. rx_discards_phy: The number of received packets dropped due to lack of buffers on a physical port. If this counter is increasing, it implies that the adapter is congested and cannot absorb the traffic coming from the network. ConnectX-3 naming : rx_fifo_errors The documentation in if_link.h has been updated accordingly. Link: https://enterprise-support.nvidia.com/s/article/understanding-mlx5-ethtool-counters [0] Suggested-by: Tariq Toukan <ttoukan.linux@gmail.com> Signed-off-by: Yafang Shao <laoar.shao@gmail.com> Cc: Tariq Toukan <ttoukan.linux@gmail.com> Cc: Saeed Mahameed <saeedm@nvidia.com> Cc: Leon Romanovsky <leon@kernel.org> Cc: Gal Pressman <gal@nvidia.com> Cc: Jakub Kicinski <kuba@kernel.org> --- drivers/net/ethernet/mellanox/mlx5/core/en_main.c | 1 + include/uapi/linux/if_link.h | 4 ---- 2 files changed, 1 insertion(+), 4 deletions(-) Changes: v1->v2: - Use rx_fifo_errors instead (Tariq) - Update the if_link.h accordingly v1: https://lore.kernel.org/netdev/20241106064015.4118-1-laoar.shao@gmail.com/