diff mbox series

[net-next,v3,1/3] phy: open_alliance_helpers: Add defines for link quality metrics

Message ID 20240822115939.1387015-2-o.rempel@pengutronix.de (mailing list archive)
State Changes Requested
Delegated to: Netdev Maintainers
Headers show
Series Add Statistics Support for DP83TG720 PHY | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net-next
netdev/ynl success Generated files up to date; no warnings/errors; no diff in generated;
netdev/fixes_present success Fixes tag not required for -next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 16 this patch: 16
netdev/build_tools success No tools touched, skip
netdev/cc_maintainers success CCed 7 of 7 maintainers
netdev/build_clang success Errors and warnings before: 16 this patch: 16
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 16 this patch: 16
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 20 lines checked
netdev/build_clang_rust success No Rust files in patch. Skipping build
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0
netdev/contest success net-next-2024-08-25--21-00 (tests: 714)

Commit Message

Oleksij Rempel Aug. 22, 2024, 11:59 a.m. UTC
Introduce a set of defines for link quality (LQ) related metrics in the
Open Alliance helpers. These metrics include:

- `oa_lq_lfl_esd_event_count`: Number of ESD events detected by the Link
  Failures and Losses (LFL).
- `oa_lq_link_training_time`: Time required to establish a link.
- `oa_lq_remote_receiver_time`: Time required until the remote receiver
  signals that it is locked.
- `oa_lq_local_receiver_time`: Time required until the local receiver is
  locked.
- `oa_lq_lfl_link_loss_count`: Number of link losses.
- `oa_lq_lfl_link_failure_count`: Number of link failures that do not
  cause a link loss.

These standardized defines will be used by PHY drivers to report these
statistics.

Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>
---
 drivers/net/phy/open_alliance_helpers.h | 14 ++++++++++++++
 1 file changed, 14 insertions(+)

Comments

Andrew Lunn Aug. 22, 2024, 3:48 p.m. UTC | #1
On Thu, Aug 22, 2024 at 01:59:37PM +0200, Oleksij Rempel wrote:
> Introduce a set of defines for link quality (LQ) related metrics in the
> Open Alliance helpers. These metrics include:
> 
> - `oa_lq_lfl_esd_event_count`: Number of ESD events detected by the Link
>   Failures and Losses (LFL).
> - `oa_lq_link_training_time`: Time required to establish a link.
> - `oa_lq_remote_receiver_time`: Time required until the remote receiver
>   signals that it is locked.
> - `oa_lq_local_receiver_time`: Time required until the local receiver is
>   locked.
> - `oa_lq_lfl_link_loss_count`: Number of link losses.
> - `oa_lq_lfl_link_failure_count`: Number of link failures that do not
>   cause a link loss.
> 
> These standardized defines will be used by PHY drivers to report these
> statistics.
> 
> Signed-off-by: Oleksij Rempel <o.rempel@pengutronix.de>

Reviewed-by: Andrew Lunn <andrew@lunn.ch>

    Andrew
Jakub Kicinski Aug. 26, 2024, 4:32 p.m. UTC | #2
On Thu, 22 Aug 2024 13:59:37 +0200 Oleksij Rempel wrote:
> Introduce a set of defines for link quality (LQ) related metrics in the
> Open Alliance helpers. These metrics include:
> 
> - `oa_lq_lfl_esd_event_count`: Number of ESD events detected by the Link
>   Failures and Losses (LFL).
> - `oa_lq_link_training_time`: Time required to establish a link.
> - `oa_lq_remote_receiver_time`: Time required until the remote receiver
>   signals that it is locked.
> - `oa_lq_local_receiver_time`: Time required until the local receiver is
>   locked.
> - `oa_lq_lfl_link_loss_count`: Number of link losses.
> - `oa_lq_lfl_link_failure_count`: Number of link failures that do not
>   cause a link loss.
> 
> These standardized defines will be used by PHY drivers to report these
> statistics.

If these are defined by a standard why not report them as structured
data? Like we report ethtool_eth_mac_stats, ethtool_eth_ctrl_stats,
ethtool_rmon_stats etc.?
Andrew Lunn Aug. 26, 2024, 5:12 p.m. UTC | #3
On Mon, Aug 26, 2024 at 09:32:17AM -0700, Jakub Kicinski wrote:
> On Thu, 22 Aug 2024 13:59:37 +0200 Oleksij Rempel wrote:
> > Introduce a set of defines for link quality (LQ) related metrics in the
> > Open Alliance helpers. These metrics include:
> > 
> > - `oa_lq_lfl_esd_event_count`: Number of ESD events detected by the Link
> >   Failures and Losses (LFL).
> > - `oa_lq_link_training_time`: Time required to establish a link.
> > - `oa_lq_remote_receiver_time`: Time required until the remote receiver
> >   signals that it is locked.
> > - `oa_lq_local_receiver_time`: Time required until the local receiver is
> >   locked.
> > - `oa_lq_lfl_link_loss_count`: Number of link losses.
> > - `oa_lq_lfl_link_failure_count`: Number of link failures that do not
> >   cause a link loss.
> > 
> > These standardized defines will be used by PHY drivers to report these
> > statistics.
> 
> If these are defined by a standard why not report them as structured
> data? Like we report ethtool_eth_mac_stats, ethtool_eth_ctrl_stats,
> ethtool_rmon_stats etc.?

We could do, but we have no infrastructure for this at the
moment. These are PHY statistics, not MAC statistics. We don't have
all the ethool_op infrastructure, etc. We also need to think about
which PHY do we want the statics from, the bootlin code for multiple
PHYs etc.

I will leave it up to Oleksij, but it would neatly avoid different
vendors returning the same stats with different names.

	Andrew
Jakub Kicinski Aug. 26, 2024, 7:57 p.m. UTC | #4
On Mon, 26 Aug 2024 19:12:52 +0200 Andrew Lunn wrote:
> > If these are defined by a standard why not report them as structured
> > data? Like we report ethtool_eth_mac_stats, ethtool_eth_ctrl_stats,
> > ethtool_rmon_stats etc.?  
> 
> We could do, but we have no infrastructure for this at the
> moment. These are PHY statistics, not MAC statistics.
> We don't have all the ethool_op infrastructure, etc.

This appears to not be a concern when calling phy_ops->get_sset_count()
You know this code better than me, but I can't think of any big 'infra'
that we'd need. ethtool code can just call phy_ops, the rest is likely
a repeat of the "MAC"/ethtool_ops stats.

> We also need to think about which PHY do we want the statics from,
> the bootlin code for multiple PHYs etc.

True, that said I'd rather we added a new group for the well-defined
PHY stats without supporting multi-PHY, than let the additional
considerations prevent us from making progress. ioctl stats are
strictly worse.

I'm sorry to pick on this particular series, but the structured ethtool
stats have been around for 3 years. Feels like it's time to fill the
gaps on the PHY side.
Oleksij Rempel Aug. 27, 2024, 4:51 a.m. UTC | #5
Hi Jakub,

On Mon, Aug 26, 2024 at 12:57:19PM -0700, Jakub Kicinski wrote:
> On Mon, 26 Aug 2024 19:12:52 +0200 Andrew Lunn wrote:
> > > If these are defined by a standard why not report them as structured
> > > data? Like we report ethtool_eth_mac_stats, ethtool_eth_ctrl_stats,
> > > ethtool_rmon_stats etc.?  
> > 
> > We could do, but we have no infrastructure for this at the
> > moment. These are PHY statistics, not MAC statistics.
> > We don't have all the ethool_op infrastructure, etc.
> 
> This appears to not be a concern when calling phy_ops->get_sset_count()
> You know this code better than me, but I can't think of any big 'infra'
> that we'd need. ethtool code can just call phy_ops, the rest is likely
> a repeat of the "MAC"/ethtool_ops stats.
> 
> > We also need to think about which PHY do we want the statics from,
> > the bootlin code for multiple PHYs etc.
> 
> True, that said I'd rather we added a new group for the well-defined
> PHY stats without supporting multi-PHY, than let the additional
> considerations prevent us from making progress. ioctl stats are
> strictly worse.
> 
> I'm sorry to pick on this particular series, but the structured ethtool
> stats have been around for 3 years. Feels like it's time to fill the
> gaps on the PHY side.

I completely agree with you, but I currently don't have additional
budget for this project.

What might help is a diagnostic concept that I can present to my
customers to seek sponsorship for implementing various interfaces based
on their relevance and priority for different projects.

Since I haven't seen an existing concept from the end product or
component vendors, I suggest starting this within the Linux Kernel
NetDev community.

I'll send my current thoughts in a separate email

Regards,
Oleksij
Jakub Kicinski Aug. 27, 2024, 6:33 p.m. UTC | #6
On Tue, 27 Aug 2024 06:51:27 +0200 Oleksij Rempel wrote:
> I completely agree with you, but I currently don't have additional
> budget for this project.

Is this a legit reason not to do something relatively simple?
Especially that we're talking about uAPI, once we go down
the string path I presume they will stick around forever.

IDK. Additional opinions welcome...
Oleksij Rempel Aug. 28, 2024, 4:50 a.m. UTC | #7
On Tue, Aug 27, 2024 at 11:33:00AM -0700, Jakub Kicinski wrote:
> On Tue, 27 Aug 2024 06:51:27 +0200 Oleksij Rempel wrote:
> > I completely agree with you, but I currently don't have additional
> > budget for this project.
> 
> Is this a legit reason not to do something relatively simple?

Due to the nature of my work in a consulting company, my time is scheduled
across multiple customers. For the 10BaseT1 PHY, I had 2 days budgeted left,
which allowed me to implement some extra diagnostics. This was simple,
predictable, and within the scope of the original task.

However, now that the budget for this task and customer has been used up, any
additional work would require a full process:
- I would need to sell the idea to the customer.
- The new task would need to be prioritized.
- It would then be scheduled, which could happen this year, next year, or
  possibly never.

A similar situation occurred with the EEE implementation. I started with a
simple fix for Atheros PHY's SmartEEE, but it led to reworking the entire EEE
infrastructure in the kernel. Once the budget was exhausted, I couldn’t
continue with SmartEEE for Atheros PHYs. These are the risks inherent to
consulting work. I still see it as not wasted time, because we have a better
EEE infrastructure now.

Considering that you've requested a change to the uAPI, the work has now become
more predictable. I can plan for it within the task and update the required
time budget accordingly. However, it's worth noting that while this work is
manageable, the time spent on this particular task could be seen as somewhat
wasted from a budget perspective, as it wasn't part of the original scope.

> Especially that we're talking about uAPI, once we go down
> the string path I presume they will stick around forever.

Yes, I agree with it. I just needed this feedback as early as possible.
Jakub Kicinski Aug. 28, 2024, 8:34 p.m. UTC | #8
On Wed, 28 Aug 2024 06:50:46 +0200 Oleksij Rempel wrote:
> Considering that you've requested a change to the uAPI, the work has now become
> more predictable. I can plan for it within the task and update the required
> time budget accordingly. However, it's worth noting that while this work is
> manageable, the time spent on this particular task could be seen as somewhat
> wasted from a budget perspective, as it wasn't part of the original scope.

I can probably take a stab at the kernel side, since I know the code
already shouldn't take me more more than an hour. Would that help?
You'd still need to retest, fix bugs. And go thru review.. so all
the not-so-fun parts

> > Especially that we're talking about uAPI, once we go down
> > the string path I presume they will stick around forever.  
> 
> Yes, I agree with it. I just needed this feedback as early as possible.

Andrew? Do you want to decide? :)
Andrew Lunn Aug. 28, 2024, 8:45 p.m. UTC | #9
On Wed, Aug 28, 2024 at 01:34:28PM -0700, Jakub Kicinski wrote:
> On Wed, 28 Aug 2024 06:50:46 +0200 Oleksij Rempel wrote:
> > Considering that you've requested a change to the uAPI, the work has now become
> > more predictable. I can plan for it within the task and update the required
> > time budget accordingly. However, it's worth noting that while this work is
> > manageable, the time spent on this particular task could be seen as somewhat
> > wasted from a budget perspective, as it wasn't part of the original scope.
> 
> I can probably take a stab at the kernel side, since I know the code
> already shouldn't take me more more than an hour. Would that help?
> You'd still need to retest, fix bugs. And go thru review.. so all
> the not-so-fun parts
> 
> > > Especially that we're talking about uAPI, once we go down
> > > the string path I presume they will stick around forever.  
> > 
> > Yes, I agree with it. I just needed this feedback as early as possible.
> 
> Andrew? Do you want to decide? :)

I agree about avoiding free test strings. Something more structures
would be good.

I can definitely help out with review, but i don't have any time at
the moment for writing code.

	Andrew
Oleksij Rempel Aug. 29, 2024, 4:41 a.m. UTC | #10
On Wed, Aug 28, 2024 at 01:34:28PM -0700, Jakub Kicinski wrote:
> On Wed, 28 Aug 2024 06:50:46 +0200 Oleksij Rempel wrote:
> > Considering that you've requested a change to the uAPI, the work has now become
> > more predictable. I can plan for it within the task and update the required
> > time budget accordingly. However, it's worth noting that while this work is
> > manageable, the time spent on this particular task could be seen as somewhat
> > wasted from a budget perspective, as it wasn't part of the original scope.
> 
> I can probably take a stab at the kernel side, since I know the code
> already shouldn't take me more more than an hour. Would that help?

Ack. This will be a great help.

> You'd still need to retest, fix bugs. And go thru review.. so all
> the not-so-fun parts

Sure :)
diff mbox series

Patch

diff --git a/drivers/net/phy/open_alliance_helpers.h b/drivers/net/phy/open_alliance_helpers.h
index 8b7d97bc6f186..f8b392671e20d 100644
--- a/drivers/net/phy/open_alliance_helpers.h
+++ b/drivers/net/phy/open_alliance_helpers.h
@@ -3,6 +3,20 @@ 
 #ifndef OPEN_ALLIANCE_HELPERS_H
 #define OPEN_ALLIANCE_HELPERS_H
 
+/* Link quality (LQ) related metrics */
+/* The number of ESD events detected by the Link Failures and Losses (LFL) */
+#define OA_LQ_LFL_ESD_EVENT_COUNT		"oa_lq_lfl_esd_event_count"
+/* Time required to establish a link */
+#define OA_LQ_LINK_TRAINING_TIME		"oa_lq_link_training_time"
+/* Time required until the remote receiver is signaling that it is locked */
+#define OA_LQ_REMOTE_RECEIVER_TIME		"oa_lq_remote_receiver_time"
+/* Time required until the local receiver is locked */
+#define OA_LQ_LOCAL_RECEIVER_TIME		"oa_lq_local_receiver_time"
+/* Number of link losses */
+#define OA_LQ_LFL_LINK_LOSS_COUNT		"oa_lq_lfl_link_loss_count"
+/* Number of link failures causing NOT a link loss */
+#define OA_LQ_LFL_LINK_FAILURE_COUNT		"oa_lq_lfl_link_failure_count"
+
 /*
  * These defines reflect the TDR (Time Delay Reflection) diagnostic feature
  * for 1000BASE-T1 automotive Ethernet PHYs as specified by the OPEN Alliance.