diff mbox series

[net,5/6] net: hns3: fix wrong print link down up

Message ID 20230728075840.4022760-6-shaojijie@huawei.com (mailing list archive)
State Superseded
Delegated to: Netdev Maintainers
Headers show
Series There are some bugfix for the HNS3 ethernet driver | expand

Checks

Context Check Description
netdev/series_format success Posting correctly formatted
netdev/tree_selection success Clearly marked for net
netdev/fixes_present success Fixes tag present in non-next series
netdev/header_inline success No static functions without inline keyword in header files
netdev/build_32bit success Errors and warnings before: 1328 this patch: 1328
netdev/cc_maintainers warning 2 maintainers not CCed: lanhao@huawei.com huangguangbin2@huawei.com
netdev/build_clang success Errors and warnings before: 1351 this patch: 1351
netdev/verify_signedoff success Signed-off-by tag matches author and committer
netdev/deprecated_api success None detected
netdev/check_selftest success No net selftest shell script
netdev/verify_fixes success No Fixes tag
netdev/build_allmodconfig_warn success Errors and warnings before: 1351 this patch: 1351
netdev/checkpatch success total: 0 errors, 0 warnings, 0 checks, 9 lines checked
netdev/kdoc success Errors and warnings before: 0 this patch: 0
netdev/source_inline success Was 0 now: 0

Commit Message

Jijie Shao July 28, 2023, 7:58 a.m. UTC
From: Peiyang Wang <wangpeiyang1@huawei.com>

This patch will fix a wrong print "device link down/up". Consider a case
that set autoneg to off with same speed and duplex configuration. The link
is always up while the phy state is set to PHY_UP and set back to
PHY_RUNNING later. It will print link down when the phy state is not
PHY_RUNNING. To avoid that, the condition should include PHY_UP.

Signed-off-by: Peiyang Wang <wangpeiyang1@huawei.com>
Signed-off-by: Jijie Shao <shaojijie@huawei.com>
---
 drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Andrew Lunn July 28, 2023, 8:57 a.m. UTC | #1
On Fri, Jul 28, 2023 at 03:58:39PM +0800, Jijie Shao wrote:
> From: Peiyang Wang <wangpeiyang1@huawei.com>
> 
> This patch will fix a wrong print "device link down/up". Consider a case
> that set autoneg to off with same speed and duplex configuration. The link
> is always up while the phy state is set to PHY_UP and set back to
> PHY_RUNNING later. It will print link down when the phy state is not
> PHY_RUNNING. To avoid that, the condition should include PHY_UP.

Does this really happen? If autoneg is on, and there is link, it means
the link peer is auto using auto-neg. If you turn auto-neg off, the
link peer is not going to know what speed to use, and so the link will
go down. The link will only come up again when you reconfigure the
link peer to also not use auto-neg.

I don't see how you can turn auto-neg off and not loose the link.

  Andrew
Jijie Shao July 29, 2023, 3:11 a.m. UTC | #2
Hi Andrew,
I understand what you mean, and sorry for my wrong description. The link
is not always up. If I turn auto-neg off, the link will go down finally.
However, there is an intervel between my operation and the link down. In
my experiment, it may be 1 min or evn 10 mins. The phy state is set to
PHY_UP immediately when I set auto-neg off. And the phy machine check the
state during a very small intervals. Thus, during my experiment, the phy
state has a followed varietion:
PHY_RUNNING -> PHY_UP -> PHY_RUNNING -> PHY_NOLINK.

We print link up/down based on phy state and link state. In aboved case,
It print looks like:
eth0 link down -- because phy state is set to PHY_UP
eth0 link up -- because phy state is set to PHY_RUNNING
eth0 link down -- because link down

This patch wants to fix the first two wrong print.
We will modify this patch description

Thanks!

   Jijie Shao


on 2023/7/28 16:57, Andrew Lunn wrote:
> On Fri, Jul 28, 2023 at 03:58:39PM +0800, Jijie Shao wrote:
>> From: Peiyang Wang <wangpeiyang1@huawei.com>
>>
>> This patch will fix a wrong print "device link down/up". Consider a case
>> that set autoneg to off with same speed and duplex configuration. The link
>> is always up while the phy state is set to PHY_UP and set back to
>> PHY_RUNNING later. It will print link down when the phy state is not
>> PHY_RUNNING. To avoid that, the condition should include PHY_UP.
> Does this really happen? If autoneg is on, and there is link, it means
> the link peer is auto using auto-neg. If you turn auto-neg off, the
> link peer is not going to know what speed to use, and so the link will
> go down. The link will only come up again when you reconfigure the
> link peer to also not use auto-neg.
>
> I don't see how you can turn auto-neg off and not loose the link.
>
>    Andrew
Andrew Lunn July 29, 2023, 7:57 a.m. UTC | #3
On Sat, Jul 29, 2023 at 11:11:48AM +0800, Jijie Shao wrote:
> Hi Andrew,
> I understand what you mean, and sorry for my wrong description. The link
> is not always up. If I turn auto-neg off, the link will go down finally.
> However, there is an intervel between my operation and the link down. In
> my experiment, it may be 1 min or evn 10 mins. The phy state is set to
> PHY_UP immediately when I set auto-neg off. And the phy machine check the
> state during a very small intervals. Thus, during my experiment, the phy
> state has a followed varietion:
> PHY_RUNNING -> PHY_UP -> PHY_RUNNING -> PHY_NOLINK.
> 
> We print link up/down based on phy state and link state. In aboved case,
> It print looks like:
> eth0 link down -- because phy state is set to PHY_UP
> eth0 link up -- because phy state is set to PHY_RUNNING
> eth0 link down -- because link down
> 
> This patch wants to fix the first two wrong print.
> We will modify this patch description

Now i wounder if you are fixing the wrong thing. Maybe you should be
fixing the PHY so it does not report up and then down? You say 'very
snall intervals', which should in fact be 1 second. So is the PHY
reporting link for a number of poll intervals? 1min to 10 minutes?

	  Andrew
Andrew Lunn July 29, 2023, 6:23 p.m. UTC | #4
>     Now i wounder if you are fixing the wrong thing. Maybe you should be
>     fixing the PHY so it does not report up and then down? You say 'very
>     snall intervals', which should in fact be 1 second. So is the PHY
>     reporting link for a number of poll intervals? 1min to 10 minutes?
> 
>               Andrew
> 
> Yes, according to the log records, the phy polls every second,
> but the link status changes take time.
> Generally, it takes 10 seconds for the phy to detect link down,
> but occasionally it takes several minutes to detect link down,

What PHY driver is this?

It is not so clear what should actually happen with auto-neg turned
off. With it on, and the link going down, the PHY should react after
about 1 second. It is not supposed to react faster than that, although
some PHYs allow fast link down notification to be configured.

Have you checked 802.3 to see what it says about auto-neg off and link
down detection?

I personally would not suppress this behaviour in the MAC
driver. Otherwise you are going to have funny combinations of special
cases of a feature which very few people actually use, making your
maintenance costs higher.

	    Andrew
Jijie Shao July 31, 2023, 9:10 a.m. UTC | #5
on 2023/7/30 2:23, Andrew Lunn wrote:
>>      Now i wounder if you are fixing the wrong thing. Maybe you should be
>>      fixing the PHY so it does not report up and then down? You say 'very
>>      snall intervals', which should in fact be 1 second. So is the PHY
>>      reporting link for a number of poll intervals? 1min to 10 minutes?
>>
>>                Andrew
>>
>> Yes, according to the log records, the phy polls every second,
>> but the link status changes take time.
>> Generally, it takes 10 seconds for the phy to detect link down,
>> but occasionally it takes several minutes to detect link down,
> What PHY driver is this?
>
> It is not so clear what should actually happen with auto-neg turned
> off. With it on, and the link going down, the PHY should react after
> about 1 second. It is not supposed to react faster than that, although
> some PHYs allow fast link down notification to be configured.
>
> Have you checked 802.3 to see what it says about auto-neg off and link
> down detection?
>
> I personally would not suppress this behaviour in the MAC
> driver. Otherwise you are going to have funny combinations of special
> cases of a feature which very few people actually use, making your
> maintenance costs higher.
>
> 	    Andrew
Thanks for your suggestion, We are analyzing this issue in depth.
Jijie Shao Aug. 10, 2023, 8:06 a.m. UTC | #6
on 2023/7/31 17:10, Jijie Shao wrote:
> What PHY driver is this?
>>
>> It is not so clear what should actually happen with auto-neg turned
>> off. With it on, and the link going down, the PHY should react after
>> about 1 second. It is not supposed to react faster than that, although
>> some PHYs allow fast link down notification to be configured.
>>
>> Have you checked 802.3 to see what it says about auto-neg off and link
>> down detection?
>>
>> I personally would not suppress this behaviour in the MAC
>> driver. Otherwise you are going to have funny combinations of special
>> cases of a feature which very few people actually use, making your
>> maintenance costs higher.
>>
>>         Andrew

Hi Andrew,
We trace how the PHY state machine changed and show as followed:

[ 1974.220847][ T362] hns3 0000:35:00.0 eth1: set link(phy): autoneg=0, 
speed=100, duplex=1
[ 1974.233694][ T362] hns3 0000:35:00.0 eth1: link down
[ 1974.267444][ T32] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY 
state change UP -> RUNNING
[ 1974.892830][ T7] hns3 0000:35:00.0 eth1: link up
[ 2004.277425][ T32] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY 
state change RUNNING -> NOLINK
[ 2004.797731][ T7] hns3 0000:35:00.0 eth1: link down

Meanwhile, we also open tracing event about mdio and here are some 
useful logs:

kworker/1:0-19 [001] .... 1973.329775: mdio_access: mii-0000:35:00.0 
read phy:0x02 reg:0x00 val:0x1040
kworker/1:0-19 [001] .... 1973.331964: mdio_access: mii-0000:35:00.0 
read phy:0x02 reg:0x01 val:0x79ad
kworker/2:1-32 [002] .... 1974.247627: mdio_access: mii-0000:35:00.0 
read phy:0x02 reg:0x00 val:0x1040
kworker/2:1-32 [002] .... 1974.249870: mdio_access: mii-0000:35:00.0 
write phy:0x02 reg:0x00 val:0x2100
kworker/2:1-32 [002] .... 1974.252069: mdio_access: mii-0000:35:00.0 
read phy:0x02 reg:0x00 val:0x2100
kworker/2:1-32 [002] .... 1974.254143: mdio_access: mii-0000:35:00.0 
read phy:0x02 reg:0x01 val:0x798d
....
kworker/2:1-32 [002] .... 2003.240015: mdio_access: mii-0000:35:00.0 
read phy:0x02 reg:0x01 val:0x798d
....
kworker/2:1-32 [002] .... 2004.269525: mdio_access: mii-0000:35:00.0 
read phy:0x02 reg:0x01 val:0x7989

As you can see, the link state changed after 30 seconds when only 
setting autoneg off. When the BMSR changed, the PHY driver change state 
immediately. This patch wants to fixed the first link down up showed on 
logs cause the link do not changed.


Regards
Jijie Shao
Jijie Shao Oct. 17, 2023, 1:03 p.m. UTC | #7
on 2023/7/31 17:10, Jijie Shao wrote:
>
> on 2023/7/30 2:23, Andrew Lunn wrote:
>>>      Now i wounder if you are fixing the wrong thing. Maybe you 
>>> should be
>>>      fixing the PHY so it does not report up and then down? You say 
>>> 'very
>>>      snall intervals', which should in fact be 1 second. So is the PHY
>>>      reporting link for a number of poll intervals? 1min to 10 minutes?
>>>
>>>                Andrew
>>>
>>> Yes, according to the log records, the phy polls every second,
>>> but the link status changes take time.
>>> Generally, it takes 10 seconds for the phy to detect link down,
>>> but occasionally it takes several minutes to detect link down,
>> What PHY driver is this?
>>
>> It is not so clear what should actually happen with auto-neg turned
>> off. With it on, and the link going down, the PHY should react after
>> about 1 second. It is not supposed to react faster than that, although
>> some PHYs allow fast link down notification to be configured.
>>
>> Have you checked 802.3 to see what it says about auto-neg off and link
>> down detection?
>>
>> I personally would not suppress this behaviour in the MAC
>> driver. Otherwise you are going to have funny combinations of special
>> cases of a feature which very few people actually use, making your
>> maintenance costs higher.
>>
>>         Andrew

Hi Andrew,
We've rewritten the commit log to explain this problem,
Would you please take some time to review that?

The following is the new commit log:
This patch is to correct a wrong log info "link down/up" in hns3 driver.
When setting autoneg off without changing speed and duplex, the link
should be not changed. However in hns3 driver, it print link down/up once
incorrectly. We trace the phy machine state and find the phy change form
PHY_UP to PHY_RUNNING. No other state of PHY occurs during this process.
MDIO trace also indicate the link is on. The wrong log info and mdio
trace are showed as followed:

[  843.720783][  T367] hns3 0000:35:00.0 eth1: set link(phy): autoneg=0,
speed=10, duplex=1
[  843.736087][  T367] hns3 0000:35:00.0 eth1: link down
[  843.773506][   T17] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY
state change UP -> RUNNING
[  844.674668][   T31] hns3 0000:35:00.0 eth1: link up

      kworker/1:1-32      [001] ....   841.457231: mdio_access: mii-0000:
35:00.0 read  phy:0x02 reg:0x01 val:0x79ad
      kworker/1:1-32      [001] ....   842.486496: mdio_access: mii-0000:
35:00.0 read  phy:0x02 reg:0x01 val:0x79ad
      kworker/1:1-32      [001] ....   843.520565: mdio_access: mii-0000:
35:00.0 read  phy:0x02 reg:0x01 val:0x79ad
      kworker/0:1-17      [000] ....   843.757147: mdio_access: mii-0000:
35:00.0 read  phy:0x02 reg:0x01 val:0x798d
      kworker/0:1-17      [000] ....   844.799141: mdio_access: mii-0000:
35:00.0 read  phy:0x02 reg:0x01 val:0x798d
      kworker/0:1-17      [000] ....   845.831513: mdio_access: mii-0000:
35:00.0 read  phy:0x02 reg:0x01 val:0x798d
      kworker/0:1-17      [000] ....   846.863053: mdio_access: mii-0000:
35:00.0 read  phy:0x02 reg:0x01 val:0x798d

Regards
Jijie
Andrew Lunn Oct. 17, 2023, 1:59 p.m. UTC | #8
On Tue, Oct 17, 2023 at 09:03:01PM +0800, Jijie Shao wrote:
> 
> on 2023/7/31 17:10, Jijie Shao wrote:
> > 
> > on 2023/7/30 2:23, Andrew Lunn wrote:
> > > >      Now i wounder if you are fixing the wrong thing. Maybe you
> > > > should be
> > > >      fixing the PHY so it does not report up and then down? You
> > > > say 'very
> > > >      snall intervals', which should in fact be 1 second. So is the PHY
> > > >      reporting link for a number of poll intervals? 1min to 10 minutes?
> > > > 
> > > >                Andrew
> > > > 
> > > > Yes, according to the log records, the phy polls every second,
> > > > but the link status changes take time.
> > > > Generally, it takes 10 seconds for the phy to detect link down,
> > > > but occasionally it takes several minutes to detect link down,
> > > What PHY driver is this?
> > > 
> > > It is not so clear what should actually happen with auto-neg turned
> > > off. With it on, and the link going down, the PHY should react after
> > > about 1 second. It is not supposed to react faster than that, although
> > > some PHYs allow fast link down notification to be configured.
> > > 
> > > Have you checked 802.3 to see what it says about auto-neg off and link
> > > down detection?
> > > 
> > > I personally would not suppress this behaviour in the MAC
> > > driver. Otherwise you are going to have funny combinations of special
> > > cases of a feature which very few people actually use, making your
> > > maintenance costs higher.
> > > 
> > >         Andrew
> 
> Hi Andrew,
> We've rewritten the commit log to explain this problem,
> Would you please take some time to review that?
> 
> The following is the new commit log:
> This patch is to correct a wrong log info "link down/up" in hns3 driver.
> When setting autoneg off without changing speed and duplex, the link
> should be not changed. However in hns3 driver, it print link down/up once
> incorrectly. We trace the phy machine state and find the phy change form
> PHY_UP to PHY_RUNNING. No other state of PHY occurs during this process.
> MDIO trace also indicate the link is on. The wrong log info and mdio
> trace are showed as followed:
> 
> [  843.720783][  T367] hns3 0000:35:00.0 eth1: set link(phy): autoneg=0,
> speed=10, duplex=1
> [  843.736087][  T367] hns3 0000:35:00.0 eth1: link down
> [  843.773506][   T17] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY
> state change UP -> RUNNING
> [  844.674668][   T31] hns3 0000:35:00.0 eth1: link up

I still think this is totally valid and correct.

When you turn auto-neg off the link partner is going to react to that,
it might drop the link. After a while, the link partner will give up
trying to perform auto-neg and might fall back to 10/Half. At which
point, the link might allow traffic flow. However, in this example,
you have a duplex mis-match, so it might not work correctly.

Turning off auto-neg is something you need to do at both ends, and you
need to then force both ends to the same settings. Link down is
expected. I would actually be suppressed if no link down events were
reported.

	Andrew
Jijie Shao Oct. 18, 2023, 12:25 p.m. UTC | #9
on 2023/10/17 21:59, Andrew Lunn wrote
> I still think this is totally valid and correct.
>
> When you turn auto-neg off the link partner is going to react to that,
> it might drop the link. After a while, the link partner will give up
> trying to perform auto-neg and might fall back to 10/Half. At which
> point, the link might allow traffic flow. However, in this example,
> you have a duplex mis-match, so it might not work correctly.
>
> Turning off auto-neg is something you need to do at both ends, and you
> need to then force both ends to the same settings. Link down is
> expected. I would actually be suppressed if no link down events were
> reported.
>
> 	Andrew

Hi Andrew,
Thank you for your comments, we are re-evaluating this issue and may drop this patch.

Regards
Jijie
diff mbox series

Patch

diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
index a940e35aef29..1723d0fa49ee 100644
--- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
+++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c
@@ -3089,7 +3089,8 @@  static int hclge_get_mac_phy_link(struct hclge_dev *hdev, int *link_status)
 	if (test_bit(HCLGE_STATE_DOWN, &hdev->state))
 		return 0;
 
-	if (phydev && (phydev->state != PHY_RUNNING || !phydev->link))
+	if (phydev && ((phydev->state != PHY_UP &&
+			phydev->state != PHY_RUNNING) || !phydev->link))
 		return 0;
 
 	return hclge_get_mac_link_status(hdev, link_status);