Message ID | 20230728075840.4022760-6-shaojijie@huawei.com (mailing list archive) |
---|---|
State | Superseded |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | There are some bugfix for the HNS3 ethernet driver | expand |
On Fri, Jul 28, 2023 at 03:58:39PM +0800, Jijie Shao wrote: > From: Peiyang Wang <wangpeiyang1@huawei.com> > > This patch will fix a wrong print "device link down/up". Consider a case > that set autoneg to off with same speed and duplex configuration. The link > is always up while the phy state is set to PHY_UP and set back to > PHY_RUNNING later. It will print link down when the phy state is not > PHY_RUNNING. To avoid that, the condition should include PHY_UP. Does this really happen? If autoneg is on, and there is link, it means the link peer is auto using auto-neg. If you turn auto-neg off, the link peer is not going to know what speed to use, and so the link will go down. The link will only come up again when you reconfigure the link peer to also not use auto-neg. I don't see how you can turn auto-neg off and not loose the link. Andrew
Hi Andrew, I understand what you mean, and sorry for my wrong description. The link is not always up. If I turn auto-neg off, the link will go down finally. However, there is an intervel between my operation and the link down. In my experiment, it may be 1 min or evn 10 mins. The phy state is set to PHY_UP immediately when I set auto-neg off. And the phy machine check the state during a very small intervals. Thus, during my experiment, the phy state has a followed varietion: PHY_RUNNING -> PHY_UP -> PHY_RUNNING -> PHY_NOLINK. We print link up/down based on phy state and link state. In aboved case, It print looks like: eth0 link down -- because phy state is set to PHY_UP eth0 link up -- because phy state is set to PHY_RUNNING eth0 link down -- because link down This patch wants to fix the first two wrong print. We will modify this patch description Thanks! Jijie Shao on 2023/7/28 16:57, Andrew Lunn wrote: > On Fri, Jul 28, 2023 at 03:58:39PM +0800, Jijie Shao wrote: >> From: Peiyang Wang <wangpeiyang1@huawei.com> >> >> This patch will fix a wrong print "device link down/up". Consider a case >> that set autoneg to off with same speed and duplex configuration. The link >> is always up while the phy state is set to PHY_UP and set back to >> PHY_RUNNING later. It will print link down when the phy state is not >> PHY_RUNNING. To avoid that, the condition should include PHY_UP. > Does this really happen? If autoneg is on, and there is link, it means > the link peer is auto using auto-neg. If you turn auto-neg off, the > link peer is not going to know what speed to use, and so the link will > go down. The link will only come up again when you reconfigure the > link peer to also not use auto-neg. > > I don't see how you can turn auto-neg off and not loose the link. > > Andrew
On Sat, Jul 29, 2023 at 11:11:48AM +0800, Jijie Shao wrote: > Hi Andrew, > I understand what you mean, and sorry for my wrong description. The link > is not always up. If I turn auto-neg off, the link will go down finally. > However, there is an intervel between my operation and the link down. In > my experiment, it may be 1 min or evn 10 mins. The phy state is set to > PHY_UP immediately when I set auto-neg off. And the phy machine check the > state during a very small intervals. Thus, during my experiment, the phy > state has a followed varietion: > PHY_RUNNING -> PHY_UP -> PHY_RUNNING -> PHY_NOLINK. > > We print link up/down based on phy state and link state. In aboved case, > It print looks like: > eth0 link down -- because phy state is set to PHY_UP > eth0 link up -- because phy state is set to PHY_RUNNING > eth0 link down -- because link down > > This patch wants to fix the first two wrong print. > We will modify this patch description Now i wounder if you are fixing the wrong thing. Maybe you should be fixing the PHY so it does not report up and then down? You say 'very snall intervals', which should in fact be 1 second. So is the PHY reporting link for a number of poll intervals? 1min to 10 minutes? Andrew
> Now i wounder if you are fixing the wrong thing. Maybe you should be > fixing the PHY so it does not report up and then down? You say 'very > snall intervals', which should in fact be 1 second. So is the PHY > reporting link for a number of poll intervals? 1min to 10 minutes? > > Andrew > > Yes, according to the log records, the phy polls every second, > but the link status changes take time. > Generally, it takes 10 seconds for the phy to detect link down, > but occasionally it takes several minutes to detect link down, What PHY driver is this? It is not so clear what should actually happen with auto-neg turned off. With it on, and the link going down, the PHY should react after about 1 second. It is not supposed to react faster than that, although some PHYs allow fast link down notification to be configured. Have you checked 802.3 to see what it says about auto-neg off and link down detection? I personally would not suppress this behaviour in the MAC driver. Otherwise you are going to have funny combinations of special cases of a feature which very few people actually use, making your maintenance costs higher. Andrew
on 2023/7/30 2:23, Andrew Lunn wrote: >> Now i wounder if you are fixing the wrong thing. Maybe you should be >> fixing the PHY so it does not report up and then down? You say 'very >> snall intervals', which should in fact be 1 second. So is the PHY >> reporting link for a number of poll intervals? 1min to 10 minutes? >> >> Andrew >> >> Yes, according to the log records, the phy polls every second, >> but the link status changes take time. >> Generally, it takes 10 seconds for the phy to detect link down, >> but occasionally it takes several minutes to detect link down, > What PHY driver is this? > > It is not so clear what should actually happen with auto-neg turned > off. With it on, and the link going down, the PHY should react after > about 1 second. It is not supposed to react faster than that, although > some PHYs allow fast link down notification to be configured. > > Have you checked 802.3 to see what it says about auto-neg off and link > down detection? > > I personally would not suppress this behaviour in the MAC > driver. Otherwise you are going to have funny combinations of special > cases of a feature which very few people actually use, making your > maintenance costs higher. > > Andrew Thanks for your suggestion, We are analyzing this issue in depth.
on 2023/7/31 17:10, Jijie Shao wrote: > What PHY driver is this? >> >> It is not so clear what should actually happen with auto-neg turned >> off. With it on, and the link going down, the PHY should react after >> about 1 second. It is not supposed to react faster than that, although >> some PHYs allow fast link down notification to be configured. >> >> Have you checked 802.3 to see what it says about auto-neg off and link >> down detection? >> >> I personally would not suppress this behaviour in the MAC >> driver. Otherwise you are going to have funny combinations of special >> cases of a feature which very few people actually use, making your >> maintenance costs higher. >> >> Andrew Hi Andrew, We trace how the PHY state machine changed and show as followed: [ 1974.220847][ T362] hns3 0000:35:00.0 eth1: set link(phy): autoneg=0, speed=100, duplex=1 [ 1974.233694][ T362] hns3 0000:35:00.0 eth1: link down [ 1974.267444][ T32] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY state change UP -> RUNNING [ 1974.892830][ T7] hns3 0000:35:00.0 eth1: link up [ 2004.277425][ T32] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY state change RUNNING -> NOLINK [ 2004.797731][ T7] hns3 0000:35:00.0 eth1: link down Meanwhile, we also open tracing event about mdio and here are some useful logs: kworker/1:0-19 [001] .... 1973.329775: mdio_access: mii-0000:35:00.0 read phy:0x02 reg:0x00 val:0x1040 kworker/1:0-19 [001] .... 1973.331964: mdio_access: mii-0000:35:00.0 read phy:0x02 reg:0x01 val:0x79ad kworker/2:1-32 [002] .... 1974.247627: mdio_access: mii-0000:35:00.0 read phy:0x02 reg:0x00 val:0x1040 kworker/2:1-32 [002] .... 1974.249870: mdio_access: mii-0000:35:00.0 write phy:0x02 reg:0x00 val:0x2100 kworker/2:1-32 [002] .... 1974.252069: mdio_access: mii-0000:35:00.0 read phy:0x02 reg:0x00 val:0x2100 kworker/2:1-32 [002] .... 1974.254143: mdio_access: mii-0000:35:00.0 read phy:0x02 reg:0x01 val:0x798d .... kworker/2:1-32 [002] .... 2003.240015: mdio_access: mii-0000:35:00.0 read phy:0x02 reg:0x01 val:0x798d .... kworker/2:1-32 [002] .... 2004.269525: mdio_access: mii-0000:35:00.0 read phy:0x02 reg:0x01 val:0x7989 As you can see, the link state changed after 30 seconds when only setting autoneg off. When the BMSR changed, the PHY driver change state immediately. This patch wants to fixed the first link down up showed on logs cause the link do not changed. Regards Jijie Shao
on 2023/7/31 17:10, Jijie Shao wrote: > > on 2023/7/30 2:23, Andrew Lunn wrote: >>> Now i wounder if you are fixing the wrong thing. Maybe you >>> should be >>> fixing the PHY so it does not report up and then down? You say >>> 'very >>> snall intervals', which should in fact be 1 second. So is the PHY >>> reporting link for a number of poll intervals? 1min to 10 minutes? >>> >>> Andrew >>> >>> Yes, according to the log records, the phy polls every second, >>> but the link status changes take time. >>> Generally, it takes 10 seconds for the phy to detect link down, >>> but occasionally it takes several minutes to detect link down, >> What PHY driver is this? >> >> It is not so clear what should actually happen with auto-neg turned >> off. With it on, and the link going down, the PHY should react after >> about 1 second. It is not supposed to react faster than that, although >> some PHYs allow fast link down notification to be configured. >> >> Have you checked 802.3 to see what it says about auto-neg off and link >> down detection? >> >> I personally would not suppress this behaviour in the MAC >> driver. Otherwise you are going to have funny combinations of special >> cases of a feature which very few people actually use, making your >> maintenance costs higher. >> >> Andrew Hi Andrew, We've rewritten the commit log to explain this problem, Would you please take some time to review that? The following is the new commit log: This patch is to correct a wrong log info "link down/up" in hns3 driver. When setting autoneg off without changing speed and duplex, the link should be not changed. However in hns3 driver, it print link down/up once incorrectly. We trace the phy machine state and find the phy change form PHY_UP to PHY_RUNNING. No other state of PHY occurs during this process. MDIO trace also indicate the link is on. The wrong log info and mdio trace are showed as followed: [ 843.720783][ T367] hns3 0000:35:00.0 eth1: set link(phy): autoneg=0, speed=10, duplex=1 [ 843.736087][ T367] hns3 0000:35:00.0 eth1: link down [ 843.773506][ T17] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY state change UP -> RUNNING [ 844.674668][ T31] hns3 0000:35:00.0 eth1: link up kworker/1:1-32 [001] .... 841.457231: mdio_access: mii-0000: 35:00.0 read phy:0x02 reg:0x01 val:0x79ad kworker/1:1-32 [001] .... 842.486496: mdio_access: mii-0000: 35:00.0 read phy:0x02 reg:0x01 val:0x79ad kworker/1:1-32 [001] .... 843.520565: mdio_access: mii-0000: 35:00.0 read phy:0x02 reg:0x01 val:0x79ad kworker/0:1-17 [000] .... 843.757147: mdio_access: mii-0000: 35:00.0 read phy:0x02 reg:0x01 val:0x798d kworker/0:1-17 [000] .... 844.799141: mdio_access: mii-0000: 35:00.0 read phy:0x02 reg:0x01 val:0x798d kworker/0:1-17 [000] .... 845.831513: mdio_access: mii-0000: 35:00.0 read phy:0x02 reg:0x01 val:0x798d kworker/0:1-17 [000] .... 846.863053: mdio_access: mii-0000: 35:00.0 read phy:0x02 reg:0x01 val:0x798d Regards Jijie
On Tue, Oct 17, 2023 at 09:03:01PM +0800, Jijie Shao wrote: > > on 2023/7/31 17:10, Jijie Shao wrote: > > > > on 2023/7/30 2:23, Andrew Lunn wrote: > > > > Now i wounder if you are fixing the wrong thing. Maybe you > > > > should be > > > > fixing the PHY so it does not report up and then down? You > > > > say 'very > > > > snall intervals', which should in fact be 1 second. So is the PHY > > > > reporting link for a number of poll intervals? 1min to 10 minutes? > > > > > > > > Andrew > > > > > > > > Yes, according to the log records, the phy polls every second, > > > > but the link status changes take time. > > > > Generally, it takes 10 seconds for the phy to detect link down, > > > > but occasionally it takes several minutes to detect link down, > > > What PHY driver is this? > > > > > > It is not so clear what should actually happen with auto-neg turned > > > off. With it on, and the link going down, the PHY should react after > > > about 1 second. It is not supposed to react faster than that, although > > > some PHYs allow fast link down notification to be configured. > > > > > > Have you checked 802.3 to see what it says about auto-neg off and link > > > down detection? > > > > > > I personally would not suppress this behaviour in the MAC > > > driver. Otherwise you are going to have funny combinations of special > > > cases of a feature which very few people actually use, making your > > > maintenance costs higher. > > > > > > Andrew > > Hi Andrew, > We've rewritten the commit log to explain this problem, > Would you please take some time to review that? > > The following is the new commit log: > This patch is to correct a wrong log info "link down/up" in hns3 driver. > When setting autoneg off without changing speed and duplex, the link > should be not changed. However in hns3 driver, it print link down/up once > incorrectly. We trace the phy machine state and find the phy change form > PHY_UP to PHY_RUNNING. No other state of PHY occurs during this process. > MDIO trace also indicate the link is on. The wrong log info and mdio > trace are showed as followed: > > [ 843.720783][ T367] hns3 0000:35:00.0 eth1: set link(phy): autoneg=0, > speed=10, duplex=1 > [ 843.736087][ T367] hns3 0000:35:00.0 eth1: link down > [ 843.773506][ T17] RTL8211F Gigabit Ethernet mii-0000:35:00.0:02: PHY > state change UP -> RUNNING > [ 844.674668][ T31] hns3 0000:35:00.0 eth1: link up I still think this is totally valid and correct. When you turn auto-neg off the link partner is going to react to that, it might drop the link. After a while, the link partner will give up trying to perform auto-neg and might fall back to 10/Half. At which point, the link might allow traffic flow. However, in this example, you have a duplex mis-match, so it might not work correctly. Turning off auto-neg is something you need to do at both ends, and you need to then force both ends to the same settings. Link down is expected. I would actually be suppressed if no link down events were reported. Andrew
on 2023/10/17 21:59, Andrew Lunn wrote > I still think this is totally valid and correct. > > When you turn auto-neg off the link partner is going to react to that, > it might drop the link. After a while, the link partner will give up > trying to perform auto-neg and might fall back to 10/Half. At which > point, the link might allow traffic flow. However, in this example, > you have a duplex mis-match, so it might not work correctly. > > Turning off auto-neg is something you need to do at both ends, and you > need to then force both ends to the same settings. Link down is > expected. I would actually be suppressed if no link down events were > reported. > > Andrew Hi Andrew, Thank you for your comments, we are re-evaluating this issue and may drop this patch. Regards Jijie
diff --git a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c index a940e35aef29..1723d0fa49ee 100644 --- a/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c +++ b/drivers/net/ethernet/hisilicon/hns3/hns3pf/hclge_main.c @@ -3089,7 +3089,8 @@ static int hclge_get_mac_phy_link(struct hclge_dev *hdev, int *link_status) if (test_bit(HCLGE_STATE_DOWN, &hdev->state)) return 0; - if (phydev && (phydev->state != PHY_RUNNING || !phydev->link)) + if (phydev && ((phydev->state != PHY_UP && + phydev->state != PHY_RUNNING) || !phydev->link)) return 0; return hclge_get_mac_link_status(hdev, link_status);