Message ID | 20220123141245.1060-1-jszhang@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | net: stmmac: don't stop RXC during LPI |
On Sun, Jan 23, 2022 at 10:12:45PM +0800, Jisheng Zhang wrote:
> I met can't receive rx pkt issue with below steps:
> 0.plug in ethernet cable then boot normal and get ip from dhcp server
> 1.quickly hotplug out then hotplug in the ethernet cable
> 2.trigger the dhcp client to renew lease
>
> tcpdump shows that the request tx pkt is sent out successfully,
> but the mac can't receive the rx pkt.
>
> The issue can easily be reproduced on platforms with PHY_POLL external
> phy. If we don't allow the phy to stop the RXC during LPI, the issue
> is gone. I think it's unsafe to stop the RXC during LPI because the mac
> needs RXC clock to support RX logic.
>
> And the 2nd param clk_stop_enable of phy_init_eee() is a bool, so use
> false instead of 0.
>
> Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
> ---
>  drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
>  1 file changed, 1 insertion(+), 1 deletion(-)
>
> diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> index 6708ca2aa4f7..92a9b0b226b1 100644
> --- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> +++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
> @@ -1162,7 +1162,7 @@ static void stmmac_mac_link_up(struct phylink_config *config,
>
>  	stmmac_mac_set(priv, priv->ioaddr, true);
>  	if (phy && priv->dma_cap.eee) {
> -		priv->eee_active = phy_init_eee(phy, 1) >= 0;
> +		priv->eee_active = phy_init_eee(phy, false) >= 0;

This has not caused issues in the past. So I'm wondering if this is
somehow specific to your system? Does everybody else use a PHY which
does not implement this bit? Does your synthesis of the stmmac have a
different clock tree?

By changing this value for every instance of the stmmac, you are
potentially causing a power regression for stmmac implementations
which don't need the clock. So we need a clear understanding: either
stopping the clock is wrong in general, and so the change is correct in
general, or this is specific to your system, and you probably need to
add priv->dma_cap.keep_rx_clock_ticking, which you set in your glue
driver, and use here to decide what to pass to phy_init_eee().

	Andrew

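The per-platform alternative suggested in the reply above can be sketched roughly as follows. This is only an illustration under stated assumptions, not driver code: `keep_rx_clock_ticking` is the hypothetical field named in the reply, and `struct stmmac_priv_sketch`, `phy_init_eee_stub()` and `link_up_sketch()` are stand-ins invented here so the fragment compiles on its own.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for the DMA capability struct. keep_rx_clock_ticking is the
 * hypothetical flag from the reply; a glue driver would set it. */
struct dma_features_sketch {
	bool eee;
	bool keep_rx_clock_ticking;
};

struct stmmac_priv_sketch {
	struct dma_features_sketch dma_cap;
	bool eee_active;
	bool last_clk_stop_request; /* recorded by the stub for inspection */
};

/* Stub standing in for phy_init_eee(): records the clock-stop request
 * and reports success. The real function talks to the PHY. */
static int phy_init_eee_stub(struct stmmac_priv_sketch *priv,
			     bool clk_stop_enable)
{
	priv->last_clk_stop_request = clk_stop_enable;
	return 0;
}

/* Only ask the PHY to stop RXC during LPI when the platform has not
 * declared that its MAC needs the clock to keep running. */
static void link_up_sketch(struct stmmac_priv_sketch *priv)
{
	assert(priv != NULL);

	if (priv->dma_cap.eee) {
		bool clk_stop = !priv->dma_cap.keep_rx_clock_ticking;

		priv->eee_active = phy_init_eee_stub(priv, clk_stop) >= 0;
	}
}
```

With this shape, platforms that never set the flag keep today's behaviour (clock stop requested), so the power regression raised in the reply would be limited to platforms that opt in.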
On Sun, Jan 23, 2022 at 04:52:29PM +0100, Andrew Lunn wrote:
> This has not caused issues in the past. So i'm wondering if this is
> somehow specific to your system? Does everybody else use a PHY which
> does not implement this bit? Does your synthesis of the stmmac have a
> different clock tree?
>
> By changing this value for every instance of the stmmac, you are
> potentially causing a power regression for stmmac implementations
> which don't need the clock. So we need a clear understanding, stopping
> the clock is wrong in general and so the change is correct in

I think this is a common issue because the MAC needs phy's RXC for RX
logic. But it's better to let other stmmac users verify. The issue
can easily be reproduced on platforms with PHY_POLL external phy.
Or do other platforms use a dedicated clock, rather than the clock from
the PHY, for the MAC's RX logic?

If the issue turns out to be specific to my system, then I will send out
a new patch to adopt your suggestion.

Hi Joakim, IIRC, you have a stmmac + external RTL8211F PHY platform, but
I'm not sure whether your platform has an IRQ for the PHY. Could you
help me check whether you can reproduce the issue on your platform?

> general. Or this is specific to your system, and you probably need to
> add priv->dma_cap.keep_rx_clock_ticking, which you set in your glue
> driver,and use here to decide what to pass to phy_init_eee().

Thanks a lot for the suggestion.

On Mon, Jan 24, 2022 at 12:08:22AM +0800, Jisheng Zhang wrote:
> If the issue turns out specific to my system, then I will send out
> a new patch to adopt your suggestion.

+ Joakim

> Hi Joakim, IIRC, you have stmmac + external RTL8211F phy platform, but
> I'm not sure whether your platform have an irq for the phy. could you
> help me to check whether you can reproduce the issue on your platform?

> I think this is a common issue because the MAC needs phy's RXC for RX
> logic. But it's better to let other stmmac users verify. The issue
> can easily be reproduced on platforms with PHY_POLL external phy.

What is the relevance of PHY polling here? Are you saying if the PHY
is using interrupts you do not see this issue?

	Andrew

On 1/23/2022 8:09 AM, Jisheng Zhang wrote:
>> I think this is a common issue because the MAC needs phy's RXC for RX
>> logic. But it's better to let other stmmac users verify. The issue
>> can easily be reproduced on platforms with PHY_POLL external phy.
>> Or other platforms use a dedicated clock rather than clock from phy
>> for MAC's RX logic?
>>
>>> general. Or this is specific to your system, and you probably need to
>>> add priv->dma_cap.keep_rx_clock_ticking, which you set in your glue
>>> driver,and use here to decide what to pass to phy_init_eee().

I suspect the problem is only or largely relevant in an RGMII
configuration whereby the TXC of the MAC is an input to the PHY, which
then re-generates the RXC and feeds it back to the MAC as RXC (with the
configured delay). If the PHY stops its clock, then the MAC no longer
gets an RXC, and all sorts of problems would arise if the MAC logic on
the RX side depends on the PHY's RXC being re-sampled internally within
the MAC.

Now, this would be symptomatic of a fairly naive design on the MAC side
to support EEE. Usually, to really save power while in LPI, you would
want to switch your MAC from its main or fast clock (which is presumably
at least 250MHz to support Gigabit rates and generate a 125MHz TXC) to a
slow clock (say 25 or 27MHz) in order to actually save power on the MAC
side (even if the bulk of the power is in the PHY's analog logic). When
the PHY signals that we are out of LPI, the MAC switches back to its
main clock. This may occur with the help of the MAC driver, or it can
sometimes be done autonomously.

So with all that theory about how things should be designed, I think you
need to investigate this problem a bit more thoroughly.

FWIW, phy_init_eee()'s second argument is improperly designed. Before
deciding to stop the PHY's RX clock, you should first know whether the
PHY supports it to begin with; otherwise you are requesting something
the PHY is not able to do, and there is no feedback mechanism. A while
back I started this patch series, which may still be relevant:

https://github.com/ffainelli/linux/commits/phy-eee-tx-clk

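The point about phy_init_eee()'s second argument lacking a feedback mechanism can be illustrated with a small model. This is a sketch under assumptions, not the patch series linked above: the PHY is mocked as two 16-bit registers, the helper name and result enum are invented here, and the two bit values are assumed to mirror the clock-stop capable/enable bits of the Clause 45 PCS status 1 and control 1 registers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed bit positions (PCS status 1 / control 1); verify against the
 * MDIO headers before relying on them. */
#define SKETCH_PCS_STAT1_CLKSTOP_CAP 0x0040u /* PHY can stop RXC in LPI */
#define SKETCH_PCS_CTRL1_CLKSTOP_EN  0x0400u /* allow clock stop in LPI */

/* Mock PHY: just the two registers this sketch touches. */
struct fake_phy {
	uint16_t pcs_stat1;
	uint16_t pcs_ctrl1;
};

/* Hypothetical result type giving the caller the feedback that a plain
 * bool argument cannot. */
enum clk_stop_result {
	CLK_STOP_ENABLED,
	CLK_STOP_UNSUPPORTED,
	CLK_STOP_NOT_REQUESTED,
};

static enum clk_stop_result eee_request_clk_stop(struct fake_phy *phy,
						 bool want)
{
	assert(phy != NULL);

	if (!want) {
		/* Caller wants RXC to keep running: clear the enable bit. */
		phy->pcs_ctrl1 = (uint16_t)(phy->pcs_ctrl1 &
					    ~SKETCH_PCS_CTRL1_CLKSTOP_EN);
		return CLK_STOP_NOT_REQUESTED;
	}

	/* Check the capability first instead of writing blindly. */
	if (!(phy->pcs_stat1 & SKETCH_PCS_STAT1_CLKSTOP_CAP))
		return CLK_STOP_UNSUPPORTED;

	phy->pcs_ctrl1 = (uint16_t)(phy->pcs_ctrl1 |
				    SKETCH_PCS_CTRL1_CLKSTOP_EN);
	return CLK_STOP_ENABLED;
}
```

A caller can then distinguish "clock stop is on" from "the PHY cannot do this", which is the feedback the message says is currently missing.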
Hi Jisheng,

> -----Original Message-----
> From: Jisheng Zhang <jszhang@kernel.org>
> Sent: January 24, 2022 0:10
> To: Andrew Lunn <andrew@lunn.ch>; Joakim Zhang
> <qiangqing.zhang@nxp.com>
> Cc: Giuseppe Cavallaro <peppe.cavallaro@st.com>; Alexandre Torgue
> <alexandre.torgue@foss.st.com>; Jose Abreu <joabreu@synopsys.com>;
> David S . Miller <davem@davemloft.net>; Jakub Kicinski <kuba@kernel.org>;
> Maxime Coquelin <mcoquelin.stm32@gmail.com>; netdev@vger.kernel.org;
> linux-stm32@st-md-mailman.stormreply.com;
> linux-arm-kernel@lists.infradead.org; linux-kernel@vger.kernel.org
> Subject: Re: [PATCH] net: stmmac: don't stop RXC during LPI
>
> + Joakim
>
> > Hi Joakim, IIRC, you have stmmac + external RTL8211F phy platform, but
> > I'm not sure whether your platform have an irq for the phy. could you
> > help me to check whether you can reproduce the issue on your platform?

Yes, i.MX8MP uses the stmmac + external RTL8211F, which works in
PHY_POLL mode. I tried the reproduction steps you provided, but Ethernet
works properly. What abnormal behavior should I expect to see?

Regarding your reported issue, I guess this is a real, general issue for
SNPS stmmac working in RGMII mode; I'm not sure whether other MII modes
suffer from a similar issue. Actually, we have had the same patch
locally since 5.10 to fix a suspend/resume issue.

https://source.codeaurora.org/external/imx/linux-imx/commit/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c?h=lf-5.10.y&id=a7864e9fbc8f8f99e785ce7419a682651c89b8f7

The root cause is that stmmac needs the RXC clock generated by the PHY
for some receive logic; if the RXC clock is not fed in time, stmmac
breaks. According to feedback from the SNPS folks, stmmac needs this RXC
clock, and no other clock can be routed in if RXC is not present.

Best Regards,
Joakim Zhang

On Sun, 23 Jan 2022 22:12:45 +0800 Jisheng Zhang wrote:
> I met can't receive rx pkt issue with below steps:
> 0.plug in ethernet cable then boot normal and get ip from dhcp server
> 1.quickly hotplug out then hotplug in the ethernet cable
> 2.trigger the dhcp client to renew lease
>
> tcpdump shows that the request tx pkt is sent out successfully,
> but the mac can't receive the rx pkt.
>
> The issue can easily be reproduced on platforms with PHY_POLL external
> phy. If we don't allow the phy to stop the RXC during LPI, the issue
> is gone. I think it's unsafe to stop the RXC during LPI because the mac
> needs RXC clock to support RX logic.
>
> And the 2nd param clk_stop_enable of phy_init_eee() is a bool, so use
> false instead of 0.

FWIW this is marked Changes Requested in pw. TBH I'm not sure what the
conclusion is, but if the patch is good, please try to fold the
information requested in the discussion into the commit msg and repost.

On Sun, Jan 23, 2022 at 06:39:26PM +0100, Andrew Lunn wrote:
> > I think this is a common issue because the MAC needs phy's RXC for RX
> > logic. But it's better to let other stmmac users verify. The issue
> > can easily be reproduced on platforms with PHY_POLL external phy.
>
> What is the relevance of PHY polling here? Are you saying if the PHY
> is using interrupts you do not see this issue?

I have been testing over the past two days: if the PHY is using
interrupts, I can't reproduce the issue. It looks a bit more complex.
Any suggestions?

Thanks in advance

On Wed, Jan 26, 2022 at 08:55:22PM +0800, Jisheng Zhang wrote:
> On Sun, Jan 23, 2022 at 06:39:26PM +0100, Andrew Lunn wrote:
> > What is the relevance of PHY polling here? Are you saying if the PHY
> > is using interrupts you do not see this issue?
>
> I tried these two days, if the PHY is using interrupts, I can't
> reproduce the issue. It looks a bit more complex. Any suggestions?

I suppose it could be that there is a delay between the PHY reporting
the link loss, raising an interrupt, which is then processed to disable
the receive side, and the PHY going into LPI.

The problem with polling is, well, it's polling, and at a one second
rate - which probably is too long between the PHY noticing the loss of
link and going into LPI.

What this also probably means is that if interrupt latency is high
enough, the same problem will occur.

So maybe the EEE support needs to be a little more clever - so we only
enable clock stop after the MAC driver has disabled the receive side.

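One way to read the ordering proposed in the last paragraph above is as an invariant: the PHY is never permitted to stop RXC while the MAC receive side is enabled. The fragment below is a minimal model of that reading only; the struct, the function names and the exact ordering are assumptions made for illustration, and nothing here is taken from phylib or the stmmac driver.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical combined view of MAC and PHY state. */
struct mac_phy_state {
	bool rx_enabled;       /* MAC receive path is running */
	bool clk_stop_enabled; /* PHY may stop RXC during LPI */
};

/* Link went down: quiesce the MAC receive side first, and only then
 * permit the PHY to stop the clock. */
static void sketch_link_down(struct mac_phy_state *s)
{
	assert(s != NULL);
	s->rx_enabled = false;
	s->clk_stop_enabled = true;
}

/* Link came up: forbid clock stop first so RXC is guaranteed to run,
 * and only then start the MAC receive side. */
static void sketch_link_up(struct mac_phy_state *s)
{
	assert(s != NULL);
	s->clk_stop_enabled = false;
	s->rx_enabled = true;
}

/* The invariant the ordering is meant to preserve. */
static bool sketch_state_is_safe(const struct mac_phy_state *s)
{
	return !(s->rx_enabled && s->clk_stop_enabled);
}
```

In this model the one-second polling interval no longer matters for correctness, because clock stop is tied to the MAC's own state change rather than to how quickly the link loss is noticed.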
diff --git a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
index 6708ca2aa4f7..92a9b0b226b1 100644
--- a/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
+++ b/drivers/net/ethernet/stmicro/stmmac/stmmac_main.c
@@ -1162,7 +1162,7 @@ static void stmmac_mac_link_up(struct phylink_config *config,
 
 	stmmac_mac_set(priv, priv->ioaddr, true);
 	if (phy && priv->dma_cap.eee) {
-		priv->eee_active = phy_init_eee(phy, 1) >= 0;
+		priv->eee_active = phy_init_eee(phy, false) >= 0;
 		priv->eee_enabled = stmmac_eee_init(priv);
 		priv->tx_lpi_enabled = priv->eee_enabled;
 		stmmac_set_eee_pls(priv, priv->hw, true);
I met can't receive rx pkt issue with below steps:
0. plug in ethernet cable then boot normal and get ip from dhcp server
1. quickly hotplug out then hotplug in the ethernet cable
2. trigger the dhcp client to renew lease

tcpdump shows that the request tx pkt is sent out successfully,
but the mac can't receive the rx pkt.

The issue can easily be reproduced on platforms with PHY_POLL external
phy. If we don't allow the phy to stop the RXC during LPI, the issue
is gone. I think it's unsafe to stop the RXC during LPI because the mac
needs RXC clock to support RX logic.

And the 2nd param clk_stop_enable of phy_init_eee() is a bool, so use
false instead of 0.

Signed-off-by: Jisheng Zhang <jszhang@kernel.org>
---
 drivers/net/ethernet/stmicro/stmmac/stmmac_main.c | 2 +-
 1 file changed, 1 insertion(+), 1 deletion(-)