Message ID | 20241018222407.1139697-1-quic_abchauha@quicinc.com (mailing list archive) |
---|---|
State | Changes Requested |
Delegated to: | Netdev Maintainers |
Headers | show |
Series | [net,v1] net: stmmac: Disable PCS Link and AN interrupt when PCS AN is disabled | expand |
On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote: > Currently we disable PCS ANE when the link speed is 2.5Gbps. > mac_link_up callback internally calls the fix_mac_speed which internally > calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps. > > We observed that the CPU utilization is pretty high. That is because > we saw that the PCS interrupt status line for Link and AN always remain > asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense > to also disable the PCS link status and AN complete in the interrupt > enable register. > > Interrupt storm Issue:- > [ 25.465754][ C2] stmmac_pcs: Link Down > [ 25.469888][ C2] stmmac_pcs: Link Down > [ 25.474030][ C2] stmmac_pcs: Link Down > [ 25.478164][ C2] stmmac_pcs: Link Down > [ 25.482305][ C2] stmmac_pcs: Link Down I don't know this code, so i cannot really comment if not enabling the interrupt is the correct fix or not. But generally an interrupt storm like this is cause because you are not acknowledging the interrupt correctly to clear its status. So rather than not enabling it, maybe you should check what is the correct way to clear the interrupt once it happens? Andrew
On Sat, Oct 19, 2024 at 04:45:16AM +0200, Andrew Lunn wrote: > On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote: > > Currently we disable PCS ANE when the link speed is 2.5Gbps. > > mac_link_up callback internally calls the fix_mac_speed which internally > > calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps. > > > > We observed that the CPU utilization is pretty high. That is because > > we saw that the PCS interrupt status line for Link and AN always remain > > asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense > > to also disable the PCS link status and AN complete in the interrupt > > enable register. > > > > Interrupt storm Issue:- > > [ 25.465754][ C2] stmmac_pcs: Link Down > > [ 25.469888][ C2] stmmac_pcs: Link Down > > [ 25.474030][ C2] stmmac_pcs: Link Down > > [ 25.478164][ C2] stmmac_pcs: Link Down > > [ 25.482305][ C2] stmmac_pcs: Link Down > > I don't know this code, so i cannot really comment if not enabling the > interrupt is the correct fix or not. But generally an interrupt storm > like this is cause because you are not acknowledging the interrupt > correctly to clear its status. So rather than not enabling it, maybe > you should check what is the correct way to clear the interrupt once > it happens? stmmac PCS support is total crap and shouldn't be used, or stmmac should not be using phylink. It's one or the other. Blame Serge for this mess.
On Mon, Oct 21, 2024 at 10:20:24AM +0100, Russell King (Oracle) wrote: > On Sat, Oct 19, 2024 at 04:45:16AM +0200, Andrew Lunn wrote: > > On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote: > > > Currently we disable PCS ANE when the link speed is 2.5Gbps. > > > mac_link_up callback internally calls the fix_mac_speed which internally > > > calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps. > > > > > > We observed that the CPU utilization is pretty high. That is because > > > we saw that the PCS interrupt status line for Link and AN always remain > > > asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense > > > to also disable the PCS link status and AN complete in the interrupt > > > enable register. > > > > > > Interrupt storm Issue:- > > > [ 25.465754][ C2] stmmac_pcs: Link Down > > > [ 25.469888][ C2] stmmac_pcs: Link Down > > > [ 25.474030][ C2] stmmac_pcs: Link Down > > > [ 25.478164][ C2] stmmac_pcs: Link Down > > > [ 25.482305][ C2] stmmac_pcs: Link Down > > > > I don't know this code, so i cannot really comment if not enabling the > > interrupt is the correct fix or not. But generally an interrupt storm > > like this is cause because you are not acknowledging the interrupt > > correctly to clear its status. So rather than not enabling it, maybe > > you should check what is the correct way to clear the interrupt once > > it happens? > > stmmac PCS support is total crap and shouldn't be used, or stmmac > should not be using phylink. It's one or the other. Blame Serge for > this mess. Seriously, we could've had this fixed had the patch set I was working on that fixed stmmac's _bad_ _conversion_ to phylink progressed to the point of being merged. The whole stmmac PCS support is broken, bypassing phylink. This series also contained bug fixes for stuff like this interrupt storm after Serge tested it. However, Serge wanted to turn my series into his maze of indirect function pointers approach that I disagreed with, and he wouldn't change his mind on that, so I deleted the series. As I keep saying - either stmmac uses phylink *properly* and gets its PCS hacks sorted out, or it does not use phylink *at* *all*. It's one or the other. I am not going to patch stmmac for any future phylink changes, and if it breaks, then I'll just say "oh that's a shame, not my problem." Blame Serge for that. I've had it with the pile of crap that is stmmac.
On 10/21/2024 2:32 AM, Russell King (Oracle) wrote: > On Mon, Oct 21, 2024 at 10:20:24AM +0100, Russell King (Oracle) wrote: >> On Sat, Oct 19, 2024 at 04:45:16AM +0200, Andrew Lunn wrote: >>> On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote: >>>> Currently we disable PCS ANE when the link speed is 2.5Gbps. >>>> mac_link_up callback internally calls the fix_mac_speed which internally >>>> calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps. >>>> >>>> We observed that the CPU utilization is pretty high. That is because >>>> we saw that the PCS interrupt status line for Link and AN always remain >>>> asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense >>>> to also disable the PCS link status and AN complete in the interrupt >>>> enable register. >>>> >>>> Interrupt storm Issue:- >>>> [ 25.465754][ C2] stmmac_pcs: Link Down >>>> [ 25.469888][ C2] stmmac_pcs: Link Down >>>> [ 25.474030][ C2] stmmac_pcs: Link Down >>>> [ 25.478164][ C2] stmmac_pcs: Link Down >>>> [ 25.482305][ C2] stmmac_pcs: Link Down >>> >>> I don't know this code, so i cannot really comment if not enabling the >>> interrupt is the correct fix or not. But generally an interrupt storm >>> like this is cause because you are not acknowledging the interrupt >>> correctly to clear its status. So rather than not enabling it, maybe >>> you should check what is the correct way to clear the interrupt once >>> it happens? >> >> stmmac PCS support is total crap and shouldn't be used, or stmmac >> should not be using phylink. It's one or the other. Blame Serge for >> this mess. > > Seriously, we could've had this fixed had the patch set I was working > on that fixed stmmac's _bad_ _conversion_ to phylink progressed to the > point of being merged. > > The whole stmmac PCS support is broken, bypassing phylink. > > This series also contained bug fixes for stuff like this interrupt > storm after Serge tested it. However, Serge wanted to turn my series > into his maze of indirect function pointers approach that I disagreed > with, and he wouldn't change his mind on that, so I deleted the series. > > As I keep saying - either stmmac uses phylink *properly* and gets its > PCS hacks sorted out, or it does not use phylink *at* *all*. It's one > or the other. > > I am not going to patch stmmac for any future phylink changes, and if > it breaks, then I'll just say "oh that's a shame, not my problem." > Blame Serge for that. I've had it with the pile of crap that is > stmmac. > Thanks Andrew and Russell for you review comments. Adding Serge here. Lets take a step back and see how i can help here to make sure we can get things merged and the discussion proceeds. Serge please help if can here. Thanks!
On 10/21/2024 8:09 AM, Abhishek Chauhan (ABC) wrote: > > > On 10/21/2024 2:32 AM, Russell King (Oracle) wrote: >> On Mon, Oct 21, 2024 at 10:20:24AM +0100, Russell King (Oracle) wrote: >>> On Sat, Oct 19, 2024 at 04:45:16AM +0200, Andrew Lunn wrote: >>>> On Fri, Oct 18, 2024 at 03:24:07PM -0700, Abhishek Chauhan wrote: >>>>> Currently we disable PCS ANE when the link speed is 2.5Gbps. >>>>> mac_link_up callback internally calls the fix_mac_speed which internally >>>>> calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps. >>>>> >>>>> We observed that the CPU utilization is pretty high. That is because >>>>> we saw that the PCS interrupt status line for Link and AN always remain >>>>> asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense >>>>> to also disable the PCS link status and AN complete in the interrupt >>>>> enable register. >>>>> >>>>> Interrupt storm Issue:- >>>>> [ 25.465754][ C2] stmmac_pcs: Link Down >>>>> [ 25.469888][ C2] stmmac_pcs: Link Down >>>>> [ 25.474030][ C2] stmmac_pcs: Link Down >>>>> [ 25.478164][ C2] stmmac_pcs: Link Down >>>>> [ 25.482305][ C2] stmmac_pcs: Link Down >>>> >>>> I don't know this code, so i cannot really comment if not enabling the >>>> interrupt is the correct fix or not. But generally an interrupt storm >>>> like this is cause because you are not acknowledging the interrupt >>>> correctly to clear its status. So rather than not enabling it, maybe >>>> you should check what is the correct way to clear the interrupt once >>>> it happens? >>> >>> stmmac PCS support is total crap and shouldn't be used, or stmmac >>> should not be using phylink. It's one or the other. Blame Serge for >>> this mess. >> >> Seriously, we could've had this fixed had the patch set I was working >> on that fixed stmmac's _bad_ _conversion_ to phylink progressed to the >> point of being merged. >> >> The whole stmmac PCS support is broken, bypassing phylink. >> >> This series also contained bug fixes for stuff like this interrupt >> storm after Serge tested it. However, Serge wanted to turn my series >> into his maze of indirect function pointers approach that I disagreed >> with, and he wouldn't change his mind on that, so I deleted the series. >> >> As I keep saying - either stmmac uses phylink *properly* and gets its >> PCS hacks sorted out, or it does not use phylink *at* *all*. It's one >> or the other. >> >> I am not going to patch stmmac for any future phylink changes, and if >> it breaks, then I'll just say "oh that's a shame, not my problem." >> Blame Serge for that. I've had it with the pile of crap that is >> stmmac. >> > Thanks Andrew and Russell for you review comments. > > Adding Serge here. > > Lets take a step back and see how i can help here to make sure > we can get things merged and the discussion proceeds. > > Serge please help if can here. Thanks! > Andrew, I had a detailed discussion with hardware team internally. Section 1:- ---------------------------------------------------------------------- Here are the updates from my side on the same. 1. ANE feature is disabled for 2.5 Gbps integrated PCS in the stmmac for the PCS link to be up. Experiment was done to turn on ANE bit in the MAC register and i clearly saw pcs link went down when 2.5Gbps link speed was selected 2. if ANE feature is not supported the corresponding PCS interrupts such as ANE and Link status has to be disabled in the MAC block according to hardware team. Note:- today stmmac driver is reading the PCS interrupt status to clear the interrupt. so interrupt handling is done correctly. Section 2:- --------------------------------------------------------------------- Serge can you please respond on the PCS support in stmmac ? We can work together with Russell and see how can we join hands and take things forward. I feel PCS has to communicate to Phylink and phylink has to post the final notification to MAC stating the link is up or not. > > >
> Serge can you please respond on the PCS support in stmmac ?
Unfortunately, Serge has been removed as Maintainer of stmmac as part
of the Russian sanctions.
stmmac currently has no active Maintainer.
Andrew
On 10/25/2024 5:44 AM, Andrew Lunn wrote: >> Serge can you please respond on the PCS support in stmmac ? > > Unfortunately, Serge has been removed as Maintainer of stmmac as part > of the Russian sanctions. > > stmmac currently has no active Maintainer. > > Andrew That's very unfortunate that Serge is removed as Maintainers. Andrew, I am not sure if you are the right person to ask this question How can i proceed with this problem ? I am kind of stuck now. :) Please guide me through this.
diff --git a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c index e65a65666cc1..db77d07af9fe 100644 --- a/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c +++ b/drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c @@ -751,7 +751,16 @@ static void dwmac4_flow_ctrl(struct mac_device_info *hw, unsigned int duplex, static void dwmac4_ctrl_ane(void __iomem *ioaddr, bool ane, bool srgmi_ral, bool loopback) { + u32 intr_mask = readl(ioaddr + GMAC_INT_EN); + dwmac_ctrl_ane(ioaddr, GMAC_PCS_BASE, ane, srgmi_ral, loopback); + + if (!ane) + intr_mask &= ~(GMAC_INT_PCS_LINK | GMAC_INT_PCS_ANE); + else + intr_mask |= (GMAC_INT_PCS_LINK | GMAC_INT_PCS_ANE); + + writel(intr_mask, ioaddr + GMAC_INT_EN); } static void dwmac4_get_adv_lp(void __iomem *ioaddr, struct rgmii_adv *adv)
Currently we disable PCS ANE when the link speed is 2.5Gbps. mac_link_up callback internally calls the fix_mac_speed which internally calls stmmac_pcs_ctrl_ane to disable the ANE for 2.5Gbps. We observed that the CPU utilization is pretty high. That is because we saw that the PCS interrupt status line for Link and AN always remain asserted. Since we are disabling the PCS ANE for 2.5Gbps it makes sense to also disable the PCS link status and AN complete in the interrupt enable register. Interrupt storm Issue:- [ 25.465754][ C2] stmmac_pcs: Link Down [ 25.469888][ C2] stmmac_pcs: Link Down [ 25.474030][ C2] stmmac_pcs: Link Down [ 25.478164][ C2] stmmac_pcs: Link Down [ 25.482305][ C2] stmmac_pcs: Link Down [ 25.486441][ C2] stmmac_pcs: Link Down [ 25.486635][ C4] watchdog0: pretimeout event [ 25.490585][ C2] stmmac_pcs: Link Down [ 25.499341][ C2] stmmac_pcs: Link Down [ 25.503484][ C2] stmmac_pcs: Link Down [ 25.507619][ C2] stmmac_pcs: Link Down [ 25.511760][ C2] stmmac_pcs: Link Down [ 25.515897][ C2] stmmac_pcs: Link Down [ 25.520038][ C2] stmmac_pcs: Link Down [ 25.524174][ C2] stmmac_pcs: Link Down [ 25.528316][ C2] stmmac_pcs: Link Down [ 25.532451][ C2] stmmac_pcs: Link Down [ 25.536591][ C2] stmmac_pcs: Link Down [ 25.540724][ C2] stmmac_pcs: Link Down [ 25.544866][ C2] stmmac_pcs: Link Down Once we disabled PCS ANE and Link Status interrupt issue disappears. Fixes: a818bd12538c ("net: stmmac: dwmac-qcom-ethqos: Add support for 2.5G SGMII") Signed-off-by: Abhishek Chauhan <quic_abchauha@quicinc.com> --- drivers/net/ethernet/stmicro/stmmac/dwmac4_core.c | 9 +++++++++ 1 file changed, 9 insertions(+)