Message ID | 20241025105620.1891596-1-andre.przywara@arm.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] clk: sunxi-ng: h616: Reparent CPU clock during frequency changes | expand |
On Fri, Oct 25, 2024 at 6:56 PM Andre Przywara <andre.przywara@arm.com> wrote: > > The H616 user manual recommends to re-parent the CPU clock during > frequency changes of the PLL, and recommends PLL_PERI0(1X), which runs > at 600 MHz. Also it asks to disable and then re-enable the PLL lock bit, > after the factor changes have been applied. > > Add clock notifiers for the PLL and the CPU mux clock, using the existing > notifier callbacks, and tell them to use mux 4 (the PLL_PERI0(1X) source), > and bit 29 (the LOCK_ENABLE) bit. The existing code already follows the > correct algorithms. > > Signed-off-by: Andre Przywara <andre.przywara@arm.com> > --- > Hi, > > the manual states that those changes would be needed to safely change > the CPU_PLL frequency during DVFS operation. On my H618 boards it works > fine without them, but Philippe reported problems on his H700 board. > Posting this for reference at this point, to see if it helps people. > I am not sure we should change this without it fixing any real issues. IIRC we do this for all the other SoCs. But if you want to be cautious, we can wait for Philippe to give a Tested-by? ChenYu > The same algorithm would apply to the A100/A133 (and the upcoming A523) > as well. > > Cheers, > Andre > > drivers/clk/sunxi-ng/ccu-sun50i-h616.c | 28 ++++++++++++++++++++++++-- > 1 file changed, 26 insertions(+), 2 deletions(-) > > diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > index 84e406ddf9d12..85eea196f25e3 100644 > --- a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > @@ -1095,11 +1095,24 @@ static const u32 usb2_clk_regs[] = { > SUN50I_H616_USB3_CLK_REG, > }; > > +static struct ccu_mux_nb sun50i_h616_cpu_nb = { > + .common = &cpux_clk.common, > + .cm = &cpux_clk.mux, > + .delay_us = 1, /* manual doesn't really say */ > + .bypass_index = 4, /* PLL_PERI0@600MHz, as recommended by manual */ > +}; > + > +static struct ccu_pll_nb sun50i_h616_pll_cpu_nb = { > + .common = &pll_cpux_clk.common, > + .enable = BIT(29), /* LOCK_ENABLE */ > + .lock = BIT(28), > +}; > + > static int sun50i_h616_ccu_probe(struct platform_device *pdev) > { > void __iomem *reg; > u32 val; > - int i; > + int ret, i; > > reg = devm_platform_ioremap_resource(pdev, 0); > if (IS_ERR(reg)) > @@ -1152,7 +1165,18 @@ static int sun50i_h616_ccu_probe(struct platform_device *pdev) > val |= BIT(24); > writel(val, reg + SUN50I_H616_HDMI_CEC_CLK_REG); > > - return devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc); > + ret = devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc); > + if (ret) > + return ret; > + > + /* Reparent CPU during CPU PLL rate changes */ > + ccu_mux_notifier_register(pll_cpux_clk.common.hw.clk, > + &sun50i_h616_cpu_nb); > + > + /* Re-lock the CPU PLL after any rate changes */ > + ccu_pll_notifier_register(&sun50i_h616_pll_cpu_nb); > + > + return 0; > } > > static const struct of_device_id sun50i_h616_ccu_ids[] = { > -- > 2.25.1 >
On Fri, 25 Oct 2024 22:49:27 +0800 Chen-Yu Tsai <wens@csie.org> wrote: Hi, > On Fri, Oct 25, 2024 at 6:56 PM Andre Przywara <andre.przywara@arm.com> wrote: > > > > The H616 user manual recommends to re-parent the CPU clock during > > frequency changes of the PLL, and recommends PLL_PERI0(1X), which runs > > at 600 MHz. Also it asks to disable and then re-enable the PLL lock bit, > > after the factor changes have been applied. > > > > Add clock notifiers for the PLL and the CPU mux clock, using the existing > > notifier callbacks, and tell them to use mux 4 (the PLL_PERI0(1X) source), > > and bit 29 (the LOCK_ENABLE) bit. The existing code already follows the > > correct algorithms. > > > > Signed-off-by: Andre Przywara <andre.przywara@arm.com> > > --- > > Hi, > > > > the manual states that those changes would be needed to safely change > > the CPU_PLL frequency during DVFS operation. On my H618 boards it works > > fine without them, but Philippe reported problems on his H700 board. > > Posting this for reference at this point, to see if it helps people. > > I am not sure we should change this without it fixing any real issues. > > IIRC we do this for all the other SoCs. But if you want to be cautious, > we can wait for Philippe to give a Tested-by? Yes, I copied this code from the A64 CCU, but IIRC this was desperately needed there. But so far I didn't hear many complaints on the H616, and I ran through like 100,000 transistions in a matter on minutes without any issues yesterday. And apparently this patch doesn't fix Philippe's immediate problem, so I would like to hold it back for now, until we have either more testing, with or without this patch. Thanks, Andre > ChenYu > > > The same algorithm would apply to the A100/A133 (and the upcoming A523) > > as well. > > > > Cheers, > > Andre > > > > drivers/clk/sunxi-ng/ccu-sun50i-h616.c | 28 ++++++++++++++++++++++++-- > > 1 file changed, 26 insertions(+), 2 deletions(-) > > > > diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > > index 84e406ddf9d12..85eea196f25e3 100644 > > --- a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > > +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > > @@ -1095,11 +1095,24 @@ static const u32 usb2_clk_regs[] = { > > SUN50I_H616_USB3_CLK_REG, > > }; > > > > +static struct ccu_mux_nb sun50i_h616_cpu_nb = { > > + .common = &cpux_clk.common, > > + .cm = &cpux_clk.mux, > > + .delay_us = 1, /* manual doesn't really say */ > > + .bypass_index = 4, /* PLL_PERI0@600MHz, as recommended by manual */ > > +}; > > + > > +static struct ccu_pll_nb sun50i_h616_pll_cpu_nb = { > > + .common = &pll_cpux_clk.common, > > + .enable = BIT(29), /* LOCK_ENABLE */ > > + .lock = BIT(28), > > +}; > > + > > static int sun50i_h616_ccu_probe(struct platform_device *pdev) > > { > > void __iomem *reg; > > u32 val; > > - int i; > > + int ret, i; > > > > reg = devm_platform_ioremap_resource(pdev, 0); > > if (IS_ERR(reg)) > > @@ -1152,7 +1165,18 @@ static int sun50i_h616_ccu_probe(struct platform_device *pdev) > > val |= BIT(24); > > writel(val, reg + SUN50I_H616_HDMI_CEC_CLK_REG); > > > > - return devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc); > > + ret = devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc); > > + if (ret) > > + return ret; > > + > > + /* Reparent CPU during CPU PLL rate changes */ > > + ccu_mux_notifier_register(pll_cpux_clk.common.hw.clk, > > + &sun50i_h616_cpu_nb); > > + > > + /* Re-lock the CPU PLL after any rate changes */ > > + ccu_pll_notifier_register(&sun50i_h616_pll_cpu_nb); > > + > > + return 0; > > } > > > > static const struct of_device_id sun50i_h616_ccu_ids[] = { > > -- > > 2.25.1 > >
Hi, I made various tests with this patch, and it doesn't resolve the issue on H700. It doesn't hurt either, so it's up to you to keep it or not. For the H700, I'm wondering if DVFS is the issue. I mean that's how I trigger the crash, but the crash maybe just a side effect, I've ran tests across the transition matrix, and they all seems to works... until it crash. Crashes are random in their occurrence and their manifestation... I'm clueless at what could it be. I've stressed the DRAM at various CPU speeds, and this seems to be stable, but again we have detection issues with u-boot... so... Philippe On 25/10/2024 12:56, Andre Przywara wrote: > The H616 user manual recommends to re-parent the CPU clock during > frequency changes of the PLL, and recommends PLL_PERI0(1X), which runs > at 600 MHz. Also it asks to disable and then re-enable the PLL lock bit, > after the factor changes have been applied. > > Add clock notifiers for the PLL and the CPU mux clock, using the existing > notifier callbacks, and tell them to use mux 4 (the PLL_PERI0(1X) source), > and bit 29 (the LOCK_ENABLE) bit. The existing code already follows the > correct algorithms. > > Signed-off-by: Andre Przywara <andre.przywara@arm.com> > --- > Hi, > > the manual states that those changes would be needed to safely change > the CPU_PLL frequency during DVFS operation. On my H618 boards it works > fine without them, but Philippe reported problems on his H700 board. > Posting this for reference at this point, to see if it helps people. > I am not sure we should change this without it fixing any real issues. > > The same algorithm would apply to the A100/A133 (and the upcoming A523) > as well. > > Cheers, > Andre > > drivers/clk/sunxi-ng/ccu-sun50i-h616.c | 28 ++++++++++++++++++++++++-- > 1 file changed, 26 insertions(+), 2 deletions(-) > > diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > index 84e406ddf9d12..85eea196f25e3 100644 > --- a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c > @@ -1095,11 +1095,24 @@ static const u32 usb2_clk_regs[] = { > SUN50I_H616_USB3_CLK_REG, > }; > > +static struct ccu_mux_nb sun50i_h616_cpu_nb = { > + .common = &cpux_clk.common, > + .cm = &cpux_clk.mux, > + .delay_us = 1, /* manual doesn't really say */ > + .bypass_index = 4, /* PLL_PERI0@600MHz, as recommended by manual */ > +}; > + > +static struct ccu_pll_nb sun50i_h616_pll_cpu_nb = { > + .common = &pll_cpux_clk.common, > + .enable = BIT(29), /* LOCK_ENABLE */ > + .lock = BIT(28), > +}; > + > static int sun50i_h616_ccu_probe(struct platform_device *pdev) > { > void __iomem *reg; > u32 val; > - int i; > + int ret, i; > > reg = devm_platform_ioremap_resource(pdev, 0); > if (IS_ERR(reg)) > @@ -1152,7 +1165,18 @@ static int sun50i_h616_ccu_probe(struct platform_device *pdev) > val |= BIT(24); > writel(val, reg + SUN50I_H616_HDMI_CEC_CLK_REG); > > - return devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc); > + ret = devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc); > + if (ret) > + return ret; > + > + /* Reparent CPU during CPU PLL rate changes */ > + ccu_mux_notifier_register(pll_cpux_clk.common.hw.clk, > + &sun50i_h616_cpu_nb); > + > + /* Re-lock the CPU PLL after any rate changes */ > + ccu_pll_notifier_register(&sun50i_h616_pll_cpu_nb); > + > + return 0; > } > > static const struct of_device_id sun50i_h616_ccu_ids[] = {
Tested-by: Evgeny Boger <boger@wirenboard.com>
We had stability issues with some of our T507-based boards. T507 is the
same die as H616, to my knowledge.
They were fixed by essentially the same patch, which we unfortunately
didn't submitted to mainline:
https://github.com/wirenboard/linux/commit/dc06e377108c935b2d1f5ce3d54ca1a1756458af
It's worth noticing that not only the reparenting is mandated by T5 User
Manual (section 3.3.3.1), it's also is implemented in vendor BSP in the
same way.
We tested the patch extensively on dozens of custom T507 boards (Wiren
Board 8 PLC). In our test it significantly improved the stability,
especially at low core voltages.
From my understanding, all Allwinner SoCs need to follow this kind of
procedure, however it's only implemented in mainline for a handful of chips.
On Fri, 8 Nov 2024 23:14:51 +0300 Evgeny Boger <boger@wirenboard.com> wrote: Hi Evgeny, > Tested-by: Evgeny Boger <boger@wirenboard.com> > > We had stability issues with some of our T507-based boards. T507 is the > same die as H616, to my knowledge. > They were fixed by essentially the same patch, which we unfortunately > didn't submitted to mainline: > https://github.com/wirenboard/linux/commit/dc06e377108c935b2d1f5ce3d54ca1a1756458af > > It's worth noticing that not only the reparenting is mandated by T5 User > Manual (section 3.3.3.1), it's also is implemented in vendor BSP in the > same way. > > We tested the patch extensively on dozens of custom T507 boards (Wiren > Board 8 PLC). In our test it significantly improved the stability, > especially at low core voltages. many thanks for this reply, I was hoping for such a kind of report! I typically don't test those things in anger, and only have a few boards, so having those reports from the real world is very helpful! Can you maybe give some hint on how you tested this? Does "at low core voltages" mean you forced transitions between the lower OPPs only, or were the chips undervolted? > From my understanding, all Allwinner SoCs need to follow this kind of > procedure, however it's only implemented in mainline for a handful of chips. Yes, I saw, I have added this to my A523 code already, and prepared a patch for the H6. Do you have boards with any other Allwinner SoCs you could test on, or even already have experience with? Cheers, Andre
On 11/9/24 01:34, Andre Przywara wrote: > On Fri, 8 Nov 2024 23:14:51 +0300 > Evgeny Boger <boger@wirenboard.com> wrote: > > Hi Evgeny, > >> Tested-by: Evgeny Boger <boger@wirenboard.com> >> >> We had stability issues with some of our T507-based boards. T507 is the >> same die as H616, to my knowledge. >> They were fixed by essentially the same patch, which we unfortunately >> didn't submitted to mainline: >> https://github.com/wirenboard/linux/commit/dc06e377108c935b2d1f5ce3d54ca1a1756458af >> >> It's worth noticing that not only the reparenting is mandated by T5 User >> Manual (section 3.3.3.1), it's also is implemented in vendor BSP in the >> same way. >> >> We tested the patch extensively on dozens of custom T507 boards (Wiren >> Board 8 PLC). In our test it significantly improved the stability, >> especially at low core voltages. > many thanks for this reply, I was hoping for such a kind of report! > I typically don't test those things in anger, and only have a few > boards, so having those reports from the real world is very helpful! > > Can you maybe give some hint on how you tested this? Does "at low core > voltages" mean you forced transitions between the lower OPPs only, or > were the chips undervolted? Both, in a way. Some boards (about 1 in 20 or so) would hang after a few days of operation. During our investigation, we found they would never hang under stress testing, so we started examining cpufreq-related factors. Disabling lower OPPs also prevented hanging. If we artificially lowered the OPP voltages (undervolting the chip), the boards would hang much faster without the patch, and even the previously stable ones would start to hang. > >> From my understanding, all Allwinner SoCs need to follow this kind of >> procedure, however it's only implemented in mainline for a handful of chips. > Yes, I saw, I have added this to my A523 code already, and prepared a > patch for the H6. > Do you have boards with any other Allwinner SoCs you could test on, or > even already have experience with? Unfortunately, no, not really. We only use the T507 and A40i at the moment. We’ve never had these kinds of issues with the A40i, though. By the way, the A40i is among the few Allwinner chips with reparenting implemented in the mainline. The A523/T527 is really nice; it's a pity it's limited to 4GB RAM. > > Cheers, > Andre
On Sat, Nov 9, 2024 at 7:15 AM Evgeny Boger <boger@wirenboard.com> wrote: > > On 11/9/24 01:34, Andre Przywara wrote: > > On Fri, 8 Nov 2024 23:14:51 +0300 > > Evgeny Boger <boger@wirenboard.com> wrote: > > > > Hi Evgeny, > > > >> Tested-by: Evgeny Boger <boger@wirenboard.com> > >> > >> We had stability issues with some of our T507-based boards. T507 is the > >> same die as H616, to my knowledge. > >> They were fixed by essentially the same patch, which we unfortunately > >> didn't submitted to mainline: > >> https://github.com/wirenboard/linux/commit/dc06e377108c935b2d1f5ce3d54ca1a1756458af > >> > >> It's worth noticing that not only the reparenting is mandated by T5 User > >> Manual (section 3.3.3.1), it's also is implemented in vendor BSP in the > >> same way. > >> > >> We tested the patch extensively on dozens of custom T507 boards (Wiren > >> Board 8 PLC). In our test it significantly improved the stability, > >> especially at low core voltages. > > many thanks for this reply, I was hoping for such a kind of report! > > I typically don't test those things in anger, and only have a few > > boards, so having those reports from the real world is very helpful! > > > > Can you maybe give some hint on how you tested this? Does "at low core > > voltages" mean you forced transitions between the lower OPPs only, or > > were the chips undervolted? > Both, in a way. Some boards (about 1 in 20 or so) would hang after a few > days of operation. > > During our investigation, we found they would never hang under stress > testing, so we started examining cpufreq-related factors. > > Disabling lower OPPs also prevented hanging. If we artificially lowered > the OPP voltages (undervolting the chip), the boards would hang much > faster without the patch, and even the previously stable ones would > start to hang. I guess we can merge this one then? ChenYu > >> From my understanding, all Allwinner SoCs need to follow this kind of > >> procedure, however it's only implemented in mainline for a handful of chips. > > Yes, I saw, I have added this to my A523 code already, and prepared a > > patch for the H6. > > Do you have boards with any other Allwinner SoCs you could test on, or > > even already have experience with? > Unfortunately, no, not really. We only use the T507 and A40i at the moment. > We’ve never had these kinds of issues with the A40i, though. By the way, > the A40i is among the few Allwinner chips with reparenting implemented > in the mainline. > > The A523/T527 is really nice; it's a pity it's limited to 4GB RAM. > > > > > Cheers, > > Andre > > -- > Kind regards, > Evgeny Boger >
diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c index 84e406ddf9d12..85eea196f25e3 100644 --- a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c +++ b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c @@ -1095,11 +1095,24 @@ static const u32 usb2_clk_regs[] = { SUN50I_H616_USB3_CLK_REG, }; +static struct ccu_mux_nb sun50i_h616_cpu_nb = { + .common = &cpux_clk.common, + .cm = &cpux_clk.mux, + .delay_us = 1, /* manual doesn't really say */ + .bypass_index = 4, /* PLL_PERI0@600MHz, as recommended by manual */ +}; + +static struct ccu_pll_nb sun50i_h616_pll_cpu_nb = { + .common = &pll_cpux_clk.common, + .enable = BIT(29), /* LOCK_ENABLE */ + .lock = BIT(28), +}; + static int sun50i_h616_ccu_probe(struct platform_device *pdev) { void __iomem *reg; u32 val; - int i; + int ret, i; reg = devm_platform_ioremap_resource(pdev, 0); if (IS_ERR(reg)) @@ -1152,7 +1165,18 @@ static int sun50i_h616_ccu_probe(struct platform_device *pdev) val |= BIT(24); writel(val, reg + SUN50I_H616_HDMI_CEC_CLK_REG); - return devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc); + ret = devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc); + if (ret) + return ret; + + /* Reparent CPU during CPU PLL rate changes */ + ccu_mux_notifier_register(pll_cpux_clk.common.hw.clk, + &sun50i_h616_cpu_nb); + + /* Re-lock the CPU PLL after any rate changes */ + ccu_pll_notifier_register(&sun50i_h616_pll_cpu_nb); + + return 0; } static const struct of_device_id sun50i_h616_ccu_ids[] = {
The H616 user manual recommends to re-parent the CPU clock during frequency changes of the PLL, and recommends PLL_PERI0(1X), which runs at 600 MHz. Also it asks to disable and then re-enable the PLL lock bit, after the factor changes have been applied. Add clock notifiers for the PLL and the CPU mux clock, using the existing notifier callbacks, and tell them to use mux 4 (the PLL_PERI0(1X) source), and bit 29 (the LOCK_ENABLE) bit. The existing code already follows the correct algorithms. Signed-off-by: Andre Przywara <andre.przywara@arm.com> --- Hi, the manual states that those changes would be needed to safely change the CPU_PLL frequency during DVFS operation. On my H618 boards it works fine without them, but Philippe reported problems on his H700 board. Posting this for reference at this point, to see if it helps people. I am not sure we should change this without it fixing any real issues. The same algorithm would apply to the A100/A133 (and the upcoming A523) as well. Cheers, Andre drivers/clk/sunxi-ng/ccu-sun50i-h616.c | 28 ++++++++++++++++++++++++-- 1 file changed, 26 insertions(+), 2 deletions(-)