[RFC] clk: sunxi-ng: h616: Reparent CPU clock during frequency changes

Message ID 20241025105620.1891596-1-andre.przywara@arm.com (mailing list archive)
State New
Series [RFC] clk: sunxi-ng: h616: Reparent CPU clock during frequency changes

Commit Message

Andre Przywara Oct. 25, 2024, 10:56 a.m. UTC
The H616 user manual recommends re-parenting the CPU clock during
frequency changes of the PLL, and names PLL_PERI0(1X), which runs at
600 MHz, as the temporary parent. It also asks to disable and then
re-enable the PLL lock enable bit after the factor changes have been
applied.

Add clock notifiers for the PLL and the CPU mux clock, using the existing
notifier callbacks, and tell them to use mux 4 (the PLL_PERI0(1X) source)
and bit 29 (the LOCK_ENABLE bit). The existing code already follows the
correct algorithms.

Signed-off-by: Andre Przywara <andre.przywara@arm.com>
---
Hi,

the manual states that those changes would be needed to safely change
the CPU_PLL frequency during DVFS operation. On my H618 boards it works
fine without them, but Philippe reported problems on his H700 board.
Posting this for reference at this point, to see if it helps people.
I am not sure we should change this without it fixing any real issues.

The same algorithm would apply to the A100/A133 (and the upcoming A523)
as well.

Cheers,
Andre

 drivers/clk/sunxi-ng/ccu-sun50i-h616.c | 28 ++++++++++++++++++++++++--
 1 file changed, 26 insertions(+), 2 deletions(-)
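
For readers following along, here is a minimal user-space sketch of the
sequence the commit message describes. It is a model, not the kernel code:
struct fake_ccu, the helper names, the mux field position (bits 26:24) and
PLL_FACTOR_MASK are assumptions made for illustration. Only mux index 4 and
bits 29/28 come from the patch. Re-locking the PLL before switching the CPU
back is a choice of this model; it does not claim to mirror the order in
which the kernel calls the two notifiers.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

#define BIT(n)           (UINT32_C(1) << (n))
#define PLL_LOCK_ENABLE  BIT(29)            /* from the patch: .enable */
#define PLL_LOCK         BIT(28)            /* from the patch: .lock */
#define PLL_FACTOR_MASK  UINT32_C(0xffff)   /* assumption, model only */
#define CPU_MUX_SHIFT    24                 /* assumption, model only */
#define CPU_MUX_MASK     (UINT32_C(0x7) << CPU_MUX_SHIFT)
#define CPU_MUX_BYPASS   4                  /* PLL_PERI0(1X), from the patch */

enum rate_event { PRE_RATE_CHANGE, POST_RATE_CHANGE };

/* Hypothetical stand-in for the two CCU registers involved. */
struct fake_ccu {
	uint32_t pll_cpux_reg;
	uint32_t cpux_mux_reg;
	uint32_t saved_mux;	/* parent index in use before the change */
};

static uint32_t mux_get(const struct fake_ccu *c)
{
	return (c->cpux_mux_reg & CPU_MUX_MASK) >> CPU_MUX_SHIFT;
}

static void mux_set(struct fake_ccu *c, uint32_t idx)
{
	assert(idx <= 7);	/* the modelled mux field is 3 bits wide */
	c->cpux_mux_reg = (c->cpux_mux_reg & ~CPU_MUX_MASK) |
			  (idx << CPU_MUX_SHIFT);
}

/* Models the mux notifier: park the CPU on the bypass parent, then restore. */
static void mux_notify(struct fake_ccu *c, enum rate_event ev)
{
	if (ev == PRE_RATE_CHANGE) {
		c->saved_mux = mux_get(c);
		mux_set(c, CPU_MUX_BYPASS);
	} else {
		mux_set(c, c->saved_mux);
	}
}

/* Models the PLL notifier: toggle LOCK_ENABLE once the factors changed. */
static bool pll_notify(struct fake_ccu *c, enum rate_event ev)
{
	if (ev != POST_RATE_CHANGE)
		return true;

	c->pll_cpux_reg &= ~PLL_LOCK_ENABLE;
	c->pll_cpux_reg |= PLL_LOCK_ENABLE;
	/* Real hardware raises the lock bit itself; the model just sets it. */
	c->pll_cpux_reg |= PLL_LOCK;
	return (c->pll_cpux_reg & PLL_LOCK) != 0;
}

/* One modelled rate change: bypass, reprogram, re-lock, switch back. */
static bool change_pll_factors(struct fake_ccu *c, uint32_t factors)
{
	bool locked;

	mux_notify(c, PRE_RATE_CHANGE);
	/* Writing new factors drops the lock in this model. */
	c->pll_cpux_reg = (c->pll_cpux_reg & ~(PLL_FACTOR_MASK | PLL_LOCK)) |
			  (factors & PLL_FACTOR_MASK);
	locked = pll_notify(c, POST_RATE_CHANGE);
	mux_notify(c, POST_RATE_CHANGE);
	return locked;
}
```

Calling change_pll_factors() on a zero-initialised struct fake_ccu walks
through the whole sequence; the CPU mux reads 4 only between the two
mux_notify() calls.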

Comments

Chen-Yu Tsai Oct. 25, 2024, 2:49 p.m. UTC | #1
On Fri, Oct 25, 2024 at 6:56 PM Andre Przywara <andre.przywara@arm.com> wrote:
>
> The H616 user manual recommends to re-parent the CPU clock during
> frequency changes of the PLL, and recommends PLL_PERI0(1X), which runs
> at 600 MHz. Also it asks to disable and then re-enable the PLL lock bit,
> after the factor changes have been applied.
>
> Add clock notifiers for the PLL and the CPU mux clock, using the existing
> notifier callbacks, and tell them to use mux 4 (the PLL_PERI0(1X) source),
> and bit 29 (the LOCK_ENABLE) bit. The existing code already follows the
> correct algorithms.
>
> Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> ---
> Hi,
>
> the manual states that those changes would be needed to safely change
> the CPU_PLL frequency during DVFS operation. On my H618 boards it works
> fine without them, but Philippe reported problems on his H700 board.
> Posting this for reference at this point, to see if it helps people.
> I am not sure we should change this without it fixing any real issues.

IIRC we do this for all the other SoCs. But if you want to be cautious,
we can wait for Philippe to give a Tested-by?

ChenYu

Andre Przywara Oct. 25, 2024, 3:05 p.m. UTC | #2
On Fri, 25 Oct 2024 22:49:27 +0800
Chen-Yu Tsai <wens@csie.org> wrote:

Hi,

> On Fri, Oct 25, 2024 at 6:56 PM Andre Przywara <andre.przywara@arm.com> wrote:
> >
> > The H616 user manual recommends to re-parent the CPU clock during
> > frequency changes of the PLL, and recommends PLL_PERI0(1X), which runs
> > at 600 MHz. Also it asks to disable and then re-enable the PLL lock bit,
> > after the factor changes have been applied.
> >
> > Add clock notifiers for the PLL and the CPU mux clock, using the existing
> > notifier callbacks, and tell them to use mux 4 (the PLL_PERI0(1X) source),
> > and bit 29 (the LOCK_ENABLE) bit. The existing code already follows the
> > correct algorithms.
> >
> > Signed-off-by: Andre Przywara <andre.przywara@arm.com>
> > ---
> > Hi,
> >
> > the manual states that those changes would be needed to safely change
> > the CPU_PLL frequency during DVFS operation. On my H618 boards it works
> > fine without them, but Philippe reported problems on his H700 board.
> > Posting this for reference at this point, to see if it helps people.
> > I am not sure we should change this without it fixing any real issues.  
> 
> IIRC we do this for all the other SoCs. But if you want to be cautious,
> we can wait for Philippe to give a Tested-by?

Yes, I copied this code from the A64 CCU, but IIRC this was desperately
needed there. So far I haven't heard many complaints on the H616, though,
and yesterday I ran through something like 100,000 transitions in a
matter of minutes without any issues.
And apparently this patch doesn't fix Philippe's immediate problem, so I
would like to hold it back for now, until we have more testing, either
with or without this patch.

Thanks,
Andre

Philippe Simons Oct. 28, 2024, 8:10 p.m. UTC | #3
Hi,

I ran various tests with this patch, and it doesn't resolve the issue
on the H700.

It doesn't hurt either, so it's up to you whether to keep it or not.

For the H700, I'm wondering whether DVFS really is the issue. That's how
I trigger the crash, but the crash may be just a side effect. I've run
tests across the transition matrix, and they all seem to work... until
it crashes.

Crashes are random in their occurrence and their manifestation...

I'm clueless as to what it could be.

I've stressed the DRAM at various CPU speeds, and this seems to be
stable, but then again we have detection issues with U-Boot... so...

Philippe
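
As an illustration of what "tests across the transition matrix" can mean,
here is a small sketch that enumerates every ordered pair of operating
points. It is an assumption about the test shape, not Philippe's actual
script; struct transition and build_transition_matrix() are made-up names.

```c
#include <assert.h>
#include <stddef.h>

/* One DVFS step: switch from OPP index 'from' to OPP index 'to'. */
struct transition {
	size_t from;
	size_t to;
};

/*
 * Fill 'out' with every ordered pair (from, to) with from != to for
 * n OPP indices. Returns the number of pairs written, which is
 * n * (n - 1), or 0 if n < 2 or 'cap' is too small to hold them all.
 */
static size_t build_transition_matrix(size_t n, struct transition *out,
				      size_t cap)
{
	size_t count = 0;

	if (n < 2 || cap < n * (n - 1))
		return 0;

	for (size_t from = 0; from < n; from++) {
		for (size_t to = 0; to < n; to++) {
			if (from == to)
				continue;
			out[count].from = from;
			out[count].to = to;
			count++;
		}
	}

	assert(count == n * (n - 1));
	return count;
}
```

A test loop would then, for each pair, first settle on the "from"
frequency and afterwards request the "to" frequency, for instance through
the cpufreq userspace governor's scaling_setspeed file in sysfs.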

Evgeny Boger Nov. 8, 2024, 8:14 p.m. UTC | #4
Tested-by: Evgeny Boger <boger@wirenboard.com>

We had stability issues with some of our T507-based boards. The T507 is
the same die as the H616, to my knowledge.
They were fixed by essentially the same patch, which we unfortunately
didn't submit to mainline:
https://github.com/wirenboard/linux/commit/dc06e377108c935b2d1f5ce3d54ca1a1756458af

It's worth noting that the reparenting is not only mandated by the T5
User Manual (section 3.3.3.1), it is also implemented in the vendor BSP
in the same way.

We tested the patch extensively on dozens of custom T507 boards (Wiren
Board 8 PLC). In our tests it significantly improved stability,
especially at low core voltages.

From my understanding, all Allwinner SoCs need to follow this kind of
procedure, however it's only implemented in mainline for a handful of chips.
Andre Przywara Nov. 8, 2024, 10:34 p.m. UTC | #5
On Fri, 8 Nov 2024 23:14:51 +0300
Evgeny Boger <boger@wirenboard.com> wrote:

Hi Evgeny,

> Tested-by: Evgeny Boger <boger@wirenboard.com>
> 
> We had stability issues with some of our T507-based boards. T507 is the 
> same die as H616, to my knowledge.
> They were fixed by essentially the same patch, which we unfortunately 
> didn't submitted to mainline: 
> https://github.com/wirenboard/linux/commit/dc06e377108c935b2d1f5ce3d54ca1a1756458af
> 
> It's worth noticing that not only the reparenting is mandated by T5 User 
> Manual (section 3.3.3.1), it's also is implemented in vendor BSP in the 
> same way.
> 
> We tested the patch extensively on dozens of custom T507 boards (Wiren 
> Board 8 PLC). In our test it significantly improved the stability, 
> especially at low core voltages.

Many thanks for this reply, I was hoping for this kind of report!
I typically don't test those things in anger, and only have a few
boards, so having those reports from the real world is very helpful!

Can you maybe give some hint on how you tested this? Does "at low core
voltages" mean you forced transitions between the lower OPPs only, or
were the chips undervolted?

>  From my understanding, all Allwinner SoCs need to follow this kind of 
> procedure, however it's only implemented in mainline for a handful of chips.

Yes, I saw, I have added this to my A523 code already, and prepared a
patch for the H6.
Do you have boards with any other Allwinner SoCs you could test on, or
even already have experience with?

Cheers,
Andre
Evgeny Boger Nov. 8, 2024, 11:14 p.m. UTC | #6
On 11/9/24 01:34, Andre Przywara wrote:
> On Fri, 8 Nov 2024 23:14:51 +0300
> Evgeny Boger <boger@wirenboard.com> wrote:
>
> Hi Evgeny,
>
>> Tested-by: Evgeny Boger <boger@wirenboard.com>
>>
>> We had stability issues with some of our T507-based boards. T507 is the
>> same die as H616, to my knowledge.
>> They were fixed by essentially the same patch, which we unfortunately
>> didn't submitted to mainline:
>> https://github.com/wirenboard/linux/commit/dc06e377108c935b2d1f5ce3d54ca1a1756458af
>>
>> It's worth noticing that not only the reparenting is mandated by T5 User
>> Manual (section 3.3.3.1), it's also is implemented in vendor BSP in the
>> same way.
>>
>> We tested the patch extensively on dozens of custom T507 boards (Wiren
>> Board 8 PLC). In our test it significantly improved the stability,
>> especially at low core voltages.
> many thanks for this reply, I was hoping for such a kind of report!
> I typically don't test those things in anger, and only have a few
> boards, so having those reports from the real world is very helpful!
>
> Can you maybe give some hint on how you tested this? Does "at low core
> voltages" mean you forced transitions between the lower OPPs only, or
> were the chips undervolted?
Both, in a way. Some boards (about 1 in 20 or so) would hang after a few 
days of operation.

During our investigation, we found they would never hang under stress 
testing, so we started examining cpufreq-related factors.

Disabling lower OPPs also prevented hanging. If we artificially lowered 
the OPP voltages (undervolting the chip), the boards would hang much 
faster without the patch, and even the previously stable ones would 
start to hang.
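
To make the undervolting step concrete, here is a minimal sketch of
lowering every OPP voltage by a fixed offset. This is an assumption about
how such a test table could be derived, not the procedure that was
actually used; struct opp, undervolt_opps() and the floor parameter are
hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* One operating point: CPU frequency and its supply voltage. */
struct opp {
	unsigned long freq_hz;
	unsigned long microvolt;
};

/*
 * Lower the voltage of each entry by 'offset_uv'. The result never wraps
 * below zero and is clamped to 'floor_uv', so an entry that ends up
 * below the floor is set to the floor.
 */
static void undervolt_opps(struct opp *table, size_t n,
			   unsigned long offset_uv, unsigned long floor_uv)
{
	assert(table != NULL || n == 0);

	for (size_t i = 0; i < n; i++) {
		unsigned long uv = table[i].microvolt;

		uv = (uv > offset_uv) ? uv - offset_uv : 0;
		if (uv < floor_uv)
			uv = floor_uv;
		table[i].microvolt = uv;
	}
}
```

The frequencies stay untouched, so the governor still walks the same OPPs;
only the margin at each of them shrinks.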

>
>>   From my understanding, all Allwinner SoCs need to follow this kind of
>> procedure, however it's only implemented in mainline for a handful of chips.
> Yes, I saw, I have added this to my A523 code already, and prepared a
> patch for the H6.
> Do you have boards with any other Allwinner SoCs you could test on, or
> even already have experience with?
Unfortunately, no, not really. We only use the T507 and A40i at the moment.
We’ve never had these kinds of issues with the A40i, though. By the way, 
the A40i is among the few Allwinner chips with reparenting implemented 
in the mainline.

The A523/T527 is really nice; it's a pity it's limited to 4GB RAM.

>
> Cheers,
> Andre
Chen-Yu Tsai Nov. 10, 2024, 12:33 p.m. UTC | #7
On Sat, Nov 9, 2024 at 7:15 AM Evgeny Boger <boger@wirenboard.com> wrote:
>
> On 11/9/24 01:34, Andre Przywara wrote:
> > On Fri, 8 Nov 2024 23:14:51 +0300
> > Evgeny Boger <boger@wirenboard.com> wrote:
> >
> > Hi Evgeny,
> >
> >> Tested-by: Evgeny Boger <boger@wirenboard.com>
> >>
> >> We had stability issues with some of our T507-based boards. T507 is the
> >> same die as H616, to my knowledge.
> >> They were fixed by essentially the same patch, which we unfortunately
> >> didn't submitted to mainline:
> >> https://github.com/wirenboard/linux/commit/dc06e377108c935b2d1f5ce3d54ca1a1756458af
> >>
> >> It's worth noticing that not only the reparenting is mandated by T5 User
> >> Manual (section 3.3.3.1), it's also is implemented in vendor BSP in the
> >> same way.
> >>
> >> We tested the patch extensively on dozens of custom T507 boards (Wiren
> >> Board 8 PLC). In our test it significantly improved the stability,
> >> especially at low core voltages.
> > many thanks for this reply, I was hoping for such a kind of report!
> > I typically don't test those things in anger, and only have a few
> > boards, so having those reports from the real world is very helpful!
> >
> > Can you maybe give some hint on how you tested this? Does "at low core
> > voltages" mean you forced transitions between the lower OPPs only, or
> > were the chips undervolted?
> Both, in a way. Some boards (about 1 in 20 or so) would hang after a few
> days of operation.
>
> During our investigation, we found they would never hang under stress
> testing, so we started examining cpufreq-related factors.
>
> Disabling lower OPPs also prevented hanging. If we artificially lowered
> the OPP voltages (undervolting the chip), the boards would hang much
> faster without the patch, and even the previously stable ones would
> start to hang.

I guess we can merge this one then?


ChenYu

Patch

diff --git a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c
index 84e406ddf9d12..85eea196f25e3 100644
--- a/drivers/clk/sunxi-ng/ccu-sun50i-h616.c
+++ b/drivers/clk/sunxi-ng/ccu-sun50i-h616.c
@@ -1095,11 +1095,24 @@  static const u32 usb2_clk_regs[] = {
 	SUN50I_H616_USB3_CLK_REG,
 };
 
+static struct ccu_mux_nb sun50i_h616_cpu_nb = {
+	.common		= &cpux_clk.common,
+	.cm		= &cpux_clk.mux,
+	.delay_us	= 1, /* manual doesn't really say */
+	.bypass_index	= 4, /* PLL_PERI0@600MHz, as recommended by manual */
+};
+
+static struct ccu_pll_nb sun50i_h616_pll_cpu_nb = {
+	.common		= &pll_cpux_clk.common,
+	.enable		= BIT(29),	/* LOCK_ENABLE */
+	.lock		= BIT(28),
+};
+
 static int sun50i_h616_ccu_probe(struct platform_device *pdev)
 {
 	void __iomem *reg;
 	u32 val;
-	int i;
+	int ret, i;
 
 	reg = devm_platform_ioremap_resource(pdev, 0);
 	if (IS_ERR(reg))
@@ -1152,7 +1165,18 @@  static int sun50i_h616_ccu_probe(struct platform_device *pdev)
 	val |= BIT(24);
 	writel(val, reg + SUN50I_H616_HDMI_CEC_CLK_REG);
 
-	return devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc);
+	ret = devm_sunxi_ccu_probe(&pdev->dev, reg, &sun50i_h616_ccu_desc);
+	if (ret)
+		return ret;
+
+	/* Reparent CPU during CPU PLL rate changes */
+	ccu_mux_notifier_register(pll_cpux_clk.common.hw.clk,
+				  &sun50i_h616_cpu_nb);
+
+	/* Re-lock the CPU PLL after any rate changes */
+	ccu_pll_notifier_register(&sun50i_h616_pll_cpu_nb);
+
+	return 0;
 }
 
 static const struct of_device_id sun50i_h616_ccu_ids[] = {