
[1/3] clk: sunxi-ng: Add clk notifier to gate then ungate PLL clocks

Message ID 20170413021354.3258-2-wens@csie.org (mailing list archive)
State Accepted

Commit Message

Chen-Yu Tsai April 13, 2017, 2:13 a.m. UTC
In common PLL designs, changes to the dividers take effect almost
immediately, while changes to the multipliers (implemented as
dividers in the feedback loop) take a few cycles to work into
the feedback loop for the PLL to stabilize.

Sometimes when the PLL clock rate is changed, the decrease in the
divider is too much for the decrease in the multiplier to catch up.
The PLL clock rate will spike, and in some cases, might lock up
completely. This is especially the case if the divider changed is
the pre-divider, which affects the reference frequency.
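
As a rough illustration of the spike described above (not part of the patch), the transient can be worked through with a simplified model in which the PLL output is ref * N / M. The 24 MHz reference and the factor values used in the note below are assumptions picked for easy arithmetic, not taken from any datasheet, and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simplified PLL model (assumption): rate = ref * n / m, with n the
 * feedback multiplier and m the pre-divider. Real sunxi PLLs have
 * more factors than this.
 */
static uint64_t pll_rate_hz(uint64_t ref_hz, unsigned int n, unsigned int m)
{
	assert(m != 0);
	return ref_hz * n / m;
}

/*
 * Worst-case transient in this model: the new (smaller) divider has
 * already taken effect while the old (larger) multiplier is still
 * working its way through the feedback loop.
 */
static uint64_t pll_transient_rate_hz(uint64_t ref_hz, unsigned int old_n,
				      unsigned int new_m)
{
	return pll_rate_hz(ref_hz, old_n, new_m);
}
```

With a 24 MHz reference, going from N=32, M=2 (384 MHz) to N=17, M=1 (408 MHz) passes through N=32, M=1 in this model, i.e. 768 MHz, roughly double either endpoint.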

This patch introduces a clk notifier callback that will gate and
then ungate a clk after a rate change, effectively resetting it,
so it continues to work, despite any possible lockups. Care must
be taken to reparent any consumers to other temporary clocks during
the rate change, and this notifier callback must be the first
to be registered.

This is intended to fix occasional lockups with cpufreq on newer
Allwinner SoCs, such as the A33 and the H3. Previously it was
thought that reparenting the cpu clock away from the PLL while
it stabilized was enough, as this worked quite well on the A31.

On the A33, hangs have been observed after cpufreq was recently
introduced. With the H3, a more thorough test [1] showed that
reparenting alone isn't enough. The system still locks up unless
the dividers are limited to 1.

The hunch was that if the PLL was stuck in some unknown state, gating
then ungating it might bring it back to normal. Tests done by
Icenowy Zheng using Ondrej's test firmware show this to be a valid
solution.

[1] http://www.spinics.net/lists/arm-kernel/msg552501.html

Reported-by: Ondrej Jirman <megous@megous.com>
Signed-off-by: Chen-Yu Tsai <wens@csie.org>
Tested-by: Icenowy Zheng <icenowy@aosc.io>
Tested-by: Quentin Schulz <quentin.schulz@free-electrons.com>
---
 drivers/clk/sunxi-ng/ccu_common.c | 49 +++++++++++++++++++++++++++++++++++++++
 drivers/clk/sunxi-ng/ccu_common.h | 12 ++++++++++
 2 files changed, 61 insertions(+)

Comments

Maxime Ripard April 13, 2017, 7:02 a.m. UTC | #1
Hi Chen-Yu,

Thanks for looking into this, and coming up with a clean solution, and
a great commit log.

However, I'm wondering, isn't that notifier just a re-implementation of
CLK_SET_RATE_GATE?

Maxime
Chen-Yu Tsai April 13, 2017, 7:35 a.m. UTC | #2
On Thu, Apr 13, 2017 at 3:02 PM, Maxime Ripard
<maxime.ripard@free-electrons.com> wrote:
> Thanks for looking into this, and coming up with a clean solution, and
> a great commit log.
>
> However, I'm wondering, isn't that notifier just a re-implementation of
> CLK_SET_RATE_GATE?

They are not the same. AFAIK, CLK_SET_RATE_GATE tells the clk framework
that this clk's rate cannot be changed if it is enabled (which means
someone is using it). However, the clk framework does nothing to
actually handle it. It just returns an error. Any consumers are
responsible for gating the clock before making changes. This is a nice
thing to have, as it can prevent unintended changes to dot clocks or
audio clocks used with active output streams. We could consider setting
this for the audio and video PLLs.
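
To make the distinction concrete, here is a toy user-space model of the behaviour described above. It is a sketch and not the clk framework's code: the struct, the flag and the function are hypothetical stand-ins, and the -EBUSY error code is an assumption about what the core returns.

```c
#include <assert.h>
#include <errno.h>
#include <stddef.h>

#define TOY_SET_RATE_GATE 0x1UL	/* hypothetical stand-in for CLK_SET_RATE_GATE */

struct toy_clk {
	unsigned long flags;
	unsigned int enable_count;	/* number of active users */
	unsigned long rate;
};

/*
 * With the flag set, a rate change on an in-use clock is refused.
 * Nothing is gated on the caller's behalf; the consumer has to
 * disable the clock itself and try again.
 */
static int toy_clk_set_rate(struct toy_clk *clk, unsigned long rate)
{
	assert(clk != NULL);

	if ((clk->flags & TOY_SET_RATE_GATE) && clk->enable_count > 0)
		return -EBUSY;	/* assumed error code */

	clk->rate = rate;
	return 0;
}
```

In this model an audio PLL with an active stream keeps its rate until the consumer drops its enable count to zero, which is the protective behaviour mentioned above, but it never resets a stuck PLL.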

Here we are dealing with the CPU PLL, which, for practical reasons,
is always enabled as far as the clk framework is concerned. The
reason being the OPPs are never low enough for the CPU clock to
use any other parent. To have it disabled, we would have to kick
consumers (the CPU clock in this case) to use other clocks, so it's
safe, remember which ones we kicked, and then bring them back once
everything is done.

AFAIK, we, Samsung, Rockchip, and Meson all do the temporary reparenting
using clk_notifiers to access the mux registers directly. As far
as the clk framework is concerned, nothing has changed.

I'm not saying it's not possible to support this in the core, but
the core already has to do a lot of bookkeeping and recalculation
when anything changes. Adding something transient into the process
isn't helping. And the reparenting might temporarily violate any
downstream requirements.

For now, I think clk notifiers are the easier solution for these
one-off requirements that are pretty much contained in a small
part of the system.

Regards
ChenYu
Maxime Ripard April 13, 2017, 9:27 a.m. UTC | #3
On Thu, Apr 13, 2017 at 03:35:30PM +0800, Chen-Yu Tsai wrote:
> On Thu, Apr 13, 2017 at 3:02 PM, Maxime Ripard
> <maxime.ripard@free-electrons.com> wrote:
> > Thanks for looking into this, and coming up with a clean solution, and
> > a great commit log.
> >
> > However, I'm wondering, isn't that notifier just a re-implementation of
> > CLK_SET_RATE_GATE?
> 
> They are not the same. AFAIK, CLK_SET_RATE_GATE tells the clk framework
> that this clk's rate cannot be changed if it is enabled (which means
> someone is using it). However, the clk framework does nothing to
> actually handle it. It just returns an error. Any consumers are
> responsible for gating the clock before making changes. This is a nice
> thing to have, as it can prevent unintended changes to dot clocks or
> audio clocks used with active output streams. We could consider setting
> this for the audio and video PLLs.

Ah, you're right. I merged the first two patches and will send them
for 4.11.

> Here we are dealing with the CPU PLL, which, for practical reasons,
> is always enabled as far as the clk framework is concerned. The
> reason being the OPPs are never low enough for the CPU clock to
> use any other parent. To have it disabled, we would have to kick
> consumers (the CPU clock in this case) to use other clocks, so it's
> safe, remember which ones we kicked, and then bring them back once
> everything is done.
> 
> AFAIK, we, Samsung, Rockchip, and Meson all do the temporary reparenting
> using clk_notifiers to access the mux registers directly. As far
> as the clk framework is concerned, nothing has changed.
> 
> I'm not saying it's not possible to support this in the core, but
> the core already has to do a lot of bookkeeping and recalculation
> when anything changes. Adding something transient into the process
> isn't helping. And the reparenting might temporarily violate any
> downstream requirements.
> 
> For now, I think clk notifiers are the easier solution for these
> one-off requirements that are pretty much contained in a small
> part of the system.

However, the third one is less urgent, since we don't have H3 cpufreq
support yet, so we won't hit that case. I'd also like to first have a
common function that registers the notifiers: since the order really
matters, we don't want someone getting it wrong.

Since this is 4.13 material, there's no rush on that one though.

Thanks again!
Maxime

Patch

diff --git a/drivers/clk/sunxi-ng/ccu_common.c b/drivers/clk/sunxi-ng/ccu_common.c
index 188fa50d0380..40aac316128f 100644
--- a/drivers/clk/sunxi-ng/ccu_common.c
+++ b/drivers/clk/sunxi-ng/ccu_common.c
@@ -14,11 +14,13 @@ 
  * GNU General Public License for more details.
  */
 
+#include <linux/clk.h>
 #include <linux/clk-provider.h>
 #include <linux/iopoll.h>
 #include <linux/slab.h>
 
 #include "ccu_common.h"
+#include "ccu_gate.h"
 #include "ccu_reset.h"
 
 static DEFINE_SPINLOCK(ccu_lock);
@@ -39,6 +41,53 @@  void ccu_helper_wait_for_lock(struct ccu_common *common, u32 lock)
 	WARN_ON(readl_relaxed_poll_timeout(addr, reg, reg & lock, 100, 70000));
 }
 
+/*
+ * This clock notifier is called when the frequency of a PLL clock is
+ * changed. In common PLL designs, changes to the dividers take effect
+ * almost immediately, while changes to the multipliers (implemented
+ * as dividers in the feedback loop) take a few cycles to work into
+ * the feedback loop for the PLL to stabilize.
+ *
+ * Sometimes when the PLL clock rate is changed, the decrease in the
+ * divider is too much for the decrease in the multiplier to catch up.
+ * The PLL clock rate will spike, and in some cases, might lock up
+ * completely.
+ *
+ * This notifier callback will gate and then ungate the clock,
+ * effectively resetting it, so it proceeds to work. Care must be
+ * taken to reparent consumers to other temporary clocks during the
+ * rate change, and this notifier callback must be the first
+ * to be registered.
+ */
+static int ccu_pll_notifier_cb(struct notifier_block *nb,
+			       unsigned long event, void *data)
+{
+	struct ccu_pll_nb *pll = to_ccu_pll_nb(nb);
+	int ret = 0;
+
+	if (event != POST_RATE_CHANGE)
+		goto out;
+
+	ccu_gate_helper_disable(pll->common, pll->enable);
+
+	ret = ccu_gate_helper_enable(pll->common, pll->enable);
+	if (ret)
+		goto out;
+
+	ccu_helper_wait_for_lock(pll->common, pll->lock);
+
+out:
+	return notifier_from_errno(ret);
+}
+
+int ccu_pll_notifier_register(struct ccu_pll_nb *pll_nb)
+{
+	pll_nb->clk_nb.notifier_call = ccu_pll_notifier_cb;
+
+	return clk_notifier_register(pll_nb->common->hw.clk,
+				     &pll_nb->clk_nb);
+}
+
 int sunxi_ccu_probe(struct device_node *node, void __iomem *reg,
 		    const struct sunxi_ccu_desc *desc)
 {
diff --git a/drivers/clk/sunxi-ng/ccu_common.h b/drivers/clk/sunxi-ng/ccu_common.h
index 73d81dc58fc5..d6fdd7a789aa 100644
--- a/drivers/clk/sunxi-ng/ccu_common.h
+++ b/drivers/clk/sunxi-ng/ccu_common.h
@@ -83,6 +83,18 @@  struct sunxi_ccu_desc {
 
 void ccu_helper_wait_for_lock(struct ccu_common *common, u32 lock);
 
+struct ccu_pll_nb {
+	struct notifier_block	clk_nb;
+	struct ccu_common	*common;
+
+	u32	enable;
+	u32	lock;
+};
+
+#define to_ccu_pll_nb(_nb) container_of(_nb, struct ccu_pll_nb, clk_nb)
+
+int ccu_pll_notifier_register(struct ccu_pll_nb *pll_nb);
+
 int sunxi_ccu_probe(struct device_node *node, void __iomem *reg,
 		    const struct sunxi_ccu_desc *desc);
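
For readers without a kernel tree at hand, the gate-then-ungate flow of the callback in the patch can be approximated in user space as below. This is a mock under stated assumptions: the register is a plain variable, the fake hardware relocks immediately when ungated, and every toy_* name, the event value and the bit positions in the note below are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define TOY_POST_RATE_CHANGE 2UL	/* hypothetical stand-in event value */
#define TOY_BIT(n) (UINT32_C(1) << (n))

struct toy_pll {
	uint32_t reg;		/* fake PLL control register */
	uint32_t enable;	/* gate bit mask */
	uint32_t lock;		/* lock status bit mask */
	unsigned int resets;	/* how many gate/ungate cycles ran */
};

/* Gating the PLL also drops its lock indication in this mock. */
static void toy_gate_disable(struct toy_pll *pll)
{
	pll->reg &= ~(pll->enable | pll->lock);
}

/* Ungating: the mock hardware reports lock right away. */
static void toy_gate_enable(struct toy_pll *pll)
{
	pll->reg |= pll->enable | pll->lock;
}

/* Only react after the rate change; ignore every other event. */
static int toy_pll_notifier_cb(struct toy_pll *pll, unsigned long event)
{
	assert(pll != NULL);

	if (event != TOY_POST_RATE_CHANGE)
		return 0;

	toy_gate_disable(pll);
	toy_gate_enable(pll);
	pll->resets++;

	/* Real code polls the lock bit with a timeout instead. */
	return (pll->reg & pll->lock) ? 0 : -1;
}
```

A PLL that starts out enabled but without lock (standing in for the stuck state, here with enable at bit 31 and lock at bit 28) ends up enabled and locked after one POST_RATE_CHANGE event, while any other event leaves it untouched.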