[v3,0/5] RK3588 and Rock 5B dts additions: thermal, OPP and fan

Message ID: 20240229-rk-dts-additions-v3-0-6afe8473a631@gmail.com

Message

Alexey Charkov Feb. 29, 2024, 7:26 p.m. UTC
This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
active cooling on Radxa Rock 5B via the provided PWM fan.

Some RK3588 boards use separate regulators to supply CPUs and their
respective memory interfaces, so this is handled by coupling those
regulators in affected boards' device trees to ensure that their
voltage is adjusted in step.
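
To illustrate, the coupling in a board .dts looks roughly like this
(a minimal sketch: the regulator labels are board-specific, and the
10 mV max spread is an assumption rather than necessarily the value
used in the actual patches):

&vdd_cpu_big0_s0 {
        regulator-coupled-with = <&vdd_cpu_big0_mem_s0>;
        regulator-coupled-max-spread = <10000>;
};

&vdd_cpu_big0_mem_s0 {
        regulator-coupled-with = <&vdd_cpu_big0_s0>;
        regulator-coupled-max-spread = <10000>;
};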

In this revision of the series I chose to enable TSADC for all boards
at .dtsi level, because:
 - The defaults already in .dtsi should work for all users, given that
   the CRU-based resets don't need any out-of-chip components, and
   the CRU vs. PMIC reset is pretty much the only thing a board might
   have to configure / override there
 - The boards that have the TSADC_SHUT signal wired to the PMIC reset
   line can still override the reset logic in their .dts (see the
   sketch after this list), or stay with CRU-based resets, as
   downstream kernels do anyway
 - The on-by-default approach helps ensure thermal protections are in
   place (emergency reset and throttling) for any board even with a
   rudimentary .dts, and thus lets us introduce CPU DVFS with better
   peace of mind
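
As a minimal sketch (assuming the property values documented in the
rockchip-thermal binding), a board with TSADC_SHUT wired to the PMIC
would override the CRU-based defaults like this:

&tsadc {
        /* route TSHUT through the TSADC_SHUT pin instead of the CRU */
        rockchip,hw-tshut-mode = <1>;     /* 0: CRU reset, 1: GPIO */
        rockchip,hw-tshut-polarity = <0>; /* 0: active low, 1: active high */
};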

Fan control on Rock 5B has been split into two intervals: let it spin
at the minimum cooling state between 55C and 65C, and then accelerate
if the system crosses the 65C mark - thanks to Dragan for suggesting
this. It lets cooling setups with beefier heatsinks and/or larger fan
fins stay in the quietest non-zero fan state while still gaining
potential benefits from the airflow, and possibly avoid noisy speeds
altogether for some workloads.
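
In DT terms, here is an abridged sketch of the two active trips and
their cooling maps (fan state numbers are illustrative; the full
thermal zone also carries polling delays, a sensor reference and
passive trips):

trips {
        package_fan0: package-fan0 {
                temperature = <55000>; /* millicelsius */
                hysteresis = <2000>;
                type = "active";
        };

        package_fan1: package-fan1 {
                temperature = <65000>;
                hysteresis = <2000>;
                type = "active";
        };
};

cooling-maps {
        map0 {
                trip = <&package_fan0>;
                cooling-device = <&fan THERMAL_NO_LIMIT 1>;
        };

        map1 {
                trip = <&package_fan1>;
                cooling-device = <&fan 2 THERMAL_NO_LIMIT>;
        };
};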

OPPs are what actually let the system scale CPU frequencies up and
down, both for cooling and for performance - tested on Rock 5B under
varied loads. I've split the patch into two parts: the first contains
the OPPs that reached general consensus as 'no-regret' during the v1
review [2], while the second contains OPPs that reduce frequency
without an accompanying decrease in CPU voltage. There seems to be a
slight performance gain in some workload scenarios when using the
latter, but the previous discussion was inconclusive as to whether
they should be included. Keeping them as separate patches makes
comparison and partial reversion easier for people who want to test
under their own workloads, and also lets the first, 'no-regret' part
be merged to -next while the jury is still out on the second one.
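
For reference, each per-cluster table follows the standard
operating-points-v2 shape, along these lines (the frequency and
voltages below are placeholders rather than the exact values from the
patches):

cluster0_opp_table: opp-table-cluster0 {
        compatible = "operating-points-v2";
        opp-shared;

        opp-1608000000 {
                opp-hz = /bits/ 64 <1608000000>;
                /* <target min max>, in microvolts */
                opp-microvolt = <712500 712500 950000>;
                clock-latency-ns = <40000>;
        };
};

The CPU nodes then point at their cluster's table via
operating-points-v2 = <&cluster0_opp_table>, and cpufreq-dt acts on
the cpu-supply regulator accordingly.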

[1] https://lore.kernel.org/linux-rockchip/1824717.EqSB1tO5pr@bagend/T/#ma2ab949da2235a8e759eab22155fb2bc397d8aea
[2] https://lore.kernel.org/linux-rockchip/CABjd4YxqarUCbZ-a2XLe3TWJ-qjphGkyq=wDnctnEhdoSdPPpw@mail.gmail.com/T/#m49d2b94e773f5b532a0bb5d3d7664799ff28cc2c

Signed-off-by: Alexey Charkov <alchark@gmail.com>
---
Changes in v3:
- Added regulator coupling for EVB1 and QuartzPro64
- Enabled the TSADC for all boards in .dtsi, not just Rock 5B (thanks ChenYu)
- Added comments regarding two passive cooling trips in each zone (thanks Dragan)
- Fixed active cooling map numbering for Radxa Rock 5B (thanks Dragan)
- Dropped Daniel's Acked-by tag from the Rock 5B fan patch, as there's been quite some
  churn there since the version he acknowledged
- Link to v2: https://lore.kernel.org/r/20240130-rk-dts-additions-v2-0-c6222c4c78df@gmail.com

Changes in v2:
- Dropped the rfkill patch which Heiko has already applied
- Set higher 'polling-delay-passive' (100 instead of 20)
- Name all cooling maps starting from map0 in each respective zone
- Drop 'contribution' properties from passive cooling maps
- Link to v1: https://lore.kernel.org/r/20240125-rk-dts-additions-v1-0-5879275db36f@gmail.com

---
Alexey Charkov (5):
      arm64: dts: rockchip: enable built-in thermal monitoring on RK3588
      arm64: dts: rockchip: enable automatic active cooling on Rock 5B
      arm64: dts: rockchip: Add CPU/memory regulator coupling for RK3588
      arm64: dts: rockchip: Add OPP data for CPU cores on RK3588
      arm64: dts: rockchip: Add further granularity in RK3588 CPU OPPs

 arch/arm64/boot/dts/rockchip/rk3588-evb1-v10.dts   |  12 +
 .../arm64/boot/dts/rockchip/rk3588-quartzpro64.dts |  12 +
 arch/arm64/boot/dts/rockchip/rk3588-rock-5b.dts    |  30 +-
 arch/arm64/boot/dts/rockchip/rk3588s.dtsi          | 385 ++++++++++++++++++++-
 4 files changed, 437 insertions(+), 2 deletions(-)
---
base-commit: cf1182944c7cc9f1c21a8a44e0d29abe12527412
change-id: 20240124-rk-dts-additions-a6d7b52787b9

Best regards,

Comments

Sebastian Reichel March 4, 2024, 5:50 p.m. UTC | #1
Hi,

On Thu, Feb 29, 2024 at 11:26:31PM +0400, Alexey Charkov wrote:
> This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
> active cooling on Radxa Rock 5B via the provided PWM fan.
>
> [...]

I'm too busy to do a detailed review of this series right now, but
I pushed it to our CI and it results in a board reset at boot time:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950

I also pushed just the first three patches (i.e. without OPP /
cpufreq) and that boots fine:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953

Note that OPP / cpufreq works on the same boards in the CI when
using the ugly-and-not-for-upstream cpufreq driver:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd

My best guess right now is that this is related to the generic
driver obviously not updating the GRF read margin registers.

Greetings,

-- Sebastian
Alexey Charkov March 5, 2024, 8:06 a.m. UTC | #2
Hi Sebastian!

On Mon, Mar 4, 2024 at 9:51 PM Sebastian Reichel
<sebastian.reichel@collabora.com> wrote:
>
> [...]
>
> I'm too busy to do a detailed review of this series right now, but
> I pushed it to our CI and it results in a board reset at boot time:
>
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300950
>
> I also pushed just the first three patches (i.e. without OPP /
> cpufreq) and that boots fine:
>
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/jobs/300953

Thank you for testing these! I've noticed in the boot log that the CI
machine uses some u-boot 2023.07 - is that a downstream one? Any
chance to compare it to 2023.11 or 2024.01 from your (Collabora)
integration tree?

I use 2023.11 from your integration tree, with a binary bl31, and I'm
not getting those resets even under prolonged heavy load (I rebuild
Chromium with 8 concurrent compilation jobs as the stress test -
that's 14 hours of heavy CPU, memory and IO use). It would be
interesting to understand whether it's just a 'lucky' SoC specimen on
my side, or whether some dark magic happens differently on my machine
vs. your CI machine.

I'm thinking that if your CI machine uses a downstream u-boot, it
might be leaving some extra hardware running (PVTM?) which might do
weird stuff when TSADC/clocks/voltages get readjusted by the generic
cpufreq driver...

> Note that OPP / cpufreq works on the same boards in the CI when
> using the ugly-and-not-for-upstream cpufreq driver:
>
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
>
> My best guess right now is that this is related to the generic
> driver obviously not updating the GRF read margin registers.

If it were about memory read margins, I believe my machine would have
been unlikely to work reliably under heavy load with the default
ones, but who knows...

Best regards,
Alexey
Alexey Charkov March 7, 2024, 12:38 p.m. UTC | #3
On Tue, Mar 5, 2024 at 12:06 PM Alexey Charkov <alchark@gmail.com> wrote:
>
> [...]
>
> > Note that OPP / cpufreq works on the same boards in the CI when
> > using the ugly-and-not-for-upstream cpufreq driver:
> >
> > https://gitlab.collabora.com/hardware-enablement/rockchip-3588/linux/-/commit/9c90c5032743a0419bf3fd2f914a24fd53101acd
> >
> > My best guess right now is that this is related to the generic
> > driver obviously not updating the GRF read margin registers.
>
> If it were about memory read margins, I believe my machine would
> have been unlikely to work reliably under heavy load with the
> default ones, but who knows...

Sebastian's report led me to investigate further how all those things
are organized in the downstream code and in hardware, and what could
be a pragmatic way forward with upstream enablement. Frankly, it
turned out to be quite a rabbit hole, with multiple layers of
abstraction and intertwined code in different places.

Here's a quick summary for future reference:
 - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
which provides an SCMI service to expose them to the kernel (see the
DT sketch after this list)
 - ATF itself doesn't directly set any clock frequencies. Instead, it
accepts a target frequency via SCMI and converts it into an oscillator
ring length setting for the PVPLL hardware block (via a fixed table
lookup). At least that's how it's done in the recently released TF-A
bl31 code [1] - perhaps the binary bl31 does something similar
 - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
 - PVPLL produces a reference clock to feed to the CPUs, which depends
on the configured oscillator ring length but also on the supply
voltage, silicon quality and perhaps temperature too. ATF doesn't know
anything about voltages or temperatures, so it doesn't guarantee that
the requested frequency is matched by the hardware
 - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
which case the target frequency is directly fed by the ATF to the CRU.
This happens for both big-core and little-core frequencies below 816
MHz
 - Given that requesting a particular frequency via SCMI doesn't
guarantee that it will be what the CPUs end up running at, the vendor
kernel also does a runtime voltage calibration for the supply
regulators, by adjusting the supply voltage in minimum regulator steps
until the frequency reported by PVPLL gets close to the requested one
[2]. It then overwrites the OPP-provided voltage values with the
calibrated ones
 - There's also some trickery with preselecting OPP voltage sets using
the "-Lx" suffix based on silicon quality, as measured by a "leakage"
value stored in an NVMEM cell and/or the PVTM frequency generated at a
reference "midpoint" OPP [3]. Better performing silicon gets to run at
lower default supply voltages, thus saving power
 - Once the OPPs are selected and calibrated, the only remaining
trickery is the two supply regulators for each CPU cluster (one for
the CPUs and the other for the memory interface)
 - Another catch, as Sebastian points out, is that memory read margins
must be adjusted whenever the memory interface supply voltage crosses
certain thresholds [4]. This has little to do with CPUs or
frequencies, and is only tangentially related to them due to the
dependency chain between the target CPU frequency -> required CPU
supply voltage -> matching memory interface supply voltage -> required
read margins
 - At reset the ATF switches all clocks to the lowest 408 MHz [5], so
setting it to anything in kernel code (as the downstream driver does)
seems redundant
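
As a DT-level sketch of the first point (clock ID and node layout as
in the current rk3588s.dtsi; the supply label is board-specific and
purely illustrative), the CPU nodes consume the ATF-managed clocks
through the SCMI protocol node:

cpu_l0: cpu@0 {
        device_type = "cpu";
        compatible = "arm,cortex-a55";
        reg = <0x0>;
        clocks = <&scmi_clk SCMI_CLK_CPUL>; /* served by ATF via SCMI */
        cpu-supply = <&vdd_cpu_lit_s0>;     /* acted on by cpufreq-dt */
};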

All in all, it does indeed sound like Collabora's CI machine boot-time
resets are most likely caused by the missing memory read margin
settings in my patch series. Voltage values in the OPPs I used are the
most conservative defaults of what the downstream DT has, and PVPLL
should be able to generate reasonable clock speeds with those (albeit
likely suboptimal, due to them not being tuned to the particular
silicon specimen). And frankly, there is little else that could differ.

As for the way forward, it would be great to hear opinions from the
list. My thinking is as follows:
 - I can introduce memory read margin updates as the first priority,
leaving voltage calibration and/or OPP preselection for later (as
those should not affect system stability at current default values,
perhaps only power efficiency to a certain extent)
 - CPUfreq doesn't sound like the right place for those, given that
they have little to do with either CPU or freq :)
 - I suggest a custom regulator config helper to plug into the OPP
layer, as is done for TI OMAP5 [6]. At first it might only be used
to look up and set the correct memory read margin value whenever the
cluster supply voltage changes, and later the same code can be
extended to do voltage calibration. In fact, the OMAP code is there
for a very similar purpose, but in their case optimized voltages are
pre-programmed in efuses and don't require runtime recalibration
 - Given that all OPPs in the downstream kernel list identical
voltages for the memory supply and the CPU supply, I don't think it
makes much sense to customize the cpufreq driver per se. A
single-regulator approach with the generic cpufreq-dt driver and
regulator coupling sounds much less invasive and thus
lower-maintenance

Best regards,
Alexey

[1] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L303
[2] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L804
[3] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L1575
[4] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/cpufreq/rockchip-cpufreq.c#L405
[5] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L2419
[6] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/opp/ti-opp-supply.c#n275
Dragan Simic March 7, 2024, 2:21 p.m. UTC | #4
Hello Alexey,

On 2024-03-07 13:38, Alexey Charkov wrote:
> [...]

Thank you very much for a detailed and highly useful summary!

I'll retrace your steps into and, hopefully, out of the rabbit hole. :)
After that, I'll come back with an update.

Sebastian Reichel March 7, 2024, 10:16 p.m. UTC | #5
Hi,

On Thu, Mar 07, 2024 at 04:38:24PM +0400, Alexey Charkov wrote:
> [...]
>
> > Thank you for testing these! I've noticed in the boot log that the CI
> > machine uses some u-boot 2023.07 - is that a downstream one? Any
> > chance to compare it to 2023.11 or 2024.01 from your (Collabora)
> > integration tree?
>
> [...]

Sorry for my late response.

When doing some more tests I noticed that the CI never built the
custom driver and thus never did any CPU frequency scaling at all.
I only used it for my own tests (on RK3588 EVB1). With the custom
driver enabled, the CI has the same issues as with your series. So my
message was completely wrong, sorry about that.

Regarding U-Boot: The CI uses "U-Boot SPL 2023.07-rc4-g46349e27";
the last part is the git hash. This is the exact U-Boot source tree
being used:

https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot/-/commits/46349e27/

This was one of the first U-Boot trees with Rock 5B Ethernet support
and is currently flashed to the SPI flash memory of the CI boards.
The vendor U-Boot tree is a lot older. Also, it is still using the
Rockchip binary BL31. We have plans to CI boot test U-Boot as well,
but currently nobody has time to work on this. I don't think there
are any relevant changes between upstream 2023.07 and 2023.11 that
could explain this, but it's the best lead for now, so I will try to
find some time for further tests in the next few days.

Regarding the voltage calibration: one option would be to do this
calibration at boot time (i.e. in U-Boot) and update the voltages
in DT accordingly.

Greetings,

-- Sebastian

> Best regards,
> Alexey
> 
> [1] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L303
> [2] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L804
> [3] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L1575
> [4] https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/cpufreq/rockchip-cpufreq.c#L405
> [5] https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L2419
> [6] https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/opp/ti-opp-supply.c#n275
Dragan Simic March 11, 2024, 7:08 a.m. UTC | #6
Hello Alexey,

On 2024-03-07 15:21, Dragan Simic wrote:
> On 2024-03-07 13:38, Alexey Charkov wrote:
>> [...]
>> 
>> Sebastian's report led me to investigate further how all those things
>> are organized in the downstream code and in hardware, and what could
>> be a pragmatic way forward with upstream enablement. It turned out to
>> be quite a rabbit hole frankly, with multiple layers of abstraction
>> and intertwined code in different places.
>> 
>> Here's a quick summary for future reference:
>>  - CPU clocks on RK3588 are ultimately managed by the ATF firmware,
>> which provides an SCMI service to expose them to the kernel
>>  - ATF itself doesn't directly set any clock frequencies. Instead, it
>> accepts a target frequency via SCMI and converts it into an oscillator
>> ring length setting for the PVPLL hardware block (via a fixed table
>> lookup). At least that's how it's done in the recently released TF-A
>> bl31 code [1] - perhaps the binary bl31 does something similar
>>  - U-boot doesn't seem to mess with CPU clocks, PVTM or PVPLL
>>  - PVPLL produces a reference clock to feed to the CPUs, which depends
>> on the configured oscillator ring length but also on the supply
>> voltage, silicon quality and perhaps temperature too. ATF doesn't know
>> anything about voltages or temperatures, so it doesn't guarantee that
>> the requested frequency is matched by the hardware
>>  - PVPLL frequency generation is bypassed for lower-frequency OPPs, in
>> which case the target frequency is directly fed by the ATF to the CRU.
>> This happens for both big-core and little-core frequencies below 816
>> MHz
>>  - Given that requesting a particular frequency via SCMI doesn't
>> guarantee that it will be what the CPUs end up running at, the vendor
>> kernel also does a runtime voltage calibration for the supply
>> regulators, by adjusting the supply voltage in minimum regulator steps
>> until the frequency reported by PVPLL gets close to the requested one
>> [2]. It then overwrites OPP provided voltage values with the
>> calibrated ones
>>  - There's also some trickery with preselecting OPP voltage sets using
>> the "-Lx" suffix based on silicon quality, as measured by a "leakage"
>> value stored in an NVMEM cell and/or the PVTM frequency generated at a
>> reference "midpoint" OPP [3]. Better performing silicon gets to run at
>> lower default supply voltages, thus saving power
>>  - Once the OPPs are selected and calibrated, the only remaining
>> trickery is the two supply regulators per each CPU cluster (one for
>> the CPUs and the other for the memory interface)
>>  - Another catch, as Sebastian points out, is that memory read margins
>> must be adjusted whenever the memory interface supply voltage crosses
>> certain thresholds [4]. This has little to do with CPUs or
>> frequencies, and is only tangentially related to them due to the
>> dependency chain between the target CPU frequency -> required CPU
>> supply voltage -> matching memory interface supply voltage -> required
>> read margins
>>  - At reset the ATF switches all clocks to the lowest 408 MHz [5], so
>> setting it to anything in kernel code (as the downstream driver does)
>> seems redundant
>> 
>> All in all, it does indeed sound like Collabora's CI machine boot-time
>> resets are most likely caused by the missing memory read margin
>> settings in my patch series. Voltage values in the OPPs I used are the
>> most conservative defaults of what the downstream DT has, and PVPLL
>> should be able to generate reasonable clock speeds with those (albeit
>> likely suboptimal, due to them not being tuned to the particular
>> silicon specimen). And there is little else to differ frankly.
>> 
>> As for the way forward, it would be great to know the opinions from
>> the list. My thinking is as follows:
>>  - I can introduce memory read margin updates as the first priority,
>> leaving voltage calibration and/or OPP preselection for later (as
>> those should not affect system stability at current default values,
>> perhaps only power efficiency to a certain extent)
>>  - CPUfreq doesn't sound like the right place for those, given that
>> they have little to do with either CPU or freq :)
>>  - I suggest a custom regulator config helper to plug into the OPP
>> layer, as is done for TI OMAP5 [6]. At first, it might be only used
>> for looking up and setting the correct memory read margin value
>> whenever the cluster supply voltage changes, and later the same code
>> can be extended to do voltage calibration. In fact, OMAP code is there
>> for a very similar purpose, but in their case optimized voltages are
>> pre-programmed in efuses and don't require runtime recalibration
>>  - Given that all OPPs in the downstream kernel list identical
>> voltages for the memory supply as for the CPU supply, I don't think it
>> makes much sense to customize the cpufreq driver per se.
>> Single-regulator approach with the generic cpufreq-dt and regulator
>> coupling sounds much less invasive and thus lower-maintenance
> 
> Thank you very much for a detailed and highly useful summary!
> 
> I'll retrace your steps into and, hopefully, out of the rabbit hole. :)
> After that, I'll come back with an update.

Just a brief update...  I went even a bit deeper into multiple rabbit
holes, :) and I'll come back with a detailed update a bit later,
together with a proposal for the plan to move forward.  The final
outcome should be awesome. :)

>> [1] 
>> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L303
>> [2] 
>> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L804
>> [3] 
>> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/soc/rockchip/rockchip_opp_select.c#L1575
>> [4] 
>> https://github.com/radxa/kernel/blob/c428536281d69aeb2b3480f65b2b227210b61535/drivers/cpufreq/rockchip-cpufreq.c#L405
>> [5] 
>> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/trusted-firmware-a/-/blob/rk3588/plat/rockchip/rk3588/drivers/scmi/rk3588_clk.c?ref_type=heads#L2419
>> [6] 
>> https://git.kernel.org/pub/scm/linux/kernel/git/torvalds/linux.git/tree/drivers/opp/ti-opp-supply.c#n275
Sebastian Reichel March 13, 2024, 4:39 p.m. UTC | #7
Hi,

On Thu, Mar 07, 2024 at 11:16:20PM +0100, Sebastian Reichel wrote:
> [...]
> 
> Regarding U-Boot: The CI uses "U-Boot SPL 2023.07-rc4-g46349e27";
> the last part is the git hash. This is the exact U-Boot source tree
> being used:
> 
> https://gitlab.collabora.com/hardware-enablement/rockchip-3588/u-boot/-/commits/46349e27/
> 
> This was one of the first U-Boot trees with Rock 5B Ethernet support
> and is currently flashed to the SPI flash memory of the CI boards.
> The vendor U-Boot tree is a lot older and still uses the Rockchip
> binary BL31. We have plans to boot-test U-Boot in CI as well, but
> currently nobody has time to work on this. I don't think there are
> any relevant changes between upstream 2023.07 and 2023.11 that could
> explain this, but it's the best lead for now, so I will try to find
> some time for further tests in the next few days.
> 
> Regarding the voltage calibration: one option would be to do this
> calibration at boot time (i.e. in U-Boot) and update the voltages
> in the DT accordingly.

After some more debugging I finally found the root cause. The CI
boards were powered from a USB hub using a USB-A to USB-C cable, so
that the team could access maskrom mode. Since I was not involved in
setting them up, I was not aware of that. It effectively limits the
power draw to 500 or 900 mA (depending on the USB port implementation),
which is not enough to power the board at the higher CPU frequencies.
The KernelCI Rock 5B boards have now been switched to proper power
supplies and the issues are gone.
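
[For scale: 500 mA and 900 mA at 5 V work out to only 2.5 W and 4.5 W
respectively, which is far below what the board draws with all eight
CPU cores running at their top OPPs.]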

Sorry for the false alarm,

-- Sebastian
Dragan Simic March 13, 2024, 4:44 p.m. UTC | #8
Hello Sebastian,

On 2024-03-13 17:39, Sebastian Reichel wrote:
> [...]
> 
> After some more debugging I finally found the root cause. The CI
> boards were powered from a USB hub using a USB-A to USB-C cable, so
> that the team could access maskrom mode. Since I was not involved in
> setting them up, I was not aware of that. It effectively limits the
> power draw to 500 or 900 mA (depending on the USB port implementation),
> which is not enough to power the board at the higher CPU frequencies.
> The KernelCI Rock 5B boards have now been switched to proper power
> supplies and the issues are gone.
> 
> Sorry for the false alarm,

Great to know, thanks for the clarification.
Diederik de Haas April 10, 2024, 9:19 a.m. UTC | #9
On Thursday, 29 February 2024 20:26:31 CEST Alexey Charkov wrote:
> This enables thermal monitoring and CPU DVFS on RK3588(s), as well as
> active cooling on Radxa Rock 5B via the provided PWM fan.
> 
> [...]

Can you rebase this patch set on Heiko's for-next branch [1]?
And then also fix the ordering of the nodes and the elements within
those nodes so that they match the current conventions?

Cheers,
  Diederik

[1] https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/log/?h=for-next
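
[The ordering conventions referred to here are the ones codified in
Documentation/devicetree/bindings/dts-coding-style.rst: 'compatible'
first, then 'reg', then the remaining properties roughly alphabetically,
with 'status' last, and nodes sorted by name and unit address. A
hypothetical sketch - the property values are placeholders, not copied
from the actual rk3588s.dtsi:

    tsadc: tsadc@fec00000 {
        compatible = "rockchip,rk3588-tsadc";
        reg = <0x0 0xfec00000 0x0 0x400>;
        clocks = <&cru CLK_TSADC>, <&cru PCLK_TSADC>;
        clock-names = "tsadc", "apb_pclk";
        #thermal-sensor-cells = <1>;
        status = "okay";
    };
]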
Dragan Simic April 10, 2024, 9:28 a.m. UTC | #10
Hello Diederik,

On 2024-04-10 11:19, Diederik de Haas wrote:
> On Thursday, 29 February 2024 20:26:31 CEST Alexey Charkov wrote:
>> [...]
> 
> Can you rebase this patch set on Heiko's for-next branch [1]?
> And then also fix the ordering of the nodes and the elements within
> those nodes so that they match the current conventions?

Ah, thanks, this is a good reminder about the proposal for the plan
for moving forward, which I promised to send a while ago. :)

> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/log/?h=for-next
Diederik de Haas April 20, 2024, 5:53 p.m. UTC | #11
Hi Dragan and Alexey,

On Wednesday, 10 April 2024 11:28:09 CEST Dragan Simic wrote:
> > Can you rebase this patch set on Heiko's for-next branch [1]?
> > And then also fix the ordering of the nodes and the elements within
> > those nodes so that they match the current conventions?
> 
> Ah, thanks, this is a good reminder about the proposal for the plan
> for moving forward, which I promised to send a while ago. :)
> 
> > [1]
> > https://git.kernel.org/pub/scm/linux/kernel/git/mmind/linux-rockchip.git/l
> > og/?h=for-next

I built a (Debian) kernel based on 6.9-rc4 plus a whole bunch of patches,
including this patch series. I got someone on #debian-arm to try it out on a
Rock 5B, and the dmesg output showed a number of items regarding thermal and OPP.

Some items that I filtered out from that dmesg output:

[    3.211716] hwmon hwmon0: temp1_input not attached to any thermal zone
[    3.908339] panthor fb000000.gpu: EM: OPP:900000 is inefficient
[   10.473061] cpu cpu0: EM: OPP:600000 is inefficient
[   10.473233] energy_model: Accessing cpu4 policy failed
[   10.585236] rockchip-thermal fec00000.tsadc: Missing rockchip,grf property

Attached is the full list of items I collected from that dmesg output which 
seem worth investigating. 

Maybe useful to investigate when moving forward?

Cheers,
  Diederik
Radxa Rock 5B issues (dmesg on https://termbin.com/e170):

[    2.863508] rockchip-spi feb20000.spi: Runtime PM usage count underflow!

[    2.915650] rtc-hym8563 6-0051: no valid clock/calendar values available
[    2.915949] rtc-hym8563 6-0051: registered as rtc0
[    2.916945] rtc-hym8563 6-0051: no valid clock/calendar values available
[    2.916950] rtc-hym8563 6-0051: hctosys: unable to read the hardware clock

[    2.994580] rockchip-drm display-subsystem: [drm:rockchip_drm_platform_probe [rockchipdrm]] *ERROR* No available vop found for display-subsystem.
[    3.101020] rockchip-dw-pcie a40000000.pcie: PCIe Gen.3 x4 link up
[    3.101308] rockchip-dw-pcie a40000000.pcie: PCI host bridge to bus 0000:00
[    3.101316] pci_bus 0000:00: root bus resource [bus 00-0f]
[    3.101324] pci_bus 0000:00: root bus resource [io  0x0000-0xfffff] (bus address [0xf0100000-0xf01fffff])
[    3.101329] pci_bus 0000:00: root bus resource [mem 0xf0200000-0xf0ffffff]
[    3.101335] pci_bus 0000:00: root bus resource [mem 0x900000000-0x93fffffff] (bus address [0x40000000-0x7fffffff])
[    3.101357] pci 0000:00:00.0: [1d87:3588] type 01 class 0x060400 PCIe Root Port
[    3.101369] pci 0000:00:00.0: BAR 0 [mem 0x00000000-0x3fffffff]
[    3.101377] pci 0000:00:00.0: BAR 1 [mem 0x00000000-0x3fffffff]
[    3.101384] pci 0000:00:00.0: ROM [mem 0x00000000-0x0000ffff pref]
[    3.101389] pci 0000:00:00.0: PCI bridge to [bus 01-ff]
[    3.101396] pci 0000:00:00.0:   bridge window [io  0x0000-0x0fff]
[    3.101402] pci 0000:00:00.0:   bridge window [mem 0x00000000-0x000fffff]
[    3.101410] pci 0000:00:00.0:   bridge window [mem 0x00000000-0x000fffff 64bit pref]
[    3.101452] pci 0000:00:00.0: supports D1 D2
[    3.101457] pci 0000:00:00.0: PME# supported from D0 D1 D3hot
[    3.103848] pci_bus 0000:01: busn_res: can not insert [bus 01-ff] under [bus 00-0f] (conflicts with (null) [bus 00-0f])
[    3.103921] pci 0000:01:00.0: [144d:a809] type 00 class 0x010802 PCIe Endpoint
[    3.103987] pci 0000:01:00.0: BAR 0 [mem 0x00000000-0x00003fff 64bit]
[    3.116969] pci 0000:00:00.0: BAR 0 [mem 0x900000000-0x93fffffff]: assigned
[    3.116983] pci 0000:00:00.0: BAR 1 [mem size 0x40000000]: can't assign; no space
[    3.116990] pci 0000:00:00.0: BAR 1 [mem size 0x40000000]: failed to assign
[    3.116995] pci 0000:00:00.0: bridge window [mem 0xf0200000-0xf02fffff]: assigned
[    3.117001] pci 0000:00:00.0: ROM [mem 0xf0300000-0xf030ffff pref]: assigned
[    3.117009] pci 0000:01:00.0: BAR 0 [mem 0xf0200000-0xf0203fff 64bit]: assigned
[    3.117042] pci 0000:00:00.0: PCI bridge to [bus 01-ff]
[    3.117048] pci 0000:00:00.0:   bridge window [mem 0xf0200000-0xf02fffff]

[    3.211716] hwmon hwmon0: temp1_input not attached to any thermal zone
[    3.211775] hwmon hwmon0: temp2_input not attached to any thermal zone
[    3.211823] hwmon hwmon0: temp3_input not attached to any thermal zone

[    3.327532] pci 0002:20:00.0: bridge configuration invalid ([bus 01-ff]), reconfiguring

[    3.908339] panthor fb000000.gpu: EM: OPP:900000 is inefficient
[    3.908349] panthor fb000000.gpu: EM: OPP:800000 is inefficient
[    3.908354] panthor fb000000.gpu: EM: OPP:700000 is inefficient
[    3.908358] panthor fb000000.gpu: EM: OPP:600000 is inefficient
[    3.908361] panthor fb000000.gpu: EM: OPP:500000 is inefficient
[    3.908365] panthor fb000000.gpu: EM: OPP:400000 is inefficient
[    3.908369] panthor fb000000.gpu: EM: OPP:300000 is inefficient

[    3.920149] [drm] Initialized panthor 1.0.0 20230801 for fb000000.gpu on minor 0

[    4.483910] dwmmc_rockchip fe2d0000.mmc: Busy; trying anyway
[    5.213383] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
[    5.228057] mmc_host mmc2: Bus speed (slot 0) = 300000Hz (slot req 300000Hz, actual 300000HZ div = 0)

[    7.072323] dwmmc_rockchip fe2d0000.mmc: Busy; trying anyway
[    7.801690] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
[    7.816350] mmc_host mmc2: Bus speed (slot 0) = 187500Hz (slot req 187500Hz, actual 187500HZ div = 0)
[    8.369904] dwmmc_rockchip fe2d0000.mmc: Busy; trying anyway
[    9.099376] mmc_host mmc2: Timeout sending command (cmd 0x202000 arg 0x0 status 0x80202000)
[    9.101395] mmc2: Failed to initialize a non-removable card

[   10.473061] cpu cpu0: EM: OPP:600000 is inefficient
[   10.473075] cpu cpu0: EM: OPP:408000 is inefficient
[   10.473227] cpu cpu0: EM: created perf domain
[   10.473233] energy_model: Accessing cpu4 policy failed
[   10.483901] cpu cpu4: EM: OPP:2352000 is inefficient
[   10.483914] cpu cpu4: EM: OPP:2304000 is inefficient
[   10.483918] cpu cpu4: EM: OPP:2256000 is inefficient
[   10.483923] cpu cpu4: EM: OPP:1008000 is inefficient
[   10.483926] cpu cpu4: EM: OPP:816000 is inefficient
[   10.483930] cpu cpu4: EM: OPP:600000 is inefficient
[   10.483934] cpu cpu4: EM: OPP:408000 is inefficient
[   10.484193] cpu cpu4: EM: created perf domain
[   10.484199] energy_model: Accessing cpu6 policy failed
[   10.488907] cpu cpu6: EM: OPP:2352000 is inefficient
[   10.488917] cpu cpu6: EM: OPP:2304000 is inefficient
[   10.488921] cpu cpu6: EM: OPP:2256000 is inefficient
[   10.488925] cpu cpu6: EM: OPP:1008000 is inefficient
[   10.488928] cpu cpu6: EM: OPP:816000 is inefficient
[   10.488932] cpu cpu6: EM: OPP:600000 is inefficient
[   10.488936] cpu cpu6: EM: OPP:408000 is inefficient
[   10.489175] cpu cpu6: EM: created perf domain
[   10.489179] energy_model: updating cpu0 cpu_cap=397 old capacity=530
[   10.489186] cpu cpu0: EM: OPP:816000 is inefficient
[   10.489190] cpu cpu0: EM: OPP:600000 is inefficient
[   10.489194] cpu cpu0: EM: OPP:408000 is inefficient
[   10.585236] rockchip-thermal fec00000.tsadc: Missing rockchip,grf property
Dragan Simic April 21, 2024, 4:07 p.m. UTC | #12
Hello Diederik,

On 2024-04-20 19:53, Diederik de Haas wrote:
> On Wednesday, 10 April 2024 11:28:09 CEST Dragan Simic wrote:
>> [...]
> 
> [...]
> 
> Some items that I filtered out from that dmesg output:
> 
> [    3.211716] hwmon hwmon0: temp1_input not attached to any thermal 
> zone
> [    3.908339] panthor fb000000.gpu: EM: OPP:900000 is inefficient
> [   10.473061] cpu cpu0: EM: OPP:600000 is inefficient
> [   10.473233] energy_model: Accessing cpu4 policy failed
> [   10.585236] rockchip-thermal fec00000.tsadc: Missing rockchip,grf 
> property
> 
> Attached is the full list of items I collected from that dmesg output 
> which
> seem worth investigating.
> 
> Maybe useful to investigate when moving forward?

This is a nice report, thanks!

I'm not sure what's going on with the mmc2 issues.  Regarding the
hym8563, hwmon, energy_model and rockchip-spi issues, I'll have a look
into them and come back with an update.

Regarding the multiple "OPP:<frequency> is inefficient" warnings, it's
already on my TO-DO list to perform detailed (and repeatable) testing.
My suspicion is that declaring the OPPs inefficient actually isn't
warranted, but we'll see what the test results show.
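
[On the "inefficient" OPP warnings: the energy model flags an OPP when a
higher-frequency OPP is at least as cheap per unit of work, which is
expected when several OPPs share the same voltage - at equal voltage the
energy per cycle is roughly the same, so running slower only adds leakage
time. A sketch of such a pair, with assumed values rather than the exact
ones from the series:

    cluster0_opp_table: opp-table-cluster0 {
        compatible = "operating-points-v2";
        opp-shared;

        opp-600000000 {
            /* likely flagged inefficient: same voltage as below */
            opp-hz = /bits/ 64 <600000000>;
            opp-microvolt = <675000 675000 950000>;
        };

        opp-816000000 {
            /* same voltage, higher frequency */
            opp-hz = /bits/ 64 <816000000>;
            opp-microvolt = <675000 675000 950000>;
        };
    };
]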