arm64: dts: mediatek: mt8195: Set DSU PMU status to fail

Message ID 20230720200753.322133-1-nfraprado@collabora.com (mailing list archive)
State New, archived

Commit Message

Nícolas F. R. A. Prado July 20, 2023, 8:07 p.m. UTC
The DSU PMU allows monitoring performance events in the DSU cluster,
which is done by configuring and reading back values from the DSU PMU
system registers. However, for write-access to be allowed by ELs lower
than EL3, the EL3 firmware needs to update the setting on the ACTLR3_EL3
register, as it is disallowed by default.

That configuration is not done on the firmware used by the MT8195 SoC,
as a consequence, booting a MT8195-based machine like
mt8195-cherry-tomato-r2 with CONFIG_ARM_DSU_PMU enabled hangs the kernel
just as it writes to the CLUSTERPMOVSCLR_EL1 register, since the
instruction faults to EL3, and BL31 apparently just re-runs the
instruction over and over.

Mark the DSU PMU node in the Devicetree with status "fail", as the
machine doesn't have a suitable firmware to make use of it from the
kernel, and allowing its driver to probe would hang the kernel.

Fixes: 37f2582883be ("arm64: dts: Add mediatek SoC mt8195 and evaluation board")
Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>

---

 arch/arm64/boot/dts/mediatek/mt8195.dtsi | 1 +
 1 file changed, 1 insertion(+)

Comments

AngeloGioacchino Del Regno July 21, 2023, 8:16 a.m. UTC | #1
On 20/07/23 22:07, Nícolas F. R. A. Prado wrote:
> The DSU PMU allows monitoring performance events in the DSU cluster,
> which is done by configuring and reading back values from the DSU PMU
> system registers. However, for write-access to be allowed by ELs lower
> than EL3, the EL3 firmware needs to update the setting on the ACTLR3_EL3
> register, as it is disallowed by default.

Typo: ACTLR_EL2, ACTLR_EL3 bit 12 must be set if SCR.NS is 1;
ACTLR_EL3 bit 12 must be set if SCR.NS is 0.

On MT8195 Chromebooks, SCR.NS is 1 - hence ACTLR_EL2/EL3 must have BIT(12) set,
but at least ACTLR_EL2 doesn't have it set.

I haven't verified EL3, but that doesn't matter, since both need to be set.

> 
> That configuration is not done on the firmware used by the MT8195 SoC,
> as a consequence, booting a MT8195-based machine like
> mt8195-cherry-tomato-r2 with CONFIG_ARM_DSU_PMU enabled hangs the kernel
> just as it writes to the CLUSTERPMOVSCLR_EL1 register, since the
> instruction faults to EL3, and BL31 apparently just re-runs the
> instruction over and over.

...at least for this SoC, TF-A's BL31 fault handler loops over the same
instruction forever, hanging the AP...
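
A toy model of the hang described above, assuming (as the commit message
only suspects) that the EL3 handler returns to the faulting instruction
without advancing past it. The function, the enum and the step limit are
hypothetical and exist only for illustration; this is not TF-A code.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical handler policies for a trapped instruction. */
enum trap_policy {
	TRAP_RETRY,	/* return to the same instruction */
	TRAP_SKIP,	/* return to the next instruction */
};

/*
 * Runs a toy program of `len` instructions in which the instruction at
 * index `faulting` always traps. Returns true if the program completes
 * within `max_steps`, false if it is still stuck at that point.
 */
static bool run_finishes(int len, int faulting, enum trap_policy policy,
			 int max_steps)
{
	int pc = 0;

	assert(len > 0 && faulting >= 0 && faulting < len);

	for (int step = 0; step < max_steps; step++) {
		if (pc >= len)
			return true;
		if (pc == faulting) {
			if (policy == TRAP_SKIP)
				pc++;
			/* TRAP_RETRY leaves pc unchanged, so it traps again. */
			continue;
		}
		pc++;
	}
	return pc >= len;
}
```

With TRAP_RETRY the program never gets past the faulting index no matter
how large max_steps is, which would match the observed symptom of the CPU
never returning to the kernel.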

Regards,
Angelo

> 
> Mark the DSU PMU node in the Devicetree with status "fail", as the
> machine doesn't have a suitable firmware to make use of it from the
> kernel, and allowing its driver to probe would hang the kernel.
> 
> Fixes: 37f2582883be ("arm64: dts: Add mediatek SoC mt8195 and evaluation board")
> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
> 
> ---
> 
>   arch/arm64/boot/dts/mediatek/mt8195.dtsi | 1 +
>   1 file changed, 1 insertion(+)
> 
> diff --git a/arch/arm64/boot/dts/mediatek/mt8195.dtsi b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
> index 5c670fce1e47..0705d9c3a6a7 100644
> --- a/arch/arm64/boot/dts/mediatek/mt8195.dtsi
> +++ b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
> @@ -313,6 +313,7 @@ dsu-pmu {
>   		interrupts = <GIC_SPI 18 IRQ_TYPE_LEVEL_HIGH 0>;
>   		cpus = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>,
>   		       <&cpu4>, <&cpu5>, <&cpu6>, <&cpu7>;
> +		status = "fail";
>   	};
>   
>   	dmic_codec: dmic-codec {

Nícolas F. R. A. Prado July 21, 2023, 3:54 p.m. UTC | #2
On Fri, Jul 21, 2023 at 10:16:44AM +0200, AngeloGioacchino Del Regno wrote:
> On 20/07/23 22:07, Nícolas F. R. A. Prado wrote:
> > The DSU PMU allows monitoring performance events in the DSU cluster,
> > which is done by configuring and reading back values from the DSU PMU
> > system registers. However, for write-access to be allowed by ELs lower
> > than EL3, the EL3 firmware needs to update the setting on the ACTLR3_EL3
> > register, as it is disallowed by default.
> 
> Typo: ACTLR_EL2, ACTLR_EL3 bit 12 must be set if SCR.NS is 1;
> ACTLR_EL3 bit 12 must be set if SCR.NS is 0.
> 
> On MT8195 Chromebooks, SCR.NS is 1 - hence ACTLR_EL2/EL3 must have BIT(12) set,
> but at least ACTLR_EL2 doesn't have it set.
> 
> I haven't verified EL3, but that doesn't matter, since both need to be set.

The kernel is running at EL2 (as I verified from CurrentEL), so only ACTLR_EL3
needs to be set. ACTLR_EL2 controls whether EL1 can write to the register (in
non-secure mode) [1], which doesn't matter in this case.

[1] https://developer.arm.com/documentation/101430/r1p2/Register-descriptions/AArch64-system-registers/ACTLR-EL2--Auxiliary-Control-Register--EL2
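
A minimal sketch of the access rule as stated in this reply, for the
Non-secure state only. It assumes a simplified model in which ACTLR_EL3
bit 12 gates writes from EL2 and EL1, and ACTLR_EL2 bit 12 additionally
gates writes from EL1. The macro and function names are hypothetical, and
the model is not a substitute for the register descriptions in [1].

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Bit 12 of ACTLR_EL3 / ACTLR_EL2, as discussed in this thread. */
#define ACTLR_CLUSTERPMUEN	(UINT64_C(1) << 12)

/*
 * Simplified model (an assumption, see the text above): returns true if a
 * write to a CLUSTERPM* register from exception level `el` in the
 * Non-secure state is permitted rather than trapped.
 */
static bool dsu_pmu_write_allowed(int el, uint64_t actlr_el3,
				  uint64_t actlr_el2)
{
	assert(el >= 1 && el <= 3);

	if (el == 3)
		return true;	/* EL3 itself is never gated */
	if (!(actlr_el3 & ACTLR_CLUSTERPMUEN))
		return false;	/* EL3 has not delegated access */
	if (el == 2)
		return true;	/* ACTLR_EL2 is irrelevant at EL2 */

	/* EL1 also needs the EL2 bit. */
	return (actlr_el2 & ACTLR_CLUSTERPMUEN) != 0;
}
```

Under this model, a kernel running at EL2 only depends on the ACTLR_EL3
bit: dsu_pmu_write_allowed(2, ACTLR_CLUSTERPMUEN, 0) is true, while
dsu_pmu_write_allowed(2, 0, ACTLR_CLUSTERPMUEN) is false.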

Thanks,
Nícolas

Nícolas F. R. A. Prado Aug. 10, 2023, 10:12 p.m. UTC | #3
On Thu, Jul 20, 2023 at 04:07:51PM -0400, Nícolas F. R. A. Prado wrote:
> The DSU PMU allows monitoring performance events in the DSU cluster,
> which is done by configuring and reading back values from the DSU PMU
> system registers. However, for write-access to be allowed by ELs lower
> than EL3, the EL3 firmware needs to update the setting on the ACTLR3_EL3
> register, as it is disallowed by default.
> 
> That configuration is not done on the firmware used by the MT8195 SoC,
> as a consequence, booting a MT8195-based machine like
> mt8195-cherry-tomato-r2 with CONFIG_ARM_DSU_PMU enabled hangs the kernel
> just as it writes to the CLUSTERPMOVSCLR_EL1 register, since the
> instruction faults to EL3, and BL31 apparently just re-runs the
> instruction over and over.
> 
> Mark the DSU PMU node in the Devicetree with status "fail", as the
> machine doesn't have a suitable firmware to make use of it from the
> kernel, and allowing its driver to probe would hang the kernel.
> 
> Fixes: 37f2582883be ("arm64: dts: Add mediatek SoC mt8195 and evaluation board")
> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>

Hi Matthias,

gentle ping on this patch, as it's not possible to boot MT8195 Chromebooks with
the mainline defconfig without this fix.

Thanks,
Nícolas

Macpaul Lin Aug. 24, 2023, 11:58 a.m. UTC | #4
On 8/11/23 06:12, Nícolas F. R. A. Prado wrote:
> On Thu, Jul 20, 2023 at 04:07:51PM -0400, Nícolas F. R. A. Prado wrote:
>> The DSU PMU allows monitoring performance events in the DSU cluster,
>> which is done by configuring and reading back values from the DSU PMU
>> system registers. However, for write-access to be allowed by ELs lower
>> than EL3, the EL3 firmware needs to update the setting on the ACTLR3_EL3
>> register, as it is disallowed by default.
>>
>> That configuration is not done on the firmware used by the MT8195 SoC,
>> as a consequence, booting a MT8195-based machine like
>> mt8195-cherry-tomato-r2 with CONFIG_ARM_DSU_PMU enabled hangs the kernel
>> just as it writes to the CLUSTERPMOVSCLR_EL1 register, since the
>> instruction faults to EL3, and BL31 apparently just re-runs the
>> instruction over and over.
>>
>> Mark the DSU PMU node in the Devicetree with status "fail", as the
>> machine doesn't have a suitable firmware to make use of it from the
>> kernel, and allowing its driver to probe would hang the kernel.
>>
>> Fixes: 37f2582883be ("arm64: dts: Add mediatek SoC mt8195 and evaluation board")
>> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
> 
> Hi Matthias,
> 
> gentle ping on this patch, as it's not possible to boot MT8195 Chromebooks with
> the mainline defconfig without this fix.

I have been hitting this issue ever since CONFIG_ARM_DSU_PMU was enabled
by this commit:
075ed7b9e408 arm64: configs: Enable all PMUs provided by Arm

I'm working on the mt8195-demo board, which I guess has the same issue
as mt8195-cherry-tomato-r2. Here's the log.

[   22.996825] rcu: INFO: rcu_preempt detected stalls on CPUs/tasks:
[   22.997609] rcu:     (detected by 1, t=5254 jiffies, g=-603, q=3023 ncpus=8)
[   22.998468] rcu: All QSes seen, last rcu_preempt kthread activity 5252 (4294898045-4294892793), jiffies_till_next_fqs=1, root ->qsmask 0x0
[   23.000036] rcu: rcu_preempt kthread timer wakeup didn't happen for 5251 jiffies! g-603 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200
[   23.001462] rcu:     Possible timer handling issue on cpu=2 timer-softirq=47
[   23.002319] rcu: rcu_preempt kthread starved for 5252 jiffies! g-603 f0x2 RCU_GP_WAIT_FQS(5) ->state=0x200 ->cpu=2
[   23.003625] rcu:     Unless rcu_preempt kthread gets sufficient CPU time, OOM is now expected behavior.
[   23.004776] rcu: RCU grace-period kthread stack dump:
[   23.005414] task:rcu_preempt     state:R stack:0     pid:15    ppid:2      flags:0x00000008
[   23.006474] Call trace:
[   23.006788]  __switch_to+0xe4/0x15c
[   23.007240]  __schedule+0x2bc/0xaa0
[   23.007685]  schedule+0x5c/0xc4
[   23.008087]  schedule_timeout+0x80/0xf4
[   23.008578]  rcu_gp_fqs_loop+0x124/0x3d4
[   23.009081]  rcu_gp_kthread+0x124/0x160
[   23.009571]  kthread+0x118/0x11c
[   23.009985]  ret_from_fork+0x10/0x20

I have a workaround that enables the DSU PMU in firmware
(trusted-firmware-a), which solves this hang.
However, I don't think this is the correct place in trusted-firmware-a
to put the code that enables the DSU PMU.

--- a/include/arch/aarch64/el3_common_macros.S
+++ b/include/arch/aarch64/el3_common_macros.S
@@ -39,6 +39,19 @@
         msr     sctlr_el3, x0
         isb

+       /* enable DSU PMU */
+       mov     x1, #(1 << 12)
+       mrs     x0, actlr_el3
+       orr     x0, x0, x1
+       msr     actlr_el3, x0
+       isb
+
+       mov     x1, #(1 << 12)
+       mrs     x0, actlr_el2
+       orr     x0, x0, x1
+       msr     actlr_el2, x0
+       isb
+
  #ifdef IMAGE_BL31
         /* ---------------------------------------------------------------------
          * Initialise the per-cpu cache pointer to the CPU.

If I put this code in other platform-dependent files to enable the DSU
PMU, instead of in the common code at the beginning of EL3, it just does
not work.

It should be possible to fix this in firmware in platform-dependent
files, but I'm not familiar with how actlr_el3 and actlr_el2 should be
accessed. Otherwise, the DSU PMU node in the dts should be disabled. Any
ideas are welcome.
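
For readers less familiar with the assembly above, each mrs/orr/msr group
is a read-modify-write that sets bit 12 and leaves the other bits of the
register untouched. A minimal C sketch of the same bit operation on a
plain value (the macro and function names are hypothetical, and no system
register is accessed here):

```c
#include <assert.h>
#include <stdint.h>

/* Bit 12, the bit set by `mov x1, #(1 << 12)` in the workaround. */
#define ACTLR_CLUSTERPMUEN	(UINT64_C(1) << 12)

/* Returns `actlr` with bit 12 set and every other bit preserved. */
static uint64_t actlr_enable_dsu_pmu(uint64_t actlr)
{
	uint64_t updated = actlr | ACTLR_CLUSTERPMUEN;

	/* Sanity check: only bit 12 may differ from the input. */
	assert((updated & ~ACTLR_CLUSTERPMUEN) ==
	       (actlr & ~ACTLR_CLUSTERPMUEN));
	return updated;
}
```

The operation is idempotent, so applying it to a register that already
has the bit set changes nothing.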

Thanks
Macpaul Lin

AngeloGioacchino Del Regno Oct. 3, 2023, 9:24 a.m. UTC | #5
On 20/07/23 22:07, Nícolas F. R. A. Prado wrote:
> The DSU PMU allows monitoring performance events in the DSU cluster,
> which is done by configuring and reading back values from the DSU PMU
> system registers. However, for write-access to be allowed by ELs lower
> than EL3, the EL3 firmware needs to update the setting on the ACTLR3_EL3
> register, as it is disallowed by default.
> 
> That configuration is not done on the firmware used by the MT8195 SoC,
> as a consequence, booting a MT8195-based machine like
> mt8195-cherry-tomato-r2 with CONFIG_ARM_DSU_PMU enabled hangs the kernel
> just as it writes to the CLUSTERPMOVSCLR_EL1 register, since the
> instruction faults to EL3, and BL31 apparently just re-runs the
> instruction over and over.
> 
> Mark the DSU PMU node in the Devicetree with status "fail", as the
> machine doesn't have a suitable firmware to make use of it from the
> kernel, and allowing its driver to probe would hang the kernel.
> 
> Fixes: 37f2582883be ("arm64: dts: Add mediatek SoC mt8195 and evaluation board")
> Signed-off-by: Nícolas F. R. A. Prado <nfraprado@collabora.com>
> 

Reviewed-by: AngeloGioacchino Del Regno <angelogioacchino.delregno@collabora.com>

Patch

diff --git a/arch/arm64/boot/dts/mediatek/mt8195.dtsi b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
index 5c670fce1e47..0705d9c3a6a7 100644
--- a/arch/arm64/boot/dts/mediatek/mt8195.dtsi
+++ b/arch/arm64/boot/dts/mediatek/mt8195.dtsi
@@ -313,6 +313,7 @@  dsu-pmu {
 		interrupts = <GIC_SPI 18 IRQ_TYPE_LEVEL_HIGH 0>;
 		cpus = <&cpu0>, <&cpu1>, <&cpu2>, <&cpu3>,
 		       <&cpu4>, <&cpu5>, <&cpu6>, <&cpu7>;
+		status = "fail";
 	};
 
 	dmic_codec: dmic-codec {
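
For context, the reason the one-line change above keeps the driver from
probing can be sketched as follows. This is a userspace approximation of
the kernel's device-availability rule (a node counts as available when it
has no status property, or when status is "okay" or "ok"); the function
name is hypothetical and this is not the kernel's own code.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/*
 * Approximation of the availability check applied to a Devicetree node.
 * `status` is the value of the node's status property, or NULL if the
 * property is absent. Any value other than "okay" or "ok", including
 * "fail" and "disabled", makes the node unavailable.
 */
static bool node_is_available(const char *status)
{
	if (status == NULL)
		return true;
	return strcmp(status, "okay") == 0 || strcmp(status, "ok") == 0;
}

/* Usage: the cases relevant to this patch. */
static void example(void)
{
	assert(node_is_available(NULL));	/* dsu-pmu before the patch */
	assert(node_is_available("okay"));
	assert(!node_is_available("fail"));	/* dsu-pmu after the patch */
}
```

Because the dsu-pmu node previously had no status property, it was
treated as available and its driver was allowed to probe.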