Message ID | 20241025031257.6284-2-c@jia.je (mailing list archive) |
---|---|
State | New |
Series | arm64: dts: qcom: x1e80100: Add performance hint for boost clock |
On Fri, 25 Oct 2024 04:12:58 +0100,
Jiajie Chen <c@jia.je> wrote:
>
> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
> core in the second cluster (cores 4-7) and the other in the third
> cluster (cores 8-11). However, the scheduler is currently unaware of
> this, leading to scenarios where a single core benchmark might run at
> 3.4 GHz when scheduled to the first cluster.
>
> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
> others are set to 1024. This ensures that the two cores can be
> prioritized for scheduling. The value 1200 is derived from approximately
> `1024/3.4*4.0`.
>
> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
> it was designed to differentiate between performance and efficient
> cores, not for core boosting. According to its definition, DMIPS/MHz
> actually decreases with higher frequencies. However, since the CPU does
> not support AMU, and no elegant solution was found, this approach is
> used as a workaround.

Are you sure?

[ 0.570323] CPU features: detected: Activity Monitors Unit (AMU) on CPU0-11

So activity monitors are available. Not that what you have here is not
useful, but this comment seems a bit... surprising.

Thanks,

M.

On 2024/10/25 15:58, Marc Zyngier wrote:
> On Fri, 25 Oct 2024 04:12:58 +0100,
> Jiajie Chen <c@jia.je> wrote:
>> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
>> core in the second cluster (cores 4-7) and the other in the third
>> cluster (cores 8-11). However, the scheduler is currently unaware of
>> this, leading to scenarios where a single core benchmark might run at
>> 3.4 GHz when scheduled to the first cluster.
>>
>> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
>> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
>> others are set to 1024. This ensures that the two cores can be
>> prioritized for scheduling. The value 1200 is derived from approximately
>> `1024/3.4*4.0`.
>>
>> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
>> it was designed to differentiate between performance and efficient
>> cores, not for core boosting. According to its definition, DMIPS/MHz
>> actually decreases with higher frequencies. However, since the CPU does
>> not support AMU, and no elegant solution was found, this approach is
>> used as a workaround.
> Are you sure?
>
> [ 0.570323] CPU features: detected: Activity Monitors Unit (AMU) on CPU0-11
>
> So activity monitors are available. Not that what you have here is not
> useful, but this comment seems a bit... surprising.

Sorry for the false claim, I was looking for AMU at /proc/cpuinfo, which
is not there. But it did not help the scheduling somehow. Let me have a
look at it.

Best regards,
Jiajie Chen

>
> Thanks,
>
> M.
>

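For anyone who wants to repeat the check from the boot log rather than from /proc/cpuinfo, here is a minimal Python sketch. It assumes the log line has exactly the wording quoted in this thread and a single "CPUn" or "CPUn-m" range; the names `AMU_LINE` and `amu_cpus` are made up for illustration, and other kernel versions may format the message differently (for example with several comma-separated ranges, which this sketch does not handle).

```python
import re

# Pattern modelled on the single boot-log line quoted in this thread.
# Assumption: one "CPUn" or "CPUn-m" range per line.
AMU_LINE = re.compile(
    r"CPU features: detected: Activity Monitors Unit \(AMU\) on CPU(\d+)(?:-(\d+))?"
)


def amu_cpus(log_text):
    """Return the set of CPU numbers that the log reports AMU on."""
    cpus = set()
    for match in AMU_LINE.finditer(log_text):
        first = int(match.group(1))
        # A line naming a single CPU has no "-m" part.
        last = int(match.group(2)) if match.group(2) else first
        cpus.update(range(first, last + 1))
    return cpus


sample = "[ 0.570323] CPU features: detected: Activity Monitors Unit (AMU) on CPU0-11"
print(sorted(amu_cpus(sample)))
```

In practice the text would come from the kernel log (for example the output of `dmesg` saved to a file) instead of the hard-coded `sample` string.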
On Fri, Oct 25, 2024 at 11:12:58AM +0800, Jiajie Chen wrote:
> The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
> core in the second cluster (cores 4-7) and the other in the third
> cluster (cores 8-11). However, the scheduler is currently unaware of
> this, leading to scenarios where a single core benchmark might run at
> 3.4 GHz when scheduled to the first cluster.
>
> This patch introduces capacity-dmips-mhz nodes to each CPU node in the
> DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
> others are set to 1024. This ensures that the two cores can be
> prioritized for scheduling. The value 1200 is derived from approximately
> `1024/3.4*4.0`.
>
> Note that capacity-dmips-mhz is not ideally suited for this purpose, as
> it was designed to differentiate between performance and efficient
> cores, not for core boosting. According to its definition, DMIPS/MHz
> actually decreases with higher frequencies. However, since the CPU does
> not support AMU, and no elegant solution was found, this approach is
> used as a workaround.
>
> With this patch, we observe two cores running at full 4.0 GHz without
> core binding. The single core score of Geekbench 6 increases from 2452
> to 2892, both without core binding. Tested on Surface Laptop 7.

I think this is a nice hack, but I'd prefer to see scheduler being
improved instead. From my (ignorant) point of view this should be close
to SMT-based scheduling. We should split the jobs between the clusters,
if that provides better power utilisation.

>
> Signed-off-by: Jiajie Chen <c@jia.je>
> ---
>  arch/arm64/boot/dts/qcom/x1e80100.dtsi | 12 ++++++++++++
>  1 file changed, 12 insertions(+)
>

diff --git a/arch/arm64/boot/dts/qcom/x1e80100.dtsi b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
index cd732ef88cd8..c9c559d956c2 100644
--- a/arch/arm64/boot/dts/qcom/x1e80100.dtsi
+++ b/arch/arm64/boot/dts/qcom/x1e80100.dtsi
@@ -69,6 +69,7 @@ CPU0: cpu@0 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x0>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_0>;
 			power-domains = <&CPU_PD0>;
 			power-domain-names = "psci";
@@ -86,6 +87,7 @@ CPU1: cpu@100 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x100>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_0>;
 			power-domains = <&CPU_PD1>;
 			power-domain-names = "psci";
@@ -97,6 +99,7 @@ CPU2: cpu@200 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x200>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_0>;
 			power-domains = <&CPU_PD2>;
 			power-domain-names = "psci";
@@ -108,6 +111,7 @@ CPU3: cpu@300 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x300>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_0>;
 			power-domains = <&CPU_PD3>;
 			power-domain-names = "psci";
@@ -119,6 +123,7 @@ CPU4: cpu@10000 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x10000>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1200>;
 			next-level-cache = <&L2_1>;
 			power-domains = <&CPU_PD4>;
 			power-domain-names = "psci";
@@ -136,6 +141,7 @@ CPU5: cpu@10100 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x10100>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_1>;
 			power-domains = <&CPU_PD5>;
 			power-domain-names = "psci";
@@ -147,6 +153,7 @@ CPU6: cpu@10200 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x10200>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_1>;
 			power-domains = <&CPU_PD6>;
 			power-domain-names = "psci";
@@ -158,6 +165,7 @@ CPU7: cpu@10300 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x10300>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_1>;
 			power-domains = <&CPU_PD7>;
 			power-domain-names = "psci";
@@ -169,6 +177,7 @@ CPU8: cpu@20000 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x20000>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1200>;
 			next-level-cache = <&L2_2>;
 			power-domains = <&CPU_PD8>;
 			power-domain-names = "psci";
@@ -186,6 +195,7 @@ CPU9: cpu@20100 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x20100>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_2>;
 			power-domains = <&CPU_PD9>;
 			power-domain-names = "psci";
@@ -197,6 +207,7 @@ CPU10: cpu@20200 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x20200>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_2>;
 			power-domains = <&CPU_PD10>;
 			power-domain-names = "psci";
@@ -208,6 +219,7 @@ CPU11: cpu@20300 {
 			compatible = "qcom,oryon";
 			reg = <0x0 0x20300>;
 			enable-method = "psci";
+			capacity-dmips-mhz = <1024>;
 			next-level-cache = <&L2_2>;
 			power-domains = <&CPU_PD11>;
 			power-domain-names = "psci";

The x1e80100 CPU can have up to two cores running at 4.0 GHz, with one
core in the second cluster (cores 4-7) and the other in the third
cluster (cores 8-11). However, the scheduler is currently unaware of
this, leading to scenarios where a single core benchmark might run at
3.4 GHz when scheduled to the first cluster.

This patch introduces capacity-dmips-mhz nodes to each CPU node in the
DTS. For cores numbered 4 and 8, the capacities are set to 1200, while
others are set to 1024. This ensures that the two cores can be
prioritized for scheduling. The value 1200 is derived from approximately
`1024/3.4*4.0`.

Note that capacity-dmips-mhz is not ideally suited for this purpose, as
it was designed to differentiate between performance and efficient
cores, not for core boosting. According to its definition, DMIPS/MHz
actually decreases with higher frequencies. However, since the CPU does
not support AMU, and no elegant solution was found, this approach is
used as a workaround.

With this patch, we observe two cores running at full 4.0 GHz without
core binding. The single core score of Geekbench 6 increases from 2452
to 2892, both without core binding. Tested on Surface Laptop 7.

Signed-off-by: Jiajie Chen <c@jia.je>
---
 arch/arm64/boot/dts/qcom/x1e80100.dtsi | 12 ++++++++++++
 1 file changed, 12 insertions(+)
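
The `1024/3.4*4.0` arithmetic from the commit message can be spelled out as a short Python sketch. The function name `boosted_capacity` is hypothetical and only mirrors the formula as written in the patch description; it is not how the kernel itself derives CPU capacity.

```python
def boosted_capacity(base_capacity, base_ghz, boost_ghz):
    """Scale a capacity value by the ratio of boost to base frequency."""
    return base_capacity * boost_ghz / base_ghz


# Values from the commit message: 1024 for a 3.4 GHz core, 4.0 GHz boost.
value = boosted_capacity(1024, 3.4, 4.0)
print(f"{value:.1f}")  # roughly 1204.7; the patch uses 1200 as an approximation
```

The exact result is a little above 1200, which matches the commit message's wording that 1200 is an approximate value.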