Message ID | 20240205171930.968-5-linux.amoon@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Series | [PATCHv1,1/5] arm64: dts: amlogic: Add cache information to the Amlogic GXBB and GXL SoC |
On 05/02/2024 18:19, Anand Moon wrote:
> As per S922X datasheet add missing cache information to the Amlogic
> S922X SoC.
>
> - Each Cortex-A53 core has 32 KB of instruction cache and
>     32 KB of L1 data cache available.
> - Each Cortex-A73 core has 64 KB of L1 instruction cache and
>     64 KB of L1 data cache available.
> - The little (A53) cluster has 512 KB of unified L2 cache available.
> - The big (A73) cluster has 1 MB of unified L2 cache available.

Datasheet says:
The quad core Cortex™-A73 processor is paired with A53 processor in a big.Little configuration, with each
core has L1 instruction and data caches, together with a single shared L2 unified cache with A53

And there's no indication of the L1 or L2 cache sizes.

Neil

>
> To improve system performance.
>
> Signed-off-by: Anand Moon <linux.amoon@gmail.com>
> ---
> [0] https://dn.odroid.com/S922X/ODROID-N2/Datasheet/S922X_Public_Datasheet_V0.2.pdf
> [1] https://en.wikipedia.org/wiki/ARM_Cortex-A73
> [2] https://en.wikipedia.org/wiki/ARM_Cortex-A53
> ---
>  arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 62 ++++++++++++++++++---
>  1 file changed, 55 insertions(+), 7 deletions(-)
>
> diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> index 86e6ceb31d5e..624c6fd763ac 100644
> --- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> +++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
> @@ -49,7 +49,13 @@ cpu0: cpu@0 {
>  			reg = <0x0 0x0>;
>  			enable-method = "psci";
>  			capacity-dmips-mhz = <592>;
> -			next-level-cache = <&l2>;
> +			d-cache-line-size = <32>;
> +			d-cache-size = <0x8000>;
> +			d-cache-sets = <32>;
> +			i-cache-line-size = <32>;
> +			i-cache-size = <0x8000>;
> +			i-cache-sets = <32>;
> +			next-level-cache = <&l2_cache_l>;
>  			#cooling-cells = <2>;
>  		};
>
> @@ -59,7 +65,13 @@ cpu1: cpu@1 {
>  			reg = <0x0 0x1>;
>  			enable-method = "psci";
>  			capacity-dmips-mhz = <592>;
> -			next-level-cache = <&l2>;
> +			d-cache-line-size = <32>;
> +			d-cache-size = <0x8000>;
> +			d-cache-sets = <32>;
> +			i-cache-line-size = <32>;
> +			i-cache-size = <0x8000>;
> +			i-cache-sets = <32>;
> +			next-level-cache = <&l2_cache_l>;
>  			#cooling-cells = <2>;
>  		};
>
> @@ -69,7 +81,13 @@ cpu100: cpu@100 {
>  			reg = <0x0 0x100>;
>  			enable-method = "psci";
>  			capacity-dmips-mhz = <1024>;
> -			next-level-cache = <&l2>;
> +			d-cache-line-size = <64>;
> +			d-cache-size = <0x10000>;
> +			d-cache-sets = <64>;
> +			i-cache-line-size = <64>;
> +			i-cache-size = <0x10000>;
> +			i-cache-sets = <64>;
> +			next-level-cache = <&l2_cache_b>;
>  			#cooling-cells = <2>;
>  		};
>
> @@ -79,7 +97,13 @@ cpu101: cpu@101 {
>  			reg = <0x0 0x101>;
>  			enable-method = "psci";
>  			capacity-dmips-mhz = <1024>;
> -			next-level-cache = <&l2>;
> +			d-cache-line-size = <64>;
> +			d-cache-size = <0x10000>;
> +			d-cache-sets = <64>;
> +			i-cache-line-size = <64>;
> +			i-cache-size = <0x10000>;
> +			i-cache-sets = <64>;
> +			next-level-cache = <&l2_cache_b>;
>  			#cooling-cells = <2>;
>  		};
>
> @@ -89,7 +113,13 @@ cpu102: cpu@102 {
>  			reg = <0x0 0x102>;
>  			enable-method = "psci";
>  			capacity-dmips-mhz = <1024>;
> -			next-level-cache = <&l2>;
> +			d-cache-line-size = <64>;
> +			d-cache-size = <0x10000>;
> +			d-cache-sets = <64>;
> +			i-cache-line-size = <64>;
> +			i-cache-size = <0x10000>;
> +			i-cache-sets = <64>;
> +			next-level-cache = <&l2_cache_b>;
>  			#cooling-cells = <2>;
>  		};
>
> @@ -99,14 +129,32 @@ cpu103: cpu@103 {
>  			reg = <0x0 0x103>;
>  			enable-method = "psci";
>  			capacity-dmips-mhz = <1024>;
> -			next-level-cache = <&l2>;
> +			d-cache-line-size = <64>;
> +			d-cache-size = <0x10000>;
> +			d-cache-sets = <64>;
> +			i-cache-line-size = <64>;
> +			i-cache-size = <0x10000>;
> +			i-cache-sets = <64>;
> +			next-level-cache = <&l2_cache_b>;
>  			#cooling-cells = <2>;
>  		};
>
> -		l2: l2-cache0 {
> +		l2_cache_l: l2-cache-cluster0 {
>  			compatible = "cache";
>  			cache-level = <2>;
>  			cache-unified;
> +			cache-size = <0x80000>;
> +			cache-line-size = <64>;
> +			cache-sets = <512>;
> +		};
> +
> +		l2_cache_b: l2-cache-cluster1 {
> +			compatible = "cache";
> +			cache-level = <2>;
> +			cache-unified;
> +			cache-size = <0x100000>;
> +			cache-line-size = <64>;
> +			cache-sets = <512>;
>  		};
>  	};
>  };
Hi Neil,

On Tue, 6 Feb 2024 at 14:30, Neil Armstrong <neil.armstrong@linaro.org> wrote:
>
> On 05/02/2024 18:19, Anand Moon wrote:
> > As per S922X datasheet add missing cache information to the Amlogic
> > S922X SoC.
> >
> > - Each Cortex-A53 core has 32 KB of instruction cache and
> >     32 KB of L1 data cache available.
> > - Each Cortex-A73 core has 64 KB of L1 instruction cache and
> >     64 KB of L1 data cache available.
> > - The little (A53) cluster has 512 KB of unified L2 cache available.
> > - The big (A73) cluster has 1 MB of unified L2 cache available.
>
> Datasheet says:
> The quad core Cortex™-A73 processor is paired with A53 processor in a big.Little configuration, with each
> core has L1 instruction and data caches, together with a single shared L2 unified cache with A53
>

OK.

Since the Cortex™-A73 and Cortex™-A53 share some architectural
improvements, including in their cache features, I updated the
values accordingly.
I also checked this against the ARM documentation earlier.

> And there's no indication of the L1 or L2 cache sizes.

My understanding is that every Cortex™-A73 and Cortex™-A53 has
L1 and L2 caches, since they are part of the core features,
but I took these size values from a Wikipedia article.

On my Odroid N2+, I observe the following.

I have also done some testing with stress-ng to verify this.

alarm@archl-on2:~$ lscpu
Architecture:            aarch64
  CPU op-mode(s):        32-bit, 64-bit
  Byte Order:            Little Endian
CPU(s):                  6
  On-line CPU(s) list:   0-5
Vendor ID:               ARM
  Model name:            Cortex-A53
    Model:               4
    Thread(s) per core:  1
    Core(s) per socket:  2
    Socket(s):           1
    Stepping:            r0p4
    CPU(s) scaling MHz:  100%
    CPU max MHz:         1800.0000
    CPU min MHz:         1000.0000
    BogoMIPS:            48.00
    Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
  Model name:            Cortex-A73
    Model:               2
    Thread(s) per core:  1
    Core(s) per socket:  4
    Socket(s):           1
    Stepping:            r0p2
    CPU(s) scaling MHz:  63%
    CPU max MHz:         2208.0000
    CPU min MHz:         1000.0000
    BogoMIPS:            48.00
    Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
Caches (sum of all):
  L1d:                   320 KiB (6 instances)
  L1i:                   320 KiB (6 instances)
  L2:                    1.5 MiB (2 instances)
NUMA:
  NUMA node(s):          1
  NUMA node0 CPU(s):     0-5
Vulnerabilities:
  Gather data sampling:  Not affected
  Itlb multihit:         Not affected
  L1tf:                  Not affected
  Mds:                   Not affected
  Meltdown:              Not affected
  Mmio stale data:       Not affected
  Retbleed:              Not affected
  Spec rstack overflow:  Not affected
  Spec store bypass:     Vulnerable
  Spectre v1:            Mitigation; __user pointer sanitization
  Spectre v2:            Vulnerable
  Srbds:                 Not affected
  Tsx async abort:       Not affected
alarm@archl-on2:~$

alarm@archl-on2:~$ lstopo-no-graphics
Machine (3659MB total)
  Package L#0
    NUMANode L#0 (P#0 3659MB)
    L2 L#0 (512KB)
      L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
      L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
    L2 L#1 (1024KB)
      L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
      L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
      L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
      L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
  Block "mmcblk1boot0"
  Block "mmcblk1boot1"
  Block "mmcblk1"
  Net "eth0"

>
> Neil
>

Thanks
-Anand
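Both lscpu and lstopo take their cache figures from the kernel's sysfs cacheinfo files. As far as I understand, on an arm64 system booted from a device tree the sizes there are filled in from the DT cache properties, so the output above would reflect the values in the patch rather than an independent hardware measurement. The sketch below, in Python, dumps those sysfs attributes per CPU so the raw values can be inspected. The helper name `read_cache_info` is made up for illustration; the attribute file names are the standard sysfs ones, and any of them may be absent when the kernel has no value for it.

```python
from pathlib import Path

# Standard sysfs cacheinfo attribute names; individual files may be missing.
ATTRS = (
    "level",
    "type",
    "size",
    "coherency_line_size",
    "number_of_sets",
    "ways_of_associativity",
    "shared_cpu_list",
)


def read_cache_info(cpu_dir):
    """Return one dict per cache/indexN directory found under cpu_dir.

    Hypothetical helper: attributes that are not present are reported as None.
    """
    entries = []
    for index in sorted(Path(cpu_dir, "cache").glob("index[0-9]*")):
        info = {}
        for name in ATTRS:
            attr = index / name
            info[name] = attr.read_text().strip() if attr.is_file() else None
        entries.append(info)
    return entries


if __name__ == "__main__":
    # Walk every cpuN directory; prints nothing if sysfs is unavailable.
    for cpu in sorted(Path("/sys/devices/system/cpu").glob("cpu[0-9]*")):
        for cache in read_cache_info(cpu):
            print(cpu.name, cache)
```

Running it before and after applying the patch would show which attributes appear only once the DT properties are present.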
On 06/02/2024 11:15, Anand Moon wrote:
> Hi Neil,
>
> On Tue, 6 Feb 2024 at 14:30, Neil Armstrong <neil.armstrong@linaro.org> wrote:
>>
>> On 05/02/2024 18:19, Anand Moon wrote:
>>> As per S922X datasheet add missing cache information to the Amlogic
>>> S922X SoC.
>>>
>>> - Each Cortex-A53 core has 32 KB of instruction cache and
>>>     32 KB of L1 data cache available.
>>> - Each Cortex-A73 core has 64 KB of L1 instruction cache and
>>>     64 KB of L1 data cache available.
>>> - The little (A53) cluster has 512 KB of unified L2 cache available.
>>> - The big (A73) cluster has 1 MB of unified L2 cache available.
>>
>> Datasheet says:
>> The quad core Cortex™-A73 processor is paired with A53 processor in a big.Little configuration, with each
>> core has L1 instruction and data caches, together with a single shared L2 unified cache with A53
>>
> Ok,
>
> Since all the Cortex™-A73 and Cortex™-A53 share some improvements in
> the architecture with some improvements in cache features
> hence I update the changes accordingly.
> Also, I checked this in the ARM documentation earlier on this.

I don't understand, Amlogic states it's a shared L2 cache, but you trust
the ARM documentation instead ???

>
>> And there's no indication of the L1 or L2 cache sizes.
>
> What I feel is in general all the Cortex™-A73 and Cortex™-A53 supports
> L1 and L2 cache size since it is part of the core features.
> but I opted for these size values from a Wikipedia article.
>
> On my Odroid N2+, I observe the following.
>
> I have also done some testing on the stress-ng to verify this.

OK, I don't feel confident adding numbers that come out of thin air,
even more so since they are only exposed to userspace.

I think we should only add numbers we are 100% sure of.

>
> alarm@archl-on2:~$ lscpu
> Architecture:            aarch64
>   CPU op-mode(s):        32-bit, 64-bit
>   Byte Order:            Little Endian
> CPU(s):                  6
>   On-line CPU(s) list:   0-5
> Vendor ID:               ARM
>   Model name:            Cortex-A53
>     Model:               4
>     Thread(s) per core:  1
>     Core(s) per socket:  2
>     Socket(s):           1
>     Stepping:            r0p4
>     CPU(s) scaling MHz:  100%
>     CPU max MHz:         1800.0000
>     CPU min MHz:         1000.0000
>     BogoMIPS:            48.00
>     Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
>   Model name:            Cortex-A73
>     Model:               2
>     Thread(s) per core:  1
>     Core(s) per socket:  4
>     Socket(s):           1
>     Stepping:            r0p2
>     CPU(s) scaling MHz:  63%
>     CPU max MHz:         2208.0000
>     CPU min MHz:         1000.0000
>     BogoMIPS:            48.00
>     Flags:               fp asimd evtstrm aes pmull sha1 sha2 crc32 cpuid
> Caches (sum of all):
>   L1d:                   320 KiB (6 instances)
>   L1i:                   320 KiB (6 instances)
>   L2:                    1.5 MiB (2 instances)
> NUMA:
>   NUMA node(s):          1
>   NUMA node0 CPU(s):     0-5
> Vulnerabilities:
>   Gather data sampling:  Not affected
>   Itlb multihit:         Not affected
>   L1tf:                  Not affected
>   Mds:                   Not affected
>   Meltdown:              Not affected
>   Mmio stale data:       Not affected
>   Retbleed:              Not affected
>   Spec rstack overflow:  Not affected
>   Spec store bypass:     Vulnerable
>   Spectre v1:            Mitigation; __user pointer sanitization
>   Spectre v2:            Vulnerable
>   Srbds:                 Not affected
>   Tsx async abort:       Not affected
> alarm@archl-on2:~$
>
> alarm@archl-on2:~$ lstopo-no-graphics
> Machine (3659MB total)
>   Package L#0
>     NUMANode L#0 (P#0 3659MB)
>     L2 L#0 (512KB)
>       L1d L#0 (32KB) + L1i L#0 (32KB) + Core L#0 + PU L#0 (P#0)
>       L1d L#1 (32KB) + L1i L#1 (32KB) + Core L#1 + PU L#1 (P#1)
>     L2 L#1 (1024KB)
>       L1d L#2 (64KB) + L1i L#2 (64KB) + Core L#2 + PU L#2 (P#2)
>       L1d L#3 (64KB) + L1i L#3 (64KB) + Core L#3 + PU L#3 (P#3)
>       L1d L#4 (64KB) + L1i L#4 (64KB) + Core L#4 + PU L#4 (P#4)
>       L1d L#5 (64KB) + L1i L#5 (64KB) + Core L#5 + PU L#5 (P#5)
>   Block "mmcblk1boot0"
>   Block "mmcblk1boot1"
>   Block "mmcblk1"
>   Net "eth0"

This looks pretty, but let's keep exporting verified data.

>
>
>
>> Neil
>>
>
> Thanks
> -Anand
Hi Neil,

On Tue, 6 Feb 2024 at 20:31, <neil.armstrong@linaro.org> wrote:
>
> On 06/02/2024 11:15, Anand Moon wrote:
> > Hi Neil,
> >
> > On Tue, 6 Feb 2024 at 14:30, Neil Armstrong <neil.armstrong@linaro.org> wrote:
> >>
> >> On 05/02/2024 18:19, Anand Moon wrote:
> >>> As per S922X datasheet add missing cache information to the Amlogic
> >>> S922X SoC.
> >>>
> >>> - Each Cortex-A53 core has 32 KB of instruction cache and
> >>>     32 KB of L1 data cache available.
> >>> - Each Cortex-A73 core has 64 KB of L1 instruction cache and
> >>>     64 KB of L1 data cache available.
> >>> - The little (A53) cluster has 512 KB of unified L2 cache available.
> >>> - The big (A73) cluster has 1 MB of unified L2 cache available.
> >>
> >> Datasheet says:
> >> The quad core Cortex™-A73 processor is paired with A53 processor in a big.Little configuration, with each
> >> core has L1 instruction and data caches, together with a single shared L2 unified cache with A53
> >>
> > Ok,
> >
> > Since all the Cortex™-A73 and Cortex™-A53 share some improvements in
> > the architecture with some improvements in cache features
> > hence I update the changes accordingly.
> > Also, I checked this in the ARM documentation earlier on this.
>
> I don't understand, Amlogic states it's a shared L2 cache, but you trust
> the ARM documentation instead ???

Yes, please find the Cortex™-A73 TRM:

L1 cache
https://developer.arm.com/documentation/100048/0002/level-1-memory-system/about-the-l1-memory-system?lang=en
L2 cache
https://developer.arm.com/documentation/100048/0002/level-2-memory-system/about-the-l2-memory-system?lang=en

>
> >
> >> And there's no indication of the L1 or L2 cache sizes.
> >
> > What I feel is in general all the Cortex™-A73 and Cortex™-A53 supports
> > L1 and L2 cache size since it is part of the core features.
> > but I opted for these size values from a Wikipedia article.
> >
> > On my Odroid N2+, I observe the following.
> >
> > I have also done some testing on the stress-ng to verify this.
>
>
> Ok I don't feel confident adding numbers that comes out of thin air,
> and even more since they are only shared to userspace.
>
> I think we should only add the numbers which are 100% sure

The best way forward is to let the Amlogic SoC members comment on the
CPU L1/L2 cache sizes.
But without perf PMU events we cannot test this feature.

> >
> This looks pretty, but let's keep exporting verified data.
>

The CPU hardware supports this cache feature, but the PMU for this CPU
is missing, so hardware events such as cache-misses and cache-references
are not listed.

alarm@archl-on2:~$ sudo perf list
[sudo] password for alarm:

List of pre-defined events (to be used in -e or -M):

  alignment-faults                                   [Software event]
  bpf-output                                         [Software event]
  cgroup-switches                                    [Software event]
  context-switches OR cs                             [Software event]
  cpu-clock                                          [Software event]
  cpu-migrations OR migrations                       [Software event]
  dummy                                              [Software event]
  emulation-faults                                   [Software event]
  major-faults                                       [Software event]
  minor-faults                                       [Software event]
  page-faults OR faults                              [Software event]
  task-clock                                         [Software event]

  duration_time                                      [Tool event]
  user_time                                          [Tool event]
  system_time                                        [Tool event]

  meson_ddr_bw/chan_1_rw_bytes/                      [Kernel PMU event]
  meson_ddr_bw/chan_2_rw_bytes/                      [Kernel PMU event]
  meson_ddr_bw/chan_3_rw_bytes/                      [Kernel PMU event]
  meson_ddr_bw/chan_4_rw_bytes/                      [Kernel PMU event]
  meson_ddr_bw/total_rw_bytes/                       [Kernel PMU event]

  rNNN                                               [Raw hardware event descriptor]
  cpu/t1=v1[,t2=v2,t3 ...]/modifier                  [Raw hardware event descriptor]
       [(see 'man perf-list' on how to encode it)]

  mem:<addr>[/len][:access]                          [Hardware breakpoint]

  alarmtimer:alarmtimer_cancel                       [Tracepoint event]
  alarmtimer:alarmtimer_fired                        [Tracepoint event]
  alarmtimer:alarmtimer_start                        [Tracepoint event]
  alarmtimer:alarmtimer_suspend                      [Tracepoint event]
  asoc:snd_soc_bias_level_done                       [Tracepoint event]
  asoc:snd_soc_bias_level_start                      [Tracepoint event]
  asoc:snd_soc_dapm_connected                        [Tracepoint event]
  asoc:snd_soc_dapm_done                             [Tracepoint event]
  asoc:snd_soc_dapm_path                             [Tracepoint event]
  asoc:snd_soc_dapm_start                            [Tracepoint event]
  asoc:snd_soc_dapm_walk_done                        [Tracepoint event]
  asoc:snd_soc_dapm_widget_event_done                [Tracepoint event]
  asoc:snd_soc_dapm_widget_event_start               [Tracepoint event]
  asoc:snd_soc_dapm_widget_power                     [Tracepoint event]
  asoc:snd_soc_jack_irq                              [Tracepoint event]
  asoc:snd_soc_jack_notify                           [Tracepoint event]
  asoc:snd_soc_jack_report                           [Tracepoint event]
  binder:binder_alloc_lru_end                        [Tracepoint event]
  binder:binder_alloc_lru_start                      [Tracepoint event]
  binder:binder_alloc_page_end                       [Tracepoint event]
  binder:binder_alloc_page_start                     [Tracepoint event]
  binder:binder_command                              [Tracepoint event]
  binder:binder_free_lru_end                         [Tracepoint event]
  binder:binder_free_lru_start                       [Tracepoint event]
  binder:binder_ioctl                                [Tracepoint event]
  binder:binder_ioctl_done                           [Tracepoint event]
  binder:binder_lock                                 [Tracepoint event]
  binder:binder_locked                               [Tracepoint event]
  binder:binder_read_done                            [Tracepoint event]
  binder:binder_return                               [Tracepoint event]
  binder:binder_transaction                          [Tracepoint event]
  binder:binder_transaction_alloc_buf                [Tracepoint event]
  binder:binder_transaction_buffer_release           [Tracepoint event]
  binder:binder_transaction_failed_buffer_release    [Tracepoint event]
  binder:binder_transaction_fd_recv                  [Tracepoint event]

[root@archl-on2 alarm]# perf stat -B -e cache-references,cache-misses,cycles,instructions,branches,faults,migrations sleep 5

 Performance counter stats for 'sleep 5':

   <not supported>      cache-references
   <not supported>      cache-misses
   <not supported>      cycles
   <not supported>      instructions
   <not supported>      branches
                56      faults
                 0      migrations

       5.003404106 seconds time elapsed

       0.003396000 seconds user
       0.000000000 seconds sys

Thanks
-Anand
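The `<not supported>` counters above are consistent with no CPU PMU being registered with perf on this board. One way to check that directly is to look at which PMU devices the kernel exposes under /sys/bus/event_source/devices. The sketch below is a minimal illustration in Python. The function names are hypothetical, and the `armv8_` name prefix is an assumption based on how Arm CPU PMUs are commonly named (for example `armv8_pmuv3` or `armv8_cortex_a53`); a different kernel or platform may use other names.

```python
from pathlib import Path


def list_pmus(root="/sys/bus/event_source/devices"):
    """Return the sorted names of the PMU devices registered with perf."""
    base = Path(root)
    if not base.is_dir():
        return []
    return sorted(entry.name for entry in base.iterdir())


def has_cpu_pmu(names):
    """Guess whether an Arm CPU PMU is present.

    Assumption: Arm CPU PMUs show up with names starting with "armv8_".
    """
    return any(name.startswith("armv8_") for name in names)


if __name__ == "__main__":
    pmus = list_pmus()
    print("registered PMUs:", pmus)
    print("CPU PMU present:", has_cpu_pmu(pmus))
```

If only entries such as `software`, `tracepoint` and `meson_ddr_bw` are listed, hardware events like cache-misses cannot be counted, which matches the perf output in the message above.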
diff --git a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
index 86e6ceb31d5e..624c6fd763ac 100644
--- a/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
+++ b/arch/arm64/boot/dts/amlogic/meson-g12b.dtsi
@@ -49,7 +49,13 @@ cpu0: cpu@0 {
 			reg = <0x0 0x0>;
 			enable-method = "psci";
 			capacity-dmips-mhz = <592>;
-			next-level-cache = <&l2>;
+			d-cache-line-size = <32>;
+			d-cache-size = <0x8000>;
+			d-cache-sets = <32>;
+			i-cache-line-size = <32>;
+			i-cache-size = <0x8000>;
+			i-cache-sets = <32>;
+			next-level-cache = <&l2_cache_l>;
 			#cooling-cells = <2>;
 		};
 
@@ -59,7 +65,13 @@ cpu1: cpu@1 {
 			reg = <0x0 0x1>;
 			enable-method = "psci";
 			capacity-dmips-mhz = <592>;
-			next-level-cache = <&l2>;
+			d-cache-line-size = <32>;
+			d-cache-size = <0x8000>;
+			d-cache-sets = <32>;
+			i-cache-line-size = <32>;
+			i-cache-size = <0x8000>;
+			i-cache-sets = <32>;
+			next-level-cache = <&l2_cache_l>;
 			#cooling-cells = <2>;
 		};
 
@@ -69,7 +81,13 @@ cpu100: cpu@100 {
 			reg = <0x0 0x100>;
 			enable-method = "psci";
 			capacity-dmips-mhz = <1024>;
-			next-level-cache = <&l2>;
+			d-cache-line-size = <64>;
+			d-cache-size = <0x10000>;
+			d-cache-sets = <64>;
+			i-cache-line-size = <64>;
+			i-cache-size = <0x10000>;
+			i-cache-sets = <64>;
+			next-level-cache = <&l2_cache_b>;
 			#cooling-cells = <2>;
 		};
 
@@ -79,7 +97,13 @@ cpu101: cpu@101 {
 			reg = <0x0 0x101>;
 			enable-method = "psci";
 			capacity-dmips-mhz = <1024>;
-			next-level-cache = <&l2>;
+			d-cache-line-size = <64>;
+			d-cache-size = <0x10000>;
+			d-cache-sets = <64>;
+			i-cache-line-size = <64>;
+			i-cache-size = <0x10000>;
+			i-cache-sets = <64>;
+			next-level-cache = <&l2_cache_b>;
 			#cooling-cells = <2>;
 		};
 
@@ -89,7 +113,13 @@ cpu102: cpu@102 {
 			reg = <0x0 0x102>;
 			enable-method = "psci";
 			capacity-dmips-mhz = <1024>;
-			next-level-cache = <&l2>;
+			d-cache-line-size = <64>;
+			d-cache-size = <0x10000>;
+			d-cache-sets = <64>;
+			i-cache-line-size = <64>;
+			i-cache-size = <0x10000>;
+			i-cache-sets = <64>;
+			next-level-cache = <&l2_cache_b>;
 			#cooling-cells = <2>;
 		};
 
@@ -99,14 +129,32 @@ cpu103: cpu@103 {
 			reg = <0x0 0x103>;
 			enable-method = "psci";
 			capacity-dmips-mhz = <1024>;
-			next-level-cache = <&l2>;
+			d-cache-line-size = <64>;
+			d-cache-size = <0x10000>;
+			d-cache-sets = <64>;
+			i-cache-line-size = <64>;
+			i-cache-size = <0x10000>;
+			i-cache-sets = <64>;
+			next-level-cache = <&l2_cache_b>;
 			#cooling-cells = <2>;
 		};
 
-		l2: l2-cache0 {
+		l2_cache_l: l2-cache-cluster0 {
 			compatible = "cache";
 			cache-level = <2>;
 			cache-unified;
+			cache-size = <0x80000>;
+			cache-line-size = <64>;
+			cache-sets = <512>;
+		};
+
+		l2_cache_b: l2-cache-cluster1 {
+			compatible = "cache";
+			cache-level = <2>;
+			cache-unified;
+			cache-size = <0x100000>;
+			cache-line-size = <64>;
+			cache-sets = <512>;
 		};
 	};
 };
As per the S922X datasheet, add missing cache information to the Amlogic
S922X SoC.

- Each Cortex-A53 core has 32 KB of L1 instruction cache and
    32 KB of L1 data cache available.
- Each Cortex-A73 core has 64 KB of L1 instruction cache and
    64 KB of L1 data cache available.
- The little (A53) cluster has 512 KB of unified L2 cache available.
- The big (A73) cluster has 1 MB of unified L2 cache available.

This is intended to improve system performance.

Signed-off-by: Anand Moon <linux.amoon@gmail.com>
---
[0] https://dn.odroid.com/S922X/ODROID-N2/Datasheet/S922X_Public_Datasheet_V0.2.pdf
[1] https://en.wikipedia.org/wiki/ARM_Cortex-A73
[2] https://en.wikipedia.org/wiki/ARM_Cortex-A53
---
 arch/arm64/boot/dts/amlogic/meson-g12b.dtsi | 62 ++++++++++++++++++---
 1 file changed, 55 insertions(+), 7 deletions(-)
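The cache properties in the patch are tied together by the usual set-associative relation: size = line size × number of sets × ways. The patch does not state an associativity, so one quick sanity check is to compute the number of ways each triple implies and compare that against the Cortex-A53 and Cortex-A73 TRMs. The sketch below does only that arithmetic, in Python; the function name `implied_ways` and the table labels are made up for illustration, and the numbers are copied from the diff above, not independently verified.

```python
def implied_ways(size, line_size, sets):
    """Associativity implied by size = line_size * sets * ways."""
    ways, remainder = divmod(size, line_size * sets)
    if remainder:
        raise ValueError("size is not a multiple of line_size * sets")
    return ways


# (cache size, line size, sets) as written in the patch.
PATCH_VALUES = {
    "A53 L1 d/i": (0x8000, 32, 32),
    "A73 L1 d/i": (0x10000, 64, 64),
    "L2 little cluster": (0x80000, 64, 512),
    "L2 big cluster": (0x100000, 64, 512),
}

if __name__ == "__main__":
    for name, (size, line_size, sets) in PATCH_VALUES.items():
        print(f"{name}: {implied_ways(size, line_size, sets)} ways implied")
    # A53 L1 d/i: 32 ways implied
    # A73 L1 d/i: 16 ways implied
    # L2 little cluster: 16 ways implied
    # L2 big cluster: 32 ways implied
```

Whether those implied associativities match the actual cores is something only the TRMs or the SoC vendor can confirm; if they do not, the line-size or sets values in the patch would need revisiting along with the sizes.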