Message ID | 49abb93000078c692c48c0a65ff677893909361a.1714304071.git.dsimic@manjaro.org
---|---
State | New, archived
Series | arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
On Sunday, 28 April 2024 at 13:40:36 GMT+2, Dragan Simic wrote:
> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
>
> Adding the cache information to the H6 SoC dtsi also makes the following
> warning message in the kernel log go away:
>
>   cacheinfo: Unable to detect cache hierarchy for CPU 0
>
> The cache parameters for the H6 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
>
> - Allwinner H6 V200 datasheet, version 1.1
> - ARM Cortex-A53 revision r0p3 TRM, version E
>
> For future reference, here's a brief summary of the documentation:
>
> - All caches employ the 64-byte cache line length
> - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
>   cache and 32 KB of L1 4-way, set-associative data cache
> - The entire SoC has 512 KB of unified L2 16-way, set-associative cache
>
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>

Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>

Best regards,
Jernej
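For readers who want to see what the commit message's mention of lscpu(1) and /sys/devices/system/cpu translates to in practice, below is a minimal C sketch that walks cpu0's cacheinfo sysfs directory and prints the standard attributes (level, type, size, ways_of_associativity, number_of_sets, coherency_line_size). This is illustrative only, not part of the patch; with the DT cache properties in place, the L2 entry would be expected to report 512K with 512 sets on the H6.

```c
#include <stdio.h>
#include <string.h>

/* Print one cacheinfo sysfs attribute of cpu0's cache index <index>. */
static void print_attr(int index, const char *attr)
{
    char path[128], value[64];
    FILE *f;

    snprintf(path, sizeof(path),
             "/sys/devices/system/cpu/cpu0/cache/index%d/%s", index, attr);
    f = fopen(path, "r");
    if (!f)
        return;
    if (fgets(value, sizeof(value), f)) {
        value[strcspn(value, "\n")] = '\0';
        printf("  %-24s %s\n", attr, value);
    }
    fclose(f);
}

int main(void)
{
    const char *attrs[] = { "level", "type", "size",
                            "ways_of_associativity", "number_of_sets",
                            "coherency_line_size" };

    for (int index = 0; index < 8; index++) {
        char probe[128];
        FILE *f;

        snprintf(probe, sizeof(probe),
                 "/sys/devices/system/cpu/cpu0/cache/index%d/level", index);
        f = fopen(probe, "r");
        if (!f)
            break;                      /* no more cache levels exposed */
        fclose(f);

        printf("index%d:\n", index);
        for (unsigned int i = 0; i < sizeof(attrs) / sizeof(attrs[0]); i++)
            print_attr(index, attrs[i]);
    }
    return 0;
}
```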
On Sun, 28 Apr 2024 13:40:36 +0200
Dragan Simic <dsimic@manjaro.org> wrote:

> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
>
> [...]
>
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>

I can confirm that the data below matches the manuals, but also the
decoding of the architectural cache type registers (CCSIDR_EL1):
L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line

tinymembench results for the H6 are available here:
https://github.com/ThomasKaiser/sbc-bench/blob/master/results/26Ph.txt
and confirm the theory. Also ran it locally with similar results.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks,
Andre

> ---
>  arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37 ++++++++++++++++++++
>  1 file changed, 37 insertions(+)
>
> [...]
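As a side note on the CCSIDR_EL1 decoding mentioned above: in the ARMv8-A layout without FEAT_CCIDX, LineSize sits in bits [2:0] (log2 of the line size in bytes, minus 4), Associativity in bits [12:3] (ways minus 1) and NumSets in bits [27:13] (sets minus 1). A minimal decode sketch follows; the example register values are constructed from the quoted geometry rather than dumped from real hardware.

```c
#include <stdint.h>
#include <stdio.h>

/* Decode an ARMv8-A CCSIDR_EL1 value (layout without FEAT_CCIDX). */
static void decode_ccsidr(const char *name, uint32_t ccsidr)
{
    unsigned int line = 16u << (ccsidr & 0x7);          /* bytes per line */
    unsigned int ways = ((ccsidr >> 3) & 0x3ff) + 1;
    unsigned int sets = ((ccsidr >> 13) & 0x7fff) + 1;

    printf("%s: %u sets, %u-way, %u bytes/line => %u KB\n",
           name, sets, ways, line, sets * ways * line / 1024);
}

int main(void)
{
    /* Hypothetical raw values matching the H6 geometry quoted above. */
    decode_ccsidr("L1D", (127u << 13) | (3u << 3) | 2u);  /* 128 sets, 4-way, 64 B */
    decode_ccsidr("L1I", (255u << 13) | (1u << 3) | 2u);  /* 256 sets, 2-way, 64 B */
    decode_ccsidr("L2",  (511u << 13) | (15u << 3) | 2u); /* 512 sets, 16-way, 64 B */
    return 0;
}
```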
Hello Andre,

On 2024-04-30 01:10, Andre Przywara wrote:
> On Sun, 28 Apr 2024 13:40:36 +0200
> Dragan Simic <dsimic@manjaro.org> wrote:
>
>> [...]
>
> I can confirm that the data below matches the manuals, but also the
> decoding of the architectural cache type registers (CCSIDR_EL1):
> L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
> L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
> L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line

Thank you very much for reviewing my patch in such a detailed way!
It's good to know that the values in the Allwinner datasheets match
with the observed reality, so to speak. :)

> tinymembench results for the H6 are available here:
> https://github.com/ThomasKaiser/sbc-bench/blob/master/results/26Ph.txt
> and confirm the theory. Also ran it locally with similar results.

Here's a quick copy & paste of the most important benchmark results from
the link above, as a quick reference for anyone reading this thread in
the future, or as a data source in case the link above becomes
inaccessible at some point in the future:

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers  ==
== of different sizes. The larger is the buffer, the more significant  ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM     ==
== accesses. For extremely large buffer sizes we are expecting to see  ==
== page table walk with several requests to SDRAM for almost every     ==
== memory access (though 64MiB is not nearly large enough to experience==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to ==
==         be added to L1 cache latency. The cycle timings for L1 cache==
==         latency can be usually found in the processor documentation.==
== Note 2: Dual random read means that we are simultaneously performing==
==         two independent memory accesses at a time. In the case if   ==
==         the memory subsystem can't handle multiple outstanding      ==
==         requests, dual random read has the same timings as two      ==
==         single reads performed one after another.                   ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    3.8 ns          /     6.5 ns
    131072 :    5.8 ns          /     9.1 ns
    262144 :    6.9 ns          /    10.2 ns
    524288 :    7.8 ns          /    11.2 ns
   1048576 :   74.3 ns          /   114.5 ns
   2097152 :  110.5 ns          /   148.1 ns
   4194304 :  132.6 ns          /   164.5 ns
   8388608 :  144.0 ns          /   172.3 ns
  16777216 :  151.5 ns          /   177.3 ns
  33554432 :  156.3 ns          /   180.7 ns
  67108864 :  158.7 ns          /   182.9 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
      1024 :    0.0 ns          /     0.0 ns
      2048 :    0.0 ns          /     0.0 ns
      4096 :    0.0 ns          /     0.0 ns
      8192 :    0.0 ns          /     0.0 ns
     16384 :    0.0 ns          /     0.0 ns
     32768 :    0.0 ns          /     0.0 ns
     65536 :    3.8 ns          /     6.5 ns
    131072 :    5.8 ns          /     9.1 ns
    262144 :    6.9 ns          /    10.2 ns
    524288 :    7.8 ns          /    11.2 ns
   1048576 :   74.3 ns          /   114.5 ns
   2097152 :  110.0 ns          /   147.5 ns
   4194304 :  127.6 ns          /   158.3 ns
   8388608 :  136.4 ns          /   162.2 ns
  16777216 :  141.2 ns          /   165.6 ns
  33554432 :  143.7 ns          /   168.4 ns
  67108864 :  144.9 ns          /   168.9 ns

> Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks!

>> ---
>>  arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37 ++++++++++++++++++++
>>  1 file changed, 37 insertions(+)
>>
>> [...]
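As background on how numbers like these expose the cache hierarchy: tinymembench chases a randomly shuffled pointer chain, so every load depends on the previous one, and the average latency steps up once the working set no longer fits a cache level (here: past the 32 KB L1 around the 64 KB block size, and past the 512 KB L2 around 1 MB). Below is a minimal, self-contained pointer-chasing sketch in the same spirit; it is not tinymembench itself, and unlike the table above it reports absolute rather than "extra" latency.

```c
#include <stdio.h>
#include <stdlib.h>
#include <time.h>

/* Average time per dependent load over a single-cycle random chain. */
static double chase_ns(size_t bytes, size_t iters)
{
    size_t n = bytes / sizeof(size_t);
    size_t *chain = malloc(n * sizeof(size_t));
    if (!chain)
        return 0.0;

    for (size_t i = 0; i < n; i++)
        chain[i] = i;
    for (size_t i = n - 1; i > 0; i--) {      /* Sattolo's shuffle: one big cycle */
        size_t j = (size_t)rand() % i;
        size_t t = chain[i]; chain[i] = chain[j]; chain[j] = t;
    }

    struct timespec t0, t1;
    volatile size_t idx = 0;                  /* volatile keeps the loop alive */
    clock_gettime(CLOCK_MONOTONIC, &t0);
    for (size_t i = 0; i < iters; i++)
        idx = chain[idx];                     /* serially dependent loads */
    clock_gettime(CLOCK_MONOTONIC, &t1);
    free(chain);

    double ns = (t1.tv_sec - t0.tv_sec) * 1e9 + (t1.tv_nsec - t0.tv_nsec);
    return ns / (double)iters;
}

int main(void)
{
    for (size_t kb = 16; kb <= 16 * 1024; kb *= 2)
        printf("%8zu KiB : %6.1f ns/load\n",
               kb, chase_ns(kb * 1024, 10 * 1000 * 1000));
    return 0;
}
```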
On Tue, 30 Apr 2024 02:01:42 +0200
Dragan Simic <dsimic@manjaro.org> wrote:

Hi Dragan,

> Hello Andre,
>
> On 2024-04-30 01:10, Andre Przywara wrote:
> > I can confirm that the data below matches the manuals, but also the
> > decoding of the architectural cache type registers (CCSIDR_EL1):
> > L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
> > L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
> > L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line
>
> Thank you very much for reviewing my patch in such a detailed way!
> It's good to know that the values in the Allwinner datasheets match
> with the observed reality, so to speak. :)

YW, and yes, I like to double check things when it comes to Allwinner
documentation ;-) And it was comparably easy for this problem.

Out of curiosity: what triggered that patch? Trying to get rid of false
warning/error messages?

And do you plan to address the H616 as well? It's a bit more tricky there,
since there are two die revisions out: one with 256(?)KB of L2, one with
1MB(!). We know how to tell them apart, so I could provide some TF-A code
to patch that up in the DT. The kernel DT copy could go with 256KB then.

Cheers,
Andre.
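To make the TF-A idea concrete, a DT fixup along the lines Andre describes could look roughly like the sketch below. This is only an illustration, not his actual patch: the /cpus/l2-cache path assumes an l2-cache node placed as in the H6 patch, soc_is_h616_with_1mb_l2() is a placeholder for whatever die-revision check ends up being used, and only fdt_path_offset() and fdt_setprop_u32() are real libfdt calls.

```c
#include <stdbool.h>
#include <libfdt.h>

/* Hypothetical die-revision check, standing in for the real detection. */
extern bool soc_is_h616_with_1mb_l2(void);

/* Bump the L2 description in the DT blob from the 256 KB baseline to
 * 1 MB on die revisions that carry the larger cache. */
static int fixup_l2_cache(void *fdt)
{
	int node;

	if (!soc_is_h616_with_1mb_l2())
		return 0;		/* baseline DT is already correct */

	node = fdt_path_offset(fdt, "/cpus/l2-cache");
	if (node < 0)
		return node;

	/* 1 MB, 16-way, 64-byte lines => 1024 sets */
	fdt_setprop_u32(fdt, node, "cache-size", 0x100000);
	fdt_setprop_u32(fdt, node, "cache-sets", 1024);

	return 0;
}
```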
Hello Andre,

On 2024-04-30 12:46, Andre Przywara wrote:
> On Tue, 30 Apr 2024 02:01:42 +0200
> Dragan Simic <dsimic@manjaro.org> wrote:
>> Thank you very much for reviewing my patch in such a detailed way!
>> It's good to know that the values in the Allwinner datasheets match
>> with the observed reality, so to speak. :)
>
> YW, and yes, I like to double check things when it comes to Allwinner
> documentation ;-) And it was comparably easy for this problem.

Double checking is always good, IMHO. :)

> Out of curiosity: what triggered that patch? Trying to get rid of false
> warning/error messages?

Yes, one of the motivators was to get rid of the false kernel warning,
and the other was to have the cache information nicely available through
lscpu(1). I already did the same for a few Rockchip SoCs, [1][2][3] so
a couple of Allwinner SoCs were the next on my mental TODO list. :)

> And do you plan to address the H616 as well? It's a bit more tricky there,
> since there are two die revisions out: one with 256(?)KB of L2, one with
> 1MB(!). We know how to tell them apart, so I could provide some TF-A code
> to patch that up in the DT. The kernel DT copy could go with 256KB then.

I have no boards based on the Allwinner H616, so it wasn't on my radar.
Though, I'd be happy to prepare and submit a similar kernel patch for
the H616, if you'd then take it further and submit a TF-A patch that
fixes the DT according to the detected die revision? Did I understand
the plan right?

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
[3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4
On Tue, 30 Apr 2024 13:10:41 +0200
Dragan Simic <dsimic@manjaro.org> wrote:

> Hello Andre,
>
> On 2024-04-30 12:46, Andre Przywara wrote:
> > Out of curiosity: what triggered that patch? Trying to get rid of false
> > warning/error messages?
>
> Yes, one of the motivators was to get rid of the false kernel warning,
> and the other was to have the cache information nicely available through
> lscpu(1). I already did the same for a few Rockchip SoCs, [1][2][3] so
> a couple of Allwinner SoCs were the next on my mental TODO list. :)

Thanks for doing this!

> > And do you plan to address the H616 as well? It's a bit more tricky there,
> > since there are two die revisions out: one with 256(?)KB of L2, one with
> > 1MB(!). We know how to tell them apart, so I could provide some TF-A code
> > to patch that up in the DT. The kernel DT copy could go with 256KB then.
>
> I have no boards based on the Allwinner H616, so it wasn't on my radar.
> Though, I'd be happy to prepare and submit a similar kernel patch for
> the H616, if you'd then take it further and submit a TF-A patch that
> fixes the DT according to the detected die revision? Did I understand
> the plan right?

Yes, that was the idea. I have a working version of that TF-A patch now,
just need to figure out some details about the best way to only build this
for the H616 port.

Neither the data sheet nor the user manual mention the cache sizes for the
H616, but I checked the CCSIDR_EL1 register readouts on both an old H616
and a new H618, and they confirm that the former has 256 KB of L2, and the
latter 1 MB. Also I ran tinymembench on two boards to confirm this;
community benchmark results are available here:
https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
The OrangePi Zero2 and OrangePi Zero3 are good examples, respectively.
Associativity and cache line size are dictated by the Arm Cortex cores,
and the L1I & L1D sizes are the same as in the other SoCs.

Cheers,
Andre
Hello Andre,

On 2024-05-01 11:30, Andre Przywara wrote:
> On Tue, 30 Apr 2024 13:10:41 +0200
> Dragan Simic <dsimic@manjaro.org> wrote:
>> Yes, one of the motivators was to get rid of the false kernel warning,
>> and the other was to have the cache information nicely available through
>> lscpu(1). I already did the same for a few Rockchip SoCs, [1][2][3] so
>> a couple of Allwinner SoCs were the next on my mental TODO list. :)
>
> Thanks for doing this!

I'm glad that you like all these patches. :)

>> I have no boards based on the Allwinner H616, so it wasn't on my radar.
>> Though, I'd be happy to prepare and submit a similar kernel patch for
>> the H616, if you'd then take it further and submit a TF-A patch that
>> fixes the DT according to the detected die revision? Did I understand
>> the plan right?
>
> Yes, that was the idea. I have a working version of that TF-A patch now,
> just need to figure out some details about the best way to only build this
> for the H616 port.

Nice, the kernel patch for the H616 SoC dtsi is now on the list, [4]
please have a look. Please let me know when your follow-up TF-A patch
gets submitted upstream, so I can watch it.

> Neither the data sheet nor the user manual mention the cache sizes for the
> H616, but I checked the CCSIDR_EL1 register readouts on both an old H616
> and a new H618, and they confirm that the former has 256 KB of L2, and the
> latter 1 MB.

Oh wow, 1 MB of L2 cache is quite a lot for such an SoC, which is actually
very nice to see. Thumbs up for Allwinner not skimping on the L2 cache in
that H616 die revision. :)

> Also I ran tinymembench on two boards to confirm this;
> community benchmark results are available here:
> https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
> The OrangePi Zero2 and OrangePi Zero3 are good examples, respectively.
> Associativity and cache line size are dictated by the Arm Cortex cores,
> and the L1I & L1D sizes are the same as in the other SoCs.

I've included the most important benchmark results in the H616 SoC dtsi
patch, [4] which actually now serves as an additional reference for the
cache sizes.

[1] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
[2] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
[3] https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4
[4] https://lore.kernel.org/linux-sunxi/9d52e6d338a059618d894abb0764015043330c2b.1714727227.git.dsimic@manjaro.org/
On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
>
> Adding the cache information to the H6 SoC dtsi also makes the following
> warning message in the kernel log go away:
>
> [...]

Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!

[1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
      https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2

Best regards,
On Tue, May 28, 2024 at 11:46 PM Chen-Yu Tsai <wens@csie.org> wrote:
>
> On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
> > Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> > the userspace, which includes lscpu(1) that uses the virtual files provided
> > by the kernel under the /sys/devices/system/cpu directory, to display the
> > proper H6 cache information.
> >
> > Adding the cache information to the H6 SoC dtsi also makes the following
> > warning message in the kernel log go away:
> >
> > [...]
>
> Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!
>
> [1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
>       https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2

OK, that's weird. Somehow b4 thought this patch was v2 of the A64 patch [1].
Looks like they are threaded together because this patch has "In-Reply-To".

Please avoid it in the future.

Thanks
ChenYu

[1] https://lore.kernel.org/linux-sunxi/6a772756c2c677dbdaaab4a2c71a358d8e4b27e9.1714304058.git.dsimic@manjaro.org/
Hello Chen-Yu,

On 2024-05-28 17:56, Chen-Yu Tsai wrote:
> On Tue, May 28, 2024 at 11:46 PM Chen-Yu Tsai <wens@csie.org> wrote:
>> Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!
>>
>> [1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
>>       https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2
>
> OK, that's weird. Somehow b4 thought this patch was v2 of the A64 patch [1].
> Looks like they are threaded together because this patch has "In-Reply-To".
>
> Please avoid it in the future.

I'm sorry for that. I noticed that back when I sent the patches to the
mailing list, but didn't want to make some noise about that. The root
cause was some missing configuration for "git send-email", which resulted
in adding troublesome threading-related headers to the messages for the
individual .patch files that in fact were correctly created by running
"git format-patch".

Do I need to resend the patches?
On Wed, May 29, 2024 at 12:02 AM Dragan Simic <dsimic@manjaro.org> wrote:
>
> I'm sorry for that. I noticed that back when I sent the patches to the
> mailing list, but didn't want to make some noise about that. The root
> cause was some missing configuration for "git send-email", which resulted
> in adding troublesome threading-related headers to the messages for the
> individual .patch files that in fact were correctly created by running
> "git format-patch".
>
> Do I need to resend the patches?

No. I figured it out.
On Tue, May 28, 2024 at 11:46 PM Chen-Yu Tsai <wens@csie.org> wrote:
>
> On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
> > Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> > the userspace, which includes lscpu(1) that uses the virtual files provided
> > by the kernel under the /sys/devices/system/cpu directory, to display the
> > proper H6 cache information.
> >
> > Adding the cache information to the H6 SoC dtsi also makes the following
> > warning message in the kernel log go away:
> >
> > [...]
>
> Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!
>
> [1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
>       https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2

I had to do a quick rebase as the branch start point was incorrect.
The commit hashes will have changed. Rest assured that the patch is
indeed merged.

ChenYu
diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
index d11e5041bae9..1a63066396e8 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
@@ -29,36 +29,73 @@ cpu0: cpu@0 {
                         clocks = <&ccu CLK_CPUX>;
                         clock-latency-ns = <244144>; /* 8 32k periods */
                         #cooling-cells = <2>;
+                        i-cache-size = <0x8000>;
+                        i-cache-line-size = <64>;
+                        i-cache-sets = <256>;
+                        d-cache-size = <0x8000>;
+                        d-cache-line-size = <64>;
+                        d-cache-sets = <128>;
+                        next-level-cache = <&l2_cache>;
                 };
 
                 cpu1: cpu@1 {
                         compatible = "arm,cortex-a53";
                         device_type = "cpu";
                         reg = <1>;
                         enable-method = "psci";
                         clocks = <&ccu CLK_CPUX>;
                         clock-latency-ns = <244144>; /* 8 32k periods */
                         #cooling-cells = <2>;
+                        i-cache-size = <0x8000>;
+                        i-cache-line-size = <64>;
+                        i-cache-sets = <256>;
+                        d-cache-size = <0x8000>;
+                        d-cache-line-size = <64>;
+                        d-cache-sets = <128>;
+                        next-level-cache = <&l2_cache>;
                 };
 
                 cpu2: cpu@2 {
                         compatible = "arm,cortex-a53";
                         device_type = "cpu";
                         reg = <2>;
                         enable-method = "psci";
                         clocks = <&ccu CLK_CPUX>;
                         clock-latency-ns = <244144>; /* 8 32k periods */
                         #cooling-cells = <2>;
+                        i-cache-size = <0x8000>;
+                        i-cache-line-size = <64>;
+                        i-cache-sets = <256>;
+                        d-cache-size = <0x8000>;
+                        d-cache-line-size = <64>;
+                        d-cache-sets = <128>;
+                        next-level-cache = <&l2_cache>;
                 };
 
                 cpu3: cpu@3 {
                         compatible = "arm,cortex-a53";
                         device_type = "cpu";
                         reg = <3>;
                         enable-method = "psci";
                         clocks = <&ccu CLK_CPUX>;
                         clock-latency-ns = <244144>; /* 8 32k periods */
                         #cooling-cells = <2>;
+                        i-cache-size = <0x8000>;
+                        i-cache-line-size = <64>;
+                        i-cache-sets = <256>;
+                        d-cache-size = <0x8000>;
+                        d-cache-line-size = <64>;
+                        d-cache-sets = <128>;
+                        next-level-cache = <&l2_cache>;
+                };
+
+                l2_cache: l2-cache {
+                        compatible = "cache";
+                        cache-level = <2>;
+                        cache-unified;
+                        cache-size = <0x80000>;
+                        cache-line-size = <64>;
+                        cache-sets = <512>;
                 };
         };
Add missing cache information to the Allwinner H6 SoC dtsi, to allow
the userspace, which includes lscpu(1) that uses the virtual files provided
by the kernel under the /sys/devices/system/cpu directory, to display the
proper H6 cache information.

Adding the cache information to the H6 SoC dtsi also makes the following
warning message in the kernel log go away:

  cacheinfo: Unable to detect cache hierarchy for CPU 0

The cache parameters for the H6 dtsi were obtained and partially derived
by hand from the cache size and layout specifications found in the following
datasheets and technical reference manuals:

- Allwinner H6 V200 datasheet, version 1.1
- ARM Cortex-A53 revision r0p3 TRM, version E

For future reference, here's a brief summary of the documentation:

- All caches employ the 64-byte cache line length
- Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
  cache and 32 KB of L1 4-way, set-associative data cache
- The entire SoC has 512 KB of unified L2 16-way, set-associative cache

Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---
 arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37 ++++++++++++++++++++
 1 file changed, 37 insertions(+)
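As a final cross-check of the values above (illustrative arithmetic only, not part of the patch): the associativity summarized in the commit message follows from the three properties each DT node carries, since size = sets x ways x line size.

```c
#include <stdio.h>

/* Derive the associativity implied by the cache properties added to the
 * H6 dtsi and compare it against the documented 2-way L1I, 4-way L1D
 * and 16-way L2. */
struct cache { const char *name; unsigned int size, sets, line; };

int main(void)
{
    const struct cache caches[] = {
        { "L1I (per core)", 0x8000,  256, 64 },
        { "L1D (per core)", 0x8000,  128, 64 },
        { "L2  (shared)",   0x80000, 512, 64 },
    };

    for (unsigned int i = 0; i < 3; i++) {
        const struct cache *c = &caches[i];
        printf("%s: %u KB, %u sets, %u B/line => %u-way\n",
               c->name, c->size / 1024, c->sets, c->line,
               c->size / (c->sets * c->line));
    }
    return 0;
}
```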