arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

Message ID 49abb93000078c692c48c0a65ff677893909361a.1714304071.git.dsimic@manjaro.org (mailing list archive)
State New, archived
Series arm64: dts: allwinner: Add cache information to the SoC dtsi for H6

Commit Message

Dragan Simic April 28, 2024, 11:40 a.m. UTC
Add missing cache information to the Allwinner H6 SoC dtsi, to allow
the userspace, which includes lscpu(1) that uses the virtual files provided
by the kernel under the /sys/devices/system/cpu directory, to display the
proper H6 cache information.

Adding the cache information to the H6 SoC dtsi also makes the following
warning message in the kernel log go away:

  cacheinfo: Unable to detect cache hierarchy for CPU 0

The cache parameters for the H6 dtsi were obtained and partially derived
by hand from the cache size and layout specifications found in the following
datasheets and technical reference manuals:

  - Allwinner H6 V200 datasheet, version 1.1
  - ARM Cortex-A53 revision r0p3 TRM, version E

For future reference, here's a brief summary of the documentation:

  - All caches employ the 64-byte cache line length
  - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
    cache and 32 KB of L1 4-way, set-associative data cache
  - The entire SoC has 512 KB of unified L2 16-way, set-associative cache

Signed-off-by: Dragan Simic <dsimic@manjaro.org>
---
 arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37 ++++++++++++++++++++
 1 file changed, 37 insertions(+)

Comments

Jernej Škrabec April 28, 2024, 4:21 p.m. UTC | #1
On Sunday, 28 April 2024 at 13:40:36 GMT +2, Dragan Simic wrote:
> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
> 
> Adding the cache information to the H6 SoC dtsi also makes the following
> warning message in the kernel log go away:
> 
>   cacheinfo: Unable to detect cache hierarchy for CPU 0
> 
> The cache parameters for the H6 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
> 
>   - Allwinner H6 V200 datasheet, version 1.1
>   - ARM Cortex-A53 revision r0p3 TRM, version E
> 
> For future reference, here's a brief summary of the documentation:
> 
>   - All caches employ the 64-byte cache line length
>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
>     cache and 32 KB of L1 4-way, set-associative data cache
>   - The entire SoC has 512 KB of unified L2 16-way, set-associative cache
> 
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>

Reviewed-by: Jernej Skrabec <jernej.skrabec@gmail.com>

Best regards,
Jernej
Andre Przywara April 29, 2024, 11:10 p.m. UTC | #2
On Sun, 28 Apr 2024 13:40:36 +0200
Dragan Simic <dsimic@manjaro.org> wrote:

> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
> 
> Adding the cache information to the H6 SoC dtsi also makes the following
> warning message in the kernel log go away:
> 
>   cacheinfo: Unable to detect cache hierarchy for CPU 0
> 
> The cache parameters for the H6 dtsi were obtained and partially derived
> by hand from the cache size and layout specifications found in the following
> datasheets and technical reference manuals:
> 
>   - Allwinner H6 V200 datasheet, version 1.1
>   - ARM Cortex-A53 revision r0p3 TRM, version E
> 
> For future reference, here's a brief summary of the documentation:
> 
>   - All caches employ the 64-byte cache line length
>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative instruction
>     cache and 32 KB of L1 4-way, set-associative data cache
>   - The entire SoC has 512 KB of unified L2 16-way, set-associative cache
> 
> Signed-off-by: Dragan Simic <dsimic@manjaro.org>

I can confirm that the data below matches not only the manuals, but also
the decoding of the architectural cache type registers (CCSIDR_EL1):
  L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
  L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
  L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line
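
For reference, a minimal sketch (not part of the patch) of how such a
decoding can be done, assuming the 32-bit CCSIDR_EL1 layout without
FEAT_CCIDX: LineSize in bits [2:0], Associativity in bits [12:3] and
NumSets in bits [27:13]. The helper names are hypothetical, and the
example values are built from the fields listed above rather than taken
from an actual register dump:

```python
def decode_ccsidr(value):
    """Decode a CCSIDR_EL1 value (32-bit layout, no FEAT_CCIDX assumed)."""
    line_size = 1 << ((value & 0x7) + 4)   # LineSize holds log2(bytes) - 4
    ways = ((value >> 3) & 0x3FF) + 1      # Associativity holds ways - 1
    sets = ((value >> 13) & 0x7FFF) + 1    # NumSets holds sets - 1
    return {
        "line_size": line_size,
        "ways": ways,
        "sets": sets,
        "size": line_size * ways * sets,   # total cache size in bytes
    }


def encode_geometry(sets, ways, line_size):
    """Hypothetical inverse helper, used only to build example values."""
    log2_line = line_size.bit_length() - 1
    return ((sets - 1) << 13) | ((ways - 1) << 3) | (log2_line - 4)


if __name__ == "__main__":
    for name, geometry in (("L1D", (128, 4, 64)),
                           ("L1I", (256, 2, 64)),
                           ("L2", (512, 16, 64))):
        print(name, decode_ccsidr(encode_geometry(*geometry)))
```

The sizes that come out (sets x ways x line size) are what end up in the
*-cache-size properties of the dtsi.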

tinymembench results for the H6 are available here:
https://github.com/ThomasKaiser/sbc-bench/blob/master/results/26Ph.txt
and confirm the theory. Also ran it locally with similar results.

Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks,
Andre

> ---
>  arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37 ++++++++++++++++++++
>  1 file changed, 37 insertions(+)
> 
> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> index d11e5041bae9..1a63066396e8 100644
> --- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
> @@ -29,36 +29,73 @@ cpu0: cpu@0 {
>  			clocks = <&ccu CLK_CPUX>;
>  			clock-latency-ns = <244144>; /* 8 32k periods */
>  			#cooling-cells = <2>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu1: cpu@1 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <1>;
>  			enable-method = "psci";
>  			clocks = <&ccu CLK_CPUX>;
>  			clock-latency-ns = <244144>; /* 8 32k periods */
>  			#cooling-cells = <2>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu2: cpu@2 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <2>;
>  			enable-method = "psci";
>  			clocks = <&ccu CLK_CPUX>;
>  			clock-latency-ns = <244144>; /* 8 32k periods */
>  			#cooling-cells = <2>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
>  		};
>  
>  		cpu3: cpu@3 {
>  			compatible = "arm,cortex-a53";
>  			device_type = "cpu";
>  			reg = <3>;
>  			enable-method = "psci";
>  			clocks = <&ccu CLK_CPUX>;
>  			clock-latency-ns = <244144>; /* 8 32k periods */
>  			#cooling-cells = <2>;
> +			i-cache-size = <0x8000>;
> +			i-cache-line-size = <64>;
> +			i-cache-sets = <256>;
> +			d-cache-size = <0x8000>;
> +			d-cache-line-size = <64>;
> +			d-cache-sets = <128>;
> +			next-level-cache = <&l2_cache>;
> +		};
> +
> +		l2_cache: l2-cache {
> +			compatible = "cache";
> +			cache-level = <2>;
> +			cache-unified;
> +			cache-size = <0x80000>;
> +			cache-line-size = <64>;
> +			cache-sets = <512>;
>  		};
>  	};
>  
>
Dragan Simic April 30, 2024, 12:01 a.m. UTC | #3
Hello Andre,

On 2024-04-30 01:10, Andre Przywara wrote:
> On Sun, 28 Apr 2024 13:40:36 +0200
> Dragan Simic <dsimic@manjaro.org> wrote:
> 
>> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
>> the userspace, which includes lscpu(1) that uses the virtual files 
>> provided
>> by the kernel under the /sys/devices/system/cpu directory, to display 
>> the
>> proper H6 cache information.
>> 
>> Adding the cache information to the H6 SoC dtsi also makes the 
>> following
>> warning message in the kernel log go away:
>> 
>>   cacheinfo: Unable to detect cache hierarchy for CPU 0
>> 
>> The cache parameters for the H6 dtsi were obtained and partially 
>> derived
>> by hand from the cache size and layout specifications found in the 
>> following
>> datasheets and technical reference manuals:
>> 
>>   - Allwinner H6 V200 datasheet, version 1.1
>>   - ARM Cortex-A53 revision r0p3 TRM, version E
>> 
>> For future reference, here's a brief summary of the documentation:
>> 
>>   - All caches employ the 64-byte cache line length
>>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative 
>> instruction
>>     cache and 32 KB of L1 4-way, set-associative data cache
>>   - The entire SoC has 512 KB of unified L2 16-way, set-associative 
>> cache
>> 
>> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
> 
> I can confirm that the data below matches the manuals, but also the
> decoding of the architectural cache type registers (CCSIDR_EL1):
>   L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
>   L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
>   L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line

Thank you very much for reviewing my patch in such a detailed way!
It's good to know that the values in the Allwinner datasheets match
with the observed reality, so to speak. :)

> tinymembench results for the H6 are available here:
> https://github.com/ThomasKaiser/sbc-bench/blob/master/results/26Ph.txt
> and confirm the theory. Also ran it locally with similar results.

Here's a quick copy & paste of the most important benchmark results
from the link above, as a quick reference for anyone reading this
thread in the future, or as a data source in case the link above
becomes inaccessible at some point in the future:

==========================================================================
== Memory latency test                                                  ==
==                                                                      ==
== Average time is measured for random memory accesses in the buffers   ==
== of different sizes. The larger is the buffer, the more significant   ==
== are relative contributions of TLB, L1/L2 cache misses and SDRAM      ==
== accesses. For extremely large buffer sizes we are expecting to see   ==
== page table walk with several requests to SDRAM for almost every      ==
== memory access (though 64MiB is not nearly large enough to experience ==
== this effect to its fullest).                                         ==
==                                                                      ==
== Note 1: All the numbers are representing extra time, which needs to  ==
==         be added to L1 cache latency. The cycle timings for L1 cache ==
==         latency can be usually found in the processor documentation. ==
== Note 2: Dual random read means that we are simultaneously performing ==
==         two independent memory accesses at a time. In the case if    ==
==         the memory subsystem can't handle multiple outstanding       ==
==         requests, dual random read has the same timings as two       ==
==         single reads performed one after another.                    ==
==========================================================================

block size : single random read / dual random read, [MADV_NOHUGEPAGE]
       1024 :    0.0 ns          /     0.0 ns
       2048 :    0.0 ns          /     0.0 ns
       4096 :    0.0 ns          /     0.0 ns
       8192 :    0.0 ns          /     0.0 ns
      16384 :    0.0 ns          /     0.0 ns
      32768 :    0.0 ns          /     0.0 ns
      65536 :    3.8 ns          /     6.5 ns
     131072 :    5.8 ns          /     9.1 ns
     262144 :    6.9 ns          /    10.2 ns
     524288 :    7.8 ns          /    11.2 ns
    1048576 :   74.3 ns          /   114.5 ns
    2097152 :  110.5 ns          /   148.1 ns
    4194304 :  132.6 ns          /   164.5 ns
    8388608 :  144.0 ns          /   172.3 ns
   16777216 :  151.5 ns          /   177.3 ns
   33554432 :  156.3 ns          /   180.7 ns
   67108864 :  158.7 ns          /   182.9 ns

block size : single random read / dual random read, [MADV_HUGEPAGE]
       1024 :    0.0 ns          /     0.0 ns
       2048 :    0.0 ns          /     0.0 ns
       4096 :    0.0 ns          /     0.0 ns
       8192 :    0.0 ns          /     0.0 ns
      16384 :    0.0 ns          /     0.0 ns
      32768 :    0.0 ns          /     0.0 ns
      65536 :    3.8 ns          /     6.5 ns
     131072 :    5.8 ns          /     9.1 ns
     262144 :    6.9 ns          /    10.2 ns
     524288 :    7.8 ns          /    11.2 ns
    1048576 :   74.3 ns          /   114.5 ns
    2097152 :  110.0 ns          /   147.5 ns
    4194304 :  127.6 ns          /   158.3 ns
    8388608 :  136.4 ns          /   162.2 ns
   16777216 :  141.2 ns          /   165.6 ns
   33554432 :  143.7 ns          /   168.4 ns
   67108864 :  144.9 ns          /   168.9 ns
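
As a rough illustration of how these numbers line up with the cache
sizes, here is a small sketch that reads the boundaries off the
MADV_NOHUGEPAGE single random read column above. The heuristic (largest
buffer with no extra latency for L1D, largest latency step for L2) is an
assumption made for this example and not something tinymembench does
itself:

```python
# (block size in bytes, extra latency in ns), copied from the
# MADV_NOHUGEPAGE single random read column above.
SAMPLES = [
    (1024, 0.0), (2048, 0.0), (4096, 0.0), (8192, 0.0), (16384, 0.0),
    (32768, 0.0), (65536, 3.8), (131072, 5.8), (262144, 6.9),
    (524288, 7.8), (1048576, 74.3), (2097152, 110.5), (4194304, 132.6),
    (8388608, 144.0), (16777216, 151.5), (33554432, 156.3),
    (67108864, 158.7),
]


def estimate_cache_sizes(samples):
    """Return (l1d_bytes, l2_bytes) guessed from a latency table."""
    # L1D: the largest buffer that still shows no extra latency.
    l1d = max(size for size, ns in samples if ns == 0.0)
    # L2: the buffer size right before the largest latency increase.
    steps = [(next_ns - ns, size)
             for (size, ns), (_, next_ns) in zip(samples, samples[1:])]
    _, l2 = max(steps)
    return l1d, l2


if __name__ == "__main__":
    l1d, l2 = estimate_cache_sizes(SAMPLES)
    print(f"L1D ~ {l1d // 1024} KB, L2 ~ {l2 // 1024} KB")
```

With the table above, the extra latency stays at zero up to 32768 bytes
and the biggest step sits between 524288 and 1048576 bytes, which agrees
with 32 KB of L1D and 512 KB of L2.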

> Reviewed-by: Andre Przywara <andre.przywara@arm.com>

Thanks!

>> ---
>>  arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi | 37 
>> ++++++++++++++++++++
>>  1 file changed, 37 insertions(+)
>> 
>> diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi 
>> b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
>> index d11e5041bae9..1a63066396e8 100644
>> --- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
>> +++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
>> @@ -29,36 +29,73 @@ cpu0: cpu@0 {
>>  			clocks = <&ccu CLK_CPUX>;
>>  			clock-latency-ns = <244144>; /* 8 32k periods */
>>  			#cooling-cells = <2>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu1: cpu@1 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <1>;
>>  			enable-method = "psci";
>>  			clocks = <&ccu CLK_CPUX>;
>>  			clock-latency-ns = <244144>; /* 8 32k periods */
>>  			#cooling-cells = <2>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu2: cpu@2 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <2>;
>>  			enable-method = "psci";
>>  			clocks = <&ccu CLK_CPUX>;
>>  			clock-latency-ns = <244144>; /* 8 32k periods */
>>  			#cooling-cells = <2>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>>  		};
>> 
>>  		cpu3: cpu@3 {
>>  			compatible = "arm,cortex-a53";
>>  			device_type = "cpu";
>>  			reg = <3>;
>>  			enable-method = "psci";
>>  			clocks = <&ccu CLK_CPUX>;
>>  			clock-latency-ns = <244144>; /* 8 32k periods */
>>  			#cooling-cells = <2>;
>> +			i-cache-size = <0x8000>;
>> +			i-cache-line-size = <64>;
>> +			i-cache-sets = <256>;
>> +			d-cache-size = <0x8000>;
>> +			d-cache-line-size = <64>;
>> +			d-cache-sets = <128>;
>> +			next-level-cache = <&l2_cache>;
>> +		};
>> +
>> +		l2_cache: l2-cache {
>> +			compatible = "cache";
>> +			cache-level = <2>;
>> +			cache-unified;
>> +			cache-size = <0x80000>;
>> +			cache-line-size = <64>;
>> +			cache-sets = <512>;
>>  		};
>>  	};
Andre Przywara April 30, 2024, 10:46 a.m. UTC | #4
On Tue, 30 Apr 2024 02:01:42 +0200
Dragan Simic <dsimic@manjaro.org> wrote:

Hi Dragan,

> Hello Andre,
> 
> On 2024-04-30 01:10, Andre Przywara wrote:
> > On Sun, 28 Apr 2024 13:40:36 +0200
> > Dragan Simic <dsimic@manjaro.org> wrote:
> >   
> >> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> >> the userspace, which includes lscpu(1) that uses the virtual files 
> >> provided
> >> by the kernel under the /sys/devices/system/cpu directory, to display 
> >> the
> >> proper H6 cache information.
> >> 
> >> Adding the cache information to the H6 SoC dtsi also makes the 
> >> following
> >> warning message in the kernel log go away:
> >> 
> >>   cacheinfo: Unable to detect cache hierarchy for CPU 0
> >> 
> >> The cache parameters for the H6 dtsi were obtained and partially 
> >> derived
> >> by hand from the cache size and layout specifications found in the 
> >> following
> >> datasheets and technical reference manuals:
> >> 
> >>   - Allwinner H6 V200 datasheet, version 1.1
> >>   - ARM Cortex-A53 revision r0p3 TRM, version E
> >> 
> >> For future reference, here's a brief summary of the documentation:
> >> 
> >>   - All caches employ the 64-byte cache line length
> >>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative 
> >> instruction
> >>     cache and 32 KB of L1 4-way, set-associative data cache
> >>   - The entire SoC has 512 KB of unified L2 16-way, set-associative 
> >> cache
> >> 
> >> Signed-off-by: Dragan Simic <dsimic@manjaro.org>  
> > 
> > I can confirm that the data below matches the manuals, but also the
> > decoding of the architectural cache type registers (CCSIDR_EL1):
> >   L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
> >   L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
> >   L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line  
> 
> Thank you very much for reviewing my patch in such a detailed way!
> It's good to know that the values in the Allwinner datasheets match
> with the observed reality, so to speak. :)

YW, and yes, I like to double check things when it comes to Allwinner
documentation ;-) And it was comparatively easy for this problem.

Out of curiosity: what triggered that patch? Trying to get rid of false
warning/error messages?
And do you plan to address the H616 as well? It's a bit more tricky there,
since there are two die revisions out: one with 256(?)KB of L2, one with
1MB(!). We know how to tell them apart, so I could provide some TF-A code
to patch that up in the DT. The kernel DT copy could go with 256KB then.

Cheers,
Andre.
Dragan Simic April 30, 2024, 11:10 a.m. UTC | #5
Hello Andre,

On 2024-04-30 12:46, Andre Przywara wrote:
> On Tue, 30 Apr 2024 02:01:42 +0200
> Dragan Simic <dsimic@manjaro.org> wrote:
>> On 2024-04-30 01:10, Andre Przywara wrote:
>> > On Sun, 28 Apr 2024 13:40:36 +0200
>> > Dragan Simic <dsimic@manjaro.org> wrote:
>> >
>> >> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
>> >> the userspace, which includes lscpu(1) that uses the virtual files
>> >> provided
>> >> by the kernel under the /sys/devices/system/cpu directory, to display
>> >> the
>> >> proper H6 cache information.
>> >>
>> >> Adding the cache information to the H6 SoC dtsi also makes the
>> >> following
>> >> warning message in the kernel log go away:
>> >>
>> >>   cacheinfo: Unable to detect cache hierarchy for CPU 0
>> >>
>> >> The cache parameters for the H6 dtsi were obtained and partially
>> >> derived
>> >> by hand from the cache size and layout specifications found in the
>> >> following
>> >> datasheets and technical reference manuals:
>> >>
>> >>   - Allwinner H6 V200 datasheet, version 1.1
>> >>   - ARM Cortex-A53 revision r0p3 TRM, version E
>> >>
>> >> For future reference, here's a brief summary of the documentation:
>> >>
>> >>   - All caches employ the 64-byte cache line length
>> >>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative
>> >> instruction
>> >>     cache and 32 KB of L1 4-way, set-associative data cache
>> >>   - The entire SoC has 512 KB of unified L2 16-way, set-associative
>> >> cache
>> >>
>> >> Signed-off-by: Dragan Simic <dsimic@manjaro.org>
>> >
>> > I can confirm that the data below matches the manuals, but also the
>> > decoding of the architectural cache type registers (CCSIDR_EL1):
>> >   L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
>> >   L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
>> >   L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line
>> 
>> Thank you very much for reviewing my patch in such a detailed way!
>> It's good to know that the values in the Allwinner datasheets match
>> with the observed reality, so to speak. :)
> 
> YW, and yes, I like to double check things when it comes to Allwinner
> documentation ;-) And it was comparably easy for this problem.

Double checking is always good, IMHO. :)

> Out of curiosity: what triggered that patch? Trying to get rid of false
> warning/error messages?

Yes, one of the motivators was to get rid of the false kernel warning,
and the other was to have the cache information nicely available through
lscpu(1).  I already did the same for a few Rockchip SoCs, [1][2][3] so
a couple of Allwinner SoCs were the next on my mental TODO list. :)
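
For anyone who wants to look at the same data without lscpu(1), below is
a minimal sketch that reads the per-CPU cache attributes straight from
sysfs. It assumes the usual cache/index*/ layout under
/sys/devices/system/cpu/cpuN; the function name is made up for this
example, and attributes that are missing are simply reported as None:

```python
from pathlib import Path

# Attribute files commonly found in each cache/index*/ directory.
ATTRIBUTES = ("level", "type", "size", "coherency_line_size",
              "number_of_sets", "ways_of_associativity")


def read_cache_info(cpu_dir):
    """Return one dict per cache/index* directory below cpu_dir."""
    caches = []
    for index_dir in sorted(Path(cpu_dir, "cache").glob("index[0-9]*")):
        entry = {}
        for name in ATTRIBUTES:
            attr = index_dir / name
            # Not every attribute has to be present on every platform.
            entry[name] = attr.read_text().strip() if attr.is_file() else None
        caches.append(entry)
    return caches
```

Calling read_cache_info("/sys/devices/system/cpu/cpu0") on a running
system returns one entry per cache that the kernel knows about for that
CPU, or an empty list if the directory is absent.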

> And do you plan to address the H616 as well? It's a bit more tricky 
> there,
> since there are two die revisions out: one with 256(?)KB of L2, one 
> with
> 1MB(!). We know how to tell them apart, so I could provide some TF-A 
> code
> to patch that up in the DT. The kernel DT copy could go with 256KB 
> then.

I have no boards based on the Allwinner H616, so it wasn't on my radar.
Though, I'd be happy to prepare and submit a similar kernel patch for
the H616, if you'd then take it further and submit a TF-A patch that
fixes the DT according to the detected die revision?  Did I understand
the plan right?

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4
Andre Przywara May 1, 2024, 9:30 a.m. UTC | #6
On Tue, 30 Apr 2024 13:10:41 +0200
Dragan Simic <dsimic@manjaro.org> wrote:

> Hello Andre,
> 
> On 2024-04-30 12:46, Andre Przywara wrote:
> > On Tue, 30 Apr 2024 02:01:42 +0200
> > Dragan Simic <dsimic@manjaro.org> wrote:  
> >> On 2024-04-30 01:10, Andre Przywara wrote:  
> >> > On Sun, 28 Apr 2024 13:40:36 +0200
> >> > Dragan Simic <dsimic@manjaro.org> wrote:
> >> >  
> >> >> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> >> >> the userspace, which includes lscpu(1) that uses the virtual files
> >> >> provided
> >> >> by the kernel under the /sys/devices/system/cpu directory, to display
> >> >> the
> >> >> proper H6 cache information.
> >> >>
> >> >> Adding the cache information to the H6 SoC dtsi also makes the
> >> >> following
> >> >> warning message in the kernel log go away:
> >> >>
> >> >>   cacheinfo: Unable to detect cache hierarchy for CPU 0
> >> >>
> >> >> The cache parameters for the H6 dtsi were obtained and partially
> >> >> derived
> >> >> by hand from the cache size and layout specifications found in the
> >> >> following
> >> >> datasheets and technical reference manuals:
> >> >>
> >> >>   - Allwinner H6 V200 datasheet, version 1.1
> >> >>   - ARM Cortex-A53 revision r0p3 TRM, version E
> >> >>
> >> >> For future reference, here's a brief summary of the documentation:
> >> >>
> >> >>   - All caches employ the 64-byte cache line length
> >> >>   - Each Cortex-A53 core has 32 KB of L1 2-way, set-associative
> >> >> instruction
> >> >>     cache and 32 KB of L1 4-way, set-associative data cache
> >> >>   - The entire SoC has 512 KB of unified L2 16-way, set-associative
> >> >> cache
> >> >>
> >> >> Signed-off-by: Dragan Simic <dsimic@manjaro.org>  
> >> >
> >> > I can confirm that the data below matches the manuals, but also the
> >> > decoding of the architectural cache type registers (CCSIDR_EL1):
> >> >   L1D: 32 KB: 128 sets, 4 way associative, 64 bytes/line
> >> >   L1I: 32 KB: 256 sets, 2 way associative, 64 bytes/line
> >> >   L2: 512 KB: 512 sets, 16 way associative, 64 bytes/line  
> >> 
> >> Thank you very much for reviewing my patch in such a detailed way!
> >> It's good to know that the values in the Allwinner datasheets match
> >> with the observed reality, so to speak. :)  
> > 
> > YW, and yes, I like to double check things when it comes to Allwinner
> > documentation ;-) And it was comparably easy for this problem.  
> 
> Double checking is always good, IMHO. :)
> 
> > Out of curiosity: what triggered that patch? Trying to get rid of false
> > warning/error messages?  
> 
> Yes, one of the motivators was to get rid of the false kernel warning,
> and the other was to have the cache information nicely available through
> lscpu(1).  I already did the same for a few Rockchip SoCs, [1][2][3] so
> a couple of Allwinner SoCs were the next on my mental TODO list. :)

Thanks for doing this!

> > And do you plan to address the H616 as well? It's a bit more tricky 
> > there,
> > since there are two die revisions out: one with 256(?)KB of L2, one 
> > with
> > 1MB(!). We know how to tell them apart, so I could provide some TF-A 
> > code
> > to patch that up in the DT. The kernel DT copy could go with 256KB 
> > then.  
> 
> I have no boards based on the Allwinner H616, so it wasn't on my radar.
> Though, I'd be happy to prepare and submit a similar kernel patch for
> the H616, if you'd then take it further and submit a TF-A patch that
> fixes the DT according to the detected die revision?  Did I understand
> the plan right?

Yes, that was the idea. I have a working version of that TF-A patch now,
just need to figure out some details about the best way to only build this
for the H616 port.

Neither the data sheet nor the user manual mentions the cache sizes for the
H616, but I checked the CCSIDR_EL1 register readouts on both an old H616
and a new H618, and they confirm that the former has 256 KB of L2 and the
latter 1 MB. I also ran tinymembench on two boards to confirm this;
community benchmark results are available here:
https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
The OrangePi Zero2 and OrangePi Zero3 are good examples, respectively.
Associativity and cache line size are dictated by the Arm Cortex cores,
and the L1I & L1D sizes are the same as in the other SoCs.
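
Since the associativity and line size are fixed, the number of sets
follows directly from the cache size. A small worked sketch, assuming
the same 16-way L2 with 64-byte lines as on the H6 (the function name is
made up for this example):

```python
def cache_sets(size_bytes, ways, line_size=64):
    """Number of sets = cache size / (ways * line size)."""
    sets, remainder = divmod(size_bytes, ways * line_size)
    if remainder:
        raise ValueError("size is not a multiple of ways * line size")
    return sets


if __name__ == "__main__":
    # L2 sizes mentioned in this thread, all assumed to be 16-way.
    for label, size in (("256 KB", 256 * 1024),
                        ("512 KB", 512 * 1024),
                        ("1 MB", 1024 * 1024)):
        print(label, "->", cache_sets(size, ways=16), "sets")
```

Under that assumption, a 256 KB L2 works out to 256 sets and a 1 MB L2
to 1024 sets, next to the 512 sets of the 512 KB L2 on the H6.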

Cheers,
Andre

> [1] 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
> [2] 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
> [3] 
> https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4
Dragan Simic May 3, 2024, 9:13 a.m. UTC | #7
Hello Andre,

On 2024-05-01 11:30, Andre Przywara wrote:
> On Tue, 30 Apr 2024 13:10:41 +0200
> Dragan Simic <dsimic@manjaro.org> wrote:
>> On 2024-04-30 12:46, Andre Przywara wrote:
>> > On Tue, 30 Apr 2024 02:01:42 +0200
>> > Dragan Simic <dsimic@manjaro.org> wrote:
>> >> Thank you very much for reviewing my patch in such a detailed way!
>> >> It's good to know that the values in the Allwinner datasheets match
>> >> with the observed reality, so to speak. :)
>> >
>> > YW, and yes, I like to double check things when it comes to Allwinner
>> > documentation ;-) And it was comparably easy for this problem.
>> 
>> Double checking is always good, IMHO. :)
>> 
>> > Out of curiosity: what triggered that patch? Trying to get rid of false
>> > warning/error messages?
>> 
>> Yes, one of the motivators was to get rid of the false kernel warning,
>> and the other was to have the cache information nicely available 
>> through
>> lscpu(1).  I already did the same for a few Rockchip SoCs, [1][2][3] 
>> so
>> a couple of Allwinner SoCs were the next on my mental TODO list. :)
> 
> Thanks for doing this!

I'm glad that you like all these patches. :)

>>> And do you plan to address the H616 as well? It's a bit more tricky 
>>> there,
>>> since there are two die revisions out: one with 256(?)KB of L2, one 
>>> with
>>> 1MB(!). We know how to tell them apart, so I could provide some TF-A 
>>> code
>>> to patch that up in the DT. The kernel DT copy could go with 256KB 
>>> then.
>> 
>> I have no boards based on the Allwinner H616, so it wasn't on my 
>> radar.
>> Though, I'd be happy to prepare and submit a similar kernel patch for
>> the H616, if you'd then take it further and submit a TF-A patch that
>> fixes the DT according to the detected die revision?  Did I understand
>> the plan right?
> 
> Yes, that was the idea. I have a working version of that TF-A patch 
> now,
> just need to figure out some details about the best way to only build 
> this
> for the H616 port.

Nice, the kernel patch for the H616 SoC dtsi is now on the list, [4]
please have a look.  Please let me know when your follow-up TF-A patch
gets submitted upstream, so I can watch it.

> Neither the data sheet nor the user manual mention the cache sizes for 
> the
> H616, but I checked the CCSIDR_EL1 register readouts on both an old 
> H616
> and a new H618, and they confirm that the former has 256 KB L2, and the
> latter 1MB.

Oh wow, 1 MB of L2 cache is quite a lot for such an SoC, which is
actually very nice to see.  Thumbs up for Allwinner not skimping on
the L2 cache in that H616 die revision. :)

> Also I ran tinymembench on two boards to confirm this,
> community benchmarks results are available here:
> https://github.com/ThomasKaiser/sbc-bench/blob/master/Results.md
> The OrangePi Zero2 and OrangePi Zero3 are good examples, respectively.
> Associativity and cache line size are dictated by the Arm Cortex cores,
> and the L1I & L1D sizes are the same as in the other SoCs.

I've included the most important benchmark results in the H616 SoC
dtsi patch, [4] which actually now serves as an additional reference
for the cache sizes.

[1] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=67a6a98575974416834c2294853b3814376a7ce7
[2] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=8612169a05c5e979af033868b7a9b177e0f9fcdf
[3] 
https://git.kernel.org/pub/scm/linux/kernel/git/next/linux-next.git/commit/?id=b72633ba5cfa932405832de25d0f0a11716903b4
[4] 
https://lore.kernel.org/linux-sunxi/9d52e6d338a059618d894abb0764015043330c2b.1714727227.git.dsimic@manjaro.org/
Chen-Yu Tsai May 28, 2024, 3:46 p.m. UTC | #8
On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
> 
> Adding the cache information to the H6 SoC dtsi also makes the following
> warning message in the kernel log go away:
> 
> [...]

Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!

[1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
      https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2

Best regards,
Chen-Yu Tsai May 28, 2024, 3:56 p.m. UTC | #9
On Tue, May 28, 2024 at 11:46 PM Chen-Yu Tsai <wens@csie.org> wrote:
>
> On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
> > Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> > the userspace, which includes lscpu(1) that uses the virtual files provided
> > by the kernel under the /sys/devices/system/cpu directory, to display the
> > proper H6 cache information.
> >
> > Adding the cache information to the H6 SoC dtsi also makes the following
> > warning message in the kernel log go away:
> >
> > [...]
>
> Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!
>
> [1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
>       https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2

OK, that's weird. Somehow b4 thought this patch was v2 of the A64 patch [1].
Looks like they are threaded together because this patch has "In-Reply-To".

Please avoid it in the future.


Thanks
ChenYu

[1] https://lore.kernel.org/linux-sunxi/6a772756c2c677dbdaaab4a2c71a358d8e4b27e9.1714304058.git.dsimic@manjaro.org/
Dragan Simic May 28, 2024, 4:02 p.m. UTC | #10
Hello Chen-Yu,

On 2024-05-28 17:56, Chen-Yu Tsai wrote:
> On Tue, May 28, 2024 at 11:46 PM Chen-Yu Tsai <wens@csie.org> wrote:
>> 
>> On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
>> > Add missing cache information to the Allwinner H6 SoC dtsi, to allow
>> > the userspace, which includes lscpu(1) that uses the virtual files provided
>> > by the kernel under the /sys/devices/system/cpu directory, to display the
>> > proper H6 cache information.
>> >
>> > Adding the cache information to the H6 SoC dtsi also makes the following
>> > warning message in the kernel log go away:
>> >
>> > [...]
>> 
>> Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!
>> 
>> [1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
>>       https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2
> 
> OK, that's weird. Somehow b4 thought this patch was v2 of the A64 patch [1].
> Looks like they are threaded together because this patch has "In-Reply-To".
> 
> Please avoid it in the future.

I'm sorry for that.  I noticed it back when I sent the patches to the
mailing list, but didn't want to make noise about it.  The root cause
was some missing configuration for "git send-email", which resulted in
troublesome threading-related headers being added to the messages for
the individual .patch files, which were in fact created correctly by
running "git format-patch".

Do I need to resend the patches?
Chen-Yu Tsai May 28, 2024, 4:06 p.m. UTC | #11
On Wed, May 29, 2024 at 12:02 AM Dragan Simic <dsimic@manjaro.org> wrote:
>
> Hello Chen-Yu,
>
> On 2024-05-28 17:56, Chen-Yu Tsai wrote:
> > On Tue, May 28, 2024 at 11:46 PM Chen-Yu Tsai <wens@csie.org> wrote:
> >>
> >> On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
> >> > Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> >> > the userspace, which includes lscpu(1) that uses the virtual files provided
> >> > by the kernel under the /sys/devices/system/cpu directory, to display the
> >> > proper H6 cache information.
> >> >
> >> > Adding the cache information to the H6 SoC dtsi also makes the following
> >> > warning message in the kernel log go away:
> >> >
> >> > [...]
> >>
> >> Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!
> >>
> >> [1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
> >>       https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2
> >
> > OK, that's weird. Somehow b4 thought this patch was v2 of the A64 patch [1].
> > Looks like they are threaded together because this patch has "In-Reply-To".
> >
> > Please avoid it in the future.
>
> I'm sorry for that.  I noticed it back when I sent the patches to the
> mailing list, but didn't want to make noise about it.  The root cause
> was some missing configuration for "git send-email", which resulted in
> troublesome threading-related headers being added to the messages for
> the individual .patch files, which were in fact created correctly by
> running "git format-patch".
>
> Do I need to resend the patches?

No. I figured it out.
Chen-Yu Tsai May 28, 2024, 4:10 p.m. UTC | #12
On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
> Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> the userspace, which includes lscpu(1) that uses the virtual files provided
> by the kernel under the /sys/devices/system/cpu directory, to display the
> proper H6 cache information.
> 
> Adding the cache information to the H6 SoC dtsi also makes the following
> warning message in the kernel log go away:
> 
> [...]

Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!

[1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
      https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2

Best regards,
Chen-Yu Tsai May 28, 2024, 4:17 p.m. UTC | #13
On Tue, May 28, 2024 at 11:46 PM Chen-Yu Tsai <wens@csie.org> wrote:
>
> On Sun, 28 Apr 2024 13:40:36 +0200, Dragan Simic wrote:
> > Add missing cache information to the Allwinner H6 SoC dtsi, to allow
> > the userspace, which includes lscpu(1) that uses the virtual files provided
> > by the kernel under the /sys/devices/system/cpu directory, to display the
> > proper H6 cache information.
> >
> > Adding the cache information to the H6 SoC dtsi also makes the following
> > warning message in the kernel log go away:
> >
> > [...]
>
> Applied to sunxi/dt-for-6.11 in sunxi/linux.git, thanks!
>
> [1/1] arm64: dts: allwinner: Add cache information to the SoC dtsi for H6
>       https://git.kernel.org/sunxi/linux/c/c8240e4b0fd2

I had to do a quick rebase as the branch start point was incorrect. The
commit hashes will have changed. Rest assured that the patch is indeed
merged.


ChenYu
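
For reference, the *-cache-sets values in this patch can be cross-checked
against the sizes and associativities listed in the commit message
(64-byte lines; 32 KB 2-way L1 I-cache, 32 KB 4-way L1 D-cache, 512 KB
16-way unified L2). The sketch below assumes the usual relation
sets = size / (line size * ways); the helper name cache_sets is purely
illustrative and is not part of the kernel or of any tool mentioned in
this thread.

```python
def cache_sets(size_bytes: int, line_size: int, ways: int) -> int:
    """Number of sets in a set-associative cache (illustrative helper)."""
    sets, remainder = divmod(size_bytes, line_size * ways)
    if remainder:
        # A real cache geometry always divides evenly.
        raise ValueError("cache size is not a multiple of line_size * ways")
    return sets


# Values taken from the commit message and the dtsi properties in the patch.
l1i_sets = cache_sets(0x8000, 64, 2)    # i-cache-size, i-cache-line-size, 2-way
l1d_sets = cache_sets(0x8000, 64, 4)    # d-cache-size, d-cache-line-size, 4-way
l2_sets = cache_sets(0x80000, 64, 16)   # cache-size, cache-line-size, 16-way

print(l1i_sets, l1d_sets, l2_sets)
```

The three results correspond to the i-cache-sets, d-cache-sets and
cache-sets properties added by the patch, respectively.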

Patch

diff --git a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
index d11e5041bae9..1a63066396e8 100644
--- a/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
+++ b/arch/arm64/boot/dts/allwinner/sun50i-h6.dtsi
@@ -29,36 +29,73 @@  cpu0: cpu@0 {
 			clocks = <&ccu CLK_CPUX>;
 			clock-latency-ns = <244144>; /* 8 32k periods */
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};
 
 		cpu1: cpu@1 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <1>;
 			enable-method = "psci";
 			clocks = <&ccu CLK_CPUX>;
 			clock-latency-ns = <244144>; /* 8 32k periods */
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};
 
 		cpu2: cpu@2 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <2>;
 			enable-method = "psci";
 			clocks = <&ccu CLK_CPUX>;
 			clock-latency-ns = <244144>; /* 8 32k periods */
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
 		};
 
 		cpu3: cpu@3 {
 			compatible = "arm,cortex-a53";
 			device_type = "cpu";
 			reg = <3>;
 			enable-method = "psci";
 			clocks = <&ccu CLK_CPUX>;
 			clock-latency-ns = <244144>; /* 8 32k periods */
 			#cooling-cells = <2>;
+			i-cache-size = <0x8000>;
+			i-cache-line-size = <64>;
+			i-cache-sets = <256>;
+			d-cache-size = <0x8000>;
+			d-cache-line-size = <64>;
+			d-cache-sets = <128>;
+			next-level-cache = <&l2_cache>;
+		};
+
+		l2_cache: l2-cache {
+			compatible = "cache";
+			cache-level = <2>;
+			cache-unified;
+			cache-size = <0x80000>;
+			cache-line-size = <64>;
+			cache-sets = <512>;
 		};
 	};