[2/2] arm64: cacheinfo: Update cache_line_size detected from PPTT

Message ID 1556242821-5080-2-git-send-email-zhangshaokun@hisilicon.com (mailing list archive)
State New, archived
Series [1/2] ACPI/PPTT: Add variable to record max cache line size

Commit Message

Shaokun Zhang April 26, 2019, 1:40 a.m. UTC
cache_line_size() is derived from the CTR_EL0.CWG field and is called mostly
by I/O device drivers. On certain HiSilicon platforms, such as the
Kunpeng920 server SoC, the cache line size differs between the L1/L2
caches and the L3 cache: the L1 line size is 64 bytes and the L3 line size
is 128 bytes, but CTR_EL0.CWG misreports this by using the L1 cache line size.

The correct value matters for I/O performance, so update the cache line
size with the value detected from the PPTT when it is larger than the one
CTR_EL0.CWG reports.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reported-by: Zhenfa Qiu <qiuzhenfa@hisilicon.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
 arch/arm64/include/asm/cache.h |  6 +-----
 arch/arm64/kernel/cacheinfo.c  | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 5 deletions(-)
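
The selection rule described in the commit message can be sketched in user-space C roughly as follows. This is only an illustration, not the kernel code: `sketch_cwg_to_bytes()`, `sketch_cache_line_size()`, the `pptt_max` parameter (standing in for `coherency_max_size`) and the `SKETCH_DMA_MINALIGN` value are all hypothetical names and placeholders chosen for the example.

```c
#include <stdint.h>

/* Placeholder for ARCH_DMA_MINALIGN; the value is assumed for illustration. */
#define SKETCH_DMA_MINALIGN 128u

/* Decode a CTR_EL0.CWG-style field: the size in bytes is 4 << cwg. */
static unsigned int sketch_cwg_to_bytes(uint32_t cwg)
{
	return 4u << cwg;
}

/*
 * Rule proposed by the patch: fall back to the DMA minimum when CWG is 0,
 * otherwise keep the CWG-derived size unless the PPTT-derived size
 * (pptt_max, 0 when unknown) is larger.
 */
static unsigned int sketch_cache_line_size(uint32_t cwg, unsigned int pptt_max)
{
	if (cwg == 0)
		return SKETCH_DMA_MINALIGN;
	if (pptt_max > sketch_cwg_to_bytes(cwg))
		return pptt_max;
	return sketch_cwg_to_bytes(cwg);
}
```

With the Kunpeng920 numbers from the commit message, a CWG value of 4 decodes to 64 bytes, so a PPTT-derived size of 128 would win; without PPTT data the result stays at 64.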

Comments

Jeremy Linton April 26, 2019, 5:18 p.m. UTC | #1
Hi,

On 4/25/19 8:40 PM, Shaokun Zhang wrote:
> cache_line_size() is derived from the CTR_EL0.CWG field and is called mostly
> by I/O device drivers. On certain HiSilicon platforms, such as the

But there are core users too? Things like blk-mq, the trace ring
buffer, iommu/iova, slub/slab. And a quick look seems to indicate that a
number of those users are going to be checking the cache line size
before the cacheinfo is populated (it happens fairly late via
device_initcall() and a hotplug notifier). Is it going to be a problem if
the value changes?


> Kunpeng920 server SoC, the cache line size differs between the L1/L2
> caches and the L3 cache: the L1 line size is 64 bytes and the L3 line size
> is 128 bytes, but CTR_EL0.CWG misreports this by using the L1 cache line size.
> 
> The correct value matters for I/O performance, so update the cache line
> size with the value detected from the PPTT when it is larger than the one
> CTR_EL0.CWG reports.
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Reported-by: Zhenfa Qiu <qiuzhenfa@hisilicon.com>
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
> ---
>   arch/arm64/include/asm/cache.h |  6 +-----
>   arch/arm64/kernel/cacheinfo.c  | 15 +++++++++++++++
>   2 files changed, 16 insertions(+), 5 deletions(-)
> 
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index 926434f413fa..f120d48b27ac 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -91,11 +91,7 @@ static inline u32 cache_type_cwg(void)
>   
>   #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>   
> -static inline int cache_line_size(void)
> -{
> -	u32 cwg = cache_type_cwg();
> -	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
> -}
> +extern int cache_line_size(void);
>   
>   /*
>    * Read the effective value of CTR_EL0.
> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
> index 0bf0a835122f..0b26d53790a8 100644
> --- a/arch/arm64/kernel/cacheinfo.c
> +++ b/arch/arm64/kernel/cacheinfo.c
> @@ -28,6 +28,21 @@
>   #define CLIDR_CTYPE(clidr, level)	\
>   	(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
>   
> +int cache_line_size(void)
> +{
> +	u32 cwg = cache_type_cwg();
> +
> +	if (cwg == 0)
> +		return ARCH_DMA_MINALIGN;
> +#ifdef CONFIG_ACPI
> +	/* compare cache line size detected from PPTT with CWG reporting */
> +	if (coherency_max_size > (4 << cwg))
> +		return coherency_max_size;
> +#endif
> +
> +	return 4 << cwg;
> +}
> +
>   static inline enum cache_type get_cache_type(int level)
>   {
>   	u64 clidr;
>
Catalin Marinas April 27, 2019, 4:12 p.m. UTC | #2
On Fri, Apr 26, 2019 at 12:18:33PM -0500, Jeremy Linton wrote:
> On 4/25/19 8:40 PM, Shaokun Zhang wrote:
> > cache_line_size() is derived from the CTR_EL0.CWG field and is called mostly
> > by I/O device drivers. On certain HiSilicon platforms, such as the
> 
> But there are core users too? Things like blk-mq, the trace ring buffer,
> iommu/iova, slub/slab.

cache_line_size() is indeed used in the core parts of the kernel, for
example when passing SLAB_HWCACHE_ALIGN on kmem_cache creation. Its
meaning is performance rather than coherency as we use ARCH_DMA_MINALIGN
for the latter.

> And a quick look seems to indicate a number of those
> users are going to be checking the cache line size before the cacheinfo is
> populated (it happens fairly late via device_initcall() and a hotplug notifier).
> Is it going to be a problem if the value changes?

That's a good point. At a quick look I didn't see anything that would be
affected by a non-constant cache_line_size().
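
To make the question about a non-constant cache_line_size() concrete, here is a small user-space sketch. It assumes a caller that rounds an object size up to the line size at the moment it is called; `sketch_pptt_max`, `sketch_line_size()`, `sketch_align_up()` and `sketch_padded_size()` are hypothetical names, and the fixed 64-byte CWG result is an assumption taken from the Kunpeng920 description.

```c
#include <stddef.h>

/* Hypothetical stand-in for the PPTT-derived size; 0 until "cacheinfo" is populated. */
static unsigned int sketch_pptt_max;

/* Returns the larger of an assumed 64-byte CWG result and the PPTT-derived size. */
static unsigned int sketch_line_size(void)
{
	const unsigned int cwg_bytes = 64; /* assumed CTR_EL0.CWG decode */

	return sketch_pptt_max > cwg_bytes ? sketch_pptt_max : cwg_bytes;
}

/* Round size up to a multiple of align (align must be a power of two). */
static size_t sketch_align_up(size_t size, size_t align)
{
	return (size + align - 1) & ~(align - 1);
}

/* Padded object size as computed at call time. */
static size_t sketch_padded_size(size_t size)
{
	return sketch_align_up(size, sketch_line_size());
}
```

Under these assumptions, a 40-byte object padded before the PPTT value is known occupies 64 bytes, while the same request made afterwards occupies 128. Callers that computed their padding early simply keep the smaller alignment, which affects performance rather than correctness.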
Catalin Marinas April 27, 2019, 4:16 p.m. UTC | #3
On Fri, Apr 26, 2019 at 09:40:21AM +0800, Shaokun Zhang wrote:
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index 926434f413fa..f120d48b27ac 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -91,11 +91,7 @@ static inline u32 cache_type_cwg(void)
>  
>  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>  
> -static inline int cache_line_size(void)
> -{
> -	u32 cwg = cache_type_cwg();
> -	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
> -}
> +extern int cache_line_size(void);

Nitpick: no need for 'extern' on function prototypes.

>  /*
>   * Read the effective value of CTR_EL0.
> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
> index 0bf0a835122f..0b26d53790a8 100644
> --- a/arch/arm64/kernel/cacheinfo.c
> +++ b/arch/arm64/kernel/cacheinfo.c
> @@ -28,6 +28,21 @@
>  #define CLIDR_CTYPE(clidr, level)	\
>  	(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
>  
> +int cache_line_size(void)
> +{
> +	u32 cwg = cache_type_cwg();
> +
> +	if (cwg == 0)
> +		return ARCH_DMA_MINALIGN;
> +#ifdef CONFIG_ACPI
> +	/* compare cache line size detected from PPTT with CWG reporting */
> +	if (coherency_max_size > (4 << cwg))
> +		return coherency_max_size;
> +#endif
> +
> +	return 4 << cwg;
> +}

I'd rather have cache_line_size() report the PPTT information if
available, ignoring CWG with a fallback to the latter if not available.

We don't use cache_line_size() for DMA cache coherency, only
performance, so I think it's safe to return a value smaller than CWG in
cache_line_size().
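
The alternative suggested here, preferring the PPTT value and falling back to CWG, might look like the following user-space sketch. It is not the eventual kernel implementation: `sketch_prefer_pptt()` is a hypothetical name, `pptt_max` stands in for `coherency_max_size` with 0 assumed to mean "no PPTT information", and `SKETCH_DMA_MINALIGN` is a placeholder value.

```c
#include <stdint.h>

/* Placeholder for ARCH_DMA_MINALIGN; the value is assumed for illustration. */
#define SKETCH_DMA_MINALIGN 128u

/*
 * Prefer the PPTT-derived size whenever it is available (non-zero),
 * even if it is smaller than the CWG-derived size; otherwise fall
 * back to CWG, and to the DMA minimum when CWG is 0.
 */
static unsigned int sketch_prefer_pptt(uint32_t cwg, unsigned int pptt_max)
{
	if (pptt_max)
		return pptt_max;

	return cwg ? 4u << cwg : SKETCH_DMA_MINALIGN;
}
```

The difference from the posted patch shows up when the PPTT value is smaller than the CWG-derived one: with a CWG of 5 (128 bytes) and a PPTT size of 64, this variant returns 64, whereas the larger-of-the-two rule would return 128.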
Sudeep Holla April 29, 2019, 11:06 a.m. UTC | #4
On Fri, Apr 26, 2019 at 12:18:33PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 4/25/19 8:40 PM, Shaokun Zhang wrote:
> > cache_line_size() is derived from the CTR_EL0.CWG field and is called mostly
> > by I/O device drivers. On certain HiSilicon platforms, such as the
>
> But there are core users too? Things like blk-mq, the trace ring buffer,
> iommu/iova, slub/slab. And a quick look seems to indicate a number of those
> users are going to be checking the cache line size before the cacheinfo is
> populated (it happens fairly late via device_initcall() and a hotplug notifier).
> Is it going to be a problem if the value changes?
>

Yes, I agree with that and share the same concern. If the users of these
can't get updated with the new value once cacheinfo is populated, then
we need to figure out how to solve this differently (I mean still from PPTT
or firmware info, as we don't have anything more reliable).

--
Regards,
Sudeep
Sudeep Holla April 29, 2019, 11:12 a.m. UTC | #5
On Sat, Apr 27, 2019 at 05:12:44PM +0100, Catalin Marinas wrote:
> On Fri, Apr 26, 2019 at 12:18:33PM -0500, Jeremy Linton wrote:
> > On 4/25/19 8:40 PM, Shaokun Zhang wrote:
> > > cache_line_size() is derived from the CTR_EL0.CWG field and is called mostly
> > > by I/O device drivers. On certain HiSilicon platforms, such as the
> >
> > But there are core users too? Things like blk-mq, the trace ring buffer,
> > iommu/iova, slub/slab.
>
> cache_line_size() is indeed used in the core parts of the kernel, for
> example when passing SLAB_HWCACHE_ALIGN on kmem_cache creation. Its
> meaning is performance rather than coherency as we use ARCH_DMA_MINALIGN
> for the latter.
>
> > And a quick look seems to indicate a number of those
> > users are going to be checking the cache line size before the cacheinfo is
> > populated (it happens fairly late via device_initcall() and a hotplug notifier).
> > Is it going to be a problem if the value changes?
>
> That's a good point. At a quick look I didn't see anything that would be
> affected by a non-constant cache_line_size().
>
Ah, that's good. But won't it still affect early boot allocations (if the
smaller init-time value is used before cacheinfo is populated)? Sorry,
I haven't looked at all the uses of cache_line_size().

But if cache_line_size() takes care of reading the updated value and the
impact on boot-time allocations is minimal, then this solution should be fine.

--
Regards,
Sudeep
Shaokun Zhang April 30, 2019, 1:32 a.m. UTC | #6
Hi Catalin,

On 2019/4/28 0:16, Catalin Marinas wrote:
> On Fri, Apr 26, 2019 at 09:40:21AM +0800, Shaokun Zhang wrote:
>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index 926434f413fa..f120d48b27ac 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -91,11 +91,7 @@ static inline u32 cache_type_cwg(void)
>>  
>>  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>>  
>> -static inline int cache_line_size(void)
>> -{
>> -	u32 cwg = cache_type_cwg();
>> -	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
>> -}
>> +extern int cache_line_size(void);
> 
> Nitpick: no need for 'extern' on function prototypes.
> 

Oh, yes, my mistake.

>>  /*
>>   * Read the effective value of CTR_EL0.
>> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
>> index 0bf0a835122f..0b26d53790a8 100644
>> --- a/arch/arm64/kernel/cacheinfo.c
>> +++ b/arch/arm64/kernel/cacheinfo.c
>> @@ -28,6 +28,21 @@
>>  #define CLIDR_CTYPE(clidr, level)	\
>>  	(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
>>  
>> +int cache_line_size(void)
>> +{
>> +	u32 cwg = cache_type_cwg();
>> +
>> +	if (cwg == 0)
>> +		return ARCH_DMA_MINALIGN;
>> +#ifdef CONFIG_ACPI
>> +	/* compare cache line size detected from PPTT with CWG reporting */
>> +	if (coherency_max_size > (4 << cwg))
>> +		return coherency_max_size;
>> +#endif
>> +
>> +	return 4 << cwg;
>> +}
> 
> I'd rather have cache_line_size() report the PPTT information if
> available, ignoring CWG with a fallback to the latter if not available.
> 

Okay, got it, I will follow that in the next version.

> We don't use cache_line_size() for DMA cache coherency, only
> performance, so I think it's safe to return a value smaller than CWG in
> cache_line_size().

Agreed, a nice idea.

Thanks,
Shaokun

>
Patch

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index 926434f413fa..f120d48b27ac 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -91,11 +91,7 @@  static inline u32 cache_type_cwg(void)
 
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
-static inline int cache_line_size(void)
-{
-	u32 cwg = cache_type_cwg();
-	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
-}
+extern int cache_line_size(void);
 
 /*
  * Read the effective value of CTR_EL0.
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 0bf0a835122f..0b26d53790a8 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -28,6 +28,21 @@ 
 #define CLIDR_CTYPE(clidr, level)	\
 	(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
 
+int cache_line_size(void)
+{
+	u32 cwg = cache_type_cwg();
+
+	if (cwg == 0)
+		return ARCH_DMA_MINALIGN;
+#ifdef CONFIG_ACPI
+	/* compare cache line size detected from PPTT with CWG reporting */
+	if (coherency_max_size > (4 << cwg))
+		return coherency_max_size;
+#endif
+
+	return 4 << cwg;
+}
+
 static inline enum cache_type get_cache_type(int level)
 {
 	u64 clidr;