Message ID | 1556242821-5080-2-git-send-email-zhangshaokun@hisilicon.com (mailing list archive) |
---|---|
State | New, archived |
Series | [1/2] ACPI/PPTT: Add variable to record max cache line size |
Hi,

On 4/25/19 8:40 PM, Shaokun Zhang wrote:
> cache_line_size is derived from CTR_EL0.CWG field and is called mostly
> for I/O device drivers. For HiSilicon certain platform, like the

But there are core users too? Things like blk-mq, the trace ring buffer,
iommu/iova, slub/slab. And a quick look seems to indicate a number of those
users are going to be checking the cache line size before the cacheinfo is
populated (it happens fairly late via device_initcall() and a hp notifier).
Is it going to be a problem if the value changes?

> Kunpeng920 server SoC, cache line sizes are different between L1/2
> cache and L3 cache while L1 cache line size is 64-byte and L3 is 128-byte,
> but CTR_EL0.CWG is misreporting using L1 cache line size.
>
> We shall correct the right value which is important for I/O performance.
> Let's update the cache line size if it is detected from PPTT information
> when it is larger than CTR_EL0.CWG reporting.
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Will Deacon <will.deacon@arm.com>
> Reported-by: Zhenfa Qiu <qiuzhenfa@hisilicon.com>
> Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
> Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
> ---
>  arch/arm64/include/asm/cache.h |  6 +-----
>  arch/arm64/kernel/cacheinfo.c  | 15 +++++++++++++++
>  2 files changed, 16 insertions(+), 5 deletions(-)
>
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index 926434f413fa..f120d48b27ac 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -91,11 +91,7 @@ static inline u32 cache_type_cwg(void)
>
>  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>
> -static inline int cache_line_size(void)
> -{
> -	u32 cwg = cache_type_cwg();
> -	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
> -}
> +extern int cache_line_size(void);
>
>  /*
>   * Read the effective value of CTR_EL0.
> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
> index 0bf0a835122f..0b26d53790a8 100644
> --- a/arch/arm64/kernel/cacheinfo.c
> +++ b/arch/arm64/kernel/cacheinfo.c
> @@ -28,6 +28,21 @@
>  #define CLIDR_CTYPE(clidr, level)	\
>  	(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
>
> +int cache_line_size(void)
> +{
> +	u32 cwg = cache_type_cwg();
> +
> +	if (cwg == 0)
> +		return ARCH_DMA_MINALIGN;
> +#ifdef CONFIG_ACPI
> +	/* compare cache line size detected from PPTT with CWG reporting */
> +	if (coherency_max_size > (4 << cwg))
> +		return coherency_max_size;
> +#endif
> +
> +	return 4 << cwg;
> +}
> +
>  static inline enum cache_type get_cache_type(int level)
>  {
>  	u64 clidr;

On Fri, Apr 26, 2019 at 12:18:33PM -0500, Jeremy Linton wrote:
> On 4/25/19 8:40 PM, Shaokun Zhang wrote:
> > cache_line_size is derived from CTR_EL0.CWG field and is called mostly
> > for I/O device drivers. For HiSilicon certain platform, like the
>
> But there are core users too? Things like blk-mq, the trace ring buffer,
> iommu/iova, slub/slab.

cache_line_size() is indeed used in the core parts of the kernel, for
example when passing SLAB_HWCACHE_ALIGN on kmem_cache creation. Its
meaning is performance rather than coherency as we use ARCH_DMA_MINALIGN
for the latter.

> And a quick look seems to indicate a number of those
> users are going to be checking the cache line size before the cacheinfo is
> populated (it happens fairly late via device_initcall() and a hp notifier).
> Is it going to be a problem if the value changes?

That's a good point. At a quick look I didn't see anything that would be
affected by a non-constant cache_line_size().

On Fri, Apr 26, 2019 at 09:40:21AM +0800, Shaokun Zhang wrote:
> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
> index 926434f413fa..f120d48b27ac 100644
> --- a/arch/arm64/include/asm/cache.h
> +++ b/arch/arm64/include/asm/cache.h
> @@ -91,11 +91,7 @@ static inline u32 cache_type_cwg(void)
>
>  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>
> -static inline int cache_line_size(void)
> -{
> -	u32 cwg = cache_type_cwg();
> -	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
> -}
> +extern int cache_line_size(void);

Nitpick: no need for 'extern' on function prototypes.

>  /*
>   * Read the effective value of CTR_EL0.
> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
> index 0bf0a835122f..0b26d53790a8 100644
> --- a/arch/arm64/kernel/cacheinfo.c
> +++ b/arch/arm64/kernel/cacheinfo.c
> @@ -28,6 +28,21 @@
>  #define CLIDR_CTYPE(clidr, level)	\
>  	(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
>
> +int cache_line_size(void)
> +{
> +	u32 cwg = cache_type_cwg();
> +
> +	if (cwg == 0)
> +		return ARCH_DMA_MINALIGN;
> +#ifdef CONFIG_ACPI
> +	/* compare cache line size detected from PPTT with CWG reporting */
> +	if (coherency_max_size > (4 << cwg))
> +		return coherency_max_size;
> +#endif
> +
> +	return 4 << cwg;
> +}

I'd rather have cache_line_size() report the PPTT information if
available, ignoring CWG with a fallback to the latter if not available.

We don't use cache_line_size() for DMA cache coherency, only
performance, so I think it's safe to return a value smaller than CWG in
cache_line_size().
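
The preference stated in the review above (report the PPTT value when it is available, otherwise fall back to CWG) can be sketched as a small userspace model. This is only an illustration under stated assumptions, not code from any version of the patch: `model_cache_line_size_pptt_first`, its parameters, and `MODEL_DMA_MINALIGN` are hypothetical names, a PPTT value of 0 is assumed to mean "not available", and 128 is an assumed stand-in for ARCH_DMA_MINALIGN.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for ARCH_DMA_MINALIGN; illustration only. */
#define MODEL_DMA_MINALIGN 128u

/*
 * Hypothetical model: prefer the PPTT-derived size when one was found
 * (non-zero), even if it is smaller than the CWG-derived size; otherwise
 * fall back to CWG, and to the DMA minimum alignment when CWG is 0.
 */
static uint32_t model_cache_line_size_pptt_first(uint32_t cwg,
						 uint32_t pptt_max_size)
{
	assert(cwg <= 15);	/* CWG is a 4-bit field */

	if (pptt_max_size != 0)
		return pptt_max_size;

	return cwg ? 4u << cwg : MODEL_DMA_MINALIGN;
}
```

With this ordering, a PPTT value of 64 wins over a CWG of 5 (128 bytes), which is the "value smaller than CWG" case that the review considers safe because the result is used for performance rather than DMA coherency.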

On Fri, Apr 26, 2019 at 12:18:33PM -0500, Jeremy Linton wrote:
> Hi,
>
> On 4/25/19 8:40 PM, Shaokun Zhang wrote:
> > cache_line_size is derived from CTR_EL0.CWG field and is called mostly
> > for I/O device drivers. For HiSilicon certain platform, like the
>
> But there are core users too? Things like blk-mq, the trace ring buffer,
> iommu/iova, slub/slab. And a quick look seems to indicate a number of those
> users are going to be checking the cache line size before the cacheinfo is
> populated (it happens fairly late via device_initcall() and a hp notifier).
> Is it going to be a problem if the value changes?
>

Yes, I agree with that and share the same concern. If the users of these
can't get updated with the new value once cacheinfo is populated, then
we need to figure out how to solve this differently (I mean still from
PPTT or firmware info, as we don't have anything more reliable).

--
Regards,
Sudeep

On Sat, Apr 27, 2019 at 05:12:44PM +0100, Catalin Marinas wrote:
> On Fri, Apr 26, 2019 at 12:18:33PM -0500, Jeremy Linton wrote:
> > On 4/25/19 8:40 PM, Shaokun Zhang wrote:
> > > cache_line_size is derived from CTR_EL0.CWG field and is called mostly
> > > for I/O device drivers. For HiSilicon certain platform, like the
> >
> > But there are core users too? Things like blk-mq, the trace ring buffer,
> > iommu/iova, slub/slab.
>
> cache_line_size() is indeed used in the core parts of the kernel, for
> example when passing SLAB_HWCACHE_ALIGN on kmem_cache creation. Its
> meaning is performance rather than coherency as we use ARCH_DMA_MINALIGN
> for the latter.
>
> > And a quick look seems to indicate a number of those
> > users are going to be checking the cache line size before the cacheinfo is
> > populated (it happens fairly late via device_initcall() and a hp notifier).
> > Is it going to be a problem if the value changes?
>
> That's a good point. At a quick look I didn't see anything that would be
> affected by a non-constant cache_line_size().
>

Ah, that's good. But won't it still affect early boot allocations (if the
smaller init-time value is used before cacheinfo is populated)? Sorry, I
haven't looked at all the uses of cache_line_size(). But if
cache_line_size() takes care of reading the updated value and the impact
on boot-time allocations is minimal, then this solution should be fine.

--
Regards,
Sudeep
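
To make the early-boot concern concrete, the following is a rough userspace sketch of how an object's padded size would differ depending on whether the cache line size is read before cacheinfo is populated (64 bytes, from CWG) or after (128 bytes, from PPTT). `model_align_up` is a hypothetical helper and the object sizes are made up; the real SLAB_HWCACHE_ALIGN handling in the slab allocators is more involved and is not modelled here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Hypothetical helper: round size up to the next multiple of align.
 * align must be a non-zero power of two, as cache line sizes are.
 */
static size_t model_align_up(size_t size, size_t align)
{
	assert(align != 0 && (align & (align - 1)) == 0);

	return (size + align - 1) & ~(align - 1);
}
```

For example, a made-up 40-byte object pads to 64 bytes with the early value and to 128 bytes with the late one, so a cache created early would simply keep the smaller layout. Since the value is used for performance and not for coherency, that difference would affect only performance, not correctness.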

Hi Catalin,

On 2019/4/28 0:16, Catalin Marinas wrote:
> On Fri, Apr 26, 2019 at 09:40:21AM +0800, Shaokun Zhang wrote:
>> diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
>> index 926434f413fa..f120d48b27ac 100644
>> --- a/arch/arm64/include/asm/cache.h
>> +++ b/arch/arm64/include/asm/cache.h
>> @@ -91,11 +91,7 @@ static inline u32 cache_type_cwg(void)
>>
>>  #define __read_mostly __attribute__((__section__(".data..read_mostly")))
>>
>> -static inline int cache_line_size(void)
>> -{
>> -	u32 cwg = cache_type_cwg();
>> -	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
>> -}
>> +extern int cache_line_size(void);
>
> Nitpick: no need for 'extern' on function prototypes.
>

Oh, yes, my mistake.

>>  /*
>>   * Read the effective value of CTR_EL0.
>> diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
>> index 0bf0a835122f..0b26d53790a8 100644
>> --- a/arch/arm64/kernel/cacheinfo.c
>> +++ b/arch/arm64/kernel/cacheinfo.c
>> @@ -28,6 +28,21 @@
>>  #define CLIDR_CTYPE(clidr, level)	\
>>  	(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
>>
>> +int cache_line_size(void)
>> +{
>> +	u32 cwg = cache_type_cwg();
>> +
>> +	if (cwg == 0)
>> +		return ARCH_DMA_MINALIGN;
>> +#ifdef CONFIG_ACPI
>> +	/* compare cache line size detected from PPTT with CWG reporting */
>> +	if (coherency_max_size > (4 << cwg))
>> +		return coherency_max_size;
>> +#endif
>> +
>> +	return 4 << cwg;
>> +}
>
> I'd rather have cache_line_size() report the PPTT information if
> available, ignoring CWG with a fallback to the latter if not available.
>

Okay, got it, I will follow it in the next version.

> We don't use cache_line_size() for DMA cache coherency, only
> performance, so I think it's safe to return a value smaller than CWG in
> cache_line_size().

Agreed, a nice idea.

Thanks,
Shaokun

diff --git a/arch/arm64/include/asm/cache.h b/arch/arm64/include/asm/cache.h
index 926434f413fa..f120d48b27ac 100644
--- a/arch/arm64/include/asm/cache.h
+++ b/arch/arm64/include/asm/cache.h
@@ -91,11 +91,7 @@ static inline u32 cache_type_cwg(void)
 
 #define __read_mostly __attribute__((__section__(".data..read_mostly")))
 
-static inline int cache_line_size(void)
-{
-	u32 cwg = cache_type_cwg();
-	return cwg ? 4 << cwg : ARCH_DMA_MINALIGN;
-}
+extern int cache_line_size(void);
 
 /*
  * Read the effective value of CTR_EL0.
diff --git a/arch/arm64/kernel/cacheinfo.c b/arch/arm64/kernel/cacheinfo.c
index 0bf0a835122f..0b26d53790a8 100644
--- a/arch/arm64/kernel/cacheinfo.c
+++ b/arch/arm64/kernel/cacheinfo.c
@@ -28,6 +28,21 @@
 #define CLIDR_CTYPE(clidr, level)	\
 	(((clidr) & CLIDR_CTYPE_MASK(level)) >> CLIDR_CTYPE_SHIFT(level))
 
+int cache_line_size(void)
+{
+	u32 cwg = cache_type_cwg();
+
+	if (cwg == 0)
+		return ARCH_DMA_MINALIGN;
+#ifdef CONFIG_ACPI
+	/* compare cache line size detected from PPTT with CWG reporting */
+	if (coherency_max_size > (4 << cwg))
+		return coherency_max_size;
+#endif
+
+	return 4 << cwg;
+}
+
 static inline enum cache_type get_cache_type(int level)
 {
 	u64 clidr;

cache_line_size() is derived from the CTR_EL0.CWG field and is called
mostly by I/O device drivers. On certain HiSilicon platforms, such as the
Kunpeng920 server SoC, the cache line size differs between the L1/L2
caches and the L3 cache: the L1 cache line size is 64 bytes and the L3
cache line size is 128 bytes, but CTR_EL0.CWG misreports the value by
using the L1 cache line size.

We should report the correct value, which is important for I/O
performance. Update the cache line size from the PPTT information when
the value detected there is larger than the one CTR_EL0.CWG reports.

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Will Deacon <will.deacon@arm.com>
Reported-by: Zhenfa Qiu <qiuzhenfa@hisilicon.com>
Suggested-by: Catalin Marinas <catalin.marinas@arm.com>
Signed-off-by: Shaokun Zhang <zhangshaokun@hisilicon.com>
---
 arch/arm64/include/asm/cache.h |  6 +-----
 arch/arm64/kernel/cacheinfo.c  | 15 +++++++++++++++
 2 files changed, 16 insertions(+), 5 deletions(-)
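
The arithmetic behind the description can be sketched as a standalone userspace model. It mirrors the `4 << cwg` decoding and the "larger of the two" rule from the posted diff, but it is only an illustration: `cwg_to_bytes`, `model_cache_line_size`, and `MODEL_DMA_MINALIGN` are hypothetical names, the PPTT value is passed in as a plain parameter rather than read from `coherency_max_size`, and 128 is an assumed stand-in for ARCH_DMA_MINALIGN.

```c
#include <assert.h>
#include <stdint.h>

/* Assumed stand-in for ARCH_DMA_MINALIGN; illustration only. */
#define MODEL_DMA_MINALIGN 128u

/*
 * Decode a CTR_EL0.CWG field value into bytes. CWG encodes log2 of a
 * number of 4-byte words, hence 4 << cwg.
 */
static uint32_t cwg_to_bytes(uint32_t cwg)
{
	assert(cwg <= 15);	/* CWG is a 4-bit field */

	return 4u << cwg;
}

/*
 * Model of the rule in the posted diff: use the PPTT-derived size only
 * when it is larger than the CWG-derived size.
 */
static uint32_t model_cache_line_size(uint32_t cwg, uint32_t pptt_max_size)
{
	if (cwg == 0)
		return MODEL_DMA_MINALIGN;

	if (pptt_max_size > cwg_to_bytes(cwg))
		return pptt_max_size;

	return cwg_to_bytes(cwg);
}
```

For the Kunpeng920 case described above, a CWG of 4 decodes to 64 bytes while the PPTT reports 128 bytes for L3, so the model returns 128.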