Message ID | tencent_6D851035F6F2FD0B5A69FB391AE39AC6300A@qq.com (mailing list archive) |
---|---|
State | Accepted |
Commit | 542124fc0d5cccea273bccf2ee58545ac17aad3f |
Headers | show |
Series | RISC-V: only flush icache when it has VM_EXEC set | expand |
Hi Yangyu, On 09/01/2024 19:48, Yangyu Chen wrote: > As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores > in the system is very costly, limiting flush_icache_mm to be called only > when vma->vm_flags has VM_EXEC can help minimize the frequency of these > operations. It improves performance and reduces disturbances when Which platform did you test this patchset? Do you have any measurement of the performance improvements? > copy_from_user_page is needed such as profiling with perf. > > For I-D coherence concerns, it will not fail if such a page adds VM_EXEC > flags in the future since we have checked it in the __set_pte_at function. > > Signed-off-by: Yangyu Chen <cyy@cyyself.name> > --- > arch/riscv/include/asm/cacheflush.h | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h > index 3cb53c4df27c..915f532dc336 100644 > --- a/arch/riscv/include/asm/cacheflush.h > +++ b/arch/riscv/include/asm/cacheflush.h > @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page) > * so instead we just flush the whole thing. > */ > #define flush_icache_range(start, end) flush_icache_all() > -#define flush_icache_user_page(vma, pg, addr, len) \ > - flush_icache_mm(vma->vm_mm, 0) > +#define flush_icache_user_page(vma, pg, addr, len) \ > +do { \ > + if (vma->vm_flags & VM_EXEC) \ > + flush_icache_mm(vma->vm_mm, 0); \ > +} while (0) > > #ifdef CONFIG_64BIT > #define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end) Otherwise, it looks good to me, so you can add: Reviewed-by: Alexandre Ghiti <alexghiti@rivosinc.com> Thanks, Alex
On Wed, Jan 10, 2024 at 02:48:59AM +0800, Yangyu Chen wrote: > As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores > in the system is very costly, limiting flush_icache_mm to be called only > when vma->vm_flags has VM_EXEC can help minimize the frequency of these > operations. It improves performance and reduces disturbances when > copy_from_user_page is needed such as profiling with perf. Hi Yangyu, Since this is for "performance", can you please add the benchmark, test platform and performance numbers in your commit msg? thanks > > For I-D coherence concerns, it will not fail if such a page adds VM_EXEC > flags in the future since we have checked it in the __set_pte_at function. > > Signed-off-by: Yangyu Chen <cyy@cyyself.name> > --- > arch/riscv/include/asm/cacheflush.h | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h > index 3cb53c4df27c..915f532dc336 100644 > --- a/arch/riscv/include/asm/cacheflush.h > +++ b/arch/riscv/include/asm/cacheflush.h > @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page) > * so instead we just flush the whole thing. > */ > #define flush_icache_range(start, end) flush_icache_all() > -#define flush_icache_user_page(vma, pg, addr, len) \ > - flush_icache_mm(vma->vm_mm, 0) > +#define flush_icache_user_page(vma, pg, addr, len) \ > +do { \ > + if (vma->vm_flags & VM_EXEC) \ > + flush_icache_mm(vma->vm_mm, 0); \ > +} while (0) > > #ifdef CONFIG_64BIT > #define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end) > -- > 2.43.0 >
On Tue, 09 Jan 2024 10:48:59 PST (-0800), cyy@cyyself.name wrote: > As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores > in the system is very costly, limiting flush_icache_mm to be called only > when vma->vm_flags has VM_EXEC can help minimize the frequency of these > operations. It improves performance and reduces disturbances when > copy_from_user_page is needed such as profiling with perf. > > For I-D coherence concerns, it will not fail if such a page adds VM_EXEC > flags in the future since we have checked it in the __set_pte_at function. > > Signed-off-by: Yangyu Chen <cyy@cyyself.name> > --- > arch/riscv/include/asm/cacheflush.h | 7 +++++-- > 1 file changed, 5 insertions(+), 2 deletions(-) > > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h > index 3cb53c4df27c..915f532dc336 100644 > --- a/arch/riscv/include/asm/cacheflush.h > +++ b/arch/riscv/include/asm/cacheflush.h > @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page) > * so instead we just flush the whole thing. > */ > #define flush_icache_range(start, end) flush_icache_all() > -#define flush_icache_user_page(vma, pg, addr, len) \ > - flush_icache_mm(vma->vm_mm, 0) > +#define flush_icache_user_page(vma, pg, addr, len) \ > +do { \ > + if (vma->vm_flags & VM_EXEC) \ > + flush_icache_mm(vma->vm_mm, 0); \ > +} while (0) > > #ifdef CONFIG_64BIT > #define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end) I'm not super worried about the benchmarks, I think we can just open-loop assume this is faster by avoiding the flushes. I do think we need a hook into at least tlb_update_vma_flags(), though, to insert the fence.i when upgrading a mapping to include VM_EXEC.
On Wed, Mar 20, 2024 at 1:48 AM Palmer Dabbelt <palmer@dabbelt.com> wrote: > > On Tue, 09 Jan 2024 10:48:59 PST (-0800), cyy@cyyself.name wrote: > > As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores > > in the system is very costly, limiting flush_icache_mm to be called only > > when vma->vm_flags has VM_EXEC can help minimize the frequency of these > > operations. It improves performance and reduces disturbances when > > copy_from_user_page is needed such as profiling with perf. > > > > For I-D coherence concerns, it will not fail if such a page adds VM_EXEC > > flags in the future since we have checked it in the __set_pte_at function. > > > > Signed-off-by: Yangyu Chen <cyy@cyyself.name> > > --- > > arch/riscv/include/asm/cacheflush.h | 7 +++++-- > > 1 file changed, 5 insertions(+), 2 deletions(-) > > > > diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h > > index 3cb53c4df27c..915f532dc336 100644 > > --- a/arch/riscv/include/asm/cacheflush.h > > +++ b/arch/riscv/include/asm/cacheflush.h > > @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page) > > * so instead we just flush the whole thing. > > */ > > #define flush_icache_range(start, end) flush_icache_all() > > -#define flush_icache_user_page(vma, pg, addr, len) \ > > - flush_icache_mm(vma->vm_mm, 0) > > +#define flush_icache_user_page(vma, pg, addr, len) \ > > +do { \ > > + if (vma->vm_flags & VM_EXEC) \ > > + flush_icache_mm(vma->vm_mm, 0); \ > > +} while (0) > > > > #ifdef CONFIG_64BIT > > #define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end) > > I'm not super worried about the benchmarks, I think we can just > open-loop assume this is faster by avoiding the flushes. I do think we > need a hook into at least tlb_update_vma_flags(), though, to insert the > fence.i when upgrading a mapping to include VM_EXEC. I'd say Yangyu is right when he mentions in the commit log: "For I-D coherence concerns, it will not fail if such a page adds VM_EXEC flags in the future since we have checked it in the __set_pte_at function.". If a region indeed becomes executable, the page table will be modified to reflect that and then will pass in __set_pte_at() which, as Yangyu mentions, does the right thing. Or am I missing something? Thanks, Alex
> On Mar 22, 2024, at 23:50, Alexandre Ghiti <alexghiti@rivosinc.com> wrote: > > On Wed, Mar 20, 2024 at 1:48 AM Palmer Dabbelt <palmer@dabbelt.com> wrote: >> >> On Tue, 09 Jan 2024 10:48:59 PST (-0800), cyy@cyyself.name wrote: >>> As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores >>> in the system is very costly, limiting flush_icache_mm to be called only >>> when vma->vm_flags has VM_EXEC can help minimize the frequency of these >>> operations. It improves performance and reduces disturbances when >>> copy_from_user_page is needed such as profiling with perf. >>> >>> For I-D coherence concerns, it will not fail if such a page adds VM_EXEC >>> flags in the future since we have checked it in the __set_pte_at function. >>> >>> Signed-off-by: Yangyu Chen <cyy@cyyself.name> >>> --- >>> arch/riscv/include/asm/cacheflush.h | 7 +++++-- >>> 1 file changed, 5 insertions(+), 2 deletions(-) >>> >>> diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h >>> index 3cb53c4df27c..915f532dc336 100644 >>> --- a/arch/riscv/include/asm/cacheflush.h >>> +++ b/arch/riscv/include/asm/cacheflush.h >>> @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page) >>> * so instead we just flush the whole thing. >>> */ >>> #define flush_icache_range(start, end) flush_icache_all() >>> -#define flush_icache_user_page(vma, pg, addr, len) \ >>> - flush_icache_mm(vma->vm_mm, 0) >>> +#define flush_icache_user_page(vma, pg, addr, len) \ >>> +do { \ >>> + if (vma->vm_flags & VM_EXEC) \ >>> + flush_icache_mm(vma->vm_mm, 0); \ >>> +} while (0) >>> >>> #ifdef CONFIG_64BIT >>> #define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end) >> >> I'm not super worried about the benchmarks, I think we can just >> open-loop assume this is faster by avoiding the flushes. I do think we >> need a hook into at least tlb_update_vma_flags(), though, to insert the >> fence.i when upgrading a mapping to include VM_EXEC. > > I'd say Yangyu is right when he mentions in the commit log: "For I-D > coherence concerns, it will not fail if such a page adds VM_EXEC flags > in the future since we have checked it in the __set_pte_at function.". > If a region indeed becomes executable, the page table will be modified > to reflect that and then will pass in __set_pte_at() which, as Yangyu > mentions, does the right thing. Or am I missing something? > I think so. Unless we have any other way to update PTE rather than using __set_pte_at, I think it’s safe. I’m too busy these days to provide a micro enough benchmark. Thanks, Yangyu Chen
Hello: This patch was applied to riscv/linux.git (for-next) by Palmer Dabbelt <palmer@rivosinc.com>: On Wed, 10 Jan 2024 02:48:59 +0800 you wrote: > As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores > in the system is very costly, limiting flush_icache_mm to be called only > when vma->vm_flags has VM_EXEC can help minimize the frequency of these > operations. It improves performance and reduces disturbances when > copy_from_user_page is needed such as profiling with perf. > > For I-D coherence concerns, it will not fail if such a page adds VM_EXEC > flags in the future since we have checked it in the __set_pte_at function. > > [...] Here is the summary with links: - RISC-V: only flush icache when it has VM_EXEC set https://git.kernel.org/riscv/c/542124fc0d5c You are awesome, thank you!
diff --git a/arch/riscv/include/asm/cacheflush.h b/arch/riscv/include/asm/cacheflush.h index 3cb53c4df27c..915f532dc336 100644 --- a/arch/riscv/include/asm/cacheflush.h +++ b/arch/riscv/include/asm/cacheflush.h @@ -33,8 +33,11 @@ static inline void flush_dcache_page(struct page *page) * so instead we just flush the whole thing. */ #define flush_icache_range(start, end) flush_icache_all() -#define flush_icache_user_page(vma, pg, addr, len) \ - flush_icache_mm(vma->vm_mm, 0) +#define flush_icache_user_page(vma, pg, addr, len) \ +do { \ + if (vma->vm_flags & VM_EXEC) \ + flush_icache_mm(vma->vm_mm, 0); \ +} while (0) #ifdef CONFIG_64BIT #define flush_cache_vmap(start, end) flush_tlb_kernel_range(start, end)
As I-Cache flush on current RISC-V needs to send IPIs to every CPU cores in the system is very costly, limiting flush_icache_mm to be called only when vma->vm_flags has VM_EXEC can help minimize the frequency of these operations. It improves performance and reduces disturbances when copy_from_user_page is needed such as profiling with perf. For I-D coherence concerns, it will not fail if such a page adds VM_EXEC flags in the future since we have checked it in the __set_pte_at function. Signed-off-by: Yangyu Chen <cyy@cyyself.name> --- arch/riscv/include/asm/cacheflush.h | 7 +++++-- 1 file changed, 5 insertions(+), 2 deletions(-)