Message ID: 20240131155929.169961-5-alexghiti@rivosinc.com
State: RFC
Series: Svvptc extension to remove preventive sfence.vma
On Wed, Jan 31, 2024 at 04:59:29PM +0100, Alexandre Ghiti wrote:
> The preventive sfence.vma instructions were emitted because new mappings
> must be made visible to the page table walker, but Svvptc guarantees that
> xRET acts as a fence, so there is no need to sfence.vma for the uarchs
> that implement this extension.

AFAIU, your first submission shows that you don't need that xRET property.
Similarly for other archs. What was the rationale behind this Svvptc change?

> This allows us to drastically reduce the number of sfence.vma emitted:
>
> * Ubuntu boot to login:
>   Before: ~630k sfence.vma
>   After:  ~200k sfence.vma
>
> * ltp - mmapstress01
>   Before: ~45k
>   After:  ~6.3k
>
> * lmbench - lat_pagefault
>   Before: ~665k
>   After:  832 (!)
>
> * lmbench - lat_mmap
>   Before: ~546k
>   After:  718 (!)

This Svvptc seems to move/add the "burden" of the synchronization to xRET:
perhaps integrate the above counts w/ the perf gains in the cover letter?

  Andrea
Hi Andrea,

On Thu, Feb 1, 2024 at 4:03 PM Andrea Parri <parri.andrea@gmail.com> wrote:
>
> On Wed, Jan 31, 2024 at 04:59:29PM +0100, Alexandre Ghiti wrote:
> > The preventive sfence.vma instructions were emitted because new mappings
> > must be made visible to the page table walker, but Svvptc guarantees that
> > xRET acts as a fence, so there is no need to sfence.vma for the uarchs
> > that implement this extension.
>
> AFAIU, your first submission shows that you don't need that xRET property.
> Similarly for other archs. What was the rationale behind this Svvptc change?

Actually, the ARC has just changed its mind and removed this new
behaviour from the Svvptc extension, so we will take some gratuitous
page faults (but those should be outliers), which makes riscv similar
to x86 and arm64.

> > This allows us to drastically reduce the number of sfence.vma emitted:
> >
> > * Ubuntu boot to login:
> >   Before: ~630k sfence.vma
> >   After:  ~200k sfence.vma
> >
> > * ltp - mmapstress01
> >   Before: ~45k
> >   After:  ~6.3k
> >
> > * lmbench - lat_pagefault
> >   Before: ~665k
> >   After:  832 (!)
> >
> > * lmbench - lat_mmap
> >   Before: ~546k
> >   After:  718 (!)
>
> This Svvptc seems to move/add the "burden" of the synchronization to xRET:
> perhaps integrate the above counts w/ the perf gains in the cover letter?

Yes, I'll copy that to the cover letter.

Thanks for your interest!

Alex

>   Andrea
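For context, the "gratuitous page fault" model Alex describes resolves a new
mapping lazily: the first access may take a spurious fault, and the fault
handler simply re-checks the PTE instead of every mapping paying for an
up-front sfence.vma. A minimal sketch of that pattern, reusing the pte
helpers this patch itself relies on (the function name is illustrative, not
an existing kernel API):

static bool resolve_possibly_spurious_fault(struct vm_area_struct *vma,
					    unsigned long address,
					    pte_t *ptep, pte_t entry)
{
	if (!pte_same(ptep_get(ptep), entry)) {
		/* The PTE really was stale: update it and flush this page. */
		__set_pte_at(ptep, entry);
		flush_tlb_page(vma, address);
		return true;
	}

	/* The PTE is already up to date: the fault was spurious, just retry. */
	return false;
}

This is essentially what the Svvptc path of ptep_set_access_flags() in the
patch below does.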
On Fri, Feb 2, 2024 at 4:42 PM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> Hi Andrea,
>
> On Thu, Feb 1, 2024 at 4:03 PM Andrea Parri <parri.andrea@gmail.com> wrote:
> >
> > On Wed, Jan 31, 2024 at 04:59:29PM +0100, Alexandre Ghiti wrote:
> > > The preventive sfence.vma instructions were emitted because new mappings
> > > must be made visible to the page table walker, but Svvptc guarantees that
> > > xRET acts as a fence, so there is no need to sfence.vma for the uarchs
> > > that implement this extension.
> >
> > AFAIU, your first submission shows that you don't need that xRET property.
> > Similarly for other archs. What was the rationale behind this Svvptc change?
>
> Actually, the ARC has just changed its mind and removed this new

The wording was incorrect here: the ARC did not state anything; the author
of Svvptc proposed an amended version of the spec that removes this
behaviour, and that is under discussion.

> behaviour from the Svvptc extension, so we will take some gratuitous
> page faults (but those should be outliers), which makes riscv similar
> to x86 and arm64.
>
> > > This allows us to drastically reduce the number of sfence.vma emitted:
> > >
> > > * Ubuntu boot to login:
> > >   Before: ~630k sfence.vma
> > >   After:  ~200k sfence.vma
> > >
> > > * ltp - mmapstress01
> > >   Before: ~45k
> > >   After:  ~6.3k
> > >
> > > * lmbench - lat_pagefault
> > >   Before: ~665k
> > >   After:  832 (!)
> > >
> > > * lmbench - lat_mmap
> > >   Before: ~546k
> > >   After:  718 (!)
> >
> > This Svvptc seems to move/add the "burden" of the synchronization to xRET:
> > perhaps integrate the above counts w/ the perf gains in the cover letter?
>
> Yes, I'll copy that to the cover letter.
>
> Thanks for your interest!
>
> Alex
>
> >   Andrea
Hi Alex,

On Thu, Feb 1, 2024 at 12:04 AM Alexandre Ghiti <alexghiti@rivosinc.com> wrote:
>
> The preventive sfence.vma instructions were emitted because new mappings
> must be made visible to the page table walker, but Svvptc guarantees that
> xRET acts as a fence, so there is no need to sfence.vma for the uarchs
> that implement this extension.
>
> This allows us to drastically reduce the number of sfence.vma emitted:
>
> * Ubuntu boot to login:
>   Before: ~630k sfence.vma
>   After:  ~200k sfence.vma
>
> * ltp - mmapstress01
>   Before: ~45k
>   After:  ~6.3k
>
> * lmbench - lat_pagefault
>   Before: ~665k
>   After:  832 (!)
>
> * lmbench - lat_mmap
>   Before: ~546k
>   After:  718 (!)
>
> Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
> ---
>  arch/riscv/include/asm/pgtable.h | 16 +++++++++++++++-
>  arch/riscv/mm/pgtable.c          | 13 +++++++++++++
>  2 files changed, 28 insertions(+), 1 deletion(-)
>
> diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
> index 0c94260b5d0c..50986e4c4601 100644
> --- a/arch/riscv/include/asm/pgtable.h
> +++ b/arch/riscv/include/asm/pgtable.h
> @@ -473,6 +473,9 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
>  		struct vm_area_struct *vma, unsigned long address,
>  		pte_t *ptep, unsigned int nr)
>  {
> +	asm_volatile_goto(ALTERNATIVE("nop", "j %l[svvptc]", 0, RISCV_ISA_EXT_SVVPTC, 1)
> +			  : : : : svvptc);
> +
>  	/*
>  	 * The kernel assumes that TLBs don't cache invalid entries, but
>  	 * in RISC-V, SFENCE.VMA specifies an ordering constraint, not a
> @@ -482,12 +485,23 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
>  	 */
>  	while (nr--)
>  		local_flush_tlb_page(address + nr * PAGE_SIZE);
> +
> +svvptc:
> +	/*
> +	 * Svvptc guarantees that xRET acts as a fence, so when the uarch does
> +	 * not cache invalid entries, we don't have to do anything.
> +	 */
> +	;
>  }

From the RISC-V architecture perspective, the logic of this patch is
reasonable. The common mm code may be missing calls to
update_mmu_cache_range(); for example, there is no TLB flush in
remap_pte_range() after updating a pte. I will send a patch to mm/ to fix
this next.

Thanks,
Yunhui
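The asm_volatile_goto(ALTERNATIVE(...)) construct quoted above is boot-time
code patching: the "nop" is rewritten into "j %l[svvptc]" on hardware that
advertises RISCV_ISA_EXT_SVVPTC, so the flush loop is skipped with no
runtime branch cost. As a rough conceptual model only (a static key is not
what the patch uses, and the key name here is hypothetical), the control
flow is equivalent to:

/* Hypothetical key standing in for the patched nop/jump. */
DEFINE_STATIC_KEY_FALSE(riscv_has_svvptc);

static inline void update_mmu_cache_range_sketch(unsigned long address,
						 unsigned int nr)
{
	if (static_branch_unlikely(&riscv_has_svvptc))
		return;	/* Svvptc: xRET orders the new mapping, no flush needed. */

	/* Legacy path: make the new mapping visible to the page table walker. */
	while (nr--)
		local_flush_tlb_page(address + nr * PAGE_SIZE);
}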
diff --git a/arch/riscv/include/asm/pgtable.h b/arch/riscv/include/asm/pgtable.h
index 0c94260b5d0c..50986e4c4601 100644
--- a/arch/riscv/include/asm/pgtable.h
+++ b/arch/riscv/include/asm/pgtable.h
@@ -473,6 +473,9 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
 		struct vm_area_struct *vma, unsigned long address,
 		pte_t *ptep, unsigned int nr)
 {
+	asm_volatile_goto(ALTERNATIVE("nop", "j %l[svvptc]", 0, RISCV_ISA_EXT_SVVPTC, 1)
+			  : : : : svvptc);
+
 	/*
 	 * The kernel assumes that TLBs don't cache invalid entries, but
 	 * in RISC-V, SFENCE.VMA specifies an ordering constraint, not a
@@ -482,12 +485,23 @@ static inline void update_mmu_cache_range(struct vm_fault *vmf,
 	 */
 	while (nr--)
 		local_flush_tlb_page(address + nr * PAGE_SIZE);
+
+svvptc:
+	/*
+	 * Svvptc guarantees that xRET acts as a fence, so when the uarch does
+	 * not cache invalid entries, we don't have to do anything.
+	 */
+	;
 }
 
 #define update_mmu_cache(vma, addr, ptep) \
 	update_mmu_cache_range(NULL, vma, addr, ptep, 1)
 
 #define __HAVE_ARCH_UPDATE_MMU_TLB
-#define update_mmu_tlb update_mmu_cache
+static inline void update_mmu_tlb(struct vm_area_struct *vma,
+				  unsigned long address, pte_t *ptep)
+{
+	flush_tlb_range(vma, address, address + PAGE_SIZE);
+}
 
 static inline void update_mmu_cache_pmd(struct vm_area_struct *vma,
 		unsigned long address, pmd_t *pmdp)
diff --git a/arch/riscv/mm/pgtable.c b/arch/riscv/mm/pgtable.c
index ef887efcb679..99ed389e4c8a 100644
--- a/arch/riscv/mm/pgtable.c
+++ b/arch/riscv/mm/pgtable.c
@@ -9,6 +9,9 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
 			  unsigned long address, pte_t *ptep,
 			  pte_t entry, int dirty)
 {
+	asm_volatile_goto(ALTERNATIVE("nop", "j %l[svvptc]", 0, RISCV_ISA_EXT_SVVPTC, 1)
+			  : : : : svvptc);
+
 	if (!pte_same(ptep_get(ptep), entry))
 		__set_pte_at(ptep, entry);
 	/*
@@ -16,6 +19,16 @@ int ptep_set_access_flags(struct vm_area_struct *vma,
 	 * the case that the PTE changed and the spurious fault case.
 	 */
 	return true;
+
+svvptc:
+	if (!pte_same(ptep_get(ptep), entry)) {
+		__set_pte_at(ptep, entry);
+		/* Only the non-Svadu case is impacted here. */
+		flush_tlb_page(vma, address);
+		return true;
+	}
+
+	return false;
 }
 
 int ptep_test_and_clear_young(struct vm_area_struct *vma,
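One detail worth calling out in the pgtable.h hunk: update_mmu_tlb() used to
be an alias for update_mmu_cache(), but the latter becomes a no-op on Svvptc
hardware. Since generic mm code calls update_mmu_tlb() precisely when a
stale translation may already be cached (for instance after losing a race to
populate a PTE), it presumably cannot be elided, and the patch therefore
turns it into an explicit flush_tlb_range() of the single page, independent
of Svvptc.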
The preventive sfence.vma instructions were emitted because new mappings
must be made visible to the page table walker, but Svvptc guarantees that
xRET acts as a fence, so there is no need to sfence.vma for the uarchs
that implement this extension.

This allows us to drastically reduce the number of sfence.vma emitted:

* Ubuntu boot to login:
  Before: ~630k sfence.vma
  After:  ~200k sfence.vma

* ltp - mmapstress01
  Before: ~45k
  After:  ~6.3k

* lmbench - lat_pagefault
  Before: ~665k
  After:  832 (!)

* lmbench - lat_mmap
  Before: ~546k
  After:  718 (!)

Signed-off-by: Alexandre Ghiti <alexghiti@rivosinc.com>
---
 arch/riscv/include/asm/pgtable.h | 16 +++++++++++++++-
 arch/riscv/mm/pgtable.c          | 13 +++++++++++++
 2 files changed, 28 insertions(+), 1 deletion(-)