Message ID: 20180913092812.247989787@infradead.org (mailing list archive)
State: New, archived
Series: my generic mmu_gather patches
Hi Peter,

On Thu, Sep 13, 2018 at 11:21:17AM +0200, Peter Zijlstra wrote:
> Generic mmu_gather provides everything that ARM needs:
>
>  - range tracking
>  - RCU table free
>  - VM_EXEC tracking
>  - VIPT cache flushing
>
> The one notable curiosity is the 'funny' range tracking for classical
> ARM in __pte_free_tlb().
>
> Cc: Will Deacon <will.deacon@arm.com>
> Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: Nick Piggin <npiggin@gmail.com>
> Cc: Russell King <linux@armlinux.org.uk>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
> ---
>  arch/arm/include/asm/tlb.h | 255 ++-------------------------------------------
>  1 file changed, 14 insertions(+), 241 deletions(-)

So whilst I was reviewing this, I realised that I think we should be
selecting HAVE_RCU_TABLE_INVALIDATE for arch/arm/ if HAVE_RCU_TABLE_FREE.
Whilst we don't distinguish between invalidation of intermediate and leaf
levels on 32-bit, the CPU is still permitted to cache partial translation
table walks even if the leaf entry indicates a fault. That means that,
after tearing down the PTEs, we can still get walk cache allocations and
so, if the RCU batching of the page tables fails, we need to invalidate
the TLB after clearing the intermediate entries but before freeing them.

> -static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
> -				  unsigned long addr)
> +__pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
>  {
>  	pgtable_page_dtor(pte);
>
> -#ifdef CONFIG_ARM_LPAE
> -	tlb_add_flush(tlb, addr);
> -#else
> +#ifndef CONFIG_ARM_LPAE
>  	/*
>  	 * With the classic ARM MMU, a pte page has two corresponding pmd
>  	 * entries, each covering 1MB.
>  	 */
> -	addr &= PMD_MASK;
> -	tlb_add_flush(tlb, addr + SZ_1M - PAGE_SIZE);
> -	tlb_add_flush(tlb, addr + SZ_1M);
> +	addr = (addr & PMD_MASK) + SZ_1M;
> +	__tlb_adjust_range(tlb, addr - PAGE_SIZE, addr + PAGE_SIZE);

Hmm, I don't think you've got the range correct here. Don't we want
something like:

	__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE)

to ensure that we flush on both sides of the 1M boundary?

Will
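The ordering constraint in Will's walk-cache argument can be made concrete
with a small standalone model. This is only an illustrative sketch:
clear_pmd(), tlb_invalidate() and free_table_page() below are hypothetical
stand-ins, not kernel APIs. It shows why the invalidate must sit between
clearing the intermediate entry and freeing the page:

    #include <stdio.h>

    /* Hypothetical stand-ins modelling the hardware state. */
    static int pmd_present    = 1;  /* intermediate table entry    */
    static int walk_cache_hit = 1;  /* cached partial translation  */

    static void clear_pmd(void)      { pmd_present = 0; }
    static void tlb_invalidate(void) { walk_cache_hit = 0; }

    static void free_table_page(void)
    {
        /*
         * If a hardware walker can still reach this page through a
         * cached partial walk, freeing it now is a use-after-free
         * from the MMU's point of view.
         */
        printf(!pmd_present && !walk_cache_hit
               ? "ok: walk cache invalidated before free\n"
               : "BUG: page still reachable via walk cache\n");
    }

    int main(void)
    {
        /* The order required when RCU batching of table pages fails: */
        clear_pmd();        /* 1. unhook the pte page from the tables */
        tlb_invalidate();   /* 2. kill cached partial walks           */
        free_table_page();  /* 3. only now is the page unreachable    */
        return 0;
    }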
On Tue, Sep 18, 2018 at 03:10:34PM +0100, Will Deacon wrote:

> So whilst I was reviewing this, I realised that I think we should be
> selecting HAVE_RCU_TABLE_INVALIDATE for arch/arm/ if HAVE_RCU_TABLE_FREE.

Yes very much so. Let me invert that option; you normally want that,
except if you don't natively use the linux page-tables.

---
Subject: asm-generic/tlb: Invert HAVE_RCU_TABLE_INVALIDATE
From: Peter Zijlstra <peterz@infradead.org>
Date: Wed Sep 19 13:24:41 CEST 2018

Make issuing a TLB invalidate for page-table pages the normal case.

The reason is twofold:

 - too many invalidates is safer than too few,
 - most architectures use the linux page-tables natively
   and would thus require this.

Make it an opt-out, instead of an opt-in.

Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/Kconfig              |    2 +-
 arch/arm64/Kconfig        |    1 -
 arch/powerpc/Kconfig      |    1 +
 arch/sparc/Kconfig        |    1 +
 arch/x86/Kconfig          |    1 -
 include/asm-generic/tlb.h |    9 +++++----
 mm/mmu_gather.c           |    2 +-
 7 files changed, 9 insertions(+), 8 deletions(-)

--- a/arch/Kconfig
+++ b/arch/Kconfig
@@ -362,7 +362,7 @@ config HAVE_ARCH_JUMP_LABEL
 config HAVE_RCU_TABLE_FREE
 	bool
 
-config HAVE_RCU_TABLE_INVALIDATE
+config HAVE_RCU_TABLE_NO_INVALIDATE
 	bool
 
 config HAVE_MMU_GATHER_PAGE_SIZE
--- a/arch/arm64/Kconfig
+++ b/arch/arm64/Kconfig
@@ -142,7 +142,6 @@ config ARM64
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RCU_TABLE_FREE
-	select HAVE_RCU_TABLE_INVALIDATE
 	select HAVE_RSEQ
 	select HAVE_STACKPROTECTOR
 	select HAVE_SYSCALL_TRACEPOINTS
--- a/arch/powerpc/Kconfig
+++ b/arch/powerpc/Kconfig
@@ -216,6 +216,7 @@ config PPC
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_MMU_GATHER_PAGE_SIZE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if PPC64 && CPU_LITTLE_ENDIAN
--- a/arch/sparc/Kconfig
+++ b/arch/sparc/Kconfig
@@ -64,6 +64,7 @@ config SPARC64
 	select HAVE_KRETPROBES
 	select HAVE_KPROBES
 	select HAVE_RCU_TABLE_FREE if SMP
+	select HAVE_RCU_TABLE_NO_INVALIDATE if HAVE_RCU_TABLE_FREE
 	select HAVE_MEMBLOCK_NODE_MAP
 	select HAVE_ARCH_TRANSPARENT_HUGEPAGE
 	select HAVE_DYNAMIC_FTRACE
--- a/arch/x86/Kconfig
+++ b/arch/x86/Kconfig
@@ -181,7 +181,6 @@ config X86
 	select HAVE_PERF_REGS
 	select HAVE_PERF_USER_STACK_DUMP
 	select HAVE_RCU_TABLE_FREE		if PARAVIRT
-	select HAVE_RCU_TABLE_INVALIDATE	if HAVE_RCU_TABLE_FREE
 	select HAVE_REGS_AND_STACK_ACCESS_API
 	select HAVE_RELIABLE_STACKTRACE		if X86_64 && (UNWINDER_FRAME_POINTER || UNWINDER_ORC) && STACK_VALIDATION
 	select HAVE_STACKPROTECTOR		if CC_HAS_SANE_STACKPROTECTOR
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -127,11 +127,12 @@
  * When used, an architecture is expected to provide __tlb_remove_table()
  * which does the actual freeing of these pages.
  *
- * HAVE_RCU_TABLE_INVALIDATE
+ * HAVE_RCU_TABLE_NO_INVALIDATE
  *
- * This makes HAVE_RCU_TABLE_FREE call tlb_flush_mmu_tlbonly() before freeing
- * the page-table pages. Required if you use HAVE_RCU_TABLE_FREE and your
- * architecture uses the Linux page-tables natively.
+ * This makes HAVE_RCU_TABLE_FREE avoid calling tlb_flush_mmu_tlbonly() before
+ * freeing the page-table pages. This can be avoided if you use
+ * HAVE_RCU_TABLE_FREE and your architecture does _NOT_ use the Linux
+ * page-tables natively.
  *
  */
 #define HAVE_GENERIC_MMU_GATHER
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -157,7 +157,7 @@ bool __tlb_remove_page_size(struct mmu_g
  */
 static inline void tlb_table_invalidate(struct mmu_gather *tlb)
 {
-#ifdef CONFIG_HAVE_RCU_TABLE_INVALIDATE
+#ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
 	/*
 	 * Invalidate page-table caches used by hardware walkers. Then we still
 	 * need to RCU-sched wait while freeing the pages because software
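The effect of the inversion is easy to model: with nothing selected, the
invalidate now happens by default, and only an explicit opt-out suppresses
it. A minimal sketch (CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE here is just a
compile-time define for the demo, mirroring the #ifndef in mm/mmu_gather.c
above):

    #include <stdio.h>

    /*
     * Build normally to model the safe default; build with
     * -DCONFIG_HAVE_RCU_TABLE_NO_INVALIDATE to model an architecture
     * that opts out (one whose hardware walker does not use the Linux
     * page-tables natively).
     */
    static void tlb_table_invalidate(void)
    {
    #ifndef CONFIG_HAVE_RCU_TABLE_NO_INVALIDATE
        printf("flushing page-table walk caches before freeing tables\n");
    #endif
    }

    int main(void)
    {
        tlb_table_invalidate();
        return 0;
    }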
On Tue, Sep 18, 2018 at 03:10:34PM +0100, Will Deacon wrote:

> > +	addr = (addr & PMD_MASK) + SZ_1M;
> > +	__tlb_adjust_range(tlb, addr - PAGE_SIZE, addr + PAGE_SIZE);
>
> Hmm, I don't think you've got the range correct here. Don't we want
> something like:
>
> 	__tlb_adjust_range(tlb, addr - PAGE_SIZE, 2 * PAGE_SIZE)
>
> to ensure that we flush on both sides of the 1M boundary?

Argh indeed. I confused {start,size} with {start,end}.

Thanks!
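The confusion is understandable: as the exchange above confirms, the
generic helper takes a base address plus a size, not a pair of endpoints.
A standalone sketch of how the two conventions diverge (the struct below
is a cut-down stand-in for the real mmu_gather):

    #include <stdio.h>

    #define PAGE_SIZE 4096UL
    #define SZ_1M     0x100000UL

    /* Cut-down model: the second argument is a SIZE, not an end. */
    struct mmu_gather { unsigned long start, end; };

    static void __tlb_adjust_range(struct mmu_gather *tlb,
                                   unsigned long address, unsigned long size)
    {
        if (address < tlb->start)
            tlb->start = address;
        if (address + size > tlb->end)
            tlb->end = address + size;
    }

    int main(void)
    {
        unsigned long addr = 0x40100000UL;  /* the 1M boundary address */
        struct mmu_gather buggy = { ~0UL, 0 }, fixed = { ~0UL, 0 };

        /* {start,end} passed where {start,size} is expected: */
        __tlb_adjust_range(&buggy, addr - PAGE_SIZE, addr + PAGE_SIZE);
        /* One page either side of the boundary: */
        __tlb_adjust_range(&fixed, addr - PAGE_SIZE, 2 * PAGE_SIZE);

        printf("buggy: %#lx-%#lx\n", buggy.start, buggy.end); /* end ~ 2*addr */
        printf("fixed: %#lx-%#lx\n", fixed.start, fixed.end);
        return 0;
    }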
On Wed, Sep 19, 2018 at 01:28:29PM +0200, Peter Zijlstra wrote:
> On Tue, Sep 18, 2018 at 03:10:34PM +0100, Will Deacon wrote:
>
> > So whilst I was reviewing this, I realised that I think we should be
> > selecting HAVE_RCU_TABLE_INVALIDATE for arch/arm/ if HAVE_RCU_TABLE_FREE.
>
> Yes very much so. Let me invert that option; you normally want that,
> except if you don't natively use the linux page-tables.

Yeah, inverting this to be opt-out is definitely the safe thing to do.
Patch below looks good:

Acked-by: Will Deacon <will.deacon@arm.com>

Will

> ---
> Subject: asm-generic/tlb: Invert HAVE_RCU_TABLE_INVALIDATE
> From: Peter Zijlstra <peterz@infradead.org>
> Date: Wed Sep 19 13:24:41 CEST 2018
>
> Make issuing a TLB invalidate for page-table pages the normal case.
>
> The reason is twofold:
>
>  - too many invalidates is safer than too few,
>  - most architectures use the linux page-tables natively
>    and would thus require this.
>
> Make it an opt-out, instead of an opt-in.
>
> Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
--- a/arch/arm/include/asm/tlb.h
+++ b/arch/arm/include/asm/tlb.h
@@ -33,270 +33,43 @@
 #include <asm/pgalloc.h>
 #include <asm/tlbflush.h>
 
-#define MMU_GATHER_BUNDLE	8
-
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
 static inline void __tlb_remove_table(void *_table)
 {
 	free_page_and_swap_cache((struct page *)_table);
 }
 
-struct mmu_table_batch {
-	struct rcu_head		rcu;
-	unsigned int		nr;
-	void			*tables[0];
-};
-
-#define MAX_TABLE_BATCH		\
-	((PAGE_SIZE - sizeof(struct mmu_table_batch)) / sizeof(void *))
-
-extern void tlb_table_flush(struct mmu_gather *tlb);
-extern void tlb_remove_table(struct mmu_gather *tlb, void *table);
-
-#define tlb_remove_entry(tlb, entry)	tlb_remove_table(tlb, entry)
-#else
-#define tlb_remove_entry(tlb, entry)	tlb_remove_page(tlb, entry)
-#endif /* CONFIG_HAVE_RCU_TABLE_FREE */
-
-/*
- * TLB handling.  This allows us to remove pages from the page
- * tables, and efficiently handle the TLB issues.
- */
-struct mmu_gather {
-	struct mm_struct	*mm;
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	struct mmu_table_batch	*batch;
-	unsigned int		need_flush;
-#endif
-	unsigned int		fullmm;
-	struct vm_area_struct	*vma;
-	unsigned long		start, end;
-	unsigned long		range_start;
-	unsigned long		range_end;
-	unsigned int		nr;
-	unsigned int		max;
-	struct page		**pages;
-	struct page		*local[MMU_GATHER_BUNDLE];
-};
-
-DECLARE_PER_CPU(struct mmu_gather, mmu_gathers);
-
-/*
- * This is unnecessarily complex.  There's three ways the TLB shootdown
- * code is used:
- *  1. Unmapping a range of vmas.  See zap_page_range(), unmap_region().
- *     tlb->fullmm = 0, and tlb_start_vma/tlb_end_vma will be called.
- *     tlb->vma will be non-NULL.
- *  2. Unmapping all vmas.  See exit_mmap().
- *     tlb->fullmm = 1, and tlb_start_vma/tlb_end_vma will be called.
- *     tlb->vma will be non-NULL.  Additionally, page tables will be freed.
- *  3. Unmapping argument pages.  See shift_arg_pages().
- *     tlb->fullmm = 0, but tlb_start_vma/tlb_end_vma will not be called.
- *     tlb->vma will be NULL.
- */
-static inline void tlb_flush(struct mmu_gather *tlb)
-{
-	if (tlb->fullmm || !tlb->vma)
-		flush_tlb_mm(tlb->mm);
-	else if (tlb->range_end > 0) {
-		flush_tlb_range(tlb->vma, tlb->range_start, tlb->range_end);
-		tlb->range_start = TASK_SIZE;
-		tlb->range_end = 0;
-	}
-}
-
-static inline void tlb_add_flush(struct mmu_gather *tlb, unsigned long addr)
-{
-	if (!tlb->fullmm) {
-		if (addr < tlb->range_start)
-			tlb->range_start = addr;
-		if (addr + PAGE_SIZE > tlb->range_end)
-			tlb->range_end = addr + PAGE_SIZE;
-	}
-}
-
-static inline void __tlb_alloc_page(struct mmu_gather *tlb)
-{
-	unsigned long addr = __get_free_pages(GFP_NOWAIT | __GFP_NOWARN, 0);
-
-	if (addr) {
-		tlb->pages = (void *)addr;
-		tlb->max = PAGE_SIZE / sizeof(struct page *);
-	}
-}
-
-static inline void tlb_flush_mmu_tlbonly(struct mmu_gather *tlb)
-{
-	tlb_flush(tlb);
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb_table_flush(tlb);
-#endif
-}
-
-static inline void tlb_flush_mmu_free(struct mmu_gather *tlb)
-{
-	free_pages_and_swap_cache(tlb->pages, tlb->nr);
-	tlb->nr = 0;
-	if (tlb->pages == tlb->local)
-		__tlb_alloc_page(tlb);
-}
-
-static inline void tlb_flush_mmu(struct mmu_gather *tlb)
-{
-	tlb_flush_mmu_tlbonly(tlb);
-	tlb_flush_mmu_free(tlb);
-}
-
-static inline void
-arch_tlb_gather_mmu(struct mmu_gather *tlb, struct mm_struct *mm,
-			unsigned long start, unsigned long end)
-{
-	tlb->mm = mm;
-	tlb->fullmm = !(start | (end+1));
-	tlb->start = start;
-	tlb->end = end;
-	tlb->vma = NULL;
-	tlb->max = ARRAY_SIZE(tlb->local);
-	tlb->pages = tlb->local;
-	tlb->nr = 0;
-	__tlb_alloc_page(tlb);
+#include <asm-generic/tlb.h>
 
-#ifdef CONFIG_HAVE_RCU_TABLE_FREE
-	tlb->batch = NULL;
+#ifndef CONFIG_HAVE_RCU_TABLE_FREE
+#define tlb_remove_table(tlb, entry) tlb_remove_page(tlb, entry)
 #endif
-}
-
-static inline void
-arch_tlb_finish_mmu(struct mmu_gather *tlb,
-			unsigned long start, unsigned long end, bool force)
-{
-	if (force) {
-		tlb->range_start = start;
-		tlb->range_end = end;
-	}
-
-	tlb_flush_mmu(tlb);
-
-	/* keep the page table cache within bounds */
-	check_pgt_cache();
-
-	if (tlb->pages != tlb->local)
-		free_pages((unsigned long)tlb->pages, 0);
-}
-
-/*
- * Memorize the range for the TLB flush.
- */
 static inline void
-tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep, unsigned long addr)
-{
-	tlb_add_flush(tlb, addr);
-}
-
-#define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
-	tlb_remove_tlb_entry(tlb, ptep, address)
-
-/*
- * In the case of tlb vma handling, we can optimise these away in the
- * case where we're doing a full MM flush.  When we're doing a munmap,
- * the vmas are adjusted to only cover the region to be torn down.
- */
-static inline void
-tlb_start_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm) {
-		flush_cache_range(vma, vma->vm_start, vma->vm_end);
-		tlb->vma = vma;
-		tlb->range_start = TASK_SIZE;
-		tlb->range_end = 0;
-	}
-}
-
-static inline void
-tlb_end_vma(struct mmu_gather *tlb, struct vm_area_struct *vma)
-{
-	if (!tlb->fullmm)
-		tlb_flush(tlb);
-}
-
-static inline bool __tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	tlb->pages[tlb->nr++] = page;
-	VM_WARN_ON(tlb->nr > tlb->max);
-	if (tlb->nr == tlb->max)
-		return true;
-	return false;
-}
-
-static inline void tlb_remove_page(struct mmu_gather *tlb, struct page *page)
-{
-	if (__tlb_remove_page(tlb, page))
-		tlb_flush_mmu(tlb);
-}
-
-static inline bool __tlb_remove_page_size(struct mmu_gather *tlb,
-		struct page *page, int page_size)
-{
-	return __tlb_remove_page(tlb, page);
-}
-
-static inline void tlb_remove_page_size(struct mmu_gather *tlb,
-					struct page *page, int page_size)
-{
-	return tlb_remove_page(tlb, page);
-}
-
-static inline void __pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte,
-				  unsigned long addr)
+__pte_free_tlb(struct mmu_gather *tlb, pgtable_t pte, unsigned long addr)
 {
 	pgtable_page_dtor(pte);
 
-#ifdef CONFIG_ARM_LPAE
-	tlb_add_flush(tlb, addr);
-#else
+#ifndef CONFIG_ARM_LPAE
 	/*
 	 * With the classic ARM MMU, a pte page has two corresponding pmd
 	 * entries, each covering 1MB.
 	 */
-	addr &= PMD_MASK;
-	tlb_add_flush(tlb, addr + SZ_1M - PAGE_SIZE);
-	tlb_add_flush(tlb, addr + SZ_1M);
+	addr = (addr & PMD_MASK) + SZ_1M;
+	__tlb_adjust_range(tlb, addr - PAGE_SIZE, addr + PAGE_SIZE);
 #endif
 
-	tlb_remove_entry(tlb, pte);
-}
-
-static inline void __pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp,
-				  unsigned long addr)
-{
-#ifdef CONFIG_ARM_LPAE
-	tlb_add_flush(tlb, addr);
-	tlb_remove_entry(tlb, virt_to_page(pmdp));
-#endif
+	tlb_remove_table(tlb, pte);
 }
 
 static inline void
-tlb_remove_pmd_tlb_entry(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
+__pmd_free_tlb(struct mmu_gather *tlb, pmd_t *pmdp, unsigned long addr)
 {
-	tlb_add_flush(tlb, addr);
-}
-
-#define pte_free_tlb(tlb, ptep, addr)	__pte_free_tlb(tlb, ptep, addr)
-#define pmd_free_tlb(tlb, pmdp, addr)	__pmd_free_tlb(tlb, pmdp, addr)
-#define pud_free_tlb(tlb, pudp, addr)	pud_free((tlb)->mm, pudp)
-
-#define tlb_migrate_finish(mm)		do { } while (0)
-
-static inline void tlb_change_page_size(struct mmu_gather *tlb,
-						unsigned int page_size)
-{
-}
-
-static inline void tlb_flush_remove_tables(struct mm_struct *mm)
-{
-}
+#ifdef CONFIG_ARM_LPAE
+	struct page *page = virt_to_page(pmdp);
 
-static inline void tlb_flush_remove_tables_local(void *arg)
-{
+	pgtable_pmd_page_dtor(page);
+	tlb_remove_table(tlb, page);
+#endif
 }
 
 #endif /* CONFIG_MMU */
Generic mmu_gather provides everything that ARM needs:

 - range tracking
 - RCU table free
 - VM_EXEC tracking
 - VIPT cache flushing

The one notable curiosity is the 'funny' range tracking for classical
ARM in __pte_free_tlb().

Cc: Will Deacon <will.deacon@arm.com>
Cc: "Aneesh Kumar K.V" <aneesh.kumar@linux.vnet.ibm.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Nick Piggin <npiggin@gmail.com>
Cc: Russell King <linux@armlinux.org.uk>
Signed-off-by: Peter Zijlstra (Intel) <peterz@infradead.org>
---
 arch/arm/include/asm/tlb.h | 255 ++-------------------------------------------
 1 file changed, 14 insertions(+), 241 deletions(-)
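The 'funny' tracking follows from the comment in __pte_free_tlb(): with the
classic (non-LPAE) MMU, one pte page is reachable through two first-level
entries, each covering 1MB, so the tracked range has to touch both sides of
the 1MB boundary between them. A small arithmetic sketch; the 2MB PMD_MASK
span is an assumption derived from that two-entries-of-1MB layout:

    #include <stdio.h>

    #define PAGE_SIZE 4096UL
    #define SZ_1M     0x100000UL
    #define PMD_MASK  (~(2 * SZ_1M - 1))  /* assumed: 2MB span, 2 x 1MB entries */

    int main(void)
    {
        /* Any address covered by the pte page being freed. */
        unsigned long addr = 0x40234000UL;

        /*
         * Track one page either side of the internal 1MB boundary so
         * the eventual range flush hits a VA under each of the two
         * entries that point at this pte page.
         */
        unsigned long boundary = (addr & PMD_MASK) + SZ_1M;

        printf("flush range: %#lx-%#lx\n",
               boundary - PAGE_SIZE, boundary + PAGE_SIZE);
        return 0;
    }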