| Message ID | 20210524090114.63446-8-aneesh.kumar@linux.ibm.com (mailing list archive) |
| --- | --- |
| State | New, archived |
| Series | Speedup mremap on ppc64 |
On Sun, May 23, 2021 at 11:04 PM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> Add new helper flush_pte_tlb_pwc_range() which invalidates both TLB and
> page walk cache where TLB entries are mapped with page size PAGE_SIZE.

So I dislike this patch for two reasons:

 (a) naming.

If the ppc people want to use crazy TLAs that have no meaning outside
of the powerpc community, that's fine. But only in powerpc code.

"pwc" makes no sense to me, or to anybody else that isn't intimately
involved in low-level powerpc stuff. I assume it's "page walk cache",
but honestly, outside of this area, PWC is mostly used for a specific
type of webcam.

So there's no way I'd accept this as-is, simply because of that.
flush_pte_tlb_pwc_range() is simply not an acceptable name. You would
have to spell it out, not use an obscure TLA.

But I think you don't even want to do that, because of

 (b) is this even worth it as a public interface?

Why doesn't the powerpc radix TLB flushing code just always flush the
page table walking cache when the range is larger than a PMD?

Once you have big flush ranges like that, I don't believe it makes any
sense not to flush the walking cache too.

NOTE! This is particularly true as "flush the walking cache" isn't a
well-defined operation anyway. Which _levels_ of the walking cache?
Again, the size (and alignment) of the flush would actually tell you.
A new boolean "flush" parameter does *NOT* tell that at all.

So I think this new interface is mis-named, but I also think it's
pointless. Just DTRT automatically when somebody asks for a flush that
covers a PMD range (or a PUD range).

           Linus
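For illustration, the "do the right thing automatically" behaviour being
asked for could look roughly like the sketch below, inside an
architecture's range-flush path. This is a hypothetical sketch only, not
the actual powerpc radix code; radix__flush_range() is an invented
helper name:

    /*
     * Sketch: if the flushed range covers at least a full PMD's worth
     * of address space, page tables may have been freed, so invalidate
     * the page walk cache along with the TLB entries.
     */
    static inline void radix__flush_range_auto_pwc(struct mm_struct *mm,
                                                   unsigned long start,
                                                   unsigned long end)
    {
            bool flush_pwc = (end - start) >= PMD_SIZE;

            radix__flush_range(mm, start, end, flush_pwc); /* invented helper */
    }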
Linus Torvalds <torvalds@linux-foundation.org> writes:

> On Sun, May 23, 2021 at 11:04 PM Aneesh Kumar K.V
> <aneesh.kumar@linux.ibm.com> wrote:
>>
>> Add new helper flush_pte_tlb_pwc_range() which invalidates both TLB and
>> page walk cache where TLB entries are mapped with page size PAGE_SIZE.
>
> So I dislike this patch for two reasons:
>
>  (a) naming.
>
> If the ppc people want to use crazy TLAs that have no meaning outside
> of the powerpc community, that's fine. But only in powerpc code.
>
> "pwc" makes no sense to me, or to anybody else that isn't intimately
> involved in low-level powerpc stuff. I assume it's "page walk cache",
> but honestly, outside of this area, PWC is mostly used for a specific
> type of webcam.
>
> So there's no way I'd accept this as-is, simply because of that.
> flush_pte_tlb_pwc_range() is simply not an acceptable name. You would
> have to spell it out, not use an obscure TLA.
>
> But I think you don't even want to do that, because of

How about flush_tlb_and_page_table_cache() ?

>
>  (b) is this even worth it as a public interface?
>
> Why doesn't the powerpc radix TLB flushing code just always flush the
> page table walking cache when the range is larger than a PMD?
>
> Once you have big flush ranges like that, I don't believe it makes any
> sense not to flush the walking cache too.

But such a large range invalidate doesn't imply we are freeing page
tables, so forcing a page table cache flush on every large range
invalidate can have a performance impact. ppc64 doesn't do a ranged
page table cache invalidate, hence we would have to flush the full
page table cache.

> NOTE! This is particularly true as "flush the walking cache" isn't a
> well-defined operation anyway. Which _levels_ of the walking cache?
> Again, the size (and alignment) of the flush would actually tell you.
> A new boolean "flush" parameter does *NOT* tell that at all.
>
> So I think this new interface is mis-named, but I also think it's
> pointless. Just DTRT automatically when somebody asks for a flush that
> covers a PMD range (or a PUD range).
>
> Linus

-aneesh
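The constraint behind that answer, as a sketch: radix has no ranged
walk-cache invalidate, so flushing the page walk cache discards it for
the whole process ID. RIC_FLUSH_PWC matches the constant used in the
radix TLB code, but tlbie_pid() here is an invented wrapper, not a real
kernel function:

    /*
     * Sketch: flushing the walk cache on ppc64 is all-or-nothing per
     * PID, which is why doing it on every large range flush could hurt.
     */
    static inline void flush_walk_cache(struct mm_struct *mm,
                                        unsigned long start,
                                        unsigned long end)
    {
            /* start/end unused: the hardware has no ranged PWC flush */
            tlbie_pid(mm->context.id, RIC_FLUSH_PWC); /* invented wrapper */
    }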
On Tue, May 25, 2021 at 3:28 AM Aneesh Kumar K.V
<aneesh.kumar@linux.ibm.com> wrote:
>
> How about flush_tlb_and_page_table_cache() ?

Honestly, I'd prefer it to be a separate function. So keep the
existing flush_tlb() as-is, and add a flush_tlb_walking_cache() and
document that any architecture that flushes the walker caches as part
of the regular tlb flush can just make that a no-op.

Would that work for powerpc?

But:

> > (b) is this even worth it as a public interface?
>
> But such a large range invalidate doesn't imply we are freeing page
> tables.

No. But it's what everybody else (ie x86 and ARM) does, and if you're
flushing megabytes of TLBs, what's the downside of flushing a few TLB
walker cache entries?

You already do that for internal powerpc errata anyway (ie
"mm_needs_flush_escalation()"), so I'm saying that you might as well
treat the page walker cache as a powerpc-internal implementation thing.

Put another way: can you even _measure_ the difference between "just
make powerpc look like everybody else" and "add a new explicit page
table walker cache flush function interface"?

Now, from a quick look at the powerpc code, it looks like powerpc is a
bit mis-architected, and when you flush the walker cache, you flush
everything for that ASID. x86 and arm only flush the parts affected by
the TLB flush range (now, admittedly, that's what they do
_architecturally_ - for all I know the actual hardware just always
flushes all walker caches when you flush any TLB, and so maybe they
act exactly like the powerpc RIC_FLUSH_PWC in practice).

So maybe it's measurable. But I kind of doubt it, and I'd like to know
that you've actually done some testing to see that "yes, this matters,
I can't just say 'if flushing more than a pmd, just flush the walker
cache too'".

Hmm?

            Linus
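The shape of the interface proposed here, as a minimal sketch: a generic
no-op fallback that architectures like powerpc would override. Only the
flush_tlb_walking_cache() name comes from the proposal; the surrounding
structure is illustrative (it mirrors the #ifndef pattern used in the
patch below):

    /*
     * Generic fallback (sketch): architectures whose flush_tlb_range()
     * already invalidates walker caches simply leave this a no-op.
     */
    #ifndef flush_tlb_walking_cache
    #define flush_tlb_walking_cache flush_tlb_walking_cache
    static inline void flush_tlb_walking_cache(struct vm_area_struct *vma,
                                               unsigned long start,
                                               unsigned long end)
    {
    }
    #endif

A caller such as move_normal_pmd() would then issue both flushes back
to back:

    flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
    flush_tlb_walking_cache(vma, old_addr, old_addr + PMD_SIZE);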
diff --git a/arch/powerpc/include/asm/book3s/64/tlbflush.h b/arch/powerpc/include/asm/book3s/64/tlbflush.h
index f9f8a3a264f7..e84fee9db106 100644
--- a/arch/powerpc/include/asm/book3s/64/tlbflush.h
+++ b/arch/powerpc/include/asm/book3s/64/tlbflush.h
@@ -80,6 +80,16 @@ static inline void flush_hugetlb_tlb_range(struct vm_area_struct *vma,
 	return flush_hugetlb_tlb_pwc_range(vma, start, end, false);
 }
 
+#define flush_pte_tlb_pwc_range flush_tlb_pwc_range
+static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
+					   unsigned long start, unsigned long end)
+{
+	if (radix_enabled())
+		return radix__flush_tlb_pwc_range_psize(vma->vm_mm, start,
+							end, mmu_virtual_psize, true);
+	return hash__flush_tlb_range(vma, start, end);
+}
+
 static inline void flush_tlb_range(struct vm_area_struct *vma,
 				   unsigned long start, unsigned long end)
 {
diff --git a/mm/mremap.c b/mm/mremap.c
index 7372c8c0cf26..000a71917557 100644
--- a/mm/mremap.c
+++ b/mm/mremap.c
@@ -210,6 +210,16 @@ static void move_ptes(struct vm_area_struct *vma, pmd_t *old_pmd,
 		drop_rmap_locks(vma);
 }
 
+#ifndef flush_pte_tlb_pwc_range
+#define flush_pte_tlb_pwc_range flush_pte_tlb_pwc_range
+static inline void flush_pte_tlb_pwc_range(struct vm_area_struct *vma,
+					   unsigned long start,
+					   unsigned long end)
+{
+	return flush_tlb_range(vma, start, end);
+}
+#endif
+
 #ifdef CONFIG_HAVE_MOVE_PMD
 static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 		  unsigned long new_addr, pmd_t *old_pmd, pmd_t *new_pmd)
@@ -260,7 +270,7 @@ static bool move_normal_pmd(struct vm_area_struct *vma, unsigned long old_addr,
 	VM_BUG_ON(!pmd_none(*new_pmd));
 
 	pmd_populate(mm, new_pmd, pmd_pgtable(pmd));
-	flush_tlb_range(vma, old_addr, old_addr + PMD_SIZE);
+	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PMD_SIZE);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
@@ -307,7 +317,7 @@ static bool move_normal_pud(struct vm_area_struct *vma, unsigned long old_addr,
 	VM_BUG_ON(!pud_none(*new_pud));
 
 	pud_populate(mm, new_pud, (pmd_t *)pud_page_vaddr(pud));
-	flush_tlb_range(vma, old_addr, old_addr + PUD_SIZE);
+	flush_pte_tlb_pwc_range(vma, old_addr, old_addr + PUD_SIZE);
 	if (new_ptl != old_ptl)
 		spin_unlock(new_ptl);
 	spin_unlock(old_ptl);
Some architectures have the concept of a page walk cache, which needs
to be flushed when updating higher levels of the page tables. A fast
mremap that moves page table pages instead of copying pte entries
should flush the page walk cache, since the old translation cache is
no longer valid.

Add a new helper flush_pte_tlb_pwc_range() which invalidates both the
TLB and the page walk cache, where the TLB entries are mapped with
page size PAGE_SIZE.

Signed-off-by: Aneesh Kumar K.V <aneesh.kumar@linux.ibm.com>
---
 arch/powerpc/include/asm/book3s/64/tlbflush.h | 10 ++++++++++
 mm/mremap.c                                   | 14 ++++++++++++--
 2 files changed, 22 insertions(+), 2 deletions(-)