Message ID | 20240129143221.263763-9-david@redhat.com (mailing list archive) |
---|---|
State | New |
Series | mm/memory: optimize unmap/zap with PTE-mapped THP |
On 29/01/2024 14:32, David Hildenbrand wrote:
> Let's add a helper that lets us batch-process multiple consecutive PTEs.
>
> Note that the loop will get optimized out on all architectures except on
> powerpc. We have to add an early define of __tlb_remove_tlb_entry() on
> ppc to make the compiler happy (and avoid making tlb_remove_tlb_entries() a
> macro).
>
> Signed-off-by: David Hildenbrand <david@redhat.com>

Reviewed-by: Ryan Roberts <ryan.roberts@arm.com>

> ---
>  arch/powerpc/include/asm/tlb.h |  2 ++
>  include/asm-generic/tlb.h      | 20 ++++++++++++++++++++
>  2 files changed, 22 insertions(+)
>
> diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
> index b3de6102a907..1ca7d4c4b90d 100644
> --- a/arch/powerpc/include/asm/tlb.h
> +++ b/arch/powerpc/include/asm/tlb.h
> @@ -19,6 +19,8 @@
>
>  #include <linux/pagemap.h>
>
> +static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
> +					  unsigned long address);
>  #define __tlb_remove_tlb_entry	__tlb_remove_tlb_entry
>
>  #define tlb_flush tlb_flush
> diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
> index 428c3f93addc..bd00dd238b79 100644
> --- a/include/asm-generic/tlb.h
> +++ b/include/asm-generic/tlb.h
> @@ -616,6 +616,26 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
>  		__tlb_remove_tlb_entry(tlb, ptep, address);	\
>  	} while (0)
>
> +/**
> + * tlb_remove_tlb_entries - remember unmapping of multiple consecutive ptes for
> + *			    later tlb invalidation.
> + *
> + * Similar to tlb_remove_tlb_entry(), but remember unmapping of multiple
> + * consecutive ptes instead of only a single one.
> + */
> +static inline void tlb_remove_tlb_entries(struct mmu_gather *tlb,
> +		pte_t *ptep, unsigned int nr, unsigned long address)
> +{
> +	tlb_flush_pte_range(tlb, address, PAGE_SIZE * nr);
> +	for (;;) {
> +		__tlb_remove_tlb_entry(tlb, ptep, address);
> +		if (--nr == 0)
> +			break;
> +		ptep++;
> +		address += PAGE_SIZE;
> +	}
> +}
> +
>  #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
>  	do {							\
>  		unsigned long _sz = huge_page_size(h);		\
diff --git a/arch/powerpc/include/asm/tlb.h b/arch/powerpc/include/asm/tlb.h
index b3de6102a907..1ca7d4c4b90d 100644
--- a/arch/powerpc/include/asm/tlb.h
+++ b/arch/powerpc/include/asm/tlb.h
@@ -19,6 +19,8 @@
 
 #include <linux/pagemap.h>
 
+static inline void __tlb_remove_tlb_entry(struct mmu_gather *tlb, pte_t *ptep,
+					  unsigned long address);
 #define __tlb_remove_tlb_entry	__tlb_remove_tlb_entry
 
 #define tlb_flush tlb_flush
diff --git a/include/asm-generic/tlb.h b/include/asm-generic/tlb.h
index 428c3f93addc..bd00dd238b79 100644
--- a/include/asm-generic/tlb.h
+++ b/include/asm-generic/tlb.h
@@ -616,6 +616,26 @@ static inline void tlb_flush_p4d_range(struct mmu_gather *tlb,
 		__tlb_remove_tlb_entry(tlb, ptep, address);	\
 	} while (0)
 
+/**
+ * tlb_remove_tlb_entries - remember unmapping of multiple consecutive ptes for
+ *			    later tlb invalidation.
+ *
+ * Similar to tlb_remove_tlb_entry(), but remember unmapping of multiple
+ * consecutive ptes instead of only a single one.
+ */
+static inline void tlb_remove_tlb_entries(struct mmu_gather *tlb,
+		pte_t *ptep, unsigned int nr, unsigned long address)
+{
+	tlb_flush_pte_range(tlb, address, PAGE_SIZE * nr);
+	for (;;) {
+		__tlb_remove_tlb_entry(tlb, ptep, address);
+		if (--nr == 0)
+			break;
+		ptep++;
+		address += PAGE_SIZE;
+	}
+}
+
 #define tlb_remove_huge_tlb_entry(h, tlb, ptep, address)	\
 	do {							\
 		unsigned long _sz = huge_page_size(h);		\
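For readers who want to see the control flow of the new helper in isolation, here is a minimal userspace sketch of the same loop shape. It is not kernel code: `struct model_gather`, `MODEL_PAGE_SIZE`, `model_flush_range()`, `model_remove_entry()`, `model_remove_entries()` and `model_run()` are hypothetical stand-ins invented for this illustration, and the range tracking only approximates what a gather structure might record. The PTE pointer is left out because the model has no page tables.

```c
#include <assert.h>

/* Assumption: a 4 KiB page size, chosen only for this model. */
#define MODEL_PAGE_SIZE 4096UL

/* Hypothetical stand-in for the gather state; not the kernel's struct. */
struct model_gather {
	unsigned long start;		/* lowest address recorded so far */
	unsigned long end;		/* one past the highest address recorded */
	unsigned int entries;		/* number of per-entry hook calls */
	unsigned long last_address;	/* address seen by the last hook call */
};

/* Record the whole range once, up front (models the range-flush call). */
static void model_flush_range(struct model_gather *g, unsigned long address,
			      unsigned long size)
{
	if (address < g->start)
		g->start = address;
	if (address + size > g->end)
		g->end = address + size;
}

/* Per-entry hook; a no-op here would let a compiler drop the loop. */
static void model_remove_entry(struct model_gather *g, unsigned long address)
{
	g->entries++;
	g->last_address = address;
}

/*
 * Same loop shape as the helper in the patch: the body runs before nr is
 * tested, so nr must be at least 1. With nr == 0 the unsigned counter
 * would wrap around instead of terminating immediately.
 */
static void model_remove_entries(struct model_gather *g, unsigned int nr,
				 unsigned long address)
{
	model_flush_range(g, address, MODEL_PAGE_SIZE * nr);
	for (;;) {
		model_remove_entry(g, address);
		if (--nr == 0)
			break;
		address += MODEL_PAGE_SIZE;
	}
}

/* Convenience wrapper: run one batch on a fresh gather and return it. */
static struct model_gather model_run(unsigned int nr, unsigned long address)
{
	struct model_gather g = { .start = ~0UL, .end = 0,
				  .entries = 0, .last_address = 0 };

	model_remove_entries(&g, nr, address);
	return g;
}
```

Calling `model_run(4, 0x10000UL)` from a `main()` records one range covering four pages and invokes the per-entry hook once per page. The range is recorded before the loop, which is why an empty per-entry hook leaves nothing for the loop to do.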
Let's add a helper that lets us batch-process multiple consecutive PTEs.

Note that the loop will get optimized out on all architectures except on
powerpc. We have to add an early define of __tlb_remove_tlb_entry() on
ppc to make the compiler happy (and avoid making tlb_remove_tlb_entries() a
macro).

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 arch/powerpc/include/asm/tlb.h |  2 ++
 include/asm-generic/tlb.h      | 20 ++++++++++++++++++++
 2 files changed, 22 insertions(+)
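The "early define" mentioned above can be sketched in plain C. The model below assumes the arch header announces its override with a self-named macro, then pulls in the generic header, and only afterwards defines the function. That ordering is an assumption made for this illustration, and `model_hook()` and `model_batch()` are hypothetical names, not kernel symbols. Under that assumption, a `static inline` helper in the generic part can only call the hook if a prototype is already visible, which is what the forward declaration provides.

```c
#include <assert.h>

/* --- "arch header", early part (hypothetical names) --- */

/* Forward declaration so inline code below can call the hook. */
static inline void model_hook(unsigned int *calls, unsigned long address);
/* Self-named macro: lets the generic part detect the override. */
#define model_hook model_hook

/* --- "generic header" --- */

#ifndef model_hook
/* Fallback when no arch override is announced (skipped in this model). */
#define model_hook(calls, address) do { } while (0)
#endif

/* An inline function, not a macro; it needs the prototype above. */
static inline void model_batch(unsigned int *calls, unsigned int nr,
			       unsigned long address)
{
	/* nr must be at least 1, as in the helper from the patch. */
	for (;;) {
		model_hook(calls, address);
		if (--nr == 0)
			break;
		address += 4096;	/* assumed page size for the model */
	}
}

/* --- "arch header", late part: the actual definition --- */

static inline void model_hook(unsigned int *calls, unsigned long address)
{
	(void)address;
	(*calls)++;
}
```

Without the forward declaration, `model_batch()` would reference `model_hook()` before any declaration is in scope, and the later `static` definition would conflict with the implicit one. Writing the helper as a macro would sidestep this, since expansion would only happen at the call site, which is the alternative the commit message says it wants to avoid.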