Message ID | 20190611121928.19253-1-will.deacon@arm.com (mailing list archive)
---|---
State | Mainlined, archived
Commit | 01d57485fcdb9f9101a10a18e32d5f8b023cab86
Series | arm64: tlbflush: Ensure start/end of address range are aligned to stride
Hi Will,

On 2019/6/11 20:19, Will Deacon wrote:
> Since commit 3d65b6bbc01e ("arm64: tlbi: Set MAX_TLBI_OPS to
> PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
> perform more than PTRS_PER_PTE invalidation instructions in a single
> call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
> code does not ensure that the end address of the range is rounded-up
> to the stride when freeing intermediate page tables in pXX_free_tlb(),
> which defeats our range checking.
>
> Align the bounds passed into __flush_tlb_range().
>
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Reported-by: Hanjun Guo <guohanjun@huawei.com>

Thanks for the patch, I will test it tomorrow (my local time) as it's
late here, and will update you when I get the results.

Thanks
Hanjun
On 2019/6/11 23:23, Hanjun Guo wrote:
> Hi Will,
>
> On 2019/6/11 20:19, Will Deacon wrote:
>> Since commit 3d65b6bbc01e ("arm64: tlbi: Set MAX_TLBI_OPS to
>> PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
>> perform more than PTRS_PER_PTE invalidation instructions in a single
>> call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
>> code does not ensure that the end address of the range is rounded-up
>> to the stride when freeing intermediate page tables in pXX_free_tlb(),
>> which defeats our range checking.
>>
>> Align the bounds passed into __flush_tlb_range().
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Reported-by: Hanjun Guo <guohanjun@huawei.com>
>
> Thanks for the patch, I will test it tomorrow (my local time) as it's
> late here, and will update you when I get the results.

I tested this patch on top of 5.2-rc1 on the Kunpeng920 ARM64 server
platform, with the test case I reported before, and I can see about a
100% speedup for munmap() (from about 47us to 25us), that's great!

Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>

Thanks
Hanjun
```diff
diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 3a1870228946..dff8f9ea5754 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -195,6 +195,9 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	unsigned long asid = ASID(vma->vm_mm);
 	unsigned long addr;
 
+	start = round_down(start, stride);
+	end = round_up(end, stride);
+
 	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
 		flush_tlb_mm(vma->vm_mm);
 		return;
```
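For context, round_down() and round_up() in the hunk are the kernel's standard power-of-two rounding helpers from include/linux/kernel.h. A minimal user-space sketch (the macros are re-declared locally to mirror the kernel definitions, and the stride and addresses are made-up values assuming a 2 MiB PMD-level stride) shows how the patch widens an unaligned range to whole strides:

```c
#include <stdio.h>

/* Simplified mirrors of the kernel's round_down()/round_up();
 * valid only for power-of-two alignments, which PAGE_SIZE,
 * PMD_SIZE and PUD_SIZE always are. */
#define round_down(x, y) ((x) & ~((y) - 1))
#define round_up(x, y)   ((((x) - 1) | ((y) - 1)) + 1)

int main(void)
{
	unsigned long stride = 0x200000UL;   /* 2 MiB: PMD_SIZE with 4K pages */
	unsigned long start  = 0x40001000UL; /* page- but not stride-aligned */
	unsigned long end    = 0x4047f000UL;

	start = round_down(start, stride);   /* -> 0x40000000 */
	end   = round_up(end, stride);       /* -> 0x40600000 */

	/* The invalidation loop now covers a whole number of strides. */
	printf("start=%#lx end=%#lx ops=%lu\n",
	       start, end, (end - start) / stride);  /* ops=3 */
	return 0;
}
```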
Since commit 3d65b6bbc01e ("arm64: tlbi: Set MAX_TLBI_OPS to
PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
perform more than PTRS_PER_PTE invalidation instructions in a single
call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
code does not ensure that the end address of the range is rounded-up
to the stride when freeing intermediate page tables in pXX_free_tlb(),
which defeats our range checking.

Align the bounds passed into __flush_tlb_range().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>

---
 arch/arm64/include/asm/tlbflush.h | 3 +++
 1 file changed, 3 insertions(+)
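One way to read "defeats our range checking": __flush_tlb_range() compares (end - start) against MAX_TLBI_OPS * stride to decide between per-stride TLBI instructions and a single per-ASID flush, and un-rounded bounds under-report the true extent of the range. The following self-contained model is only an illustration of that arithmetic; the constants (MAX_TLBI_OPS = 512 and a 2 MiB stride, as for 4K pages) and the addresses are assumptions, not taken from Hanjun's test case:

```c
#include <stdio.h>

#define MAX_TLBI_OPS 512UL      /* PTRS_PER_PTE with 4K pages (assumed) */
#define PMD_SIZE     0x200000UL /* 2 MiB stride for PMD-level frees */
#define PAGE_SIZE    0x1000UL

/* Models the check in the hunk above: a big enough range falls
 * back to one ASID-wide flush instead of many per-VA TLBIs. */
static int wants_full_flush(unsigned long start, unsigned long end,
                            unsigned long stride)
{
	return (end - start) >= MAX_TLBI_OPS * stride;
}

int main(void)
{
	unsigned long stride = PMD_SIZE;
	/* Made-up range: a hair under MAX_TLBI_OPS * stride (1 GiB),
	 * page-aligned but not stride-aligned at either end. */
	unsigned long start = 0x40100000UL;
	unsigned long end   = start + MAX_TLBI_OPS * stride - PAGE_SIZE;

	printf("unaligned: full flush? %d\n",
	       wants_full_flush(start, end, stride));      /* 0 */

	start &= ~(stride - 1);               /* round_down(start, stride) */
	end = ((end - 1) | (stride - 1)) + 1; /* round_up(end, stride) */
	printf("aligned:   full flush? %d\n",
	       wants_full_flush(start, end, stride));      /* 1 */
	return 0;
}
```

With the bounds rounded, a range that genuinely spans MAX_TLBI_OPS strides takes the per-ASID path, which is consistent with the munmap() speedup Hanjun reports above.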