arm64: tlbflush: Ensure start/end of address range are aligned to stride

Message ID 20190611121928.19253-1-will.deacon@arm.com (mailing list archive)
State Mainlined, archived
Commit 01d57485fcdb9f9101a10a18e32d5f8b023cab86
Series arm64: tlbflush: Ensure start/end of address range are aligned to stride

Commit Message

Will Deacon June 11, 2019, 12:19 p.m. UTC
Since commit 3d65b6bbc01e ("arm64: tlbi: Set MAX_TLBI_OPS to
PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
perform more than PTRS_PER_PTE invalidation instructions in a single
call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
code does not ensure that the end address of the range is rounded-up
to the stride when freeing intermediate page tables in pXX_free_tlb(),
which defeats our range checking.

Align the bounds passed into __flush_tlb_range().

Cc: Catalin Marinas <catalin.marinas@arm.com>
Cc: Peter Zijlstra <peterz@infradead.org>
Reported-by: Hanjun Guo <guohanjun@huawei.com>
Signed-off-by: Will Deacon <will.deacon@arm.com>
---
 arch/arm64/include/asm/tlbflush.h | 3 +++
 1 file changed, 3 insertions(+)
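
For illustration only (this is not part of the patch): a minimal user-space
sketch of the decision __flush_tlb_range() makes once the bounds are aligned,
assuming 4K pages (so MAX_TLBI_OPS, defined as PTRS_PER_PTE, is 512 and
PMD_SIZE is 2MiB). model_flush() and the simplified rounding macros below are
stand-ins for illustration, not the kernel definitions.

#include <stdio.h>

/*
 * Power-of-two rounding, matching the behaviour of the kernel's
 * round_down()/round_up() for these inputs.
 */
#define round_down(x, y)	((x) & ~((y) - 1))
#define round_up(x, y)		(((x) + (y) - 1) & ~((y) - 1))

#define MAX_TLBI_OPS	512UL	/* PTRS_PER_PTE with 4K pages */

/* Simplified model of the range check: per-VA TLBIs vs. a per-ASID flush. */
static void model_flush(unsigned long start, unsigned long end,
			unsigned long stride)
{
	start = round_down(start, stride);	/* what the patch adds */
	end = round_up(end, stride);

	if ((end - start) >= (MAX_TLBI_OPS * stride))
		printf("fall back to flush_tlb_mm() (per-ASID invalidation)\n");
	else
		printf("%lu TLBI operations at stride 0x%lx\n",
		       (end - start) / stride, stride);
}

int main(void)
{
	/* 2MiB stride (PMD_SIZE with 4K pages), bounds not stride-aligned. */
	model_flush(0x12345000UL, 0x12567000UL, 0x200000UL);

	/* 4K stride over a range large enough to take the per-ASID path. */
	model_flush(0x400000UL, 0x400000UL + (MAX_TLBI_OPS + 1) * 0x1000UL,
		    0x1000UL);
	return 0;
}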

Comments

Hanjun Guo June 11, 2019, 3:23 p.m. UTC | #1
Hi Will,

On 2019/6/11 20:19, Will Deacon wrote:
> Since commit 3d65b6bbc01e ("arm64: tlbi: Set MAX_TLBI_OPS to
> PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
> perform more than PTRS_PER_PTE invalidation instructions in a single
> call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
> code does not ensure that the end address of the range is rounded-up
> to the stride when freeing intermediate page tables in pXX_free_tlb(),
> which defeats our range checking.
> 
> Align the bounds passed into __flush_tlb_range().
> 
> Cc: Catalin Marinas <catalin.marinas@arm.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Reported-by: Hanjun Guo <guohanjun@huawei.com>

Thanks for the patch, I will test it tomorrow (my local time) as it's
late here, and will update you when I get the results.

Thanks
Hanjun
Hanjun Guo June 12, 2019, 12:43 p.m. UTC | #2
On 2019/6/11 23:23, Hanjun Guo wrote:
> Hi Will,
> 
> On 2019/6/11 20:19, Will Deacon wrote:
>> Since commit 3d65b6bbc01e ("arm64: tlbi: Set MAX_TLBI_OPS to
>> PTRS_PER_PTE"), we resort to per-ASID invalidation when attempting to
>> perform more than PTRS_PER_PTE invalidation instructions in a single
>> call to __flush_tlb_range(). Whilst this is beneficial, the mmu_gather
>> code does not ensure that the end address of the range is rounded-up
>> to the stride when freeing intermediate page tables in pXX_free_tlb(),
>> which defeats our range checking.
>>
>> Align the bounds passed into __flush_tlb_range().
>>
>> Cc: Catalin Marinas <catalin.marinas@arm.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Reported-by: Hanjun Guo <guohanjun@huawei.com>
> 
> Thanks for the patch, I will test it tomorrow (my local time) as it's
> late here, and will update you when I get the results.

I tested this patch on top of 5.2-rc1 on the Kunpeng920 ARM64 server
platform with the test case I reported before, and I can see about a
100% speedup for munmap() (from about 47us to 25us), that's great!

Tested-by: Hanjun Guo <guohanjun@huawei.com>
Reviewed-by: Hanjun Guo <guohanjun@huawei.com>

Thanks
Hanjun

Patch

diff --git a/arch/arm64/include/asm/tlbflush.h b/arch/arm64/include/asm/tlbflush.h
index 3a1870228946..dff8f9ea5754 100644
--- a/arch/arm64/include/asm/tlbflush.h
+++ b/arch/arm64/include/asm/tlbflush.h
@@ -195,6 +195,9 @@ static inline void __flush_tlb_range(struct vm_area_struct *vma,
 	unsigned long asid = ASID(vma->vm_mm);
 	unsigned long addr;
 
+	start = round_down(start, stride);
+	end = round_up(end, stride);
+
 	if ((end - start) >= (MAX_TLBI_OPS * stride)) {
 		flush_tlb_mm(vma->vm_mm);
 		return;