| Message ID | 1558322252-113575-1-git-send-email-yang.shi@linux.alibaba.com |
|---|---|
| State | New, archived |
| Series | [v3] mm: mmu_gather: remove __tlb_reset_range() for force flush |
On Mon, 20 May 2019 11:17:32 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:

> A few new fields were added to mmu_gather to make TLB flush smarter
> for huge pages by telling which level of the page table has changed.
>
> __tlb_reset_range() is used to reset all of this page table state to
> "unchanged"; it is called by the TLB flush code for parallel mapping
> changes on the same range under a non-exclusive lock (i.e. read
> mmap_sem).  Before commit dd2283f2605e ("mm: mmap: zap pages with
> read mmap_sem in munmap"), the syscalls that may update PTEs in
> parallel (e.g. MADV_DONTNEED, MADV_FREE) did not remove page tables.
> But the aforementioned commit may do munmap() under read mmap_sem
> and free page tables.  This can result in a program hang on aarch64,
> reported by Jan Stancek.  The problem can be reproduced with his
> test program, slightly modified below.
>
> ...
>
> Use a fullmm flush since it yields much better performance on
> aarch64, and non-fullmm doesn't yield a significant difference on
> x86.
>
> The original proposed fix came from Jan Stancek, who mainly debugged
> this issue; I just wrapped everything up together.

Thanks.  I'll add

Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")

to this.
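The state that __tlb_reset_range() wipes is the per-level change tracking carried in struct mmu_gather. As a rough reference, the helper looks like the following in include/asm-generic/tlb.h around the v5.2 timeframe (a simplified sketch; exact field names may vary slightly between kernel versions):

static inline void __tlb_reset_range(struct mmu_gather *tlb)
{
	if (tlb->fullmm) {
		tlb->start = tlb->end = ~0;
	} else {
		tlb->start = TASK_SIZE;
		tlb->end = 0;
	}

	/*
	 * Clear the per-level "what changed" tracking.  After this the
	 * gather no longer remembers that page tables were freed, which
	 * is exactly the information an architecture that flushes by
	 * page-table level (e.g. aarch64) needs in order to also
	 * invalidate its table walk caches.
	 */
	tlb->freed_tables = 0;
	tlb->cleared_ptes = 0;
	tlb->cleared_pmds = 0;
	tlb->cleared_puds = 0;
	tlb->cleared_p4ds = 0;
}

Losing freed_tables in the force-flush path is what lets a concurrent munmap() under read mmap_sem leave stale walk-cache entries behind, which is the hang described above.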
On 5/22/19 7:18 AM, Andrew Morton wrote:
> On Mon, 20 May 2019 11:17:32 +0800 Yang Shi <yang.shi@linux.alibaba.com> wrote:
>
>> A few new fields were added to mmu_gather to make TLB flush smarter
>> for huge pages by telling which level of the page table has changed.
>>
>> __tlb_reset_range() is used to reset all of this page table state to
>> "unchanged"; it is called by the TLB flush code for parallel mapping
>> changes on the same range under a non-exclusive lock (i.e. read
>> mmap_sem).  Before commit dd2283f2605e ("mm: mmap: zap pages with
>> read mmap_sem in munmap"), the syscalls that may update PTEs in
>> parallel (e.g. MADV_DONTNEED, MADV_FREE) did not remove page tables.
>> But the aforementioned commit may do munmap() under read mmap_sem
>> and free page tables.  This can result in a program hang on aarch64,
>> reported by Jan Stancek.  The problem can be reproduced with his
>> test program, slightly modified below.
>>
>> ...
>>
>> Use a fullmm flush since it yields much better performance on
>> aarch64, and non-fullmm doesn't yield a significant difference on
>> x86.
>>
>> The original proposed fix came from Jan Stancek, who mainly debugged
>> this issue; I just wrapped everything up together.
>
> Thanks.  I'll add
>
> Fixes: dd2283f2605e ("mm: mmap: zap pages with read mmap_sem in munmap")
>
> to this.

Thanks, Andrew.
diff --git a/mm/mmu_gather.c b/mm/mmu_gather.c
index 99740e1..289f8cf 100644
--- a/mm/mmu_gather.c
+++ b/mm/mmu_gather.c
@@ -245,14 +245,28 @@ void tlb_finish_mmu(struct mmu_gather *tlb,
 {
 	/*
 	 * If there are parallel threads are doing PTE changes on same range
-	 * under non-exclusive lock(e.g., mmap_sem read-side) but defer TLB
-	 * flush by batching, a thread has stable TLB entry can fail to flush
-	 * the TLB by observing pte_none|!pte_dirty, for example so flush TLB
-	 * forcefully if we detect parallel PTE batching threads.
+	 * under non-exclusive lock (e.g., mmap_sem read-side) but defer TLB
+	 * flush by batching, one thread may end up seeing inconsistent PTEs
+	 * and result in having stale TLB entries.  So flush TLB forcefully
+	 * if we detect parallel PTE batching threads.
+	 *
+	 * However, some syscalls, e.g. munmap(), may free page tables, this
+	 * needs force flush everything in the given range.  Otherwise this
+	 * may result in having stale TLB entries for some architectures,
+	 * e.g. aarch64, that could specify flush what level TLB.
 	 */
 	if (mm_tlb_flush_nested(tlb->mm)) {
+		/*
+		 * The aarch64 yields better performance with fullmm by
+		 * avoiding multiple CPUs spamming TLBI messages at the
+		 * same time.
+		 *
+		 * On x86 non-fullmm doesn't yield significant difference
+		 * against fullmm.
+		 */
+		tlb->fullmm = 1;
 		__tlb_reset_range(tlb);
-		__tlb_adjust_range(tlb, start, end - start);
+		tlb->freed_tables = 1;
 	}
 
 	tlb_flush_mmu(tlb);
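For reference, the "parallel PTE batching threads" check above keys off the tlb_flush_pending counter in the mm; mm_tlb_flush_nested() is roughly the following (from include/linux/mm_types.h of the same era, shown as a simplified sketch):

static inline bool mm_tlb_flush_nested(struct mm_struct *mm)
{
	/*
	 * tlb_flush_pending is raised for every mmu_gather currently
	 * batching changes on this mm; a value above one means another
	 * thread is changing PTEs on the same range concurrently, so
	 * tlb_finish_mmu() takes the forced-flush path above.
	 */
	return atomic_read(&mm->tlb_flush_pending) > 1;
}

With the patch, that path sets fullmm and freed_tables instead of shrinking the flush back to the just-unmapped range, so architectures that flush by page-table level (e.g. aarch64) still invalidate their table walk caches.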