| Message ID | 1392761566-24834-4-git-send-email-riel@redhat.com (mailing list archive) |
| --- | --- |
| State | New, archived |
On Tue, 18 Feb 2014, riel@redhat.com wrote:

> From: Rik van Riel <riel@redhat.com>
>
> The NUMA scanning code can end up iterating over many gigabytes
> of unpopulated memory, especially in the case of a freshly started
> KVM guest with lots of memory.
>
> This results in the mmu notifier code being called even when
> there are no mapped pages in a virtual address range. The amount
> of time wasted can be enough to trigger soft lockup warnings
> with very large KVM guests.
>
> This patch moves the mmu notifier call to the pmd level, which
> represents 1GB areas of memory on x86-64. Furthermore, the mmu
> notifier code is only called from the address in the PMD where
> present mappings are first encountered.
>
> The hugetlbfs code is left alone for now; hugetlb mappings are
> not relocatable, and as such are left alone by the NUMA code,
> and should never trigger this problem to begin with.
>
> Signed-off-by: Rik van Riel <riel@redhat.com>
> Cc: Peter Zijlstra <peterz@infradead.org>
> Cc: Andrea Arcangeli <aarcange@redhat.com>
> Reported-by: Xing Gang <gang.xing@hp.com>
> Tested-by: Chegu Vinod <chegu_vinod@hp.com>

Acked-by: David Rientjes <rientjes@google.com>

Might have been cleaner to move the
mmu_notifier_invalidate_range_{start,end}() to hugetlb_change_protection()
as well, though.
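[Editor's note: the "only call the notifier from the first populated PMD" idea in the changelog quoted above can be illustrated outside the kernel. The sketch below is a minimal userspace C model, not kernel code; the chunk array, the `populated[]` flags, and the `notifier_start()`/`notifier_end()` stubs are invented stand-ins for PMDs and for mmu_notifier_invalidate_range_{start,end}().]

```c
#include <stdbool.h>
#include <stdio.h>

#define NCHUNKS 8   /* stand-in for the PMDs covered by one call */

/* Stubs standing in for mmu_notifier_invalidate_range_{start,end}(). */
static void notifier_start(int start, int end) { printf("start: [%d, %d)\n", start, end); }
static void notifier_end(int start, int end)   { printf("end:   [%d, %d)\n", start, end); }

/*
 * Walk the chunks; skip unpopulated ones without touching the notifier,
 * and fire the "start" hook only at the first populated chunk found.
 * The matching "end" hook fires only if "start" ever ran.
 */
static void change_range(const bool *populated, int nchunks)
{
	int mni_start = -1;   /* -1: notifier not started yet */

	for (int i = 0; i < nchunks; i++) {
		if (!populated[i])
			continue;          /* nothing mapped here, stay cheap */
		if (mni_start < 0) {
			mni_start = i;
			notifier_start(mni_start, nchunks);
		}
		/* ... apply the actual protection change to chunk i ... */
	}

	if (mni_start >= 0)
		notifier_end(mni_start, nchunks);
}

int main(void)
{
	/* A mostly-empty range: only chunks 5 and 6 are populated. */
	bool populated[NCHUNKS] = { false, false, false, false, false, true, true, false };

	change_range(populated, NCHUNKS);   /* prints one start/end pair, starting at chunk 5 */
	return 0;
}
```

On a range with no populated chunks at all, the model makes no notifier calls, which is exactly the behavior the changelog is after for freshly started, largely unpopulated guests.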
On 02/18/2014 09:24 PM, David Rientjes wrote:
> On Tue, 18 Feb 2014, riel@redhat.com wrote:
>
>> From: Rik van Riel <riel@redhat.com>
>>
>> The NUMA scanning code can end up iterating over many gigabytes
>> of unpopulated memory, especially in the case of a freshly started
>> KVM guest with lots of memory.
>>
>> This results in the mmu notifier code being called even when
>> there are no mapped pages in a virtual address range. The amount
>> of time wasted can be enough to trigger soft lockup warnings
>> with very large KVM guests.
>>
>> This patch moves the mmu notifier call to the pmd level, which
>> represents 1GB areas of memory on x86-64. Furthermore, the mmu
>> notifier code is only called from the address in the PMD where
>> present mappings are first encountered.
>>
>> The hugetlbfs code is left alone for now; hugetlb mappings are
>> not relocatable, and as such are left alone by the NUMA code,
>> and should never trigger this problem to begin with.
>>
>> Signed-off-by: Rik van Riel <riel@redhat.com>
>> Cc: Peter Zijlstra <peterz@infradead.org>
>> Cc: Andrea Arcangeli <aarcange@redhat.com>
>> Reported-by: Xing Gang <gang.xing@hp.com>
>> Tested-by: Chegu Vinod <chegu_vinod@hp.com>
>
> Acked-by: David Rientjes <rientjes@google.com>
>
> Might have been cleaner to move the
> mmu_notifier_invalidate_range_{start,end}() to hugetlb_change_protection()
> as well, though.

I can certainly do that if you want. Just let me know and I'll send
a v2 of patch 3 :)
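[Editor's note: for reference, David's suggestion would roughly mean hoisting the notifier bracketing into the hugetlb path itself. The fragment below is only a sketch of that shape, written against the mm APIs of that kernel era; it is not the posted patch, and the body of the huge-PTE walk is elided into a comment.]

```c
/*
 * Sketch only (not the posted patch): the notifier calls live inside the
 * hugetlb path, so change_protection() no longer has to special-case it.
 */
unsigned long hugetlb_change_protection(struct vm_area_struct *vma,
		unsigned long address, unsigned long end, pgprot_t newprot)
{
	struct mm_struct *mm = vma->vm_mm;
	unsigned long pages = 0;

	mmu_notifier_invalidate_range_start(mm, address, end);

	/* ... walk the huge PTEs in [address, end), apply newprot,
	 *     and count the pages actually changed ... */

	mmu_notifier_invalidate_range_end(mm, address, end);

	return pages;
}
```

change_protection() could then call hugetlb_change_protection() directly, with no notifier calls of its own on that branch; per Rik's reply above, that is the shape a v2 of patch 3 would take.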
diff --git a/mm/mprotect.c b/mm/mprotect.c
index 6006c05..44850ee 100644
--- a/mm/mprotect.c
+++ b/mm/mprotect.c
@@ -109,9 +109,11 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		pgprot_t newprot, int dirty_accountable, int prot_numa)
 {
 	pmd_t *pmd;
+	struct mm_struct *mm = vma->vm_mm;
 	unsigned long next;
 	unsigned long pages = 0;
 	unsigned long nr_huge_updates = 0;
+	unsigned long mni_start = 0;
 
 	pmd = pmd_offset(pud, addr);
 	do {
@@ -120,6 +122,13 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		next = pmd_addr_end(addr, end);
 		if (!pmd_trans_huge(*pmd) && pmd_none_or_clear_bad(pmd))
 			continue;
+
+		/* invoke the mmu notifier if the pmd is populated */
+		if (!mni_start) {
+			mni_start = addr;
+			mmu_notifier_invalidate_range_start(mm, mni_start, end);
+		}
+
 		if (pmd_trans_huge(*pmd)) {
 			if (next - addr != HPAGE_PMD_SIZE)
 				split_huge_page_pmd(vma, addr, pmd);
@@ -143,6 +152,9 @@ static inline unsigned long change_pmd_range(struct vm_area_struct *vma,
 		pages += this_pages;
 	} while (pmd++, addr = next, addr != end);
 
+	if (mni_start)
+		mmu_notifier_invalidate_range_end(mm, mni_start, end);
+
 	if (nr_huge_updates)
 		count_vm_numa_events(NUMA_HUGE_PTE_UPDATES, nr_huge_updates);
 	return pages;
@@ -205,12 +217,12 @@ unsigned long change_protection(struct vm_area_struct *vma, unsigned long start,
 	struct mm_struct *mm = vma->vm_mm;
 	unsigned long pages;
 
-	mmu_notifier_invalidate_range_start(mm, start, end);
-	if (is_vm_hugetlb_page(vma))
+	if (is_vm_hugetlb_page(vma)) {
+		mmu_notifier_invalidate_range_start(mm, start, end);
 		pages = hugetlb_change_protection(vma, start, end, newprot);
-	else
+		mmu_notifier_invalidate_range_end(mm, start, end);
+	} else
 		pages = change_protection_range(vma, start, end, newprot, dirty_accountable, prot_numa);
-	mmu_notifier_invalidate_range_end(mm, start, end);
 
 	return pages;
 }
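[Editor's note: to see why the call frequency matters, recall that every invalidate_range_start/end pair fans out to registered listeners such as KVM, which typically tear down secondary mappings in the start callback. The sketch below is a hypothetical minimal listener using the mmu notifier callback signatures of that kernel generation (later kernels reworked them around struct mmu_notifier_range); my_start, my_end, my_ops, my_notifier, and my_register are invented names, not real kernel symbols.]

```c
#include <linux/mmu_notifier.h>
#include <linux/printk.h>

/* Called before PTEs in [start, end) are changed; a real listener
 * (e.g. KVM) would shoot down its secondary mappings for the range here. */
static void my_start(struct mmu_notifier *mn, struct mm_struct *mm,
		     unsigned long start, unsigned long end)
{
	pr_debug("invalidate start [%lx, %lx)\n", start, end);
}

/* Called after the change is done; the listener may repopulate lazily. */
static void my_end(struct mmu_notifier *mn, struct mm_struct *mm,
		   unsigned long start, unsigned long end)
{
	pr_debug("invalidate end   [%lx, %lx)\n", start, end);
}

static const struct mmu_notifier_ops my_ops = {
	.invalidate_range_start	= my_start,
	.invalidate_range_end	= my_end,
};

static struct mmu_notifier my_notifier = {
	.ops = &my_ops,
};

/* Register against an mm, e.g. from an ioctl handler holding a reference. */
static int my_register(struct mm_struct *mm)
{
	return mmu_notifier_register(&my_notifier, mm);
}
```

With the patch above, a listener like this is simply never invoked for PMD ranges that contain no present mappings, which is where the soft lockups on huge, mostly-unpopulated guests came from.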