Message ID | 20180828112034.30875-4-npiggin@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | mm: dirty/accessed pte optimisations | expand |
Hi, On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote: > Similarly to the previous patch, this tries to optimise dirty/accessed > bits in ptes to avoid access costs of hardware setting them. > This patch results in silent nios2 boot failures, silent meaning that the boot stalls. ... Unpacking initramfs... Freeing initrd memory: 2168K workingset: timestamp_bits=30 max_order=15 bucket_order=0 jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc. random: fast init done random: crng init done [no further activity until the qemu session is aborted] Reverting the patch fixes the problem. Bisect log is attached. Guenter --- # bad: [387ac6229ecf6e012649d4fc409c5352655a4cf0] Add linux-next specific files for 20180905 # good: [57361846b52bc686112da6ca5368d11210796804] Linux 4.19-rc2 git bisect start 'HEAD' 'v4.19-rc2' # good: [668570e8389bb076bea9b7531553e1362f5abd11] Merge remote-tracking branch 'net-next/master' git bisect good 668570e8389bb076bea9b7531553e1362f5abd11 # good: [7f2f69ebf0bcf3e9bcff7d560ba92cee960a66a6] Merge remote-tracking branch 'battery/for-next' git bisect good 7f2f69ebf0bcf3e9bcff7d560ba92cee960a66a6 # good: [c31458d3e03e3a2edeaab225a22eaf68c07c8290] Merge remote-tracking branch 'rpmsg/for-next' git bisect good c31458d3e03e3a2edeaab225a22eaf68c07c8290 # good: [e0f43dcbe9af8ac72f39fe92c5d0ee1883546427] Merge remote-tracking branch 'nvdimm/libnvdimm-for-next' git bisect good e0f43dcbe9af8ac72f39fe92c5d0ee1883546427 # bad: [f509e2c0f3cd11df238f0f1b5ba013fe726decdf] of: ignore sub-page memory regions git bisect bad f509e2c0f3cd11df238f0f1b5ba013fe726decdf # good: [2f7eebf30b87534f7e4c3982307579d9adc782a5] ocfs2: fix clusters leak in ocfs2_defrag_extent() git bisect good 2f7eebf30b87534f7e4c3982307579d9adc782a5 # good: [119eb88c9dd23e305939ad748237100078e304a8] mm/swapfile.c: call free_swap_slot() in __swap_entry_free() git bisect good 119eb88c9dd23e305939ad748237100078e304a8 # good: [21d64d37adf3ab20b4c3a1951018e84bf815c887] mm: remove vm_insert_pfn() git bisect good 21d64d37adf3ab20b4c3a1951018e84bf815c887 # good: [90cd1a69010844e9dbfc43279d681d798812b962] cramfs: convert to use vmf_insert_mixed git bisect good 90cd1a69010844e9dbfc43279d681d798812b962 # good: [c7dd91289b4bb4c400a8a71953511991815f8e6f] mm/cow: optimise pte dirty/accessed bits handling in fork git bisect good c7dd91289b4bb4c400a8a71953511991815f8e6f # bad: [87d74ae75700a39effcb8c9ed8a8445e719ac369] hexagon: switch to NO_BOOTMEM git bisect bad 87d74ae75700a39effcb8c9ed8a8445e719ac369 # bad: [3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360] mm: optimise pte dirty/accessed bit setting by demand based pte insertion git bisect bad 3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360 # first bad commit: [3d1d5b26ac5b4d4193dc618a50cd88de1fb0d360] mm: optimise pte dirty/accessed bit setting by demand based pte insertion
On Wed, 5 Sep 2018 07:29:51 -0700 Guenter Roeck <linux@roeck-us.net> wrote: > Hi, > > On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote: > > Similarly to the previous patch, this tries to optimise dirty/accessed > > bits in ptes to avoid access costs of hardware setting them. > > > > This patch results in silent nios2 boot failures, silent meaning that > the boot stalls. > > ... > Unpacking initramfs... > Freeing initrd memory: 2168K > workingset: timestamp_bits=30 max_order=15 bucket_order=0 > jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc. > random: fast init done > random: crng init done > > [no further activity until the qemu session is aborted] > > Reverting the patch fixes the problem. Bisect log is attached. Thanks for bisecting it, I'll try to reproduce. Just qemu with no obscure options? Interesting that it's hit nios2 but apparently not other archs (yet). Thanks, Nick
On 09/05/2018 03:18 PM, Nicholas Piggin wrote: > On Wed, 5 Sep 2018 07:29:51 -0700 > Guenter Roeck <linux@roeck-us.net> wrote: > >> Hi, >> >> On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote: >>> Similarly to the previous patch, this tries to optimise dirty/accessed >>> bits in ptes to avoid access costs of hardware setting them. >>> >> >> This patch results in silent nios2 boot failures, silent meaning that >> the boot stalls. >> >> ... >> Unpacking initramfs... >> Freeing initrd memory: 2168K >> workingset: timestamp_bits=30 max_order=15 bucket_order=0 >> jffs2: version 2.2. (NAND) © 2001-2006 Red Hat, Inc. >> random: fast init done >> random: crng init done >> >> [no further activity until the qemu session is aborted] >> >> Reverting the patch fixes the problem. Bisect log is attached. > > Thanks for bisecting it, I'll try to reproduce. Just qemu with no > obscure options? Interesting that it's hit nios2 but apparently not > other archs (yet). > Nothing special. See https://github.com/groeck/linux-build-test/tree/master/rootfs/nios2/. Guenter
On Wed, 5 Sep 2018 07:29:51 -0700 Guenter Roeck <linux@roeck-us.net> wrote: > Hi, > > On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote: > > Similarly to the previous patch, this tries to optimise dirty/accessed > > bits in ptes to avoid access costs of hardware setting them. > > > > This patch results in silent nios2 boot failures, silent meaning that > the boot stalls. Okay I just got back to looking at this. The reason for the hang is I think a bug in the nios2 TLB code, but maybe other archs have similar issues. In case of a missing / !present Linux pte, nios2 installs a TLB entry with no permissions via its fast TLB exception handler (software TLB fill). Then it relies on that causing a TLB permission exception in a slower handler that calls handle_mm_fault to set the Linux pte and flushes the old TLB. Then the fast exception handler will find the new Linux pte. With this patch, nios2 has a case where handle_mm_fault does not flush the old TLB, which results in the TLB permission exception continually being retried. What happens now is that fault paths like do_read_fault will install a Linux pte with the young bit clear and return. That will cause nios2 to fault again but this time go down the bottom of handle_pte_fault and to the access flags update with the young bit set. The young bit is seen to be different, so that causes ptep_set_access_flags to do a TLB flush and that finally allows the fast TLB handler to fire and pick up the new Linux pte. With this patch, the young bit is set in the first handle_mm_fault, so the second handle_mm_fault no longer sees the ptes are different and does not flush the TLB. The spurious fault handler also does not flush them unless FAULT_FLAG_WRITE is set. What nios2 should do is invalidate the TLB in update_mmu_cache. What it *really* should do is install the new TLB entry, I have some patches to make that work in qemu I can submit. But I would like to try getting these dirty/accessed bit optimisation in 4.20, so I will send a simple path to just do the TLB invalidate that could go in Andrew's git tree. Is that agreeable with the nios2 maintainers? Thanks, Nick
On Tue, 2018-09-18 at 03:53 +1000, Nicholas Piggin wrote: > On Wed, 5 Sep 2018 07:29:51 -0700 > Guenter Roeck <linux@roeck-us.net> wrote: > > > > > Hi, > > > > On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote: > > > > > > Similarly to the previous patch, this tries to optimise > > > dirty/accessed > > > bits in ptes to avoid access costs of hardware setting them. > > > > > This patch results in silent nios2 boot failures, silent meaning > > that > > the boot stalls. > Okay I just got back to looking at this. The reason for the hang is > I think a bug in the nios2 TLB code, but maybe other archs have > similar > issues. > > In case of a missing / !present Linux pte, nios2 installs a TLB entry > with no permissions via its fast TLB exception handler (software TLB > fill). Then it relies on that causing a TLB permission exception in a > slower handler that calls handle_mm_fault to set the Linux pte and > flushes the old TLB. Then the fast exception handler will find the > new > Linux pte. > > With this patch, nios2 has a case where handle_mm_fault does not > flush > the old TLB, which results in the TLB permission exception > continually > being retried. > > What happens now is that fault paths like do_read_fault will install > a > Linux pte with the young bit clear and return. That will cause nios2 > to > fault again but this time go down the bottom of handle_pte_fault and > to > the access flags update with the young bit set. The young bit is seen > to > be different, so that causes ptep_set_access_flags to do a TLB flush > and > that finally allows the fast TLB handler to fire and pick up the new > Linux pte. > > With this patch, the young bit is set in the first handle_mm_fault, > so > the second handle_mm_fault no longer sees the ptes are different and > does not flush the TLB. The spurious fault handler also does not > flush > them unless FAULT_FLAG_WRITE is set. > > What nios2 should do is invalidate the TLB in update_mmu_cache. What > it > *really* should do is install the new TLB entry, I have some patches > to > make that work in qemu I can submit. But I would like to try getting > these dirty/accessed bit optimisation in 4.20, so I will send a > simple > path to just do the TLB invalidate that could go in Andrew's git > tree. > > Is that agreeable with the nios2 maintainers? > > Thanks, > Nick > Hi Do you have patches to test? Regards Ley Foon
On Fri, 21 Sep 2018 16:42:05 +0800 Ley Foon Tan <ley.foon.tan@intel.com> wrote: > On Tue, 2018-09-18 at 03:53 +1000, Nicholas Piggin wrote: > > On Wed, 5 Sep 2018 07:29:51 -0700 > > Guenter Roeck <linux@roeck-us.net> wrote: > > > > > > > > Hi, > > > > > > On Tue, Aug 28, 2018 at 09:20:34PM +1000, Nicholas Piggin wrote: > > > > > > > > Similarly to the previous patch, this tries to optimise > > > > dirty/accessed > > > > bits in ptes to avoid access costs of hardware setting them. > > > > > > > This patch results in silent nios2 boot failures, silent meaning > > > that > > > the boot stalls. > > Okay I just got back to looking at this. The reason for the hang is > > I think a bug in the nios2 TLB code, but maybe other archs have > > similar > > issues. > > > > In case of a missing / !present Linux pte, nios2 installs a TLB entry > > with no permissions via its fast TLB exception handler (software TLB > > fill). Then it relies on that causing a TLB permission exception in a > > slower handler that calls handle_mm_fault to set the Linux pte and > > flushes the old TLB. Then the fast exception handler will find the > > new > > Linux pte. > > > > With this patch, nios2 has a case where handle_mm_fault does not > > flush > > the old TLB, which results in the TLB permission exception > > continually > > being retried. > > > > What happens now is that fault paths like do_read_fault will install > > a > > Linux pte with the young bit clear and return. That will cause nios2 > > to > > fault again but this time go down the bottom of handle_pte_fault and > > to > > the access flags update with the young bit set. The young bit is seen > > to > > be different, so that causes ptep_set_access_flags to do a TLB flush > > and > > that finally allows the fast TLB handler to fire and pick up the new > > Linux pte. > > > > With this patch, the young bit is set in the first handle_mm_fault, > > so > > the second handle_mm_fault no longer sees the ptes are different and > > does not flush the TLB. The spurious fault handler also does not > > flush > > them unless FAULT_FLAG_WRITE is set. > > > > What nios2 should do is invalidate the TLB in update_mmu_cache. What > > it > > *really* should do is install the new TLB entry, I have some patches > > to > > make that work in qemu I can submit. But I would like to try getting > > these dirty/accessed bit optimisation in 4.20, so I will send a > > simple > > path to just do the TLB invalidate that could go in Andrew's git > > tree. > > > > Is that agreeable with the nios2 maintainers? > > > > Thanks, > > Nick > > > Hi > > Do you have patches to test? I've been working on some, it has taken longer than I expected, I'll hopefully have something to send out by tomorrow. Thanks, Nick
diff --git a/mm/huge_memory.c b/mm/huge_memory.c index 5fb1a43e12e0..2c169041317f 100644 --- a/mm/huge_memory.c +++ b/mm/huge_memory.c @@ -1197,6 +1197,7 @@ static vm_fault_t do_huge_pmd_wp_page_fallback(struct vm_fault *vmf, for (i = 0; i < HPAGE_PMD_NR; i++, haddr += PAGE_SIZE) { pte_t entry; entry = mk_pte(pages[i], vma->vm_page_prot); + entry = pte_mkyoung(entry); entry = maybe_mkwrite(pte_mkdirty(entry), vma); memcg = (void *)page_private(pages[i]); set_page_private(pages[i], 0); @@ -2067,7 +2068,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, struct page *page; pgtable_t pgtable; pmd_t old_pmd, _pmd; - bool young, write, soft_dirty, pmd_migration = false; + bool young, write, dirty, soft_dirty, pmd_migration = false; unsigned long addr; int i; @@ -2145,8 +2146,7 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, page = pmd_page(old_pmd); VM_BUG_ON_PAGE(!page_count(page), page); page_ref_add(page, HPAGE_PMD_NR - 1); - if (pmd_dirty(old_pmd)) - SetPageDirty(page); + dirty = pmd_dirty(old_pmd); write = pmd_write(old_pmd); young = pmd_young(old_pmd); soft_dirty = pmd_soft_dirty(old_pmd); @@ -2176,8 +2176,10 @@ static void __split_huge_pmd_locked(struct vm_area_struct *vma, pmd_t *pmd, entry = maybe_mkwrite(entry, vma); if (!write) entry = pte_wrprotect(entry); - if (!young) - entry = pte_mkold(entry); + if (young) + entry = pte_mkyoung(entry); + if (dirty) + entry = pte_mkdirty(entry); if (soft_dirty) entry = pte_mksoft_dirty(entry); } diff --git a/mm/memory.c b/mm/memory.c index 3d8bf8220bd0..d205ba69918c 100644 --- a/mm/memory.c +++ b/mm/memory.c @@ -1830,10 +1830,9 @@ static int insert_pfn(struct vm_area_struct *vma, unsigned long addr, entry = pte_mkspecial(pfn_t_pte(pfn, prot)); out_mkwrite: - if (mkwrite) { - entry = pte_mkyoung(entry); + entry = pte_mkyoung(entry); + if (mkwrite) entry = maybe_mkwrite(pte_mkdirty(entry), vma); - } set_pte_at(mm, addr, pte, entry); update_mmu_cache(vma, addr, pte); /* XXX: why not for insert_page? */ @@ -2560,6 +2559,7 @@ static vm_fault_t wp_page_copy(struct vm_fault *vmf) } flush_cache_page(vma, vmf->address, pte_pfn(vmf->orig_pte)); entry = mk_pte(new_page, vma->vm_page_prot); + entry = pte_mkyoung(entry); entry = maybe_mkwrite(pte_mkdirty(entry), vma); /* * Clear the pte entry and flush it first, before updating the @@ -3069,6 +3069,7 @@ vm_fault_t do_swap_page(struct vm_fault *vmf) inc_mm_counter_fast(vma->vm_mm, MM_ANONPAGES); dec_mm_counter_fast(vma->vm_mm, MM_SWAPENTS); pte = mk_pte(page, vma->vm_page_prot); + pte = pte_mkyoung(pte); if ((vmf->flags & FAULT_FLAG_WRITE) && reuse_swap_page(page, NULL)) { pte = maybe_mkwrite(pte_mkdirty(pte), vma); vmf->flags &= ~FAULT_FLAG_WRITE; @@ -3479,6 +3480,7 @@ vm_fault_t alloc_set_pte(struct vm_fault *vmf, struct mem_cgroup *memcg, flush_icache_page(vma, page); entry = mk_pte(page, vma->vm_page_prot); + entry = pte_mkyoung(entry); if (write) entry = maybe_mkwrite(pte_mkdirty(entry), vma); /* copy-on-write page */
Similarly to the previous patch, this tries to optimise dirty/accessed bits in ptes to avoid access costs of hardware setting them. Signed-off-by: Nicholas Piggin <npiggin@gmail.com> --- mm/huge_memory.c | 12 +++++++----- mm/memory.c | 8 +++++--- 2 files changed, 12 insertions(+), 8 deletions(-)