Message ID | alpine.LSU.2.11.2008301405000.5954@eggly.anvils (mailing list archive)
---|---
State | New, archived
Series | mm: fixes to past from future testing
On Sun, Aug 30, 2020 at 2:08 PM Hugh Dickins <hughd@google.com> wrote:
>
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec(). Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
>
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
>
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Thanks for catching this.

Reviewed-by: Shakeel Butt <shakeelb@google.com>
On 2020/8/31 5:08 am, Hugh Dickins wrote:
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec(). Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
>
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
>
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>
> ---
> Nothing here worth going to stable, since it's just a testing config
> that is fixed, whose event numbers are not very important; but this
> will be needed before Alex Shi's warning, and might as well go in now.
>
> The callers of check_move_unevictable_pages() could be optimized,
> to skip over tails: but Matthew Wilcox has other changes in flight
> there, so let's skip the optimization for now.
>
>  mm/vmscan.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> --- 5.9-rc2/mm/vmscan.c	2020-08-16 17:32:50.721507348 -0700
> +++ linux/mm/vmscan.c	2020-08-28 17:47:10.595580876 -0700
> @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
>  	for (i = 0; i < pvec->nr; i++) {
>  		struct page *page = pvec->pages[i];
>  		struct pglist_data *pagepgdat = page_pgdat(page);
> +		int nr_pages;
> +
> +		if (PageTransTail(page))
> +			continue;
> +
> +		nr_pages = thp_nr_pages(page);
> +		pgscanned += nr_pages;
>  
> -		pgscanned++;
>  		if (pagepgdat != pgdat) {
>  			if (pgdat)
>  				spin_unlock_irq(&pgdat->lru_lock);
> @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
>  			ClearPageUnevictable(page);
>  			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
>  			add_page_to_lru_list(page, lruvec, lru);

So, we might randomly del or add a thp tail page into lru?
It's interesting to know here. :)

Thanks
Alex

> -			pgrescued++;
> +			pgrescued += nr_pages;
>  		}
>  	}
>
On Tue, 1 Sep 2020, Alex Shi wrote:
> On 2020/8/31 5:08 am, Hugh Dickins wrote:
> > --- 5.9-rc2/mm/vmscan.c	2020-08-16 17:32:50.721507348 -0700
> > +++ linux/mm/vmscan.c	2020-08-28 17:47:10.595580876 -0700
> > @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
> >  	for (i = 0; i < pvec->nr; i++) {
> >  		struct page *page = pvec->pages[i];
> >  		struct pglist_data *pagepgdat = page_pgdat(page);
> > +		int nr_pages;
> > +
> > +		if (PageTransTail(page))
> > +			continue;
> > +
> > +		nr_pages = thp_nr_pages(page);
> > +		pgscanned += nr_pages;
> >  
> > -		pgscanned++;
> >  		if (pagepgdat != pgdat) {
> >  			if (pgdat)
> >  				spin_unlock_irq(&pgdat->lru_lock);
> > @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
> >  			ClearPageUnevictable(page);
> >  			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
> >  			add_page_to_lru_list(page, lruvec, lru);
>
> So, we might randomly del or add a thp tail page into lru?
> It's interesting to know here. :)

No, it wasn't that interesting: I'd have been more concerned if it was
like that. Amusing idea, though, because the act of adding a thp tail
to lru would clear the very bit that says it's a tail.

	if (!PageLRU(page) || !PageUnevictable(page))
		continue;

Let's see: PageLRU and PageUnevictable are both of the PF_HEAD type,
so permitted on tails, but always redirecting to head: so typically
PageLRU(page) would be true, because the head is on an lru; but
PageUnevictable(page) would be false on a tail, because the head has
already been moved to an evictable lru by this same function. So the
tail safely went the "continue" way, but without incrementing pgrescued.

Hugh

> Thanks
> Alex
>
> > -			pgrescued++;
> > +			pgrescued += nr_pages;
> >  		}
> >  	}
> >
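As a minimal sketch of the PF_HEAD behaviour Hugh describes (simplified, with hypothetical "_sketch" names rather than the real macros generated in include/linux/page-flags.h), testing either flag on a tail page resolves to the compound head:

	/*
	 * Simplified sketch, not verbatim kernel source: flags declared
	 * with the PF_HEAD policy live only in the compound head's flags
	 * word, so a test on a tail page is redirected to
	 * compound_head(page) before the bit is read.
	 */
	static inline int PageLRU_sketch(struct page *page)
	{
		return test_bit(PG_lru, &compound_head(page)->flags);
	}

	static inline int PageUnevictable_sketch(struct page *page)
	{
		return test_bit(PG_unevictable, &compound_head(page)->flags);
	}

So once the head has had PG_unevictable cleared and been moved to an evictable lru, every remaining tail in the pagevec fails the PageUnevictable() test and takes the "continue" path, which is why only the head used to bump pgrescued.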
On Sun, Aug 30, 2020 at 2:08 PM Hugh Dickins <hughd@google.com> wrote:
>
> check_move_unevictable_pages() is used in making unevictable shmem pages
> evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
> i915/gem check_release_pagevec(). Those may pass down subpages of a huge
> page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".
>
> That does not crash or warn at present, but the accounting of vmstats
> unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
> scanned being incremented on each subpage, rescued only on the head
> (since tails already appear evictable once the head has been updated).
>
> 5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
> established that vm_events in general (and unevictable_pgs_rescued in
> particular) should count every subpage: so follow that precedent here.
>
> Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
> (to check page->mem_cgroup is always set), no problem: skip the tails
> before calling it, and add thp_nr_pages() to vmstats on the head.
>
> Signed-off-by: Hugh Dickins <hughd@google.com>

Acked-by: Yang Shi <shy828301@gmail.com>

> ---
> Nothing here worth going to stable, since it's just a testing config
> that is fixed, whose event numbers are not very important; but this
> will be needed before Alex Shi's warning, and might as well go in now.
>
> The callers of check_move_unevictable_pages() could be optimized,
> to skip over tails: but Matthew Wilcox has other changes in flight
> there, so let's skip the optimization for now.
>
>  mm/vmscan.c | 10 ++++++++--
>  1 file changed, 8 insertions(+), 2 deletions(-)
>
> --- 5.9-rc2/mm/vmscan.c	2020-08-16 17:32:50.721507348 -0700
> +++ linux/mm/vmscan.c	2020-08-28 17:47:10.595580876 -0700
> @@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
>  	for (i = 0; i < pvec->nr; i++) {
>  		struct page *page = pvec->pages[i];
>  		struct pglist_data *pagepgdat = page_pgdat(page);
> +		int nr_pages;
> +
> +		if (PageTransTail(page))
> +			continue;
> +
> +		nr_pages = thp_nr_pages(page);
> +		pgscanned += nr_pages;
>  
> -		pgscanned++;
>  		if (pagepgdat != pgdat) {
>  			if (pgdat)
>  				spin_unlock_irq(&pgdat->lru_lock);
> @@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
>  			ClearPageUnevictable(page);
>  			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
>  			add_page_to_lru_list(page, lruvec, lru);
> -			pgrescued++;
> +			pgrescued += nr_pages;
>  		}
>  	}
>
--- 5.9-rc2/mm/vmscan.c	2020-08-16 17:32:50.721507348 -0700
+++ linux/mm/vmscan.c	2020-08-28 17:47:10.595580876 -0700
@@ -4260,8 +4260,14 @@ void check_move_unevictable_pages(struct
 	for (i = 0; i < pvec->nr; i++) {
 		struct page *page = pvec->pages[i];
 		struct pglist_data *pagepgdat = page_pgdat(page);
+		int nr_pages;
+
+		if (PageTransTail(page))
+			continue;
+
+		nr_pages = thp_nr_pages(page);
+		pgscanned += nr_pages;
 
-		pgscanned++;
 		if (pagepgdat != pgdat) {
 			if (pgdat)
 				spin_unlock_irq(&pgdat->lru_lock);
@@ -4280,7 +4286,7 @@ void check_move_unevictable_pages(struct
 			ClearPageUnevictable(page);
 			del_page_from_lru_list(page, lruvec, LRU_UNEVICTABLE);
 			add_page_to_lru_list(page, lruvec, lru);
-			pgrescued++;
+			pgrescued += nr_pages;
 		}
 	}
 
check_move_unevictable_pages() is used in making unevictable shmem pages
evictable: by shmem_unlock_mapping(), drm_gem_check_release_pagevec() and
i915/gem check_release_pagevec(). Those may pass down subpages of a huge
page, when /sys/kernel/mm/transparent_hugepage/shmem_enabled is "force".

That does not crash or warn at present, but the accounting of vmstats
unevictable_pgs_scanned and unevictable_pgs_rescued is inconsistent:
scanned being incremented on each subpage, rescued only on the head
(since tails already appear evictable once the head has been updated).

5.8 commit 5d91f31faf8e ("mm: swap: fix vmstats for huge page") has
established that vm_events in general (and unevictable_pgs_rescued in
particular) should count every subpage: so follow that precedent here.

Do this in such a way that if mem_cgroup_page_lruvec() is made stricter
(to check page->mem_cgroup is always set), no problem: skip the tails
before calling it, and add thp_nr_pages() to vmstats on the head.

Signed-off-by: Hugh Dickins <hughd@google.com>
---
Nothing here worth going to stable, since it's just a testing config
that is fixed, whose event numbers are not very important; but this
will be needed before Alex Shi's warning, and might as well go in now.

The callers of check_move_unevictable_pages() could be optimized,
to skip over tails: but Matthew Wilcox has other changes in flight
there, so let's skip the optimization for now.

 mm/vmscan.c | 10 ++++++++--
 1 file changed, 8 insertions(+), 2 deletions(-)
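The inconsistency the changelog describes can be made concrete with a small stand-alone sketch (hypothetical illustration, not kernel code), assuming one PMD-sized THP whose 512 subpages all sit in the pagevec and whose head starts on the unevictable list:

	#include <stdio.h>

	/*
	 * Simulate the vmstat counting before and after the patch for a
	 * single 2MB THP (512 subpages of 4kB), all of whose subpages are
	 * in the pagevec passed to check_move_unevictable_pages().
	 */
	#define HPAGE_PMD_NR 512

	int main(void)
	{
		int old_scanned = 0, old_rescued = 0;	/* before the patch */
		int new_scanned = 0, new_rescued = 0;	/* after the patch */

		for (int i = 0; i < HPAGE_PMD_NR; i++) {
			int is_tail = (i != 0);

			/*
			 * Before: every subpage bumps pgscanned, but only the
			 * head bumps pgrescued, since tails already look
			 * evictable once the head has been moved.
			 */
			old_scanned++;
			if (!is_tail)
				old_rescued++;

			/*
			 * After: tails are skipped, and the head adds
			 * thp_nr_pages() subpages to both counters.
			 */
			if (is_tail)
				continue;
			new_scanned += HPAGE_PMD_NR;
			new_rescued += HPAGE_PMD_NR;
		}

		printf("before: scanned=%d rescued=%d\n", old_scanned, old_rescued);
		printf("after:  scanned=%d rescued=%d\n", new_scanned, new_rescued);
		return 0;
	}

Before the fix the two events disagree by a factor of 512 (scanned=512, rescued=1); after it both report 512, matching the per-subpage convention set by commit 5d91f31faf8e.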