Message ID | 20240626024924.1155558-3-ranxiaokai627@163.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages | expand |
On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: > From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > > KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > pages, which means of any order, but KPF_THP should only be set > when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df > ("mm: thp: support allocation of anonymous multi-size THP"), > multiple orders of folios can be allocated and mapped to userspace, > so the folio_test_large() check is not sufficient here, > replace it with folio_test_pmd_mappable() to fix this. > > Also kpageflags is not only for userspace memory but for all valid pfn > pages,including slab pages or drivers used pages, so the PG_lru and > is_anon check are unnecessary here. But THP is userspace memory. slab pages or driver pages cannot be THP. > > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> > --- > fs/proc/page.c | 14 ++++---------- > 1 file changed, 4 insertions(+), 10 deletions(-) > > diff --git a/fs/proc/page.c b/fs/proc/page.c > index 2fb64bdb64eb..3e7b70449c2f 100644 > --- a/fs/proc/page.c > +++ b/fs/proc/page.c > @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page) > u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head); > else > u |= 1 << KPF_COMPOUND_TAIL; > + Unnecessary new line. > if (folio_test_hugetlb(folio)) > u |= 1 << KPF_HUGE; > - /* > - * We need to check PageLRU/PageAnon > - * to make sure a given page is a thp, not a non-huge compound page. > - */ > - else if (folio_test_large(folio)) { > - if ((k & (1 << PG_lru)) || is_anon) > - u |= 1 << KPF_THP; > - else if (is_huge_zero_folio(folio)) { > + else if (folio_test_pmd_mappable(folio)) { > + u |= 1 << KPF_THP; lru and anon check should stay. > + if (is_huge_zero_folio(folio)) > u |= 1 << KPF_ZERO_PAGE; > - u |= 1 << KPF_THP; > - } > } else if (is_zero_pfn(page_to_pfn(page))) > u |= 1 << KPF_ZERO_PAGE; >
> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: > > From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > > > > KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > > pages, which means of any order, but KPF_THP should only be set > > when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df > > ("mm: thp: support allocation of anonymous multi-size THP"), > > multiple orders of folios can be allocated and mapped to userspace, > > so the folio_test_large() check is not sufficient here, > > replace it with folio_test_pmd_mappable() to fix this. > > > > Also kpageflags is not only for userspace memory but for all valid pfn > > pages,including slab pages or drivers used pages, so the PG_lru and > > is_anon check are unnecessary here. > > But THP is userspace memory. slab pages or driver pages cannot be THP. I see, the THP naming implies userspace memory. Not only compound order. > > > > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> > > --- > > fs/proc/page.c | 14 ++++---------- > > 1 file changed, 4 insertions(+), 10 deletions(-) > > > > diff --git a/fs/proc/page.c b/fs/proc/page.c > > index 2fb64bdb64eb..3e7b70449c2f 100644 > > --- a/fs/proc/page.c > > +++ b/fs/proc/page.c > > @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page) > > u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head); > > else > > u |= 1 << KPF_COMPOUND_TAIL; > > + > Unnecessary new line. yes, will fix. > > > if (folio_test_hugetlb(folio)) > > u |= 1 << KPF_HUGE; > > - /* > > - * We need to check PageLRU/PageAnon > > - * to make sure a given page is a thp, not a non-huge compound page. > > - */ > > - else if (folio_test_large(folio)) { > > - if ((k & (1 << PG_lru)) || is_anon) > > - u |= 1 << KPF_THP; > > - else if (is_huge_zero_folio(folio)) { > > + else if (folio_test_pmd_mappable(folio)) { > > + u |= 1 << KPF_THP; > > lru and anon check should stay. thanks, will fix. > > > + if (is_huge_zero_folio(folio)) > > u |= 1 << KPF_ZERO_PAGE; > > - u |= 1 << KPF_THP; > > - } > > } else if (is_zero_pfn(page_to_pfn(page))) > > u |= 1 << KPF_ZERO_PAGE; > > > > -- > Best Regards, > Yan, Zi
On 26/06/2024 04:06, Zi Yan wrote: > On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >> >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >> pages, which means of any order, but KPF_THP should only be set >> when the folio is a 2M pmd mappable THP. Why should KPF_THP only be set on 2M THP? What problem does it cause as it is currently configured? I would argue that mTHP is still THP so should still have the flag. And since these smaller mTHP sizes are disabled by default, only mTHP-aware user space will be enabling them, so I'll naively state that it should not cause compat issues as is. Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP sizes to function correctly. So that would need to be reworked if making this change. Thanks, Ryan
On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: > On 26/06/2024 04:06, Zi Yan wrote: > > On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: > >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > >> > >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > >> pages, which means of any order, but KPF_THP should only be set > >> when the folio is a 2M pmd mappable THP. > > Why should KPF_THP only be set on 2M THP? What problem does it cause as it is > currently configured? > > I would argue that mTHP is still THP so should still have the flag. And since > these smaller mTHP sizes are disabled by default, only mTHP-aware user space > will be enabling them, so I'll naively state that it should not cause compat > issues as is. > > Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP > sizes to function correctly. So that would need to be reworked if making this > change. + more folks working on mTHP I agree that mTHP is still THP, but we might want different stats/counters for it, since people might want to keep the old THP counters consistent. See recent commits on adding mTHP counters: ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") and changes to make THP counter to only count PMD THP: 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for THP split statistics") In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some adjustment on tools/mm/thpmaps.
On 26/06/2024 15:40, Zi Yan wrote: > On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: >> On 26/06/2024 04:06, Zi Yan wrote: >>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: >>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >>>> >>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >>>> pages, which means of any order, but KPF_THP should only be set >>>> when the folio is a 2M pmd mappable THP. >> >> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is >> currently configured? >> >> I would argue that mTHP is still THP so should still have the flag. And since >> these smaller mTHP sizes are disabled by default, only mTHP-aware user space >> will be enabling them, so I'll naively state that it should not cause compat >> issues as is. >> >> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP >> sizes to function correctly. So that would need to be reworked if making this >> change. > > + more folks working on mTHP > > I agree that mTHP is still THP, but we might want different > stats/counters for it, since people might want to keep the old THP counters > consistent. See recent commits on adding mTHP counters: > ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback > counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") > > and changes to make THP counter to only count PMD THP: > 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for > THP split statistics") > > In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some > adjustment on tools/mm/thpmaps. That would work for me, assuming we have KPF bits to spare?
On Wed, Jun 26, 2024 at 12:07:04PM +0100, Ryan Roberts wrote: > On 26/06/2024 04:06, Zi Yan wrote: > > On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: > >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > >> > >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > >> pages, which means of any order, but KPF_THP should only be set > >> when the folio is a 2M pmd mappable THP. > > Why should KPF_THP only be set on 2M THP? What problem does it cause as it is > currently configured? > > I would argue that mTHP is still THP so should still have the flag. And since > these smaller mTHP sizes are disabled by default, only mTHP-aware user space > will be enabling them, so I'll naively state that it should not cause compat > issues as is. > > Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP > sizes to function correctly. So that would need to be reworked if making this > change. I told you you'd run into trouble calling them "mTHP" ...
On 26/06/2024 16:15, Matthew Wilcox wrote: > On Wed, Jun 26, 2024 at 12:07:04PM +0100, Ryan Roberts wrote: >> On 26/06/2024 04:06, Zi Yan wrote: >>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: >>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >>>> >>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >>>> pages, which means of any order, but KPF_THP should only be set >>>> when the folio is a 2M pmd mappable THP. >> >> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is >> currently configured? >> >> I would argue that mTHP is still THP so should still have the flag. And since >> these smaller mTHP sizes are disabled by default, only mTHP-aware user space >> will be enabling them, so I'll naively state that it should not cause compat >> issues as is. >> >> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP >> sizes to function correctly. So that would need to be reworked if making this >> change. > > I told you you'd run into trouble calling them "mTHP" ... "There are two hard things in computer science; naming, cache invalidation and off-by-one errors"
Hi ran, kernel test robot noticed the following build warnings: [auto build test WARNING on akpm-mm/mm-everything] [also build test WARNING on linus/master v6.10-rc5 next-20240625] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/ran-xiaokai/mm-Constify-folio_order-folio_test_pmd_mappable/20240626-113027 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20240626024924.1155558-3-ranxiaokai627%40163.com patch subject: [PATCH 2/2] kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages config: parisc-allnoconfig (https://download.01.org/0day-ci/archive/20240626/202406262203.FFeFYbhP-lkp@intel.com/config) compiler: hppa-linux-gcc (GCC) 13.2.0 reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240626/202406262203.FFeFYbhP-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202406262203.FFeFYbhP-lkp@intel.com/ All warnings (new ones prefixed by >>): fs/proc/page.c: In function 'stable_page_flags': >> fs/proc/page.c:151:42: warning: passing argument 1 of 'folio_test_pmd_mappable' discards 'const' qualifier from pointer target type [-Wdiscarded-qualifiers] 151 | else if (folio_test_pmd_mappable(folio)) { | ^~~~~ In file included from include/linux/mm.h:1120, from include/linux/memblock.h:12, from fs/proc/page.c:2: include/linux/huge_mm.h:438:58: note: expected 'struct folio *' but argument is of type 'const struct folio *' 438 | static inline bool folio_test_pmd_mappable(struct folio *folio) | ~~~~~~~~~~~~~~^~~~~ vim +151 fs/proc/page.c 108 109 u64 stable_page_flags(const struct page *page) 110 { 111 const struct folio *folio; 112 unsigned long k; 113 unsigned long mapping; 114 bool is_anon; 115 u64 u = 0; 116 117 /* 118 * pseudo flag: KPF_NOPAGE 119 * it differentiates a memory hole from a page with no flags 120 */ 121 if (!page) 122 return 1 << KPF_NOPAGE; 123 folio = page_folio(page); 124 125 k = folio->flags; 126 mapping = (unsigned long)folio->mapping; 127 is_anon = mapping & PAGE_MAPPING_ANON; 128 129 /* 130 * pseudo flags for the well known (anonymous) memory mapped pages 131 */ 132 if (page_mapped(page)) 133 u |= 1 << KPF_MMAP; 134 if (is_anon) { 135 u |= 1 << KPF_ANON; 136 if (mapping & PAGE_MAPPING_KSM) 137 u |= 1 << KPF_KSM; 138 } 139 140 /* 141 * compound pages: export both head/tail info 142 * they together define a compound page's start/end pos and order 143 */ 144 if (page == &folio->page) 145 u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head); 146 else 147 u |= 1 << KPF_COMPOUND_TAIL; 148 149 if (folio_test_hugetlb(folio)) 150 u |= 1 << KPF_HUGE; > 151 else if (folio_test_pmd_mappable(folio)) { 152 u |= 1 << KPF_THP; 153 if (is_huge_zero_folio(folio)) 154 u |= 1 << KPF_ZERO_PAGE; 155 } else if (is_zero_pfn(page_to_pfn(page))) 156 u |= 1 << KPF_ZERO_PAGE; 157 158 /* 159 * Caveats on high order pages: PG_buddy and PG_slab will only be set 160 * on the head page. 161 */ 162 if (PageBuddy(page)) 163 u |= 1 << KPF_BUDDY; 164 else if (page_count(page) == 0 && is_free_buddy_page(page)) 165 u |= 1 << KPF_BUDDY; 166 167 if (PageOffline(page)) 168 u |= 1 << KPF_OFFLINE; 169 if (PageTable(page)) 170 u |= 1 << KPF_PGTABLE; 171 if (folio_test_slab(folio)) 172 u |= 1 << KPF_SLAB; 173
Hi ran, kernel test robot noticed the following build errors: [auto build test ERROR on akpm-mm/mm-everything] [also build test ERROR on linus/master v6.10-rc5 next-20240625] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch#_base_tree_information] url: https://github.com/intel-lab-lkp/linux/commits/ran-xiaokai/mm-Constify-folio_order-folio_test_pmd_mappable/20240626-113027 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything patch link: https://lore.kernel.org/r/20240626024924.1155558-3-ranxiaokai627%40163.com patch subject: [PATCH 2/2] kpageflags: fix wrong KPF_THP on non-pmd-mappable compound pages config: x86_64-allnoconfig (https://download.01.org/0day-ci/archive/20240626/202406262300.iAURISyJ-lkp@intel.com/config) compiler: clang version 18.1.5 (https://github.com/llvm/llvm-project 617a15a9eac96088ae5e9134248d8236e34b91b1) reproduce (this is a W=1 build): (https://download.01.org/0day-ci/archive/20240626/202406262300.iAURISyJ-lkp@intel.com/reproduce) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <lkp@intel.com> | Closes: https://lore.kernel.org/oe-kbuild-all/202406262300.iAURISyJ-lkp@intel.com/ All errors (new ones prefixed by >>): >> fs/proc/page.c:151:35: error: passing 'const struct folio *' to parameter of type 'struct folio *' discards qualifiers [-Werror,-Wincompatible-pointer-types-discards-qualifiers] 151 | else if (folio_test_pmd_mappable(folio)) { | ^~~~~ include/linux/huge_mm.h:438:58: note: passing argument to parameter 'folio' here 438 | static inline bool folio_test_pmd_mappable(struct folio *folio) | ^ 1 error generated. vim +151 fs/proc/page.c 108 109 u64 stable_page_flags(const struct page *page) 110 { 111 const struct folio *folio; 112 unsigned long k; 113 unsigned long mapping; 114 bool is_anon; 115 u64 u = 0; 116 117 /* 118 * pseudo flag: KPF_NOPAGE 119 * it differentiates a memory hole from a page with no flags 120 */ 121 if (!page) 122 return 1 << KPF_NOPAGE; 123 folio = page_folio(page); 124 125 k = folio->flags; 126 mapping = (unsigned long)folio->mapping; 127 is_anon = mapping & PAGE_MAPPING_ANON; 128 129 /* 130 * pseudo flags for the well known (anonymous) memory mapped pages 131 */ 132 if (page_mapped(page)) 133 u |= 1 << KPF_MMAP; 134 if (is_anon) { 135 u |= 1 << KPF_ANON; 136 if (mapping & PAGE_MAPPING_KSM) 137 u |= 1 << KPF_KSM; 138 } 139 140 /* 141 * compound pages: export both head/tail info 142 * they together define a compound page's start/end pos and order 143 */ 144 if (page == &folio->page) 145 u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head); 146 else 147 u |= 1 << KPF_COMPOUND_TAIL; 148 149 if (folio_test_hugetlb(folio)) 150 u |= 1 << KPF_HUGE; > 151 else if (folio_test_pmd_mappable(folio)) { 152 u |= 1 << KPF_THP; 153 if (is_huge_zero_folio(folio)) 154 u |= 1 << KPF_ZERO_PAGE; 155 } else if (is_zero_pfn(page_to_pfn(page))) 156 u |= 1 << KPF_ZERO_PAGE; 157 158 /* 159 * Caveats on high order pages: PG_buddy and PG_slab will only be set 160 * on the head page. 161 */ 162 if (PageBuddy(page)) 163 u |= 1 << KPF_BUDDY; 164 else if (page_count(page) == 0 && is_free_buddy_page(page)) 165 u |= 1 << KPF_BUDDY; 166 167 if (PageOffline(page)) 168 u |= 1 << KPF_OFFLINE; 169 if (PageTable(page)) 170 u |= 1 << KPF_PGTABLE; 171 if (folio_test_slab(folio)) 172 u |= 1 << KPF_SLAB; 173
On Wed, Jun 26, 2024 at 10:42 PM Ryan Roberts <ryan.roberts@arm.com> wrote: > > On 26/06/2024 15:40, Zi Yan wrote: > > On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: > >> On 26/06/2024 04:06, Zi Yan wrote: > >>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: > >>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > >>>> > >>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > >>>> pages, which means of any order, but KPF_THP should only be set > >>>> when the folio is a 2M pmd mappable THP. > >> > >> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is > >> currently configured? > >> > >> I would argue that mTHP is still THP so should still have the flag. And since > >> these smaller mTHP sizes are disabled by default, only mTHP-aware user space > >> will be enabling them, so I'll naively state that it should not cause compat > >> issues as is. > >> > >> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP > >> sizes to function correctly. So that would need to be reworked if making this > >> change. > > > > + more folks working on mTHP > > > > I agree that mTHP is still THP, but we might want different > > stats/counters for it, since people might want to keep the old THP counters > > consistent. See recent commits on adding mTHP counters: > > ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback > > counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") > > > > and changes to make THP counter to only count PMD THP: > > 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for > > THP split statistics") > > > > In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some > > adjustment on tools/mm/thpmaps. > > That would work for me, assuming we have KPF bits to spare? +1 Let's check on that and see if we're good ;) Thanks, Lance >
On Wed, Jun 26, 2024 at 11:18 PM Ryan Roberts <ryan.roberts@arm.com> wrote: > > On 26/06/2024 16:15, Matthew Wilcox wrote: > > On Wed, Jun 26, 2024 at 12:07:04PM +0100, Ryan Roberts wrote: > >> On 26/06/2024 04:06, Zi Yan wrote: > >>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: > >>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > >>>> > >>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > >>>> pages, which means of any order, but KPF_THP should only be set > >>>> when the folio is a 2M pmd mappable THP. > >> > >> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is > >> currently configured? > >> > >> I would argue that mTHP is still THP so should still have the flag. And since > >> these smaller mTHP sizes are disabled by default, only mTHP-aware user space > >> will be enabling them, so I'll naively state that it should not cause compat > >> issues as is. > >> > >> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP > >> sizes to function correctly. So that would need to be reworked if making this > >> change. > > > > I told you you'd run into trouble calling them "mTHP" ... > > "There are two hard things in computer science; naming, cache invalidation and > off-by-one errors" Totally agree. Naming things can be surprisingly challenging ;) Thanks, Lance >
On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote: > > On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: > > On 26/06/2024 04:06, Zi Yan wrote: > > > On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: > > >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > > >> > > >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > > >> pages, which means of any order, but KPF_THP should only be set > > >> when the folio is a 2M pmd mappable THP. > > > > Why should KPF_THP only be set on 2M THP? What problem does it cause as it is > > currently configured? > > > > I would argue that mTHP is still THP so should still have the flag. And since > > these smaller mTHP sizes are disabled by default, only mTHP-aware user space > > will be enabling them, so I'll naively state that it should not cause compat > > issues as is. > > > > Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP > > sizes to function correctly. So that would need to be reworked if making this > > change. > > + more folks working on mTHP > > I agree that mTHP is still THP, but we might want different > stats/counters for it, since people might want to keep the old THP counters > consistent. See recent commits on adding mTHP counters: > ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback > counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") > > and changes to make THP counter to only count PMD THP: > 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for > THP split statistics") > > In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some > adjustment on tools/mm/thpmaps. It seems we have to do this though I think keeping KPF_THP and adding a separate bit like KPF_PMD_MAPPED makes more sense. but those tools relying on KPF_THP need to realize this and check the new bit , which is not done now. whether the mTHP's name is mTHP or THP will make no difference for this case:-) > > > -- > Best Regards, > Yan, Zi > Thanks Barry
On 27/06/2024 05:10, Barry Song wrote: > On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote: >> >> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: >>> On 26/06/2024 04:06, Zi Yan wrote: >>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: >>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >>>>> >>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >>>>> pages, which means of any order, but KPF_THP should only be set >>>>> when the folio is a 2M pmd mappable THP. >>> >>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is >>> currently configured? >>> >>> I would argue that mTHP is still THP so should still have the flag. And since >>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space >>> will be enabling them, so I'll naively state that it should not cause compat >>> issues as is. >>> >>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP >>> sizes to function correctly. So that would need to be reworked if making this >>> change. >> >> + more folks working on mTHP >> >> I agree that mTHP is still THP, but we might want different >> stats/counters for it, since people might want to keep the old THP counters >> consistent. See recent commits on adding mTHP counters: >> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback >> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") >> >> and changes to make THP counter to only count PMD THP: >> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for >> THP split statistics") >> >> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some >> adjustment on tools/mm/thpmaps. > > It seems we have to do this though I think keeping KPF_THP and adding a > separate bit like KPF_PMD_MAPPED makes more sense. but those tools > relying on KPF_THP need to realize this and check the new bit , which is > not done now. > whether the mTHP's name is mTHP or THP will make no difference for > this case:-) I don't quite follow your logic for that last part; If there are 2 separate bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that would be a safe/compatible approach, right? Where as your suggestion requires changes to existing tools to work. Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new flag; We don't currently expose the term "mTHP" to user space. I can't think of a better name though. I'd still like to understand what is actually broken that this change is fixing. Is the concern that a user could see KPF_THP and advance forward by "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries? > >> >> >> -- >> Best Regards, >> Yan, Zi >> > > Thanks > Barry
On Thu, Jun 27, 2024 at 8:39 PM Ryan Roberts <ryan.roberts@arm.com> wrote: > > On 27/06/2024 05:10, Barry Song wrote: > > On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote: > >> > >> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: > >>> On 26/06/2024 04:06, Zi Yan wrote: > >>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: > >>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > >>>>> > >>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > >>>>> pages, which means of any order, but KPF_THP should only be set > >>>>> when the folio is a 2M pmd mappable THP. > >>> > >>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is > >>> currently configured? > >>> > >>> I would argue that mTHP is still THP so should still have the flag. And since > >>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space > >>> will be enabling them, so I'll naively state that it should not cause compat > >>> issues as is. > >>> > >>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP > >>> sizes to function correctly. So that would need to be reworked if making this > >>> change. > >> > >> + more folks working on mTHP > >> > >> I agree that mTHP is still THP, but we might want different > >> stats/counters for it, since people might want to keep the old THP counters > >> consistent. See recent commits on adding mTHP counters: > >> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback > >> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") > >> > >> and changes to make THP counter to only count PMD THP: > >> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for > >> THP split statistics") > >> > >> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some > >> adjustment on tools/mm/thpmaps. > > > > It seems we have to do this though I think keeping KPF_THP and adding a > > separate bit like KPF_PMD_MAPPED makes more sense. but those tools > > relying on KPF_THP need to realize this and check the new bit , which is > > not done now. > > whether the mTHP's name is mTHP or THP will make no difference for > > this case:-) > > I don't quite follow your logic for that last part; If there are 2 separate > bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that > would be a safe/compatible approach, right? Where as your suggestion requires > changes to existing tools to work. Right, my point is that mTHP and THP are both types of THP. The only difference is whether they are PMD-mapped or PTE-mapped. Adding a bit to describe how the page is mapped would more accurately reflect reality. However, this change would disrupt tools that assume KPF_THP always means PMD-mapped THP. Therefore, we would still need separate bits for THP and mTHP in this case. I saw Willy complain about mTHP being called "mTHP," but in this case, calling it "mTHP" or just "THP" doesn't change anything if old tools continue to assume that KPF_THP means PMD-mapped THP. > > Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new > flag; We don't currently expose the term "mTHP" to user space. I can't think of > a better name though. Yes. If "compatibility" is a requirement, we cannot disregard it. > I'd still like to understand what is actually broken that this change is fixing. > Is the concern that a user could see KPF_THP and advance forward by > "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries? > Maybe we need an example which is thinking that KPF_THP is PMD-mapped. > > > >> > >> > >> -- > >> Best Regards, > >> Yan, Zi > >> > > Thanks Barry
On 27/06/2024 10:16, Barry Song wrote: > On Thu, Jun 27, 2024 at 8:39 PM Ryan Roberts <ryan.roberts@arm.com> wrote: >> >> On 27/06/2024 05:10, Barry Song wrote: >>> On Thu, Jun 27, 2024 at 2:40 AM Zi Yan <ziy@nvidia.com> wrote: >>>> >>>> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: >>>>> On 26/06/2024 04:06, Zi Yan wrote: >>>>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: >>>>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >>>>>>> >>>>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >>>>>>> pages, which means of any order, but KPF_THP should only be set >>>>>>> when the folio is a 2M pmd mappable THP. >>>>> >>>>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is >>>>> currently configured? >>>>> >>>>> I would argue that mTHP is still THP so should still have the flag. And since >>>>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space >>>>> will be enabling them, so I'll naively state that it should not cause compat >>>>> issues as is. >>>>> >>>>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP >>>>> sizes to function correctly. So that would need to be reworked if making this >>>>> change. >>>> >>>> + more folks working on mTHP >>>> >>>> I agree that mTHP is still THP, but we might want different >>>> stats/counters for it, since people might want to keep the old THP counters >>>> consistent. See recent commits on adding mTHP counters: >>>> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback >>>> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") >>>> >>>> and changes to make THP counter to only count PMD THP: >>>> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for >>>> THP split statistics") >>>> >>>> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some >>>> adjustment on tools/mm/thpmaps. >>> >>> It seems we have to do this though I think keeping KPF_THP and adding a >>> separate bit like KPF_PMD_MAPPED makes more sense. but those tools >>> relying on KPF_THP need to realize this and check the new bit , which is >>> not done now. >>> whether the mTHP's name is mTHP or THP will make no difference for >>> this case:-) >> >> I don't quite follow your logic for that last part; If there are 2 separate >> bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that >> would be a safe/compatible approach, right? Where as your suggestion requires >> changes to existing tools to work. > > Right, my point is that mTHP and THP are both types of THP. The only difference > is whether they are PMD-mapped or PTE-mapped. Adding a bit to describe how > the page is mapped would more accurately reflect reality. However, this change > would disrupt tools that assume KPF_THP always means PMD-mapped THP. > Therefore, we would still need separate bits for THP and mTHP in this case. I think perhaps PTE- vs PMD-mapped is a separate issue. The issue at hand is whether PKF_THP implies a fixed size (and alignment). If compat is an issue, then PKF_THP must continue to imply PMD-size. If compat is not an issue, then size can be determined by iterating over the entries. Having a mechanism to determine the level at which a block is mapped would potentially be a useful feature, but seems orthogonal to me. > > I saw Willy complain about mTHP being called "mTHP," but in this case, calling > it "mTHP" or just "THP" doesn't change anything if old tools continue to assume > that KPF_THP means PMD-mapped THP. I think Willy was just ribbing me because he preferred calling it "anonymous large folios". That's how I took it anyway. > >> >> Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new >> flag; We don't currently expose the term "mTHP" to user space. I can't think of >> a better name though. > > Yes. If "compatibility" is a requirement, we cannot disregard it. > >> I'd still like to understand what is actually broken that this change is fixing. >> Is the concern that a user could see KPF_THP and advance forward by >> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries? >> > > Maybe we need an example which is thinking that KPF_THP is PMD-mapped. Yes, that would help. > >>> >>>> >>>> >>>> -- >>>> Best Regards, >>>> Yan, Zi >>>> >>> > > Thanks > Barry
>On 27/06/2024 10:16, Barry Song wrote: >> On Thu, Jun 27, 2024 at 8:39?PM Ryan Roberts <ryan.roberts@arm.com> wrote: >>> >>> On 27/06/2024 05:10, Barry Song wrote: >>>> On Thu, Jun 27, 2024 at 2:40?AM Zi Yan <ziy@nvidia.com> wrote: >>>>> >>>>> On Wed Jun 26, 2024 at 7:07 AM EDT, Ryan Roberts wrote: >>>>>> On 26/06/2024 04:06, Zi Yan wrote: >>>>>>> On Tue Jun 25, 2024 at 10:49 PM EDT, ran xiaokai wrote: >>>>>>>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >>>>>>>> >>>>>>>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >>>>>>>> pages, which means of any order, but KPF_THP should only be set >>>>>>>> when the folio is a 2M pmd mappable THP. >>>>>> >>>>>> Why should KPF_THP only be set on 2M THP? What problem does it cause as it is >>>>>> currently configured? >>>>>> >>>>>> I would argue that mTHP is still THP so should still have the flag. And since >>>>>> these smaller mTHP sizes are disabled by default, only mTHP-aware user space >>>>>> will be enabling them, so I'll naively state that it should not cause compat >>>>>> issues as is. >>>>>> >>>>>> Also, the script at tools/mm/thpmaps relies on KPF_THP being set for all mTHP >>>>>> sizes to function correctly. So that would need to be reworked if making this >>>>>> change. >>>>> >>>>> + more folks working on mTHP >>>>> >>>>> I agree that mTHP is still THP, but we might want different >>>>> stats/counters for it, since people might want to keep the old THP counters >>>>> consistent. See recent commits on adding mTHP counters: >>>>> ec33687c6749 ("mm: add per-order mTHP anon_fault_alloc and anon_fault_fallback >>>>> counters"), 1f97fd042f38 ("mm: shmem: add mTHP counters for anonymous shmem") >>>>> >>>>> and changes to make THP counter to only count PMD THP: >>>>> 835c3a25aa37 ("mm: huge_memory: add the missing folio_test_pmd_mappable() for >>>>> THP split statistics") >>>>> >>>>> In this case, I wonder if we want a new KPF_MTHP bit for mTHP and some >>>>> adjustment on tools/mm/thpmaps. >>>> >>>> It seems we have to do this though I think keeping KPF_THP and adding a >>>> separate bit like KPF_PMD_MAPPED makes more sense. but those tools >>>> relying on KPF_THP need to realize this and check the new bit , which is >>>> not done now. >>>> whether the mTHP's name is mTHP or THP will make no difference for >>>> this case:-) >>> >>> I don't quite follow your logic for that last part; If there are 2 separate >>> bits; KPF_THP and KPF_MTHP, and KPF_THP is only set for PMD-sized THP, that >>> would be a safe/compatible approach, right? Where as your suggestion requires >>> changes to existing tools to work. >> >> Right, my point is that mTHP and THP are both types of THP. The only difference >> is whether they are PMD-mapped or PTE-mapped. Adding a bit to describe how >> the page is mapped would more accurately reflect reality. However, this change >> would disrupt tools that assume KPF_THP always means PMD-mapped THP. >> Therefore, we would still need separate bits for THP and mTHP in this case. > >I think perhaps PTE- vs PMD-mapped is a separate issue. The issue at hand is >whether PKF_THP implies a fixed size (and alignment). If compat is an issue, >then PKF_THP must continue to imply PMD-size. If compat is not an issue, then >size can be determined by iterating over the entries. > >Having a mechanism to determine the level at which a block is mapped would >potentially be a useful feature, but seems orthogonal to me. > >> >> I saw Willy complain about mTHP being called "mTHP," but in this case, calling >> it "mTHP" or just "THP" doesn't change anything if old tools continue to assume >> that KPF_THP means PMD-mapped THP. > >I think Willy was just ribbing me because he preferred calling it "anonymous >large folios". That's how I took it anyway. > >> >>> >>> Thinking about this a bit more, I wonder if PKF_MTHP is the right name for a new >>> flag; We don't currently expose the term "mTHP" to user space. I can't think of >>> a better name though. >> >> Yes. If "compatibility" is a requirement, we cannot disregard it. >> >>> I'd still like to understand what is actually broken that this change is fixing. >>> Is the concern that a user could see KPF_THP and advance forward by >>> "/sys/kernel/mm/transparent_hugepage/hpage_pmd_size / getpagesize()" entries? >>> >> >> Maybe we need an example which is thinking that KPF_THP is PMD-mapped. > >Yes, that would help. For now it is the testcase in tools/testing/selftests/mm/split_huge_page_test, if we try to split THP to other orders other than 0, the testcase will break. Maybe we can use KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL to figure out the compound page's start/end and the order. But these two flags are not for userspace memory only.
On 26.06.24 04:49, ran xiaokai wrote: > From: Ran Xiaokai <ran.xiaokai@zte.com.cn> > > KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound > pages, which means of any order, but KPF_THP should only be set > when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df "should only be set" -- who says that? :) The documentation only talks about "Contiguous pages which construct transparent hugepages". Sure, when it was added there were only PMD ones. > ("mm: thp: support allocation of anonymous multi-size THP"), > multiple orders of folios can be allocated and mapped to userspace, > so the folio_test_large() check is not sufficient here, > replace it with folio_test_pmd_mappable() to fix this. > A couple of points: 1) If I am not daydreaming, ever since we supported non-PMD THP in the pagecache (much longer than anon mTHP), we already indicate KPF_THP for them here. So this is not anon specific? Or am I getting the PG_lru check all wrong? 2) Anon THP are disabled as default. If some custom tool cannot deal with that "extension" we did with smaller THP, it shall be updated if it really has to special-case PMD-mapped THP, before enabled by the admin. I think this interface does exactly what we want, as it is right now. Unless there is *good* reason, we should keep it like that. So I suggest a) Extend the documentation to just state "THP of any size and any mapping granularity" or sth like that. b) Maybe using folio_test_large_rmappable() instead of "(k & (1 << PG_lru)) || is_anon", so even isolated-from-LRU THPs are indicated properly. c) Whoever is interested in getting the folio size, use this flag along with a scan for the KPF_COMPOUND_HEAD. I'll note that, scanning documentation, PAGE_IS_HUGE currently only applies to PMD *mapped* THP. It doesn't consider PTE-mapped THP at all (including PMD-ones). Likely, documentation should be updated to state "PMD-mapped THP or any hugetlb page". > Also kpageflags is not only for userspace memory but for all valid pfn > pages,including slab pages or drivers used pages, so the PG_lru and > is_anon check are unnecessary here. I'm completely confused about that statements. We do have KPF_OFFLINE, KPF_PGTABLE, KPF_SLAB, ... I'm missing something important. > > Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> > --- > fs/proc/page.c | 14 ++++---------- > 1 file changed, 4 insertions(+), 10 deletions(-) > > diff --git a/fs/proc/page.c b/fs/proc/page.c > index 2fb64bdb64eb..3e7b70449c2f 100644 > --- a/fs/proc/page.c > +++ b/fs/proc/page.c > @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page) > u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head); > else > u |= 1 << KPF_COMPOUND_TAIL; > + > if (folio_test_hugetlb(folio)) > u |= 1 << KPF_HUGE; > - /* > - * We need to check PageLRU/PageAnon > - * to make sure a given page is a thp, not a non-huge compound page. > - */ > - else if (folio_test_large(folio)) { > - if ((k & (1 << PG_lru)) || is_anon) > - u |= 1 << KPF_THP; > - else if (is_huge_zero_folio(folio)) { > + else if (folio_test_pmd_mappable(folio)) { folio_test_pmd_mappable() would not be future safe. It includes anything >= PMD_ORDER as well.
>On 26.06.24 04:49, ran xiaokai wrote: >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >> >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >> pages, which means of any order, but KPF_THP should only be set >> when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df > >"should only be set" -- who says that? :) > >The documentation only talks about "Contiguous pages which construct >transparent hugepages". Sure, when it was added there were only PMD ones. Hi, David, I am working on tools/testing/selftests/mm/split_huge_page_test to support splitint THP to more orders, not only order-0, and the testcases failed. It seems the testcodes rely on KPF_THP to indicate only 2M THP. >> ("mm: thp: support allocation of anonymous multi-size THP"), >> multiple orders of folios can be allocated and mapped to userspace, >> so the folio_test_large() check is not sufficient here, >> replace it with folio_test_pmd_mappable() to fix this. >> > >A couple of points: > >1) If I am not daydreaming, ever since we supported non-PMD THP in the > pagecache (much longer than anon mTHP), we already indicate KPF_THP > for them here. So this is not anon specific? Or am I getting the > PG_lru check all wrong? Thanks for the clarification. >2) Anon THP are disabled as default. If some custom tool cannot deal > with that "extension" we did with smaller THP, it shall be updated if > it really has to special-case PMD-mapped THP, before enabled by the > admin. ok, it seems that it is the testcodes which should be updated. > >I think this interface does exactly what we want, as it is right now. >Unless there is *good* reason, we should keep it like that. > >So I suggest > >a) Extend the documentation to just state "THP of any size and any >mapping granularity" or sth like that. yes, it is neccessay to update the documentation to make it more clear. Since the definition of KPF_THP has been expanded since it was firstly introduced. >b) Maybe using folio_test_large_rmappable() instead of "(k & (1 << > PG_lru)) || is_anon", so even isolated-from-LRU THPs are indicated > properly. i will investigate on this. >c) Whoever is interested in getting the folio size, use this flag along > with a scan for the KPF_COMPOUND_HEAD. yes, we can use the combination of KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL to figure out the compound order. > >I'll note that, scanning documentation, PAGE_IS_HUGE currently only >applies to PMD *mapped* THP. It doesn't consider PTE-mapped THP at all >(including PMD-ones). Likely, documentation should be updated to state >"PMD-mapped THP or any hugetlb page". i will also investigate on this. >> Also kpageflags is not only for userspace memory but for all valid pfn >> pages,including slab pages or drivers used pages, so the PG_lru and >> is_anon check are unnecessary here. > >I'm completely confused about that statements. We do have KPF_OFFLINE, >KPF_PGTABLE, KPF_SLAB, ... I'm missing something important. My statement is wrong, please ignore this. >> >> Signed-off-by: Ran Xiaokai <ran.xiaokai@zte.com.cn> >> --- >> fs/proc/page.c | 14 ++++---------- >> 1 file changed, 4 insertions(+), 10 deletions(-) >> >> diff --git a/fs/proc/page.c b/fs/proc/page.c >> index 2fb64bdb64eb..3e7b70449c2f 100644 >> --- a/fs/proc/page.c >> +++ b/fs/proc/page.c >> @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page) >> u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head); >> else >> u |= 1 << KPF_COMPOUND_TAIL; >> + >> if (folio_test_hugetlb(folio)) >> u |= 1 << KPF_HUGE; >> - /* >> - * We need to check PageLRU/PageAnon >> - * to make sure a given page is a thp, not a non-huge compound page. >> - */ >> - else if (folio_test_large(folio)) { >> - if ((k & (1 << PG_lru)) || is_anon) >> - u |= 1 << KPF_THP; >> - else if (is_huge_zero_folio(folio)) { >> + else if (folio_test_pmd_mappable(folio)) { > >folio_test_pmd_mappable() would not be future safe. It includes anything > >= PMD_ORDER as well. > > >-- >Cheers, > >David / dhildenb
>On 26.06.24 04:49, ran xiaokai wrote: >> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >> >> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >> pages, which means of any order, but KPF_THP should only be set >> when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df > >"should only be set" -- who says that? :) > >The documentation only talks about "Contiguous pages which construct >transparent hugepages". Sure, when it was added there were only PMD ones. > > >> ("mm: thp: support allocation of anonymous multi-size THP"), >> multiple orders of folios can be allocated and mapped to userspace, >> so the folio_test_large() check is not sufficient here, >> replace it with folio_test_pmd_mappable() to fix this. >> > >A couple of points: > >1) If I am not daydreaming, ever since we supported non-PMD THP in the > pagecache (much longer than anon mTHP), we already indicate KPF_THP > for them here. So this is not anon specific? Or am I getting the > PG_lru check all wrong? > >2) Anon THP are disabled as default. If some custom tool cannot deal > with that "extension" we did with smaller THP, it shall be updated if > it really has to special-case PMD-mapped THP, before enabled by the > admin. > > >I think this interface does exactly what we want, as it is right now. >Unless there is *good* reason, we should keep it like that. > >So I suggest > >a) Extend the documentation to just state "THP of any size and any >mapping granularity" or sth like that. > >b) Maybe using folio_test_large_rmappable() instead of "(k & (1 << > PG_lru)) || is_anon", so even isolated-from-LRU THPs are indicated > properly. Hi, David, The "is_anon" check was introduced to also include page vector cache pages, but now large folios are added to lru list directly, bypassed the pagevec cache. So the is_anon check seems unnecessary here. As now pagecache also supports large folios, the is_anon check seems unsufficient here. Can i say that for userspace memory, folio_test_large_rmappable() == folio_test_large()? if that is true, except the "if ((k & (1 << PG_lru)) || is_anon)" check, we can also remove the folio_test_large() check, like this: else if (folio_test_large_rmappable(folio)) { u |= 1 << KPF_THP; else if (is_huge_zero_folio(folio)) { u |= 1 << KPF_ZERO_PAGE; u |= 1 << KPF_THP; } } else if (is_zero_pfn(page_to_pfn(page))) This will also include the isolated folios. >c) Whoever is interested in getting the folio size, use this flag along > with a scan for the KPF_COMPOUND_HEAD. > > >I'll note that, scanning documentation, PAGE_IS_HUGE currently only >applies to PMD *mapped* THP. It doesn't consider PTE-mapped THP at all >(including PMD-ones). Likely, documentation should be updated to state >"PMD-mapped THP or any hugetlb page". > >> Also kpageflags is not only for userspace memory but for all valid pfn >> pages,including slab pages or drivers used pages, so the PG_lru and >> is_anon check are unnecessary here. > >I'm completely confused about that statements. We do have KPF_OFFLINE, >KPF_PGTABLE, KPF_SLAB, ... I'm missing something important.
On 03.07.24 11:20, ran xiaokai wrote: >> On 26.06.24 04:49, ran xiaokai wrote: >>> From: Ran Xiaokai <ran.xiaokai@zte.com.cn> >>> >>> KPF_COMPOUND_HEAD and KPF_COMPOUND_TAIL are set on "common" compound >>> pages, which means of any order, but KPF_THP should only be set >>> when the folio is a 2M pmd mappable THP. Since commit 19eaf44954df >> >> "should only be set" -- who says that? :) >> >> The documentation only talks about "Contiguous pages which construct >> transparent hugepages". Sure, when it was added there were only PMD ones. >> >> >>> ("mm: thp: support allocation of anonymous multi-size THP"), >>> multiple orders of folios can be allocated and mapped to userspace, >>> so the folio_test_large() check is not sufficient here, >>> replace it with folio_test_pmd_mappable() to fix this. >>> >> >> A couple of points: >> >> 1) If I am not daydreaming, ever since we supported non-PMD THP in the >> pagecache (much longer than anon mTHP), we already indicate KPF_THP >> for them here. So this is not anon specific? Or am I getting the >> PG_lru check all wrong? >> >> 2) Anon THP are disabled as default. If some custom tool cannot deal >> with that "extension" we did with smaller THP, it shall be updated if >> it really has to special-case PMD-mapped THP, before enabled by the >> admin. >> >> >> I think this interface does exactly what we want, as it is right now. >> Unless there is *good* reason, we should keep it like that. >> >> So I suggest >> >> a) Extend the documentation to just state "THP of any size and any >> mapping granularity" or sth like that. >> >> b) Maybe using folio_test_large_rmappable() instead of "(k & (1 << >> PG_lru)) || is_anon", so even isolated-from-LRU THPs are indicated >> properly. > > Hi, David, > > The "is_anon" check was introduced to also include page vector cache > pages, but now large folios are added to lru list directly, bypassed > the pagevec cache. So the is_anon check seems unnecessary here. > As now pagecache also supports large folios, the is_anon check seems > unsufficient here. > > Can i say that for userspace memory, > folio_test_large_rmappable() == folio_test_large()? > if that is true, except the "if ((k & (1 << PG_lru)) || is_anon)" > check, we can also remove the folio_test_large() check, > like this: > > else if (folio_test_large_rmappable(folio)) { > u |= 1 << KPF_THP; > else if (is_huge_zero_folio(folio)) { > u |= 1 << KPF_ZERO_PAGE; > u |= 1 << KPF_THP; > } > } else if (is_zero_pfn(page_to_pfn(page))) > > This will also include the isolated folios. You'll have to keep the folio_test_large() check, folio_test_large_rmappable() wants us to check that ahead of time. Something like ... else if (folio_test_large(folio) && folio_test_large_rmappable(folio)) { /* Note: we indicate any THPs here, not just PMD-sized ones */ u |= 1 << KPF_THP; } else if (is_huge_zero_folio(folio)) { u |= 1 << KPF_ZERO_PAGE; u |= 1 << KPF_THP } else if (is_zero_pfn(page_to_pfn(page))) { u |= 1 << KPF_ZERO_PAGE; } Would likely work and keep the existing behavior (+ include isolated ones).
diff --git a/fs/proc/page.c b/fs/proc/page.c index 2fb64bdb64eb..3e7b70449c2f 100644 --- a/fs/proc/page.c +++ b/fs/proc/page.c @@ -146,19 +146,13 @@ u64 stable_page_flags(const struct page *page) u |= kpf_copy_bit(k, KPF_COMPOUND_HEAD, PG_head); else u |= 1 << KPF_COMPOUND_TAIL; + if (folio_test_hugetlb(folio)) u |= 1 << KPF_HUGE; - /* - * We need to check PageLRU/PageAnon - * to make sure a given page is a thp, not a non-huge compound page. - */ - else if (folio_test_large(folio)) { - if ((k & (1 << PG_lru)) || is_anon) - u |= 1 << KPF_THP; - else if (is_huge_zero_folio(folio)) { + else if (folio_test_pmd_mappable(folio)) { + u |= 1 << KPF_THP; + if (is_huge_zero_folio(folio)) u |= 1 << KPF_ZERO_PAGE; - u |= 1 << KPF_THP; - } } else if (is_zero_pfn(page_to_pfn(page))) u |= 1 << KPF_ZERO_PAGE;