Message ID | ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.wang@linux.alibaba.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating | expand |
On Tue, 10 May 2022 11:45:59 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote: > On some architectures (like ARM64), it can support CONT-PTE/PMD size > hugetlb, which means it can support not only PMD/PUD size hugetlb: > 2M and 1G, but also CONT-PTE/PMD size: 64K and 32M if a 4K page > size specified. > > When migrating a hugetlb page, we will get the relevant page table > entry by huge_pte_offset() only once to nuke it and remap it with > a migration pte entry. This is correct for PMD or PUD size hugetlb, > since they always contain only one pmd entry or pud entry in the > page table. > > However this is incorrect for CONT-PTE and CONT-PMD size hugetlb, > since they can contain several continuous pte or pmd entry with > same page table attributes. So we will nuke or remap only one pte > or pmd entry for this CONT-PTE/PMD size hugetlb page, which is > not expected for hugetlb migration. The problem is we can still > continue to modify the subpages' data of a hugetlb page during > migrating a hugetlb page, which can cause a serious data consistent > issue, since we did not nuke the page table entry and set a > migration pte for the subpages of a hugetlb page. > > To fix this issue, we should change to use huge_ptep_clear_flush() > to nuke a hugetlb page table, and remap it with set_huge_pte_at() > and set_huge_swap_pte_at() when migrating a hugetlb page, which > already considered the CONT-PTE or CONT-PMD size hugetlb. > > ... > > --- a/include/linux/hugetlb.h > +++ b/include/linux/hugetlb.h > @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr > pte_t *ptep, pte_t pte, unsigned long sz) > { > } > + > +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, > + unsigned long addr, pte_t *ptep) > +{ > + return ptep_get(ptep); > +} > + > +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, > + pte_t *ptep, pte_t pte) > +{ > +} > #endif /* CONFIG_HUGETLB_PAGE */ > This blows up nommu (arm allnoconfig): In file included from fs/io_uring.c:71: ./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush': ./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration] 1100 | return ptep_get(ptep); | ^~~~~~~~ huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply zapped this change. --- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix +++ a/include/linux/hugetlb.h @@ -1093,17 +1093,6 @@ static inline void set_huge_swap_pte_at( pte_t *ptep, pte_t pte, unsigned long sz) { } - -static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, - unsigned long addr, pte_t *ptep) -{ - return ptep_get(ptep); -} - -static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, - pte_t *ptep, pte_t pte) -{ -} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h,
On Tue, 10 May 2022 16:17:39 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > > + > > +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, > > + unsigned long addr, pte_t *ptep) > > +{ > > + return ptep_get(ptep); > > +} > > + > > +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, > > + pte_t *ptep, pte_t pte) > > +{ > > +} > > #endif /* CONFIG_HUGETLB_PAGE */ > > > > This blows up nommu (arm allnoconfig): > > In file included from fs/io_uring.c:71: > ./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush': > ./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration] > 1100 | return ptep_get(ptep); > | ^~~~~~~~ > > > huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply > zapped this change. > Well that wasn't a great success. Doing this instead. It's pretty nasty - something nicer would be nicer please. --- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix +++ a/include/linux/hugetlb.h @@ -1094,6 +1094,7 @@ static inline void set_huge_swap_pte_at( { } +#ifdef CONFIG_MMU static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { @@ -1104,6 +1105,7 @@ static inline void set_huge_pte_at(struc pte_t *ptep, pte_t pte) { } +#endif #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h,
Hi Baolin, I love your patch! Yet something to improve: [auto build test ERROR on akpm-mm/mm-everything] [cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc6] [If your patch is applied to the wrong git tree, kindly drop us a note. And when submitting patch, we suggest to use '--base' as documented in https://git-scm.com/docs/git-format-patch] url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220510-114753 base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything config: h8300-buildonly-randconfig-r001-20220509 (https://download.01.org/0day-ci/archive/20220511/202205110919.CWIcIqYE-lkp@intel.com/config) compiler: h8300-linux-gcc (GCC) 11.3.0 reproduce (this is a W=1 build): wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross chmod +x ~/bin/make.cross # https://github.com/intel-lab-lkp/linux/commit/b666792b4c5f9774c350977ff88837bafc36365a git remote add linux-review https://github.com/intel-lab-lkp/linux git fetch --no-tags linux-review Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220510-114753 git checkout b666792b4c5f9774c350977ff88837bafc36365a # save the config file mkdir build_dir && cp config build_dir/.config COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=h8300 SHELL=/bin/bash If you fix the issue, kindly add following tag as appropriate Reported-by: kernel test robot <lkp@intel.com> All errors (new ones prefixed by >>): In file included from kernel/fork.c:51: include/linux/hugetlb.h: In function 'huge_ptep_clear_flush': >> include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration] 1100 | return ptep_get(ptep); | ^~~~~~~~ >> include/linux/hugetlb.h:1100:16: error: incompatible types when returning type 'int' but 'pte_t' was expected 1100 | return ptep_get(ptep); | ^~~~~~~~~~~~~~ kernel/fork.c: At top level: kernel/fork.c:162:13: warning: no previous prototype for 'arch_release_task_struct' [-Wmissing-prototypes] 162 | void __weak arch_release_task_struct(struct task_struct *tsk) | ^~~~~~~~~~~~~~~~~~~~~~~~ kernel/fork.c:847:20: warning: no previous prototype for 'arch_task_cache_init' [-Wmissing-prototypes] 847 | void __init __weak arch_task_cache_init(void) { } | ^~~~~~~~~~~~~~~~~~~~ kernel/fork.c:942:12: warning: no previous prototype for 'arch_dup_task_struct' [-Wmissing-prototypes] 942 | int __weak arch_dup_task_struct(struct task_struct *dst, | ^~~~~~~~~~~~~~~~~~~~ cc1: some warnings being treated as errors -- In file included from kernel/sysctl.c:46: include/linux/hugetlb.h: In function 'huge_ptep_clear_flush': >> include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration] 1100 | return ptep_get(ptep); | ^~~~~~~~ >> include/linux/hugetlb.h:1100:16: error: incompatible types when returning type 'int' but 'pte_t' was expected 1100 | return ptep_get(ptep); | ^~~~~~~~~~~~~~ cc1: some warnings being treated as errors -- In file included from include/linux/migrate.h:8, from mm/page_alloc.c:62: include/linux/hugetlb.h: In function 'huge_ptep_clear_flush': >> include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration] 1100 | return ptep_get(ptep); | ^~~~~~~~ >> include/linux/hugetlb.h:1100:16: error: incompatible types when returning type 'int' but 'pte_t' was expected 1100 | return ptep_get(ptep); | ^~~~~~~~~~~~~~ In file included from include/asm-generic/bug.h:5, from arch/h8300/include/asm/bug.h:8, from include/linux/bug.h:5, from include/linux/mmdebug.h:5, from include/linux/mm.h:6, from mm/page_alloc.c:19: mm/page_alloc.c: In function 'free_pages': include/asm-generic/page.h:89:51: warning: ordered comparison of pointer with null pointer [-Wextra] 89 | #define virt_addr_valid(kaddr) (((void *)(kaddr) >= (void *)PAGE_OFFSET) && \ | ^~ include/linux/compiler.h:78:45: note: in definition of macro 'unlikely' 78 | # define unlikely(x) __builtin_expect(!!(x), 0) | ^ include/linux/mmdebug.h:17:25: note: in expansion of macro 'BUG_ON' 17 | #define VM_BUG_ON(cond) BUG_ON(cond) | ^~~~~~ mm/page_alloc.c:5489:17: note: in expansion of macro 'VM_BUG_ON' 5489 | VM_BUG_ON(!virt_addr_valid((void *)addr)); | ^~~~~~~~~ mm/page_alloc.c:5489:28: note: in expansion of macro 'virt_addr_valid' 5489 | VM_BUG_ON(!virt_addr_valid((void *)addr)); | ^~~~~~~~~~~~~~~ cc1: some warnings being treated as errors -- In file included from fs/io_uring.c:71: include/linux/hugetlb.h: In function 'huge_ptep_clear_flush': >> include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration] 1100 | return ptep_get(ptep); | ^~~~~~~~ >> include/linux/hugetlb.h:1100:16: error: incompatible types when returning type 'int' but 'pte_t' was expected 1100 | return ptep_get(ptep); | ^~~~~~~~~~~~~~ fs/io_uring.c: In function '__io_submit_flush_completions': fs/io_uring.c:2660:40: warning: variable 'prev' set but not used [-Wunused-but-set-variable] 2660 | struct io_wq_work_node *node, *prev; | ^~~~ cc1: some warnings being treated as errors vim +/ptep_get +1100 include/linux/hugetlb.h 1096 1097 static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, 1098 unsigned long addr, pte_t *ptep) 1099 { > 1100 return ptep_get(ptep); 1101 } 1102
On Wed, 11 May 2022 09:19:17 +0800 kernel test robot <lkp@intel.com> wrote: > Hi Baolin, > > I love your patch! Yet something to improve: > > [auto build test ERROR on akpm-mm/mm-everything] > [cannot apply to hnaz-mm/master arm64/for-next/core linus/master v5.18-rc6] > [If your patch is applied to the wrong git tree, kindly drop us a note. > And when submitting patch, we suggest to use '--base' as documented in > https://git-scm.com/docs/git-format-patch] > > url: https://github.com/intel-lab-lkp/linux/commits/Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220510-114753 > base: https://git.kernel.org/pub/scm/linux/kernel/git/akpm/mm.git mm-everything > config: h8300-buildonly-randconfig-r001-20220509 (https://download.01.org/0day-ci/archive/20220511/202205110919.CWIcIqYE-lkp@intel.com/config) > compiler: h8300-linux-gcc (GCC) 11.3.0 > reproduce (this is a W=1 build): > wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross > chmod +x ~/bin/make.cross > # https://github.com/intel-lab-lkp/linux/commit/b666792b4c5f9774c350977ff88837bafc36365a > git remote add linux-review https://github.com/intel-lab-lkp/linux > git fetch --no-tags linux-review Baolin-Wang/Fix-CONT-PTE-PMD-size-hugetlb-issue-when-unmapping-or-migrating/20220510-114753 > git checkout b666792b4c5f9774c350977ff88837bafc36365a > # save the config file > mkdir build_dir && cp config build_dir/.config > COMPILER_INSTALL_PATH=$HOME/0day COMPILER=gcc-11.3.0 make.cross W=1 O=build_dir ARCH=h8300 SHELL=/bin/bash > > ... Thanks. https://git.kernel.org/pub/scm/linux/kernel/git/akpm/25-new.git/tree/patches/mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix.patch (which is in current mm-everything) fixes this up.
On 5/11/2022 7:28 AM, Andrew Morton wrote: > On Tue, 10 May 2022 16:17:39 -0700 Andrew Morton <akpm@linux-foundation.org> wrote: > >>> + >>> +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, >>> + unsigned long addr, pte_t *ptep) >>> +{ >>> + return ptep_get(ptep); >>> +} >>> + >>> +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, >>> + pte_t *ptep, pte_t pte) >>> +{ >>> +} >>> #endif /* CONFIG_HUGETLB_PAGE */ >>> >> >> This blows up nommu (arm allnoconfig): >> >> In file included from fs/io_uring.c:71: >> ./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush': >> ./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration] >> 1100 | return ptep_get(ptep); >> | ^~~~~~~~ >> >> >> huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply >> zapped this change. >> > > Well that wasn't a great success. Doing this instead. It's pretty > nasty - something nicer would be nicer please. Thanks for fixing the building issue. I'll look at this to simplify the dummy function. Myabe just remove the ptep_get(). diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1097,7 +1097,7 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, unsigned long addr, pte_t *ptep) { - return ptep_get(ptep); + return *ptep; } > > --- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix > +++ a/include/linux/hugetlb.h > @@ -1094,6 +1094,7 @@ static inline void set_huge_swap_pte_at( > { > } > > +#ifdef CONFIG_MMU > static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, > unsigned long addr, pte_t *ptep) > { > @@ -1104,6 +1105,7 @@ static inline void set_huge_pte_at(struc > pte_t *ptep, pte_t pte) > { > } > +#endif > #endif /* CONFIG_HUGETLB_PAGE */ > > static inline spinlock_t *huge_pte_lock(struct hstate *h, > _
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h index 306d6ef..9f71043 100644 --- a/include/linux/hugetlb.h +++ b/include/linux/hugetlb.h @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr pte_t *ptep, pte_t pte, unsigned long sz) { } + +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma, + unsigned long addr, pte_t *ptep) +{ + return ptep_get(ptep); +} + +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr, + pte_t *ptep, pte_t pte) +{ +} #endif /* CONFIG_HUGETLB_PAGE */ static inline spinlock_t *huge_pte_lock(struct hstate *h, diff --git a/mm/rmap.c b/mm/rmap.c index 94d6b24..4e96daf 100644 --- a/mm/rmap.c +++ b/mm/rmap.c @@ -1926,13 +1926,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, break; } } + + /* Nuke the hugetlb page table entry */ + pteval = huge_ptep_clear_flush(vma, address, pvmw.pte); } else { flush_cache_page(vma, address, pte_pfn(*pvmw.pte)); + /* Nuke the page table entry. */ + pteval = ptep_clear_flush(vma, address, pvmw.pte); } - /* Nuke the page table entry. */ - pteval = ptep_clear_flush(vma, address, pvmw.pte); - /* Set the dirty flag on the folio now the pte is gone. */ if (pte_dirty(pteval)) folio_mark_dirty(folio); @@ -2017,7 +2019,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, pte_t swp_pte; if (arch_unmap_one(mm, vma, address, pteval) < 0) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; @@ -2026,7 +2031,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, !anon_exclusive, subpage); if (anon_exclusive && page_try_share_anon_rmap(subpage)) { - set_pte_at(mm, address, pvmw.pte, pteval); + if (folio_test_hugetlb(folio)) + set_huge_pte_at(mm, address, pvmw.pte, pteval); + else + set_pte_at(mm, address, pvmw.pte, pteval); ret = false; page_vma_mapped_walk_done(&pvmw); break; @@ -2052,7 +2060,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma, swp_pte = pte_swp_mksoft_dirty(swp_pte); if (pte_uffd_wp(pteval)) swp_pte = pte_swp_mkuffd_wp(swp_pte); - set_pte_at(mm, address, pvmw.pte, swp_pte); + if (folio_test_hugetlb(folio)) + set_huge_swap_pte_at(mm, address, pvmw.pte, + swp_pte, vma_mmu_pagesize(vma)); + else + set_pte_at(mm, address, pvmw.pte, swp_pte); trace_set_migration_pte(address, pte_val(swp_pte), compound_order(&folio->page)); /*