Message ID | ea5abf529f0997b5430961012bfda6166c1efc8c.1652147571.git.baolin.wang@linux.alibaba.com (mailing list archive)
---|---
State | New
Series | Fix CONT-PTE/PMD size hugetlb issue when unmapping or migrating
On Tue, 10 May 2022 11:45:59 +0800 Baolin Wang <baolin.wang@linux.alibaba.com> wrote:

> On some architectures (like ARM64), it can support CONT-PTE/PMD size
> hugetlb, which means it can support not only PMD/PUD size hugetlb
> (2M and 1G), but also CONT-PTE/PMD size (64K and 32M) if a 4K base
> page size is specified.
>
> When migrating a hugetlb page, we will get the relevant page table
> entry by huge_pte_offset() only once to nuke it and remap it with
> a migration pte entry. This is correct for PMD or PUD size hugetlb,
> since they always contain only one pmd entry or pud entry in the
> page table.
>
> However this is incorrect for CONT-PTE and CONT-PMD size hugetlb,
> since they can contain several contiguous pte or pmd entries with
> the same page table attributes. So we will nuke or remap only one
> pte or pmd entry for such a CONT-PTE/PMD size hugetlb page, which
> is not what hugetlb migration expects. The problem is that the
> subpages' data of the hugetlb page can still be modified while the
> page is being migrated, which can cause a serious data consistency
> issue, since we did not nuke the page table entries and set
> migration ptes for all the subpages of the hugetlb page.
>
> To fix this issue, we should change to use huge_ptep_clear_flush()
> to nuke a hugetlb page table entry, and remap it with
> set_huge_pte_at() and set_huge_swap_pte_at() when migrating a
> hugetlb page, since those helpers already handle the CONT-PTE and
> CONT-PMD size hugetlb cases.
>
> ...
>
> --- a/include/linux/hugetlb.h
> +++ b/include/linux/hugetlb.h
> @@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
>  					pte_t *ptep, pte_t pte, unsigned long sz)
>  {
>  }
> +
> +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> +					  unsigned long addr, pte_t *ptep)
> +{
> +	return ptep_get(ptep);
> +}
> +
> +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> +				   pte_t *ptep, pte_t pte)
> +{
> +}
>  #endif /* CONFIG_HUGETLB_PAGE */
>

This blows up nommu (arm allnoconfig):

In file included from fs/io_uring.c:71:
./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration]
 1100 |         return ptep_get(ptep);
      |                ^~~~~~~~

huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply
zapped this change.

--- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix
+++ a/include/linux/hugetlb.h
@@ -1093,17 +1093,6 @@ static inline void set_huge_swap_pte_at(
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
-
-static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
-					  unsigned long addr, pte_t *ptep)
-{
-	return ptep_get(ptep);
-}
-
-static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
-				   pte_t *ptep, pte_t pte)
-{
-}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
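For background on why nuking a single entry is insufficient here: on arm64, a CONT-PTE/PMD hugetlb page is mapped by a run of adjacent entries that must be cleared and flushed together. The sketch below is illustrative only, not the actual arm64 implementation (the real code also folds the dirty/young bits of every sub-entry into the returned pte, and derives ncontig/pgsize from the huge page size via num_contig_ptes(); both are passed in explicitly here for clarity):

/*
 * Illustrative sketch, not the real arch code: a CONT-PTE/PMD hugetlb
 * page is mapped by 'ncontig' adjacent entries, so all of them must be
 * cleared, not just the one huge_pte_offset() returned.
 */
static pte_t sketch_huge_ptep_clear_flush(struct vm_area_struct *vma,
					  unsigned long addr, pte_t *ptep,
					  int ncontig, size_t pgsize)
{
	unsigned long start = addr;
	pte_t orig_pte = ptep_get(ptep);	/* value of the first sub-entry */
	int i;

	/* Nuke every sub-entry that maps the huge page. */
	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
		pte_clear(vma->vm_mm, addr, ptep);

	/* One TLB flush covering the whole contiguous range. */
	flush_tlb_range(vma, start, start + ncontig * pgsize);

	return orig_pte;
}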
On Tue, 10 May 2022 16:17:39 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:

> > +
> > +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
> > +					  unsigned long addr, pte_t *ptep)
> > +{
> > +	return ptep_get(ptep);
> > +}
> > +
> > +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
> > +				   pte_t *ptep, pte_t pte)
> > +{
> > +}
> >  #endif /* CONFIG_HUGETLB_PAGE */
> >
>
> This blows up nommu (arm allnoconfig):
>
> In file included from fs/io_uring.c:71:
> ./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
> ./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration]
>  1100 |         return ptep_get(ptep);
>       |                ^~~~~~~~
>
> huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply
> zapped this change.

Well that wasn't a great success.  Doing this instead.  It's pretty
nasty - something nicer would be nicer please.

--- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix
+++ a/include/linux/hugetlb.h
@@ -1094,6 +1094,7 @@ static inline void set_huge_swap_pte_at(
 {
 }
 
+#ifdef CONFIG_MMU
 static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 					  unsigned long addr, pte_t *ptep)
 {
@@ -1104,6 +1105,7 @@ static inline void set_huge_pte_at(struc
 					pte_t *ptep, pte_t pte)
 {
 }
+#endif
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
On 5/11/2022 7:28 AM, Andrew Morton wrote:
> On Tue, 10 May 2022 16:17:39 -0700 Andrew Morton <akpm@linux-foundation.org> wrote:
>
>>> +
>>> +static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>>> +					  unsigned long addr, pte_t *ptep)
>>> +{
>>> +	return ptep_get(ptep);
>>> +}
>>> +
>>> +static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
>>> +				   pte_t *ptep, pte_t pte)
>>> +{
>>> +}
>>>  #endif /* CONFIG_HUGETLB_PAGE */
>>>
>>
>> This blows up nommu (arm allnoconfig):
>>
>> In file included from fs/io_uring.c:71:
>> ./include/linux/hugetlb.h: In function 'huge_ptep_clear_flush':
>> ./include/linux/hugetlb.h:1100:16: error: implicit declaration of function 'ptep_get' [-Werror=implicit-function-declaration]
>>  1100 |         return ptep_get(ptep);
>>       |                ^~~~~~~~
>>
>> huge_ptep_clear_flush() is only used in CONFIG_NOMMU=n files, so I simply
>> zapped this change.
>
> Well that wasn't a great success.  Doing this instead.  It's pretty
> nasty - something nicer would be nicer please.

Thanks for fixing the build issue. I'll look at simplifying the dummy
function. Maybe we can just remove the ptep_get():

diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1097,7 +1097,7 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
 					  unsigned long addr, pte_t *ptep)
 {
-	return ptep_get(ptep);
+	return *ptep;
 }

>
> --- a/include/linux/hugetlb.h~mm-rmap-fix-cont-pte-pmd-size-hugetlb-issue-when-migration-fix
> +++ a/include/linux/hugetlb.h
> @@ -1094,6 +1094,7 @@ static inline void set_huge_swap_pte_at(
>  {
>  }
>
> +#ifdef CONFIG_MMU
>  static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
>  					  unsigned long addr, pte_t *ptep)
>  {
> @@ -1104,6 +1105,7 @@ static inline void set_huge_pte_at(struc
>  					pte_t *ptep, pte_t pte)
>  {
>  }
> +#endif
>  #endif /* CONFIG_HUGETLB_PAGE */
>
>  static inline spinlock_t *huge_pte_lock(struct hstate *h,
> _
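For reference, the !CONFIG_HUGETLB_PAGE stubs that the suggestion above points at would plausibly end up looking like this (a sketch of where the thread is heading, not necessarily the code that was merged). A plain dereference removes the ptep_get() dependency that broke nommu, and the returned value is immaterial since these stubs are never executed when hugetlb is compiled out:

/* Sketch of the simplified !CONFIG_HUGETLB_PAGE stubs. */
static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
					  unsigned long addr, pte_t *ptep)
{
	return *ptep;	/* no ptep_get(), so nommu builds stay happy */
}

static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
				   pte_t *ptep, pte_t pte)
{
}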
diff --git a/include/linux/hugetlb.h b/include/linux/hugetlb.h
index 306d6ef..9f71043 100644
--- a/include/linux/hugetlb.h
+++ b/include/linux/hugetlb.h
@@ -1093,6 +1093,17 @@ static inline void set_huge_swap_pte_at(struct mm_struct *mm, unsigned long addr
 					pte_t *ptep, pte_t pte, unsigned long sz)
 {
 }
+
+static inline pte_t huge_ptep_clear_flush(struct vm_area_struct *vma,
+					  unsigned long addr, pte_t *ptep)
+{
+	return ptep_get(ptep);
+}
+
+static inline void set_huge_pte_at(struct mm_struct *mm, unsigned long addr,
+				   pte_t *ptep, pte_t pte)
+{
+}
 #endif /* CONFIG_HUGETLB_PAGE */
 
 static inline spinlock_t *huge_pte_lock(struct hstate *h,
diff --git a/mm/rmap.c b/mm/rmap.c
index 94d6b24..4e96daf 100644
--- a/mm/rmap.c
+++ b/mm/rmap.c
@@ -1926,13 +1926,15 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				break;
 			}
 		}
+
+		/* Nuke the hugetlb page table entry */
+		pteval = huge_ptep_clear_flush(vma, address, pvmw.pte);
 	} else {
 		flush_cache_page(vma, address, pte_pfn(*pvmw.pte));
+		/* Nuke the page table entry. */
+		pteval = ptep_clear_flush(vma, address, pvmw.pte);
 	}
 
-	/* Nuke the page table entry. */
-	pteval = ptep_clear_flush(vma, address, pvmw.pte);
-
 	/* Set the dirty flag on the folio now the pte is gone. */
 	if (pte_dirty(pteval))
 		folio_mark_dirty(folio);
@@ -2017,7 +2019,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 		pte_t swp_pte;
 
 		if (arch_unmap_one(mm, vma, address, pteval) < 0) {
-			set_pte_at(mm, address, pvmw.pte, pteval);
+			if (folio_test_hugetlb(folio))
+				set_huge_pte_at(mm, address, pvmw.pte, pteval);
+			else
+				set_pte_at(mm, address, pvmw.pte, pteval);
 			ret = false;
 			page_vma_mapped_walk_done(&pvmw);
 			break;
@@ -2026,7 +2031,10 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 				!anon_exclusive, subpage);
 		if (anon_exclusive && page_try_share_anon_rmap(subpage)) {
-			set_pte_at(mm, address, pvmw.pte, pteval);
+			if (folio_test_hugetlb(folio))
+				set_huge_pte_at(mm, address, pvmw.pte, pteval);
+			else
+				set_pte_at(mm, address, pvmw.pte, pteval);
 			ret = false;
 			page_vma_mapped_walk_done(&pvmw);
 			break;
@@ -2052,7 +2060,11 @@ static bool try_to_migrate_one(struct folio *folio, struct vm_area_struct *vma,
 			swp_pte = pte_swp_mksoft_dirty(swp_pte);
 		if (pte_uffd_wp(pteval))
 			swp_pte = pte_swp_mkuffd_wp(swp_pte);
-		set_pte_at(mm, address, pvmw.pte, swp_pte);
+		if (folio_test_hugetlb(folio))
+			set_huge_swap_pte_at(mm, address, pvmw.pte,
+					     swp_pte, vma_mmu_pagesize(vma));
+		else
+			set_pte_at(mm, address, pvmw.pte, swp_pte);
 		trace_set_migration_pte(address, pte_val(swp_pte),
 					compound_order(&folio->page));
 		/*
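To make the last hunk concrete: set_huge_swap_pte_at() differs from set_pte_at() in that it replicates the migration swap entry across every sub-entry of a contiguous mapping, which is exactly what the single set_pte_at() call was missing. A simplified sketch follows; the ncontig/pgsize derivation is illustrative only (arm64 computes both with num_contig_ptes() from the size that vma_mmu_pagesize(vma) supplies):

/*
 * Illustrative sketch of the remap step for a contiguous hugetlb
 * mapping: the migration entry is written into all 'ncontig'
 * sub-entries, otherwise some subpages would remain mapped (and
 * writable) while the page is under migration.
 */
static void sketch_set_huge_swap_pte_at(struct mm_struct *mm,
					unsigned long addr, pte_t *ptep,
					pte_t swp_pte, unsigned long sz,
					size_t pgsize)
{
	unsigned long i, ncontig = sz / pgsize;	/* illustrative split */

	for (i = 0; i < ncontig; i++, addr += pgsize, ptep++)
		set_pte_at(mm, addr, ptep, swp_pte);
}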