Message ID | 20230218002819.1486479-15-jthoughton@google.com |
---|---|
State | New |
Series | hugetlb: introduce HugeTLB high-granularity mapping |
Hi James,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on next-20230217]
[cannot apply to kvm/queue shuah-kselftest/next shuah-kselftest/fixes arnd-asm-generic/master linus/master kvm/linux-next v6.2-rc8 v6.2-rc7 v6.2-rc6 v6.2-rc8]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url: https://github.com/intel-lab-lkp/linux/commits/James-Houghton/hugetlb-don-t-set-PageUptodate-for-UFFDIO_CONTINUE/20230218-083216
patch link: https://lore.kernel.org/r/20230218002819.1486479-15-jthoughton%40google.com
patch subject: [PATCH v2 14/46] hugetlb: split PTE markers when doing HGM walks
config: powerpc-randconfig-r001-20230217 (https://download.01.org/0day-ci/archive/20230219/202302190304.YdPwtMZS-lkp@intel.com/config)
compiler: clang version 17.0.0 (https://github.com/llvm/llvm-project db89896bbbd2251fff457699635acbbedeead27f)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install powerpc cross compiling tool for clang build
        # apt-get install binutils-powerpc-linux-gnu
        # https://github.com/intel-lab-lkp/linux/commit/55c33d65b06ad109b87a418540fe98f7365185d4
        git remote add linux-review https://github.com/intel-lab-lkp/linux
        git fetch --no-tags linux-review James-Houghton/hugetlb-don-t-set-PageUptodate-for-UFFDIO_CONTINUE/20230218-083216
        git checkout 55c33d65b06ad109b87a418540fe98f7365185d4
        # save the config file
        mkdir build_dir && cp config build_dir/.config
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc olddefconfig
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross W=1 O=build_dir ARCH=powerpc SHELL=/bin/bash

If you fix the issue, kindly add following tag where applicable
| Reported-by: kernel test robot <lkp@intel.com>
| Link: https://lore.kernel.org/oe-kbuild-all/202302190304.YdPwtMZS-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:640:
   arch/powerpc/include/asm/io-defs.h:47:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(insl, (unsigned long p, void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:104:1: note: expanded from here
   __do_insl
   ^
   arch/powerpc/include/asm/io.h:579:56: note: expanded from macro '__do_insl'
   #define __do_insl(p, b, n) readsl((PCI_IO_ADDR)_IO_BASE+(p), (b), (n))
                                     ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:640:
   arch/powerpc/include/asm/io-defs.h:49:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsb, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:106:1: note: expanded from here
   __do_outsb
   ^
   arch/powerpc/include/asm/io.h:580:58: note: expanded from macro '__do_outsb'
   #define __do_outsb(p, b, n) writesb((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                       ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:640:
   arch/powerpc/include/asm/io-defs.h:51:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsw, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:108:1: note: expanded from here
   __do_outsw
   ^
   arch/powerpc/include/asm/io.h:581:58: note: expanded from macro '__do_outsw'
   #define __do_outsw(p, b, n) writesw((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                       ~~~~~~~~~~~~~~~~~~~~~^
   In file included from mm/hugetlb.c:11:
   In file included from include/linux/highmem.h:12:
   In file included from include/linux/hardirq.h:11:
   In file included from arch/powerpc/include/asm/hardirq.h:6:
   In file included from include/linux/irq.h:20:
   In file included from include/linux/io.h:13:
   In file included from arch/powerpc/include/asm/io.h:640:
   arch/powerpc/include/asm/io-defs.h:53:1: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
   DEF_PCI_AC_NORET(outsl, (unsigned long p, const void *b, unsigned long c),
   ^~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
   arch/powerpc/include/asm/io.h:637:3: note: expanded from macro 'DEF_PCI_AC_NORET'
                   __do_##name al;                                 \
                   ^~~~~~~~~~~~~~
   <scratch space>:110:1: note: expanded from here
   __do_outsl
   ^
   arch/powerpc/include/asm/io.h:582:58: note: expanded from macro '__do_outsl'
   #define __do_outsl(p, b, n) writesl((PCI_IO_ADDR)_IO_BASE+(p),(b),(n))
                                       ~~~~~~~~~~~~~~~~~~~~~^
   mm/hugetlb.c:653:8: error: call to undeclared function '__pte_alloc_one'; ISO C99 and later do not support implicit function declarations [-Wimplicit-function-declaration]
           new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
                 ^
   mm/hugetlb.c:653:8: note: did you mean 'pte_alloc_one'?
   arch/powerpc/include/asm/pgalloc.h:30:25: note: 'pte_alloc_one' declared here
   static inline pgtable_t pte_alloc_one(struct mm_struct *mm)
                           ^
   mm/hugetlb.c:653:28: error: use of undeclared identifier 'GFP_PGTABLE_USER'
           new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
                                     ^
   mm/hugetlb.c:660:25: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'struct page *' [-Werror,-Wincompatible-pointer-types]
                   pgtable_pte_page_dtor(new);
                                         ^~~
   include/linux/mm.h:2661:55: note: passing argument to parameter 'page' here
   static inline void pgtable_pte_page_dtor(struct page *page)
                                                         ^
   mm/hugetlb.c:661:3: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'struct page *' [-Werror,-Wincompatible-pointer-types]
                   __free_page(new);
                   ^~~~~~~~~~~~~~~~
   include/linux/gfp.h:319:40: note: expanded from macro '__free_page'
   #define __free_page(page) __free_pages((page), 0)
                                          ^~~~~~
   include/linux/gfp.h:302:39: note: passing argument to parameter 'page' here
   extern void __free_pages(struct page *page, unsigned int order);
                                         ^
>> mm/hugetlb.c:666:44: error: incompatible pointer types passing 'pgtable_t' (aka 'unsigned long *') to parameter of type 'const struct page *' [-Werror,-Wincompatible-pointer-types]
                   hugetlb_install_markers_pte(page_address(new), marker);
                                                            ^~~
   include/linux/mm.h:2001:39: note: passing argument to parameter 'page' here
   void *page_address(const struct page *page);
                                         ^
   6 warnings and 5 errors generated.


vim +666 mm/hugetlb.c

   606	
   607	/*
   608	 * hugetlb_alloc_pte -- Allocate a PTE beneath a pmd_none PMD-level hpte.
   609	 *
   610	 * See the comment above hugetlb_alloc_pmd.
   611	 */
   612	pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
   613				 unsigned long addr)
   614	{
   615		spinlock_t *ptl = hugetlb_pte_lockptr(hpte);
   616		pgtable_t new;
   617		pmd_t *pmdp;
   618		pmd_t pmd;
   619		bool is_marker;
   620		pte_marker marker;
   621	
   622		if (hpte->level != HUGETLB_LEVEL_PMD)
   623			return ERR_PTR(-EINVAL);
   624	
   625		pmdp = (pmd_t *)hpte->ptep;
   626	retry:
   627		is_marker = false;
   628		pmd = READ_ONCE(*pmdp);
   629		if (likely(pmd_present(pmd)))
   630			return unlikely(pmd_leaf(pmd))
   631				? ERR_PTR(-EEXIST)
   632				: pte_offset_kernel(pmdp, addr);
   633		else if (!pmd_none(pmd)) {
   634			/*
   635			 * Not present and not none means that a swap entry lives here.
   636			 * If it's a PTE marker, we can deal with it. If it's another
   637			 * swap entry, we don't attempt to split it.
   638			 */
   639			is_marker = is_pte_marker(__pte(pmd_val(pmd)));
   640			if (!is_marker)
   641				return ERR_PTR(-EEXIST);
   642	
   643			marker = pte_marker_get(pte_to_swp_entry(__pte(pmd_val(pmd))));
   644		}
   645	
   646		/*
   647		 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
   648		 * in page tables being allocated in high memory, needing a kmap to
   649		 * access. Instead, we call __pte_alloc_one directly with
   650		 * GFP_PGTABLE_USER to prevent these PTEs being allocated in high
   651		 * memory.
   652		 */
   653		new = __pte_alloc_one(mm, GFP_PGTABLE_USER);
   654		if (!new)
   655			return ERR_PTR(-ENOMEM);
   656	
   657		spin_lock(ptl);
   658		if (!pmd_same(pmd, *pmdp)) {
   659			spin_unlock(ptl);
   660			pgtable_pte_page_dtor(new);
   661			__free_page(new);
   662			goto retry;
   663		}
   664	
   665		if (is_marker)
 > 666			hugetlb_install_markers_pte(page_address(new), marker);
   667	
   668		mm_inc_nr_ptes(mm);
   669		smp_wmb(); /* See comment in pmd_install() */
   670		pmd_populate(mm, pmdp, new);
   671		spin_unlock(ptl);
   672		return pte_offset_kernel(pmdp, addr);
   673	}
   674	
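The hugetlb_alloc_pte() listing above follows an allocate-then-recheck pattern: it snapshots the PMD with READ_ONCE(), allocates the new table without holding the page-table lock, then rechecks the PMD under the lock with pmd_same() and, if another thread changed it, frees the table and jumps back to `retry`. The following is a minimal user-space sketch of that pattern only, assuming a single parent slot, a C11 `atomic_flag` spinlock standing in for the ptl, and `calloc()` standing in for the page-table allocator. `struct parent_slot`, `alloc_child()` and `CHILD_ENTRIES` are hypothetical names, and the leaf and swap-entry cases are left out.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

/* Assumption: 512 entries per child table, as with 4 KiB pages on x86-64. */
#define CHILD_ENTRIES 512

/* Hypothetical model of one PMD-like slot guarded by a page-table spinlock. */
struct parent_slot {
	atomic_flag lock;		/* stands in for the ptl spinlock */
	_Atomic uintptr_t entry;	/* 0 means "none"; otherwise the child table */
};

static void slot_lock(struct parent_slot *s)
{
	while (atomic_flag_test_and_set_explicit(&s->lock, memory_order_acquire))
		;	/* spin until the flag was previously clear */
}

static void slot_unlock(struct parent_slot *s)
{
	atomic_flag_clear_explicit(&s->lock, memory_order_release);
}

/*
 * Snapshot the slot, allocate outside the lock, recheck under the lock,
 * and retry if the slot changed in the meantime (the `goto retry` path).
 */
static uintptr_t *alloc_child(struct parent_slot *s)
{
	for (;;) {
		uintptr_t snap = atomic_load_explicit(&s->entry, memory_order_acquire);

		if (snap != 0)
			return (uintptr_t *)snap;	/* a table is already installed */

		uintptr_t *child = calloc(CHILD_ENTRIES, sizeof(*child));
		if (!child)
			return NULL;			/* the kernel returns -ENOMEM */

		slot_lock(s);
		if (atomic_load_explicit(&s->entry, memory_order_relaxed) != snap) {
			/* Lost the race: drop our table and start over. */
			slot_unlock(s);
			free(child);
			continue;
		}
		/* Release store publishes the zeroed table (cf. smp_wmb() + pmd_populate()). */
		atomic_store_explicit(&s->entry, (uintptr_t)child, memory_order_release);
		slot_unlock(s);
		return child;
	}
}
```

Allocating outside the lock matters in the kernel because a GFP_KERNEL-based page-table allocation may sleep, which is not allowed while holding a spinlock; the recheck under the lock is what makes the optimistic allocation safe.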
On 02/18/23 00:27, James Houghton wrote:
> Fix how UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT interact in these two
> ways:
> - UFFDIO_WRITEPROTECT no longer prevents a high-granularity
>   UFFDIO_CONTINUE.
> - UFFD-WP PTE markers installed with UFFDIO_WRITEPROTECT will be
>   properly propagated when high-granularity UFFDIO_CONTINUEs are
>   performed.
>
> Note: UFFDIO_WRITEPROTECT is not yet permitted at PAGE_SIZE granularity.
>
> Signed-off-by: James Houghton <jthoughton@google.com>
>
> diff --git a/mm/hugetlb.c b/mm/hugetlb.c
> index 810c05feb41f..f74183acc521 100644
> --- a/mm/hugetlb.c
> +++ b/mm/hugetlb.c

Seems relatively straightforward,

Acked-by: Mike Kravetz <mike.kravetz@oracle.com>
diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 810c05feb41f..f74183acc521 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -506,6 +506,30 @@ static bool has_same_uncharge_info(struct file_region *rg,
 #endif
 }
 
+static void hugetlb_install_markers_pmd(pmd_t *pmdp, pte_marker marker)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PMD; ++i)
+		/*
+		 * WRITE_ONCE not needed because the pud hasn't been
+		 * installed yet.
+		 */
+		pmdp[i] = __pmd(pte_val(make_pte_marker(marker)));
+}
+
+static void hugetlb_install_markers_pte(pte_t *ptep, pte_marker marker)
+{
+	int i;
+
+	for (i = 0; i < PTRS_PER_PTE; ++i)
+		/*
+		 * WRITE_ONCE not needed because the pmd hasn't been
+		 * installed yet.
+		 */
+		ptep[i] = make_pte_marker(marker);
+}
+
 /*
  * hugetlb_alloc_pmd -- Allocate or find a PMD beneath a PUD-level hpte.
  *
@@ -528,23 +552,32 @@ pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
 	pmd_t *new;
 	pud_t *pudp;
 	pud_t pud;
+	bool is_marker;
+	pte_marker marker;
 
 	if (hpte->level != HUGETLB_LEVEL_PUD)
 		return ERR_PTR(-EINVAL);
 
 	pudp = (pud_t *)hpte->ptep;
 retry:
+	is_marker = false;
 	pud = READ_ONCE(*pudp);
 	if (likely(pud_present(pud)))
 		return unlikely(pud_leaf(pud))
 			? ERR_PTR(-EEXIST)
 			: pmd_offset(pudp, addr);
-	else if (!pud_none(pud))
+	else if (!pud_none(pud)) {
 		/*
-		 * Not present and not none means that a swap entry lives here,
-		 * and we can't get rid of it.
+		 * Not present and not none means that a swap entry lives here.
+		 * If it's a PTE marker, we can deal with it. If it's another
+		 * swap entry, we don't attempt to split it.
 		 */
-		return ERR_PTR(-EEXIST);
+		is_marker = is_pte_marker(__pte(pud_val(pud)));
+		if (!is_marker)
+			return ERR_PTR(-EEXIST);
+
+		marker = pte_marker_get(pte_to_swp_entry(__pte(pud_val(pud))));
+	}
 
 	new = pmd_alloc_one(mm, addr);
 	if (!new)
@@ -557,6 +590,13 @@ pmd_t *hugetlb_alloc_pmd(struct mm_struct *mm, struct hugetlb_pte *hpte,
 		goto retry;
 	}
 
+	/*
+	 * Install markers before PUD to avoid races with other
+	 * page table walks.
+	 */
+	if (is_marker)
+		hugetlb_install_markers_pmd(new, marker);
+
 	mm_inc_nr_pmds(mm);
 	smp_wmb(); /* See comment in pmd_install() */
 	pud_populate(mm, pudp, new);
@@ -576,23 +616,32 @@ pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
 	pgtable_t new;
 	pmd_t *pmdp;
 	pmd_t pmd;
+	bool is_marker;
+	pte_marker marker;
 
 	if (hpte->level != HUGETLB_LEVEL_PMD)
 		return ERR_PTR(-EINVAL);
 
 	pmdp = (pmd_t *)hpte->ptep;
 retry:
+	is_marker = false;
 	pmd = READ_ONCE(*pmdp);
 	if (likely(pmd_present(pmd)))
 		return unlikely(pmd_leaf(pmd))
 			? ERR_PTR(-EEXIST)
 			: pte_offset_kernel(pmdp, addr);
-	else if (!pmd_none(pmd))
+	else if (!pmd_none(pmd)) {
 		/*
-		 * Not present and not none means that a swap entry lives here,
-		 * and we can't get rid of it.
+		 * Not present and not none means that a swap entry lives here.
+		 * If it's a PTE marker, we can deal with it. If it's another
+		 * swap entry, we don't attempt to split it.
 		 */
-		return ERR_PTR(-EEXIST);
+		is_marker = is_pte_marker(__pte(pmd_val(pmd)));
+		if (!is_marker)
+			return ERR_PTR(-EEXIST);
+
+		marker = pte_marker_get(pte_to_swp_entry(__pte(pmd_val(pmd))));
+	}
 
 	/*
 	 * With CONFIG_HIGHPTE, calling `pte_alloc_one` directly may result
@@ -613,6 +662,9 @@ pte_t *hugetlb_alloc_pte(struct mm_struct *mm, struct hugetlb_pte *hpte,
 		goto retry;
 	}
 
+	if (is_marker)
+		hugetlb_install_markers_pte(page_address(new), marker);
+
 	mm_inc_nr_ptes(mm);
 	smp_wmb(); /* See comment in pmd_install() */
 	pmd_populate(mm, pmdp, new);
@@ -7384,7 +7436,12 @@ static int __hugetlb_hgm_walk(struct mm_struct *mm, struct vm_area_struct *vma,
 		if (!pte_present(pte)) {
 			if (!alloc)
 				return 0;
-			if (unlikely(!huge_pte_none(pte)))
+			/*
+			 * In hugetlb_alloc_pmd and hugetlb_alloc_pte,
+			 * we split PTE markers, so we can tolerate
+			 * PTE markers here.
+			 */
+			if (unlikely(!huge_pte_none_mostly(pte)))
 				return -EEXIST;
 		} else if (hugetlb_pte_present_leaf(hpte, pte))
 			return 0;
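In the diff, hugetlb_install_markers_pmd() and hugetlb_install_markers_pte() fill every entry of the freshly allocated table with plain stores, and only afterwards do smp_wmb() and pud_populate()/pmd_populate() make the table reachable. The sketch below models that fill-then-publish ordering in user space. It is a sketch under stated assumptions: C11 release/acquire atomics stand in for smp_wmb() and the walker's dependent loads, locking and the pmd_same() recheck are omitted, and `entry_t`, `make_marker()`, `split_marker()`, `walk()` and the low-bit marker encoding are all hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>
#include <stdlib.h>

#define PTRS_PER_TABLE 512	/* assumption: e.g. PTRS_PER_PTE with 4 KiB pages */

typedef uintptr_t entry_t;

/*
 * Hypothetical encoding: bit 0 set means "marker", with the marker value in
 * the bits above it. malloc() returns memory aligned for entry_t, so on
 * common platforms a table pointer has bit 0 clear and is never a marker.
 */
static entry_t make_marker(unsigned int marker)
{
	return ((entry_t)marker << 1) | 1;
}

/* Plain stores are enough here: no walker can reach this table yet. */
static void install_markers(entry_t *table, unsigned int marker)
{
	for (int i = 0; i < PTRS_PER_TABLE; i++)
		table[i] = make_marker(marker);
}

/*
 * Split a marker held in *parent: fill a new table first, then publish it
 * with a release store (standing in for smp_wmb() + pmd_populate()).
 */
static entry_t *split_marker(_Atomic entry_t *parent, unsigned int marker)
{
	entry_t *table = malloc(PTRS_PER_TABLE * sizeof(*table));

	if (!table)
		return NULL;
	install_markers(table, marker);
	atomic_store_explicit(parent, (entry_t)table, memory_order_release);
	return table;
}

/* Walker: the acquire load guarantees the filled entries are visible. */
static entry_t walk(_Atomic entry_t *parent, int idx)
{
	assert(idx >= 0 && idx < PTRS_PER_TABLE);
	entry_t p = atomic_load_explicit(parent, memory_order_acquire);

	if (p == 0 || (p & 1))
		return p;		/* none, or still a single huge marker */
	return ((entry_t *)p)[idx];
}
```

A walker therefore sees either the old huge marker or a table whose every entry already carries that marker, never a half-filled table; this is the same reasoning behind the patch's "WRITE_ONCE not needed" comments.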
Fix how UFFDIO_CONTINUE and UFFDIO_WRITEPROTECT interact in these two ways:

- UFFDIO_WRITEPROTECT no longer prevents a high-granularity UFFDIO_CONTINUE.
- UFFD-WP PTE markers installed with UFFDIO_WRITEPROTECT will be properly propagated when high-granularity UFFDIO_CONTINUEs are performed.

Note: UFFDIO_WRITEPROTECT is not yet permitted at PAGE_SIZE granularity.

Signed-off-by: James Houghton <jthoughton@google.com>
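The second point of the commit message can be illustrated with a small model: a hugepage-sized entry holding a UFFD-WP marker is split into child entries that each inherit the marker, so a high-granularity UFFDIO_CONTINUE that maps one small page leaves the marker on all the others. `none_mostly()` mirrors the switch from huge_pte_none() to huge_pte_none_mostly() in __hugetlb_hgm_walk(). This is a sketch, not the kernel's data structures: `enum kind`, `struct entry`, `split()`, `continue_one()` and `NCHILD` are hypothetical, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>
#include <stdlib.h>

#define NCHILD 512	/* assumption: entries per lower-level table */

/* Hypothetical states for one page-table slot. */
enum kind { NONE, MARKER_WP, SWAP, PRESENT, TABLE };

struct entry {
	enum kind kind;
	struct entry *child;	/* valid only when kind == TABLE */
};

/* Like huge_pte_none_mostly(): true for an empty slot or a PTE marker. */
static int none_mostly(const struct entry *e)
{
	return e->kind == NONE || e->kind == MARKER_WP;
}

/*
 * Replace a none/marker slot with a table of NCHILD children. Each child
 * inherits the parent's state, so a WP marker is copied to every child.
 * Other swap entries (SWAP) and present leaves (PRESENT) are refused.
 */
static int split(struct entry *parent)
{
	if (parent->kind == TABLE)
		return 0;			/* already split */
	if (!none_mostly(parent))
		return -EEXIST;

	struct entry *c = calloc(NCHILD, sizeof(*c));
	if (!c)
		return -ENOMEM;
	for (int i = 0; i < NCHILD; i++)
		c[i].kind = parent->kind;	/* NONE stays NONE, MARKER_WP propagates */

	parent->child = c;
	parent->kind = TABLE;
	return 0;
}

/* High-granularity "continue" of one small page: map only child idx. */
static int continue_one(struct entry *parent, int idx)
{
	assert(idx >= 0 && idx < NCHILD);
	int ret = split(parent);

	if (ret)
		return ret;
	parent->child[idx].kind = PRESENT;
	return 0;
}
```

Before the patch, the equivalent of `split()` rejected any non-none entry, so a write-protect marker on the huge entry blocked the high-granularity CONTINUE entirely; with markers tolerated and copied down, the remaining small pages stay write-protected.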