[v5,17/26] hugetlb/userfaultfd: Hook page faults for uffd write protection

Message ID 20210715201622.211762-1-peterx@redhat.com (mailing list archive)
State New
Series userfaultfd-wp: Support shmem and hugetlbfs

Commit Message

Peter Xu July 15, 2021, 8:16 p.m. UTC
Hook up hugetlb_fault() with the capability to handle userfaultfd-wp faults.

We do this slightly earlier than hugetlb_cow() so that we can avoid taking some
extra locks that we definitely don't need.

Reviewed-by: Mike Kravetz <mike.kravetz@oracle.com>
Signed-off-by: Peter Xu <peterx@redhat.com>
---
 mm/hugetlb.c | 19 +++++++++++++++++++
 1 file changed, 19 insertions(+)
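
For readers following along, the decision this patch adds to the fault path can be sketched as a small userspace model. This is only an illustration: the model_* types, the MODEL_FAULT_FLAG_WRITE flag and the three-way result are hypothetical stand-ins, not kernel code. It assumes the hook fires only when the VMA is registered for uffd-wp, the entry carries the uffd-wp marker, the fault is a write, and the entry is not writable, which is the condition in the hunk.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical stand-ins; none of these are real kernel types or flags. */
#define MODEL_FAULT_FLAG_WRITE 0x1u

struct model_vma {
	bool uffd_wp_registered;	/* stands in for userfaultfd_wp(vma) */
};

struct model_pte {
	bool writable;			/* stands in for huge_pte_write(entry) */
	bool uffd_wp;			/* stands in for huge_pte_uffd_wp(entry) */
};

enum model_action {
	MODEL_DELIVER_UFFD_WP,		/* hand the fault to the uffd handler */
	MODEL_DO_COW,			/* fall through to the COW path */
	MODEL_FIXUP_ACCESS,		/* only update access/dirty state */
};

/* Mirrors the order of checks in the hunk: uffd-wp first, COW second. */
static enum model_action model_classify(const struct model_vma *vma,
					struct model_pte pte,
					unsigned int flags)
{
	bool write = (flags & MODEL_FAULT_FLAG_WRITE) != 0;

	if (vma->uffd_wp_registered && pte.uffd_wp && write && !pte.writable)
		return MODEL_DELIVER_UFFD_WP;
	if (write && !pte.writable)
		return MODEL_DO_COW;
	return MODEL_FIXUP_ACCESS;
}

/* Small usage example: a write to a wp-marked entry goes to uffd. */
static void model_classify_selfcheck(void)
{
	struct model_vma wp_vma = { .uffd_wp_registered = true };
	struct model_pte wp_pte = { .writable = false, .uffd_wp = true };

	assert(model_classify(&wp_vma, wp_pte, MODEL_FAULT_FLAG_WRITE) ==
	       MODEL_DELIVER_UFFD_WP);
	/* A read fault on the same entry never reaches the uffd-wp hook. */
	assert(model_classify(&wp_vma, wp_pte, 0) == MODEL_FIXUP_ACCESS);
}
```

A write fault on a read-only entry in a VMA without uffd-wp still lands in the COW branch, so the ordering only changes behaviour for wp-marked entries.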

Comments

kernel test robot July 20, 2021, 3:37 p.m. UTC | #1
Hi Peter,

Thank you for the patch! Yet something to improve:

[auto build test ERROR on kselftest/next]
[also build test ERROR on linus/master v5.14-rc2 next-20210720]
[cannot apply to hnaz-linux-mm/master asm-generic/master arm64/for-next/core linux/master tip/x86/core]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch]

url:    https://github.com/0day-ci/linux/commits/Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210716-041947
base:   https://git.kernel.org/pub/scm/linux/kernel/git/shuah/linux-kselftest.git next
config: s390-randconfig-r023-20210716 (attached as .config)
compiler: clang version 13.0.0 (https://github.com/llvm/llvm-project 5d5b08761f944d5b9822d582378333cc4b36a0a7)
reproduce (this is a W=1 build):
        wget https://raw.githubusercontent.com/intel/lkp-tests/master/sbin/make.cross -O ~/bin/make.cross
        chmod +x ~/bin/make.cross
        # install s390 cross compiling tool for clang build
        # apt-get install binutils-s390x-linux-gnu
        # https://github.com/0day-ci/linux/commit/071935856c8e636cafde633db59259d75069cc8f
        git remote add linux-review https://github.com/0day-ci/linux
        git fetch --no-tags linux-review Peter-Xu/userfaultfd-wp-Support-shmem-and-hugetlbfs/20210716-041947
        git checkout 071935856c8e636cafde633db59259d75069cc8f
        # save the attached .config to linux build tree
        COMPILER_INSTALL_PATH=$HOME/0day COMPILER=clang make.cross ARCH=s390 

If you fix the issue, kindly add following tag as appropriate
Reported-by: kernel test robot <lkp@intel.com>

All errors (new ones prefixed by >>):

   In file included from mm/hugetlb.c:19:
   In file included from include/linux/memblock.h:14:
   In file included from arch/s390/include/asm/dma.h:5:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:464:31: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __raw_readb(PCI_IOBASE + addr);
                             ~~~~~~~~~~ ^
   include/asm-generic/io.h:477:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le16_to_cpu((__le16 __force)__raw_readw(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:36:59: note: expanded from macro '__le16_to_cpu'
   #define __le16_to_cpu(x) __swab16((__force __u16)(__le16)(x))
                                                             ^
   include/uapi/linux/swab.h:102:54: note: expanded from macro '__swab16'
   #define __swab16(x) (__u16)__builtin_bswap16((__u16)(x))
                                                        ^
   In file included from mm/hugetlb.c:19:
   In file included from include/linux/memblock.h:14:
   In file included from arch/s390/include/asm/dma.h:5:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:490:61: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           val = __le32_to_cpu((__le32 __force)__raw_readl(PCI_IOBASE + addr));
                                                           ~~~~~~~~~~ ^
   include/uapi/linux/byteorder/big_endian.h:34:59: note: expanded from macro '__le32_to_cpu'
   #define __le32_to_cpu(x) __swab32((__force __u32)(__le32)(x))
                                                             ^
   include/uapi/linux/swab.h:115:54: note: expanded from macro '__swab32'
   #define __swab32(x) (__u32)__builtin_bswap32((__u32)(x))
                                                        ^
   In file included from mm/hugetlb.c:19:
   In file included from include/linux/memblock.h:14:
   In file included from arch/s390/include/asm/dma.h:5:
   In file included from arch/s390/include/asm/io.h:75:
   include/asm-generic/io.h:501:33: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writeb(value, PCI_IOBASE + addr);
                               ~~~~~~~~~~ ^
   include/asm-generic/io.h:511:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writew((u16 __force)cpu_to_le16(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:521:59: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           __raw_writel((u32 __force)cpu_to_le32(value), PCI_IOBASE + addr);
                                                         ~~~~~~~~~~ ^
   include/asm-generic/io.h:609:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsb(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:617:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsw(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:625:20: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           readsl(PCI_IOBASE + addr, buffer, count);
                  ~~~~~~~~~~ ^
   include/asm-generic/io.h:634:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesb(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:643:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesw(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
   include/asm-generic/io.h:652:21: warning: performing pointer arithmetic on a null pointer has undefined behavior [-Wnull-pointer-arithmetic]
           writesl(PCI_IOBASE + addr, buffer, count);
                   ~~~~~~~~~~ ^
>> mm/hugetlb.c:5063:29: error: implicit declaration of function 'huge_pte_uffd_wp' [-Werror,-Wimplicit-function-declaration]
           if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
                                      ^
   12 warnings and 1 error generated.


vim +/huge_pte_uffd_wp +5063 mm/hugetlb.c

  4957	
  4958	vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
  4959				unsigned long address, unsigned int flags)
  4960	{
  4961		pte_t *ptep, entry;
  4962		spinlock_t *ptl;
  4963		vm_fault_t ret;
  4964		u32 hash;
  4965		pgoff_t idx;
  4966		struct page *page = NULL;
  4967		struct page *pagecache_page = NULL;
  4968		struct hstate *h = hstate_vma(vma);
  4969		struct address_space *mapping;
  4970		int need_wait_lock = 0;
  4971		unsigned long haddr = address & huge_page_mask(h);
  4972	
  4973		ptep = huge_pte_offset(mm, haddr, huge_page_size(h));
  4974		if (ptep) {
  4975			/*
  4976			 * Since we hold no locks, ptep could be stale.  That is
  4977			 * OK as we are only making decisions based on content and
  4978			 * not actually modifying content here.
  4979			 */
  4980			entry = huge_ptep_get(ptep);
  4981			if (unlikely(is_hugetlb_entry_migration(entry))) {
  4982				migration_entry_wait_huge(vma, mm, ptep);
  4983				return 0;
  4984			} else if (unlikely(is_hugetlb_entry_hwpoisoned(entry)))
  4985				return VM_FAULT_HWPOISON_LARGE |
  4986					VM_FAULT_SET_HINDEX(hstate_index(h));
  4987		}
  4988	
  4989		/*
  4990		 * Acquire i_mmap_rwsem before calling huge_pte_alloc and hold
  4991		 * until finished with ptep.  This serves two purposes:
  4992		 * 1) It prevents huge_pmd_unshare from being called elsewhere
  4993		 *    and making the ptep no longer valid.
  4994		 * 2) It synchronizes us with i_size modifications during truncation.
  4995		 *
  4996		 * ptep could have already be assigned via huge_pte_offset.  That
  4997		 * is OK, as huge_pte_alloc will return the same value unless
  4998		 * something has changed.
  4999		 */
  5000		mapping = vma->vm_file->f_mapping;
  5001		i_mmap_lock_read(mapping);
  5002		ptep = huge_pte_alloc(mm, vma, haddr, huge_page_size(h));
  5003		if (!ptep) {
  5004			i_mmap_unlock_read(mapping);
  5005			return VM_FAULT_OOM;
  5006		}
  5007	
  5008		/*
  5009		 * Serialize hugepage allocation and instantiation, so that we don't
  5010		 * get spurious allocation failures if two CPUs race to instantiate
  5011		 * the same page in the page cache.
  5012		 */
  5013		idx = vma_hugecache_offset(h, vma, haddr);
  5014		hash = hugetlb_fault_mutex_hash(mapping, idx);
  5015		mutex_lock(&hugetlb_fault_mutex_table[hash]);
  5016	
  5017		entry = huge_ptep_get(ptep);
  5018		if (huge_pte_none(entry)) {
  5019			ret = hugetlb_no_page(mm, vma, mapping, idx, address, ptep, flags);
  5020			goto out_mutex;
  5021		}
  5022	
  5023		ret = 0;
  5024	
  5025		/*
  5026		 * entry could be a migration/hwpoison entry at this point, so this
  5027		 * check prevents the kernel from going below assuming that we have
  5028		 * an active hugepage in pagecache. This goto expects the 2nd page
  5029		 * fault, and is_hugetlb_entry_(migration|hwpoisoned) check will
  5030		 * properly handle it.
  5031		 */
  5032		if (!pte_present(entry))
  5033			goto out_mutex;
  5034	
  5035		/*
  5036		 * If we are going to COW the mapping later, we examine the pending
  5037		 * reservations for this page now. This will ensure that any
  5038		 * allocations necessary to record that reservation occur outside the
  5039		 * spinlock. For private mappings, we also lookup the pagecache
  5040		 * page now as it is used to determine if a reservation has been
  5041		 * consumed.
  5042		 */
  5043		if ((flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
  5044			if (vma_needs_reservation(h, vma, haddr) < 0) {
  5045				ret = VM_FAULT_OOM;
  5046				goto out_mutex;
  5047			}
  5048			/* Just decrements count, does not deallocate */
  5049			vma_end_reservation(h, vma, haddr);
  5050	
  5051			if (!(vma->vm_flags & VM_MAYSHARE))
  5052				pagecache_page = hugetlbfs_pagecache_page(h,
  5053									vma, haddr);
  5054		}
  5055	
  5056		ptl = huge_pte_lock(h, mm, ptep);
  5057	
  5058		/* Check for a racing update before calling hugetlb_cow */
  5059		if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
  5060			goto out_ptl;
  5061	
  5062		/* Handle userfault-wp first, before trying to lock more pages */
> 5063		if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
  5064		    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
  5065			struct vm_fault vmf = {
  5066				.vma = vma,
  5067				.address = haddr,
  5068				.flags = flags,
  5069			};
  5070	
  5071			spin_unlock(ptl);
  5072			if (pagecache_page) {
  5073				unlock_page(pagecache_page);
  5074				put_page(pagecache_page);
  5075			}
  5076			mutex_unlock(&hugetlb_fault_mutex_table[hash]);
  5077			i_mmap_unlock_read(mapping);
  5078			return handle_userfault(&vmf, VM_UFFD_WP);
  5079		}
  5080	
  5081		/*
  5082		 * hugetlb_cow() requires page locks of pte_page(entry) and
  5083		 * pagecache_page, so here we need take the former one
  5084		 * when page != pagecache_page or !pagecache_page.
  5085		 */
  5086		page = pte_page(entry);
  5087		if (page != pagecache_page)
  5088			if (!trylock_page(page)) {
  5089				need_wait_lock = 1;
  5090				goto out_ptl;
  5091			}
  5092	
  5093		get_page(page);
  5094	
  5095		if (flags & FAULT_FLAG_WRITE) {
  5096			if (!huge_pte_write(entry)) {
  5097				ret = hugetlb_cow(mm, vma, address, ptep,
  5098						  pagecache_page, ptl);
  5099				goto out_put_page;
  5100			}
  5101			entry = huge_pte_mkdirty(entry);
  5102		}
  5103		entry = pte_mkyoung(entry);
  5104		if (huge_ptep_set_access_flags(vma, haddr, ptep, entry,
  5105							flags & FAULT_FLAG_WRITE))
  5106			update_mmu_cache(vma, haddr, ptep);
  5107	out_put_page:
  5108		if (page != pagecache_page)
  5109			unlock_page(page);
  5110		put_page(page);
  5111	out_ptl:
  5112		spin_unlock(ptl);
  5113	
  5114		if (pagecache_page) {
  5115			unlock_page(pagecache_page);
  5116			put_page(pagecache_page);
  5117		}
  5118	out_mutex:
  5119		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
  5120		i_mmap_unlock_read(mapping);
  5121		/*
  5122		 * Generally it's safe to hold refcount during waiting page lock. But
  5123		 * here we just wait to defer the next page fault to avoid busy loop and
  5124		 * the page is not used after unlocked before returning from the current
  5125		 * page fault. So we are safe from accessing freed page, even if we wait
  5126		 * here without taking refcount.
  5127		 */
  5128		if (need_wait_lock)
  5129			wait_on_page_locked(page);
  5130		return ret;
  5131	}
  5132	

---
0-DAY CI Kernel Test Service, Intel Corporation
https://lists.01.org/hyperkitty/list/kbuild-all@lists.01.org
Peter Xu July 21, 2021, 9:50 p.m. UTC | #2
On Tue, Jul 20, 2021 at 11:37:36PM +0800, kernel test robot wrote:
> config: s390-randconfig-r023-20210716 (attached as .config)

[...]

> >> mm/hugetlb.c:5063:29: error: implicit declaration of function 'huge_pte_uffd_wp' [-Werror,-Wimplicit-function-declaration]
>            if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
>                                       ^
>    12 warnings and 1 error generated.

I remember once asking why s390 redefines many of the huge pte operations on
its own even when they are identical to the ones in the generic hugetlb.h.
I think there was a plan to rework that, but it has definitely not landed yet.

Will squash the below into the patch "mm/hugetlb: Introduce huge pte version of
uffd-wp helpers":

---8<---
diff --git a/arch/s390/include/asm/hugetlb.h b/arch/s390/include/asm/hugetlb.h
index 60f9241e5e4a..19c4b4431d27 100644
--- a/arch/s390/include/asm/hugetlb.h
+++ b/arch/s390/include/asm/hugetlb.h
@@ -115,6 +115,21 @@ static inline pte_t huge_pte_modify(pte_t pte, pgprot_t newprot)
        return pte_modify(pte, newprot);
 }
 
+static inline pte_t huge_pte_mkuffd_wp(pte_t pte)
+{
+       return pte;
+}
+
+static inline pte_t huge_pte_clear_uffd_wp(pte_t pte)
+{
+       return pte;
+}
+
+static inline int huge_pte_uffd_wp(pte_t pte)
+{
+       return 0;
+}
+
 static inline bool gigantic_page_runtime_supported(void)
 {
        return true;
---8<---

Thanks,
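
The effect of the stubs above can be illustrated with a standalone sketch. Everything here is hypothetical: model_pte_t, the MODEL_PTE_UFFD_WP bit and the MODEL_ARCH_HAS_UFFD_WP switch are invented for the example and do not correspond to real kernel symbols or config options. The assumption is that an architecture without a uffd-wp bit gets helpers that leave the entry untouched and always report "not write-protected", so the condition at mm/hugetlb.c:5063 compiles but can never be true there.

```c
#include <assert.h>

/* Hypothetical model type and bit; the real pte_t layout is per-architecture. */
typedef struct { unsigned long val; } model_pte_t;
#define MODEL_PTE_UFFD_WP (1UL << 2)

#ifdef MODEL_ARCH_HAS_UFFD_WP
/* Model of an architecture with a spare bit: the helpers track the marker. */
static inline model_pte_t model_huge_pte_mkuffd_wp(model_pte_t pte)
{
	pte.val |= MODEL_PTE_UFFD_WP;
	return pte;
}

static inline model_pte_t model_huge_pte_clear_uffd_wp(model_pte_t pte)
{
	pte.val &= ~MODEL_PTE_UFFD_WP;
	return pte;
}

static inline int model_huge_pte_uffd_wp(model_pte_t pte)
{
	return (pte.val & MODEL_PTE_UFFD_WP) != 0;
}
#else
/* Model of the stubs: no state is stored, the query is constant 0. */
static inline model_pte_t model_huge_pte_mkuffd_wp(model_pte_t pte)
{
	return pte;
}

static inline model_pte_t model_huge_pte_clear_uffd_wp(model_pte_t pte)
{
	return pte;
}

static inline int model_huge_pte_uffd_wp(model_pte_t pte)
{
	(void)pte;
	return 0;
}
#endif

/* Usage example covering both build variants. */
static void model_stub_selfcheck(void)
{
	model_pte_t pte = { .val = 0x40 };
	model_pte_t marked = model_huge_pte_mkuffd_wp(pte);

#ifdef MODEL_ARCH_HAS_UFFD_WP
	assert(model_huge_pte_uffd_wp(marked) == 1);
#else
	assert(marked.val == pte.val);
	assert(model_huge_pte_uffd_wp(marked) == 0);
#endif
	assert(model_huge_pte_uffd_wp(model_huge_pte_clear_uffd_wp(marked)) == 0);
}
```

Because the stub query is a constant, a compiler is free to drop the whole uffd-wp branch in callers, which is why stubs of this shape are enough to fix the build without changing behaviour.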

Patch

diff --git a/mm/hugetlb.c b/mm/hugetlb.c
index 4bdd637b0c29..d34636085eaf 100644
--- a/mm/hugetlb.c
+++ b/mm/hugetlb.c
@@ -5059,6 +5059,25 @@  vm_fault_t hugetlb_fault(struct mm_struct *mm, struct vm_area_struct *vma,
 	if (unlikely(!pte_same(entry, huge_ptep_get(ptep))))
 		goto out_ptl;
 
+	/* Handle userfault-wp first, before trying to lock more pages */
+	if (userfaultfd_wp(vma) && huge_pte_uffd_wp(huge_ptep_get(ptep)) &&
+	    (flags & FAULT_FLAG_WRITE) && !huge_pte_write(entry)) {
+		struct vm_fault vmf = {
+			.vma = vma,
+			.address = haddr,
+			.flags = flags,
+		};
+
+		spin_unlock(ptl);
+		if (pagecache_page) {
+			unlock_page(pagecache_page);
+			put_page(pagecache_page);
+		}
+		mutex_unlock(&hugetlb_fault_mutex_table[hash]);
+		i_mmap_unlock_read(mapping);
+		return handle_userfault(&vmf, VM_UFFD_WP);
+	}
+
 	/*
 	 * hugetlb_cow() requires page locks of pte_page(entry) and
 	 * pagecache_page, so here we need take the former one
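
The early-return path in the hunk above drops everything it holds before calling handle_userfault(), which may block while waiting for userspace. A rough userspace model of that unwinding is sketched below. The lock names and model_* helpers are hypothetical, and the model assumes pagecache_page is held locked once it has been looked up, which is what the unlock_page() call in the hunk implies.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical lock identifiers for the resources held at the hook. */
enum {
	MODEL_I_MMAP,		/* i_mmap_lock_read(mapping) */
	MODEL_FAULT_MUTEX,	/* hugetlb_fault_mutex_table[hash] */
	MODEL_PAGE_LOCK,	/* lock on pagecache_page, if any */
	MODEL_PTL,		/* page table spinlock */
	MODEL_NLOCKS,
};

struct model_locks {
	bool held[MODEL_NLOCKS];
};

static void model_lock(struct model_locks *l, int which)
{
	assert(!l->held[which]);	/* no double acquisition */
	l->held[which] = true;
}

static void model_unlock(struct model_locks *l, int which)
{
	assert(l->held[which]);		/* only release what is held */
	l->held[which] = false;
}

static int model_held_count(const struct model_locks *l)
{
	int i, n = 0;

	for (i = 0; i < MODEL_NLOCKS; i++)
		n += l->held[i] ? 1 : 0;
	return n;
}

/*
 * Replays acquisition up to the hook, then the release order used in the
 * patch. Returns how many locks are still held when the handler would run.
 */
static int model_uffd_wp_early_return(bool have_pagecache_page)
{
	struct model_locks l = { { false } };

	model_lock(&l, MODEL_I_MMAP);
	model_lock(&l, MODEL_FAULT_MUTEX);
	if (have_pagecache_page)
		model_lock(&l, MODEL_PAGE_LOCK);
	model_lock(&l, MODEL_PTL);

	model_unlock(&l, MODEL_PTL);
	if (have_pagecache_page)
		model_unlock(&l, MODEL_PAGE_LOCK);
	model_unlock(&l, MODEL_FAULT_MUTEX);
	model_unlock(&l, MODEL_I_MMAP);

	return model_held_count(&l);
}
```

In this model the lock on pte_page(entry) never appears, since the hook sits before trylock_page(page); that matches the commit message's point about not taking locks that are not needed.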