diff mbox series

[rfc,v2,01/10] mm: add a generic VMA lock-based page fault handler

Message ID 20230821123056.2109942-2-wangkefeng.wang@huawei.com (mailing list archive)
State New
Headers show
Series mm: convert to generic VMA lock-based page fault | expand

Commit Message

Kefeng Wang Aug. 21, 2023, 12:30 p.m. UTC
The ARCH_SUPPORTS_PER_VMA_LOCK are enabled by more and more architectures,
eg, x86, arm64, powerpc and s390, and riscv, those implementation are very
similar which results in some duplicated codes, let's add a generic VMA
lock-based page fault handler try_to_vma_locked_page_fault() to eliminate
them, and which also make us easy to support this on new architectures.

Since different architectures use different way to check vma whether is
accessable or not, the struct pt_regs, page fault error code and vma flags
are added into struct vm_fault, then, the architecture's page fault code
could re-use struct vm_fault to record and check vma accessable by each
own implementation.

Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
---
 include/linux/mm.h       | 17 +++++++++++++++++
 include/linux/mm_types.h |  2 ++
 mm/memory.c              | 39 +++++++++++++++++++++++++++++++++++++++
 3 files changed, 58 insertions(+)

Comments

kernel test robot Aug. 21, 2023, 3:13 p.m. UTC | #1
Hi Kefeng,

kernel test robot noticed the following build errors:

[auto build test ERROR on next-20230821]
[cannot apply to akpm-mm/mm-everything arm64/for-next/core tip/x86/mm s390/features powerpc/next powerpc/fixes arm/for-next arm/fixes linus/master v6.5-rc7 v6.5-rc6 v6.5-rc5 v6.5-rc7]
[If your patch is applied to the wrong git tree, kindly drop us a note.
And when submitting patch, we suggest to use '--base' as documented in
https://git-scm.com/docs/git-format-patch#_base_tree_information]

url:    https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-add-a-generic-VMA-lock-based-page-fault-handler/20230821-203442
base:   next-20230821
patch link:    https://lore.kernel.org/r/20230821123056.2109942-2-wangkefeng.wang%40huawei.com
patch subject: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
config: i386-tinyconfig (https://download.01.org/0day-ci/archive/20230821/202308212249.dZG3d55u-lkp@intel.com/config)
compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
reproduce: (https://download.01.org/0day-ci/archive/20230821/202308212249.dZG3d55u-lkp@intel.com/reproduce)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <lkp@intel.com>
| Closes: https://lore.kernel.org/oe-kbuild-all/202308212249.dZG3d55u-lkp@intel.com/

All errors (new ones prefixed by >>):

   In file included from include/linux/memcontrol.h:20,
                    from include/linux/swap.h:9,
                    from include/linux/suspend.h:5,
                    from arch/x86/kernel/asm-offsets.c:14:
>> include/linux/mm.h:810:38: error: redefinition of 'lock_vma_under_rcu'
     810 | static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
         |                                      ^~~~~~~~~~~~~~~~~~
   include/linux/mm.h:794:38: note: previous definition of 'lock_vma_under_rcu' with type 'struct vm_area_struct *(struct mm_struct *, long unsigned int)'
     794 | static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
         |                                      ^~~~~~~~~~~~~~~~~~
   make[3]: *** [scripts/Makefile.build:116: arch/x86/kernel/asm-offsets.s] Error 1
   make[3]: Target 'prepare' not remade because of errors.
   make[2]: *** [Makefile:1203: prepare0] Error 2
   make[2]: Target 'prepare' not remade because of errors.
   make[1]: *** [Makefile:234: __sub-make] Error 2
   make[1]: Target 'prepare' not remade because of errors.
   make: *** [Makefile:234: __sub-make] Error 2
   make: Target 'prepare' not remade because of errors.


vim +/lock_vma_under_rcu +810 include/linux/mm.h

   809	
 > 810	static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
   811			unsigned long address)
   812	{
   813		return NULL;
   814	}
   815
Kefeng Wang Aug. 22, 2023, 2:33 a.m. UTC | #2
Hi

On 2023/8/21 23:13, kernel test robot wrote:
> Hi Kefeng,
> 
> kernel test robot noticed the following build errors:
> 
> [auto build test ERROR on next-20230821]
> [cannot apply to akpm-mm/mm-everything arm64/for-next/core tip/x86/mm s390/features powerpc/next powerpc/fixes arm/for-next arm/fixes linus/master v6.5-rc7 v6.5-rc6 v6.5-rc5 v6.5-rc7]
> [If your patch is applied to the wrong git tree, kindly drop us a note.
> And when submitting patch, we suggest to use '--base' as documented in
> https://git-scm.com/docs/git-format-patch#_base_tree_information]
> 
> url:    https://github.com/intel-lab-lkp/linux/commits/Kefeng-Wang/mm-add-a-generic-VMA-lock-based-page-fault-handler/20230821-203442
> base:   next-20230821
> patch link:    https://lore.kernel.org/r/20230821123056.2109942-2-wangkefeng.wang%40huawei.com
> patch subject: [PATCH rfc v2 01/10] mm: add a generic VMA lock-based page fault handler
> config: i386-tinyconfig (https://download.01.org/0day-ci/archive/20230821/202308212249.dZG3d55u-lkp@intel.com/config)
> compiler: gcc-12 (Debian 12.2.0-14) 12.2.0
> reproduce: (https://download.01.org/0day-ci/archive/20230821/202308212249.dZG3d55u-lkp@intel.com/reproduce)
> 
> If you fix the issue in a separate patch/commit (i.e. not just a new version of
> the same patch/commit), kindly add following tags
> | Reported-by: kernel test robot <lkp@intel.com>
> | Closes: https://lore.kernel.org/oe-kbuild-all/202308212249.dZG3d55u-lkp@intel.com/
> 
> All errors (new ones prefixed by >>):
> 
>     In file included from include/linux/memcontrol.h:20,
>                      from include/linux/swap.h:9,
>                      from include/linux/suspend.h:5,
>                      from arch/x86/kernel/asm-offsets.c:14:
>>> include/linux/mm.h:810:38: error: redefinition of 'lock_vma_under_rcu'
>       810 | static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>           |                                      ^~~~~~~~~~~~~~~~~~
>     include/linux/mm.h:794:38: note: previous definition of 'lock_vma_under_rcu' with type 'struct vm_area_struct *(struct mm_struct *, long unsigned int)'
>       794 | static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>           |                                      ^~~~~~~~~~~~~~~~~~
>     make[3]: *** [scripts/Makefile.build:116: arch/x86/kernel/asm-offsets.s] Error 1
>     make[3]: Target 'prepare' not remade because of errors.
>     make[2]: *** [Makefile:1203: prepare0] Error 2
>     make[2]: Target 'prepare' not remade because of errors.
>     make[1]: *** [Makefile:234: __sub-make] Error 2
>     make[1]: Target 'prepare' not remade because of errors.
>     make: *** [Makefile:234: __sub-make] Error 2
>     make: Target 'prepare' not remade because of errors.
> 
> 

Yes, the following change should be drop as it is miscopied...

> vim +/lock_vma_under_rcu +810 include/linux/mm.h
> 
>     809	
>   > 810	static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>     811			unsigned long address)
>     812	{
>     813		return NULL;
>     814	}
>     815	
>
Alexander Gordeev Aug. 24, 2023, 7:12 a.m. UTC | #3
On Mon, Aug 21, 2023 at 08:30:47PM +0800, Kefeng Wang wrote:

Hi Kefeng,

> The ARCH_SUPPORTS_PER_VMA_LOCK are enabled by more and more architectures,
> eg, x86, arm64, powerpc and s390, and riscv, those implementation are very
> similar which results in some duplicated codes, let's add a generic VMA
> lock-based page fault handler try_to_vma_locked_page_fault() to eliminate
> them, and which also make us easy to support this on new architectures.
> 
> Since different architectures use different way to check vma whether is
> accessable or not, the struct pt_regs, page fault error code and vma flags
> are added into struct vm_fault, then, the architecture's page fault code
> could re-use struct vm_fault to record and check vma accessable by each
> own implementation.
> 
> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
> ---
>  include/linux/mm.h       | 17 +++++++++++++++++
>  include/linux/mm_types.h |  2 ++
>  mm/memory.c              | 39 +++++++++++++++++++++++++++++++++++++++
>  3 files changed, 58 insertions(+)
> 
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 3f764e84e567..22a6f4c56ff3 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -512,9 +512,12 @@ struct vm_fault {
>  		pgoff_t pgoff;			/* Logical page offset based on vma */
>  		unsigned long address;		/* Faulting virtual address - masked */
>  		unsigned long real_address;	/* Faulting virtual address - unmasked */
> +		unsigned long fault_code;	/* Faulting error code during page fault */
> +		struct pt_regs *regs;		/* The registers stored during page fault */
>  	};
>  	enum fault_flag flags;		/* FAULT_FLAG_xxx flags
>  					 * XXX: should really be 'const' */
> +	vm_flags_t vm_flags;		/* VMA flags to be used for access checking */
>  	pmd_t *pmd;			/* Pointer to pmd entry matching
>  					 * the 'address' */
>  	pud_t *pud;			/* Pointer to pud entry matching
> @@ -774,6 +777,9 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
>  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>  					  unsigned long address);
>  
> +bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf);
> +vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf);
> +
>  #else /* CONFIG_PER_VMA_LOCK */
>  
>  static inline bool vma_start_read(struct vm_area_struct *vma)
> @@ -801,6 +807,17 @@ static inline void assert_fault_locked(struct vm_fault *vmf)
>  	mmap_assert_locked(vmf->vma->vm_mm);
>  }
>  
> +static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
> +		unsigned long address)
> +{
> +	return NULL;
> +}
> +
> +static inline vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
> +{
> +	return VM_FAULT_NONE;
> +}
> +
>  #endif /* CONFIG_PER_VMA_LOCK */
>  
>  extern const struct vm_operations_struct vma_dummy_vm_ops;
> diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
> index f5ba5b0bc836..702820cea3f9 100644
> --- a/include/linux/mm_types.h
> +++ b/include/linux/mm_types.h
> @@ -1119,6 +1119,7 @@ typedef __bitwise unsigned int vm_fault_t;
>   * fault. Used to decide whether a process gets delivered SIGBUS or
>   * just gets major/minor fault counters bumped up.
>   *
> + * @VM_FAULT_NONE:		Special case, not starting to handle fault
>   * @VM_FAULT_OOM:		Out Of Memory
>   * @VM_FAULT_SIGBUS:		Bad access
>   * @VM_FAULT_MAJOR:		Page read from storage
> @@ -1139,6 +1140,7 @@ typedef __bitwise unsigned int vm_fault_t;
>   *
>   */
>  enum vm_fault_reason {
> +	VM_FAULT_NONE		= (__force vm_fault_t)0x000000,
>  	VM_FAULT_OOM            = (__force vm_fault_t)0x000001,
>  	VM_FAULT_SIGBUS         = (__force vm_fault_t)0x000002,
>  	VM_FAULT_MAJOR          = (__force vm_fault_t)0x000004,
> diff --git a/mm/memory.c b/mm/memory.c
> index 3b4aaa0d2fff..60fe35db5134 100644
> --- a/mm/memory.c
> +++ b/mm/memory.c
> @@ -5510,6 +5510,45 @@ struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
>  	count_vm_vma_lock_event(VMA_LOCK_ABORT);
>  	return NULL;
>  }
> +
> +#ifdef CONFIG_PER_VMA_LOCK
> +bool __weak arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
> +{
> +	return (vma->vm_flags & vmf->vm_flags) == 0;
> +}
> +#endif
> +
> +vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
> +{
> +	vm_fault_t fault = VM_FAULT_NONE;
> +	struct vm_area_struct *vma;
> +
> +	if (!(vmf->flags & FAULT_FLAG_USER))
> +		return fault;
> +
> +	vma = lock_vma_under_rcu(current->mm, vmf->real_address);
> +	if (!vma)
> +		return fault;
> +
> +	if (arch_vma_access_error(vma, vmf)) {
> +		vma_end_read(vma);
> +		return fault;
> +	}
> +
> +	fault = handle_mm_fault(vma, vmf->real_address,
> +				vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
> +
> +	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
> +		vma_end_read(vma);

Could you please explain how vma_end_read() call could be conditional?

> +
> +	if (fault & VM_FAULT_RETRY)
> +		count_vm_vma_lock_event(VMA_LOCK_RETRY);
> +	else
> +		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
> +
> +	return fault;
> +}
> +
>  #endif /* CONFIG_PER_VMA_LOCK */
>  
>  #ifndef __PAGETABLE_P4D_FOLDED
Kefeng Wang Aug. 26, 2023, 12:56 a.m. UTC | #4
On 2023/8/24 15:12, Alexander Gordeev wrote:
> On Mon, Aug 21, 2023 at 08:30:47PM +0800, Kefeng Wang wrote:
> 
> Hi Kefeng,
> 
>> The ARCH_SUPPORTS_PER_VMA_LOCK are enabled by more and more architectures,
>> eg, x86, arm64, powerpc and s390, and riscv, those implementation are very
>> similar which results in some duplicated codes, let's add a generic VMA
>> lock-based page fault handler try_to_vma_locked_page_fault() to eliminate
>> them, and which also make us easy to support this on new architectures.
>>
>> Since different architectures use different way to check vma whether is
>> accessable or not, the struct pt_regs, page fault error code and vma flags
>> are added into struct vm_fault, then, the architecture's page fault code
>> could re-use struct vm_fault to record and check vma accessable by each
>> own implementation.
>>
>> Signed-off-by: Kefeng Wang <wangkefeng.wang@huawei.com>
>> ---
...
>> +
>> +vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
>> +{
>> +	vm_fault_t fault = VM_FAULT_NONE;
>> +	struct vm_area_struct *vma;
>> +
>> +	if (!(vmf->flags & FAULT_FLAG_USER))
>> +		return fault;
>> +
>> +	vma = lock_vma_under_rcu(current->mm, vmf->real_address);
>> +	if (!vma)
>> +		return fault;
>> +
>> +	if (arch_vma_access_error(vma, vmf)) {
>> +		vma_end_read(vma);
>> +		return fault;
>> +	}
>> +
>> +	fault = handle_mm_fault(vma, vmf->real_address,
>> +				vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
>> +
>> +	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
>> +		vma_end_read(vma);
> 
> Could you please explain how vma_end_read() call could be conditional?

The check is added for swap and userfault, see

https://lkml.kernel.org/r/20230630211957.1341547-4-surenb@google.com
> 
>> +
>> +	if (fault & VM_FAULT_RETRY)
>> +		count_vm_vma_lock_event(VMA_LOCK_RETRY);
>> +	else
>> +		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
>> +
>> +	return fault;
>> +}
>> +
>>   #endif /* CONFIG_PER_VMA_LOCK */
>>   
>>   #ifndef __PAGETABLE_P4D_FOLDED
diff mbox series

Patch

diff --git a/include/linux/mm.h b/include/linux/mm.h
index 3f764e84e567..22a6f4c56ff3 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -512,9 +512,12 @@  struct vm_fault {
 		pgoff_t pgoff;			/* Logical page offset based on vma */
 		unsigned long address;		/* Faulting virtual address - masked */
 		unsigned long real_address;	/* Faulting virtual address - unmasked */
+		unsigned long fault_code;	/* Faulting error code during page fault */
+		struct pt_regs *regs;		/* The registers stored during page fault */
 	};
 	enum fault_flag flags;		/* FAULT_FLAG_xxx flags
 					 * XXX: should really be 'const' */
+	vm_flags_t vm_flags;		/* VMA flags to be used for access checking */
 	pmd_t *pmd;			/* Pointer to pmd entry matching
 					 * the 'address' */
 	pud_t *pud;			/* Pointer to pud entry matching
@@ -774,6 +777,9 @@  static inline void assert_fault_locked(struct vm_fault *vmf)
 struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 					  unsigned long address);
 
+bool arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf);
+vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf);
+
 #else /* CONFIG_PER_VMA_LOCK */
 
 static inline bool vma_start_read(struct vm_area_struct *vma)
@@ -801,6 +807,17 @@  static inline void assert_fault_locked(struct vm_fault *vmf)
 	mmap_assert_locked(vmf->vma->vm_mm);
 }
 
+static inline struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
+		unsigned long address)
+{
+	return NULL;
+}
+
+static inline vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
+{
+	return VM_FAULT_NONE;
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 extern const struct vm_operations_struct vma_dummy_vm_ops;
diff --git a/include/linux/mm_types.h b/include/linux/mm_types.h
index f5ba5b0bc836..702820cea3f9 100644
--- a/include/linux/mm_types.h
+++ b/include/linux/mm_types.h
@@ -1119,6 +1119,7 @@  typedef __bitwise unsigned int vm_fault_t;
  * fault. Used to decide whether a process gets delivered SIGBUS or
  * just gets major/minor fault counters bumped up.
  *
+ * @VM_FAULT_NONE:		Special case, not starting to handle fault
  * @VM_FAULT_OOM:		Out Of Memory
  * @VM_FAULT_SIGBUS:		Bad access
  * @VM_FAULT_MAJOR:		Page read from storage
@@ -1139,6 +1140,7 @@  typedef __bitwise unsigned int vm_fault_t;
  *
  */
 enum vm_fault_reason {
+	VM_FAULT_NONE		= (__force vm_fault_t)0x000000,
 	VM_FAULT_OOM            = (__force vm_fault_t)0x000001,
 	VM_FAULT_SIGBUS         = (__force vm_fault_t)0x000002,
 	VM_FAULT_MAJOR          = (__force vm_fault_t)0x000004,
diff --git a/mm/memory.c b/mm/memory.c
index 3b4aaa0d2fff..60fe35db5134 100644
--- a/mm/memory.c
+++ b/mm/memory.c
@@ -5510,6 +5510,45 @@  struct vm_area_struct *lock_vma_under_rcu(struct mm_struct *mm,
 	count_vm_vma_lock_event(VMA_LOCK_ABORT);
 	return NULL;
 }
+
+#ifdef CONFIG_PER_VMA_LOCK
+bool __weak arch_vma_access_error(struct vm_area_struct *vma, struct vm_fault *vmf)
+{
+	return (vma->vm_flags & vmf->vm_flags) == 0;
+}
+#endif
+
+vm_fault_t try_vma_locked_page_fault(struct vm_fault *vmf)
+{
+	vm_fault_t fault = VM_FAULT_NONE;
+	struct vm_area_struct *vma;
+
+	if (!(vmf->flags & FAULT_FLAG_USER))
+		return fault;
+
+	vma = lock_vma_under_rcu(current->mm, vmf->real_address);
+	if (!vma)
+		return fault;
+
+	if (arch_vma_access_error(vma, vmf)) {
+		vma_end_read(vma);
+		return fault;
+	}
+
+	fault = handle_mm_fault(vma, vmf->real_address,
+				vmf->flags | FAULT_FLAG_VMA_LOCK, vmf->regs);
+
+	if (!(fault & (VM_FAULT_RETRY | VM_FAULT_COMPLETED)))
+		vma_end_read(vma);
+
+	if (fault & VM_FAULT_RETRY)
+		count_vm_vma_lock_event(VMA_LOCK_RETRY);
+	else
+		count_vm_vma_lock_event(VMA_LOCK_SUCCESS);
+
+	return fault;
+}
+
 #endif /* CONFIG_PER_VMA_LOCK */
 
 #ifndef __PAGETABLE_P4D_FOLDED