
[v1,5/7] mm/pagewalk: add walk_page_range_vma()

Message ID 20220930141931.174362-6-david@redhat.com (mailing list archive)
State New
Series mm/ksm: break_ksm() cleanups and fixes

Commit Message

David Hildenbrand Sept. 30, 2022, 2:19 p.m. UTC
Let's add walk_page_range_vma(), which is similar to walk_page_vma() but is
only interested in a subset of the VMA range.

This will be used next in KSM code, to stop using follow_page().

Signed-off-by: David Hildenbrand <david@redhat.com>
---
 include/linux/pagewalk.h |  3 +++
 mm/pagewalk.c            | 27 +++++++++++++++++++++++++++
 2 files changed, 30 insertions(+)
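
For context, a minimal sketch of how a caller might use the new helper; the
kp_pte_entry callback, kp_ops, and kp_walk_one_page below are hypothetical,
for illustration only, and not part of this series:

#include <linux/mm.h>
#include <linux/pagewalk.h>

static int kp_pte_entry(pte_t *pte, unsigned long addr,
			unsigned long next, struct mm_walk *walk)
{
	/* Called for each PTE in the requested range; returning 0
	 * continues the walk, a negative value aborts it. */
	return 0;
}

static const struct mm_walk_ops kp_ops = {
	.pte_entry	= kp_pte_entry,
};

static int kp_walk_one_page(struct vm_area_struct *vma, unsigned long addr)
{
	/* Caller must hold the mmap lock; walk_page_range_vma() asserts it.
	 * Walk only [addr, addr + PAGE_SIZE) within the given vma. */
	return walk_page_range_vma(vma, addr, addr + PAGE_SIZE, &kp_ops, NULL);
}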

Comments

Peter Xu Oct. 5, 2022, 8:42 p.m. UTC | #1
On Fri, Sep 30, 2022 at 04:19:29PM +0200, David Hildenbrand wrote:
> +int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
> +			unsigned long end, const struct mm_walk_ops *ops,
> +			void *private)
> +{
> +	struct mm_walk walk = {
> +		.ops		= ops,
> +		.mm		= vma->vm_mm,
> +		.vma		= vma,
> +		.private	= private,
> +	};
> +	int err;
> +
> +	if (start >= end || !walk.mm)
> +		return -EINVAL;
> +	if (start < vma->vm_start || end > vma->vm_end)
> +		return -EINVAL;
> +
> +	mmap_assert_locked(walk.mm);
> +
> +	err = walk_page_test(start, end, &walk);

According to test_walk():

 * @test_walk:		caller specific callback function to determine whether
 *			we walk over the current vma or not. Returning 0 means
 *			"do page table walk over the current vma", returning
 *			a negative value means "abort current page table walk
 *			right now" and returning 1 means "skip the current vma"

Since this helper has vma passed in, not sure whether this is needed at
all?
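
For illustration, a hypothetical test_walk callback following these semantics
(not taken from the tree) could look like:

static int my_test_walk(unsigned long start, unsigned long end,
			struct mm_walk *walk)
{
	/* Skip vmas we are not interested in, e.g. PFN mappings. */
	if (walk->vma->vm_flags & VM_PFNMAP)
		return 1;	/* skip this vma */
	return 0;		/* do the page table walk over this vma */
}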

walk_page_vma_range() sounds slightly better to me as it does look more
like an extension of walk_page_vma(), rather than a sister version of
walk_page_range_novma() (which works for the "no backing VMA" case).  But no
strong opinion.
David Hildenbrand Oct. 6, 2022, 9:35 a.m. UTC | #2
On 05.10.22 22:42, Peter Xu wrote:
> On Fri, Sep 30, 2022 at 04:19:29PM +0200, David Hildenbrand wrote:
>> +int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
>> +			unsigned long end, const struct mm_walk_ops *ops,
>> +			void *private)
>> +{
>> +	struct mm_walk walk = {
>> +		.ops		= ops,
>> +		.mm		= vma->vm_mm,
>> +		.vma		= vma,
>> +		.private	= private,
>> +	};
>> +	int err;
>> +
>> +	if (start >= end || !walk.mm)
>> +		return -EINVAL;
>> +	if (start < vma->vm_start || end > vma->vm_end)
>> +		return -EINVAL;
>> +
>> +	mmap_assert_locked(walk.mm);
>> +
>> +	err = walk_page_test(start, end, &walk);
> 
> According to test_walk():
> 
>   * @test_walk:		caller specific callback function to determine whether
>   *			we walk over the current vma or not. Returning 0 means
>   *			"do page table walk over the current vma", returning
>   *			a negative value means "abort current page table walk
>   *			right now" and returning 1 means "skip the current vma"
> 
> Since this helper has vma passed in, not sure whether this is needed at
> all?

I kept it because walk_page_vma() similarly has it -- walk_page_vma() 
walks the whole VMA range.

I do agree that it's kind of weird to have it like that. None of the 
walk_page_vma() users actually use it, so we can just get rid of it there 
as well. Might make the walk slightly faster.
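
A sketch of what walk_page_vma() could look like with the walk_page_test()
call dropped (assuming, as said above, that no caller relies on test_walk
there):

int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
		void *private)
{
	struct mm_walk walk = {
		.ops		= ops,
		.mm		= vma->vm_mm,
		.vma		= vma,
		.private	= private,
	};

	if (!walk.mm)
		return -EINVAL;

	mmap_assert_locked(walk.mm);
	/* No walk_page_test(): the caller explicitly chose this vma. */
	return __walk_page_range(vma->vm_start, vma->vm_end, &walk);
}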

> 
> walk_page_vma_range() sounds slightly better to me as it does look more
> like an extension of walk_page_vma(), rather than a sister version of
> walk_page_range_novma() (which works for the "no backing VMA" case).  But no
> strong opinion.
> 

I matched that to walk_page_range_novma(). Now we have:

walk_page_range
walk_page_vma
walk_page_range_novma
walk_page_range_vma

Patch

diff --git a/include/linux/pagewalk.h b/include/linux/pagewalk.h
index f3fafb731ffd..2f8f6cc980b4 100644
--- a/include/linux/pagewalk.h
+++ b/include/linux/pagewalk.h
@@ -99,6 +99,9 @@  int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 			  unsigned long end, const struct mm_walk_ops *ops,
 			  pgd_t *pgd,
 			  void *private);
+int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, const struct mm_walk_ops *ops,
+			void *private);
 int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		void *private);
 int walk_page_mapping(struct address_space *mapping, pgoff_t first_index,
diff --git a/mm/pagewalk.c b/mm/pagewalk.c
index 131b2b335b2c..757c075da231 100644
--- a/mm/pagewalk.c
+++ b/mm/pagewalk.c
@@ -514,6 +514,33 @@  int walk_page_range_novma(struct mm_struct *mm, unsigned long start,
 	return __walk_page_range(start, end, &walk);
 }
 
+int walk_page_range_vma(struct vm_area_struct *vma, unsigned long start,
+			unsigned long end, const struct mm_walk_ops *ops,
+			void *private)
+{
+	struct mm_walk walk = {
+		.ops		= ops,
+		.mm		= vma->vm_mm,
+		.vma		= vma,
+		.private	= private,
+	};
+	int err;
+
+	if (start >= end || !walk.mm)
+		return -EINVAL;
+	if (start < vma->vm_start || end > vma->vm_end)
+		return -EINVAL;
+
+	mmap_assert_locked(walk.mm);
+
+	err = walk_page_test(start, end, &walk);
+	if (err > 0)
+		return 0;
+	if (err < 0)
+		return err;
+	return __walk_page_range(start, end, &walk);
+}
+
 int walk_page_vma(struct vm_area_struct *vma, const struct mm_walk_ops *ops,
 		void *private)
 {