[RFC,2/5] filemap: do file page mapping with folio granularity

Message ID 20230130125504.2509710-3-fengwei.yin@intel.com (mailing list archive)
State New
Series folio based filemap_map_pages()

Commit Message

Yin Fengwei Jan. 30, 2023, 12:55 p.m. UTC
Add a function to do file page mapping based on folios and update
filemap_map_pages() to use the new function, so that filemap page
mapping deals with folio granularity instead of page granularity.
This allows batched folio refcount updates.

Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
---
 mm/filemap.c | 82 ++++++++++++++++++++++++++++++----------------------
 1 file changed, 48 insertions(+), 34 deletions(-)

Comments

Matthew Wilcox Jan. 30, 2023, 1:35 p.m. UTC | #1
On Mon, Jan 30, 2023 at 08:55:01PM +0800, Yin Fengwei wrote:
> Add a function to do file page mapping based on folios and update
> filemap_map_pages() to use the new function, so that filemap page
> mapping deals with folio granularity instead of page granularity.
> This allows batched folio refcount updates.
> 
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> ---
>  mm/filemap.c | 82 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 48 insertions(+), 34 deletions(-)
> 
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c915ded191f0..fe0c226c8b1e 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3351,6 +3351,43 @@ static inline struct folio *next_map_page(struct address_space *mapping,
>  				  mapping, xas, end_pgoff);
>  }
>  
> +

I'd remove this blank line, we typically only have one blank line
between functions.

> +static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
> +	struct folio *folio, struct page *page, unsigned long addr,
> +	int len)

I see this under-indentation in other parts of the mm and it drives me
crazy.  Two tabs to indent the arguments please, otherwise they look
like part of the function.

Also, 'len' is ambiguous.  I'd call this 'nr' or 'nr_pages'.  Also
it should be an unsigned int.
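
For illustration, the declaration could then look something like this
(just a sketch reflecting the suggestions above):

static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
		struct folio *folio, struct page *page, unsigned long addr,
		unsigned int nr_pages)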

> +{
> +	vm_fault_t ret = 0;
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct file *file = vma->vm_file;
> +	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
> +	int ref_count = 0, count = 0;

Also make these unsigned.

> -		/*
> -		 * NOTE: If there're PTE markers, we'll leave them to be
> -		 * handled in the specific fault path, and it'll prohibit the
> -		 * fault-around logic.
> -		 */

I'd rather not lose this comment; can you move it into
filemap_map_folio_range() please?
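
For example, placed above the pte_none() check in the new function
(sketch only):

		/*
		 * NOTE: If there're PTE markers, we'll leave them to be
		 * handled in the specific fault path, and it'll prohibit the
		 * fault-around logic.
		 */
		if (!pte_none(*vmf->pte))
			continue;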

> -		if (!pte_none(*vmf->pte))
> -			goto unlock;
> -
> -		/* We're about to handle the fault */
> -		if (vmf->address == addr)
> +		if (VM_FAULT_NOPAGE ==
> +			filemap_map_folio_range(vmf, folio, page, addr, len))
>  			ret = VM_FAULT_NOPAGE;

That indentation is also confusing.  Try this:

		if (filemap_map_folio_range(vmf, folio, page, addr, len) ==
				VM_FAULT_NOPAGE)
			ret = VM_FAULT_NOPAGE;

Except there's an easier way to write it:

		ret |= filemap_map_folio_range(vmf, folio, page, addr, len);
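
(Since filemap_map_folio_range() only returns 0 or VM_FAULT_NOPAGE
here, OR-ing its return value into 'ret' is equivalent to the
comparison above.)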


Thanks for doing this!  Looks so much better and performs better!
Yin Fengwei Jan. 31, 2023, 1:03 a.m. UTC | #2
On 1/30/2023 9:35 PM, Matthew Wilcox wrote:
> On Mon, Jan 30, 2023 at 08:55:01PM +0800, Yin Fengwei wrote:
>> Add a function to do file page mapping based on folios and update
>> filemap_map_pages() to use the new function, so that filemap page
>> mapping deals with folio granularity instead of page granularity.
>> This allows batched folio refcount updates.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> ---
>>  mm/filemap.c | 82 ++++++++++++++++++++++++++++++----------------------
>>  1 file changed, 48 insertions(+), 34 deletions(-)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index c915ded191f0..fe0c226c8b1e 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3351,6 +3351,43 @@ static inline struct folio *next_map_page(struct address_space *mapping,
>>  				  mapping, xas, end_pgoff);
>>  }
>>  
>> +
> 
> I'd remove this blank line, we typically only have one blank line
> between functions.
OK.

> 
>> +static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>> +	struct folio *folio, struct page *page, unsigned long addr,
>> +	int len)
> 
> I see this under-indentation in other parts of the mm and it drives me
> crazy.  Two tabs to indent the arguments please, otherwise they look
> like part of the function.
OK. I will correct all the indentation problems in this series in the next version.
> 
> Also, 'len' is ambiguous.  I'd call this 'nr' or 'nr_pages'.  Also
> it should be an unsigned int.
> 
>> +{
>> +	vm_fault_t ret = 0;
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	struct file *file = vma->vm_file;
>> +	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
>> +	int ref_count = 0, count = 0;
> 
> Also make these unsigned.
> 
>> -		/*
>> -		 * NOTE: If there're PTE markers, we'll leave them to be
>> -		 * handled in the specific fault path, and it'll prohibit the
>> -		 * fault-around logic.
>> -		 */
> 
> I'd rather not lose this comment; can you move it into
> filemap_map_folio_range() please?
I will keep all the comments in the right place in the next version.

Regards
Yin, Fengwei

> 
>> -		if (!pte_none(*vmf->pte))
>> -			goto unlock;
>> -
>> -		/* We're about to handle the fault */
>> -		if (vmf->address == addr)
>> +		if (VM_FAULT_NOPAGE ==
>> +			filemap_map_folio_range(vmf, folio, page, addr, len))
>>  			ret = VM_FAULT_NOPAGE;
> 
> That indentation is also confusing.  Try this:
> 
> 		if (filemap_map_folio_range(vmf, folio, page, addr, len) ==
> 				VM_FAULT_NOPAGE)
> 			ret = VM_FAULT_NOPAGE;
> 
> Except there's an easier way to write it:
> 
> 		ret |= filemap_map_folio_range(vmf, folio, page, addr, len);
> 
> 
> Thanks for doing this!  Looks so much better and performs better!
Huang, Ying Jan. 31, 2023, 3:34 a.m. UTC | #3
Yin Fengwei <fengwei.yin@intel.com> writes:

> Add a function to do file page mapping based on folios and update
> filemap_map_pages() to use the new function, so that filemap page
> mapping deals with folio granularity instead of page granularity.
> This allows batched folio refcount updates.
>
> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
> ---
>  mm/filemap.c | 82 ++++++++++++++++++++++++++++++----------------------
>  1 file changed, 48 insertions(+), 34 deletions(-)
>
> diff --git a/mm/filemap.c b/mm/filemap.c
> index c915ded191f0..fe0c226c8b1e 100644
> --- a/mm/filemap.c
> +++ b/mm/filemap.c
> @@ -3351,6 +3351,43 @@ static inline struct folio *next_map_page(struct address_space *mapping,
>  				  mapping, xas, end_pgoff);
>  }
>  
> +
> +static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
> +	struct folio *folio, struct page *page, unsigned long addr,
> +	int len)

As Matthew pointed out, we should rename 'len'. And some comments about
the meaning of the parameters would be good.  For example,

/* Map sub-pages [start_page, start_page + nr_pages) of folio */
static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
	struct folio *folio, struct page *start_page, unsigned int nr_pages,
        unsigned long start)

Best Regards,
Huang, Ying

> +{
> +	vm_fault_t ret = 0;
> +	struct vm_area_struct *vma = vmf->vma;
> +	struct file *file = vma->vm_file;
> +	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
> +	int ref_count = 0, count = 0;
> +
> +	do {
> +		if (PageHWPoison(page))
> +			continue;
> +
> +		if (mmap_miss > 0)
> +			mmap_miss--;
> +
> +		if (!pte_none(*vmf->pte))
> +			continue;
> +
> +		if (vmf->address == addr)
> +			ret = VM_FAULT_NOPAGE;
> +
> +		ref_count++;
> +
> +		do_set_pte(vmf, page, addr);
> +		update_mmu_cache(vma, addr, vmf->pte);
> +
> +	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < len);
> +
> +	folio_ref_add(folio, ref_count);
> +	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
> +
> +	return ret;
> +}
> +
>  vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>  			     pgoff_t start_pgoff, pgoff_t end_pgoff)
>  {
> @@ -3361,9 +3398,9 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>  	unsigned long addr;
>  	XA_STATE(xas, &mapping->i_pages, start_pgoff);
>  	struct folio *folio;
> -	struct page *page;
>  	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
>  	vm_fault_t ret = 0;
> +	int len = 0;
>  
>  	rcu_read_lock();
>  	folio = first_map_page(mapping, &xas, end_pgoff);
> @@ -3378,45 +3415,22 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>  	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
>  	do {
> -again:
> -		page = folio_file_page(folio, xas.xa_index);
> -		if (PageHWPoison(page))
> -			goto unlock;
> -
> -		if (mmap_miss > 0)
> -			mmap_miss--;
> +		struct page *page;
> +		unsigned long end;
>  
> +		page = folio_file_page(folio, xas.xa_index);
>  		addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
> -		vmf->pte += xas.xa_index - last_pgoff;
> +		vmf->pte += xas.xa_index - last_pgoff - len;
>  		last_pgoff = xas.xa_index;
> +		end = folio->index + folio_nr_pages(folio) - 1;
> +		len = min(end, end_pgoff) - xas.xa_index + 1;
>  
> -		/*
> -		 * NOTE: If there're PTE markers, we'll leave them to be
> -		 * handled in the specific fault path, and it'll prohibit the
> -		 * fault-around logic.
> -		 */
> -		if (!pte_none(*vmf->pte))
> -			goto unlock;
> -
> -		/* We're about to handle the fault */
> -		if (vmf->address == addr)
> +		if (VM_FAULT_NOPAGE ==
> +			filemap_map_folio_range(vmf, folio, page, addr, len))
>  			ret = VM_FAULT_NOPAGE;
>  
> -		do_set_pte(vmf, page, addr);
> -		/* no need to invalidate: a not-present page won't be cached */
> -		update_mmu_cache(vma, addr, vmf->pte);
> -		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
> -			xas.xa_index++;
> -			folio_ref_inc(folio);
> -			goto again;
> -		}
> -		folio_unlock(folio);
> -		continue;
> -unlock:
> -		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
> -			xas.xa_index++;
> -			goto again;
> -		}
> +		xas.xa_index = end;
> +
>  		folio_unlock(folio);
>  		folio_put(folio);
>  	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);
Yin Fengwei Jan. 31, 2023, 6:32 a.m. UTC | #4
On 1/31/2023 11:34 AM, Huang, Ying wrote:
> Yin Fengwei <fengwei.yin@intel.com> writes:
> 
>> Add a function to do file page mapping based on folios and update
>> filemap_map_pages() to use the new function, so that filemap page
>> mapping deals with folio granularity instead of page granularity.
>> This allows batched folio refcount updates.
>>
>> Signed-off-by: Yin Fengwei <fengwei.yin@intel.com>
>> ---
>>  mm/filemap.c | 82 ++++++++++++++++++++++++++++++----------------------
>>  1 file changed, 48 insertions(+), 34 deletions(-)
>>
>> diff --git a/mm/filemap.c b/mm/filemap.c
>> index c915ded191f0..fe0c226c8b1e 100644
>> --- a/mm/filemap.c
>> +++ b/mm/filemap.c
>> @@ -3351,6 +3351,43 @@ static inline struct folio *next_map_page(struct address_space *mapping,
>>  				  mapping, xas, end_pgoff);
>>  }
>>  
>> +
>> +static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
>> +	struct folio *folio, struct page *page, unsigned long addr,
>> +	int len)
> 
> As Matthew pointed out, we should rename 'len'. And some comments about
> the meaning of the parameters would be good.  For example,
> 
> /* Map sub-pages [start_page, start_page + nr_pages) of folio */
> static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
> 	struct folio *folio, struct page *start_page, unsigned int nr_pages,
>         unsigned long start)
Yes. I will address this in the next version of the series. Thanks.


Regards
Yin, Fengwei

> 
> Best Regards,
> Huang, Ying
> 
>> +{
>> +	vm_fault_t ret = 0;
>> +	struct vm_area_struct *vma = vmf->vma;
>> +	struct file *file = vma->vm_file;
>> +	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
>> +	int ref_count = 0, count = 0;
>> +
>> +	do {
>> +		if (PageHWPoison(page))
>> +			continue;
>> +
>> +		if (mmap_miss > 0)
>> +			mmap_miss--;
>> +
>> +		if (!pte_none(*vmf->pte))
>> +			continue;
>> +
>> +		if (vmf->address == addr)
>> +			ret = VM_FAULT_NOPAGE;
>> +
>> +		ref_count++;
>> +
>> +		do_set_pte(vmf, page, addr);
>> +		update_mmu_cache(vma, addr, vmf->pte);
>> +
>> +	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < len);
>> +
>> +	folio_ref_add(folio, ref_count);
>> +	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
>> +
>> +	return ret;
>> +}
>> +
>>  vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>  			     pgoff_t start_pgoff, pgoff_t end_pgoff)
>>  {
>> @@ -3361,9 +3398,9 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>  	unsigned long addr;
>>  	XA_STATE(xas, &mapping->i_pages, start_pgoff);
>>  	struct folio *folio;
>> -	struct page *page;
>>  	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
>>  	vm_fault_t ret = 0;
>> +	int len = 0;
>>  
>>  	rcu_read_lock();
>>  	folio = first_map_page(mapping, &xas, end_pgoff);
>> @@ -3378,45 +3415,22 @@ vm_fault_t filemap_map_pages(struct vm_fault *vmf,
>>  	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
>>  	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
>>  	do {
>> -again:
>> -		page = folio_file_page(folio, xas.xa_index);
>> -		if (PageHWPoison(page))
>> -			goto unlock;
>> -
>> -		if (mmap_miss > 0)
>> -			mmap_miss--;
>> +		struct page *page;
>> +		unsigned long end;
>>  
>> +		page = folio_file_page(folio, xas.xa_index);
>>  		addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
>> -		vmf->pte += xas.xa_index - last_pgoff;
>> +		vmf->pte += xas.xa_index - last_pgoff - len;
>>  		last_pgoff = xas.xa_index;
>> +		end = folio->index + folio_nr_pages(folio) - 1;
>> +		len = min(end, end_pgoff) - xas.xa_index + 1;
>>  
>> -		/*
>> -		 * NOTE: If there're PTE markers, we'll leave them to be
>> -		 * handled in the specific fault path, and it'll prohibit the
>> -		 * fault-around logic.
>> -		 */
>> -		if (!pte_none(*vmf->pte))
>> -			goto unlock;
>> -
>> -		/* We're about to handle the fault */
>> -		if (vmf->address == addr)
>> +		if (VM_FAULT_NOPAGE ==
>> +			filemap_map_folio_range(vmf, folio, page, addr, len))
>>  			ret = VM_FAULT_NOPAGE;
>>  
>> -		do_set_pte(vmf, page, addr);
>> -		/* no need to invalidate: a not-present page won't be cached */
>> -		update_mmu_cache(vma, addr, vmf->pte);
>> -		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
>> -			xas.xa_index++;
>> -			folio_ref_inc(folio);
>> -			goto again;
>> -		}
>> -		folio_unlock(folio);
>> -		continue;
>> -unlock:
>> -		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
>> -			xas.xa_index++;
>> -			goto again;
>> -		}
>> +		xas.xa_index = end;
>> +
>>  		folio_unlock(folio);
>>  		folio_put(folio);
>>  	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);

Patch

diff --git a/mm/filemap.c b/mm/filemap.c
index c915ded191f0..fe0c226c8b1e 100644
--- a/mm/filemap.c
+++ b/mm/filemap.c
@@ -3351,6 +3351,43 @@  static inline struct folio *next_map_page(struct address_space *mapping,
 				  mapping, xas, end_pgoff);
 }
 
+
+static vm_fault_t filemap_map_folio_range(struct vm_fault *vmf,
+	struct folio *folio, struct page *page, unsigned long addr,
+	int len)
+{
+	vm_fault_t ret = 0;
+	struct vm_area_struct *vma = vmf->vma;
+	struct file *file = vma->vm_file;
+	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
+	int ref_count = 0, count = 0;
+
+	do {
+		if (PageHWPoison(page))
+			continue;
+
+		if (mmap_miss > 0)
+			mmap_miss--;
+
+		if (!pte_none(*vmf->pte))
+			continue;
+
+		if (vmf->address == addr)
+			ret = VM_FAULT_NOPAGE;
+
+		ref_count++;
+
+		do_set_pte(vmf, page, addr);
+		update_mmu_cache(vma, addr, vmf->pte);
+
+	} while (vmf->pte++, page++, addr += PAGE_SIZE, ++count < len);
+
+	folio_ref_add(folio, ref_count);
+	WRITE_ONCE(file->f_ra.mmap_miss, mmap_miss);
+
+	return ret;
+}
+
 vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 			     pgoff_t start_pgoff, pgoff_t end_pgoff)
 {
@@ -3361,9 +3398,9 @@  vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	unsigned long addr;
 	XA_STATE(xas, &mapping->i_pages, start_pgoff);
 	struct folio *folio;
-	struct page *page;
 	unsigned int mmap_miss = READ_ONCE(file->f_ra.mmap_miss);
 	vm_fault_t ret = 0;
+	int len = 0;
 
 	rcu_read_lock();
 	folio = first_map_page(mapping, &xas, end_pgoff);
@@ -3378,45 +3415,22 @@  vm_fault_t filemap_map_pages(struct vm_fault *vmf,
 	addr = vma->vm_start + ((start_pgoff - vma->vm_pgoff) << PAGE_SHIFT);
 	vmf->pte = pte_offset_map_lock(vma->vm_mm, vmf->pmd, addr, &vmf->ptl);
 	do {
-again:
-		page = folio_file_page(folio, xas.xa_index);
-		if (PageHWPoison(page))
-			goto unlock;
-
-		if (mmap_miss > 0)
-			mmap_miss--;
+		struct page *page;
+		unsigned long end;
 
+		page = folio_file_page(folio, xas.xa_index);
 		addr += (xas.xa_index - last_pgoff) << PAGE_SHIFT;
-		vmf->pte += xas.xa_index - last_pgoff;
+		vmf->pte += xas.xa_index - last_pgoff - len;
 		last_pgoff = xas.xa_index;
+		end = folio->index + folio_nr_pages(folio) - 1;
+		len = min(end, end_pgoff) - xas.xa_index + 1;
 
-		/*
-		 * NOTE: If there're PTE markers, we'll leave them to be
-		 * handled in the specific fault path, and it'll prohibit the
-		 * fault-around logic.
-		 */
-		if (!pte_none(*vmf->pte))
-			goto unlock;
-
-		/* We're about to handle the fault */
-		if (vmf->address == addr)
+		if (VM_FAULT_NOPAGE ==
+			filemap_map_folio_range(vmf, folio, page, addr, len))
 			ret = VM_FAULT_NOPAGE;
 
-		do_set_pte(vmf, page, addr);
-		/* no need to invalidate: a not-present page won't be cached */
-		update_mmu_cache(vma, addr, vmf->pte);
-		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
-			xas.xa_index++;
-			folio_ref_inc(folio);
-			goto again;
-		}
-		folio_unlock(folio);
-		continue;
-unlock:
-		if (folio_more_pages(folio, xas.xa_index, end_pgoff)) {
-			xas.xa_index++;
-			goto again;
-		}
+		xas.xa_index = end;
+
 		folio_unlock(folio);
 		folio_put(folio);
 	} while ((folio = next_map_page(mapping, &xas, end_pgoff)) != NULL);