
[v6,0/6] Enable THP for text section of non-shmem files

Message ID 20190622000512.923867-1-songliubraving@fb.com (mailing list archive)

Message

Song Liu June 22, 2019, 12:05 a.m. UTC
Changes v5 => v6:
1. Improve THP stats in 3/6 (Kirill).

Changes v4 => v5:
1. Move the logic to drop THP from pagecache to open() path (Rik).
2. Revise description of CONFIG_READ_ONLY_THP_FOR_FS.

Changes v3 => v4:
1. Put the logic to drop THP from pagecache in a separate function (Rik).
2. Move the function to drop THP from pagecache to exit_mmap().
3. Revise confusing commit log 6/6.

Changes v2 => v3:
1. Removed the limitation (cannot write to file with THP) by truncating
   whole file during sys_open (see 6/6);
2. Fixed a VM_BUG_ON_PAGE() in filemap_fault() (see 2/6);
3. Split function rename to a separate patch (Rik);
4. Updated condition in hugepage_vma_check() (Rik).

Changes v1 => v2:
1. Fixed a missing mem_cgroup_commit_charge() for non-shmem case.

This set follows up on discussion at LSF/MM 2019. The motivation is to put
the text section of an application in THPs, which reduces the iTLB miss rate
and improves performance. Both Facebook and Oracle have shown strong
interest in this feature.

To make reviews easier, this set aims at a minimal viable product. The
current version of the work does not make any changes to file-system
specific code. This comes with some limitations (discussed later).

This set enables an application to "hugify" its text section by simply
running something like:

          madvise(0x600000, 0x80000, MADV_HUGEPAGE);

Before this call, the /proc/<pid>/maps looks like:

    00400000-074d0000 r-xp 00000000 00:27 2006927     app

After this call, part of the text section is split out and mapped to
THP:

    00400000-00425000 r-xp 00000000 00:27 2006927     app
    00600000-00e00000 r-xp 00200000 00:27 2006927     app   <<< on THP
    00e00000-074d0000 r-xp 00a00000 00:27 2006927     app

Limitations:

1. This only works for the text section (VMAs with VM_DENYWRITE).
2. The original limitation #2 (cannot write to a file with THP in the
   page cache) was removed in v3.

We gate this feature behind an experimental config, READ_ONLY_THP_FOR_FS.
Once we have better support for the write path, we can remove the config
and enable the feature by default.

Tested cases:
1. Tested with btrfs and ext4.
2. Tested with a real-world application (a memcache-like caching service).
3. Tested with "THP aware uprobe":
   https://patchwork.kernel.org/project/linux-mm/list/?series=131339

Please share your comments and suggestions on this.

Thanks!

Song Liu (6):
  filemap: check compound_head(page)->mapping in filemap_fault()
  filemap: update offset check in filemap_fault()
  mm,thp: stats for file backed THP
  khugepaged: rename collapse_shmem() and khugepaged_scan_shmem()
  mm,thp: add read-only THP support for (non-shmem) FS
  mm,thp: avoid writes to file with THP in pagecache

 drivers/base/node.c    |   6 +++
 fs/inode.c             |   3 ++
 fs/namei.c             |  22 ++++++++-
 fs/proc/meminfo.c      |   4 ++
 fs/proc/task_mmu.c     |   4 +-
 include/linux/fs.h     |  31 ++++++++++++
 include/linux/mmzone.h |   2 +
 mm/Kconfig             |  11 +++++
 mm/filemap.c           |   9 ++--
 mm/khugepaged.c        | 104 +++++++++++++++++++++++++++++++++--------
 mm/rmap.c              |  12 +++--
 mm/vmstat.c            |   2 +
 12 files changed, 180 insertions(+), 30 deletions(-)

--
2.17.1

Comments

Hillf Danton June 22, 2019, 3:11 a.m. UTC | #1
Hello

On Fri, 21 Jun 2019 17:05:10 -0700 Song Liu <songliubraving@fb.com> wrote:
> The next patch will add khugepaged support for non-shmem files. This patch
> renames these two functions to reflect the new functionality:
> 
>     collapse_shmem()        =>  collapse_file()
>     khugepaged_scan_shmem() =>  khugepaged_scan_file()
> 
> Acked-by: Rik van Riel <riel@surriel.com>
> Signed-off-by: Song Liu <songliubraving@fb.com>
> ---
>  mm/khugepaged.c | 13 +++++++------
>  1 file changed, 7 insertions(+), 6 deletions(-)
> 
> diff --git a/mm/khugepaged.c b/mm/khugepaged.c
> index 0f7419938008..dde8e45552b3 100644
> --- a/mm/khugepaged.c
> +++ b/mm/khugepaged.c
> @@ -1287,7 +1287,7 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>  }
>  
>  /**
> - * collapse_shmem - collapse small tmpfs/shmem pages into huge one.
> + * collapse_file - collapse small tmpfs/shmem pages into huge one.
>   *
>   * Basic scheme is simple, details are more complex:
>   *  - allocate and lock a new huge page;
> @@ -1304,10 +1304,11 @@ static void retract_page_tables(struct address_space *mapping, pgoff_t pgoff)
>   *    + restore gaps in the page cache;
>   *    + unlock and free huge page;
>   */
> -static void collapse_shmem(struct mm_struct *mm,
> +static void collapse_file(struct vm_area_struct *vma,
>  		struct address_space *mapping, pgoff_t start,
>  		struct page **hpage, int node)
>  {
> +	struct mm_struct *mm = vma->vm_mm;
>  	gfp_t gfp;
>  	struct page *new_page;
>  	struct mem_cgroup *memcg;
> @@ -1563,7 +1564,7 @@ static void collapse_shmem(struct mm_struct *mm,
>  	/* TODO: tracepoints */
>  }
>  
> -static void khugepaged_scan_shmem(struct mm_struct *mm,
> +static void khugepaged_scan_file(struct vm_area_struct *vma,
>  		struct address_space *mapping,
>  		pgoff_t start, struct page **hpage)
>  {
> @@ -1631,14 +1632,14 @@ static void khugepaged_scan_shmem(struct mm_struct *mm,
>  			result = SCAN_EXCEED_NONE_PTE;
>  		} else {
>  			node = khugepaged_find_target_node();
> -			collapse_shmem(mm, mapping, start, hpage, node);
> +			collapse_file(vma, mapping, start, hpage, node);
>  		}
>  	}
>  
>  	/* TODO: tracepoints */
>  }
>  #else
> -static void khugepaged_scan_shmem(struct mm_struct *mm,
> +static void khugepaged_scan_file(struct vm_area_struct *vma,
>  		struct address_space *mapping,
>  		pgoff_t start, struct page **hpage)
>  {
> @@ -1722,7 +1723,7 @@ static unsigned int khugepaged_scan_mm_slot(unsigned int pages,
>  				file = get_file(vma->vm_file);
>  				up_read(&mm->mmap_sem);
>  				ret = 1;
> -				khugepaged_scan_shmem(mm, file->f_mapping,
> +				khugepaged_scan_file(vma, file->f_mapping,
>  						pgoff, hpage);
>  				fput(file);

Is this a change that should have been mentioned in the log message?
And can the vma go away once mmap_sem is no longer held?
>  			} else {
> -- 
> 2.17.1
> 

Hillf
Song Liu June 22, 2019, 4:48 a.m. UTC | #2
> On Jun 21, 2019, at 8:11 PM, Hillf Danton <hdanton@sina.com> wrote:
> 
> 
> Hello
> 
> On Fri, 21 Jun 2019 17:05:10 -0700 Song Liu <songliubraving@fb.com> wrote:
>> [... quoted patch trimmed ...]
> 
> Is this a change that should have been mentioned in the log message?
> And can the vma go away once mmap_sem is no longer held?

This is a great point. We really need to be more careful. Let me fix 
it in the next version. 
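One possible shape for such a fix (a sketch only, not the actual follow-up:
it simply avoids holding a vma pointer across up_read() by capturing what
is needed from the vma while mmap_sem is still held):

```c
/* Sketch: capture everything needed from the vma before dropping
 * mmap_sem, so no vma pointer outlives up_read(). */
struct mm_struct *mm = vma->vm_mm;
struct file *file = get_file(vma->vm_file);

up_read(&mm->mmap_sem);
ret = 1;
/* pass mm, not vma: the vma may be unmapped and freed once
 * mmap_sem is dropped, so it must not be dereferenced here */
khugepaged_scan_file(mm, file->f_mapping, pgoff, hpage);
fput(file);
```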

Thanks,
Song