
[v3,3/6] mm: shmem: add multi-size THP sysfs interface for anonymous shmem

Message ID 716c515156e8c891766d8fd3f1df231d289b2a37.1717033868.git.baolin.wang@linux.alibaba.com (mailing list archive)
State New
Series add mTHP support for anonymous shmem

Commit Message

Baolin Wang May 30, 2024, 2:04 a.m. UTC
To support the use of mTHP with anonymous shmem, add a new sysfs interface
'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
directory for each mTHP to control whether shmem is enabled for that mTHP.
Its value is similar to the top-level 'shmem_enabled' and can be set to:
"always", "inherit" (to inherit the top-level setting), "within_size",
"advise", "never", "deny" or "force". These values follow the same semantics
as the top level, except that 'deny' is equivalent to 'never' and 'force' is
equivalent to 'always', to keep compatibility.
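As an illustration of the value mapping above, here is a minimal user-space sketch (not kernel code) that parses a per-size value and resolves 'inherit' against the top-level setting. The enum and function names (shmem_policy, parse_per_size(), effective_policy()) are hypothetical. It uses strcmp() where the kernel uses sysfs_streq(), and it folds the top-level 'deny' and 'force' into 'never' and 'always', which simplifies their top-level semantics.

```c
#include <assert.h>
#include <string.h>

/* Hypothetical model of the policy values; not a kernel type. */
enum shmem_policy {
	POLICY_INVALID = -1,
	POLICY_NEVER,
	POLICY_ALWAYS,
	POLICY_WITHIN_SIZE,
	POLICY_ADVISE,
	POLICY_INHERIT,
};

/*
 * Parse a value written to hugepages-<size>kB/shmem_enabled.
 * Uses strcmp(); the kernel's sysfs_streq() also tolerates a trailing newline.
 */
static enum shmem_policy parse_per_size(const char *s)
{
	if (strcmp(s, "always") == 0 || strcmp(s, "force") == 0)
		return POLICY_ALWAYS;      /* 'force' is an alias of 'always' */
	if (strcmp(s, "never") == 0 || strcmp(s, "deny") == 0)
		return POLICY_NEVER;       /* 'deny' is an alias of 'never' */
	if (strcmp(s, "within_size") == 0)
		return POLICY_WITHIN_SIZE;
	if (strcmp(s, "advise") == 0)
		return POLICY_ADVISE;
	if (strcmp(s, "inherit") == 0)
		return POLICY_INHERIT;
	return POLICY_INVALID;
}

/* Resolve a per-size policy against the (already parsed) top-level value. */
static enum shmem_policy effective_policy(enum shmem_policy per_size,
					  enum shmem_policy top_level)
{
	/* The top-level knob itself has no 'inherit' value. */
	assert(top_level != POLICY_INHERIT && top_level != POLICY_INVALID);
	return per_size == POLICY_INHERIT ? top_level : per_size;
}
```

In this model, an order set to "inherit" simply takes whatever the top-level knob says, while every other value stands on its own.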

By default, '/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/shmem_enabled'
is set to "inherit" for PMD-sized hugepages and to "never" for all other
hugepage sizes.
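The defaults above can be pictured as one bitmask per policy, indexed by folio order, which matches how this patch stores them (huge_anon_shmem_orders_always, _inherit, _within_size and _madvise). The sketch below is a hypothetical, single-threaded user-space model: it assumes 4 KiB base pages (so the PMD order is 9 and hugepages-64kB is order 4), the names mthp_masks, masks_init(), masks_set() and size_kb_to_order() are made up for illustration, and unlike the kernel it takes no lock.

```c
#include <assert.h>
#include <stddef.h>

/* Assumption: 4 KiB base pages, so a 2 MiB PMD-sized THP is order 9. */
#define PMD_ORDER 9
#define ORDER_BIT(n) (1UL << (n))

/* One bitmask per policy; bit N set means order N uses that policy. */
struct mthp_masks {
	unsigned long always, inherit, within_size, advise;
};

/* hugepages-<size>kB -> folio order, assuming 4 KiB base pages. */
static int size_kb_to_order(unsigned long size_kb)
{
	int order = 0;

	assert(size_kb >= 4 && (size_kb & (size_kb - 1)) == 0);
	while ((4UL << order) < size_kb)
		order++;
	return order;
}

/* Defaults: only the PMD order inherits; every other order is 'never'. */
static void masks_init(struct mthp_masks *m)
{
	m->always = m->within_size = m->advise = 0;
	m->inherit = ORDER_BIT(PMD_ORDER);
}

/*
 * Move @order to the policy whose mask is @target, or to 'never' when
 * @target is NULL: clear the bit everywhere, then set it in one mask.
 */
static void masks_set(struct mthp_masks *m, int order, unsigned long *target)
{
	unsigned long bit = ORDER_BIT(order);

	m->always &= ~bit;
	m->inherit &= ~bit;
	m->within_size &= ~bit;
	m->advise &= ~bit;
	if (target)
		*target |= bit;

	/* Invariant: an order is set in at most one mask. */
	assert(!!(m->always & bit) + !!(m->inherit & bit) +
	       !!(m->within_size & bit) + !!(m->advise & bit) <= 1);
}
```

In this model, writing "always" to hugepages-64kB/shmem_enabled corresponds to masks_set(&m, size_kb_to_order(64), &m.always), and "never" corresponds to passing NULL, i.e. the order is absent from every mask.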

In addition, if the top-level value is 'force', then only PMD-sized hugepages
may be set to "inherit"; any other size fails to be configured that way.
Conversely, the top level cannot be set to 'force' unless PMD-sized hugepages
are the only size set to "inherit". This means we now avoid using non-PMD-sized
THP to override the global huge allocation.
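The 'force'/'inherit' rule can be sketched as two checks, modeled loosely on the checks this patch adds to shmem_enabled_store() and thpsize_shmem_enabled_store(). This is a hypothetical user-space model assuming 4 KiB base pages (PMD order 9); struct shmem_thp_state, set_top_level() and set_per_size_inherit() are illustrative names, and locking is omitted.

```c
#include <assert.h>
#include <errno.h>

#define PMD_ORDER 9                     /* assumption: 4 KiB base pages */
#define ORDER_BIT(n) (1UL << (n))

enum top_level {
	TOP_NEVER, TOP_ALWAYS, TOP_WITHIN_SIZE, TOP_ADVISE, TOP_DENY, TOP_FORCE,
};

/* Hypothetical model state: top-level value plus the per-order inherit mask. */
struct shmem_thp_state {
	enum top_level top;
	unsigned long inherit_mask;
};

/* Top-level write: 'force' only if exactly the PMD order inherits. */
static int set_top_level(struct shmem_thp_state *s, enum top_level v)
{
	if (v == TOP_FORCE && s->inherit_mask != ORDER_BIT(PMD_ORDER))
		return -EINVAL;
	s->top = v;
	return 0;
}

/* Per-size write of 'inherit': refused for non-PMD orders under 'force'. */
static int set_per_size_inherit(struct shmem_thp_state *s, int order)
{
	assert(order >= 0 && order < (int)(8 * sizeof(unsigned long)));
	if (s->top == TOP_FORCE && order != PMD_ORDER)
		return -EINVAL;
	/* The real store also clears the order from the other policy masks. */
	s->inherit_mask |= ORDER_BIT(order);
	return 0;
}
```

As in the patch, top-level 'force' is refused whenever the inherit mask is anything other than exactly the PMD order, which includes the case where the PMD order itself has been switched away from "inherit".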

Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
---
 Documentation/admin-guide/mm/transhuge.rst | 29 +++++++
 include/linux/huge_mm.h                    | 10 +++
 mm/huge_memory.c                           | 11 +--
 mm/shmem.c                                 | 96 ++++++++++++++++++++++
 4 files changed, 138 insertions(+), 8 deletions(-)

Comments

wang wei June 1, 2024, 3:29 a.m. UTC | #1
At 2024-05-30 10:04:14, "Baolin Wang" <baolin.wang@linux.alibaba.com> wrote:

>To support the use of mTHP with anonymous shmem, add a new sysfs interface
>'shmem_enabled' in the '/sys/kernel/mm/transparent_hugepage/hugepages-kB/'
>directory for each mTHP to control whether shmem is enabled for that mTHP,
>with a value similar to the top level 'shmem_enabled', which can be set to:
>"always", "inherit (to inherit the top level setting)", "within_size", "advise",
>"never", "deny", "force". These values follow the same semantics as the top
>level, except the 'deny' is equivalent to 'never', and 'force' is equivalent
>to 'always' to keep compatibility.
>
>By default, PMD-sized hugepages have enabled="inherit" and all other hugepage
>sizes have enabled="never" for '/sys/kernel/mm/transparent_hugepage/hugepages-xxkB/shmem_enabled'.
>
>In addition, if top level value is 'force', then only PMD-sized hugepages
>have enabled="inherit", otherwise configuration will be failed and vice versa.
>That means now we will avoid using non-PMD sized THP to override the global
>huge allocation.
>
>Signed-off-by: Baolin Wang <baolin.wang@linux.alibaba.com>
>---
> Documentation/admin-guide/mm/transhuge.rst | 29 +++++++
> include/linux/huge_mm.h                    | 10 +++
> mm/huge_memory.c                           | 11 +--
> mm/shmem.c                                 | 96 ++++++++++++++++++++++
> 4 files changed, 138 insertions(+), 8 deletions(-)
>
>diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
>index d414d3f5592a..659459374032 100644
>--- a/Documentation/admin-guide/mm/transhuge.rst
>+++ b/Documentation/admin-guide/mm/transhuge.rst
>@@ -332,6 +332,35 @@ deny
> force
>     Force the huge option on for all - very useful for testing;
> 
>+Anonymous shmem can also use "multi-size THP" (mTHP) by adding a new sysfs knob
>+to control mTHP allocation: /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled.
>+Its value for each mTHP is essentially consistent with the global setting, except
>+for the addition of 'inherit' to ensure compatibility with the global settings.
>+always
>+    Attempt to allocate <size> huge pages every time we need a new page;
>+
>+inherit
>+    Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
>+    have enabled="inherit" and all other hugepage sizes have enabled="never";
>+
>+never
>+    Do not allocate <size> huge pages;
>+
>+within_size
>+    Only allocate <size> huge page if it will be fully within i_size.
>+    Also respect fadvise()/madvise() hints;
>+
>+advise
>+    Only allocate <size> huge pages if requested with fadvise()/madvise();
>+
>+deny
>+    Has the same semantics as 'never', now mTHP allocation policy is only
>+    used for anonymous shmem and no not override tmpfs.
>+
>+force
>+    Has the same semantics as 'always', now mTHP allocation policy is only
>+    used for anonymous shmem and no not override tmpfs.

>+


I just briefly reviewed the discussion about the value of hugepages-<size>kB/shmem_enabled
in V1 [PATCH 5/8]. Is there a conclusion now? Maybe I left out some important information.


Baolin Wang June 2, 2024, 4:36 a.m. UTC | #2
On 2024/6/1 11:29, wang wei wrote:
> At 2024-05-30 10:04:14, "Baolin Wang" <baolin.wang@linux.alibaba.com> wrote:
> 
> I just briefly reviewed the discussion about the value of 
> hugepages-<size>kB/shmem_enabled
> in V1 [PATCH 5/8]. Is there a conclusion now? Maybe I left out some 
> important information.

You can refer to this patch's commit message and documentation, which are
based on the conclusions of the previous discussions.

You can also find more of the discussion in David's summary of the last
bi-weekly MM meeting [1].

[1] https://lore.kernel.org/all/f1783ff0-65bd-4b2b-8952-52b6822a0835@redhat.com/#t

Patch

diff --git a/Documentation/admin-guide/mm/transhuge.rst b/Documentation/admin-guide/mm/transhuge.rst
index d414d3f5592a..659459374032 100644
--- a/Documentation/admin-guide/mm/transhuge.rst
+++ b/Documentation/admin-guide/mm/transhuge.rst
@@ -332,6 +332,35 @@  deny
 force
     Force the huge option on for all - very useful for testing;
 
+Anonymous shmem can also use "multi-size THP" (mTHP) through a new per-size sysfs
+knob that controls mTHP allocation: /sys/kernel/mm/transparent_hugepage/hugepages-<size>kB/shmem_enabled.
+Its values for each mTHP are essentially the same as those of the global setting,
+except for the addition of 'inherit', which ensures compatibility with the global setting.
+always
+    Attempt to allocate <size> huge pages every time we need a new page;
+
+inherit
+    Inherit the top-level "shmem_enabled" value. By default, PMD-sized hugepages
+    have enabled="inherit" and all other hugepage sizes have enabled="never";
+
+never
+    Do not allocate <size> huge pages;
+
+within_size
+    Only allocate <size> huge pages if they will be fully within i_size.
+    Also respect fadvise()/madvise() hints;
+
+advise
+    Only allocate <size> huge pages if requested with fadvise()/madvise();
+
+deny
+    Has the same semantics as 'never'; the mTHP allocation policy is currently
+    only used for anonymous shmem and does not override tmpfs.
+
+force
+    Has the same semantics as 'always'; the mTHP allocation policy is currently
+    only used for anonymous shmem and does not override tmpfs.
+
 Need of application restart
 ===========================
 
diff --git a/include/linux/huge_mm.h b/include/linux/huge_mm.h
index 020e2344eb86..fac21548c5de 100644
--- a/include/linux/huge_mm.h
+++ b/include/linux/huge_mm.h
@@ -6,6 +6,7 @@ 
 #include <linux/mm_types.h>
 
 #include <linux/fs.h> /* only for vma_is_dax() */
+#include <linux/kobject.h>
 
 vm_fault_t do_huge_pmd_anonymous_page(struct vm_fault *vmf);
 int copy_huge_pmd(struct mm_struct *dst_mm, struct mm_struct *src_mm,
@@ -63,6 +64,7 @@  ssize_t single_hugepage_flag_show(struct kobject *kobj,
 				  struct kobj_attribute *attr, char *buf,
 				  enum transparent_hugepage_flag flag);
 extern struct kobj_attribute shmem_enabled_attr;
+extern struct kobj_attribute thpsize_shmem_enabled_attr;
 
 /*
  * Mask of all large folio orders supported for anonymous THP; all orders up to
@@ -265,6 +267,14 @@  unsigned long thp_vma_allowable_orders(struct vm_area_struct *vma,
 	return __thp_vma_allowable_orders(vma, vm_flags, tva_flags, orders);
 }
 
+struct thpsize {
+	struct kobject kobj;
+	struct list_head node;
+	int order;
+};
+
+#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
+
 enum mthp_stat_item {
 	MTHP_STAT_ANON_FAULT_ALLOC,
 	MTHP_STAT_ANON_FAULT_FALLBACK,
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index 8e49f402d7c7..1360a1903b66 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -449,14 +449,6 @@  static void thpsize_release(struct kobject *kobj);
 static DEFINE_SPINLOCK(huge_anon_orders_lock);
 static LIST_HEAD(thpsize_list);
 
-struct thpsize {
-	struct kobject kobj;
-	struct list_head node;
-	int order;
-};
-
-#define to_thpsize(kobj) container_of(kobj, struct thpsize, kobj)
-
 static ssize_t thpsize_enabled_show(struct kobject *kobj,
 				    struct kobj_attribute *attr, char *buf)
 {
@@ -517,6 +509,9 @@  static struct kobj_attribute thpsize_enabled_attr =
 
 static struct attribute *thpsize_attrs[] = {
 	&thpsize_enabled_attr.attr,
+#ifdef CONFIG_SHMEM
+	&thpsize_shmem_enabled_attr.attr,
+#endif
 	NULL,
 };
 
diff --git a/mm/shmem.c b/mm/shmem.c
index ae358efc397a..d5ab5e211100 100644
--- a/mm/shmem.c
+++ b/mm/shmem.c
@@ -131,6 +131,14 @@  struct shmem_options {
 #define SHMEM_SEEN_QUOTA 32
 };
 
+#ifdef CONFIG_TRANSPARENT_HUGEPAGE
+static unsigned long huge_anon_shmem_orders_always __read_mostly;
+static unsigned long huge_anon_shmem_orders_madvise __read_mostly;
+static unsigned long huge_anon_shmem_orders_inherit __read_mostly;
+static unsigned long huge_anon_shmem_orders_within_size __read_mostly;
+static DEFINE_SPINLOCK(huge_anon_shmem_orders_lock);
+#endif
+
 #ifdef CONFIG_TMPFS
 static unsigned long shmem_default_max_blocks(void)
 {
@@ -4672,6 +4680,12 @@  void __init shmem_init(void)
 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
 	else
 		shmem_huge = SHMEM_HUGE_NEVER; /* just in case it was patched */
+
+	/*
+	 * Default to setting PMD-sized THP to inherit the global setting and
+	 * disable all other multi-size THPs, when anonymous shmem uses mTHP.
+	 */
+	huge_anon_shmem_orders_inherit = BIT(HPAGE_PMD_ORDER);
 #endif
 	return;
 
@@ -4731,6 +4745,11 @@  static ssize_t shmem_enabled_store(struct kobject *kobj,
 			huge != SHMEM_HUGE_NEVER && huge != SHMEM_HUGE_DENY)
 		return -EINVAL;
 
+	/* Do not override huge allocation policy with non-PMD sized mTHP */
+	if (huge == SHMEM_HUGE_FORCE &&
+	    huge_anon_shmem_orders_inherit != BIT(HPAGE_PMD_ORDER))
+		return -EINVAL;
+
 	shmem_huge = huge;
 	if (shmem_huge > SHMEM_HUGE_DENY)
 		SHMEM_SB(shm_mnt->mnt_sb)->huge = shmem_huge;
@@ -4738,6 +4757,83 @@  static ssize_t shmem_enabled_store(struct kobject *kobj,
 }
 
 struct kobj_attribute shmem_enabled_attr = __ATTR_RW(shmem_enabled);
+
+static ssize_t thpsize_shmem_enabled_show(struct kobject *kobj,
+					  struct kobj_attribute *attr, char *buf)
+{
+	int order = to_thpsize(kobj)->order;
+	const char *output;
+
+	if (test_bit(order, &huge_anon_shmem_orders_always))
+		output = "[always] inherit within_size advise never deny [force]";
+	else if (test_bit(order, &huge_anon_shmem_orders_inherit))
+		output = "always [inherit] within_size advise never deny force";
+	else if (test_bit(order, &huge_anon_shmem_orders_within_size))
+		output = "always inherit [within_size] advise never deny force";
+	else if (test_bit(order, &huge_anon_shmem_orders_madvise))
+		output = "always inherit within_size [advise] never deny force";
+	else
+		output = "always inherit within_size advise [never] [deny] force";
+
+	return sysfs_emit(buf, "%s\n", output);
+}
+
+static ssize_t thpsize_shmem_enabled_store(struct kobject *kobj,
+					   struct kobj_attribute *attr,
+					   const char *buf, size_t count)
+{
+	int order = to_thpsize(kobj)->order;
+	ssize_t ret = count;
+
+	if (sysfs_streq(buf, "always") || sysfs_streq(buf, "force")) {
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_inherit);
+		clear_bit(order, &huge_anon_shmem_orders_madvise);
+		clear_bit(order, &huge_anon_shmem_orders_within_size);
+		set_bit(order, &huge_anon_shmem_orders_always);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else if (sysfs_streq(buf, "inherit")) {
+		/* Do not override huge allocation policy with non-PMD sized mTHP */
+		if (shmem_huge == SHMEM_HUGE_FORCE &&
+		    order != HPAGE_PMD_ORDER)
+			return -EINVAL;
+
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_always);
+		clear_bit(order, &huge_anon_shmem_orders_madvise);
+		clear_bit(order, &huge_anon_shmem_orders_within_size);
+		set_bit(order, &huge_anon_shmem_orders_inherit);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else if (sysfs_streq(buf, "within_size")) {
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_always);
+		clear_bit(order, &huge_anon_shmem_orders_inherit);
+		clear_bit(order, &huge_anon_shmem_orders_madvise);
+		set_bit(order, &huge_anon_shmem_orders_within_size);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else if (sysfs_streq(buf, "advise")) {
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_always);
+		clear_bit(order, &huge_anon_shmem_orders_inherit);
+		clear_bit(order, &huge_anon_shmem_orders_within_size);
+		set_bit(order, &huge_anon_shmem_orders_madvise);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else if (sysfs_streq(buf, "never") || sysfs_streq(buf, "deny")) {
+		spin_lock(&huge_anon_shmem_orders_lock);
+		clear_bit(order, &huge_anon_shmem_orders_always);
+		clear_bit(order, &huge_anon_shmem_orders_inherit);
+		clear_bit(order, &huge_anon_shmem_orders_within_size);
+		clear_bit(order, &huge_anon_shmem_orders_madvise);
+		spin_unlock(&huge_anon_shmem_orders_lock);
+	} else {
+		ret = -EINVAL;
+	}
+
+	return ret;
+}
+
+struct kobj_attribute thpsize_shmem_enabled_attr =
+	__ATTR(shmem_enabled, 0644, thpsize_shmem_enabled_show, thpsize_shmem_enabled_store);
 #endif /* CONFIG_TRANSPARENT_HUGEPAGE && CONFIG_SYSFS */
 
 #else /* !CONFIG_SHMEM */