[v5,0/2] make hugetlb_optimize_vmemmap compatible with memmap_on_memory

Message ID 20220620110616.12056-1-songmuchun@bytedance.com (mailing list archive)

Message

Muchun Song June 20, 2022, 11:06 a.m. UTC
This series makes hugetlb_optimize_vmemmap compatible with memmap_on_memory
and is based on mm-stable.  See the commit log of patch 2 for the rationale.

v5:
 - Replace the enum with defines, per David.
 - Walk the vmemmap page tables to avoid false positives.

v4:
 - Fix a compile error when CONFIG_MEMORY_HOTPLUG is disabled, reported by the kernel test robot.
 - Fix a bug when memory_block_size_bytes() is not equal to the section size.

v3:
 - Switch complicated enumeration magic (David).
 - Introduce PageVmemmapSelfHosted to make both parameters compatible (David and Oscar).

v2:
 - Fix a compile error when !CONFIG_ZONE_DEVICE, reported by the kernel test robot.

Muchun Song (2):
  mm: memory_hotplug: enumerate all supported section flags
  mm: memory_hotplug: make hugetlb_optimize_vmemmap compatible with
    memmap_on_memory

 Documentation/admin-guide/kernel-parameters.txt | 22 ++++-----
 Documentation/admin-guide/sysctl/vm.rst         |  5 +-
 include/linux/memory_hotplug.h                  |  9 ----
 include/linux/mmzone.h                          | 41 +++++++++++----
 include/linux/page-flags.h                      | 11 +++++
 mm/hugetlb_vmemmap.c                            | 66 ++++++++++++++++++++++---
 mm/memory_hotplug.c                             | 33 +++++++------
 mm/sparse.c                                     |  2 +-
 8 files changed, 132 insertions(+), 57 deletions(-)

Comments

Andrew Morton June 21, 2022, 8:53 p.m. UTC | #1
On Mon, 20 Jun 2022 19:06:14 +0800 Muchun Song <songmuchun@bytedance.com> wrote:

> This series makes hugetlb_optimize_vmemmap compatible with memmap_on_memory
> and is based on mm-stable.  The reason refers to the patch 2's commit log.
> 
> v5:
>  - Replace enum to defines per David.
>  - Walk vmemmap page tables to avoid false-positive.

I can't see this second change in the v3->v5 deltas?



From: Muchun Song <songmuchun@bytedance.com>
Subject: mm-memory_hotplug-enumerate-all-supported-section-flags-v5
Date: Mon, 20 Jun 2022 19:06:15 +0800

replace enum with defines per David
 
Link: https://lkml.kernel.org/r/20220620110616.12056-2-songmuchun@bytedance.com
Signed-off-by: Muchun Song <songmuchun@bytedance.com>
Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
---

 include/linux/mmzone.h |   13 +++++--------
 1 file changed, 5 insertions(+), 8 deletions(-)

--- a/include/linux/mmzone.h~mm-memory_hotplug-enumerate-all-supported-section-flags-v5
+++ a/include/linux/mmzone.h
@@ -1439,16 +1439,13 @@ enum {
 	SECTION_MAP_LAST_BIT,
 };
 
-enum {
-	SECTION_MARKED_PRESENT		= BIT(SECTION_MARKED_PRESENT_BIT),
-	SECTION_HAS_MEM_MAP		= BIT(SECTION_HAS_MEM_MAP_BIT),
-	SECTION_IS_ONLINE		= BIT(SECTION_IS_ONLINE_BIT),
-	SECTION_IS_EARLY		= BIT(SECTION_IS_EARLY_BIT),
+#define SECTION_MARKED_PRESENT		BIT(SECTION_MARKED_PRESENT_BIT)
+#define SECTION_HAS_MEM_MAP		BIT(SECTION_HAS_MEM_MAP_BIT)
+#define SECTION_IS_ONLINE		BIT(SECTION_IS_ONLINE_BIT)
+#define SECTION_IS_EARLY		BIT(SECTION_IS_EARLY_BIT)
 #ifdef CONFIG_ZONE_DEVICE
-	SECTION_TAINT_ZONE_DEVICE	= BIT(SECTION_TAINT_ZONE_DEVICE_BIT),
+#define SECTION_TAINT_ZONE_DEVICE	BIT(SECTION_TAINT_ZONE_DEVICE_BIT)
 #endif
-};
-
 #define SECTION_MAP_MASK		(~(BIT(SECTION_MAP_LAST_BIT) - 1))
 #define SECTION_NID_SHIFT		SECTION_MAP_LAST_BIT

Muchun Song June 22, 2022, 3:31 a.m. UTC | #2
On Tue, Jun 21, 2022 at 01:53:13PM -0700, Andrew Morton wrote:
> On Mon, 20 Jun 2022 19:06:14 +0800 Muchun Song <songmuchun@bytedance.com> wrote:
> 
> > This series makes hugetlb_optimize_vmemmap compatible with memmap_on_memory
> > and is based on mm-stable.  The reason refers to the patch 2's commit log.
> > 
> > v5:
> >  - Replace enum to defines per David.
> >  - Walk vmemmap page tables to avoid false-positive.
> 
> I can't see this second change in the v3->v5 deltas? 
> 

My changelog was not clear. Let me clarify it here.

v3: Drop the section flag SECTION_CANNOT_OPTIMIZE_VMEMMAP and introduce a page
    flag, PageVmemmapSelfHosted, to make both parameters compatible.
v4: Fix a compile error when !CONFIG_MEMORY_HOTPLUG and a bug when a memory
    block spans multiple sections.
v5: Fix a bug where the PageVmemmapSelfHosted() check can give a false positive.

Thanks.

> From: Muchun Song <songmuchun@bytedance.com>
> Subject: mm-memory_hotplug-enumerate-all-supported-section-flags-v5
> Date: Mon, 20 Jun 2022 19:06:15 +0800
> 
> replace enum with defines per David
>  
> Link: https://lkml.kernel.org/r/20220620110616.12056-2-songmuchun@bytedance.com
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  include/linux/mmzone.h |   13 +++++--------
>  1 file changed, 5 insertions(+), 8 deletions(-)
> 
> --- a/include/linux/mmzone.h~mm-memory_hotplug-enumerate-all-supported-section-flags-v5
> +++ a/include/linux/mmzone.h
> @@ -1439,16 +1439,13 @@ enum {
>  	SECTION_MAP_LAST_BIT,
>  };
>  
> -enum {
> -	SECTION_MARKED_PRESENT		= BIT(SECTION_MARKED_PRESENT_BIT),
> -	SECTION_HAS_MEM_MAP		= BIT(SECTION_HAS_MEM_MAP_BIT),
> -	SECTION_IS_ONLINE		= BIT(SECTION_IS_ONLINE_BIT),
> -	SECTION_IS_EARLY		= BIT(SECTION_IS_EARLY_BIT),
> +#define SECTION_MARKED_PRESENT		BIT(SECTION_MARKED_PRESENT_BIT)
> +#define SECTION_HAS_MEM_MAP		BIT(SECTION_HAS_MEM_MAP_BIT)
> +#define SECTION_IS_ONLINE		BIT(SECTION_IS_ONLINE_BIT)
> +#define SECTION_IS_EARLY		BIT(SECTION_IS_EARLY_BIT)
>  #ifdef CONFIG_ZONE_DEVICE
> -	SECTION_TAINT_ZONE_DEVICE	= BIT(SECTION_TAINT_ZONE_DEVICE_BIT),
> +#define SECTION_TAINT_ZONE_DEVICE	BIT(SECTION_TAINT_ZONE_DEVICE_BIT)
>  #endif
> -};
> -
>  #define SECTION_MAP_MASK		(~(BIT(SECTION_MAP_LAST_BIT) - 1))
>  #define SECTION_NID_SHIFT		SECTION_MAP_LAST_BIT
>  
> _
> 
> 
> 
> 
> From: Muchun Song <songmuchun@bytedance.com>
> Subject: mm-memory_hotplug-make-hugetlb_optimize_vmemmap-compatible-with-memmap_on_memory-v5
> Date: Mon, 20 Jun 2022 19:06:16 +0800
> 
> walk vmemmap page tables to avoid false-positive
> 
> Link: https://lkml.kernel.org/r/20220620110616.12056-3-songmuchun@bytedance.com
> Signed-off-by: Muchun Song <songmuchun@bytedance.com>
> Co-developed-by: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Oscar Salvador <osalvador@suse.de>
> Signed-off-by: Andrew Morton <akpm@linux-foundation.org>
> ---
> 
>  mm/hugetlb_vmemmap.c |   69 ++++++++++++++++++++++++++---------------
>  1 file changed, 44 insertions(+), 25 deletions(-)
> 
> --- a/mm/hugetlb_vmemmap.c~mm-memory_hotplug-make-hugetlb_optimize_vmemmap-compatible-with-memmap_on_memory-v5
> +++ a/mm/hugetlb_vmemmap.c
> @@ -10,6 +10,7 @@
>   */
>  #define pr_fmt(fmt)	"HugeTLB: " fmt
>  
> +#include <linux/memory.h>
>  #include "hugetlb_vmemmap.h"
>  
>  /*
> @@ -99,34 +100,52 @@ int hugetlb_vmemmap_alloc(struct hstate
>  static unsigned int vmemmap_optimizable_pages(struct hstate *h,
>  					      struct page *head)
>  {
> -	struct mem_section *ms;
> -	struct page *vmemmap_page;
> -	unsigned long pfn = page_to_pfn(head);
> -
>  	if (READ_ONCE(vmemmap_optimize_mode) == VMEMMAP_OPTIMIZE_OFF)
>  		return 0;
>  
> -	ms = __pfn_to_section(pfn);
> -	vmemmap_page = sparse_decode_mem_map(ms->section_mem_map,
> -					     pfn_to_section_nr(pfn));
> -	/*
> -	 * Only the vmemmap pages' vmemmap may be marked as VmemmapSelfHosted.
> -	 *
> -	 * Due to HugeTLB alignment requirements, and the vmemmap pages being
> -	 * at the start of the hotplugged memory region. Checking any vmemmap
> -	 * page's vmemmap is fine.
> -	 *
> -	 * [      hotplugged memory     ]
> -	 * [ vmemmap ][  usable memory  ]
> -	 *   ^   |      |            |
> -	 *   +---+      |            |
> -	 *     ^        |            |
> -	 *     +--------+            |
> -	 *         ^                 |
> -	 *         +-----------------+
> -	 */
> -	if (PageVmemmapSelfHosted(vmemmap_page))
> -		return 0;
> +	if (IS_ENABLED(CONFIG_MEMORY_HOTPLUG)) {
> +		pmd_t *pmdp, pmd;
> +		struct page *vmemmap_page;
> +		unsigned long vaddr = (unsigned long)head;
> +
> +		/*
> +		 * Only the vmemmap page's vmemmap page can be self-hosted.
> +		 * Walking the page tables to find the backing page of the
> +		 * vmemmap page.
> +		 */
> +		pmdp = pmd_off_k(vaddr);
> +		/*
> +		 * The READ_ONCE() is used to stabilize *pmdp in a register or
> +		 * on the stack so that it will stop changing under the code.
> +		 * The only concurrent operation where it can be changed is
> +		 * split_vmemmap_huge_pmd() (*pmdp will be stable after this
> +		 * operation).
> +		 */
> +		pmd = READ_ONCE(*pmdp);
> +		if (pmd_leaf(pmd))
> +			vmemmap_page = pmd_page(pmd) + pte_index(vaddr);
> +		else
> +			vmemmap_page = pte_page(*pte_offset_kernel(pmdp, vaddr));
> +		/*
> +		 * Due to HugeTLB alignment requirements and the vmemmap pages
> +		 * being at the start of the hotplugged memory region in
> +		 * memory_hotplug.memmap_on_memory case. Checking any vmemmap
> +		 * page's vmemmap page if it is marked as VmemmapSelfHosted is
> +		 * sufficient.
> +		 *
> +		 * [                  hotplugged memory                  ]
> +		 * [        section        ][...][        section        ]
> +		 * [ vmemmap ][              usable memory               ]
> +		 *   ^   |     |                                        |
> +		 *   +---+     |                                        |
> +		 *     ^       |                                        |
> +		 *     +-------+                                        |
> +		 *          ^                                           |
> +		 *          +-------------------------------------------+
> +		 */
> +		if (PageVmemmapSelfHosted(vmemmap_page))
> +			return 0;
> +	}
>  
>  	return hugetlb_optimize_vmemmap_pages(h);
>  }
> _
> 
>