diff mbox series

[v1] memblock: Initialized the memory of memblock.reserve to the MIGRATE_MOVABL

Message ID 20240925110235.3157-1-suhua1@kingsoft.com (mailing list archive)
State New
Headers show
Series [v1] memblock: Initialized the memory of memblock.reserve to the MIGRATE_MOVABL | expand

Commit Message

Hua Su Sept. 25, 2024, 11:02 a.m. UTC
After sparse_init() allocates memory for struct pages from memblock and
adds it to memblock.reserved, this memory area is present in both
memblock.memory and memblock.reserved.

When CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, memmap_init() is called
while the free areas of each zone are initialized. It uses
for_each_mem_pfn_range() to initialize all of memblock.memory, including
memory that is also placed in memblock.reserved, such as the struct page
metadata that describes each page (about 16GB per 1TB of memory, which
generally accounts for more than 90% of the system's total reserved
memory). All memory in memblock.memory is thereby set to MIGRATE_MOVABLE
at pageblock_nr_pages alignment. For example, if hugetlb_optimize_vmemmap=1
and huge pages are allocated, the freed vmemmap pages are placed on buddy's
MIGRATE_MOVABLE lists for reuse.

When CONFIG_DEFERRED_STRUCT_PAGE_INIT=y, memmap_init only initializes the
range up to first_deferred_pfn. The subsequent free_low_memory_core_early()
initializes all memblock.reserved memory, but without setting it to
MIGRATE_MOVABLE; only memblock.memory is set to MIGRATE_MOVABLE when it is
released to buddy via free_low_memory_core_early() and
deferred_init_memmap(). As a result, when hugetlb_optimize_vmemmap=1 and
huge pages are allocated, the freed vmemmap pages are placed on buddy's
MIGRATE_UNMOVABLE lists (for example, on a machine with 1TB of memory,
allocating 1000GB of 2MB huge pages frees about 15GB to MIGRATE_UNMOVABLE).
Since huge page allocation requests MIGRATE_MOVABLE pages, the allocator
then falls back to MIGRATE_UNMOVABLE to satisfy MIGRATE_MOVABLE requests.

A large amount of unmovable memory is not conducive to defragmentation, so
also set the reserved memory to MIGRATE_MOVABLE, at pageblock_nr_pages
alignment, in the free_low_memory_core_early phase.

E.g.:
echo 500000 > /proc/sys/vm/nr_hugepages
cat /proc/pagetypeinfo

before:
Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
…
Node    0, zone   Normal, type    Unmovable     51      2      1     28     53     35     35     43     40     69   3852
Node    0, zone   Normal, type      Movable   6485   4610    666    202    200    185    208     87     54      2    240
Node    0, zone   Normal, type  Reclaimable      2      2      1     23     13      1      2      1      0      1      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
Unmovable ≈ 15GB

after:
Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
…
Node    0, zone   Normal, type    Unmovable      0      1      1      0      0      0      0      1      1      1      0
Node    0, zone   Normal, type      Movable   1563   4107   1119    189    256    368    286    132    109      4   3841
Node    0, zone   Normal, type  Reclaimable      2      2      1     23     13      1      2      1      0      1      0
Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0

Signed-off-by: suhua <suhua1@kingsoft.com>
---
 mm/mm_init.c | 6 ++++++
 1 file changed, 6 insertions(+)

Comments

Hua Su Sept. 27, 2024, 8:28 a.m. UTC | #1
The main purpose of this patch is to unify the migrate type setting for
memblock.reserved memory to MIGRATE_MOVABLE, both with and without
CONFIG_DEFERRED_STRUCT_PAGE_INIT.

Thanks
suhua

Mike Rapoport Sept. 29, 2024, 9:15 a.m. UTC | #2
On Wed, Sep 25, 2024 at 07:02:35PM +0800, suhua wrote:
> After sparse_init function requests memory for struct page in memblock and
> adds it to memblock.reserved, this memory area is present in both
> memblock.memory and memblock.reserved.
> 
> When CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set. The memmap_init function
> is called during the initialization of the free area of the zone, this
> function calls for_each_mem_pfn_range to initialize all memblock.memory,
> excluding memory that is also placed in memblock.reserved, such as the
> struct page metadata that describes the page, 1TB memory is about 16GB,
> and generally this part of reserved memory occupies more than 90% of the
> total reserved memory of the system. So all memory in memblock.memory is
> set to MIGRATE_MOVABLE according to the alignment of pageblock_nr_pages.
> For example, if hugetlb_optimize_vmemmap=1, huge pages are allocated, the
> freed pages are placed on buddy's MIGRATE_MOVABL list for use.

Please make sure you spell MIGRATE_MOVABLE and MIGRATE_UNMOVABLE correctly.
 
> When CONFIG_DEFERRED_STRUCT_PAGE_INIT=y, only the first_deferred_pfn range
> is initialized in memmap_init. The subsequent free_low_memory_core_early
> initializes all memblock.reserved memory but not MIGRATE_MOVABL. All
> memblock.memory is set to MIGRATE_MOVABL when it is placed in buddy via
> free_low_memory_core_early and deferred_init_memmap. As a result, when
> hugetlb_optimize_vmemmap=1 and huge pages are allocated, the freed pages
> will be placed on buddy's MIGRATE_UNMOVABL list (For example, on machines
> with 1TB of memory, alloc 2MB huge page size of 1000GB frees up about 15GB
> to MIGRATE_UNMOVABL). Since the huge page alloc requires a MIGRATE_MOVABL
> page, a fallback is performed to alloc memory from MIGRATE_UNMOVABL for
> MIGRATE_MOVABL.
> 
> Large amount of UNMOVABL memory is not conducive to defragmentation, so
> the reserved memory is also set to MIGRATE_MOVABLE in the
> free_low_memory_core_early phase following the alignment of
> pageblock_nr_pages.
> 
> [ pagetypeinfo output and diffstat snipped ]
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 4ba5607aaf19..e0190e3f8f26 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -722,6 +722,12 @@ static void __meminit init_reserved_page(unsigned long pfn, int nid)
>  		if (zone_spans_pfn(zone, pfn))
>  			break;
>  	}
> +
> +	if (pageblock_aligned(pfn)) {
> +		set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_MOVABLE);
> +		cond_resched();
> +	}

If you are trying to make initialization of pageblock migrate type
consistent with or without CONFIG_DEFERRED_STRUCT_PAGE_INIT, move setting
of migrate type from deferred_free_pages() to deferred_init_pages().

> +
>  	__init_single_page(pfn_to_page(pfn), pfn, zid, nid);
>  }
>  #else
> -- 
> 2.34.1
>
Hua Su Oct. 12, 2024, 3:55 a.m. UTC | #3
Hi Mike

Thanks for your advice and sorry for taking so long to reply.

I looked at the logic again. deferred_init_pages() currently handles all
(memory && !reserved) memblock regions and puts that memory into buddy.
Changing it to also handle reserved memory may require more extensive code
changes. I wonder if I could instead update the commit message to say: this
patch mainly sets the migrate type to MIGRATE_MOVABLE when reserved pages
are initialized, regardless of whether CONFIG_DEFERRED_STRUCT_PAGE_INIT is
set.

When CONFIG_DEFERRED_STRUCT_PAGE_INIT is not set, memblock.reserved memory
is already initialized to MIGRATE_MOVABLE by default when memmap_init
initializes memory.

Sincerely yours,
Su


Mike Rapoport Oct. 16, 2024, 11:57 a.m. UTC | #4
Hi,

On Sat, Oct 12, 2024 at 11:55:31AM +0800, Su Hua wrote:
> Hi Mike
> 
> Thanks for your advice and sorry for taking so long to reply.

Please don't top-post on the Linux kernel mailing lists
 
> I looked at the logic again. deferred_init_pages is currently used to
> handle all (memory &&! reserved) area memblock,and put that memory in
> buddy.
> Change it to also handle reserved memory may involve more code
> changes. I wonder if I can change the commit message: This patch is
> mainly to
> make the migration type to MIGRATE_MOVABLE when the reserve type page
> is initialized, regardless of whether CONFIG_DEFERRED_STRUCT_PAGE_INIT
> is set or not?
> 
> When not set CONFIG_DEFERRED_STRUCT_PAGE_INIT, initializes memblock of
> reserve type to MIGRATE_MOVABLE by default at memmap_init initializes
> memory.

This should be more clearly emphasized in the commit message.
 
> > > [ snip ]
> > > +     if (pageblock_aligned(pfn)) {
> > > +             set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_MOVABLE);
> > > +             cond_resched();

No need to call cond_resched() here

> > > +     }
Hua Su Oct. 17, 2024, 2:58 a.m. UTC | #5
> Hi,
>
> On Sat, Oct 12, 2024 at 11:55:31AM +0800, Su Hua wrote:
> > Hi Mike
> >
> > Thanks for your advice and sorry for taking so long to reply.
>
> Please don't top-post on the Linux kernel mailing lists

Thank you for the correction.

> > I looked at the logic again. deferred_init_pages is currently used to
> > handle all (memory &&! reserved) area memblock,and put that memory in
> > buddy.
> > Change it to also handle reserved memory may involve more code
> > changes. I wonder if I can change the commit message: This patch is
> > mainly to
> > make the migration type to MIGRATE_MOVABLE when the reserve type page
> > is initialized, regardless of whether CONFIG_DEFERRED_STRUCT_PAGE_INIT
> > is set or not?
> >
> > When not set CONFIG_DEFERRED_STRUCT_PAGE_INIT, initializes memblock of
> > reserve type to MIGRATE_MOVABLE by default at memmap_init initializes
> > memory.
>
> This should be more clearly emphasized in the commit message.

Ok,I'll update the commit message.

> > > > [ snip ]
> > > > +     if (pageblock_aligned(pfn)) {
> > > > +             set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_MOVABLE);
> > > > +             cond_resched();
>
> No need to call cond_resched() here

Alright; since there isn't much reserved memory, I'll remove this call.


Sincerely yours,
Su

> > > > Node    0, zone   Normal, type      Movable   6485   4610    666    202    200    185    208     87     54      2    240
> > > > Node    0, zone   Normal, type  Reclaimable      2      2      1     23     13      1      2      1      0      1      0
> > > > Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
> > > > Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
> > > > Unmovable ≈ 15GB
> > > >
> > > > after:
> > > > Free pages count per migrate type at order       0      1      2      3      4      5      6      7      8      9     10
> > > > …
> > > > Node    0, zone   Normal, type    Unmovable      0      1      1      0      0      0      0      1      1      1      0
> > > > Node    0, zone   Normal, type      Movable   1563   4107   1119    189    256    368    286    132    109      4   3841
> > > > Node    0, zone   Normal, type  Reclaimable      2      2      1     23     13      1      2      1      0      1      0
> > > > Node    0, zone   Normal, type   HighAtomic      0      0      0      0      0      0      0      0      0      0      0
> > > > Node    0, zone   Normal, type      Isolate      0      0      0      0      0      0      0      0      0      0      0
> > > >
> > > > Signed-off-by: suhua <suhua1@kingsoft.com>
> > > > ---
> > > >  mm/mm_init.c | 6 ++++++
> > > >  1 file changed, 6 insertions(+)
> > > >
> > > > diff --git a/mm/mm_init.c b/mm/mm_init.c
> > > > index 4ba5607aaf19..e0190e3f8f26 100644
> > > > --- a/mm/mm_init.c
> > > > +++ b/mm/mm_init.c
> > > > @@ -722,6 +722,12 @@ static void __meminit init_reserved_page(unsigned long pfn, int nid)
> > > >               if (zone_spans_pfn(zone, pfn))
> > > >                       break;
> > > >       }
> > > > +
> > > > +     if (pageblock_aligned(pfn)) {
> > > > +             set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_MOVABLE);
> > > > +             cond_resched();
>
> No need to call cond_resched() here
>
> > > > +     }
> > > > +
> > > >       __init_single_page(pfn_to_page(pfn), pfn, zid, nid);
> > > >  }
> > > >  #else
> > > > --
> > > > 2.34.1
> > > >
> > >
> > > --
> > > Sincerely yours,
> > > Mike.
>
> --
> Sincerely yours,
> Mike.

Patch

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 4ba5607aaf19..e0190e3f8f26 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -722,6 +722,12 @@  static void __meminit init_reserved_page(unsigned long pfn, int nid)
 		if (zone_spans_pfn(zone, pfn))
 			break;
 	}
+
+	if (pageblock_aligned(pfn)) {
+		set_pageblock_migratetype(pfn_to_page(pfn), MIGRATE_MOVABLE);
+		cond_resched();
+	}
+
 	__init_single_page(pfn_to_page(pfn), pfn, zid, nid);
 }
 #else