Message ID | 20240520231555.395979-5-echanude@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | arm64: init: override deferred_page_init_max_threads | expand |
On 05/20/24 at 07:15pm, Eric Chanudet wrote: > This was the behavior prior to making the function arch-specific with > commit ecd096506922 ("mm: make deferred init's max threads > arch-specific") > > Architectures can override the generic implementation that uses only one > CPU. Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 > platforms shows faster deferred_init_memmap completions: > > | | x13s | SA8775p-ride | Ampere R137-P31 | Ampere HR330 | > | | Metal, 32GB | VM, 36GB | VM, 58GB | Metal, 128GB | > | | 8cpus | 8cpus | 8cpus | 32cpus | > |---------|-------------|--------------|-----------------|--------------| > | threads | ms (%) | ms (%) | ms (%) | ms (%) | > |---------|-------------|--------------|-----------------|--------------| > | 1 | 108 (0%) | 72 (0%) | 224 (0%) | 324 (0%) | > | cpus | 24 (-77%) | 36 (-50%) | 40 (-82%) | 56 (-82%) | > > Signed-off-by: Eric Chanudet <echanude@redhat.com> > --- > arch/arm64/mm/init.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 9b5ab6818f7f..71f5188fe63d 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -158,6 +158,13 @@ static void __init zone_sizes_init(void) > free_area_init(max_zone_pfns); > } > > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask) > +{ > + return max_t(int, cpumask_weight(node_cpumask), 1); > +} > +#endif LGTM, Reviewed-by: Baoquan He <bhe@redhat.com> > + > int pfn_is_map_memory(unsigned long pfn) > { > phys_addr_t addr = PFN_PHYS(pfn); > -- > 2.44.0 >
(added powerpc folks) On Mon, May 20, 2024 at 07:15:59PM -0400, Eric Chanudet wrote: > This was the behavior prior to making the function arch-specific with > commit ecd096506922 ("mm: make deferred init's max threads > arch-specific") > > Architectures can override the generic implementation that uses only one > CPU. Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 > platforms shows faster deferred_init_memmap completions: > > | | x13s | SA8775p-ride | Ampere R137-P31 | Ampere HR330 | > | | Metal, 32GB | VM, 36GB | VM, 58GB | Metal, 128GB | > | | 8cpus | 8cpus | 8cpus | 32cpus | > |---------|-------------|--------------|-----------------|--------------| > | threads | ms (%) | ms (%) | ms (%) | ms (%) | > |---------|-------------|--------------|-----------------|--------------| > | 1 | 108 (0%) | 72 (0%) | 224 (0%) | 324 (0%) | > | cpus | 24 (-77%) | 36 (-50%) | 40 (-82%) | 56 (-82%) | > > Signed-off-by: Eric Chanudet <echanude@redhat.com> > --- > arch/arm64/mm/init.c | 7 +++++++ > 1 file changed, 7 insertions(+) > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > index 9b5ab6818f7f..71f5188fe63d 100644 > --- a/arch/arm64/mm/init.c > +++ b/arch/arm64/mm/init.c > @@ -158,6 +158,13 @@ static void __init zone_sizes_init(void) > free_area_init(max_zone_pfns); > } > > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask) > +{ > + return max_t(int, cpumask_weight(node_cpumask), 1); > +} > +#endif > + Maybe we should make this default and let architectures that want a single thread override deferred_page_init_max_threads() to return 1? > int pfn_is_map_memory(unsigned long pfn) > { > phys_addr_t addr = PFN_PHYS(pfn); > -- > 2.44.0 >
On Tue, May 21, 2024 at 07:10:07PM +0300, Mike Rapoport wrote: > (added powerpc folks) > > On Mon, May 20, 2024 at 07:15:59PM -0400, Eric Chanudet wrote: > > This was the behavior prior to making the function arch-specific with > > commit ecd096506922 ("mm: make deferred init's max threads > > arch-specific") > > > > Architectures can override the generic implementation that uses only one > > CPU. Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 > > platforms shows faster deferred_init_memmap completions: > > > > | | x13s | SA8775p-ride | Ampere R137-P31 | Ampere HR330 | > > | | Metal, 32GB | VM, 36GB | VM, 58GB | Metal, 128GB | > > | | 8cpus | 8cpus | 8cpus | 32cpus | > > |---------|-------------|--------------|-----------------|--------------| > > | threads | ms (%) | ms (%) | ms (%) | ms (%) | > > |---------|-------------|--------------|-----------------|--------------| > > | 1 | 108 (0%) | 72 (0%) | 224 (0%) | 324 (0%) | > > | cpus | 24 (-77%) | 36 (-50%) | 40 (-82%) | 56 (-82%) | > > > > Signed-off-by: Eric Chanudet <echanude@redhat.com> > > --- > > arch/arm64/mm/init.c | 7 +++++++ > > 1 file changed, 7 insertions(+) > > > > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > > index 9b5ab6818f7f..71f5188fe63d 100644 > > --- a/arch/arm64/mm/init.c > > +++ b/arch/arm64/mm/init.c > > @@ -158,6 +158,13 @@ static void __init zone_sizes_init(void) > > free_area_init(max_zone_pfns); > > } > > > > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > > +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask) > > +{ > > + return max_t(int, cpumask_weight(node_cpumask), 1); > > +} > > +#endif > > + > > Maybe we should make this default and let architectures that want a single > thread override deferred_page_init_max_threads() to return 1? It would affect more archs than I can try this on. Currently, only x86 (with this change, arm64) return more than one thread. I'm happy to send a v2 inverting the logic if you find it preferable. Best, > > int pfn_is_map_memory(unsigned long pfn) > > { > > phys_addr_t addr = PFN_PHYS(pfn); > > -- > > 2.44.0 > > > > -- > Sincerely yours, > Mike. >
Eric Chanudet <echanude@redhat.com> writes: > On Tue, May 21, 2024 at 07:10:07PM +0300, Mike Rapoport wrote: >> (added powerpc folks) Thanks Mike. >> On Mon, May 20, 2024 at 07:15:59PM -0400, Eric Chanudet wrote: >> > This was the behavior prior to making the function arch-specific with >> > commit ecd096506922 ("mm: make deferred init's max threads >> > arch-specific") >> > >> > Architectures can override the generic implementation that uses only one >> > CPU. Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 >> > platforms shows faster deferred_init_memmap completions: >> > >> > | | x13s | SA8775p-ride | Ampere R137-P31 | Ampere HR330 | >> > | | Metal, 32GB | VM, 36GB | VM, 58GB | Metal, 128GB | >> > | | 8cpus | 8cpus | 8cpus | 32cpus | >> > |---------|-------------|--------------|-----------------|--------------| >> > | threads | ms (%) | ms (%) | ms (%) | ms (%) | >> > |---------|-------------|--------------|-----------------|--------------| >> > | 1 | 108 (0%) | 72 (0%) | 224 (0%) | 324 (0%) | >> > | cpus | 24 (-77%) | 36 (-50%) | 40 (-82%) | 56 (-82%) | How did you measure this, just some printks in page_alloc_init_late() or something more sophisticated? Just so I can do some comparable measurements. >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c >> > index 9b5ab6818f7f..71f5188fe63d 100644 >> > --- a/arch/arm64/mm/init.c >> > +++ b/arch/arm64/mm/init.c >> > @@ -158,6 +158,13 @@ static void __init zone_sizes_init(void) >> > free_area_init(max_zone_pfns); >> > } >> > >> > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT >> > +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask) >> > +{ >> > + return max_t(int, cpumask_weight(node_cpumask), 1); >> > +} >> > +#endif >> > + >> >> Maybe we should make this default and let architectures that want a single >> thread override deferred_page_init_max_threads() to return 1? > > It would affect more archs than I can try this on. Currently, only x86 > (with this change, arm64) return more than one thread. I can test powerpc and we can find someone to test s390. No other arches have it enabled in their defconfig. > I'm happy to send a v2 inverting the logic if you find it preferable. That seems preferable. It's a scalability feature, it makes no sense for the default to be a single thread AFAICS. cheers
On Wed, May 22, 2024 at 11:41:07PM +1000, Michael Ellerman wrote: > Eric Chanudet <echanude@redhat.com> writes: > > On Tue, May 21, 2024 at 07:10:07PM +0300, Mike Rapoport wrote: > >> (added powerpc folks) > > Thanks Mike. > > >> On Mon, May 20, 2024 at 07:15:59PM -0400, Eric Chanudet wrote: > >> > This was the behavior prior to making the function arch-specific with > >> > commit ecd096506922 ("mm: make deferred init's max threads > >> > arch-specific") > >> > > >> > Architectures can override the generic implementation that uses only one > >> > CPU. Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 > >> > platforms shows faster deferred_init_memmap completions: > >> > > >> > | | x13s | SA8775p-ride | Ampere R137-P31 | Ampere HR330 | > >> > | | Metal, 32GB | VM, 36GB | VM, 58GB | Metal, 128GB | > >> > | | 8cpus | 8cpus | 8cpus | 32cpus | > >> > |---------|-------------|--------------|-----------------|--------------| > >> > | threads | ms (%) | ms (%) | ms (%) | ms (%) | > >> > |---------|-------------|--------------|-----------------|--------------| > >> > | 1 | 108 (0%) | 72 (0%) | 224 (0%) | 324 (0%) | > >> > | cpus | 24 (-77%) | 36 (-50%) | 40 (-82%) | 56 (-82%) | > > How did you measure this, just some printks in page_alloc_init_late() or > something more sophisticated? Just so I can do some comparable measurements. I used the existing pr_info in deferred_init_memmap(). > >> > diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c > >> > index 9b5ab6818f7f..71f5188fe63d 100644 > >> > --- a/arch/arm64/mm/init.c > >> > +++ b/arch/arm64/mm/init.c > >> > @@ -158,6 +158,13 @@ static void __init zone_sizes_init(void) > >> > free_area_init(max_zone_pfns); > >> > } > >> > > >> > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > >> > +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask) > >> > +{ > >> > + return max_t(int, cpumask_weight(node_cpumask), 1); > >> > +} > >> > +#endif > >> > + > >> > >> Maybe we should make this default and let architectures that want a single > >> thread override deferred_page_init_max_threads() to return 1? > > > > It would affect more archs than I can try this on. Currently, only x86 > > (with this change, arm64) return more than one thread. > > I can test powerpc and we can find someone to test s390. No other > arches have it enabled in their defconfig. Many thanks! > > I'm happy to send a v2 inverting the logic if you find it preferable. > > That seems preferable. It's a scalability feature, it makes no sense for > the default to be a single thread AFAICS. Understood, I'll respin.
diff --git a/arch/arm64/mm/init.c b/arch/arm64/mm/init.c index 9b5ab6818f7f..71f5188fe63d 100644 --- a/arch/arm64/mm/init.c +++ b/arch/arm64/mm/init.c @@ -158,6 +158,13 @@ static void __init zone_sizes_init(void) free_area_init(max_zone_pfns); } +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +int __init deferred_page_init_max_threads(const struct cpumask *node_cpumask) +{ + return max_t(int, cpumask_weight(node_cpumask), 1); +} +#endif + int pfn_is_map_memory(unsigned long pfn) { phys_addr_t addr = PFN_PHYS(pfn);
This was the behavior prior to making the function arch-specific with commit ecd096506922 ("mm: make deferred init's max threads arch-specific") Architectures can override the generic implementation that uses only one CPU. Setting DEFERRED_STRUCT_PAGE_INIT and testing on a few arm64 platforms shows faster deferred_init_memmap completions: | | x13s | SA8775p-ride | Ampere R137-P31 | Ampere HR330 | | | Metal, 32GB | VM, 36GB | VM, 58GB | Metal, 128GB | | | 8cpus | 8cpus | 8cpus | 32cpus | |---------|-------------|--------------|-----------------|--------------| | threads | ms (%) | ms (%) | ms (%) | ms (%) | |---------|-------------|--------------|-----------------|--------------| | 1 | 108 (0%) | 72 (0%) | 224 (0%) | 324 (0%) | | cpus | 24 (-77%) | 36 (-50%) | 40 (-82%) | 56 (-82%) | Signed-off-by: Eric Chanudet <echanude@redhat.com> --- arch/arm64/mm/init.c | 7 +++++++ 1 file changed, 7 insertions(+)