Message ID | 20220825063102.92307-1-lizhe.67@bytedance.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | [v3] page_ext: introduce boot parameter 'early_page_ext' | expand |
On Thu 25-08-22 14:31:02, lizhe.67@bytedance.com wrote: > From: Li Zhe <lizhe.67@bytedance.com> > > In 'commit 2f1ee0913ce5 ("Revert "mm: use early_pfn_to_nid in page_ext_init"")', > we call page_ext_init() after page_alloc_init_late() to avoid some panic > problem. It seems that we cannot track early page allocations in current > kernel even if page structure has been initialized early. > > This patch introduce a new boot parameter 'early_page_ext' to resolve this > problem. If we pass it to kernel, function page_ext_init() will be moved > up and feature 'deferred initialization of struct pages' will be disabled. will be disabled to initialize the page allocator early and prevent from the OOM mentioned above. > It can help us to catch early page allocations. This is useful especially > when we find that the free memory value is not the same right after > different kernel booting. > > Changelogs: > > v1->v2: > - use a cmd line parameter to move up function page_ext_init() instead of > using CONFIG_DEFERRED_STRUCT_PAGE_INIT > - fix oom problem[1] > > v2->v3: > - make adjustments suggested by Michal Hocko > > v1 patch: https://lore.kernel.org/lkml/Yv3r6Y1vh+6AbY4+@dhcp22.suse.cz/T/ > v2 patch: https://lore.kernel.org/lkml/20220824065058.81051-1-lizhe.67@bytedance.com/T/ > > [1]: https://lore.kernel.org/linux-mm/YwHmXLu5txij+p35@xsang-OptiPlex-9020/ the changelog is usually not part of the changelog and goes under --- > > Suggested-by: Michal Hocko <mhocko@suse.com> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com> I still have few comments below before I am going to ack. But this looks much better already. > --- > Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ > include/linux/page_ext.h | 11 +++++++++++ > init/main.c | 6 +++++- > mm/page_alloc.c | 2 ++ > mm/page_ext.c | 12 ++++++++++++ > 5 files changed, 36 insertions(+), 1 deletion(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index d7f30902fda0..7b5726828ac0 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -1471,6 +1471,12 @@ > Permit 'security.evm' to be updated regardless of > current integrity status. > > + early_page_ext [KNL] Boot-time early page_ext initializing option. > + This boot parameter disables the deferred initialization > + of struct page and move up function page_ext_init() in > + order to catch early page allocations. Available with > + CONFIG_PAGE_EXTENSION=y. For admins it would likely be more easier to understand something like following early_page_ext [KNL] Enforces page_ext initialization to earlier stages so cover more early boot allocations. Please note that as side effect some optimizations might be disabled to achieve that (e.g. parallelized memory initialization is disabled) so the boot process might take longer, especially on systems with a lot of memory. Available with CONFIG_PAGE_EXTENSION=y [...] > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 3dc715d7ac29..bf4f2a12d7dc 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -85,6 +85,18 @@ unsigned long page_ext_size = sizeof(struct page_ext); > > static unsigned long total_usage; > > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > +bool early_page_ext __meminitdata; > +#else > +bool early_page_ext __meminitdata = true; > +#endif Why should default depend on DEFERRED_STRUCT_PAGE_INIT at all. This is just confusing and I do not see how it serves a purpose. We might grow more optimizations which would prefent early page_ext init. Let's just have default false and only enforce with the parameter. This is more predictable and easier to understand.
On 25 Aug 2022 09:11:29 +0200, mhocko@suse.com wrote: >On Thu 25-08-22 14:31:02, lizhe.67@bytedance.com wrote: >> From: Li Zhe <lizhe.67@bytedance.com> >> >> In 'commit 2f1ee0913ce5 ("Revert "mm: use early_pfn_to_nid in page_ext_init"")', >> we call page_ext_init() after page_alloc_init_late() to avoid some panic >> problem. It seems that we cannot track early page allocations in current >> kernel even if page structure has been initialized early. >> >> This patch introduce a new boot parameter 'early_page_ext' to resolve this >> problem. If we pass it to kernel, function page_ext_init() will be moved >> up and feature 'deferred initialization of struct pages' will be disabled. > >will be disabled to initialize the page allocator early and prevent from >the OOM mentioned above. > >> It can help us to catch early page allocations. This is useful especially >> when we find that the free memory value is not the same right after >> different kernel booting. >> >> Changelogs: >> >> v1->v2: >> - use a cmd line parameter to move up function page_ext_init() instead of >> using CONFIG_DEFERRED_STRUCT_PAGE_INIT >> - fix oom problem[1] >> >> v2->v3: >> - make adjustments suggested by Michal Hocko >> >> v1 patch: https://lore.kernel.org/lkml/Yv3r6Y1vh+6AbY4+@dhcp22.suse.cz/T/ >> v2 patch: https://lore.kernel.org/lkml/20220824065058.81051-1-lizhe.67@bytedance.com/T/ >> >> [1]: https://lore.kernel.org/linux-mm/YwHmXLu5txij+p35@xsang-OptiPlex-9020/ > >the changelog is usually not part of the changelog and goes under --- Thanks for correcting my mistake! >> >> Suggested-by: Michal Hocko <mhocko@suse.com> >> Signed-off-by: Li Zhe <lizhe.67@bytedance.com> > >I still have few comments below before I am going to ack. But this looks >much better already. > >> --- >> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ >> include/linux/page_ext.h | 11 +++++++++++ >> init/main.c | 6 +++++- >> mm/page_alloc.c | 2 ++ >> mm/page_ext.c | 12 ++++++++++++ >> 5 files changed, 36 insertions(+), 1 deletion(-) >> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >> index d7f30902fda0..7b5726828ac0 100644 >> --- a/Documentation/admin-guide/kernel-parameters.txt >> +++ b/Documentation/admin-guide/kernel-parameters.txt >> @@ -1471,6 +1471,12 @@ >> Permit 'security.evm' to be updated regardless of >> current integrity status. >> >> + early_page_ext [KNL] Boot-time early page_ext initializing option. >> + This boot parameter disables the deferred initialization >> + of struct page and move up function page_ext_init() in >> + order to catch early page allocations. Available with >> + CONFIG_PAGE_EXTENSION=y. > >For admins it would likely be more easier to understand something like >following > early_page_ext [KNL] Enforces page_ext initialization to earlier > stages so cover more early boot allocations. > Please note that as side effect some > optimizations might be disabled to achieve that > (e.g. parallelized memory initialization is > disabled) so the boot process might take longer, > especially on systems with a lot of memory. > Available with CONFIG_PAGE_EXTENSION=y Great, I will use this description in my v4 patch. It is much more easier for us to understand. Thanks! >[...] >> diff --git a/mm/page_ext.c b/mm/page_ext.c >> index 3dc715d7ac29..bf4f2a12d7dc 100644 >> --- a/mm/page_ext.c >> +++ b/mm/page_ext.c >> @@ -85,6 +85,18 @@ unsigned long page_ext_size = sizeof(struct page_ext); >> >> static unsigned long total_usage; >> >> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT >> +bool early_page_ext __meminitdata; >> +#else >> +bool early_page_ext __meminitdata = true; >> +#endif > >Why should default depend on DEFERRED_STRUCT_PAGE_INIT at all. This is >just confusing and I do not see how it serves a purpose. We might grow >more optimizations which would prefent early page_ext init. > >Let's just have default false and only enforce with the parameter. This >is more predictable and easier to understand. Yes, this is confusing. Without depending on DEFERRED_STRUCT_PAGE_INIT, the logic here will be more clear. I will remove it from my v4 patch. Thanks!
On 8/25/22 08:31, lizhe.67@bytedance.com wrote: > From: Li Zhe <lizhe.67@bytedance.com> > > In 'commit 2f1ee0913ce5 ("Revert "mm: use early_pfn_to_nid in page_ext_init"")', > we call page_ext_init() after page_alloc_init_late() to avoid some panic > problem. It seems that we cannot track early page allocations in current > kernel even if page structure has been initialized early. > > This patch introduce a new boot parameter 'early_page_ext' to resolve this > problem. If we pass it to kernel, function page_ext_init() will be moved > up and feature 'deferred initialization of struct pages' will be disabled. > It can help us to catch early page allocations. This is useful especially > when we find that the free memory value is not the same right after > different kernel booting. > > Changelogs: > > v1->v2: > - use a cmd line parameter to move up function page_ext_init() instead of > using CONFIG_DEFERRED_STRUCT_PAGE_INIT > - fix oom problem[1] > > v2->v3: > - make adjustments suggested by Michal Hocko > > v1 patch: https://lore.kernel.org/lkml/Yv3r6Y1vh+6AbY4+@dhcp22.suse.cz/T/ IIRC v1 failed to boot in some automatic bot test. Will this not fail with the same config/hw combination when the parameter is passed? > v2 patch: https://lore.kernel.org/lkml/20220824065058.81051-1-lizhe.67@bytedance.com/T/ > > [1]: https://lore.kernel.org/linux-mm/YwHmXLu5txij+p35@xsang-OptiPlex-9020/ > > Suggested-by: Michal Hocko <mhocko@suse.com> > Signed-off-by: Li Zhe <lizhe.67@bytedance.com> > --- > Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ > include/linux/page_ext.h | 11 +++++++++++ > init/main.c | 6 +++++- > mm/page_alloc.c | 2 ++ > mm/page_ext.c | 12 ++++++++++++ > 5 files changed, 36 insertions(+), 1 deletion(-) > > diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt > index d7f30902fda0..7b5726828ac0 100644 > --- a/Documentation/admin-guide/kernel-parameters.txt > +++ b/Documentation/admin-guide/kernel-parameters.txt > @@ -1471,6 +1471,12 @@ > Permit 'security.evm' to be updated regardless of > current integrity status. > > + early_page_ext [KNL] Boot-time early page_ext initializing option. > + This boot parameter disables the deferred initialization > + of struct page and move up function page_ext_init() in > + order to catch early page allocations. Available with > + CONFIG_PAGE_EXTENSION=y. > + > failslab= > fail_usercopy= > fail_page_alloc= > diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h > index fabb2e1e087f..68d690649234 100644 > --- a/include/linux/page_ext.h > +++ b/include/linux/page_ext.h > @@ -36,9 +36,15 @@ struct page_ext { > unsigned long flags; > }; > > +extern bool early_page_ext; > extern unsigned long page_ext_size; > extern void pgdat_page_ext_init(struct pglist_data *pgdat); > > +static inline bool early_page_ext_enable(void) > +{ > + return early_page_ext; > +} I think it should better be named early_page_ext_enabled() as it returns the status, not sets it to true? > + > #ifdef CONFIG_SPARSEMEM > static inline void page_ext_init_flatmem(void) > { > @@ -67,6 +73,11 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr) > #else /* !CONFIG_PAGE_EXTENSION */ > struct page_ext; > > +static inline bool early_page_ext_enable(void) > +{ > + return false; > +} > + > static inline void pgdat_page_ext_init(struct pglist_data *pgdat) > { > } > diff --git a/init/main.c b/init/main.c > index 91642a4e69be..d95edb67a499 100644 > --- a/init/main.c > +++ b/init/main.c > @@ -849,6 +849,9 @@ static void __init mm_init(void) > pgtable_init(); > debug_objects_mem_init(); > vmalloc_init(); > + /* Should be run after vmap initialization */ > + if (early_page_ext_enable()) > + page_ext_init(); > /* Should be run before the first non-init thread is created */ > init_espfix_bsp(); > /* Should be run after espfix64 is set up. */ > @@ -1606,7 +1609,8 @@ static noinline void __init kernel_init_freeable(void) > padata_init(); > page_alloc_init_late(); > /* Initialize page ext after all struct pages are initialized. */ > - page_ext_init(); > + if (!early_page_ext_enable()) > + page_ext_init(); > > do_basic_setup(); > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c > index e5486d47406e..e580b197aa1e 100644 > --- a/mm/page_alloc.c > +++ b/mm/page_alloc.c > @@ -482,6 +482,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn) > { > static unsigned long prev_end_pfn, nr_initialised; > > + if (early_page_ext_enable()) > + return false; > /* > * prev_end_pfn static that contains the end of previous zone > * No need to protect because called very early in boot before smp_init. > diff --git a/mm/page_ext.c b/mm/page_ext.c > index 3dc715d7ac29..bf4f2a12d7dc 100644 > --- a/mm/page_ext.c > +++ b/mm/page_ext.c > @@ -85,6 +85,18 @@ unsigned long page_ext_size = sizeof(struct page_ext); > > static unsigned long total_usage; > > +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT > +bool early_page_ext __meminitdata; > +#else > +bool early_page_ext __meminitdata = true; > +#endif > +static int __init setup_early_page_ext(char *str) > +{ > + early_page_ext = true; > + return 0; > +} > +early_param("early_page_ext", setup_early_page_ext); > + > static bool __init invoke_need_callbacks(void) > { > int i;
On 25 Aug 2022 11:36:31 +0200, vbabka@suse.cz wrote: >On 8/25/22 08:31, lizhe.67@bytedance.com wrote: >> From: Li Zhe <lizhe.67@bytedance.com> >> >> In 'commit 2f1ee0913ce5 ("Revert "mm: use early_pfn_to_nid in page_ext_init"")', >> we call page_ext_init() after page_alloc_init_late() to avoid some panic >> problem. It seems that we cannot track early page allocations in current >> kernel even if page structure has been initialized early. >> >> This patch introduce a new boot parameter 'early_page_ext' to resolve this >> problem. If we pass it to kernel, function page_ext_init() will be moved >> up and feature 'deferred initialization of struct pages' will be disabled. >> It can help us to catch early page allocations. This is useful especially >> when we find that the free memory value is not the same right after >> different kernel booting. >> >> Changelogs: >> >> v1->v2: >> - use a cmd line parameter to move up function page_ext_init() instead of >> using CONFIG_DEFERRED_STRUCT_PAGE_INIT >> - fix oom problem[1] >> >> v2->v3: >> - make adjustments suggested by Michal Hocko >> >> v1 patch: https://lore.kernel.org/lkml/Yv3r6Y1vh+6AbY4+@dhcp22.suse.cz/T/ > >IIRC v1 failed to boot in some automatic bot test. Will this not fail with >the same config/hw combination when the parameter is passed? There is no problem with or without the parameter now. > >> v2 patch: https://lore.kernel.org/lkml/20220824065058.81051-1-lizhe.67@bytedance.com/T/ >> >> [1]: https://lore.kernel.org/linux-mm/YwHmXLu5txij+p35@xsang-OptiPlex-9020/ >> >> Suggested-by: Michal Hocko <mhocko@suse.com> >> Signed-off-by: Li Zhe <lizhe.67@bytedance.com> >> --- >> Documentation/admin-guide/kernel-parameters.txt | 6 ++++++ >> include/linux/page_ext.h | 11 +++++++++++ >> init/main.c | 6 +++++- >> mm/page_alloc.c | 2 ++ >> mm/page_ext.c | 12 ++++++++++++ >> 5 files changed, 36 insertions(+), 1 deletion(-) >> >> diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt >> index d7f30902fda0..7b5726828ac0 100644 >> --- a/Documentation/admin-guide/kernel-parameters.txt >> +++ b/Documentation/admin-guide/kernel-parameters.txt >> @@ -1471,6 +1471,12 @@ >> Permit 'security.evm' to be updated regardless of >> current integrity status. >> >> + early_page_ext [KNL] Boot-time early page_ext initializing option. >> + This boot parameter disables the deferred initialization >> + of struct page and move up function page_ext_init() in >> + order to catch early page allocations. Available with >> + CONFIG_PAGE_EXTENSION=y. >> + >> failslab= >> fail_usercopy= >> fail_page_alloc= >> diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h >> index fabb2e1e087f..68d690649234 100644 >> --- a/include/linux/page_ext.h >> +++ b/include/linux/page_ext.h >> @@ -36,9 +36,15 @@ struct page_ext { >> unsigned long flags; >> }; >> >> +extern bool early_page_ext; >> extern unsigned long page_ext_size; >> extern void pgdat_page_ext_init(struct pglist_data *pgdat); >> >> +static inline bool early_page_ext_enable(void) >> +{ >> + return early_page_ext; >> +} > >I think it should better be named early_page_ext_enabled() as it returns the >status, not sets it to true? Yes you are right. I will fix it in my v4 patch. Thanks! > >> + >> #ifdef CONFIG_SPARSEMEM >> static inline void page_ext_init_flatmem(void) >> { >> @@ -67,6 +73,11 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr) >> #else /* !CONFIG_PAGE_EXTENSION */ >> struct page_ext; >> >> +static inline bool early_page_ext_enable(void) >> +{ >> + return false; >> +} >> + >> static inline void pgdat_page_ext_init(struct pglist_data *pgdat) >> { >> } >> diff --git a/init/main.c b/init/main.c >> index 91642a4e69be..d95edb67a499 100644 >> --- a/init/main.c >> +++ b/init/main.c >> @@ -849,6 +849,9 @@ static void __init mm_init(void) >> pgtable_init(); >> debug_objects_mem_init(); >> vmalloc_init(); >> + /* Should be run after vmap initialization */ >> + if (early_page_ext_enable()) >> + page_ext_init(); >> /* Should be run before the first non-init thread is created */ >> init_espfix_bsp(); >> /* Should be run after espfix64 is set up. */ >> @@ -1606,7 +1609,8 @@ static noinline void __init kernel_init_freeable(void) >> padata_init(); >> page_alloc_init_late(); >> /* Initialize page ext after all struct pages are initialized. */ >> - page_ext_init(); >> + if (!early_page_ext_enable()) >> + page_ext_init(); >> >> do_basic_setup(); >> >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index e5486d47406e..e580b197aa1e 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -482,6 +482,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn) >> { >> static unsigned long prev_end_pfn, nr_initialised; >> >> + if (early_page_ext_enable()) >> + return false; >> /* >> * prev_end_pfn static that contains the end of previous zone >> * No need to protect because called very early in boot before smp_init. >> diff --git a/mm/page_ext.c b/mm/page_ext.c >> index 3dc715d7ac29..bf4f2a12d7dc 100644 >> --- a/mm/page_ext.c >> +++ b/mm/page_ext.c >> @@ -85,6 +85,18 @@ unsigned long page_ext_size = sizeof(struct page_ext); >> >> static unsigned long total_usage; >> >> +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT >> +bool early_page_ext __meminitdata; >> +#else >> +bool early_page_ext __meminitdata = true; >> +#endif >> +static int __init setup_early_page_ext(char *str) >> +{ >> + early_page_ext = true; >> + return 0; >> +} >> +early_param("early_page_ext", setup_early_page_ext); >> + >> static bool __init invoke_need_callbacks(void) >> { >> int i;
diff --git a/Documentation/admin-guide/kernel-parameters.txt b/Documentation/admin-guide/kernel-parameters.txt index d7f30902fda0..7b5726828ac0 100644 --- a/Documentation/admin-guide/kernel-parameters.txt +++ b/Documentation/admin-guide/kernel-parameters.txt @@ -1471,6 +1471,12 @@ Permit 'security.evm' to be updated regardless of current integrity status. + early_page_ext [KNL] Boot-time early page_ext initializing option. + This boot parameter disables the deferred initialization + of struct page and move up function page_ext_init() in + order to catch early page allocations. Available with + CONFIG_PAGE_EXTENSION=y. + failslab= fail_usercopy= fail_page_alloc= diff --git a/include/linux/page_ext.h b/include/linux/page_ext.h index fabb2e1e087f..68d690649234 100644 --- a/include/linux/page_ext.h +++ b/include/linux/page_ext.h @@ -36,9 +36,15 @@ struct page_ext { unsigned long flags; }; +extern bool early_page_ext; extern unsigned long page_ext_size; extern void pgdat_page_ext_init(struct pglist_data *pgdat); +static inline bool early_page_ext_enable(void) +{ + return early_page_ext; +} + #ifdef CONFIG_SPARSEMEM static inline void page_ext_init_flatmem(void) { @@ -67,6 +73,11 @@ static inline struct page_ext *page_ext_next(struct page_ext *curr) #else /* !CONFIG_PAGE_EXTENSION */ struct page_ext; +static inline bool early_page_ext_enable(void) +{ + return false; +} + static inline void pgdat_page_ext_init(struct pglist_data *pgdat) { } diff --git a/init/main.c b/init/main.c index 91642a4e69be..d95edb67a499 100644 --- a/init/main.c +++ b/init/main.c @@ -849,6 +849,9 @@ static void __init mm_init(void) pgtable_init(); debug_objects_mem_init(); vmalloc_init(); + /* Should be run after vmap initialization */ + if (early_page_ext_enable()) + page_ext_init(); /* Should be run before the first non-init thread is created */ init_espfix_bsp(); /* Should be run after espfix64 is set up. */ @@ -1606,7 +1609,8 @@ static noinline void __init kernel_init_freeable(void) padata_init(); page_alloc_init_late(); /* Initialize page ext after all struct pages are initialized. */ - page_ext_init(); + if (!early_page_ext_enable()) + page_ext_init(); do_basic_setup(); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index e5486d47406e..e580b197aa1e 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -482,6 +482,8 @@ defer_init(int nid, unsigned long pfn, unsigned long end_pfn) { static unsigned long prev_end_pfn, nr_initialised; + if (early_page_ext_enable()) + return false; /* * prev_end_pfn static that contains the end of previous zone * No need to protect because called very early in boot before smp_init. diff --git a/mm/page_ext.c b/mm/page_ext.c index 3dc715d7ac29..bf4f2a12d7dc 100644 --- a/mm/page_ext.c +++ b/mm/page_ext.c @@ -85,6 +85,18 @@ unsigned long page_ext_size = sizeof(struct page_ext); static unsigned long total_usage; +#ifdef CONFIG_DEFERRED_STRUCT_PAGE_INIT +bool early_page_ext __meminitdata; +#else +bool early_page_ext __meminitdata = true; +#endif +static int __init setup_early_page_ext(char *str) +{ + early_page_ext = true; + return 0; +} +early_param("early_page_ext", setup_early_page_ext); + static bool __init invoke_need_callbacks(void) { int i;