Message ID | 20241007182315.401167-1-ziy@nvidia.com (mailing list archive) |
---|---|
State | New |
Series | [RFC] mm: avoid clearing user movable page twice with init_on_alloc=1 | expand |
On 07.10.24 20:23, Zi Yan wrote:
> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
> init_on_free=1 boot options") forces allocated page to be cleared in
> post_alloc_hook() when init_on_alloc=1.
>
> For non PMD folios, if arch does not define
> vma_alloc_zeroed_movable_folio(), the default implementation again clears
> the page returned from the buddy allocator. So the page is cleared twice.
> Fix it by passing __GFP_ZERO instead to avoid double page clearing.
> At the moment, s390, arm64, x86, alpha, m68k are not impacted since they
> define their own vma_alloc_zeroed_movable_folio().
>
> For PMD folios, folio_zero_user() is called to clear the folio again.
> Fix it by calling folio_zero_user() only if init_on_alloc is not set.
> All arch are impacted.
>
> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  include/linux/highmem.h | 14 ++------------
>  mm/huge_memory.c        |  4 +++-
>  2 files changed, 5 insertions(+), 13 deletions(-)
>
> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 930a591b9b61..4b15224842e1 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>   * Return: A folio containing one allocated and zeroed page or NULL if
>   * we are out of memory.
>   */
> -static inline
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> -				   unsigned long vaddr)
> -{
> -	struct folio *folio;
> -
> -	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
> -	if (folio)
> -		clear_user_highpage(&folio->page, vaddr);
> -
> -	return folio;
> -}
> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> +	vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
>  #endif
>
>  static inline void clear_highpage(struct page *page)
> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
> index a7b05f4c2a5e..ff746151896f 100644
> --- a/mm/huge_memory.c
> +++ b/mm/huge_memory.c
> @@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>  		goto release;
>  	}
>
> -	folio_zero_user(folio, vmf->address);
> +	if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> +			&init_on_alloc))
> +		folio_zero_user(folio, vmf->address);
>  	/*
>  	 * The memory barrier inside __folio_mark_uptodate makes sure that
>  	 * folio_zero_user writes become visible before the set_pmd_at()

I remember we discussed that in the past and that we do *not* want to
sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.

Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There
is the slight chance that we zero-out when we're not going to use the
allocated folio, but ... that can happen either way even with the current
code?
On 8 Oct 2024, at 4:26, David Hildenbrand wrote:

> On 07.10.24 20:23, Zi Yan wrote:
>> Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
>> init_on_free=1 boot options") forces allocated page to be cleared in
>> post_alloc_hook() when init_on_alloc=1.
>>
>> For non PMD folios, if arch does not define
>> vma_alloc_zeroed_movable_folio(), the default implementation again clears
>> the page returned from the buddy allocator. So the page is cleared twice.
>> Fix it by passing __GFP_ZERO instead to avoid double page clearing.
>> At the moment, s390, arm64, x86, alpha, m68k are not impacted since they
>> define their own vma_alloc_zeroed_movable_folio().
>>
>> For PMD folios, folio_zero_user() is called to clear the folio again.
>> Fix it by calling folio_zero_user() only if init_on_alloc is not set.
>> All arch are impacted.
>>
>> Signed-off-by: Zi Yan <ziy@nvidia.com>
>> ---
>>  include/linux/highmem.h | 14 ++------------
>>  mm/huge_memory.c        |  4 +++-
>>  2 files changed, 5 insertions(+), 13 deletions(-)
>>
>> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
>> index 930a591b9b61..4b15224842e1 100644
>> --- a/include/linux/highmem.h
>> +++ b/include/linux/highmem.h
>> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>>   * Return: A folio containing one allocated and zeroed page or NULL if
>>   * we are out of memory.
>>   */
>> -static inline
>> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
>> -				   unsigned long vaddr)
>> -{
>> -	struct folio *folio;
>> -
>> -	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
>> -	if (folio)
>> -		clear_user_highpage(&folio->page, vaddr);
>> -
>> -	return folio;
>> -}
>> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
>> +	vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
>>  #endif
>>  static inline void clear_highpage(struct page *page)
>> diff --git a/mm/huge_memory.c b/mm/huge_memory.c
>> index a7b05f4c2a5e..ff746151896f 100644
>> --- a/mm/huge_memory.c
>> +++ b/mm/huge_memory.c
>> @@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
>>  		goto release;
>>  	}
>> -	folio_zero_user(folio, vmf->address);
>> +	if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
>> +			&init_on_alloc))
>> +		folio_zero_user(folio, vmf->address);
>>  	/*
>>  	 * The memory barrier inside __folio_mark_uptodate makes sure that
>>  	 * folio_zero_user writes become visible before the set_pmd_at()
>
> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>
> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?

I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
folio_zero_user() uses vmf->address to improve cache performance by changing
subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
optimization. To keep it, vmf->address will need to be passed to allocation
code. Maybe that is acceptable?

Best Regards,
Yan, Zi
On 10/8/24 13:52, Zi Yan wrote:
> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>
>>
>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>
>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>
> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but

Create some nice inline wrapper for the test and it will look less ugly? :)

> folio_zero_user() uses vmf->address to improve cache performance by changing
> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
> optimization. To keep it, vmf->address will need to be passed to allocation
> code. Maybe that is acceptable?

I'd rather not change the page allocation code for this...

> Best Regards,
> Yan, Zi
On 08.10.24 14:57, Vlastimil Babka wrote:
> On 10/8/24 13:52, Zi Yan wrote:
>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>
>>>
>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>
>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>
>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>
> Create some nice inline wrapper for the test and it will look less ugly? :)
>
>> folio_zero_user() uses vmf->address to improve cache performance by changing
>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
>> optimization. To keep it, vmf->address will need to be passed to allocation
>> code. Maybe that is acceptable?
>
> I'd rather not change the page allocation code for this...

Although I'm curious if that optimization from 2017 is still valuable :)
On 8 Oct 2024, at 9:06, David Hildenbrand wrote:

> On 08.10.24 14:57, Vlastimil Babka wrote:
>> On 10/8/24 13:52, Zi Yan wrote:
>>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>>
>>>>
>>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>>
>>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>>
>>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>>
>> Create some nice inline wrapper for the test and it will look less ugly? :)

Something like this?

static inline bool alloc_zeroed(void)
{
	return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
			&init_on_alloc);
}

I missed another folio_zero_user() caller in alloc_anon_folio() for mTHP.
So both PMD THP and mTHP are zeroed twice for all arch.

Adding Ryan for mTHP.

>>
>>> folio_zero_user() uses vmf->address to improve cache performance by changing
>>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>>> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
>>> optimization. To keep it, vmf->address will need to be passed to allocation
>>> code. Maybe that is acceptable?
>>
>> I'd rather not change the page allocation code for this...
>
> Although I'm curious if that optimization from 2017 is still valuable :)

Maybe Ying can give some insight on this.

Do we need some general guidance on who is responsible for zeroing
allocated folios? Should people use GFP_ZERO instead of zeroing by
themselves if possible?

Best Regards,
Yan, Zi
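The double clearing and the effect of the proposed wrapper can be modelled in user space. The following is only a sketch under stated assumptions, not kernel code: `init_on_alloc` is a plain `bool` standing in for the kernel's static key, `mock_alloc()` and `alloc_user_buffer()` are hypothetical stand-ins for the buddy allocator and a caller that needs zeroed memory, and `zero_passes` is a counter added purely to make the number of clearing passes observable.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>
#include <string.h>

/* Stand-in for the init_on_alloc boot option (a static key in the kernel). */
static bool init_on_alloc;

/* Counts whole-buffer clearing passes so double clearing is observable. */
static unsigned int zero_passes;

/* User-space mock of the wrapper proposed in the thread. */
static inline bool alloc_zeroed(void)
{
	return init_on_alloc;
}

/* Hypothetical allocator: clears the buffer itself when init_on_alloc is on. */
static void *mock_alloc(size_t size)
{
	void *p = malloc(size);

	if (p && init_on_alloc) {
		memset(p, 0, size);
		zero_passes++;
	}
	return p;
}

/*
 * Hypothetical caller that needs zeroed memory. It clears the buffer
 * only when the allocator has not already done so, which is the check
 * the patch adds around folio_zero_user().
 */
static void *alloc_user_buffer(size_t size)
{
	void *p = mock_alloc(size);

	if (p && !alloc_zeroed()) {
		memset(p, 0, size);
		zero_passes++;
	}
	return p;
}
```

With the check in place, `alloc_user_buffer()` performs exactly one clearing pass whether `init_on_alloc` is on or off; without the `!alloc_zeroed()` test it would perform two passes when the option is on.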
Zi Yan <ziy@nvidia.com> writes:

> On 8 Oct 2024, at 9:06, David Hildenbrand wrote:
>
>> On 08.10.24 14:57, Vlastimil Babka wrote:
>>> On 10/8/24 13:52, Zi Yan wrote:
>>>> On 8 Oct 2024, at 4:26, David Hildenbrand wrote:
>>>>
>>>>>
>>>>> I remember we discussed that in the past and that we do *not* want to sprinkle these CONFIG_INIT_ON_ALLOC_DEFAULT_ON checks all over the kernel.
>>>>>
>>>>> Ideally, we'd use GFP_ZERO and have the buddy just do that for us? There is the slight chance that we zero-out when we're not going to use the allocated folio, but ... that can happen either way even with the current code?
>>>>
>>>> I agree that putting CONFIG_INIT_ON_ALLOC_DEFAULT_ON here is not ideal, but
>>>
>>> Create some nice inline wrapper for the test and it will look less ugly? :)
>
> Something like this?
>
> static inline bool alloc_zeroed(void)
> {
> 	return static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
> 			&init_on_alloc);
> }
>
>
> I missed another folio_zero_user() caller in alloc_anon_folio() for mTHP.
> So both PMD THP and mTHP are zeroed twice for all arch.
>
> Adding Ryan for mTHP.
>
>>>
>>>> folio_zero_user() uses vmf->address to improve cache performance by changing
>>>> subpage clearing order. See commit c79b57e462b5 ("mm: hugetlb: clear target
>>>> sub-page last when clearing huge page"). If we use GFP_ZERO, we lose this
>>>> optimization. To keep it, vmf->address will need to be passed to allocation
>>>> code. Maybe that is acceptable?
>>>
>>> I'd rather not change the page allocation code for this...
>>
>> Although I'm curious if that optimization from 2017 is still valuable :)
>
> Maybe Ying can give some insight on this.

I guess the optimization still applies now. Although the size of the
per-core (thread) last level cache increases, it's still quite common for
it to be smaller than the size of THP. And since the sizes of L1/L2 are
significantly smaller, the likelihood for the accessed cache line to be
in L1/L2/LLC increases with the optimization.

--
Best Regards,
Huang, Ying
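The "clear target sub-page last" ordering discussed in this sub-thread can be illustrated in user space. This is a simplified sketch, not the kernel's folio_zero_user() implementation: `clear_order()` and `zero_target_last()` are hypothetical helpers, and the rule used here (walk inward from both ends, always taking the end farther from the target) is just one straightforward way to make the target sub-page the most recently written one.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/*
 * Fill order[0..n-1] with sub-page indices so that the target index is
 * emitted last and indices farther from the target are emitted earlier.
 * Hypothetical helper for illustration only.
 */
static void clear_order(size_t n, size_t target, size_t *order)
{
	size_t lo = 0, hi, k = 0;

	assert(n > 0 && target < n);
	hi = n - 1;

	/* Invariant: lo <= target <= hi. Take the end farther from target. */
	while (lo < hi) {
		if (target - lo >= hi - target)
			order[k++] = lo++;
		else
			order[k++] = hi--;
	}
	order[k] = lo;	/* lo == hi == target at this point */
}

/*
 * Zero n sub-pages of subpage_size bytes each in the order computed
 * above, so the sub-page about to be accessed is written last and is
 * the most likely to still be cache-hot afterwards.
 */
static void zero_target_last(unsigned char *buf, size_t n,
			     size_t subpage_size, size_t target,
			     size_t *order)
{
	size_t i;

	clear_order(n, target, order);
	for (i = 0; i < n; i++)
		memset(buf + order[i] * subpage_size, 0, subpage_size);
}
```

For eight sub-pages with target index 2, this sketch visits the indices in the order 7, 6, 5, 0, 4, 1, 3, 2. Zeroing through `__GFP_ZERO` in the allocator has no fault address to work from, which is why the thread treats the two approaches as a trade-off.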
Zi Yan <ziy@nvidia.com> writes:

[snip]

> diff --git a/include/linux/highmem.h b/include/linux/highmem.h
> index 930a591b9b61..4b15224842e1 100644
> --- a/include/linux/highmem.h
> +++ b/include/linux/highmem.h
> @@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
>   * Return: A folio containing one allocated and zeroed page or NULL if
>   * we are out of memory.
>   */
> -static inline
> -struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
> -				   unsigned long vaddr)
> -{
> -	struct folio *folio;
> -
> -	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
> -	if (folio)
> -		clear_user_highpage(&folio->page, vaddr);
> -
> -	return folio;
> -}
> +#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
> +	vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)

Although it is just one line, I still prefer to use an inline function
instead of a macro here. Not a strong opinion.

>  #endif

[snip]

--
Best Regards,
Huang, Ying
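The inline-function form suggested here might look roughly like the sketch below. It is a user-space mock written so that it compiles on its own, not the kernel definition: the `gfp_t` type, the `MOCK_GFP_*` flag values, `struct folio` and the stub `vma_alloc_folio()` are all assumptions made for illustration, with the stub only recording which flags it was called with.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdlib.h>

/* Mock flag type and values; the numbers are made up for this sketch. */
typedef unsigned int gfp_t;
#define MOCK_GFP_HIGHUSER_MOVABLE 0x1u	/* stands in for GFP_HIGHUSER_MOVABLE */
#define MOCK_GFP_ZERO             0x2u	/* stands in for __GFP_ZERO */

struct vm_area_struct;			/* opaque in this sketch */

struct folio {
	gfp_t gfp_used;			/* records the flags of the allocation */
};

/* Stub allocator with the same parameter list as in the quoted patch. */
static struct folio *vma_alloc_folio(gfp_t gfp, int order,
				     struct vm_area_struct *vma,
				     unsigned long addr, bool hugepage)
{
	struct folio *folio = calloc(1, sizeof(*folio));

	(void)order;
	(void)vma;
	(void)addr;
	(void)hugepage;
	if (folio)
		folio->gfp_used = gfp;
	return folio;
}

/*
 * Inline-function variant of the one-line macro: same behaviour, but
 * the arguments and the return value are type-checked by the compiler.
 */
static inline struct folio *
vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma, unsigned long vaddr)
{
	return vma_alloc_folio(MOCK_GFP_HIGHUSER_MOVABLE | MOCK_GFP_ZERO,
			       0, vma, vaddr, false);
}
```

Compared with the macro, the function gives the compiler a prototype to check callers against and evaluates each argument exactly once, at no runtime cost when it is inlined.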
diff --git a/include/linux/highmem.h b/include/linux/highmem.h
index 930a591b9b61..4b15224842e1 100644
--- a/include/linux/highmem.h
+++ b/include/linux/highmem.h
@@ -220,18 +220,8 @@ static inline void clear_user_highpage(struct page *page, unsigned long vaddr)
  * Return: A folio containing one allocated and zeroed page or NULL if
  * we are out of memory.
  */
-static inline
-struct folio *vma_alloc_zeroed_movable_folio(struct vm_area_struct *vma,
-				   unsigned long vaddr)
-{
-	struct folio *folio;
-
-	folio = vma_alloc_folio(GFP_HIGHUSER_MOVABLE, 0, vma, vaddr, false);
-	if (folio)
-		clear_user_highpage(&folio->page, vaddr);
-
-	return folio;
-}
+#define vma_alloc_zeroed_movable_folio(vma, vaddr) \
+	vma_alloc_folio(GFP_HIGHUSER_MOVABLE | __GFP_ZERO, 0, vma, vaddr, false)
 #endif
 
 static inline void clear_highpage(struct page *page)
diff --git a/mm/huge_memory.c b/mm/huge_memory.c
index a7b05f4c2a5e..ff746151896f 100644
--- a/mm/huge_memory.c
+++ b/mm/huge_memory.c
@@ -1177,7 +1177,9 @@ static vm_fault_t __do_huge_pmd_anonymous_page(struct vm_fault *vmf,
 		goto release;
 	}
 
-	folio_zero_user(folio, vmf->address);
+	if (!static_branch_maybe(CONFIG_INIT_ON_ALLOC_DEFAULT_ON,
+			&init_on_alloc))
+		folio_zero_user(folio, vmf->address);
 	/*
 	 * The memory barrier inside __folio_mark_uptodate makes sure that
 	 * folio_zero_user writes become visible before the set_pmd_at()
Commit 6471384af2a6 ("mm: security: introduce init_on_alloc=1 and
init_on_free=1 boot options") forces allocated page to be cleared in
post_alloc_hook() when init_on_alloc=1.

For non PMD folios, if arch does not define
vma_alloc_zeroed_movable_folio(), the default implementation again clears
the page returned from the buddy allocator. So the page is cleared twice.
Fix it by passing __GFP_ZERO instead to avoid double page clearing.
At the moment, s390, arm64, x86, alpha, m68k are not impacted since they
define their own vma_alloc_zeroed_movable_folio().

For PMD folios, folio_zero_user() is called to clear the folio again.
Fix it by calling folio_zero_user() only if init_on_alloc is not set.
All arch are impacted.

Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 include/linux/highmem.h | 14 ++------------
 mm/huge_memory.c        |  4 +++-
 2 files changed, 5 insertions(+), 13 deletions(-)