[0/4] mm: page_ext: Introduce new iteration API

Message ID: cover.1739931468.git.luizcap@redhat.com

Message

Luiz Capitulino Feb. 19, 2025, 2:17 a.m. UTC
Hi,

  [ Thanks to David Hildenbrand for identifying the root cause of this
    issue and providing guidance on how to fix it. The new API idea, bugs
    and misconceptions are all mine though ]

Currently, trying to reserve 1G pages with page_owner=on and sparsemem
causes a crash. The reproducer is very simple:

 1. Build the kernel with CONFIG_SPARSEMEM=y and the page extensions
    enabled (e.g. CONFIG_PAGE_OWNER=y)
 2. Pass 'default_hugepagesz=1G page_owner=on' in the kernel command-line
 3. Reserve one 1G page at run-time; this should crash (see patch 1 for
    the backtrace)

 [ A crash with page_table_check is also possible, but harder to trigger ]

Apparently, starting with commit cf54f310d0d3 ("mm/hugetlb: use __GFP_COMP
for gigantic folios") we now pass the full allocation order to the page
extension clients, but the page extension implementation assumes that all
PFNs of an allocation range are stored in the same memory section (which
is not true for 1G pages).
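
The broken pattern looks roughly like this (a simplified sketch, not the
actual page_owner/page_table_check code; record_range() and the per-page
update are stand-ins):

/*
 * Simplified sketch of the broken pattern: page_ext is looked up once
 * for the head page and then advanced with page_ext_next(), which only
 * adds page_ext_size to the pointer.  With SPARSEMEM the page_ext
 * objects are allocated per memory section, so once the loop crosses a
 * section boundary (which a 1G range always does) the pointer no longer
 * refers to a valid page_ext object.
 */
static void record_range(struct page *page, unsigned int order)
{
	struct page_ext *page_ext = page_ext_get(page);
	struct page_ext *cur = page_ext;
	unsigned long i;

	if (unlikely(!page_ext))
		return;

	for (i = 0; i < (1UL << order); i++) {
		/* ... update this client's data for the current page ... */
		cur = page_ext_next(cur);	/* never redoes the lookup */
	}
	page_ext_put(page_ext);
}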

To fix this, this series introduces a new iteration API for page extension
objects. The API checks whether the next page extension object can be
retrieved from the current section or whether it has to be looked up in
another section.
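
As a rough sketch of the idea (the names and layout below are placeholders,
see patch 1 for the real interface), the iterator keeps a PFN cursor next
to the page_ext pointer and only redoes the full lookup when the cursor
crosses a section boundary:

/*
 * Rough sketch only: the iterator type and function names are
 * placeholders for the API added in patch 1, and this would live in
 * mm/page_ext.c where lookup_page_ext() is available.  Within a section
 * the next page_ext object is adjacent in memory, so page_ext_next() is
 * enough; on a section boundary the object has to be looked up again.
 */
struct page_ext_iter {
	unsigned long pfn;
	struct page_ext *page_ext;
};

static struct page_ext *page_ext_iter_begin(struct page_ext_iter *iter,
					    struct page *page)
{
	iter->pfn = page_to_pfn(page);
	iter->page_ext = page_ext_get(page);
	return iter->page_ext;
}

static struct page_ext *page_ext_iter_next(struct page_ext_iter *iter)
{
	iter->pfn++;
	if (IS_ALIGNED(iter->pfn, PAGES_PER_SECTION))
		/* Crossed into a new memory section: redo the lookup. */
		iter->page_ext = lookup_page_ext(pfn_to_page(iter->pfn));
	else
		/* Same section: the next object is adjacent in memory. */
		iter->page_ext = page_ext_next(iter->page_ext);
	return iter->page_ext;
}

The for_each_page_ext() macros from patch 1 would wrap a pair like this,
so that callers such as page_owner and page_table_check never have to
care about section boundaries themselves.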

All the details are in patch 1. Also, this series is against Linus' tree,
commit 2408a807bfc3f738850ef5ad5e3fd59d66168996.

RFC -> v1
=========

- Revamped the API by introducing for_each_page_ext macros
- Implemented various suggestions from David Hildenbrand, including page_ext
  lookup optimization
- Fixed changelogs

Luiz Capitulino (4):
  mm: page_ext: add an iteration API for page extensions
  mm: page_table_check: use new iteration API
  mm: page_owner: use new iteration API
  mm: page_ext: make page_ext_next() private to page_ext

 include/linux/page_ext.h | 67 +++++++++++++++++++++++++++++++++++++---
 mm/page_ext.c            | 48 ++++++++++++++++++++++++++++
 mm/page_owner.c          | 61 +++++++++++++++++-------------------
 mm/page_table_check.c    | 39 +++++++----------------
 4 files changed, 152 insertions(+), 63 deletions(-)

Comments

Andrew Morton Feb. 19, 2025, 11:52 p.m. UTC | #1
On Tue, 18 Feb 2025 21:17:46 -0500 Luiz Capitulino <luizcap@redhat.com> wrote:

> To fix this, this series introduces a new iteration API for page extension
> objects. The API checks if the next page extension object can be retrieved
> from the current section or if it needs to look up for it in another
> section.
> 
> ...

A regression since 6.12, so we should backport the fix.

> ...
>
>  include/linux/page_ext.h | 67 +++++++++++++++++++++++++++++++++++++---
>  mm/page_ext.c            | 48 ++++++++++++++++++++++++++++
>  mm/page_owner.c          | 61 +++++++++++++++++-------------------
>  mm/page_table_check.c    | 39 +++++++----------------
>  4 files changed, 152 insertions(+), 63 deletions(-)

That's a lot to backport!

Is there some quick-n-dirty fixup we can apply for the sake of -stable
kernels, then work on this long-term approach for future kernels?
David Hildenbrand Feb. 20, 2025, 10:49 a.m. UTC | #2
On 20.02.25 00:52, Andrew Morton wrote:
> On Tue, 18 Feb 2025 21:17:46 -0500 Luiz Capitulino <luizcap@redhat.com> wrote:
> 
>> To fix this, this series introduces a new iteration API for page extension
>> objects. The API checks if the next page extension object can be retrieved
>> from the current section or if it needs to look up for it in another
>> section.
>>
>> ...
> 
> A regression since 6.12, so we should backport the fix.
> 
>> ...
>>
>>   include/linux/page_ext.h | 67 +++++++++++++++++++++++++++++++++++++---
>>   mm/page_ext.c            | 48 ++++++++++++++++++++++++++++
>>   mm/page_owner.c          | 61 +++++++++++++++++-------------------
>>   mm/page_table_check.c    | 39 +++++++----------------
>>   4 files changed, 152 insertions(+), 63 deletions(-)
> 
> That's a lot to backport!
> 
> Is there some quick-n-dirty fixup we can apply for the sake of -stable
> kernels, then work on this long-term approach for future kernels?

I assume we could loop in 
reset_page_owner()/page_table_check_free()/set_page_owner()/page_table_check_alloc(). 
Not-so-nice for upstream, maybe good-enough for stable. Still nasty :)
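
Roughly something like this for __set_page_owner(), with set_one_page_owner()
standing in for the existing per-page update (untested):

/*
 * Untested sketch; set_one_page_owner() is a hypothetical stand-in for
 * the existing per-page update.  The range is walked one page at a time
 * and the page_ext lookup is redone for every page, so page_ext_next()
 * and its single-section assumption are not used at all.
 */
noinline void __set_page_owner(struct page *page, unsigned short order,
			       gfp_t gfp_mask)
{
	unsigned long pfn = page_to_pfn(page);
	unsigned long end = pfn + (1UL << order);

	for (; pfn < end; pfn++) {
		struct page_ext *page_ext = page_ext_get(pfn_to_page(pfn));

		if (unlikely(!page_ext))
			continue;
		set_one_page_owner(page_ext, order, gfp_mask);
		page_ext_put(page_ext);
	}
}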

OTOH, we don't really expect a lot of conflicts.
Luiz Capitulino Feb. 20, 2025, 8:23 p.m. UTC | #3
On 2025-02-20 05:49, David Hildenbrand wrote:
> On 20.02.25 00:52, Andrew Morton wrote:
>> On Tue, 18 Feb 2025 21:17:46 -0500 Luiz Capitulino <luizcap@redhat.com> wrote:
>>
>>> To fix this, this series introduces a new iteration API for page extension
>>> objects. The API checks if the next page extension object can be retrieved
>>> from the current section or if it needs to look up for it in another
>>> section.
>>>
>>> ...
>>
>> A regression since 6.12, so we should backport the fix.
>>
>>> ...
>>>
>>>   include/linux/page_ext.h | 67 +++++++++++++++++++++++++++++++++++++---
>>>   mm/page_ext.c            | 48 ++++++++++++++++++++++++++++
>>>   mm/page_owner.c          | 61 +++++++++++++++++-------------------
>>>   mm/page_table_check.c    | 39 +++++++----------------
>>>   4 files changed, 152 insertions(+), 63 deletions(-)
>>
>> That's a lot to backport!
>>
>> Is there some quick-n-dirty fixup we can apply for the sake of -stable
>> kernels, then work on this long-term approach for future kernels?
> 
> I assume we could loop in reset_page_owner()/page_table_check_free()/set_page_owner()/page_table_check_alloc(). Not-so-nice for upstream, maybe good-enough for stable. Still nasty :)

I think Andrew wants to have the quick-n-dirty fix for upstream, so that
it's easier to backport to -stable. Then we work on this solution on top.

> OTOH, we don't really expect a lot of conflicts.

Yes, I was able to apply this series on top of 6.12.15 without conflicts.
Given that -stable does backport a lot of fixes anyway, I would push for
having this series in -stable.

But just to answer the original question: I can't think of a quick-n-dirty
fix, but I can think of an easy-n-ugly one:

  1. We could add a MAX_PAGE_ORDER check to the first function in each
     call chain that ends up calling page_ext_next() (that is, bail out
     if order > MAX_PAGE_ORDER); a rough sketch is below

  2. We could replace all page_ext_next() calls with a version of
     lookup_page_ext() that takes a PFN
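
A rough sketch of idea 1 (untested, and where exactly the check would live
is a guess):

/*
 * Untested sketch of idea 1.  The entry points that later walk page_ext
 * objects with page_ext_next() (e.g. __set_page_owner() and
 * __reset_page_owner()) would return early when this fails, so the walk
 * can never leave the memory section returned by the initial lookup.
 */
static inline bool page_ext_order_is_safe(unsigned int order)
{
	/*
	 * Orders above MAX_PAGE_ORDER are gigantic allocations, which can
	 * span multiple memory sections.
	 */
	return order <= MAX_PAGE_ORDER;
}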

But all these ideas have regression risk as well, so I don't see the advantage.