diff mbox series

mm: page_alloc: validate buddy before check its migratetype.

Message ID 20220330221238.396357-1-zi.yan@sent.com (mailing list archive)
State New
Headers show
Series mm: page_alloc: validate buddy before check its migratetype. | expand

Commit Message

Zi Yan March 30, 2022, 10:12 p.m. UTC
From: Zi Yan <ziy@nvidia.com>

Whenever a buddy page is found, page_is_buddy() should be called to
check its validity. Add the missing check during pageblock merge check.

Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable pageblocks with others")

Reported-by: Steven Rostedt <rostedt@goodmis.org>
Signed-off-by: Zi Yan <ziy@nvidia.com>
---
 mm/page_alloc.c | 3 +++
 1 file changed, 3 insertions(+)

Comments

Linus Torvalds March 30, 2022, 10:27 p.m. UTC | #1
On Wed, Mar 30, 2022 at 3:12 PM Zi Yan <zi.yan@sent.com> wrote:
>
> Whenever a buddy page is found, page_is_buddy() should be called to
> check its validity. Add the missing check during pageblock merge check.

Applied.

>                 buddy_pfn = __find_buddy_pfn(pfn, order);
>                 buddy = page + (buddy_pfn - pfn);
> +
> +               if (!page_is_buddy(page, buddy, order))
> +                       goto done_merging;

I wonder if that sequence shouldn't be made some helper function.

Also, looking around, I will note that unset_migratetype_isolate() in
mm/page_isolation.c is missing that "page_is_buddy()" check.

I _think_ it's probably ok because we checked

        if (PageBuddy(page)) {

on the (original, non-puddy) page, and then we only use the buddy page
pointer for that

                   if (!is_migrate_isolate_page(buddy)) {

and it's been like that for a _loong_ time.

But honestly, it feels like we would be better off with always having
the page_is_buddy() check anyway.

Or is there some reason why we don't want it here?

             Linus
Steven Rostedt March 30, 2022, 10:29 p.m. UTC | #2
On Wed, 30 Mar 2022 18:12:38 -0400
Zi Yan <zi.yan@sent.com> wrote:

> From: Zi Yan <ziy@nvidia.com>
> 
> Whenever a buddy page is found, page_is_buddy() should be called to
> check its validity. Add the missing check during pageblock merge check.
> 
> Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable pageblocks with others")

Link: https://lore.kernel.org/all/20220330154208.71aca532@gandalf.local.home/

> 
> Reported-by: Steven Rostedt <rostedt@goodmis.org>

Tested-by: Steven Rostedt (Google) <rostedt@goodmis.org>

-- Steve

> Signed-off-by: Zi Yan <ziy@nvidia.com>
> ---
>  mm/page_alloc.c | 3 +++
>  1 file changed, 3 insertions(+)
> 
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index bdc8f60ae462..6c6af8658775 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -1108,6 +1108,9 @@ static inline void __free_one_page(struct page *page,
>  
>  		buddy_pfn = __find_buddy_pfn(pfn, order);
>  		buddy = page + (buddy_pfn - pfn);
> +
> +		if (!page_is_buddy(page, buddy, order))
> +			goto done_merging;
>  		buddy_mt = get_pageblock_migratetype(buddy);
>  
>  		if (migratetype != buddy_mt
Linus Torvalds March 30, 2022, 11:03 p.m. UTC | #3
On Wed, Mar 30, 2022 at 3:12 PM Zi Yan <zi.yan@sent.com> wrote:
>
> Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable pageblocks with others")

Oh, btw - should this perhaps be backported further back than that
alleged "fixes" commit?

It does look like maybe the problem potentially existed before too,
and was just much harder to trigger.

That said, google doesn't find any other reports that look like
Steven's oops, so maybe it really never happened and backporting isn't
called for.

Or possibly my google-fu is just bad.

                  Linus
Zi Yan March 30, 2022, 11:48 p.m. UTC | #4
On 30 Mar 2022, at 19:03, Linus Torvalds wrote:

> On Wed, Mar 30, 2022 at 3:12 PM Zi Yan <zi.yan@sent.com> wrote:
>>
>> Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable pageblocks with others")
>
> Oh, btw - should this perhaps be backported further back than that
> alleged "fixes" commit?
>
> It does look like maybe the problem potentially existed before too,
> and was just much harder to trigger.
>
> That said, google doesn't find any other reports that look like
> Steven's oops, so maybe it really never happened and backporting isn't
> called for.
>
> Or possibly my google-fu is just bad.
>

There might not be any issue with the original code because this bug
could only be triggered when CONFIG_FLATMEM and CONFIG_MEMORY_ISOLATION
are both set, which never happens, since CONFIG_MEMORY_ISOLATION
depends on CONFIG_SPARSEMEM.

By checking Steven's boot log, it should be PFN 0x21ee00 that triggers
the bug, since the physical memory range ends at PFN 0x21edff.
PFN 0x21ee00 is 2MB aligned instead of MAX_ORDER-1 (4MB) aligned.
The original code assumes all physical memory ranges are at least
MAX_ORDER-1 aligned, which is true when CONFIG_SPARSEMEM is set
(CONFIG_MEMORY_ISOLATION depends on it), since CONFIG_SPARSEMEM
allocates pageblock_flags array (the NULL-deferenced bitmap points
to) at section size granularity (128MB > 4MB). However, CONFIG_FLATMEM
does not do this. It allocates pageblock_flags array at the exact size
of the physical memory. So checking 0x21ee00 will not cause NULL
dereferencing when CONFIG_MEMORY_ISOLATION is set and the original
if statement can be true.

Now I am wondering if the page_is_buddy() check is correct for
CONFIG_FLATMEM. Is mem_map allocation aligned to MAX_ORDER-1
or just the present physical memory range? Is PageBuddy(0x21ee00)
accessing some random memory location?

--
Best Regards,
Yan, Zi
Zi Yan March 31, 2022, 12:10 a.m. UTC | #5
On 30 Mar 2022, at 19:48, Zi Yan wrote:

> On 30 Mar 2022, at 19:03, Linus Torvalds wrote:
>
>> On Wed, Mar 30, 2022 at 3:12 PM Zi Yan <zi.yan@sent.com> wrote:
>>>
>>> Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable pageblocks with others")
>>
>> Oh, btw - should this perhaps be backported further back than that
>> alleged "fixes" commit?
>>
>> It does look like maybe the problem potentially existed before too,
>> and was just much harder to trigger.
>>
>> That said, google doesn't find any other reports that look like
>> Steven's oops, so maybe it really never happened and backporting isn't
>> called for.
>>
>> Or possibly my google-fu is just bad.
>>
>
> There might not be any issue with the original code because this bug
> could only be triggered when CONFIG_FLATMEM and CONFIG_MEMORY_ISOLATION
> are both set, which never happens, since CONFIG_MEMORY_ISOLATION
> depends on CONFIG_SPARSEMEM.
>
> By checking Steven's boot log, it should be PFN 0x21ee00 that triggers
> the bug, since the physical memory range ends at PFN 0x21edff.
> PFN 0x21ee00 is 2MB aligned instead of MAX_ORDER-1 (4MB) aligned.
> The original code assumes all physical memory ranges are at least
> MAX_ORDER-1 aligned, which is true when CONFIG_SPARSEMEM is set
> (CONFIG_MEMORY_ISOLATION depends on it), since CONFIG_SPARSEMEM
> allocates pageblock_flags array (the NULL-deferenced bitmap points
> to) at section size granularity (128MB > 4MB). However, CONFIG_FLATMEM
> does not do this. It allocates pageblock_flags array at the exact size
> of the physical memory. So checking 0x21ee00 will not cause NULL
> dereferencing when CONFIG_MEMORY_ISOLATION is set and the original
> if statement can be true.
>
> Now I am wondering if the page_is_buddy() check is correct for
> CONFIG_FLATMEM. Is mem_map allocation aligned to MAX_ORDER-1
> or just the present physical memory range? Is PageBuddy(0x21ee00)
> accessing some random memory location?

OK. mem_map seems to be MAX_ORDER-1 aligned, so there is no
problem with PageBuddy(0x21ee00).



--
Best Regards,
Yan, Zi
Zi Yan March 31, 2022, 12:15 a.m. UTC | #6
On 30 Mar 2022, at 18:27, Linus Torvalds wrote:

> On Wed, Mar 30, 2022 at 3:12 PM Zi Yan <zi.yan@sent.com> wrote:
>>
>> Whenever a buddy page is found, page_is_buddy() should be called to
>> check its validity. Add the missing check during pageblock merge check.
>
> Applied.
>
>>                 buddy_pfn = __find_buddy_pfn(pfn, order);
>>                 buddy = page + (buddy_pfn - pfn);
>> +
>> +               if (!page_is_buddy(page, buddy, order))
>> +                       goto done_merging;
>
> I wonder if that sequence shouldn't be made some helper function.
>
> Also, looking around, I will note that unset_migratetype_isolate() in
> mm/page_isolation.c is missing that "page_is_buddy()" check.
>
> I _think_ it's probably ok because we checked
>
>         if (PageBuddy(page)) {
>
> on the (original, non-puddy) page, and then we only use the buddy page
> pointer for that
>
>                    if (!is_migrate_isolate_page(buddy)) {
>
> and it's been like that for a _loong_ time.
>
> But honestly, it feels like we would be better off with always having
> the page_is_buddy() check anyway.
>
> Or is there some reason why we don't want it here?
>
>              Linus

Like I said in the other email, memory isolation depends on sparsemem,
which would preclude the same NULL dereferencing from happening. But
I agree a helper function would be better. I will send a patch and see
how people think about it.


--
Best Regards,
Yan, Zi
Vlastimil Babka March 31, 2022, 8:57 a.m. UTC | #7
On 3/31/22 02:10, Zi Yan wrote:
> On 30 Mar 2022, at 19:48, Zi Yan wrote:
> 
>> On 30 Mar 2022, at 19:03, Linus Torvalds wrote:
>>
>>> On Wed, Mar 30, 2022 at 3:12 PM Zi Yan <zi.yan@sent.com> wrote:
>>>>
>>>> Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable pageblocks with others")
>>>
>>> Oh, btw - should this perhaps be backported further back than that
>>> alleged "fixes" commit?
>>>
>>> It does look like maybe the problem potentially existed before too,
>>> and was just much harder to trigger.
>>>
>>> That said, google doesn't find any other reports that look like
>>> Steven's oops, so maybe it really never happened and backporting isn't
>>> called for.
>>>
>>> Or possibly my google-fu is just bad.
>>>
>>
>> There might not be any issue with the original code because this bug
>> could only be triggered when CONFIG_FLATMEM and CONFIG_MEMORY_ISOLATION
>> are both set, which never happens, since CONFIG_MEMORY_ISOLATION
>> depends on CONFIG_SPARSEMEM.

Good point. Which means unset_migratetype_isolate() that Linus pointed
out, is currently also safe as it's a CONFIG_MEMORY_ISOLATION code.
We could still implement the suggested page_find_buddy() wrapper using
page_is_buddy() internally, as well as the cleanup of __free_one_page(),
but it's not urgent.

>> By checking Steven's boot log, it should be PFN 0x21ee00 that triggers
>> the bug, since the physical memory range ends at PFN 0x21edff.
>> PFN 0x21ee00 is 2MB aligned instead of MAX_ORDER-1 (4MB) aligned.
>> The original code assumes all physical memory ranges are at least
>> MAX_ORDER-1 aligned, which is true when CONFIG_SPARSEMEM is set
>> (CONFIG_MEMORY_ISOLATION depends on it), since CONFIG_SPARSEMEM
>> allocates pageblock_flags array (the NULL-deferenced bitmap points
>> to) at section size granularity (128MB > 4MB). However, CONFIG_FLATMEM
>> does not do this. It allocates pageblock_flags array at the exact size
>> of the physical memory. So checking 0x21ee00 will not cause NULL
>> dereferencing when CONFIG_MEMORY_ISOLATION is set and the original
>> if statement can be true.
>>
>> Now I am wondering if the page_is_buddy() check is correct for
>> CONFIG_FLATMEM. Is mem_map allocation aligned to MAX_ORDER-1
>> or just the present physical memory range? Is PageBuddy(0x21ee00)
>> accessing some random memory location?
> 
> OK. mem_map seems to be MAX_ORDER-1 aligned, so there is no
> problem with PageBuddy(0x21ee00).

Yeah mem_map has to be in all config variants, otherwise buddy merging
would have been blowing up in page_is_buddy() even prior to all the
"sometimes avoid merging pageblock" changes.
> 
> --
> Best Regards,
> Yan, Zi
Zi Yan March 31, 2022, 10:07 p.m. UTC | #8
On 31 Mar 2022, at 4:57, Vlastimil Babka wrote:

> On 3/31/22 02:10, Zi Yan wrote:
>> On 30 Mar 2022, at 19:48, Zi Yan wrote:
>>
>>> On 30 Mar 2022, at 19:03, Linus Torvalds wrote:
>>>
>>>> On Wed, Mar 30, 2022 at 3:12 PM Zi Yan <zi.yan@sent.com> wrote:
>>>>>
>>>>> Fixes: 1dd214b8f21c ("mm: page_alloc: avoid merging non-fallbackable pageblocks with others")
>>>>
>>>> Oh, btw - should this perhaps be backported further back than that
>>>> alleged "fixes" commit?
>>>>
>>>> It does look like maybe the problem potentially existed before too,
>>>> and was just much harder to trigger.
>>>>
>>>> That said, google doesn't find any other reports that look like
>>>> Steven's oops, so maybe it really never happened and backporting isn't
>>>> called for.
>>>>
>>>> Or possibly my google-fu is just bad.
>>>>
>>>
>>> There might not be any issue with the original code because this bug
>>> could only be triggered when CONFIG_FLATMEM and CONFIG_MEMORY_ISOLATION
>>> are both set, which never happens, since CONFIG_MEMORY_ISOLATION
>>> depends on CONFIG_SPARSEMEM.
>
> Good point. Which means unset_migratetype_isolate() that Linus pointed
> out, is currently also safe as it's a CONFIG_MEMORY_ISOLATION code.
> We could still implement the suggested page_find_buddy() wrapper using
> page_is_buddy() internally, as well as the cleanup of __free_one_page(),
> but it's not urgent.
>
Sure. Will do that.

>>> By checking Steven's boot log, it should be PFN 0x21ee00 that triggers
>>> the bug, since the physical memory range ends at PFN 0x21edff.
>>> PFN 0x21ee00 is 2MB aligned instead of MAX_ORDER-1 (4MB) aligned.
>>> The original code assumes all physical memory ranges are at least
>>> MAX_ORDER-1 aligned, which is true when CONFIG_SPARSEMEM is set
>>> (CONFIG_MEMORY_ISOLATION depends on it), since CONFIG_SPARSEMEM
>>> allocates pageblock_flags array (the NULL-deferenced bitmap points
>>> to) at section size granularity (128MB > 4MB). However, CONFIG_FLATMEM
>>> does not do this. It allocates pageblock_flags array at the exact size
>>> of the physical memory. So checking 0x21ee00 will not cause NULL
>>> dereferencing when CONFIG_MEMORY_ISOLATION is set and the original
>>> if statement can be true.


For the record, the actual situation is that struct page of PFN 0x21ee00
is allocated but uninitialized. In CONFIG_FLATMEM, pageblock flags are
stored in zone->pageblock_flags. When reading PFN 0x21ee00's pageblock
flags, the page's zone is determined to be ZONE_MOVABLE, since page->flags
is -1UL and page_zonenum() is 3. The system does not have ZONE_MOVABLE,
so zone->pageblock_flags is NULL and reading it caused NULL pointer
dereferencing.

>>>
>>> Now I am wondering if the page_is_buddy() check is correct for
>>> CONFIG_FLATMEM. Is mem_map allocation aligned to MAX_ORDER-1
>>> or just the present physical memory range? Is PageBuddy(0x21ee00)
>>> accessing some random memory location?
>>
>> OK. mem_map seems to be MAX_ORDER-1 aligned, so there is no
>> problem with PageBuddy(0x21ee00).
>
> Yeah mem_map has to be in all config variants, otherwise buddy merging
> would have been blowing up in page_is_buddy() even prior to all the
> "sometimes avoid merging pageblock" changes.
>>
>> --
>> Best Regards,
>> Yan, Zi


--
Best Regards,
Yan, Zi
diff mbox series

Patch

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index bdc8f60ae462..6c6af8658775 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1108,6 +1108,9 @@  static inline void __free_one_page(struct page *page,
 
 		buddy_pfn = __find_buddy_pfn(pfn, order);
 		buddy = page + (buddy_pfn - pfn);
+
+		if (!page_is_buddy(page, buddy, order))
+			goto done_merging;
 		buddy_mt = get_pageblock_migratetype(buddy);
 
 		if (migratetype != buddy_mt