diff mbox

[v6,01/15] x86/mm: reserve only exiting low pages

Message ID 1502138329-123460-2-git-send-email-pasha.tatashin@oracle.com (mailing list archive)
State New, archived
Headers show

Commit Message

Pavel Tatashin Aug. 7, 2017, 8:38 p.m. UTC
Struct pages are initialized by going through __init_single_page(). Since
the existing physical memory in memblock is represented in memblock.memory
list, struct page for every page from this list goes through
__init_single_page().

The second memblock list: memblock.reserved, manages the allocated memory.
The memory that won't be available to kernel allocator. So, every page from
this list goes through reserve_bootmem_region(), where certain struct page
fields are set, the assumption being that the struct pages have been
initialized beforehand.

In trim_low_memory_range() we unconditionally reserve memoryfrom PFN 0, but
memblock.memory might start at a later PFN. For example, in QEMU,
e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
so PFN 0 is not on memblock.memory (and hence isn't initialized via
__init_single_page) but is on memblock.reserved (and hence we set fields in
the uninitialized struct page).

Currently, the struct page memory is always zeroed during allocation,
which prevents this problem from being detected. But, if some asserts
provided by CONFIG_DEBUG_VM_PGFLAGS are tighten, this problem may become
visible in existing kernels.

In this patchset we will stop zeroing struct page memory during allocation.
Therefore, this bug must be fixed in order to avoid random assert failures
caused by CONFIG_DEBUG_VM_PGFLAGS triggers.

The fix is to reserve memory from the first existing PFN.

Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
Reviewed-by: Bob Picco <bob.picco@oracle.com>
---
 arch/x86/kernel/setup.c | 5 ++++-
 1 file changed, 4 insertions(+), 1 deletion(-)

Comments

Michal Hocko Aug. 11, 2017, 8:07 a.m. UTC | #1
On Mon 07-08-17 16:38:35, Pavel Tatashin wrote:
> Struct pages are initialized by going through __init_single_page(). Since
> the existing physical memory in memblock is represented in memblock.memory
> list, struct page for every page from this list goes through
> __init_single_page().

By a page _from_ this list you mean struct pages backing the physical
memory of the memblock lists?
 
> The second memblock list: memblock.reserved, manages the allocated memory.
> The memory that won't be available to kernel allocator. So, every page from
> this list goes through reserve_bootmem_region(), where certain struct page
> fields are set, the assumption being that the struct pages have been
> initialized beforehand.
> 
> In trim_low_memory_range() we unconditionally reserve memoryfrom PFN 0, but
> memblock.memory might start at a later PFN. For example, in QEMU,
> e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
> so PFN 0 is not on memblock.memory (and hence isn't initialized via
> __init_single_page) but is on memblock.reserved (and hence we set fields in
> the uninitialized struct page).
> 
> Currently, the struct page memory is always zeroed during allocation,
> which prevents this problem from being detected. But, if some asserts
> provided by CONFIG_DEBUG_VM_PGFLAGS are tighten, this problem may become
> visible in existing kernels.
> 
> In this patchset we will stop zeroing struct page memory during allocation.
> Therefore, this bug must be fixed in order to avoid random assert failures
> caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
> 
> The fix is to reserve memory from the first existing PFN.

Hmm, I assume this is a result of some assert triggering, right? Which
one? Why don't we need the same treatment for other than x86 arch?

> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Reviewed-by: Bob Picco <bob.picco@oracle.com>

I guess that the review happened inhouse. I do not want to question its
value but it is rather strange to not hear the specific review comments
which might be useful in general and moreover even not include those
people on the CC list so they are aware of the follow up discussion.

> ---
>  arch/x86/kernel/setup.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3486d0498800..489cdc141bcb 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
>  
>  static void __init trim_low_memory_range(void)
>  {
> -	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> +	unsigned long min_pfn = find_min_pfn_with_active_regions();
> +	phys_addr_t base = min_pfn << PAGE_SHIFT;
> +
> +	memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
>  }
>  	
>  /*
> -- 
> 2.14.0
Pavel Tatashin Aug. 11, 2017, 3:24 p.m. UTC | #2
>> Struct pages are initialized by going through __init_single_page(). Since
>> the existing physical memory in memblock is represented in memblock.memory
>> list, struct page for every page from this list goes through
>> __init_single_page().
> 
> By a page _from_ this list you mean struct pages backing the physical
> memory of the memblock lists?

Correct: "for every page from this list...", for every page represented 
by this list the struct page is initialized through __init_single_page()

>> In this patchset we will stop zeroing struct page memory during allocation.
>> Therefore, this bug must be fixed in order to avoid random assert failures
>> caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
>>
>> The fix is to reserve memory from the first existing PFN.
> 
> Hmm, I assume this is a result of some assert triggering, right? Which
> one? Why don't we need the same treatment for other than x86 arch?

Correct, the pgflags asserts were triggered when we were setting 
reserved flags to struct page for PFN 0 in which was never initialized 
through __init_single_page(). The reason they were triggered is because 
we set all uninitialized memory to ones in one of the debug patches.

>> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>> Reviewed-by: Bob Picco <bob.picco@oracle.com>
> 
> I guess that the review happened inhouse. I do not want to question its
> value but it is rather strange to not hear the specific review comments
> which might be useful in general and moreover even not include those
> people on the CC list so they are aware of the follow up discussion.

I will bring this up with my colleagues to how to handle this better in 
the future. I will also CC the reviewers when I sent out the updated 
patch series.
Michal Hocko Aug. 14, 2017, 11:40 a.m. UTC | #3
On Fri 11-08-17 11:24:55, Pasha Tatashin wrote:
[...]
> >>In this patchset we will stop zeroing struct page memory during allocation.
> >>Therefore, this bug must be fixed in order to avoid random assert failures
> >>caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
> >>
> >>The fix is to reserve memory from the first existing PFN.
> >
> >Hmm, I assume this is a result of some assert triggering, right? Which
> >one? Why don't we need the same treatment for other than x86 arch?
> 
> Correct, the pgflags asserts were triggered when we were setting reserved
> flags to struct page for PFN 0 in which was never initialized through
> __init_single_page(). The reason they were triggered is because we set all
> uninitialized memory to ones in one of the debug patches.

And why don't we need the same treatment for other architectures?
Pavel Tatashin Aug. 14, 2017, 1:30 p.m. UTC | #4
>> Correct, the pgflags asserts were triggered when we were setting reserved
>> flags to struct page for PFN 0 in which was never initialized through
>> __init_single_page(). The reason they were triggered is because we set all
>> uninitialized memory to ones in one of the debug patches.
> 
> And why don't we need the same treatment for other architectures?
> 

I have not seen similar issues on other architectures. At least this low 
memory reserve is x86 specific for BIOS purposes:

Documentation/admin-guide/kernel-parameters.txt
3624	reservelow=	[X86]
3625			Format: nn[K]
3626			Set the amount of memory to reserve for BIOS at
3627			the bottom of the address space.

If there are similar cases with other architectures, they will be caught 
by the last patch in this series, where all allocated memory is set to 
ones, and page flags asserts will be triggered. I have boot-tested on 
SPARC, ARM, and x86.

Pasha
Michal Hocko Aug. 14, 2017, 1:55 p.m. UTC | #5
Let's CC Hpa on this one. I am still not sure it is correct. The full
series is here
http://lkml.kernel.org/r/1502138329-123460-1-git-send-email-pasha.tatashin@oracle.com

On Mon 07-08-17 16:38:35, Pavel Tatashin wrote:
> Struct pages are initialized by going through __init_single_page(). Since
> the existing physical memory in memblock is represented in memblock.memory
> list, struct page for every page from this list goes through
> __init_single_page().
> 
> The second memblock list: memblock.reserved, manages the allocated memory.
> The memory that won't be available to kernel allocator. So, every page from
> this list goes through reserve_bootmem_region(), where certain struct page
> fields are set, the assumption being that the struct pages have been
> initialized beforehand.
> 
> In trim_low_memory_range() we unconditionally reserve memoryfrom PFN 0, but
> memblock.memory might start at a later PFN. For example, in QEMU,
> e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
> so PFN 0 is not on memblock.memory (and hence isn't initialized via
> __init_single_page) but is on memblock.reserved (and hence we set fields in
> the uninitialized struct page).
> 
> Currently, the struct page memory is always zeroed during allocation,
> which prevents this problem from being detected. But, if some asserts
> provided by CONFIG_DEBUG_VM_PGFLAGS are tighten, this problem may become
> visible in existing kernels.
> 
> In this patchset we will stop zeroing struct page memory during allocation.
> Therefore, this bug must be fixed in order to avoid random assert failures
> caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
> 
> The fix is to reserve memory from the first existing PFN.
> 
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Reviewed-by: Bob Picco <bob.picco@oracle.com>
> ---
>  arch/x86/kernel/setup.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
> 
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3486d0498800..489cdc141bcb 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
>  
>  static void __init trim_low_memory_range(void)
>  {
> -	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> +	unsigned long min_pfn = find_min_pfn_with_active_regions();
> +	phys_addr_t base = min_pfn << PAGE_SHIFT;
> +
> +	memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
>  }
>  	
>  /*
> -- 
> 2.14.0
Pavel Tatashin Aug. 17, 2017, 3:37 p.m. UTC | #6
Hi Michal,

While working on a bug that was reported to me by "kernel test robot".

  unable to handle kernel NULL pointer dereference at           (null)

The issue was that page_to_pfn() on that configuration was looking for a 
section inside flags fields in "struct page". So, reserved but 
unavailable memory should have its "struct page" zeroed.

Therefore, I am going to remove this patch from my series, but instead 
have a new patch that iterates through:

reserved && !memory memblocks, and zeroes struct pages for them. Since 
for that memory struct pages will never go through __init_single_page(), 
yet some fields might still be accessed.

Pasha

On 08/14/2017 09:55 AM, Michal Hocko wrote:
> Let's CC Hpa on this one. I am still not sure it is correct. The full
> series is here
> http://lkml.kernel.org/r/1502138329-123460-1-git-send-email-pasha.tatashin@oracle.com
> 
> On Mon 07-08-17 16:38:35, Pavel Tatashin wrote:
>> Struct pages are initialized by going through __init_single_page(). Since
>> the existing physical memory in memblock is represented in memblock.memory
>> list, struct page for every page from this list goes through
>> __init_single_page().
>>
>> The second memblock list: memblock.reserved, manages the allocated memory.
>> The memory that won't be available to kernel allocator. So, every page from
>> this list goes through reserve_bootmem_region(), where certain struct page
>> fields are set, the assumption being that the struct pages have been
>> initialized beforehand.
>>
>> In trim_low_memory_range() we unconditionally reserve memoryfrom PFN 0, but
>> memblock.memory might start at a later PFN. For example, in QEMU,
>> e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
>> so PFN 0 is not on memblock.memory (and hence isn't initialized via
>> __init_single_page) but is on memblock.reserved (and hence we set fields in
>> the uninitialized struct page).
>>
>> Currently, the struct page memory is always zeroed during allocation,
>> which prevents this problem from being detected. But, if some asserts
>> provided by CONFIG_DEBUG_VM_PGFLAGS are tighten, this problem may become
>> visible in existing kernels.
>>
>> In this patchset we will stop zeroing struct page memory during allocation.
>> Therefore, this bug must be fixed in order to avoid random assert failures
>> caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
>>
>> The fix is to reserve memory from the first existing PFN.
>>
>> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>> Reviewed-by: Bob Picco <bob.picco@oracle.com>
>> ---
>>   arch/x86/kernel/setup.c | 5 ++++-
>>   1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 3486d0498800..489cdc141bcb 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
>>   
>>   static void __init trim_low_memory_range(void)
>>   {
>> -	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
>> +	unsigned long min_pfn = find_min_pfn_with_active_regions();
>> +	phys_addr_t base = min_pfn << PAGE_SHIFT;
>> +
>> +	memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
>>   }
>>   	
>>   /*
>> -- 
>> 2.14.0
>
diff mbox

Patch

diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3486d0498800..489cdc141bcb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -790,7 +790,10 @@  early_param("reservelow", parse_reservelow);
 
 static void __init trim_low_memory_range(void)
 {
-	memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
+	unsigned long min_pfn = find_min_pfn_with_active_regions();
+	phys_addr_t base = min_pfn << PAGE_SHIFT;
+
+	memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
 }
 	
 /*