| Message ID | 1502138329-123460-2-git-send-email-pasha.tatashin@oracle.com (mailing list archive) |
|---|---|
| State | New, archived |
On Mon 07-08-17 16:38:35, Pavel Tatashin wrote:
> Struct pages are initialized by going through __init_single_page(). Since
> the existing physical memory in memblock is represented in the
> memblock.memory list, the struct page for every page from this list goes
> through __init_single_page().

By a page _from_ this list you mean struct pages backing the physical
memory of the memblock lists?

> The second memblock list, memblock.reserved, manages the allocated
> memory: the memory that won't be available to the kernel allocator. So,
> every page from this list goes through reserve_bootmem_region(), where
> certain struct page fields are set, the assumption being that the struct
> pages have been initialized beforehand.
>
> In trim_low_memory_range() we unconditionally reserve memory from PFN 0,
> but memblock.memory might start at a later PFN. For example, in QEMU,
> e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
> so PFN 0 is not in memblock.memory (and hence isn't initialized via
> __init_single_page()) but is in memblock.reserved (and hence we set
> fields in the uninitialized struct page).
>
> Currently, the struct page memory is always zeroed during allocation,
> which prevents this problem from being detected. But, if some asserts
> provided by CONFIG_DEBUG_VM_PGFLAGS are tightened, this problem may
> become visible in existing kernels.
>
> In this patchset we will stop zeroing struct page memory during
> allocation. Therefore, this bug must be fixed in order to avoid random
> assert failures caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
>
> The fix is to reserve memory from the first existing PFN.

Hmm, I assume this is a result of some assert triggering, right? Which
one? Why don't we need the same treatment for architectures other than
x86?
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Reviewed-by: Bob Picco <bob.picco@oracle.com>

I guess that the review happened in-house. I do not want to question its
value, but it is rather strange not to hear the specific review comments,
which might be useful in general, and moreover not even to include those
people on the CC list so they are aware of the follow-up discussion.

> ---
>  arch/x86/kernel/setup.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3486d0498800..489cdc141bcb 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
>
>  static void __init trim_low_memory_range(void)
>  {
> -        memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> +        unsigned long min_pfn = find_min_pfn_with_active_regions();
> +        phys_addr_t base = min_pfn << PAGE_SHIFT;
> +
> +        memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
>  }
>
>  /*
> --
> 2.14.0
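The failure mode the patch description and Michal's questions are circling — PFN 0 sitting in memblock.reserved while memblock.memory starts at PFN 1 — can be illustrated with a toy model. This is plain Python, not kernel code; the range boundaries and page counts are invented for the sketch, and a struct page is modeled as "initialized" only if its PFN is covered by memblock.memory:

```python
# Toy model of the memblock situation: memory starts at PFN 1 (QEMU-like),
# so a reservation starting at PFN 0 touches a never-initialized struct page.

memblock_memory = [(1, 0x9F)]   # half-open PFN ranges backed by real memory
memblock_reserved = []          # half-open PFN ranges handed to memblock_reserve()

def memblock_reserve(base_pfn, npages):
    memblock_reserved.append((base_pfn, base_pfn + npages))

def initialized_pfns():
    # Only PFNs in memblock.memory go through __init_single_page().
    return {pfn for start, end in memblock_memory for pfn in range(start, end)}

def buggy_trim_low_memory_range(reserve_low_pages):
    # Old behavior: unconditionally reserve starting at PFN 0.
    memblock_reserve(0, reserve_low_pages)

def fixed_trim_low_memory_range(reserve_low_pages):
    # Patched behavior: start at the first PFN that actually exists.
    min_pfn = min(start for start, _ in memblock_memory)
    memblock_reserve(min_pfn, reserve_low_pages)

def touches_uninitialized():
    # True if any reserved PFN was never initialized -> the bug.
    init = initialized_pfns()
    return any(pfn not in init
               for start, end in memblock_reserved
               for pfn in range(start, end))

buggy_trim_low_memory_range(16)
assert touches_uninitialized()      # PFN 0 is reserved but never initialized

memblock_reserved.clear()
fixed_trim_low_memory_range(16)
assert not touches_uninitialized()  # reservation now starts at PFN 1
```

With zeroed struct pages the buggy case goes unnoticed, which is exactly why the bug only surfaces once the zeroing is removed and the PGFLAGS asserts fire.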
>> Struct pages are initialized by going through __init_single_page(). Since
>> the existing physical memory in memblock is represented in the
>> memblock.memory list, the struct page for every page from this list goes
>> through __init_single_page().
>
> By a page _from_ this list you mean struct pages backing the physical
> memory of the memblock lists?

Correct: "for every page from this list" means that for every page
represented by this list, the struct page is initialized through
__init_single_page().

>> In this patchset we will stop zeroing struct page memory during
>> allocation. Therefore, this bug must be fixed in order to avoid random
>> assert failures caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
>>
>> The fix is to reserve memory from the first existing PFN.
>
> Hmm, I assume this is a result of some assert triggering, right? Which
> one? Why don't we need the same treatment for architectures other than
> x86?

Correct, the pgflags asserts were triggered when we were setting reserved
flags in the struct page for PFN 0, which was never initialized through
__init_single_page(). The reason they were triggered is that we set all
uninitialized memory to ones in one of the debug patches.

>> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>> Reviewed-by: Bob Picco <bob.picco@oracle.com>
>
> I guess that the review happened in-house. I do not want to question its
> value, but it is rather strange not to hear the specific review comments,
> which might be useful in general, and moreover not even to include those
> people on the CC list so they are aware of the follow-up discussion.

I will bring this up with my colleagues to decide how to handle this better
in the future. I will also CC the reviewers when I send out the updated
patch series.
On Fri 11-08-17 11:24:55, Pasha Tatashin wrote:
[...]
> >> In this patchset we will stop zeroing struct page memory during
> >> allocation. Therefore, this bug must be fixed in order to avoid random
> >> assert failures caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
> >>
> >> The fix is to reserve memory from the first existing PFN.
> >
> > Hmm, I assume this is a result of some assert triggering, right? Which
> > one? Why don't we need the same treatment for architectures other than
> > x86?
>
> Correct, the pgflags asserts were triggered when we were setting reserved
> flags in the struct page for PFN 0, which was never initialized through
> __init_single_page(). The reason they were triggered is that we set all
> uninitialized memory to ones in one of the debug patches.

And why don't we need the same treatment for other architectures?
>> Correct, the pgflags asserts were triggered when we were setting reserved
>> flags in the struct page for PFN 0, which was never initialized through
>> __init_single_page(). The reason they were triggered is that we set all
>> uninitialized memory to ones in one of the debug patches.
>
> And why don't we need the same treatment for other architectures?

I have not seen similar issues on other architectures. At least this low
memory reserve is x86 specific, for BIOS purposes:

Documentation/admin-guide/kernel-parameters.txt
3624         reservelow=     [X86]
3625                         Format: nn[K]
3626                         Set the amount of memory to reserve for BIOS at
3627                         the bottom of the address space.

If there are similar cases on other architectures, they will be caught by
the last patch in this series, where all allocated memory is set to ones,
and the page flags asserts will be triggered.

I have boot-tested on SPARC, ARM, and x86.

Pasha
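For readers unfamiliar with the parameter quoted above: `reservelow=` is given on the kernel command line, for example in a boot loader entry. A hypothetical example (the value is illustrative, not a recommendation):

```text
linux /boot/vmlinuz root=/dev/sda1 reservelow=64K
```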
Let's CC Hpa on this one. I am still not sure it is correct. The full
series is here:
http://lkml.kernel.org/r/1502138329-123460-1-git-send-email-pasha.tatashin@oracle.com

On Mon 07-08-17 16:38:35, Pavel Tatashin wrote:
> Struct pages are initialized by going through __init_single_page(). Since
> the existing physical memory in memblock is represented in the
> memblock.memory list, the struct page for every page from this list goes
> through __init_single_page().
>
> The second memblock list, memblock.reserved, manages the allocated
> memory: the memory that won't be available to the kernel allocator. So,
> every page from this list goes through reserve_bootmem_region(), where
> certain struct page fields are set, the assumption being that the struct
> pages have been initialized beforehand.
>
> In trim_low_memory_range() we unconditionally reserve memory from PFN 0,
> but memblock.memory might start at a later PFN. For example, in QEMU,
> e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
> so PFN 0 is not in memblock.memory (and hence isn't initialized via
> __init_single_page()) but is in memblock.reserved (and hence we set
> fields in the uninitialized struct page).
>
> Currently, the struct page memory is always zeroed during allocation,
> which prevents this problem from being detected. But, if some asserts
> provided by CONFIG_DEBUG_VM_PGFLAGS are tightened, this problem may
> become visible in existing kernels.
>
> In this patchset we will stop zeroing struct page memory during
> allocation. Therefore, this bug must be fixed in order to avoid random
> assert failures caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
>
> The fix is to reserve memory from the first existing PFN.
>
> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
> Reviewed-by: Bob Picco <bob.picco@oracle.com>
> ---
>  arch/x86/kernel/setup.c | 5 ++++-
>  1 file changed, 4 insertions(+), 1 deletion(-)
>
> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
> index 3486d0498800..489cdc141bcb 100644
> --- a/arch/x86/kernel/setup.c
> +++ b/arch/x86/kernel/setup.c
> @@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
>
>  static void __init trim_low_memory_range(void)
>  {
> -        memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
> +        unsigned long min_pfn = find_min_pfn_with_active_regions();
> +        phys_addr_t base = min_pfn << PAGE_SHIFT;
> +
> +        memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
>  }
>
>  /*
> --
> 2.14.0
Hi Michal,

While working on a bug that was reported to me by the kernel test robot:

    unable to handle kernel NULL pointer dereference at (null)

I found that page_to_pfn() on that configuration was looking for a section
inside the flags field of "struct page". So, reserved but unavailable
memory should have its "struct page" zeroed.

Therefore, I am going to remove this patch from my series, and instead add
a new patch that iterates through the reserved && !memory memblocks and
zeroes the struct pages for them, since the struct pages for that memory
will never go through __init_single_page(), yet some of their fields might
still be accessed.

Pasha

On 08/14/2017 09:55 AM, Michal Hocko wrote:
> Let's CC Hpa on this one. I am still not sure it is correct. The full
> series is here:
> http://lkml.kernel.org/r/1502138329-123460-1-git-send-email-pasha.tatashin@oracle.com
>
> On Mon 07-08-17 16:38:35, Pavel Tatashin wrote:
>> Struct pages are initialized by going through __init_single_page(). Since
>> the existing physical memory in memblock is represented in the
>> memblock.memory list, the struct page for every page from this list goes
>> through __init_single_page().
>>
>> The second memblock list, memblock.reserved, manages the allocated
>> memory: the memory that won't be available to the kernel allocator. So,
>> every page from this list goes through reserve_bootmem_region(), where
>> certain struct page fields are set, the assumption being that the struct
>> pages have been initialized beforehand.
>>
>> In trim_low_memory_range() we unconditionally reserve memory from PFN 0,
>> but memblock.memory might start at a later PFN. For example, in QEMU,
>> e820__memblock_setup() can use PFN 1 as the first PFN in memblock.memory,
>> so PFN 0 is not in memblock.memory (and hence isn't initialized via
>> __init_single_page()) but is in memblock.reserved (and hence we set
>> fields in the uninitialized struct page).
>>
>> Currently, the struct page memory is always zeroed during allocation,
>> which prevents this problem from being detected. But, if some asserts
>> provided by CONFIG_DEBUG_VM_PGFLAGS are tightened, this problem may
>> become visible in existing kernels.
>>
>> In this patchset we will stop zeroing struct page memory during
>> allocation. Therefore, this bug must be fixed in order to avoid random
>> assert failures caused by CONFIG_DEBUG_VM_PGFLAGS triggers.
>>
>> The fix is to reserve memory from the first existing PFN.
>>
>> Signed-off-by: Pavel Tatashin <pasha.tatashin@oracle.com>
>> Reviewed-by: Steven Sistare <steven.sistare@oracle.com>
>> Reviewed-by: Daniel Jordan <daniel.m.jordan@oracle.com>
>> Reviewed-by: Bob Picco <bob.picco@oracle.com>
>> ---
>>  arch/x86/kernel/setup.c | 5 ++++-
>>  1 file changed, 4 insertions(+), 1 deletion(-)
>>
>> diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
>> index 3486d0498800..489cdc141bcb 100644
>> --- a/arch/x86/kernel/setup.c
>> +++ b/arch/x86/kernel/setup.c
>> @@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
>>
>>  static void __init trim_low_memory_range(void)
>>  {
>> -        memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
>> +        unsigned long min_pfn = find_min_pfn_with_active_regions();
>> +        phys_addr_t base = min_pfn << PAGE_SHIFT;
>> +
>> +        memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
>>  }
>>
>>  /*
>> --
>> 2.14.0
>
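The replacement approach Pasha outlines — walk the ranges that are in memblock.reserved but not in memblock.memory, and zero the struct pages backing them — boils down to an interval subtraction followed by a per-page zeroing pass. The sketch below is a toy simulation of that logic (plain Python, not the kernel implementation; the helper names and ranges are invented for illustration):

```python
# Toy model of "zero struct pages for reserved && !memory ranges".
# Ranges are half-open (start_pfn, end_pfn) tuples.

def subtract_ranges(reserved, memory):
    """Return the parts of each reserved range not covered by any memory range."""
    out = []
    for rs, re in reserved:
        pieces = [(rs, re)]
        for ms, me in memory:
            next_pieces = []
            for ps, pe in pieces:
                if me <= ps or ms >= pe:
                    # no overlap: keep the piece whole
                    next_pieces.append((ps, pe))
                else:
                    # keep the parts of (ps, pe) outside (ms, me)
                    if ps < ms:
                        next_pieces.append((ps, ms))
                    if me < pe:
                        next_pieces.append((me, pe))
            pieces = next_pieces
        out.extend(pieces)
    return out

def zero_reserved_unavailable(reserved, memory, struct_pages):
    """Zero the (simulated) struct page of every reserved-but-absent PFN."""
    for start, end in subtract_ranges(reserved, memory):
        for pfn in range(start, end):
            struct_pages[pfn] = 0   # stand-in for zeroing the real struct page

memory = [(1, 0x9F)]                # PFN 0 is absent, as in the QEMU example
reserved = [(0, 16)]
# simulate unzeroed allocation: struct pages start out "poisoned"
pages = {pfn: 0xFF for r in reserved for pfn in range(*r)}

zero_reserved_unavailable(reserved, memory, pages)
assert pages[0] == 0      # PFN 0: reserved, not in memory -> zeroed here
assert pages[1] == 0xFF   # PFN 1: backed by memory, left to __init_single_page()
```

This keeps page_to_pfn() from reading garbage section bits out of the flags field of an uninitialized struct page, while leaving memory-backed pages to the normal __init_single_page() path.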
diff --git a/arch/x86/kernel/setup.c b/arch/x86/kernel/setup.c
index 3486d0498800..489cdc141bcb 100644
--- a/arch/x86/kernel/setup.c
+++ b/arch/x86/kernel/setup.c
@@ -790,7 +790,10 @@ early_param("reservelow", parse_reservelow);
 
 static void __init trim_low_memory_range(void)
 {
-        memblock_reserve(0, ALIGN(reserve_low, PAGE_SIZE));
+        unsigned long min_pfn = find_min_pfn_with_active_regions();
+        phys_addr_t base = min_pfn << PAGE_SHIFT;
+
+        memblock_reserve(base, ALIGN(reserve_low, PAGE_SIZE));
 }
 
 /*