Message ID | 20210421065108.1987-1-rppt@kernel.org (mailing list archive) |
---|---|
Series | arm64: drop pfn_valid_within() and simplify pfn_valid() |
On 2021/4/21 14:51, Mike Rapoport wrote:
> From: Mike Rapoport <rppt@linux.ibm.com>
>
> Hi,
>
> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> pfn_valid_within() to 1.
>
> The idea is to mark NOMAP pages as reserved in the memory map and restore
> the intended semantics of pfn_valid() to designate availability of struct
> page for a pfn.
>
> With this the core mm will be able to cope with the fact that it cannot use
> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> will be treated correctly even without the need for pfn_valid_within.
>
> The patches are only boot tested on qemu-system-aarch64 so I'd really
> appreciate memory stress tests on real hardware.
>
> If this actually works we'll be one step closer to drop custom pfn_valid()
> on arm64 altogether.

Hi Mike, I have a question. Without HOLES_IN_ZONE, the pfn_valid_within()
check in move_freepages_block()->move_freepages() will be optimized away.
If there are holes in a zone, the 'struct page' entries (memory map) for
the pfn range of a hole will be freed by free_memmap(), so the page
traversal in move_freepages() over a zone with holes will hit an invalid
page and could panic at the PageLRU(page) test, see link [1].

"The idea is to mark NOMAP pages as reserved in the memory map": I see
that patch 2 checks memblock_is_nomap() for the memblock memory regions,
but it seems that memblock_mark_nomap() is not called (maybe I missed
it), so memmap_init_reserved_pages() won't work. Is HOLES_IN_ZONE
therefore still needed for generic mm code?

[1] https://lore.kernel.org/linux-arm-kernel/541193a6-2bce-f042-5bb2-88913d5f1047@arm.com/
On Thu, Apr 22, 2021 at 03:00:20PM +0800, Kefeng Wang wrote:
>
> On 2021/4/21 14:51, Mike Rapoport wrote:
> > From: Mike Rapoport <rppt@linux.ibm.com>
> >
> > Hi,
> >
> > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > pfn_valid_within() to 1.
> >
> > The idea is to mark NOMAP pages as reserved in the memory map and restore
> > the intended semantics of pfn_valid() to designate availability of struct
> > page for a pfn.
> >
> > With this the core mm will be able to cope with the fact that it cannot use
> > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > will be treated correctly even without the need for pfn_valid_within.
> >
> > The patches are only boot tested on qemu-system-aarch64 so I'd really
> > appreciate memory stress tests on real hardware.
> >
> > If this actually works we'll be one step closer to drop custom pfn_valid()
> > on arm64 altogether.
>
> Hi Mike, I have a question. Without HOLES_IN_ZONE, the pfn_valid_within()
> check in move_freepages_block()->move_freepages() will be optimized away.
> If there are holes in a zone, the 'struct page' entries (memory map) for
> the pfn range of a hole will be freed by free_memmap(), so the page
> traversal in move_freepages() over a zone with holes will hit an invalid
> page and could panic at the PageLRU(page) test, see link [1].

First, the HOLES_IN_ZONE name is hugely misleading: this configuration option
has nothing to do with memory holes, but rather it is there to deal with
holes or undefined struct pages in the memory map, when these holes can be
inside a MAX_ORDER_NR_PAGES region.

In general pfn walkers use pfn_valid() and pfn_valid_within() to avoid
accessing *missing* struct pages, like those that are freed at
free_memmap(). But on arm64 these tests also filter out the nomap entries
because their struct pages are not initialized.

The panic you refer to happened because there was an uninitialized struct
page in the middle of a MAX_ORDER_NR_PAGES region because it corresponded to
nomap memory.

With these changes I make sure that such pages will be properly initialized
as PageReserved and the pfn walkers will be able to rely on the memory map.

Note also that free_memmap() aligns the parts being freed on MAX_ORDER
boundaries, so there will be no missing parts in the memory map within a
MAX_ORDER_NR_PAGES region.

> "The idea is to mark NOMAP pages as reserved in the memory map": I see
> that patch 2 checks memblock_is_nomap() for the memblock memory regions,
> but it seems that memblock_mark_nomap() is not called (maybe I missed
> it), so memmap_init_reserved_pages() won't work. Is HOLES_IN_ZONE
> therefore still needed for generic mm code?
>
> [1] https://lore.kernel.org/linux-arm-kernel/541193a6-2bce-f042-5bb2-88913d5f1047@arm.com/
>
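As a toy illustration of the point made in this reply (a userspace model, not kernel code): once the struct pages that back nomap memory are initialized as reserved, a walker can visit every entry of a block and skip them by flag, with no pfn_valid_within()-style check. The flag bit positions, the all-ones "uninitialized" marker and the list-based memory map below are assumptions made for the sketch only.

```python
# Toy model only: a "memory map" is a list of flag words, one per page.
UNINITIALIZED = 0xFFFFFFFF   # stands in for a struct page nobody initialized
PG_RESERVED = 1 << 0         # hypothetical bit positions, for this model only
PG_BUDDY = 1 << 1


def make_memmap(nr_pages, nomap, init_nomap_as_reserved):
    """Build a toy memory map; `nomap` is a set of page indexes."""
    memmap = []
    for idx in range(nr_pages):
        if idx in nomap:
            flags = PG_RESERVED if init_nomap_as_reserved else UNINITIALIZED
        else:
            flags = PG_BUDDY
        memmap.append(flags)
    return memmap


def count_free_pages(memmap):
    """Walk every entry of the block without any validity check."""
    free = 0
    for flags in memmap:
        if flags == UNINITIALIZED:
            # In the kernel this is where a walker would read garbage.
            raise ValueError("walker touched an uninitialized struct page")
        if flags & PG_RESERVED:
            continue             # reserved pages are simply skipped
        if flags & PG_BUDDY:
            free += 1
    return free


if __name__ == "__main__":
    print(count_free_pages(make_memmap(8, {2, 3}, init_nomap_as_reserved=True)))
```

With `init_nomap_as_reserved=False` the same walk raises, which corresponds to the situation the series is trying to remove.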
On 2021/4/22 15:29, Mike Rapoport wrote:
> On Thu, Apr 22, 2021 at 03:00:20PM +0800, Kefeng Wang wrote:
>> On 2021/4/21 14:51, Mike Rapoport wrote:
>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>
>>> Hi,
>>>
>>> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
>>> pfn_valid_within() to 1.
>>>
>>> The idea is to mark NOMAP pages as reserved in the memory map and restore
>>> the intended semantics of pfn_valid() to designate availability of struct
>>> page for a pfn.
>>>
>>> With this the core mm will be able to cope with the fact that it cannot use
>>> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
>>> will be treated correctly even without the need for pfn_valid_within.
>>>
>>> The patches are only boot tested on qemu-system-aarch64 so I'd really
>>> appreciate memory stress tests on real hardware.
>>>
>>> If this actually works we'll be one step closer to drop custom pfn_valid()
>>> on arm64 altogether.
>> Hi Mike, I have a question. Without HOLES_IN_ZONE, the pfn_valid_within()
>> check in move_freepages_block()->move_freepages() will be optimized away.
>> If there are holes in a zone, the 'struct page' entries (memory map) for
>> the pfn range of a hole will be freed by free_memmap(), so the page
>> traversal in move_freepages() over a zone with holes will hit an invalid
>> page and could panic at the PageLRU(page) test, see link [1].
> First, the HOLES_IN_ZONE name is hugely misleading: this configuration option
> has nothing to do with memory holes, but rather it is there to deal with
> holes or undefined struct pages in the memory map, when these holes can be
> inside a MAX_ORDER_NR_PAGES region.
>
> In general pfn walkers use pfn_valid() and pfn_valid_within() to avoid
> accessing *missing* struct pages, like those that are freed at
> free_memmap(). But on arm64 these tests also filter out the nomap entries
> because their struct pages are not initialized.
>
> The panic you refer to happened because there was an uninitialized struct
> page in the middle of a MAX_ORDER_NR_PAGES region because it corresponded to
> nomap memory.
>
> With these changes I make sure that such pages will be properly initialized
> as PageReserved and the pfn walkers will be able to rely on the memory map.
>
> Note also that free_memmap() aligns the parts being freed on MAX_ORDER
> boundaries, so there will be no missing parts in the memory map within a
> MAX_ORDER_NR_PAGES region.

OK, thanks. We hit the same panic as in the link on arm32 (without
HOLES_IN_ZONE). The arm64 scheme should be suitable for arm32 as well,
right? I will try the patchset with some changes on arm32 and give some
feedback.

Again, a stupid question: where is the memblock region marked with the
MEMBLOCK_NOMAP flag?

>
>> "The idea is to mark NOMAP pages as reserved in the memory map": I see
>> that patch 2 checks memblock_is_nomap() for the memblock memory regions,
>> but it seems that memblock_mark_nomap() is not called (maybe I missed
>> it), so memmap_init_reserved_pages() won't work. Is HOLES_IN_ZONE
>> therefore still needed for generic mm code?
>>
>> [1] https://lore.kernel.org/linux-arm-kernel/541193a6-2bce-f042-5bb2-88913d5f1047@arm.com/
>>
On 2021/4/22 23:28, Kefeng Wang wrote:
>
> On 2021/4/22 15:29, Mike Rapoport wrote:
>> On Thu, Apr 22, 2021 at 03:00:20PM +0800, Kefeng Wang wrote:
>>> On 2021/4/21 14:51, Mike Rapoport wrote:
>>>> From: Mike Rapoport <rppt@linux.ibm.com>
>>>>
>>>> Hi,
>>>>
>>>> These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
>>>> pfn_valid_within() to 1.
>>>>
>>>> The idea is to mark NOMAP pages as reserved in the memory map and restore
>>>> the intended semantics of pfn_valid() to designate availability of struct
>>>> page for a pfn.
>>>>
>>>> With this the core mm will be able to cope with the fact that it cannot use
>>>> NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
>>>> will be treated correctly even without the need for pfn_valid_within.
>>>>
>>>> The patches are only boot tested on qemu-system-aarch64 so I'd really
>>>> appreciate memory stress tests on real hardware.
>>>>
>>>> If this actually works we'll be one step closer to drop custom pfn_valid()
>>>> on arm64 altogether.
...
>
> OK, thanks. We hit the same panic as in the link on arm32 (without
> HOLES_IN_ZONE). The arm64 scheme should be suitable for arm32 as well,
> right? I will try the patchset with some changes on arm32 and give some
> feedback.

I tested this patchset (plus an arm32 change equivalent to the arm64 one)
on top of LTS 5.10 and added some debug logging; the useful output is shown
below. If we enable HOLES_IN_ZONE there is no panic. Any ideas? Thanks.

Zone ranges:
  Normal [mem 0x0000000080a00000-0x00000000b01fffff]
  HighMem [mem 0x00000000b0200000-0x00000000ffffefff]
Movable zone start for each node
Early memory node ranges
  node 0: [mem 0x0000000080a00000-0x00000000855fffff]
  node 0: [mem 0x0000000086a00000-0x0000000087dfffff]
  node 0: [mem 0x000000008bd00000-0x000000008c4fffff]
  node 0: [mem 0x000000008e300000-0x000000008ecfffff]
  node 0: [mem 0x0000000090d00000-0x00000000bfffffff]
  node 0: [mem 0x00000000cc000000-0x00000000dc9fffff]
  node 0: [mem 0x00000000de700000-0x00000000de9fffff]
  node 0: [mem 0x00000000e0800000-0x00000000e0bfffff]
  node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
  node 0: [mem 0x00000000fda00000-0x00000000ffffefff]

----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86a00, 86a00000
----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e300, 8e300000
----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000
----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de700, de700000
----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000
----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000
----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000
=== >move_freepages: start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]
: pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
8<--- cut here ---
Unable to handle kernel paging request at virtual address fffffffe
pgd = 5dd50df5
[fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
Internal error: Oops: 37 [#1] SMP ARM
Modules linked in: gmac(O)
CPU: 2 PID: 635 Comm: test-oom Tainted: G O 5.10.0+ #31
Hardware name: Hisilicon A9
PC is at move_freepages_block+0x150/0x278
LR is at move_freepages_block+0x150/0x278
pc : [<c02383a4>] lr : [<c02383a4>] psr: 200e0393
sp : c4179cf8 ip : 00000000 fp : 00000001
r10: c4179d58 r9 : 000de7ff r8 : 00000000
r7 : c0863280 r6 : 000de600 r5 : 000de600 r4 : ef3cc000
r3 : ffffffff r2 : 00000000 r1 : ef5d069c r0 : fffffffe
Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
Control: 1ac5387d Table: 83b0c04a DAC: 55555555
Process test-oom (pid: 635, stack limit = 0x25d667df)
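The debug output above can be cross-checked with a few lines of arithmetic. The sketch below copies the free_memmap ranges from the log and asks which of them, if any, contains the faulting pfn de600, and how the walked block [de600, de7ff] splits between freed and kept struct pages. It assumes that end_pfn is exclusive (consistent with node memory starting at 0xde700000) and that pages are 4 KiB (consistent with the pfn/physical pairs printed in the log); the function names are made up for the example.

```python
PAGE_SHIFT = 12  # assumption: 4 KiB pages, matching pfn 0xde600 <-> 0xde600000

# (start_pfn, end_pfn) pairs from the "free_memmap" debug lines, end exclusive.
FREED = [
    (0x85800, 0x86A00),
    (0x8C800, 0x8E300),
    (0x8F000, 0x90000),
    (0xDCC00, 0xDE700),
    (0xDEC00, 0xE0000),
    (0xE0C00, 0xE4000),
    (0xF7000, 0xF8000),
]


def freed_range_for(pfn):
    """Return the freed memmap range containing `pfn`, or None."""
    for start, end in FREED:
        if start <= pfn < end:
            return (start, end)
    return None


def split_block(start_pfn, end_pfn):
    """Count (freed, kept) struct pages in an inclusive pfn block."""
    total = end_pfn - start_pfn + 1
    freed = sum(
        1 for pfn in range(start_pfn, end_pfn + 1)
        if freed_range_for(pfn) is not None
    )
    return freed, total - freed


if __name__ == "__main__":
    hit = freed_range_for(0xDE600)
    print("pfn de600 lies in freed range:", [hex(x) for x in hit])
    print("block [de600, de7ff] freed/kept:", split_block(0xDE600, 0xDE7FF))
```

Under those assumptions the first half of the block (de600 to de6ff) has no memory map behind it, while the second half (de700 to de7ff) is ordinary memory, so a walker that starts at the beginning of the block reads a struct page that was already freed.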
On Thu, Apr 22, 2021 at 11:28:24PM +0800, Kefeng Wang wrote:
>
> On 2021/4/22 15:29, Mike Rapoport wrote:
> > On Thu, Apr 22, 2021 at 03:00:20PM +0800, Kefeng Wang wrote:
> > > On 2021/4/21 14:51, Mike Rapoport wrote:
> > > > From: Mike Rapoport <rppt@linux.ibm.com>
> > > >
> > > > Hi,
> > > >
> > > > These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
> > > > pfn_valid_within() to 1.
> > > >
> > > > The idea is to mark NOMAP pages as reserved in the memory map and restore
> > > > the intended semantics of pfn_valid() to designate availability of struct
> > > > page for a pfn.
> > > >
> > > > With this the core mm will be able to cope with the fact that it cannot use
> > > > NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
> > > > will be treated correctly even without the need for pfn_valid_within.
> > > >
> > > > The patches are only boot tested on qemu-system-aarch64 so I'd really
> > > > appreciate memory stress tests on real hardware.
> > > >
> > > > If this actually works we'll be one step closer to drop custom pfn_valid()
> > > > on arm64 altogether.
> > > Hi Mike, I have a question. Without HOLES_IN_ZONE, the pfn_valid_within()
> > > check in move_freepages_block()->move_freepages() will be optimized away.
> > > If there are holes in a zone, the 'struct page' entries (memory map) for
> > > the pfn range of a hole will be freed by free_memmap(), so the page
> > > traversal in move_freepages() over a zone with holes will hit an invalid
> > > page and could panic at the PageLRU(page) test, see link [1].
> > First, the HOLES_IN_ZONE name is hugely misleading: this configuration option
> > has nothing to do with memory holes, but rather it is there to deal with
> > holes or undefined struct pages in the memory map, when these holes can be
> > inside a MAX_ORDER_NR_PAGES region.
> >
> > In general pfn walkers use pfn_valid() and pfn_valid_within() to avoid
> > accessing *missing* struct pages, like those that are freed at
> > free_memmap(). But on arm64 these tests also filter out the nomap entries
> > because their struct pages are not initialized.
> >
> > The panic you refer to happened because there was an uninitialized struct
> > page in the middle of a MAX_ORDER_NR_PAGES region because it corresponded to
> > nomap memory.
> >
> > With these changes I make sure that such pages will be properly initialized
> > as PageReserved and the pfn walkers will be able to rely on the memory map.
> >
> > Note also that free_memmap() aligns the parts being freed on MAX_ORDER
> > boundaries, so there will be no missing parts in the memory map within a
> > MAX_ORDER_NR_PAGES region.
>
> OK, thanks. We hit the same panic as in the link on arm32 (without
> HOLES_IN_ZONE). The arm64 scheme should be suitable for arm32 as well,
> right?

In general yes. You just need to make sure that usage of pfn_valid() in
arch/arm does not presume that it tests something beyond availability of
struct page for a pfn.

> I will try the patchset with some changes on arm32 and give some
> feedback.
>
> Again, a stupid question: where is the memblock region marked with the
> MEMBLOCK_NOMAP flag?

Not sure I understand the question. The memory regions with the "no-map"
property in the device tree will be marked MEMBLOCK_NOMAP.

> > > "The idea is to mark NOMAP pages as reserved in the memory map": I see
> > > that patch 2 checks memblock_is_nomap() for the memblock memory regions,
> > > but it seems that memblock_mark_nomap() is not called (maybe I missed
> > > it), so memmap_init_reserved_pages() won't work. Is HOLES_IN_ZONE
> > > therefore still needed for generic mm code?
> > >
> > > [1] https://lore.kernel.org/linux-arm-kernel/541193a6-2bce-f042-5bb2-88913d5f1047@arm.com/
> > >
On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
>
> I tested this patchset (plus an arm32 change equivalent to the arm64 one)
> on top of LTS 5.10 and added some debug logging; the useful output is shown
> below. If we enable HOLES_IN_ZONE there is no panic. Any ideas? Thanks.

Are there any changes on top of 5.10 except for the pfn_valid() patch?
Do you see this panic on 5.10 without the changes?
Can you see the stack backtrace beyond move_freepages_block?

> Zone ranges:
>   Normal [mem 0x0000000080a00000-0x00000000b01fffff]
>   HighMem [mem 0x00000000b0200000-0x00000000ffffefff]
> Movable zone start for each node
> Early memory node ranges
>   node 0: [mem 0x0000000080a00000-0x00000000855fffff]
>   node 0: [mem 0x0000000086a00000-0x0000000087dfffff]
>   node 0: [mem 0x000000008bd00000-0x000000008c4fffff]
>   node 0: [mem 0x000000008e300000-0x000000008ecfffff]
>   node 0: [mem 0x0000000090d00000-0x00000000bfffffff]
>   node 0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>   node 0: [mem 0x00000000de700000-0x00000000de9fffff]
>   node 0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>   node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>   node 0: [mem 0x00000000fda00000-0x00000000ffffefff]
>
> ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86a00, 86a00000
> ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e300, 8e300000
> ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000
> ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de700, de700000
> ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000
> ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000
> ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000
> === >move_freepages: start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]
> : pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address fffffffe
> pgd = 5dd50df5
> [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
> Internal error: Oops: 37 [#1] SMP ARM
> Modules linked in: gmac(O)
> CPU: 2 PID: 635 Comm: test-oom Tainted: G O 5.10.0+ #31
> Hardware name: Hisilicon A9
> PC is at move_freepages_block+0x150/0x278
> LR is at move_freepages_block+0x150/0x278
> pc : [<c02383a4>] lr : [<c02383a4>] psr: 200e0393
> sp : c4179cf8 ip : 00000000 fp : 00000001
> r10: c4179d58 r9 : 000de7ff r8 : 00000000
> r7 : c0863280 r6 : 000de600 r5 : 000de600 r4 : ef3cc000
> r3 : ffffffff r2 : 00000000 r1 : ef5d069c r0 : fffffffe
> Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
> Control: 1ac5387d Table: 83b0c04a DAC: 55555555
> Process test-oom (pid: 635, stack limit = 0x25d667df)
>
On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
>
> On 2021/4/25 15:19, Mike Rapoport wrote:
>
> On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
>
> I tested this patchset (plus an arm32 change equivalent to the arm64 one)
> on top of LTS 5.10 and added some debug logging; the useful output is shown
> below. If we enable HOLES_IN_ZONE there is no panic. Any ideas? Thanks.
>
>
> Are there any changes on top of 5.10 except for the pfn_valid() patch?
> Do you see this panic on 5.10 without the changes?
>
> Yes, there is some BSP support for the ARM board on top of 5.10. With or
> without your patch we get the same panic. The panic pfn=de600 is in the
> range [dcc00,de700], which is freed by
> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de700, de700000
>
> We see the PC is at PageLRU, for the same reason as in the arm64 panic log:
>
> "PageBuddy in move_freepages returns false
> Then we call PageLRU, the macro calls PF_HEAD which is compound_head()
> compound_head reads page->compound_head, it is 0xffffffffffffffff, so it
> returns 0xfffffffffffffffe - and accessing this address causes crash"
>
> Can you see the stack backtrace beyond move_freepages_block?
>
> I was running an OOM test, so the log is about memory allocation:
>
> [<c02383c8>] (move_freepages_block) from [<c0238668>] (steal_suitable_fallback+0x174/0x1f4)
> [<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4)

Hmm, this is called with a page from the free list. Having a page from a
freed part of the memory map passed to steal_suitable_fallback() means that
there is an issue with creation of the free list.

Can you please add "memblock=debug" to the kernel command line and post the
log?

> [<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08)
> [<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c)
> [<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8)
> [<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4)
> [<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0)
> [<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60)
>
>
>
> Zone ranges:
>   Normal [mem 0x0000000080a00000-0x00000000b01fffff]
>   HighMem [mem 0x00000000b0200000-0x00000000ffffefff]
> Movable zone start for each node
> Early memory node ranges
>   node 0: [mem 0x0000000080a00000-0x00000000855fffff]
>   node 0: [mem 0x0000000086a00000-0x0000000087dfffff]
>   node 0: [mem 0x000000008bd00000-0x000000008c4fffff]
>   node 0: [mem 0x000000008e300000-0x000000008ecfffff]
>   node 0: [mem 0x0000000090d00000-0x00000000bfffffff]
>   node 0: [mem 0x00000000cc000000-0x00000000dc9fffff]
>   node 0: [mem 0x00000000de700000-0x00000000de9fffff]
>   node 0: [mem 0x00000000e0800000-0x00000000e0bfffff]
>   node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff]
>   node 0: [mem 0x00000000fda00000-0x00000000ffffefff]
>
> ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86a00, 86a00000
> ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e300, 8e300000
> ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000
> ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de700, de700000
> ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000
> ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000
> ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000
> === >move_freepages: start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000]
> : pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff
> 8<--- cut here ---
> Unable to handle kernel paging request at virtual address fffffffe
> pgd = 5dd50df5
> [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000
> Internal error: Oops: 37 [#1] SMP ARM
> Modules linked in: gmac(O)
> CPU: 2 PID: 635 Comm: test-oom Tainted: G O 5.10.0+ #31
> Hardware name: Hisilicon A9
> PC is at move_freepages_block+0x150/0x278
> LR is at move_freepages_block+0x150/0x278
> pc : [<c02383a4>] lr : [<c02383a4>] psr: 200e0393
> sp : c4179cf8 ip : 00000000 fp : 00000001
> r10: c4179d58 r9 : 000de7ff r8 : 00000000
> r7 : c0863280 r6 : 000de600 r5 : 000de600 r4 : ef3cc000
> r3 : ffffffff r2 : 00000000 r1 : ef5d069c r0 : fffffffe
> Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user
> Control: 1ac5387d Table: 83b0c04a DAC: 55555555
> Process test-oom (pid: 635, stack limit = 0x25d667df)
>
>
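The quoted explanation of the crash can be replayed numerically for the 32-bit case in this log. In the kernel, compound_head() reads page->compound_head and, if bit 0 is set, treats the page as a tail page whose head is at that value minus one. The Python model below mirrors only that branch; it assumes the freed struct page reads back as all-ones (the debug line prints page-flags = ffffffff) and masks results to 32 bits because the board is 32-bit ARM.

```python
MASK32 = 0xFFFFFFFF  # 32-bit ARM: pointers and unsigned long are 32 bits wide


def compound_head(page_addr, compound_head_field):
    """Model of the tail-page test: bit 0 set means 'head is at value - 1'."""
    if compound_head_field & 1:
        return (compound_head_field - 1) & MASK32
    return page_addr


page = 0xEF3CC000          # "page = ef3cc000" from the debug line
stale_word = 0xFFFFFFFF    # assumption: every word of the freed entry is all-ones

head = compound_head(page, stale_word)
print(hex(head))           # 0xfffffffe
```

The page-flag test then dereferences the computed head pointer, and `flags` is the first member of struct page, so the access lands on 0xfffffffe. That matches both "Unable to handle kernel paging request at virtual address fffffffe" and r0 in the register dump.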
On 2021/4/26 13:20, Mike Rapoport wrote:
> On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
>> On 2021/4/25 15:19, Mike Rapoport wrote:
>>
>> On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
>>
>> I tested this patchset (plus an arm32 change equivalent to the arm64 one)
>> on top of LTS 5.10 and added some debug logging; the useful output is shown
>> below. If we enable HOLES_IN_ZONE there is no panic. Any ideas? Thanks.
>>
>>
>> Are there any changes on top of 5.10 except for the pfn_valid() patch?
>> Do you see this panic on 5.10 without the changes?
>>
>> Yes, there is some BSP support for the ARM board on top of 5.10. With or
>> without your patch we get the same panic. The panic pfn=de600 is in the
>> range [dcc00,de700], which is freed by
>> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de700, de700000
>>
>> We see the PC is at PageLRU, for the same reason as in the arm64 panic log:
>>
>> "PageBuddy in move_freepages returns false
>> Then we call PageLRU, the macro calls PF_HEAD which is compound_head()
>> compound_head reads page->compound_head, it is 0xffffffffffffffff, so it
>> returns 0xfffffffffffffffe - and accessing this address causes crash"
>>
>> Can you see the stack backtrace beyond move_freepages_block?
>>
>> I was running an OOM test, so the log is about memory allocation:
>>
>> [<c02383c8>] (move_freepages_block) from [<c0238668>] (steal_suitable_fallback+0x174/0x1f4)
>> [<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4)
> Hmm, this is called with a page from the free list. Having a page from a
> freed part of the memory map passed to steal_suitable_fallback() means that
> there is an issue with creation of the free list.
>
> Can you please add "memblock=debug" to the kernel command line and post the
> log?

Here is the log, CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=1ac5387d CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache OF: fdt: Machine model: HISI-CA9 memblock_add: [0x80a00000-0x855fffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0x86a00000-0x87dfffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0x8bd00000-0x8c4fffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0x8e300000-0x8ecfffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0x90d00000-0xbfffffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xcc000000-0xdc9fffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xe0800000-0xe0bfffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xf5300000-0xf5bfffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xf5c00000-0xf6ffffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xfe100000-0xfebfffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xfec00000-0xffffffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xde700000-0xde9fffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xf4b00000-0xf52fffff] early_init_dt_scan_memory+0x11c/0x188 memblock_add: [0xfda00000-0xfe0fffff] early_init_dt_scan_memory+0x11c/0x188 memblock_reserve: [0x80a01000-0x80a02d2e] setup_arch+0x68/0x5c4 Malformed early option 'vecpage_wrprotect' Memory policy: Data cache writealloc memblock_reserve: [0x80b00000-0x812e8057] arm_memblock_init+0x34/0x14c memblock_reserve: [0x83000000-0x84ffffff] arm_memblock_init+0x100/0x14c memblock_reserve: [0x80a04000-0x80a07fff] arm_memblock_init+0xa0/0x14c memblock_reserve: [0x80a00000-0x80a02fff] hisi_mem_reserve+0x14/0x30 MEMBLOCK configuration: memory size = 0x4c0fffff reserved size = 0x027ef058 memory.cnt = 0xa memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 memory[0x3] 
[0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 memory[0x7] [0xe0800000-0xe0bfffff], 0x00400000 bytes flags: 0x0 memory[0x8] [0xf4b00000-0xf6ffffff], 0x02500000 bytes flags: 0x0 memory[0x9] [0xfda00000-0xfffffffe], 0x025fffff bytes flags: 0x0 reserved.cnt = 0x4 reserved[0x0] [0x80a00000-0x80a02fff], 0x00003000 bytes flags: 0x0 reserved[0x1] [0x80a04000-0x80a07fff], 0x00004000 bytes flags: 0x0 reserved[0x2] [0x80b00000-0x812e8057], 0x007e8058 bytes flags: 0x0 reserved[0x3] [0x83000000-0x84ffffff], 0x02000000 bytes flags: 0x0 memblock_alloc_try_nid: 2097152 bytes align=0x200000 nid=-1 from=0x00000000 max_addr=0x00000000 early_alloc+0x20/0x4c memblock_reserve: [0xb0000000-0xb01fffff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 early_alloc+0x20/0x4c memblock_reserve: [0xaffff000-0xafffffff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 40 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 iotable_init+0x34/0xf0 memblock_reserve: [0xafffefd8-0xafffefff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 early_alloc+0x20/0x4c memblock_reserve: [0xafffd000-0xafffdfff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 early_alloc+0x20/0x4c memblock_reserve: [0xafffc000-0xafffcfff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 early_alloc+0x20/0x4c memblock_reserve: [0xafffb000-0xafffbfff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 early_alloc+0x20/0x4c 
memblock_reserve: [0xafffa000-0xafffafff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 384 bytes align=0x20 nid=0 from=0x00000000 max_addr=0x00000000 sparse_init_nid+0x34/0x1d8 memblock_reserve: [0xafffee40-0xafffefbf] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_exact_nid_raw: 12582912 bytes align=0x80000 nid=0 from=0xc09fffff max_addr=0x00000000 sparse_init_nid+0xec/0x1d8 memblock_reserve: [0xaf380000-0xaff7ffff] memblock_alloc_range_nid+0x104/0x13c Zone ranges: Normal [mem 0x0000000080a00000-0x00000000b01fffff] HighMem [mem 0x00000000b0200000-0x00000000ffffefff] Movable zone start for each node Early memory node ranges node 0: [mem 0x0000000080a00000-0x00000000855fffff] node 0: [mem 0x0000000086a00000-0x0000000087dfffff] node 0: [mem 0x000000008bd00000-0x000000008c4fffff] node 0: [mem 0x000000008e300000-0x000000008ecfffff] node 0: [mem 0x0000000090d00000-0x00000000bfffffff] node 0: [mem 0x00000000cc000000-0x00000000dc9fffff] node 0: [mem 0x00000000de700000-0x00000000de9fffff] node 0: [mem 0x00000000e0800000-0x00000000e0bfffff] node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff] node 0: [mem 0x00000000fda00000-0x00000000ffffefff] Zeroed struct page in unavailable ranges: 513 pages Initmem setup node 0 [mem 0x0000000080a00000-0x00000000ffffefff] On node 0 totalpages: 311551 Normal zone: 1230 pages used for memmap Normal zone: 0 pages reserved Normal zone: 157440 pages, LIFO batch:31 HighMem zone: 154111 pages, LIFO batch:31 memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffee20-0xafffee3f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffee00-0xafffee1f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: 
[0xafffede0-0xafffedff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffedc0-0xafffeddf] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffeda0-0xafffedbf] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffed80-0xafffed9f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffed60-0xafffed7f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffed40-0xafffed5f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffed20-0xafffed3f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 32 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 setup_arch+0x440/0x5c4 memblock_reserve: [0xafffed00-0xafffed1f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 22396 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 early_init_dt_alloc_memory_arch+0x30/0x64 memblock_reserve: [0xafff4884-0xafff9fff] memblock_alloc_range_nid+0x104/0x13c [dts]:cpu type is 1380 memblock_alloc_try_nid: 404 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc.constprop.8+0x1c/0x24 memblock_reserve: [0xafffeb60-0xafffecf3] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 404 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc.constprop.8+0x1c/0x24 memblock_reserve: [0xafffe9c0-0xafffeb53] 
memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 4096 bytes align=0x1000 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafff3000-0xafff3fff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 4096 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafff2000-0xafff2fff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 278528 bytes align=0x1000 nid=-1 from=0xc09fffff max_addr=0x00000000 pcpu_dfl_fc_alloc+0x28/0x34 memblock_reserve: [0xaffae000-0xafff1fff] memblock_alloc_range_nid+0x104/0x13c memblock_free: [0xaffbf000-0xaffbefff] pcpu_embed_first_chunk+0x5ec/0x6a8 memblock_free: [0xaffd0000-0xaffcffff] pcpu_embed_first_chunk+0x5ec/0x6a8 memblock_free: [0xaffe1000-0xaffe0fff] pcpu_embed_first_chunk+0x5ec/0x6a8 memblock_free: [0xafff2000-0xafff1fff] pcpu_embed_first_chunk+0x5ec/0x6a8 percpu: Embedded 17 pages/cpu s37044 r8192 d24396 u69632 memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffefc0-0xafffefc3] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 4 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe9a0-0xafffe9a3] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 16 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe980-0xafffe98f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 16 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe960-0xafffe96f] memblock_alloc_range_nid+0x104/0x13c pcpu-alloc: s37044 r8192 d24396 u69632 alloc=17*4096 pcpu-alloc: [0] 0 [0] 1 [0] 2 [0] 3 memblock_alloc_try_nid: 128 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe8e0-0xafffe95f] 
memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 92 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe880-0xafffe8db] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 384 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe700-0xafffe87f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 388 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe560-0xafffe6e3] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 96 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe500-0xafffe55f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 92 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe4a0-0xafffe4fb] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 768 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe1a0-0xafffe49f] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 772 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafff4580-0xafff4883] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 192 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 memblock_alloc+0x18/0x20 memblock_reserve: [0xafffe0e0-0xafffe19f] memblock_alloc_range_nid+0x104/0x13c memblock_free: [0xafff3000-0xafff3fff] pcpu_embed_first_chunk+0x570/0x6a8 memblock_free: [0xafff2000-0xafff2fff] pcpu_embed_first_chunk+0x58c/0x6a8 Built 1 zonelists, mobility grouping on. 
Total pages: 310321 Kernel command line: console=ttyAMA0,9600n8N lpj=8000000 initrd=0x83000000,0x2000000 maxcpus=4 master_cpu=1 quiet highres=off oops=panic vecpage_wrprotect ksm=1 ramdisk_size=30720 kmemleak=off min_loop=128 lockd.nlm_tcpport=13001 lockd.nlm_udpport=13001 rdinit=/sbin/init root=/dev/ram0 vmalloc=256M printk: log_buf_len individual max cpu contribution: 4096 bytes printk: log_buf_len total cpu_extra contributions: 12288 bytes printk: log_buf_len min size: 16384 bytes memblock_alloc_try_nid: 32768 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 setup_log_buf+0xe4/0x404 memblock_reserve: [0xaffa6000-0xaffadfff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 12288 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 setup_log_buf+0x130/0x404 memblock_reserve: [0xaffa3000-0xaffa5fff] memblock_alloc_range_nid+0x104/0x13c memblock_alloc_try_nid: 90112 bytes align=0x4 nid=-1 from=0x00000000 max_addr=0x00000000 setup_log_buf+0x180/0x404 memblock_reserve: [0xaff8d000-0xaffa2fff] memblock_alloc_range_nid+0x104/0x13c printk: log_buf_len: 32768 bytes printk: early log buf free: 2492(15%) memblock_alloc_try_nid: 524288 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1b0/0x2e8 memblock_reserve: [0xaf300000-0xaf37ffff] memblock_alloc_range_nid+0x104/0x13c Dentry cache hash table entries: 131072 (order: 7, 524288 bytes, linear) memblock_alloc_try_nid: 262144 bytes align=0x20 nid=-1 from=0x00000000 max_addr=0x00000000 alloc_large_system_hash+0x1b0/0x2e8 memblock_reserve: [0xaf2c0000-0xaf2fffff] memblock_alloc_range_nid+0x104/0x13c Inode-cache hash table entries: 65536 (order: 6, 262144 bytes, linear) mem auto-init: stack:off, heap alloc:off, heap free:off memblock_free: [0xaf430000-0xaf453fff] mem_init+0x154/0x238 memblock_free: [0xaf510000-0xaf545fff] mem_init+0x154/0x238 memblock_free: [0xaf560000-0xaf57ffff] mem_init+0x154/0x238 memblock_free: [0xafd98000-0xafdcdfff] mem_init+0x154/0x238 
memblock_free: [0xafdd8000-0xafdfffff] mem_init+0x154/0x238 memblock_free: [0xafe18000-0xafe7ffff] mem_init+0x154/0x238 memblock_free: [0xafee0000-0xafefffff] mem_init+0x154/0x238 Memory: 1191160K/1246204K available (4096K kernel code, 436K rwdata, 1120K rodata, 1024K init, 491K bss, 55044K reserved, 0K cma-reserved, 616444K highmem) >> [<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08) >> [<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c) >> [<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8) >> [<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4) >> [<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0) >> [<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60) >> >> >> >> Zone ranges: >> Normal [mem 0x0000000080a00000-0x00000000b01fffff] >> HighMem [mem 0x00000000b0200000-0x00000000ffffefff] >> Movable zone start for each node >> Early memory node ranges >> node 0: [mem 0x0000000080a00000-0x00000000855fffff] >> node 0: [mem 0x0000000086a00000-0x0000000087dfffff] >> node 0: [mem 0x000000008bd00000-0x000000008c4fffff] >> node 0: [mem 0x000000008e300000-0x000000008ecfffff] >> node 0: [mem 0x0000000090d00000-0x00000000bfffffff] >> node 0: [mem 0x00000000cc000000-0x00000000dc9fffff] >> node 0: [mem 0x00000000de700000-0x00000000de9fffff] >> node 0: [mem 0x00000000e0800000-0x00000000e0bfffff] >> node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff] >> node 0: [mem 0x00000000fda00000-0x00000000ffffefff] >> >> ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86a00, 86a00000 >> ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e300, 8e300000 >> ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000 >> ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de700, de700000 >> ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 >> 
----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 >> ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 >> === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000] >> : pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff >> 8<--- cut here --- >> Unable to handle kernel paging request at virtual address fffffffe >> pgd = 5dd50df5 >> [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000 >> Internal error: Oops: 37 [#1] SMP ARM >> Modules linked in: gmac(O) >> CPU: 2 PID: 635 Comm: test-oom Tainted: G O 5.10.0+ #31 >> Hardware name: Hisilicon A9 >> PC is at move_freepages_block+0x150/0x278 >> LR is at move_freepages_block+0x150/0x278 >> pc : [<c02383a4>] lr : [<c02383a4>] psr: 200e0393 >> sp : c4179cf8 ip : 00000000 fp : 00000001 >> r10: c4179d58 r9 : 000de7ff r8 : 00000000 >> r7 : c0863280 r6 : 000de600 r5 : 000de600 r4 : ef3cc000 >> r3 : ffffffff r2 : 00000000 r1 : ef5d069c r0 : fffffffe >> Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user >> Control: 1ac5387d Table: 83b0c04a DAC: 55555555 >> Process test-oom (pid: 635, stack limit = 0x25d667df) >> >>
On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote: > > On 2021/4/26 13:20, Mike Rapoport wrote: > > On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote: > > > On 2021/4/25 15:19, Mike Rapoport wrote: > > > > > > On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote: > > > > > > I tested this patchset(plus arm32 change, like arm64 does) > > > based on lts 5.10,add some debug log, the useful info shows > > > below, if we enable HOLES_IN_ZONE, no panic, any idea, > > > thanks. > > > > > > Are there any changes on top of 5.10 except for pfn_valid() patch? > > > Do you see this panic on 5.10 without the changes? > > > > > > Yes, there are some BSP support for arm board based on 5.10, Is it possible to test 5.12? > > > with or without your patch will get same panic, the panic pfn=de600 > > > in the range of [dcc00,de00] which is freed by free_memmap, start_pfn > > > = dcc00, dcc00000 end_pfn = de700, de700000 > > > > > > we see the PC is at PageLRU, same reason like arm64 panic log, > > > > > > "PageBuddy in move_freepages returns false > > > Then we call PageLRU, the macro calls PF_HEAD which is compound_page() > > > compound_page reads page->compound_head, it is 0xffffffffffffffff, so it > > > resturns 0xfffffffffffffffe - and accessing this address causes crash" > > > > > > Can you see stack backtrace beyond move_freepages_block? > > > > > > I do some oom test, so the log is about memory allocate, > > > > > > [<c02383c8>] (move_freepages_block) from [<c0238668>] > > > (steal_suitable_fallback+0x174/0x1f4) > > > > > > [<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4) > > > > Hmm, this is called with a page from free list, having a page from a freed > > part of the memory map passed to steal_suitable_fallback() means that there > > is an issue with creation of the free list. > > > > Can you please add "memblock=debug" to the kernel command line and post the > > log? 
> > Here is the log, > > CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=1ac5387d > > CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache > OF: fdt: Machine model: HISI-CA9 > memblock_add: [0x80a00000-0x855fffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0x86a00000-0x87dfffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0x8bd00000-0x8c4fffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0x8e300000-0x8ecfffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0x90d00000-0xbfffffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xcc000000-0xdc9fffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xe0800000-0xe0bfffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xf5300000-0xf5bfffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xf5c00000-0xf6ffffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xfe100000-0xfebfffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xfec00000-0xffffffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xde700000-0xde9fffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xf4b00000-0xf52fffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_add: [0xfda00000-0xfe0fffff] early_init_dt_scan_memory+0x11c/0x188 > memblock_reserve: [0x80a01000-0x80a02d2e] setup_arch+0x68/0x5c4 > Malformed early option 'vecpage_wrprotect' > Memory policy: Data cache writealloc > memblock_reserve: [0x80b00000-0x812e8057] arm_memblock_init+0x34/0x14c > memblock_reserve: [0x83000000-0x84ffffff] arm_memblock_init+0x100/0x14c > memblock_reserve: [0x80a04000-0x80a07fff] arm_memblock_init+0xa0/0x14c > memblock_reserve: [0x80a00000-0x80a02fff] hisi_mem_reserve+0x14/0x30 > MEMBLOCK configuration: > memory size = 0x4c0fffff reserved size = 0x027ef058 > memory.cnt = 0xa > memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 > memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 > memory[0x2] 
[0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 > memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 > memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 > memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 > memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 > memory[0x7] [0xe0800000-0xe0bfffff], 0x00400000 bytes flags: 0x0 > memory[0x8] [0xf4b00000-0xf6ffffff], 0x02500000 bytes flags: 0x0 > memory[0x9] [0xfda00000-0xfffffffe], 0x025fffff bytes flags: 0x0 > reserved.cnt = 0x4 > reserved[0x0] [0x80a00000-0x80a02fff], 0x00003000 bytes flags: 0x0 > reserved[0x1] [0x80a04000-0x80a07fff], 0x00004000 bytes flags: 0x0 > reserved[0x2] [0x80b00000-0x812e8057], 0x007e8058 bytes flags: 0x0 > reserved[0x3] [0x83000000-0x84ffffff], 0x02000000 bytes flags: 0x0 ... > Zone ranges: > Normal [mem 0x0000000080a00000-0x00000000b01fffff] > HighMem [mem 0x00000000b0200000-0x00000000ffffefff] > Movable zone start for each node > Early memory node ranges > node 0: [mem 0x0000000080a00000-0x00000000855fffff] > node 0: [mem 0x0000000086a00000-0x0000000087dfffff] > node 0: [mem 0x000000008bd00000-0x000000008c4fffff] > node 0: [mem 0x000000008e300000-0x000000008ecfffff] > node 0: [mem 0x0000000090d00000-0x00000000bfffffff] > node 0: [mem 0x00000000cc000000-0x00000000dc9fffff] > node 0: [mem 0x00000000de700000-0x00000000de9fffff] > node 0: [mem 0x00000000e0800000-0x00000000e0bfffff] > node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff] > node 0: [mem 0x00000000fda00000-0x00000000ffffefff] > Zeroed struct page in unavailable ranges: 513 pages > Initmem setup node 0 [mem 0x0000000080a00000-0x00000000ffffefff] > On node 0 totalpages: 311551 > Normal zone: 1230 pages used for memmap > Normal zone: 0 pages reserved > Normal zone: 157440 pages, LIFO batch:31 > HighMem zone: 154111 pages, LIFO batch:31 AFAICT the range [de600000, de7ff000] should not be added to the free lists. 
Can you try with the below patch: diff --git a/mm/memblock.c b/mm/memblock.c index afaefa8fc6ab..7f3c33d53f87 100644 --- a/mm/memblock.c +++ b/mm/memblock.c @@ -1994,6 +1994,8 @@ static unsigned long __init __free_memory_core(phys_addr_t start, unsigned long end_pfn = min_t(unsigned long, PFN_DOWN(end), max_low_pfn); + pr_info("%s: range: %pa - %pa, pfn: %lx - %lx\n", __func__, &start, &end, start_pfn, end_pfn); + if (start_pfn >= end_pfn) return 0; > > > [<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08) > > > [<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c) > > > [<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8) > > > [<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4) > > > [<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0) > > > [<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60) > > > > > > Zone ranges: > > > Normal [mem 0x0000000080a00000-0x00000000b01fffff] > > > HighMem [mem 0x00000000b0200000-0x00000000ffffefff] > > > Movable zone start for each node > > > Early memory node ranges > > > node 0: [mem 0x0000000080a00000-0x00000000855fffff] > > > node 0: [mem 0x0000000086a00000-0x0000000087dfffff] > > > node 0: [mem 0x000000008bd00000-0x000000008c4fffff] > > > node 0: [mem 0x000000008e300000-0x000000008ecfffff] > > > node 0: [mem 0x0000000090d00000-0x00000000bfffffff] > > > node 0: [mem 0x00000000cc000000-0x00000000dc9fffff] > > > node 0: [mem 0x00000000de700000-0x00000000de9fffff] > > > node 0: [mem 0x00000000e0800000-0x00000000e0bfffff] > > > node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff] > > > node 0: [mem 0x00000000fda00000-0x00000000ffffefff] > > > > > > ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86a00, 86a00000 > > > ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e300, 8e300000 > > > ----> free_memmap, start_pfn = 8f000, 
8f000000 end_pfn = 90000, 90000000 > > > ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de700, de700000 > > > ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 > > > ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 > > > ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 > > > === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000] > > > : pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff > > > 8<--- cut here --- > > > Unable to handle kernel paging request at virtual address fffffffe > > > pgd = 5dd50df5 > > > [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000 > > > Internal error: Oops: 37 [#1] SMP ARM > > > Modules linked in: gmac(O) > > > CPU: 2 PID: 635 Comm: test-oom Tainted: G O 5.10.0+ #31 > > > Hardware name: Hisilicon A9 > > > PC is at move_freepages_block+0x150/0x278 > > > LR is at move_freepages_block+0x150/0x278 > > > pc : [<c02383a4>] lr : [<c02383a4>] psr: 200e0393 > > > sp : c4179cf8 ip : 00000000 fp : 00000001 > > > r10: c4179d58 r9 : 000de7ff r8 : 00000000 > > > r7 : c0863280 r6 : 000de600 r5 : 000de600 r4 : ef3cc000 > > > r3 : ffffffff r2 : 00000000 r1 : ef5d069c r0 : fffffffe > > > Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user > > > Control: 1ac5387d Table: 83b0c04a DAC: 55555555 > > > Process test-oom (pid: 635, stack limit = 0x25d667df) > > > > > >
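The faulting address in the oops above can be tied to the quoted PageLRU explanation with a small user-space model. This is only a sketch: `struct fake_page` and `head_address()` are hypothetical names, the struct models just two 32-bit words, and the only kernel behaviour assumed is the bit-0 test that compound_head() performs on page->compound_head.

```c
#include <stdint.h>

/* Hypothetical 32-bit model of the two struct page words involved. */
struct fake_page {
	uint32_t flags;
	uint32_t compound_head;	/* bit 0 set means "tail page", rest is the head pointer */
};

/*
 * Mirrors the bit-0 test of compound_head(): returns the address whose
 * flags a PageLRU()-style check would read next.
 */
static uint32_t head_address(uint32_t page_addr, const struct fake_page *page)
{
	uint32_t head = page->compound_head;

	if (head & 1)
		return head - 1;	/* looks like a tail page: follow the "head" */
	return page_addr;		/* ordinary page: test the page itself */
}
```

For a struct page that reads back as all ones (page-flags = ffffffff in the log), `head_address()` yields 0xfffffffe, which matches both r0 and the faulting virtual address in the dump.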
On 2021/4/27 14:23, Mike Rapoport wrote: > On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote: >> On 2021/4/26 13:20, Mike Rapoport wrote: >>> On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote: >>>> On 2021/4/25 15:19, Mike Rapoport wrote: >>>> >>>> On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote: >>>> >>>> I tested this patchset(plus arm32 change, like arm64 does) >>>> based on lts 5.10,add some debug log, the useful info shows >>>> below, if we enable HOLES_IN_ZONE, no panic, any idea, >>>> thanks. >>>> >>>> Are there any changes on top of 5.10 except for pfn_valid() patch? >>>> Do you see this panic on 5.10 without the changes? >>>> >>>> Yes, there are some BSP support for arm board based on 5.10, > Is it possible to test 5.12? > >>>> with or without your patch will get same panic, the panic pfn=de600 >>>> in the range of [dcc00,de00] which is freed by free_memmap, start_pfn >>>> = dcc00, dcc00000 end_pfn = de700, de700000 >>>> >>>> we see the PC is at PageLRU, same reason like arm64 panic log, >>>> >>>> "PageBuddy in move_freepages returns false >>>> Then we call PageLRU, the macro calls PF_HEAD which is compound_page() >>>> compound_page reads page->compound_head, it is 0xffffffffffffffff, so it >>>> resturns 0xfffffffffffffffe - and accessing this address causes crash" >>>> >>>> Can you see stack backtrace beyond move_freepages_block? >>>> >>>> I do some oom test, so the log is about memory allocate, >>>> >>>> [<c02383c8>] (move_freepages_block) from [<c0238668>] >>>> (steal_suitable_fallback+0x174/0x1f4) >>>> >>>> [<c0238668>] (steal_suitable_fallback) from [<c023999c>] (get_page_from_freelist+0x490/0x9a4) >>> Hmm, this is called with a page from free list, having a page from a freed >>> part of the memory map passed to steal_suitable_fallback() means that there >>> is an issue with creation of the free list. >>> >>> Can you please add "memblock=debug" to the kernel command line and post the >>> log? 
>> Here is the log, >> >> CPU: ARMv7 Processor [413fc090] revision 0 (ARMv7), cr=1ac5387d >> >> CPU: PIPT / VIPT nonaliasing data cache, VIPT aliasing instruction cache >> OF: fdt: Machine model: HISI-CA9 >> memblock_add: [0x80a00000-0x855fffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0x86a00000-0x87dfffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0x8bd00000-0x8c4fffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0x8e300000-0x8ecfffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0x90d00000-0xbfffffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xcc000000-0xdc9fffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xe0800000-0xe0bfffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xf5300000-0xf5bfffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xf5c00000-0xf6ffffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xfe100000-0xfebfffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xfec00000-0xffffffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xde700000-0xde9fffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xf4b00000-0xf52fffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_add: [0xfda00000-0xfe0fffff] early_init_dt_scan_memory+0x11c/0x188 >> memblock_reserve: [0x80a01000-0x80a02d2e] setup_arch+0x68/0x5c4 >> Malformed early option 'vecpage_wrprotect' >> Memory policy: Data cache writealloc >> memblock_reserve: [0x80b00000-0x812e8057] arm_memblock_init+0x34/0x14c >> memblock_reserve: [0x83000000-0x84ffffff] arm_memblock_init+0x100/0x14c >> memblock_reserve: [0x80a04000-0x80a07fff] arm_memblock_init+0xa0/0x14c >> memblock_reserve: [0x80a00000-0x80a02fff] hisi_mem_reserve+0x14/0x30 >> MEMBLOCK configuration: >> memory size = 0x4c0fffff reserved size = 0x027ef058 >> memory.cnt = 0xa >> memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 >> memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 >> 
memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 >> memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 >> memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 >> memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 >> memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 >> memory[0x7] [0xe0800000-0xe0bfffff], 0x00400000 bytes flags: 0x0 >> memory[0x8] [0xf4b00000-0xf6ffffff], 0x02500000 bytes flags: 0x0 >> memory[0x9] [0xfda00000-0xfffffffe], 0x025fffff bytes flags: 0x0 >> reserved.cnt = 0x4 >> reserved[0x0] [0x80a00000-0x80a02fff], 0x00003000 bytes flags: 0x0 >> reserved[0x1] [0x80a04000-0x80a07fff], 0x00004000 bytes flags: 0x0 >> reserved[0x2] [0x80b00000-0x812e8057], 0x007e8058 bytes flags: 0x0 >> reserved[0x3] [0x83000000-0x84ffffff], 0x02000000 bytes flags: 0x0 > ... >> Zone ranges: >> Normal [mem 0x0000000080a00000-0x00000000b01fffff] >> HighMem [mem 0x00000000b0200000-0x00000000ffffefff] >> Movable zone start for each node >> Early memory node ranges >> node 0: [mem 0x0000000080a00000-0x00000000855fffff] >> node 0: [mem 0x0000000086a00000-0x0000000087dfffff] >> node 0: [mem 0x000000008bd00000-0x000000008c4fffff] >> node 0: [mem 0x000000008e300000-0x000000008ecfffff] >> node 0: [mem 0x0000000090d00000-0x00000000bfffffff] >> node 0: [mem 0x00000000cc000000-0x00000000dc9fffff] >> node 0: [mem 0x00000000de700000-0x00000000de9fffff] >> node 0: [mem 0x00000000e0800000-0x00000000e0bfffff] >> node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff] >> node 0: [mem 0x00000000fda00000-0x00000000ffffefff] >> Zeroed struct page in unavailable ranges: 513 pages >> Initmem setup node 0 [mem 0x0000000080a00000-0x00000000ffffefff] >> On node 0 totalpages: 311551 >> Normal zone: 1230 pages used for memmap >> Normal zone: 0 pages reserved >> Normal zone: 157440 pages, LIFO batch:31 >> HighMem zone: 154111 pages, LIFO batch:31 > AFAICT the range [de600000, de7ff000] should not be added to the free > lists. 
> > Can you try with the below patch: > > diff --git a/mm/memblock.c b/mm/memblock.c > index afaefa8fc6ab..7f3c33d53f87 100644 > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -1994,6 +1994,8 @@ static unsigned long __init __free_memory_core(phys_addr_t start, > unsigned long end_pfn = min_t(unsigned long, > PFN_DOWN(end), max_low_pfn); > > + pr_info("%s: range: %pa - %pa, pfn: %lx - %lx\n", __func__, &start, &end, start_pfn, end_pfn); > + > if (start_pfn >= end_pfn) > return 0; > __free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04 __free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00 __free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000 __free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600 __free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00 __free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500 __free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00 __free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0 __free_memory_core, range: 0xaf430000 - 0xaf454000, pfn: af430 - af454 __free_memory_core, range: 0xaf510000 - 0xaf546000, pfn: af510 - af546 __free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580 __free_memory_core, range: 0xafd98000 - 0xafdce000, pfn: afd98 - afdce __free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00 __free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80 __free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00 __free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d __free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4 __free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe __free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe __free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe __free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe __free_memory_core, 
range: 0xafffe970 - 0xafffe980, pfn: affff - afffe __free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe __free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe __free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe __free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe __free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200 __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200 __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200 > >>>> [<c023999c>] (get_page_from_freelist) from [<c023a4dc>] (__alloc_pages_nodemask+0x188/0xc08) >>>> [<c023a4dc>] (__alloc_pages_nodemask) from [<c0223078>] (alloc_zeroed_user_highpage_movable+0x14/0x3c) >>>> [<c0223078>] (alloc_zeroed_user_highpage_movable) from [<c0226768>] (handle_mm_fault+0x254/0xac8) >>>> [<c0226768>] (handle_mm_fault) from [<c04ba09c>] (do_page_fault+0x228/0x2f4) >>>> [<c04ba09c>] (do_page_fault) from [<c0111d80>] (do_DataAbort+0x48/0xd0) >>>> [<c0111d80>] (do_DataAbort) from [<c0100e00>] (__dabt_usr+0x40/0x60) >>>> >>>> Zone ranges: >>>> Normal [mem 0x0000000080a00000-0x00000000b01fffff] >>>> HighMem [mem 0x00000000b0200000-0x00000000ffffefff] >>>> Movable zone start for each node >>>> Early memory node ranges >>>> node 0: [mem 0x0000000080a00000-0x00000000855fffff] >>>> node 0: [mem 0x0000000086a00000-0x0000000087dfffff] >>>> node 0: [mem 0x000000008bd00000-0x000000008c4fffff] >>>> node 0: [mem 0x000000008e300000-0x000000008ecfffff] >>>> node 0: [mem 0x0000000090d00000-0x00000000bfffffff] >>>> node 0: [mem 0x00000000cc000000-0x00000000dc9fffff] >>>> node 0: [mem 0x00000000de700000-0x00000000de9fffff] >>>> node 0: [mem 
0x00000000e0800000-0x00000000e0bfffff] >>>> node 0: [mem 0x00000000f4b00000-0x00000000f6ffffff] >>>> node 0: [mem 0x00000000fda00000-0x00000000ffffefff] >>>> >>>> ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86a00, 86a00000 >>>> ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e300, 8e300000 >>>> ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000 >>>> ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de700, de700000 >>>> ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 >>>> ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 >>>> ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 >>>> === >move_freepages: start_pfn/end_pfn [de601, de7ff], [de600000, de7ff000] >>>> : pfn =de600 pfn2phy = de600000 , page = ef3cc000, page-flags = ffffffff >>>> 8<--- cut here --- >>>> Unable to handle kernel paging request at virtual address fffffffe >>>> pgd = 5dd50df5 >>>> [fffffffe] *pgd=affff861, *pte=00000000, *ppte=00000000 >>>> Internal error: Oops: 37 [#1] SMP ARM >>>> Modules linked in: gmac(O) >>>> CPU: 2 PID: 635 Comm: test-oom Tainted: G O 5.10.0+ #31 >>>> Hardware name: Hisilicon A9 >>>> PC is at move_freepages_block+0x150/0x278 >>>> LR is at move_freepages_block+0x150/0x278 >>>> pc : [<c02383a4>] lr : [<c02383a4>] psr: 200e0393 >>>> sp : c4179cf8 ip : 00000000 fp : 00000001 >>>> r10: c4179d58 r9 : 000de7ff r8 : 00000000 >>>> r7 : c0863280 r6 : 000de600 r5 : 000de600 r4 : ef3cc000 >>>> r3 : ffffffff r2 : 00000000 r1 : ef5d069c r0 : fffffffe >>>> Flags: nzCv IRQs off FIQs on Mode SVC_32 ISA ARM Segment user >>>> Control: 1ac5387d Table: 83b0c04a DAC: 55555555 >>>> Process test-oom (pid: 635, stack limit = 0x25d667df) >>>> >>>>
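The pfn pairs printed by the debug patch can be reproduced in user space. The sketch below assumes 4 KiB pages and takes max_low_pfn to be 0xb0200 (the end of the Normal zone in the log); `core_range()` and `pages_freed()` are illustrative names, not kernel functions. The start pfn is rounded up and the end pfn is rounded down and clamped to max_low_pfn, following the min_t() expression visible in the diff context.

```c
#include <stdint.h>

#define PAGE_SHIFT	12		/* assumption: 4 KiB pages */
#define PAGE_SIZE	(1UL << PAGE_SHIFT)
#define MAX_LOW_PFN	0xb0200UL	/* assumption: end of the Normal zone in the log */

struct pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Round a physical address up or down to a page frame number. */
static unsigned long pfn_up(uint64_t addr)
{
	return (unsigned long)((addr + PAGE_SIZE - 1) >> PAGE_SHIFT);
}

static unsigned long pfn_down(uint64_t addr)
{
	return (unsigned long)(addr >> PAGE_SHIFT);
}

/* The pfn range the debug pr_info() reports for [start, end). */
static struct pfn_range core_range(uint64_t start, uint64_t end)
{
	struct pfn_range r;
	unsigned long down = pfn_down(end);

	r.start = pfn_up(start);
	r.end = down < MAX_LOW_PFN ? down : MAX_LOW_PFN;
	return r;
}

/* Pages that would be handed to the page allocator; 0 for an empty range. */
static unsigned long pages_freed(struct pfn_range r)
{
	return r.start >= r.end ? 0 : r.end - r.start;
}
```

With these assumptions, 0x812e8058 - 0x83000000 gives 812e9 - 83000, the sub-page range 0xafffe4fc - 0xafffe500 gives affff - afffe (empty), and 0xde700000 - 0xdea00000 gives de700 - b0200 (empty), as in the log. So on this path nothing above max_low_pfn is freed, including the bank starting at 0xde700000.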
On Tue, Apr 27, 2021 at 07:08:59PM +0800, Kefeng Wang wrote:
>
> On 2021/4/27 14:23, Mike Rapoport wrote:
> > On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote:
> > > On 2021/4/26 13:20, Mike Rapoport wrote:
> > > > On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
> > > > > On 2021/4/25 15:19, Mike Rapoport wrote:
> > > > >
> > > > > On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
> > > > >
> > > > > I tested this patchset(plus arm32 change, like arm64 does)
> > > > > based on lts 5.10,add some debug log, the useful info shows
> > > > > below, if we enable HOLES_IN_ZONE, no panic, any idea,
> > > > > thanks.
> > > > >
> > > > > Are there any changes on top of 5.10 except for pfn_valid() patch?
> > > > > Do you see this panic on 5.10 without the changes?
> > > > >
> > > > > Yes, there are some BSP support for arm board based on 5.10,
> > Is it possible to test 5.12?

Do you use SPARSEMEM? If yes, what is your section size?
What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?
On 2021/4/28 13:59, Mike Rapoport wrote:
> On Tue, Apr 27, 2021 at 07:08:59PM +0800, Kefeng Wang wrote:
>> On 2021/4/27 14:23, Mike Rapoport wrote:
>>> On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote:
>>>> On 2021/4/26 13:20, Mike Rapoport wrote:
>>>>> On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
>>>>>> On 2021/4/25 15:19, Mike Rapoport wrote:
>>>>>>
>>>>>> On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
>>>>>>
>>>>>> I tested this patchset(plus arm32 change, like arm64 does)
>>>>>> based on lts 5.10,add some debug log, the useful info shows
>>>>>> below, if we enable HOLES_IN_ZONE, no panic, any idea,
>>>>>> thanks.
>>>>>>
>>>>>> Are there any changes on top of 5.10 except for pfn_valid() patch?
>>>>>> Do you see this panic on 5.10 without the changes?
>>>>>>
>>>>>> Yes, there are some BSP support for arm board based on 5.10,
>>> Is it possible to test 5.12?
> Do you use SPARSEMEM? If yes, what is your section size?
> What is the value of CONFIG_FORCE_MAX_ZONEORDER in your configuration?

Yes,

CONFIG_SPARSEMEM=y

CONFIG_SPARSEMEM_STATIC=y

CONFIG_FORCE_MAX_ZONEORDER = 11

CONFIG_PAGE_OFFSET=0xC0000000
CONFIG_HAVE_ARCH_PFN_VALID=y
CONFIG_HIGHMEM=y
#define SECTION_SIZE_BITS 26
#define MAX_PHYSADDR_BITS 32
#define MAX_PHYSMEM_BITS 32
On Thu, Apr 29, 2021 at 08:48:26AM +0800, Kefeng Wang wrote:
>
> On 2021/4/28 13:59, Mike Rapoport wrote:
> > On Tue, Apr 27, 2021 at 07:08:59PM +0800, Kefeng Wang wrote:
> > > On 2021/4/27 14:23, Mike Rapoport wrote:
> > > > On Mon, Apr 26, 2021 at 11:26:38PM +0800, Kefeng Wang wrote:
> > > > > On 2021/4/26 13:20, Mike Rapoport wrote:
> > > > > > On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote:
> > > > > > > On 2021/4/25 15:19, Mike Rapoport wrote:
> > > > > > >
> > > > > > > On Fri, Apr 23, 2021 at 04:11:16PM +0800, Kefeng Wang wrote:
> > > > > > >
> > > > > > > I tested this patchset(plus arm32 change, like arm64 does)
> > > > > > > based on lts 5.10,add some debug log, the useful info shows
> > > > > > > below, if we enable HOLES_IN_ZONE, no panic, any idea,
> > > > > > > thanks.
> > > > > > >
> > > > > > > Are there any changes on top of 5.10 except for pfn_valid() patch?
> > > > > > > Do you see this panic on 5.10 without the changes?
> > > > > > >
> > > > > > > Yes, there are some BSP support for arm board based on 5.10,
> > > > Is it possible to test 5.12?
> > Do you use SPARSMEM? If yes, what is your section size?
> > What is the value if CONFIG_FORCE_MAX_ZONEORDER in your configuration?
>
> Yes,
>
> CONFIG_SPARSEMEM=y
>
> CONFIG_SPARSEMEM_STATIC=y
>
> CONFIG_FORCE_MAX_ZONEORDER = 11
>
> CONFIG_PAGE_OFFSET=0xC0000000
> CONFIG_HAVE_ARCH_PFN_VALID=y
> CONFIG_HIGHMEM=y
> #define SECTION_SIZE_BITS 26
> #define MAX_PHYSADDR_BITS 32
> #define MAX_PHYSMEM_BITS 32

It seems that with SPARSEMEM we don't align the freed parts on pageblock
boundaries.

Can you try the patch below:

diff --git a/mm/memblock.c b/mm/memblock.c
index afaefa8fc6ab..1926369b52ec 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void)
 		 * due to SPARSEMEM sections which aren't present.
 		 */
 		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
-#else
+#endif
 		/*
 		 * Align down here since the VM subsystem insists that the
 		 * memmap entries are valid from the bank start aligned to
 		 * MAX_ORDER_NR_PAGES.
 		 */
 		start = round_down(start, MAX_ORDER_NR_PAGES);
-#endif
 
 		/*
 		 * If we had a previous bank, and there is a space
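To make the effect of the patch more concrete, here is a small user-space sketch of the pfn arithmetic once round_down() is applied unconditionally. It is only an illustration under stated assumptions, not kernel code: 4 KiB pages are assumed, MAX_ORDER is taken from the reported CONFIG_FORCE_MAX_ZONEORDER = 11, hole_end_pfn() and the align helpers are hypothetical names, and the sample pfns in the note below are borrowed from the logs in this thread without any claim that they are real memblock bank boundaries.

```c
#include <assert.h>

#define PAGE_SHIFT         12  /* assumption: 4 KiB pages */
#define SECTION_SIZE_BITS  26  /* from the reported configuration */
#define MAX_ORDER          11  /* from CONFIG_FORCE_MAX_ZONEORDER = 11 */

#define PAGES_PER_SECTION  (1UL << (SECTION_SIZE_BITS - PAGE_SHIFT))
#define MAX_ORDER_NR_PAGES (1UL << (MAX_ORDER - 1))

/* Round x up to a multiple of a; a must be a power of two. */
static unsigned long align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return (x + a - 1) & ~(a - 1);
}

/* Round x down to a multiple of a; a must be a power of two. */
static unsigned long align_down(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0);
	return x & ~(a - 1);
}

/*
 * Hypothetical helper: given the end pfn of the previous bank and the
 * start pfn of the next one, return the pfn where the freed memmap hole
 * would end when both the section clamp and the MAX_ORDER round-down
 * are applied.
 */
static unsigned long hole_end_pfn(unsigned long prev_end, unsigned long start)
{
	unsigned long next_section = align_up(prev_end, PAGES_PER_SECTION);

	/* Equivalent of min(start, ALIGN(prev_end, PAGES_PER_SECTION)). */
	if (next_section < start)
		start = next_section;

	/* Equivalent of round_down(start, MAX_ORDER_NR_PAGES). */
	return align_down(start, MAX_ORDER_NR_PAGES);
}
```

With prev_end = 0xdca00 and start = 0xde700 the helper returns 0xde400, so under these assumptions the memmap entries for [0xde400, 0xde700) would be kept rather than freed.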
On 2021/4/29 14:57, Mike Rapoport wrote: >>> Do you use SPARSMEM? If yes, what is your section size? >>> What is the value if CONFIG_FORCE_MAX_ZONEORDER in your configuration? >> Yes, >> >> CONFIG_SPARSEMEM=y >> >> CONFIG_SPARSEMEM_STATIC=y >> >> CONFIG_FORCE_MAX_ZONEORDER = 11 >> >> CONFIG_PAGE_OFFSET=0xC0000000 >> CONFIG_HAVE_ARCH_PFN_VALID=y >> CONFIG_HIGHMEM=y >> #define SECTION_SIZE_BITS 26 >> #define MAX_PHYSADDR_BITS 32 >> #define MAX_PHYSMEM_BITS 32 With the patch, the addr is aligned, but the panic still occurred, new free memory log is below, memblock_free: [0xaf430000-0xaf44ffff] mem_init+0x158/0x23c memblock_free: [0xaf510000-0xaf53ffff] mem_init+0x158/0x23c memblock_free: [0xaf560000-0xaf57ffff] mem_init+0x158/0x23c memblock_free: [0xafd98000-0xafdc7fff] mem_init+0x158/0x23c memblock_free: [0xafdd8000-0xafdfffff] mem_init+0x158/0x23c memblock_free: [0xafe18000-0xafe7ffff] mem_init+0x158/0x23c memblock_free: [0xafee0000-0xafefffff] mem_init+0x158/0x23c __free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04 __free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00 __free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000 __free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600 __free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00 __free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500 __free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00 __free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0 __free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450 __free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540 __free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580 __free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8 __free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00 __free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80 __free_memory_core, range: 
0xafee0000 - 0xaff00000, pfn: afee0 - aff00 __free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d __free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4 __free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe __free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe __free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe __free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe __free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe __free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe __free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe __free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe __free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe __free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200 __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200 __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200 > It seems that with SPARSEMEM we don't align the freed parts on pageblock > boundaries. > > Can you try the patch below: > > diff --git a/mm/memblock.c b/mm/memblock.c > index afaefa8fc6ab..1926369b52ec 100644 > --- a/mm/memblock.c > +++ b/mm/memblock.c > @@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void) > * due to SPARSEMEM sections which aren't present. > */ > start = min(start, ALIGN(prev_end, PAGES_PER_SECTION)); > -#else > +#endif > /* > * Align down here since the VM subsystem insists that the > * memmap entries are valid from the bank start aligned to > * MAX_ORDER_NR_PAGES. 
> */ > start = round_down(start, MAX_ORDER_NR_PAGES); > -#endif > > /* > * If we had a previous bank, and there is a space > >
On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote: > > On 2021/4/29 14:57, Mike Rapoport wrote: > > > > > Do you use SPARSMEM? If yes, what is your section size? > > > > What is the value if CONFIG_FORCE_MAX_ZONEORDER in your configuration? > > > Yes, > > > > > > CONFIG_SPARSEMEM=y > > > > > > CONFIG_SPARSEMEM_STATIC=y > > > > > > CONFIG_FORCE_MAX_ZONEORDER = 11 > > > > > > CONFIG_PAGE_OFFSET=0xC0000000 > > > CONFIG_HAVE_ARCH_PFN_VALID=y > > > CONFIG_HIGHMEM=y > > > #define SECTION_SIZE_BITS 26 > > > #define MAX_PHYSADDR_BITS 32 > > > #define MAX_PHYSMEM_BITS 32 > > > With the patch, the addr is aligned, but the panic still occurred, Is this the same panic at move_freepages() for range [de600, de7ff]? Do you enable CONFIG_ARM_LPAE? > new free memory log is below, > > memblock_free: [0xaf430000-0xaf44ffff] mem_init+0x158/0x23c > > memblock_free: [0xaf510000-0xaf53ffff] mem_init+0x158/0x23c > memblock_free: [0xaf560000-0xaf57ffff] mem_init+0x158/0x23c > memblock_free: [0xafd98000-0xafdc7fff] mem_init+0x158/0x23c > memblock_free: [0xafdd8000-0xafdfffff] mem_init+0x158/0x23c > memblock_free: [0xafe18000-0xafe7ffff] mem_init+0x158/0x23c > memblock_free: [0xafee0000-0xafefffff] mem_init+0x158/0x23c > __free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04 > __free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00 > __free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000 > __free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600 > __free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00 > __free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500 > __free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00 > __free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0 > __free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450 > __free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540 > __free_memory_core, range: 0xaf560000 - 
0xaf580000, pfn: af560 - af580 > __free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8 > __free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00 > __free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80 > __free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00 > __free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d > __free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4 > __free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe > __free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe > __free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe > __free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe > __free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe > __free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe > __free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe > __free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe > __free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe > __free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe > __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 > __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 > __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 The range [de600, de7ff] > __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200 > __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200 > __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200 > > It seems that with SPARSEMEM we don't align the freed parts on pageblock > > boundaries. 
> > > > Can you try the patch below: > > > > diff --git a/mm/memblock.c b/mm/memblock.c > > index afaefa8fc6ab..1926369b52ec 100644 > > --- a/mm/memblock.c > > +++ b/mm/memblock.c > > @@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void) > > * due to SPARSEMEM sections which aren't present. > > */ > > start = min(start, ALIGN(prev_end, PAGES_PER_SECTION)); > > -#else > > +#endif > > /* > > * Align down here since the VM subsystem insists that the > > * memmap entries are valid from the bank start aligned to > > * MAX_ORDER_NR_PAGES. > > */ > > start = round_down(start, MAX_ORDER_NR_PAGES); > > -#endif > > /* > > * If we had a previous bank, and there is a space > >
On 2021/4/30 17:51, Mike Rapoport wrote: > On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote: >> >> On 2021/4/29 14:57, Mike Rapoport wrote: >> >>>>> Do you use SPARSMEM? If yes, what is your section size? >>>>> What is the value if CONFIG_FORCE_MAX_ZONEORDER in your configuration? >>>> Yes, >>>> >>>> CONFIG_SPARSEMEM=y >>>> >>>> CONFIG_SPARSEMEM_STATIC=y >>>> >>>> CONFIG_FORCE_MAX_ZONEORDER = 11 >>>> >>>> CONFIG_PAGE_OFFSET=0xC0000000 >>>> CONFIG_HAVE_ARCH_PFN_VALID=y >>>> CONFIG_HIGHMEM=y >>>> #define SECTION_SIZE_BITS 26 >>>> #define MAX_PHYSADDR_BITS 32 >>>> #define MAX_PHYSMEM_BITS 32 >> >> >> With the patch, the addr is aligned, but the panic still occurred, > > Is this the same panic at move_freepages() for range [de600, de7ff]? > > Do you enable CONFIG_ARM_LPAE? no, the CONFIG_ARM_LPAE is not set, and yes with same panic at move_freepages at start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] : pfn =de600, page =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 > >> new free memory log is below, >> >> memblock_free: [0xaf430000-0xaf44ffff] mem_init+0x158/0x23c >> >> memblock_free: [0xaf510000-0xaf53ffff] mem_init+0x158/0x23c >> memblock_free: [0xaf560000-0xaf57ffff] mem_init+0x158/0x23c >> memblock_free: [0xafd98000-0xafdc7fff] mem_init+0x158/0x23c >> memblock_free: [0xafdd8000-0xafdfffff] mem_init+0x158/0x23c >> memblock_free: [0xafe18000-0xafe7ffff] mem_init+0x158/0x23c >> memblock_free: [0xafee0000-0xafefffff] mem_init+0x158/0x23c >> __free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04 >> __free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00 >> __free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000 >> __free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600 >> __free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00 >> __free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500 >> __free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00 
>> __free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0 >> __free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450 >> __free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540 >> __free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580 >> __free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8 >> __free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00 >> __free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80 >> __free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00 >> __free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d >> __free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: afff2 - afff4 >> __free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe >> __free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe >> __free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe >> __free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe >> __free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe >> __free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe >> __free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe >> __free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe >> __free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe >> __free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe >> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 >> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 >> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 > > The range [de600, de7ff] the __free_memory_core will check the start pfn and end pfn, if (start_pfn >= end_pfn) return 0; __free_pages_memory(start_pfn, end_pfn); so the memory will not be freed to buddy, confused... 
> >> __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200 >> __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200 >> __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200 >>> It seems that with SPARSEMEM we don't align the freed parts on pageblock >>> boundaries. >>> >>> Can you try the patch below: >>> >>> diff --git a/mm/memblock.c b/mm/memblock.c >>> index afaefa8fc6ab..1926369b52ec 100644 >>> --- a/mm/memblock.c >>> +++ b/mm/memblock.c >>> @@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void) >>> * due to SPARSEMEM sections which aren't present. >>> */ >>> start = min(start, ALIGN(prev_end, PAGES_PER_SECTION)); >>> -#else >>> +#endif >>> /* >>> * Align down here since the VM subsystem insists that the >>> * memmap entries are valid from the bank start aligned to >>> * MAX_ORDER_NR_PAGES. >>> */ >>> start = round_down(start, MAX_ORDER_NR_PAGES); >>> -#endif >>> /* >>> * If we had a previous bank, and there is a space >>> >
On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote: > > > On 2021/4/30 17:51, Mike Rapoport wrote: > > On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote: > > > > > > On 2021/4/29 14:57, Mike Rapoport wrote: > > > > > > > > > Do you use SPARSMEM? If yes, what is your section size? > > > > > > What is the value if CONFIG_FORCE_MAX_ZONEORDER in your configuration? > > > > > Yes, > > > > > > > > > > CONFIG_SPARSEMEM=y > > > > > > > > > > CONFIG_SPARSEMEM_STATIC=y > > > > > > > > > > CONFIG_FORCE_MAX_ZONEORDER = 11 > > > > > > > > > > CONFIG_PAGE_OFFSET=0xC0000000 > > > > > CONFIG_HAVE_ARCH_PFN_VALID=y > > > > > CONFIG_HIGHMEM=y > > > > > #define SECTION_SIZE_BITS 26 > > > > > #define MAX_PHYSADDR_BITS 32 > > > > > #define MAX_PHYSMEM_BITS 32 > > > > > > > > > With the patch, the addr is aligned, but the panic still occurred, > > > > Is this the same panic at move_freepages() for range [de600, de7ff]? > > > > Do you enable CONFIG_ARM_LPAE? > > no, the CONFIG_ARM_LPAE is not set, and yes with same panic at > move_freepages at > > start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] : pfn =de600, page > =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 > > > > __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 > > > __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 > > > __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 Hmm, [de600, de7ff] is not added to the free lists which is correct. But then it's unclear how the page for de600 gets to move_freepages()... Can't say I have any bright ideas to try here... > the __free_memory_core will check the start pfn and end pfn, > > if (start_pfn >= end_pfn) > return 0; > > __free_pages_memory(start_pfn, end_pfn); > so the memory will not be freed to buddy, confused... It's a check for range validity, all valid ranges are added. 
> > > __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200 > > > __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200 > > > __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200 > > > > It seems that with SPARSEMEM we don't align the freed parts on pageblock > > > > boundaries. > > > > > > > > Can you try the patch below: > > > > > > > > diff --git a/mm/memblock.c b/mm/memblock.c > > > > index afaefa8fc6ab..1926369b52ec 100644 > > > > --- a/mm/memblock.c > > > > +++ b/mm/memblock.c > > > > @@ -1941,14 +1941,13 @@ static void __init free_unused_memmap(void) > > > > * due to SPARSEMEM sections which aren't present. > > > > */ > > > > start = min(start, ALIGN(prev_end, PAGES_PER_SECTION)); > > > > -#else > > > > +#endif > > > > /* > > > > * Align down here since the VM subsystem insists that the > > > > * memmap entries are valid from the bank start aligned to > > > > * MAX_ORDER_NR_PAGES. > > > > */ > > > > start = round_down(start, MAX_ORDER_NR_PAGES); > > > > -#endif > > > > /* > > > > * If we had a previous bank, and there is a space > > > > > >
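The range check discussed above can be mirrored in user space to see which of the logged ranges contribute pages. This is a sketch reconstructed from the debug log and the quoted snippet, not the kernel source: 4 KiB pages are assumed, the clamp to max_low_pfn is inferred from the log lines whose end pfn prints as b0200, and lowmem_pages_freed() is a hypothetical name.

```c
#include <assert.h>

#define PAGE_SHIFT 12 /* assumption: 4 KiB pages */

/* Round a physical address up to the next page frame number. */
static unsigned long pfn_up(unsigned long long addr)
{
	return (unsigned long)((addr + (1ULL << PAGE_SHIFT) - 1) >> PAGE_SHIFT);
}

/* Round a physical address down to its page frame number. */
static unsigned long pfn_down(unsigned long long addr)
{
	return (unsigned long)(addr >> PAGE_SHIFT);
}

/*
 * Hypothetical helper: number of pages a free range [start, end) would
 * hand over, with the end clamped to max_low_pfn. Ranges that are
 * smaller than a page, or that lie entirely above max_low_pfn, yield 0.
 */
static unsigned long lowmem_pages_freed(unsigned long long start,
					unsigned long long end,
					unsigned long max_low_pfn)
{
	unsigned long start_pfn = pfn_up(start);
	unsigned long end_pfn = pfn_down(end);

	if (end_pfn > max_low_pfn)
		end_pfn = max_low_pfn;

	/* Same shape as the "start_pfn >= end_pfn" check quoted above. */
	if (start_pfn >= end_pfn)
		return 0;

	return end_pfn - start_pfn;
}
```

Fed with the logged range 0xde700000 - 0xdea00000 and max_low_pfn = 0xb0200 (a value inferred from the log), the helper returns 0, which matches the observation that this range is not added to the free lists on this path.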
On 03.05.21 08:26, Mike Rapoport wrote: > On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote: >> >> >> On 2021/4/30 17:51, Mike Rapoport wrote: >>> On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote: >>>> >>>> On 2021/4/29 14:57, Mike Rapoport wrote: >>>> >>>>>>> Do you use SPARSMEM? If yes, what is your section size? >>>>>>> What is the value if CONFIG_FORCE_MAX_ZONEORDER in your configuration? >>>>>> Yes, >>>>>> >>>>>> CONFIG_SPARSEMEM=y >>>>>> >>>>>> CONFIG_SPARSEMEM_STATIC=y >>>>>> >>>>>> CONFIG_FORCE_MAX_ZONEORDER = 11 >>>>>> >>>>>> CONFIG_PAGE_OFFSET=0xC0000000 >>>>>> CONFIG_HAVE_ARCH_PFN_VALID=y >>>>>> CONFIG_HIGHMEM=y >>>>>> #define SECTION_SIZE_BITS 26 >>>>>> #define MAX_PHYSADDR_BITS 32 >>>>>> #define MAX_PHYSMEM_BITS 32 >>>> >>>> >>>> With the patch, the addr is aligned, but the panic still occurred, >>> >>> Is this the same panic at move_freepages() for range [de600, de7ff]? >>> >>> Do you enable CONFIG_ARM_LPAE? >> >> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at >> move_freepages at >> >> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] : pfn =de600, page >> =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 >> >>>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 >>>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 >>>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 > > Hmm, [de600, de7ff] is not added to the free lists which is correct. But > then it's unclear how the page for de600 gets to move_freepages()... > > Can't say I have any bright ideas to try here... Are we missing some checks (e.g., PageReserved()) that pfn_valid_within() would have "caught" before?
On Mon, May 03, 2021 at 10:07:01AM +0200, David Hildenbrand wrote: > On 03.05.21 08:26, Mike Rapoport wrote: > > On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote: > > > > > > > > > On 2021/4/30 17:51, Mike Rapoport wrote: > > > > On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote: > > > > > > > > > > On 2021/4/29 14:57, Mike Rapoport wrote: > > > > > > > > > > > > > Do you use SPARSMEM? If yes, what is your section size? > > > > > > > > What is the value if CONFIG_FORCE_MAX_ZONEORDER in your configuration? > > > > > > > Yes, > > > > > > > > > > > > > > CONFIG_SPARSEMEM=y > > > > > > > > > > > > > > CONFIG_SPARSEMEM_STATIC=y > > > > > > > > > > > > > > CONFIG_FORCE_MAX_ZONEORDER = 11 > > > > > > > > > > > > > > CONFIG_PAGE_OFFSET=0xC0000000 > > > > > > > CONFIG_HAVE_ARCH_PFN_VALID=y > > > > > > > CONFIG_HIGHMEM=y > > > > > > > #define SECTION_SIZE_BITS 26 > > > > > > > #define MAX_PHYSADDR_BITS 32 > > > > > > > #define MAX_PHYSMEM_BITS 32 > > > > > > > > > > > > > > > With the patch, the addr is aligned, but the panic still occurred, > > > > > > > > Is this the same panic at move_freepages() for range [de600, de7ff]? > > > > > > > > Do you enable CONFIG_ARM_LPAE? > > > > > > no, the CONFIG_ARM_LPAE is not set, and yes with same panic at > > > move_freepages at > > > > > > start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] : pfn =de600, page > > > =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 > > > > > > > > __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 > > > > > __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 > > > > > __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 > > > > Hmm, [de600, de7ff] is not added to the free lists which is correct. But > > then it's unclear how the page for de600 gets to move_freepages()... > > > > Can't say I have any bright ideas to try here... 
>
> Are we missing some checks (e.g., PageReserved()) that pfn_valid_within()
> would have "caught" before?

Unless I'm missing something the crash happens in __rmqueue_fallback():

do_steal:
	page = get_page_from_free_area(area, fallback_mt);

	steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
				can_steal);
		-> move_freepages()
			-> BUG()

So a page from the free area should be sane, as the freed range was never
added to the free lists.

And honestly, with the memory layout reported elsewhere in the stack I'd say
that the bootloader/fdt beg for fixes...
On 2021/5/3 16:44, Mike Rapoport wrote: > On Mon, May 03, 2021 at 10:07:01AM +0200, David Hildenbrand wrote: >> On 03.05.21 08:26, Mike Rapoport wrote: >>> On Fri, Apr 30, 2021 at 07:24:37PM +0800, Kefeng Wang wrote: >>>> >>>> >>>> On 2021/4/30 17:51, Mike Rapoport wrote: >>>>> On Thu, Apr 29, 2021 at 06:22:55PM +0800, Kefeng Wang wrote: >>>>>> >>>>>> On 2021/4/29 14:57, Mike Rapoport wrote: >>>>>> >>>>>>>>> Do you use SPARSMEM? If yes, what is your section size? >>>>>>>>> What is the value if CONFIG_FORCE_MAX_ZONEORDER in your configuration? >>>>>>>> Yes, >>>>>>>> >>>>>>>> CONFIG_SPARSEMEM=y >>>>>>>> >>>>>>>> CONFIG_SPARSEMEM_STATIC=y >>>>>>>> >>>>>>>> CONFIG_FORCE_MAX_ZONEORDER = 11 >>>>>>>> >>>>>>>> CONFIG_PAGE_OFFSET=0xC0000000 >>>>>>>> CONFIG_HAVE_ARCH_PFN_VALID=y >>>>>>>> CONFIG_HIGHMEM=y >>>>>>>> #define SECTION_SIZE_BITS 26 >>>>>>>> #define MAX_PHYSADDR_BITS 32 >>>>>>>> #define MAX_PHYSMEM_BITS 32 >>>>>> >>>>>> >>>>>> With the patch, the addr is aligned, but the panic still occurred, >>>>> >>>>> Is this the same panic at move_freepages() for range [de600, de7ff]? >>>>> >>>>> Do you enable CONFIG_ARM_LPAE? >>>> >>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at >>>> move_freepages at >>>> >>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] : pfn =de600, page >>>> =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 >>>> >>>>>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 >>>>>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 >>>>>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 >>> >>> Hmm, [de600, de7ff] is not added to the free lists which is correct. But >>> then it's unclear how the page for de600 gets to move_freepages()... >>> >>> Can't say I have any bright ideas to try here... >> >> Are we missing some checks (e.g., PageReserved()) that pfn_valid_within() >> would have "caught" before? 
>
> Unless I'm missing something the crash happens in __rmqueue_fallback():
>
> do_steal:
>	page = get_page_from_free_area(area, fallback_mt);
>
>	steal_suitable_fallback(zone, page, alloc_flags, start_migratetype,
>				can_steal);
>		-> move_freepages()
>			-> BUG()
>
> So a page from free area should be sane as the freed range was never added
> it to the free lists.

Sorry for the late response, I was on vacation.

The pfns in the range [de600, de7ff] won't be added to the free lists via
__free_memory_core(), but they could be added to the free lists via
free_highmem_page().

I added some debug [1] in add_to_free_list(), and we can see the call trace:

free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000]
free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000]
free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000]
add_to_free_list, ===> pfn = de700
------------[ cut here ]------------
WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec
pfn = de700
Modules linked in:
CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48
Hardware name: Hisilicon A9
[<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0)
[<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec)
[<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4)
[<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] (add_to_free_list+0x8c/0xec)
[<c023721c>] (add_to_free_list) from [<c0237e00>] (free_pcppages_bulk+0x200/0x278)
[<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] (free_unref_page+0x58/0x68)
[<c0238d14>] (free_unref_page) from [<c023bb54>] (free_highmem_page+0xc/0x50)
[<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254)
[<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0)
[<c0700b38>] (start_kernel) from [<00000000>] (0x0)

So, any ideas?

[1] debug diff --git a/arch/arm/mm/init.c b/arch/arm/mm/init.c index 1ba9f9f9dbd8..ee3619c04f93 100644 --- a/arch/arm/mm/init.c +++ b/arch/arm/mm/init.c @@ -286,7 +286,7 @@ static void __init free_highpages(void) /* Truncate partial highmem entries */ if (start < max_low) start = max_low; - + pr_info("%s, range_pfn [%lx, %lx], range_addr [%x, %x]\n", __func__, start, end, range_start, range_end); for (; start < end; start++) free_highmem_page(pfn_to_page(start)); diff --git a/mm/page_alloc.c b/mm/page_alloc.c index 592479f43c74..920f041f0c6f 100644 --- a/mm/page_alloc.c +++ b/mm/page_alloc.c @@ -892,7 +892,14 @@ compaction_capture(struct capture_control *capc, struct page *page, static inline void add_to_free_list(struct page *page, struct zone *zone, unsigned int order, int migratetype) { + unsigned long pfn; struct free_area *area = &zone->free_area[order]; + pfn = page_to_pfn(page); + if (pfn >= 0xde600 && pfn < 0xde7ff) { + pr_info("%s, ===> pfn = %lx", __func__, pfn); + WARN_ONCE(pfn == 0xde700, "pfn = %lx", pfn); + } > > And honestly, with the memory layout reported elsewhere in the stack I'd > say that the bootloader/fdt beg for fixes... >
On 2021/5/6 20:47, Kefeng Wang wrote: > > >>>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at >>>>> move_freepages at >>>>> >>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] : pfn >>>>> =de600, page >>>>> =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 >>>>> >>>>>>> __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - >>>>>>> b0200 >>>>>>> __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - >>>>>>> b0200 >>>>>>> __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - >>>>>>> b0200 >>>> >>>> Hmm, [de600, de7ff] is not added to the free lists which is correct. >>>> But >>>> then it's unclear how the page for de600 gets to move_freepages()... >>>> >>>> Can't say I have any bright ideas to try here... >>> >>> Are we missing some checks (e.g., PageReserved()) that >>> pfn_valid_within() >>> would have "caught" before? >> >> Unless I'm missing something the crash happens in __rmqueue_fallback(): >> >> do_steal: >> page = get_page_from_free_area(area, fallback_mt); >> >> steal_suitable_fallback(zone, page, alloc_flags, start_migratetype, >> can_steal); >> -> move_freepages() >> -> BUG() >> >> So a page from free area should be sane as the freed range was never >> added >> it to the free lists. > > Sorry for the late response due to the vacation. 
> > The pfn in range [de600, de7ff] won't be added into the free lists via > __free_memory_core(), but the pfn could be added into freelists via > free_highmem_page() > > I add some debug[1] in add_to_free_list(), we could see the calltrace > > free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000] > free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000] > free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000] > add_to_free_list, ===> pfn = de700 > ------------[ cut here ]------------ > WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec > pfn = de700 > Modules linked in: > CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48 > Hardware name: Hisilicon A9 > [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0) > [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec) > [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4) > [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] > (add_to_free_list+0x8c/0xec) > [<c023721c>] (add_to_free_list) from [<c0237e00>] > (free_pcppages_bulk+0x200/0x278) > [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] > (free_unref_page+0x58/0x68) > [<c0238d14>] (free_unref_page) from [<c023bb54>] > (free_highmem_page+0xc/0x50) > [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254) > [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0) > [<c0700b38>] (start_kernel) from [<00000000>] (0x0) > > so any idea? 
If pfn = 0xde700, then because pageblock_nr_pages = 0x200, the start_pfn
and end_pfn passed to move_freepages() will be [de600, de7ff]. The range
[de600, de700] has no 'struct page', which leads to this panic when
pfn_valid_within() is compiled out (no HOLES_IN_ZONE). The same issue will
occur in isolate_freepages_block(), and maybe in some other scenarios, so I
selected HOLES_IN_ZONE in ARCH_HISI (ARM) to solve this issue in our 5.10.

Should we select HOLES_IN_ZONE for all ARM or only for ARCH_HISI? Is there
any better solution? Thanks.
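The pageblock arithmetic described above can be checked with a few lines of user-space C. This is a minimal sketch, assuming pageblock_nr_pages is a power of two as stated in the message; struct pfn_range and pageblock_range() are hypothetical names used only for illustration.

```c
#include <assert.h>

/* Inclusive pfn range covering one pageblock. */
struct pfn_range {
	unsigned long start_pfn;
	unsigned long end_pfn;
};

/*
 * Hypothetical helper: compute the pageblock-aligned pfn range that
 * contains the given pfn, i.e. the range a pageblock-wide walker would
 * visit when it is handed a page with this pfn.
 */
static struct pfn_range pageblock_range(unsigned long pfn,
					unsigned long pageblock_nr_pages)
{
	struct pfn_range r;

	/* The mask trick below only works for power-of-two block sizes. */
	assert(pageblock_nr_pages != 0 &&
	       (pageblock_nr_pages & (pageblock_nr_pages - 1)) == 0);

	r.start_pfn = pfn & ~(pageblock_nr_pages - 1);
	r.end_pfn = r.start_pfn + pageblock_nr_pages - 1;
	return r;
}
```

For pfn = 0xde700 and pageblock_nr_pages = 0x200 this gives [0xde600, 0xde7ff], so the walk starts 0x100 pages below the first pfn that was actually freed, which is where the missing struct pages are touched.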
On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote: > > On 2021/5/6 20:47, Kefeng Wang wrote: > > > > > > > > > > no, the CONFIG_ARM_LPAE is not set, and yes with same panic at > > > > > > move_freepages at > > > > > > > > > > > > start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] > > > > > > : pfn =de600, page > > > > > > =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 > > > > > > > > > > > > > > __free_memory_core, range: 0xb0200000 - > > > > > > > > 0xc0000000, pfn: b0200 - b0200 > > > > > > > > __free_memory_core, range: 0xcc000000 - > > > > > > > > 0xdca00000, pfn: cc000 - b0200 > > > > > > > > __free_memory_core, range: 0xde700000 - > > > > > > > > 0xdea00000, pfn: de700 - b0200 > > > > > > > > > > Hmm, [de600, de7ff] is not added to the free lists which is > > > > > correct. But > > > > > then it's unclear how the page for de600 gets to move_freepages()... > > > > > > > > > > Can't say I have any bright ideas to try here... > > > > > > > > Are we missing some checks (e.g., PageReserved()) that > > > > pfn_valid_within() > > > > would have "caught" before? > > > > > > Unless I'm missing something the crash happens in __rmqueue_fallback(): > > > > > > do_steal: > > > page = get_page_from_free_area(area, fallback_mt); > > > > > > steal_suitable_fallback(zone, page, alloc_flags, start_migratetype, > > > can_steal); > > > -> move_freepages() > > > -> BUG() > > > > > > So a page from free area should be sane as the freed range was never > > > added > > > it to the free lists. > > > > Sorry for the late response due to the vacation. 
> > > > The pfn in range [de600, de7ff] won't be added into the free lists via > > __free_memory_core(), but the pfn could be added into freelists via > > free_highmem_page() > > > > I add some debug[1] in add_to_free_list(), we could see the calltrace > > > > free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000] > > free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000] > > free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000] > > add_to_free_list, ===> pfn = de700 > > ------------[ cut here ]------------ > > WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec > > pfn = de700 > > Modules linked in: > > CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48 > > Hardware name: Hisilicon A9 > > [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0) > > [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec) > > [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4) > > [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] > > (add_to_free_list+0x8c/0xec) > > [<c023721c>] (add_to_free_list) from [<c0237e00>] > > (free_pcppages_bulk+0x200/0x278) > > [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] > > (free_unref_page+0x58/0x68) > > [<c0238d14>] (free_unref_page) from [<c023bb54>] > > (free_highmem_page+0xc/0x50) > > [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254) > > [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0) > > [<c0700b38>] (start_kernel) from [<00000000>] (0x0) > > > > so any idea? > > If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the > start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff], > but the range of [de600,de700] without ‘struct page' will lead to > this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE, > and the same issue will occurred in isolate_freepages_block(), maybe I think your analysis is correct except one minor detail. 
With the #ifdef fix I've proposed earlier [1] the memmap for [0xde600, 0xde700] should not be freed so there should be a struct page. Did you check what parts of the memmap are actually freed with this patch applied? Would you get a panic if you add

dump_page(pfn_to_page(0xde600), "");

say, at the end of memblock_free_all()?

> there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve
> this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in
> ARCH_HISI, any better solution? Thanks.

I don't think that HOLES_IN_ZONE is the right solution. I believe that we must keep the memory map aligned on pageblock boundaries. That's surely not the case for SPARSEMEM as of now, and if my fix is not enough we need to find where it went wrong.

Besides, I'd say that if it is possible to update your firmware to make the memory layout reported to the kernel less, hmm, esoteric, you would hit fewer corner cases.

[1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org
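For reference, the pageblock arithmetic quoted above can be reproduced outside the kernel. This is only a user-space sketch: the `demo_` names are hypothetical helpers, not kernel functions, and the block size of 0x200 pages is taken from the thread rather than from any particular kernel configuration.

```c
#include <assert.h>

/* Assumption taken from the thread: pageblock_nr_pages = 0x200. */
#define DEMO_PAGEBLOCK_NR_PAGES 0x200UL

/* First pfn of the pageblock that contains pfn (hypothetical helper). */
unsigned long demo_block_first_pfn(unsigned long pfn)
{
	/* Masking only works when the block size is a power of two. */
	assert((DEMO_PAGEBLOCK_NR_PAGES & (DEMO_PAGEBLOCK_NR_PAGES - 1)) == 0);
	return pfn & ~(DEMO_PAGEBLOCK_NR_PAGES - 1);
}

/* Last pfn (inclusive) of the same pageblock (hypothetical helper). */
unsigned long demo_block_last_pfn(unsigned long pfn)
{
	return demo_block_first_pfn(pfn) + DEMO_PAGEBLOCK_NR_PAGES - 1;
}
```

For pfn 0xde700 these helpers give 0xde600 and 0xde7ff, which is the [de600, de7ff] range reported for move_freepages(). The first half of that block, [de600, de700), lies below the start of the memory bank at 0xde700000.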
On 2021/5/7 18:30, Mike Rapoport wrote: > On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote: >> >> On 2021/5/6 20:47, Kefeng Wang wrote: >>> >>> >>>>>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at >>>>>>> move_freepages at >>>>>>> >>>>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] >>>>>>> : pfn =de600, page >>>>>>> =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 >>>>>>> >>>>>>>>> __free_memory_core, range: 0xb0200000 - >>>>>>>>> 0xc0000000, pfn: b0200 - b0200 >>>>>>>>> __free_memory_core, range: 0xcc000000 - >>>>>>>>> 0xdca00000, pfn: cc000 - b0200 >>>>>>>>> __free_memory_core, range: 0xde700000 - >>>>>>>>> 0xdea00000, pfn: de700 - b0200 >>>>>> >>>>>> Hmm, [de600, de7ff] is not added to the free lists which is >>>>>> correct. But >>>>>> then it's unclear how the page for de600 gets to move_freepages()... >>>>>> >>>>>> Can't say I have any bright ideas to try here... >>>>> >>>>> Are we missing some checks (e.g., PageReserved()) that >>>>> pfn_valid_within() >>>>> would have "caught" before? >>>> >>>> Unless I'm missing something the crash happens in __rmqueue_fallback(): >>>> >>>> do_steal: >>>> page = get_page_from_free_area(area, fallback_mt); >>>> >>>> steal_suitable_fallback(zone, page, alloc_flags, start_migratetype, >>>> can_steal); >>>> -> move_freepages() >>>> -> BUG() >>>> >>>> So a page from free area should be sane as the freed range was never >>>> added >>>> it to the free lists. >>> >>> Sorry for the late response due to the vacation. 
>>> >>> The pfn in range [de600, de7ff] won't be added into the free lists via >>> __free_memory_core(), but the pfn could be added into freelists via >>> free_highmem_page() >>> >>> I add some debug[1] in add_to_free_list(), we could see the calltrace >>> >>> free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000] >>> free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000] >>> free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000] >>> add_to_free_list, ===> pfn = de700 >>> ------------[ cut here ]------------ >>> WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec >>> pfn = de700 >>> Modules linked in: >>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48 >>> Hardware name: Hisilicon A9 >>> [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0) >>> [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec) >>> [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4) >>> [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] >>> (add_to_free_list+0x8c/0xec) >>> [<c023721c>] (add_to_free_list) from [<c0237e00>] >>> (free_pcppages_bulk+0x200/0x278) >>> [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] >>> (free_unref_page+0x58/0x68) >>> [<c0238d14>] (free_unref_page) from [<c023bb54>] >>> (free_highmem_page+0xc/0x50) >>> [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254) >>> [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0) >>> [<c0700b38>] (start_kernel) from [<00000000>] (0x0) >>> >>> so any idea? >> >> If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the >> start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff], >> but the range of [de600,de700] without ‘struct page' will lead to >> this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE, >> and the same issue will occurred in isolate_freepages_block(), maybe > > I think your analysis is correct except one minor detail. 
With the #ifdef > fix I've proposed earlieri [1] the memmap for [0xde600, 0xde700] should not > be freed so there should be a struct page. Did you check what parts of the > memmap are actually freed with this patch applied? > Would you get a panic if you add > > dump_page(pfn_to_page(0xde600), ""); > > say, in the end of memblock_free_all()?

The memory is not contiguous, see MEMBLOCK:
 memory size = 0x4c0fffff reserved size = 0x027ef058
 memory.cnt = 0xa
 memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
 memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
 memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
 memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
 memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
 memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
 memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
 ...

The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000] is not available memory, and we won't create a memmap for it, so with or without your patch, we won't see the range in free_memmap(), right?

> >> there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve >> this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in >> ARCH_HISI, any better solution? Thanks. > > I don't think that HOLES_IN_ZONE is the right solution. I believe that we > must keep the memory map aligned on pageblock boundaries. That's surely not the > case for SPARSEMEM as of now, and if my fix is not enough we need to find > where it went wrong. > > Besides, I'd say that if it is possible to update your firmware to make the > memory layout reported to the kernel less, hmm, esoteric, you would hit > less corner cases.

Sorry, the memory layout is customized and we can't change it; some memory is set aside for special purposes in our product.

> > [1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org >
On Fri, May 07, 2021 at 08:34:52PM +0800, Kefeng Wang wrote: > > > On 2021/5/7 18:30, Mike Rapoport wrote: > > On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote: > > > > > > On 2021/5/6 20:47, Kefeng Wang wrote: > > > > > > > > > > > > no, the CONFIG_ARM_LPAE is not set, and yes with same panic at > > > > > > > > move_freepages at > > > > > > > > > > > > > > > > start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] > > > > > > > > : pfn =de600, page > > > > > > > > =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 > > > > > > > > > > > > > > > > > > __free_memory_core, range: 0xb0200000 - > > > > > > > > > > 0xc0000000, pfn: b0200 - b0200 > > > > > > > > > > __free_memory_core, range: 0xcc000000 - > > > > > > > > > > 0xdca00000, pfn: cc000 - b0200 > > > > > > > > > > __free_memory_core, range: 0xde700000 - > > > > > > > > > > 0xdea00000, pfn: de700 - b0200 > > > > > > > > > > > > > > Hmm, [de600, de7ff] is not added to the free lists which is > > > > > > > correct. But > > > > > > > then it's unclear how the page for de600 gets to move_freepages()... > > > > > > > > > > > > > > Can't say I have any bright ideas to try here... > > > > > > > > > > > > Are we missing some checks (e.g., PageReserved()) that > > > > > > pfn_valid_within() > > > > > > would have "caught" before? > > > > > > > > > > Unless I'm missing something the crash happens in __rmqueue_fallback(): > > > > > > > > > > do_steal: > > > > > page = get_page_from_free_area(area, fallback_mt); > > > > > > > > > > steal_suitable_fallback(zone, page, alloc_flags, start_migratetype, > > > > > can_steal); > > > > > -> move_freepages() > > > > > -> BUG() > > > > > > > > > > So a page from free area should be sane as the freed range was never > > > > > added > > > > > it to the free lists. > > > > > > > > Sorry for the late response due to the vacation. 
> > > > > > > > The pfn in range [de600, de7ff] won't be added into the free lists via > > > > __free_memory_core(), but the pfn could be added into freelists via > > > > free_highmem_page() > > > > > > > > I add some debug[1] in add_to_free_list(), we could see the calltrace > > > > > > > > free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000] > > > > free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000] > > > > free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000] > > > > add_to_free_list, ===> pfn = de700 > > > > ------------[ cut here ]------------ > > > > WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec > > > > pfn = de700 > > > > Modules linked in: > > > > CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48 > > > > Hardware name: Hisilicon A9 > > > > [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0) > > > > [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec) > > > > [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4) > > > > [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] > > > > (add_to_free_list+0x8c/0xec) > > > > [<c023721c>] (add_to_free_list) from [<c0237e00>] > > > > (free_pcppages_bulk+0x200/0x278) > > > > [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] > > > > (free_unref_page+0x58/0x68) > > > > [<c0238d14>] (free_unref_page) from [<c023bb54>] > > > > (free_highmem_page+0xc/0x50) > > > > [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254) > > > > [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0) > > > > [<c0700b38>] (start_kernel) from [<00000000>] (0x0) > > > > > > > > so any idea? 
> > > > > > If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the > > > start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff], > > > but the range of [de600,de700] without ‘struct page' will lead to > > > this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE, > > > and the same issue will occurred in isolate_freepages_block(), maybe > > > > I think your analysis is correct except one minor detail. With the #ifdef > > fix I've proposed earlieri [1] the memmap for [0xde600, 0xde700] should not > > be freed so there should be a struct page. Did you check what parts of the > > memmap are actually freed with this patch applied? > > Would you get a panic if you add > > > > dump_page(pfn_to_page(0xde600), ""); > > > > say, in the end of memblock_free_all()? > > The memory is not continuous, see MEMBLOCK: > memory size = 0x4c0fffff reserved size = 0x027ef058 > memory.cnt = 0xa > memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 > memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 > memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 > memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 > memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 > memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 > memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 > ... > > The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000] > is not available memory, and we won't create memmap , so with or without > your patch, we can't see the range in free_memmap(), right?

This is not available memory and we won't see the range in free_memmap(), but we should still create a memmap for it and that's what my patch tried to do.

There are a lot of places in core mm that operate on pageblocks and free_unused_memmap() should make sure that any pageblock has a valid memory map.

Currently, that's not the case when SPARSEMEM=y and my patch tried to fix it.
Can you please send log with my patch applied and with the printing of ranges that are freed in free_unused_memmap() you've used in previous mails? > > > there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve > > > this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in > > > ARCH_HISI, any better solution? Thanks. > > > > I don't think that HOLES_IN_ZONE is the right solution. I believe that we > > must keep the memory map aligned on pageblock boundaries. That's surely not the > > case for SPARSEMEM as of now, and if my fix is not enough we need to find > > where it went wrong. > > > > Besides, I'd say that if it is possible to update your firmware to make the > > memory layout reported to the kernel less, hmm, esoteric, you would hit > > less corner cases. > > Sorry, memory layout is customized and we can't change it, some memory is > for special purposes by our production. I understand that this memory cannot be used by Linux, but the firmware may supply the kernel with actual physical memory layout and then mark all the special purpose memory that kernel should not touch as reserved. > > [1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org > >
On 2021/5/9 13:59, Mike Rapoport wrote: > On Fri, May 07, 2021 at 08:34:52PM +0800, Kefeng Wang wrote: >> >> >> On 2021/5/7 18:30, Mike Rapoport wrote: >>> On Fri, May 07, 2021 at 03:17:08PM +0800, Kefeng Wang wrote: >>>> >>>> On 2021/5/6 20:47, Kefeng Wang wrote: >>>>> >>>>>>>>> no, the CONFIG_ARM_LPAE is not set, and yes with same panic at >>>>>>>>> move_freepages at >>>>>>>>> >>>>>>>>> start_pfn/end_pfn [de600, de7ff], [de600000, de7ff000] >>>>>>>>> : pfn =de600, page >>>>>>>>> =ef3cc000, page-flags = ffffffff, pfn2phy = de600000 >>>>>>>>> >>>>>>>>>>> __free_memory_core, range: 0xb0200000 - >>>>>>>>>>> 0xc0000000, pfn: b0200 - b0200 >>>>>>>>>>> __free_memory_core, range: 0xcc000000 - >>>>>>>>>>> 0xdca00000, pfn: cc000 - b0200 >>>>>>>>>>> __free_memory_core, range: 0xde700000 - >>>>>>>>>>> 0xdea00000, pfn: de700 - b0200 >>>>>>>> >>>>>>>> Hmm, [de600, de7ff] is not added to the free lists which is >>>>>>>> correct. But >>>>>>>> then it's unclear how the page for de600 gets to move_freepages()... >>>>>>>> >>>>>>>> Can't say I have any bright ideas to try here... >>>>>>> >>>>>>> Are we missing some checks (e.g., PageReserved()) that >>>>>>> pfn_valid_within() >>>>>>> would have "caught" before? >>>>>> >>>>>> Unless I'm missing something the crash happens in __rmqueue_fallback(): >>>>>> >>>>>> do_steal: >>>>>> page = get_page_from_free_area(area, fallback_mt); >>>>>> >>>>>> steal_suitable_fallback(zone, page, alloc_flags, start_migratetype, >>>>>> can_steal); >>>>>> -> move_freepages() >>>>>> -> BUG() >>>>>> >>>>>> So a page from free area should be sane as the freed range was never >>>>>> added >>>>>> it to the free lists. >>>>> >>>>> Sorry for the late response due to the vacation. 
>>>>> >>>>> The pfn in range [de600, de7ff] won't be added into the free lists via >>>>> __free_memory_core(), but the pfn could be added into freelists via >>>>> free_highmem_page() >>>>> >>>>> I add some debug[1] in add_to_free_list(), we could see the calltrace >>>>> >>>>> free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000] >>>>> free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000] >>>>> free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000] >>>>> add_to_free_list, ===> pfn = de700 >>>>> ------------[ cut here ]------------ >>>>> WARNING: CPU: 0 PID: 0 at mm/page_alloc.c:900 add_to_free_list+0x8c/0xec >>>>> pfn = de700 >>>>> Modules linked in: >>>>> CPU: 0 PID: 0 Comm: swapper Not tainted 5.10.0+ #48 >>>>> Hardware name: Hisilicon A9 >>>>> [<c010a600>] (show_stack) from [<c04b21c4>] (dump_stack+0x9c/0xc0) >>>>> [<c04b21c4>] (dump_stack) from [<c011c708>] (__warn+0xc0/0xec) >>>>> [<c011c708>] (__warn) from [<c011c7a8>] (warn_slowpath_fmt+0x74/0xa4) >>>>> [<c011c7a8>] (warn_slowpath_fmt) from [<c023721c>] >>>>> (add_to_free_list+0x8c/0xec) >>>>> [<c023721c>] (add_to_free_list) from [<c0237e00>] >>>>> (free_pcppages_bulk+0x200/0x278) >>>>> [<c0237e00>] (free_pcppages_bulk) from [<c0238d14>] >>>>> (free_unref_page+0x58/0x68) >>>>> [<c0238d14>] (free_unref_page) from [<c023bb54>] >>>>> (free_highmem_page+0xc/0x50) >>>>> [<c023bb54>] (free_highmem_page) from [<c070620c>] (mem_init+0x21c/0x254) >>>>> [<c070620c>] (mem_init) from [<c0700b38>] (start_kernel+0x258/0x5c0) >>>>> [<c0700b38>] (start_kernel) from [<00000000>] (0x0) >>>>> >>>>> so any idea? 
>>>> >>>> If pfn = 0xde700, due to the pageblock_nr_pages = 0x200, then the >>>> start_pfn,end_pfn passed to move_freepages() will be [de600, de7ff], >>>> but the range of [de600,de700] without ‘struct page' will lead to >>>> this panic when pfn_valid_within not enabled if no HOLES_IN_ZONE, >>>> and the same issue will occurred in isolate_freepages_block(), maybe >>> >>> I think your analysis is correct except one minor detail. With the #ifdef >>> fix I've proposed earlieri [1] the memmap for [0xde600, 0xde700] should not >>> be freed so there should be a struct page. Did you check what parts of the >>> memmap are actually freed with this patch applied? >>> Would you get a panic if you add >>> >>> dump_page(pfn_to_page(0xde600), ""); >>> >>> say, in the end of memblock_free_all()? >> >> The memory is not continuous, see MEMBLOCK: >> memory size = 0x4c0fffff reserved size = 0x027ef058 >> memory.cnt = 0xa >> memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 >> memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 >> memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 >> memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 >> memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 >> memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 >> memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 >> ... >> >> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000] >> is not available memory, and we won't create memmap , so with or without >> your patch, we can't see the range in free_memmap(), right? > > > This is not available memory and we won't see the reange in free_memmap(), > but we still should create memmap for it and that's what my patch tried to > do. > > There are a lot of places in core mm that operate on pageblocks and > free_unused_memmap() should make sure that any pageblock has a valid memory > map. 
> > Currently, that's not the case when SPARSEMEM=y and my patch tried to fix > it. > > Can you please send log with my patch applied and with the printing of > ranges that are freed in free_unused_memmap() you've used in previous > mails? with your patch[1] and debug print in free_memmap, ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86800, 86800000 ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e000, 8e000000 ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000 ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de400, de400000 ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 __free_memory_core, range: 0x80a03000 - 0x80a04000, pfn: 80a03 - 80a04 __free_memory_core, range: 0x80a08000 - 0x80b00000, pfn: 80a08 - 80b00 __free_memory_core, range: 0x812e8058 - 0x83000000, pfn: 812e9 - 83000 __free_memory_core, range: 0x85000000 - 0x85600000, pfn: 85000 - 85600 __free_memory_core, range: 0x86a00000 - 0x87e00000, pfn: 86a00 - 87e00 __free_memory_core, range: 0x8bd00000 - 0x8c500000, pfn: 8bd00 - 8c500 __free_memory_core, range: 0x8e300000 - 0x8ed00000, pfn: 8e300 - 8ed00 __free_memory_core, range: 0x90d00000 - 0xaf2c0000, pfn: 90d00 - af2c0 __free_memory_core, range: 0xaf430000 - 0xaf450000, pfn: af430 - af450 __free_memory_core, range: 0xaf510000 - 0xaf540000, pfn: af510 - af540 __free_memory_core, range: 0xaf560000 - 0xaf580000, pfn: af560 - af580 __free_memory_core, range: 0xafd98000 - 0xafdc8000, pfn: afd98 - afdc8 __free_memory_core, range: 0xafdd8000 - 0xafe00000, pfn: afdd8 - afe00 __free_memory_core, range: 0xafe18000 - 0xafe80000, pfn: afe18 - afe80 __free_memory_core, range: 0xafee0000 - 0xaff00000, pfn: afee0 - aff00 __free_memory_core, range: 0xaff80000 - 0xaff8d000, pfn: aff80 - aff8d __free_memory_core, range: 0xafff2000 - 0xafff4580, pfn: 
afff2 - afff4 __free_memory_core, range: 0xafffe000 - 0xafffe0e0, pfn: afffe - afffe __free_memory_core, range: 0xafffe4fc - 0xafffe500, pfn: affff - afffe __free_memory_core, range: 0xafffe6e4 - 0xafffe700, pfn: affff - afffe __free_memory_core, range: 0xafffe8dc - 0xafffe8e0, pfn: affff - afffe __free_memory_core, range: 0xafffe970 - 0xafffe980, pfn: affff - afffe __free_memory_core, range: 0xafffe990 - 0xafffe9a0, pfn: affff - afffe __free_memory_core, range: 0xafffe9a4 - 0xafffe9c0, pfn: affff - afffe __free_memory_core, range: 0xafffeb54 - 0xafffeb60, pfn: affff - afffe __free_memory_core, range: 0xafffecf4 - 0xafffed00, pfn: affff - afffe __free_memory_core, range: 0xafffefc4 - 0xafffefd8, pfn: affff - afffe __free_memory_core, range: 0xb0200000 - 0xc0000000, pfn: b0200 - b0200 __free_memory_core, range: 0xcc000000 - 0xdca00000, pfn: cc000 - b0200 __free_memory_core, range: 0xde700000 - 0xdea00000, pfn: de700 - b0200 __free_memory_core, range: 0xe0800000 - 0xe0c00000, pfn: e0800 - b0200 __free_memory_core, range: 0xf4b00000 - 0xf7000000, pfn: f4b00 - b0200 __free_memory_core, range: 0xfda00000 - 0xffffffff, pfn: fda00 - b0200 free_highpages, range_pfn [b0200, c0000], range_addr [b0200000, c0000000] free_highpages, range_pfn [cc000, dca00], range_addr [cc000000, dca00000] free_highpages, range_pfn [de700, dea00], range_addr [de700000, dea00000] free_highpages, range_pfn [e0800, e0c00], range_addr [e0800000, e0c00000] free_highpages, range_pfn [f4b00, f7000], range_addr [f4b00000, f7000000] free_highpages, range_pfn [fda00, fffff], range_addr [fda00000, ffffffff] > >>>> there are some scene, so I select HOLES_IN_ZONE in ARCH_HISI(ARM) to solve >>>> this issue in our 5.10, should we select HOLES_IN_ZONE in all ARM or only in >>>> ARCH_HISI, any better solution? Thanks. >>> >>> I don't think that HOLES_IN_ZONE is the right solution. I believe that we >>> must keep the memory map aligned on pageblock boundaries. 
That's surely not the >>> case for SPARSEMEM as of now, and if my fix is not enough we need to find >>> where it went wrong. >>> >>> Besides, I'd say that if it is possible to update your firmware to make the >>> memory layout reported to the kernel less, hmm, esoteric, you would hit >>> less corner cases. >> >> Sorry, memory layout is customized and we can't change it, some memory is >> for special purposes by our production. > > I understand that this memory cannot be used by Linux, but the firmware may > supply the kernel with actual physical memory layout and then mark all > the special purpose memory that kernel should not touch as reserved.

We can only modify the kernel, so that is not practicable for our product, and it looks like a workaround anyway; we need to find a way to solve the issue from the kernel side.

[1] https://lore.kernel.org/lkml/YIpY8TXCSc7Lfa2Z@kernel.org
On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote: > > > > The memory is not continuous, see MEMBLOCK: > > > memory size = 0x4c0fffff reserved size = 0x027ef058 > > > memory.cnt = 0xa > > > memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 > > > memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 > > > memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 > > > memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 > > > memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 > > > memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 > > > memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 > > > ... > > > > > > The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000] > > > is not available memory, and we won't create memmap , so with or without > > > your patch, we can't see the range in free_memmap(), right? > > > > This is not available memory and we won't see the reange in free_memmap(), > > but we still should create memmap for it and that's what my patch tried to > > do. > > > > There are a lot of places in core mm that operate on pageblocks and > > free_unused_memmap() should make sure that any pageblock has a valid memory > > map. > > > > Currently, that's not the case when SPARSEMEM=y and my patch tried to fix > > it. > > > > Can you please send log with my patch applied and with the printing of > > ranges that are freed in free_unused_memmap() you've used in previous > > mails? 
> with your patch[1] and debug print in free_memmap, > ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86800, 86800000 > ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e000, 8e000000 > ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000 > ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de400, de400000 > ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 > ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 > ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 It seems that freeing of the memory map is suboptimal still because that code was not designed for memory layout that has more holes than Swiss cheese. Still, the range [0xde600,0xde700] is not freed and there should be struct pages for this range. Can you add dump_page(pfn_to_page(0xde600), ""); say, in the end of memblock_free_all()?
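The statement that [0xde600,0xde700] is not freed can be checked mechanically against the free_memmap ranges in the quoted log. The sketch below is a user-space illustration only: the table copies the seven ranges from that log, the `demo_` names are hypothetical, and the ranges are assumed to be half-open, with end_pfn exclusive.

```c
#include <assert.h>
#include <stddef.h>

/* Half-open pfn range [start, end); end-exclusive is an assumption. */
struct demo_pfn_range {
	unsigned long start;
	unsigned long end;
};

/* Ranges copied from the "----> free_memmap" lines in the quoted log. */
static const struct demo_pfn_range demo_freed[] = {
	{ 0x85800, 0x86800 },
	{ 0x8c800, 0x8e000 },
	{ 0x8f000, 0x90000 },
	{ 0xdcc00, 0xde400 },
	{ 0xdec00, 0xe0000 },
	{ 0xe0c00, 0xe4000 },
	{ 0xf7000, 0xf8000 },
};

/* Returns 1 if [start, end) overlaps any freed memmap range, else 0. */
int demo_overlaps_freed_memmap(unsigned long start, unsigned long end)
{
	size_t i;

	assert(start < end);
	for (i = 0; i < sizeof(demo_freed) / sizeof(demo_freed[0]); i++) {
		if (start < demo_freed[i].end && demo_freed[i].start < end)
			return 1;
	}
	return 0;
}
```

With these numbers neither [0xde600, 0xde700) nor the whole pageblock [0xde600, 0xde800) overlaps a freed range: the nearest ones end at 0xde400 and start at 0xdec00.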
On 2021/5/11 16:48, Mike Rapoport wrote: > On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote: >> >>>> The memory is not continuous, see MEMBLOCK: >>>> memory size = 0x4c0fffff reserved size = 0x027ef058 >>>> memory.cnt = 0xa >>>> memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 >>>> memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 >>>> memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 >>>> memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 >>>> memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 >>>> memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 >>>> memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 >>>> ... >>>> >>>> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000] >>>> is not available memory, and we won't create memmap , so with or without >>>> your patch, we can't see the range in free_memmap(), right? >>> >>> This is not available memory and we won't see the reange in free_memmap(), >>> but we still should create memmap for it and that's what my patch tried to >>> do. >>> >>> There are a lot of places in core mm that operate on pageblocks and >>> free_unused_memmap() should make sure that any pageblock has a valid memory >>> map. >>> >>> Currently, that's not the case when SPARSEMEM=y and my patch tried to fix >>> it. >>> >>> Can you please send log with my patch applied and with the printing of >>> ranges that are freed in free_unused_memmap() you've used in previous >>> mails? 
> >> with your patch[1] and debug print in free_memmap, >> ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86800, 86800000 >> ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e000, 8e000000 >> ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000 >> ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de400, de400000 >> ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 >> ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 >> ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 > > It seems that freeing of the memory map is suboptimal still because that > code was not designed for memory layout that has more holes than Swiss > cheese. > > Still, the range [0xde600,0xde700] is not freed and there should be struct > pages for this range. > > Can you add > > dump_page(pfn_to_page(0xde600), ""); > > say, in the end of memblock_free_all()? > >

The range [0xde600,0xde700] is not memory, so sparse_init() won't create struct pages for it, will it?

After applying patch[1], the dump_page log is:

page:ef3cc000 is uninitialized and poisoned
raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
page dumped because:

[1] https://lore.kernel.org/linux-mm/20210512031057.13580-3-wangkefeng.wang@huawei.com/T/#u
On Sun, Apr 25, 2021 at 03:51:56PM +0800, Kefeng Wang wrote: > we see the PC is at PageLRU, same reason like arm64 panic log, > > "PageBuddy in move_freepages returns false Then we call PageLRU, the macro > calls PF_HEAD which is compound_page() compound_page reads > page->compound_head, it is 0xffffffffffffffff, so it resturns > 0xfffffffffffffffe - and accessing this address causes crash" Oh. I posted patches to fix this back in 2018. https://lore.kernel.org/linux-mm/20180414043145.3953-6-willy@infradead.org/ and 2019. https://lore.kernel.org/linux-mm/20190501202433.GC28500@bombadil.infradead.org/ and 2020. https://lore.kernel.org/linux-mm/20200408150148.25290-6-willy@infradead.org/ Looks like it's about that time of year for me to try to fix this again.
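The failure mode described in the quoted text can be modelled in user space. The sketch below is not the kernel's struct page or its compound_head(); `demo_page` and `demo_compound_head` are hypothetical stand-ins that keep only the logic the quote describes: if bit 0 of the compound_head word is set, the page is treated as a tail page and the head pointer is that word minus one.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, heavily reduced stand-in for struct page. */
struct demo_page {
	uintptr_t flags;
	uintptr_t compound_head;	/* bit 0 set means "tail page" */
};

/*
 * Returns the address the head-page lookup would resolve to.
 * A poisoned entry (all bits set) has bit 0 set, so it is taken for a
 * tail page and the result is the poison value minus one.
 */
uintptr_t demo_compound_head(const struct demo_page *page)
{
	uintptr_t head;

	assert(page != NULL);
	head = page->compound_head;
	if (head & 1)
		return head - 1;
	return (uintptr_t)page;
}
```

For an entry filled with all-ones poison the result is the maximum pointer value minus one, that is 0xfffffffffffffffe on a 64-bit build, matching the address in the quoted arm64 report. Dereferencing it is what crashes the PageLRU() test.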
On Wed, May 12, 2021 at 11:08:14AM +0800, Kefeng Wang wrote:
>
> On 2021/5/11 16:48, Mike Rapoport wrote:
> > On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote:
> > >
> > > > > The memory is not continuous, see MEMBLOCK:
> > > > > memory size = 0x4c0fffff reserved size = 0x027ef058
> > > > > memory.cnt = 0xa
> > > > > memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0
> > > > > memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0
> > > > > memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0
> > > > > memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0
> > > > > memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0
> > > > > memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0
> > > > > memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0
> > > > > ...
> > > > >
> > > > > The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000]
> > > > > is not available memory, and we won't create memmap , so with or without
> > > > > your patch, we can't see the range in free_memmap(), right?
> > > >
> > > > This is not available memory and we won't see the reange in free_memmap(),
> > > > but we still should create memmap for it and that's what my patch tried to
> > > > do.
> > > >
> > > > There are a lot of places in core mm that operate on pageblocks and
> > > > free_unused_memmap() should make sure that any pageblock has a valid memory
> > > > map.
> > > >
> > > > Currently, that's not the case when SPARSEMEM=y and my patch tried to fix
> > > > it.
> > > >
> > > > Can you please send log with my patch applied and with the printing of
> > > > ranges that are freed in free_unused_memmap() you've used in previous
> > > > mails?
> >
> > > with your patch[1] and debug print in free_memmap,
> > > ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86800, 86800000
> > > ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e000, 8e000000
> > > ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000
> > > ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de400, de400000
> > > ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000
> > > ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000
> > > ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000
> >
> > It seems that freeing of the memory map is suboptimal still because that
> > code was not designed for memory layout that has more holes than Swiss
> > cheese.
> >
> > Still, the range [0xde600,0xde700] is not freed and there should be struct
> > pages for this range.
> >
> > Can you add
> >
> >     dump_page(pfn_to_page(0xde600), "");
> >
> > say, in the end of memblock_free_all()?
> >
> The range [0xde600,0xde700] is not memory, so it won't create struct page
> for it when sparse_init?

sparse_init() indeed does not create a memory map for unpopulated memory, but
it has pretty coarse granularity, i.e. 64M in your configuration. A hole
should be at least 64M in order to skip allocation of the memory map for
it.

For example, your memory layout has a hole of 192M at pfn 0xc0000 and this
hole won't have the memory map.

However, the hole 0xdca00 - 0xde700 will still have a memory map in the
section that covers 0xdc000 - 0xe0000.

I've tried to outline this in a sketch below, hope it helps.

Memory:
                          c0000      cc000                      dca00
--------------------------+          +--------------------------+ +----+
  memory bank             |<- hole ->| memory bank              | | mb |
--------------------------+          +--------------------------+ +----+
                                                                  de700 dea00

Memory map:

b0000    b4000    c0000              cc000    d0000   d8000    dc000
+--------+--------+- ... -+          +--------+- ... -+--------+---------+
| memmap | memmap | ...   |<- hole ->| memmap | ...   | memmap | memmap  |
+--------+--------+- ... -+          +--------+- ... -+--------+---------+

> After apply patch[1], the dump_page log,
>
> page:ef3cc000 is uninitialized and poisoned
> raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
> page dumped because:

This means that there is a memory map entry, and it got poisoned during the
initialization and never got reinitialized to sensible values, which would
be PageReserved() in this case.

I believe this was fixed by commit 0740a50b9baa ("mm/page_alloc.c: refactor
initialization of struct page for holes in memory layout") in the mainline
tree.

Can you backport it to your 5.10 tree and check if it helps?
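The section granularity argument above can be checked with a little arithmetic. The following is only a user-space sketch, not kernel code: it assumes 4K pages and the 64M section size mentioned in the reply (0x4000 pfns per section), uses only memory[0x4] to memory[0x6] from the quoted MEMBLOCK dump, and every `sketch_` name is hypothetical.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/*
 * Assumptions for this sketch (not taken from kernel headers): 4K pages and
 * 64M sparse sections, i.e. 0x4000 pfns per section.
 */
#define SKETCH_PAGE_SHIFT	12
#define SKETCH_SECTION_SHIFT	26
#define SKETCH_PAGES_PER_SECTION \
	(1UL << (SKETCH_SECTION_SHIFT - SKETCH_PAGE_SHIFT))

/* A memory bank as a pfn range [start, end). */
struct sketch_bank {
	unsigned long start;
	unsigned long end;
};

/* memory[0x4]..memory[0x6] from the MEMBLOCK dump quoted in the thread. */
static const struct sketch_bank sketch_banks[] = {
	{ 0x90d00UL, 0xc0000UL },
	{ 0xcc000UL, 0xdca00UL },
	{ 0xde700UL, 0xdea00UL },
};

/* First pfn of the section that contains @pfn. */
static unsigned long sketch_section_start(unsigned long pfn)
{
	return pfn & ~(SKETCH_PAGES_PER_SECTION - 1);
}

/*
 * A section gets a memory map if any bank overlaps it, so every pfn in
 * that section has a struct page, including pfns that fall into a hole.
 */
static bool sketch_section_has_memmap(unsigned long pfn)
{
	unsigned long start = sketch_section_start(pfn);
	unsigned long end = start + SKETCH_PAGES_PER_SECTION;
	size_t i;

	assert(end > start); /* no wrap-around for the pfns used here */
	for (i = 0; i < sizeof(sketch_banks) / sizeof(sketch_banks[0]); i++) {
		if (sketch_banks[i].start < end && sketch_banks[i].end > start)
			return true;
	}
	return false;
}
```

Under these assumptions, pfn 0xde600 lies in the section starting at 0xdc000, which overlaps two banks and therefore has a memory map, while the sections inside the 192M hole at 0xc0000 overlap no bank at all.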
On 2021/5/12 16:26, Mike Rapoport wrote: > On Wed, May 12, 2021 at 11:08:14AM +0800, Kefeng Wang wrote: >> >> On 2021/5/11 16:48, Mike Rapoport wrote: >>> On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote: >>>> >>>>>> The memory is not continuous, see MEMBLOCK: >>>>>> memory size = 0x4c0fffff reserved size = 0x027ef058 >>>>>> memory.cnt = 0xa >>>>>> memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 >>>>>> memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 >>>>>> memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 >>>>>> memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 >>>>>> memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 >>>>>> memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 >>>>>> memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 >>>>>> ... >>>>>> >>>>>> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000] >>>>>> is not available memory, and we won't create memmap , so with or without >>>>>> your patch, we can't see the range in free_memmap(), right? >>>>> >>>>> This is not available memory and we won't see the reange in free_memmap(), >>>>> but we still should create memmap for it and that's what my patch tried to >>>>> do. >>>>> >>>>> There are a lot of places in core mm that operate on pageblocks and >>>>> free_unused_memmap() should make sure that any pageblock has a valid memory >>>>> map. >>>>> >>>>> Currently, that's not the case when SPARSEMEM=y and my patch tried to fix >>>>> it. >>>>> >>>>> Can you please send log with my patch applied and with the printing of >>>>> ranges that are freed in free_unused_memmap() you've used in previous >>>>> mails? 
>>> >>>> with your patch[1] and debug print in free_memmap, >>>> ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86800, 86800000 >>>> ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e000, 8e000000 >>>> ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000 >>>> ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de400, de400000 >>>> ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 >>>> ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 >>>> ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 >>> >>> It seems that freeing of the memory map is suboptimal still because that >>> code was not designed for memory layout that has more holes than Swiss >>> cheese. >>> >>> Still, the range [0xde600,0xde700] is not freed and there should be struct >>> pages for this range. >>> >>> Can you add >>> >>> dump_page(pfn_to_page(0xde600), ""); >>> >>> say, in the end of memblock_free_all()? >>> >> The range [0xde600,0xde700] is not memory, so it won't create struct page >> for it when sparse_init? > > sparse_init() indeed does not create memory map for unpopulated memory, but > it has pretty coarse granularity, i.e. 64M in your configuration. A hole > should be at least 64M in order to skip allocation of the memory map for > it. > > For example, your memory layout has a hole of 192M at pfn 0xc0000 and this > hole won't have the memory map. > > However the hole 0xdca00 - 0xde70 will still have a memory map in the > section that covers 0xdc000 - 0xe0000. > > I've tried outline this in a sketch below, hope it helps. > > Memory: > c0000 cc000 dca00 > --------------------------+ +--------------------------+ +----+ > memory bank |<- hole ->| memory bank | | mb | > --------------------------+ +--------------------------+ +----+ > de700 dea00 > > Memory map: > > b0000 b4000 c0000 cc000 d0000 d8000 dc000 > +--------+--------+- ... -+ +--------+- ... 
-+--------+---------+
> | memmap | memmap | ... |<- hole ->| memmap | ... | memmap | memmap |
> +--------+--------+- ... -+ +--------+- ... -+--------+---------+
>
>

Thanks for the sketch, it makes things clearer.

>> After apply patch[1], the dump_page log,
>>
>> page:ef3cc000 is uninitialized and poisoned
>> raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>> page dumped because:
>
> This means that there is a memory map entry, and it got poisoned during the
> initialization and never got reinitialized to sensible values, which would
> be PageReserved() in this case.
>
> I believe this was fixed by commit 0740a50b9baa ("mm/page_alloc.c: refactor
> initialization of struct page for holes in memory layout") in the mainline
> tree.
>
> Can you backport it to your 5.10 tree and check if it helps?
>

Hi Mike, 0740a50b9baa is already in 5.10 (tags/v5.10.24~5):

commit 4c84191cbc3eff49568d3c5cccb628fa382cf7fb
Author: Mike Rapoport <rppt@kernel.org>
Date:   Fri Mar 12 21:07:12 2021 -0800

    mm/page_alloc.c: refactor initialization of struct page for holes in
    memory layout

    commit 0740a50b9baa4472cfb12442df4b39e2712a64a4 upstream.

But looking at init_unavailable_range(), we need to deal with a hole inside
a single pageblock.

In our case the pageblock covers pfns 0xde600-0xde7ff, but the available
pfns begin at 0xde700.

If a pfn (e.g. 0xde600) is not valid, init_unavailable_range() steps by
pageblock_nr_pages, and ALIGN_DOWN(pfn, pageblock_nr_pages) is the same for
every pfn from 0xde600 to 0xde700, so the struct pages for pfns
[0xde600,0xde700] won't be initialized.

After adding the following patch, the OOM test passes:

diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index aaa1655cf682..0c7e04f86f9f 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -6484,13 +6484,14 @@ static u64 __meminit init_unavailable_range(unsigned long spfn,
 					    unsigned long epfn,
 					    int zone, int node)
 {
-	unsigned long pfn;
+	unsigned long pfn, pfn_down;
+	unsigned long epfn_down = ALIGN_DOWN(epfn, pageblock_nr_pages);
 	u64 pgcnt = 0;
 
 	for (pfn = spfn; pfn < epfn; pfn++) {
-		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
-			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
-				+ pageblock_nr_pages - 1;
+		pfn_down = ALIGN_DOWN(pfn, pageblock_nr_pages);
+		if (!pfn_valid(pfn_down) && pfn_down != epfn_down) {
+			pfn = pfn_down + pageblock_nr_pages - 1;
 			continue;
 		}
 		__init_single_page(pfn_to_page(pfn), pfn, zone, node);

Before:
On node 0 totalpages: 311551
  Normal zone: 1230 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 157440 pages, LIFO batch:31
  Normal zone: 16384 pages in unavailable ranges
  HighMem zone: 154111 pages, LIFO batch:31
  HighMem zone: 1 pages in unavailable ranges

page:ef3cc000 is uninitialized and poisoned
raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff

After:
On node 0 totalpages: 311551
  Normal zone: 1230 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 157440 pages, LIFO batch:31
  Normal zone: 17152 pages in unavailable ranges
  HighMem zone: 154111 pages, LIFO batch:31
  HighMem zone: 513 pages in unavailable ranges
...
page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xde600
flags: 0xdd001000(reserved)
raw: dd001000 ef3cc004 ef3cc004 00000000 00000000 00000000 ffffffff 00000001
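The skipping behaviour described in this message can be reproduced with a small user-space model of the unpatched loop. This is a sketch under stated assumptions, not the kernel function: pageblock_nr_pages is taken to be 0x200 (inferred from the pageblock range 0xde600-0xde7ff given above), the two `pfn_valid` stand-ins are simplifications invented for illustration, and all `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumption: 512-page pageblocks, inferred from the range 0xde600..0xde7ff. */
#define SKETCH_PAGEBLOCK_NR_PAGES	0x200UL
#define SKETCH_ALIGN_DOWN(x, a)		((x) & ~((a) - 1))

typedef bool (*sketch_pfn_valid_fn)(unsigned long pfn);

/* Stand-in that only accepts pfns inside the bank [0xde700, 0xdea00). */
static bool sketch_pfn_valid_bank_only(unsigned long pfn)
{
	return pfn >= 0xde700UL && pfn < 0xdea00UL;
}

/* Stand-in that accepts any pfn of the section [0xdc000, 0xe0000). */
static bool sketch_pfn_valid_section(unsigned long pfn)
{
	return pfn >= 0xdc000UL && pfn < 0xe0000UL;
}

/*
 * Model of the unpatched loop: when the pageblock-aligned pfn is reported
 * invalid, the whole pageblock is skipped. Returns how many pfns would
 * have been initialized.
 */
static unsigned long sketch_count_initialized(unsigned long spfn,
					      unsigned long epfn,
					      sketch_pfn_valid_fn valid)
{
	unsigned long pfn, pgcnt = 0;

	assert(spfn <= epfn);
	for (pfn = spfn; pfn < epfn; pfn++) {
		unsigned long down =
			SKETCH_ALIGN_DOWN(pfn, SKETCH_PAGEBLOCK_NR_PAGES);

		if (!valid(down)) {
			pfn = down + SKETCH_PAGEBLOCK_NR_PAGES - 1;
			continue;
		}
		pgcnt++; /* __init_single_page() would run here */
	}
	return pgcnt;
}
```

With the bank-only stand-in, a call over [0xde600, 0xde700) initializes nothing, because the aligned pfn 0xde600 is rejected and the loop jumps past the end of the range. With the section-based stand-in the same call counts all 0x100 pfns.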
On Thu, May 13, 2021 at 11:44:00AM +0800, Kefeng Wang wrote: > On 2021/5/12 16:26, Mike Rapoport wrote: > > On Wed, May 12, 2021 at 11:08:14AM +0800, Kefeng Wang wrote: > > > > > > On 2021/5/11 16:48, Mike Rapoport wrote: > > > > On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote: > > > > > > > > > > > > The memory is not continuous, see MEMBLOCK: > > > > > > > memory size = 0x4c0fffff reserved size = 0x027ef058 > > > > > > > memory.cnt = 0xa > > > > > > > memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 > > > > > > > memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 > > > > > > > memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 > > > > > > > memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 > > > > > > > memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 > > > > > > > memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 > > > > > > > memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 > > > > > > > ... > > > > > > > > > > > > > > The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000] > > > > > > > is not available memory, and we won't create memmap , so with or without > > > > > > > your patch, we can't see the range in free_memmap(), right? > > > > > > > > > > > > This is not available memory and we won't see the reange in free_memmap(), > > > > > > but we still should create memmap for it and that's what my patch tried to > > > > > > do. > > > > > > > > > > > > There are a lot of places in core mm that operate on pageblocks and > > > > > > free_unused_memmap() should make sure that any pageblock has a valid memory > > > > > > map. > > > > > > > > > > > > Currently, that's not the case when SPARSEMEM=y and my patch tried to fix > > > > > > it. > > > > > > > > > > > > Can you please send log with my patch applied and with the printing of > > > > > > ranges that are freed in free_unused_memmap() you've used in previous > > > > > > mails? 
> > > > > > > > > with your patch[1] and debug print in free_memmap, > > > > > ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86800, 86800000 > > > > > ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e000, 8e000000 > > > > > ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000 > > > > > ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de400, de400000 > > > > > ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 > > > > > ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 > > > > > ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 > > > > > > > > It seems that freeing of the memory map is suboptimal still because that > > > > code was not designed for memory layout that has more holes than Swiss > > > > cheese. > > > > > > > > Still, the range [0xde600,0xde700] is not freed and there should be struct > > > > pages for this range. > > > > > > > > Can you add > > > > > > > > dump_page(pfn_to_page(0xde600), ""); > > > > > > > > say, in the end of memblock_free_all()? > > > > > > > The range [0xde600,0xde700] is not memory, so it won't create struct page > > > for it when sparse_init? > > > > sparse_init() indeed does not create memory map for unpopulated memory, but > > it has pretty coarse granularity, i.e. 64M in your configuration. A hole > > should be at least 64M in order to skip allocation of the memory map for > > it. > > > > For example, your memory layout has a hole of 192M at pfn 0xc0000 and this > > hole won't have the memory map. > > > > However the hole 0xdca00 - 0xde70 will still have a memory map in the > > section that covers 0xdc000 - 0xe0000. > > > > I've tried outline this in a sketch below, hope it helps. 
> > > > Memory: > > c0000 cc000 dca00 > > --------------------------+ +--------------------------+ +----+ > > memory bank |<- hole ->| memory bank | | mb | > > --------------------------+ +--------------------------+ +----+ > > de700 dea00 > > > > Memory map: > > > > b0000 b4000 c0000 cc000 d0000 d8000 dc000 > > +--------+--------+- ... -+ +--------+- ... -+--------+---------+ > > | memmap | memmap | ... |<- hole ->| memmap | ... | memmap | memmap | > > +--------+--------+- ... -+ +--------+- ... -+--------+---------+ > > > > > Thanks for the sketch, it is more clear, > > > > After apply patch[1], the dump_page log, > > > > > > page:ef3cc000 is uninitialized and poisoned > > > raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff > > > page dumped because: > > > > This means that there is a memory map entry, and it got poisoned during the > > initialization and never got reinitialized to sensible values, which would > > be PageReserved() in this case. > > > > I believe this was fixed by commit 0740a50b9baa ("mm/page_alloc.c: refactor > > initialization of struct page for holes in memory layout") in the mainline > > tree. > > > > Can you backport it to your 5.10 tree and check if it helps? > Hi Mike, the 0740a50b9baa is already in 5.10, tags/v5.10.24~5 Ah, you are using stable 5.10.y. > commit 4c84191cbc3eff49568d3c5cccb628fa382cf7fb > Author: Mike Rapoport <rppt@kernel.org> > Date: Fri Mar 12 21:07:12 2021 -0800 > > mm/page_alloc.c: refactor initialization of struct page for holes in > memory layout > > commit 0740a50b9baa4472cfb12442df4b39e2712a64a4 upstream. > > but check init_unavailable_range(), we need deal with the hole in the > range of one pageblock. > > For our scene, pageblock range: 0xde600,0xde7ff, but the available pfn begin > with 0xde700. 
>
> If pfn(eg, 0xde600) is not valid, the step in init_unavailable_range is
> pageblock_nr_pages, and ALIGN_DOWN(pfn, pageblock_nr_pages) from 0xde600
> to 0xde700 is same, so the page range for pfn [0xde600,0xde700] won't be
> initialized.

The pfn 0xde600 is valid in the sense that there is a memory map for that
pfn. Yet, ARM's custom pfn_valid() will treat it as invalid because
there is a hole.

> After add the following patch, the oom test could passed,
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index aaa1655cf682..0c7e04f86f9f 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -6484,13 +6484,14 @@ static u64 __meminit init_unavailable_range(unsigned
> long spfn,
>  					    unsigned long epfn,
>  					    int zone, int node)
>  {
> -	unsigned long pfn;
> +	unsigned long pfn, pfn_down;
> +	unsigned long epfn_down = ALIGN_DOWN(epfn, pageblock_nr_pages);
>  	u64 pgcnt = 0;
>
>  	for (pfn = spfn; pfn < epfn; pfn++) {
> -		if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) {
> -			pfn = ALIGN_DOWN(pfn, pageblock_nr_pages)
> -				+ pageblock_nr_pages - 1;
> +		pfn_down = ALIGN_DOWN(pfn, pageblock_nr_pages);
> +		if (!pfn_valid(pfn_down) && pfn_down != epfn_down) {
> +			pfn = pfn_down + pageblock_nr_pages - 1;
>  			continue;
>  		}
>  		__init_single_page(pfn_to_page(pfn), pfn, zone, node);

I'd rather keep init_unavailable_range() as it is, along with the assumption
that the memory map always covers an entire pageblock.

Can you please try the hack below? Essentially, it makes arm with SPARSEMEM
use the generic pfn_valid() and updates the freeing of the memory map so
that entire pageblocks are covered.

If this works I'll send formal patches for those changes.
diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
index 24804f11302d..86ee711a3fdb 100644
--- a/arch/arm/Kconfig
+++ b/arch/arm/Kconfig
@@ -73,7 +73,7 @@ config ARM
 	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
 	select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL
 	select HAVE_ARCH_MMAP_RND_BITS if MMU
-	select HAVE_ARCH_PFN_VALID
+#	select HAVE_ARCH_PFN_VALID
 	select HAVE_ARCH_SECCOMP
 	select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
 	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
diff --git a/mm/memblock.c b/mm/memblock.c
index 504435753259..0d7bef1b49c3 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1928,9 +1928,11 @@ static void __init free_unused_memmap(void)
 	unsigned long start, end, prev_end = 0;
 	int i;
 
+#ifndef CONFIG_ARM
 	if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
 	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
 		return;
+#endif
 
 	/*
 	 * This relies on each bank being in address order.
@@ -1943,14 +1945,13 @@ static void __init free_unused_memmap(void)
 		 * due to SPARSEMEM sections which aren't present.
 		 */
 		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
-#else
+#endif
 		/*
 		 * Align down here since the VM subsystem insists that the
 		 * memmap entries are valid from the bank start aligned to
 		 * MAX_ORDER_NR_PAGES.
 		 */
 		start = round_down(start, MAX_ORDER_NR_PAGES);
-#endif
 
 		/*
 		 * If we had a previous bank, and there is a space

> Before:
> On node 0 totalpages: 311551
>   Normal zone: 1230 pages used for memmap
>   Normal zone: 0 pages reserved
>   Normal zone: 157440 pages, LIFO batch:31
>   Normal zone: 16384 pages in unavailable ranges
>   HighMem zone: 154111 pages, LIFO batch:31
>   HighMem zone: 1 pages in unavailable ranges
>
> page:ef3cc000 is uninitialized and poisoned
> raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>
> After:
> On node 0 totalpages: 311551
>   Normal zone: 1230 pages used for memmap
>   Normal zone: 0 pages reserved
>   Normal zone: 157440 pages, LIFO batch:31
>   Normal zone: 17152 pages in unavailable ranges
>   HighMem zone: 154111 pages, LIFO batch:31
>   HighMem zone: 513 pages in unavailable ranges
> ...
> page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xde600
> flags: 0xdd001000(reserved)
> raw: dd001000 ef3cc004 ef3cc004 00000000 00000000 00000000 ffffffff 00000001
>
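The effect of making the round_down() in the hack unconditional can be illustrated on the gap between memory[0x5] and memory[0x6]. This is a user-space sketch, not the kernel function. It assumes MAX_ORDER_NR_PAGES is 0x400 and PAGES_PER_SECTION is 0x4000 for this configuration, and it assumes the end of the previous bank is aligned up to MAX_ORDER_NR_PAGES before the gap is computed (that step is not visible in the hunks above). All `sketch_` names are hypothetical.

```c
#include <assert.h>
#include <stdbool.h>

/* Assumptions for this sketch, not values read from a kernel config. */
#define SKETCH_MAX_ORDER_NR_PAGES	0x400UL
#define SKETCH_PAGES_PER_SECTION	0x4000UL

/* pfn range [start, end) of memory map entries that would be freed. */
struct sketch_range {
	unsigned long start;
	unsigned long end;
};

static unsigned long sketch_align_up(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0); /* power of two only */
	return (x + a - 1) & ~(a - 1);
}

static unsigned long sketch_round_down(unsigned long x, unsigned long a)
{
	assert(a != 0 && (a & (a - 1)) == 0); /* power of two only */
	return x & ~(a - 1);
}

/*
 * Compute the gap between two banks whose memory map would be freed.
 * @round_start_down selects whether the start of the next bank is rounded
 * down to SKETCH_MAX_ORDER_NR_PAGES, which the hack does unconditionally.
 */
static struct sketch_range sketch_gap_to_free(unsigned long prev_bank_end,
					      unsigned long next_bank_start,
					      bool round_start_down)
{
	struct sketch_range gap;
	unsigned long section_end;

	gap.start = sketch_align_up(prev_bank_end, SKETCH_MAX_ORDER_NR_PAGES);
	gap.end = next_bank_start;

	/* Do not go past the section boundary following the previous bank. */
	section_end = sketch_align_up(gap.start, SKETCH_PAGES_PER_SECTION);
	if (section_end < gap.end)
		gap.end = section_end;

	if (round_start_down)
		gap.end = sketch_round_down(gap.end, SKETCH_MAX_ORDER_NR_PAGES);

	return gap;
}
```

For the banks ending at pfn 0xdca00 and starting at pfn 0xde700, rounding down yields the gap [0xdcc00, 0xde400), which is the range reported by the quoted free_memmap debug line. Without the rounding the gap would extend to 0xde700 and cut into the pageblock that also holds the start of the next bank.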
On 2021/5/13 18:55, Mike Rapoport wrote: > On Thu, May 13, 2021 at 11:44:00AM +0800, Kefeng Wang wrote: >> On 2021/5/12 16:26, Mike Rapoport wrote: >>> On Wed, May 12, 2021 at 11:08:14AM +0800, Kefeng Wang wrote: >>>> >>>> On 2021/5/11 16:48, Mike Rapoport wrote: >>>>> On Mon, May 10, 2021 at 11:10:20AM +0800, Kefeng Wang wrote: >>>>>> >>>>>>>> The memory is not continuous, see MEMBLOCK: >>>>>>>> memory size = 0x4c0fffff reserved size = 0x027ef058 >>>>>>>> memory.cnt = 0xa >>>>>>>> memory[0x0] [0x80a00000-0x855fffff], 0x04c00000 bytes flags: 0x0 >>>>>>>> memory[0x1] [0x86a00000-0x87dfffff], 0x01400000 bytes flags: 0x0 >>>>>>>> memory[0x2] [0x8bd00000-0x8c4fffff], 0x00800000 bytes flags: 0x0 >>>>>>>> memory[0x3] [0x8e300000-0x8ecfffff], 0x00a00000 bytes flags: 0x0 >>>>>>>> memory[0x4] [0x90d00000-0xbfffffff], 0x2f300000 bytes flags: 0x0 >>>>>>>> memory[0x5] [0xcc000000-0xdc9fffff], 0x10a00000 bytes flags: 0x0 >>>>>>>> memory[0x6] [0xde700000-0xde9fffff], 0x00300000 bytes flags: 0x0 >>>>>>>> ... >>>>>>>> >>>>>>>> The pfn_range [0xde600,0xde700] => addr_range [0xde600000,0xde700000] >>>>>>>> is not available memory, and we won't create memmap , so with or without >>>>>>>> your patch, we can't see the range in free_memmap(), right? >>>>>>> >>>>>>> This is not available memory and we won't see the reange in free_memmap(), >>>>>>> but we still should create memmap for it and that's what my patch tried to >>>>>>> do. >>>>>>> >>>>>>> There are a lot of places in core mm that operate on pageblocks and >>>>>>> free_unused_memmap() should make sure that any pageblock has a valid memory >>>>>>> map. >>>>>>> >>>>>>> Currently, that's not the case when SPARSEMEM=y and my patch tried to fix >>>>>>> it. >>>>>>> >>>>>>> Can you please send log with my patch applied and with the printing of >>>>>>> ranges that are freed in free_unused_memmap() you've used in previous >>>>>>> mails? 
>>>>> >>>>>> with your patch[1] and debug print in free_memmap, >>>>>> ----> free_memmap, start_pfn = 85800, 85800000 end_pfn = 86800, 86800000 >>>>>> ----> free_memmap, start_pfn = 8c800, 8c800000 end_pfn = 8e000, 8e000000 >>>>>> ----> free_memmap, start_pfn = 8f000, 8f000000 end_pfn = 90000, 90000000 >>>>>> ----> free_memmap, start_pfn = dcc00, dcc00000 end_pfn = de400, de400000 >>>>>> ----> free_memmap, start_pfn = dec00, dec00000 end_pfn = e0000, e0000000 >>>>>> ----> free_memmap, start_pfn = e0c00, e0c00000 end_pfn = e4000, e4000000 >>>>>> ----> free_memmap, start_pfn = f7000, f7000000 end_pfn = f8000, f8000000 >>>>> >>>>> It seems that freeing of the memory map is suboptimal still because that >>>>> code was not designed for memory layout that has more holes than Swiss >>>>> cheese. >>>>> >>>>> Still, the range [0xde600,0xde700] is not freed and there should be struct >>>>> pages for this range. >>>>> >>>>> Can you add >>>>> >>>>> dump_page(pfn_to_page(0xde600), ""); >>>>> >>>>> say, in the end of memblock_free_all()? >>>>> >>>> The range [0xde600,0xde700] is not memory, so it won't create struct page >>>> for it when sparse_init? >>> >>> sparse_init() indeed does not create memory map for unpopulated memory, but >>> it has pretty coarse granularity, i.e. 64M in your configuration. A hole >>> should be at least 64M in order to skip allocation of the memory map for >>> it. >>> >>> For example, your memory layout has a hole of 192M at pfn 0xc0000 and this >>> hole won't have the memory map. >>> >>> However the hole 0xdca00 - 0xde70 will still have a memory map in the >>> section that covers 0xdc000 - 0xe0000. >>> >>> I've tried outline this in a sketch below, hope it helps. 
>>> >>> Memory: >>> c0000 cc000 dca00 >>> --------------------------+ +--------------------------+ +----+ >>> memory bank |<- hole ->| memory bank | | mb | >>> --------------------------+ +--------------------------+ +----+ >>> de700 dea00 >>> >>> Memory map: >>> >>> b0000 b4000 c0000 cc000 d0000 d8000 dc000 >>> +--------+--------+- ... -+ +--------+- ... -+--------+---------+ >>> | memmap | memmap | ... |<- hole ->| memmap | ... | memmap | memmap | >>> +--------+--------+- ... -+ +--------+- ... -+--------+---------+ >>> >>> >> Thanks for the sketch, it is more clear, >> >>>> After apply patch[1], the dump_page log, >>>> >>>> page:ef3cc000 is uninitialized and poisoned >>>> raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff >>>> page dumped because: >>> >>> This means that there is a memory map entry, and it got poisoned during the >>> initialization and never got reinitialized to sensible values, which would >>> be PageReserved() in this case. >>> >>> I believe this was fixed by commit 0740a50b9baa ("mm/page_alloc.c: refactor >>> initialization of struct page for holes in memory layout") in the mainline >>> tree. >>> >>> Can you backport it to your 5.10 tree and check if it helps? >> Hi Mike, the 0740a50b9baa is already in 5.10, tags/v5.10.24~5 > > Ah, you are using stable 5.10.y. > >> commit 4c84191cbc3eff49568d3c5cccb628fa382cf7fb >> Author: Mike Rapoport <rppt@kernel.org> >> Date: Fri Mar 12 21:07:12 2021 -0800 >> >> mm/page_alloc.c: refactor initialization of struct page for holes in >> memory layout >> >> commit 0740a50b9baa4472cfb12442df4b39e2712a64a4 upstream. >> >> but check init_unavailable_range(), we need deal with the hole in the >> range of one pageblock. >> >> For our scene, pageblock range: 0xde600,0xde7ff, but the available pfn begin >> with 0xde700. 
>> >> If pfn(eg, 0xde600) is not valid, the step in init_unavailable_range is >> pageblock_nr_pages, and ALIGN_DOWN(pfn, pageblock_nr_pages) from 0xde600 >> to 0xde700 is same, so the page range for pfn [0xde600,0xde700] won't be >> initialized. > > The pfn 0xde600 is valid in the sense that there is a memory map for that > pfn. Yet, with ARM's custom pfn_valid() will treat it as invalid because > there is a hole. > >> After add the following patch, the oom test could passed, > >> diff --git a/mm/page_alloc.c b/mm/page_alloc.c >> index aaa1655cf682..0c7e04f86f9f 100644 >> --- a/mm/page_alloc.c >> +++ b/mm/page_alloc.c >> @@ -6484,13 +6484,14 @@ static u64 __meminit init_unavailable_range(unsigned >> long spfn, >> unsigned long epfn, >> int zone, int node) >> { >> - unsigned long pfn; >> + unsigned long pfn, pfn_down; >> + unsigned long epfn_down = ALIGN_DOWN(epfn, pageblock_nr_pages); >> u64 pgcnt = 0; >> >> for (pfn = spfn; pfn < epfn; pfn++) { >> - if (!pfn_valid(ALIGN_DOWN(pfn, pageblock_nr_pages))) { >> - pfn = ALIGN_DOWN(pfn, pageblock_nr_pages) >> - + pageblock_nr_pages - 1; >> + pfn_down = ALIGN_DOWN(pfn, pageblock_nr_pages); >> + if (!pfn_valid(pfn_down) && pfn_down != epfn_down) { >> + pfn = pfn_down + pageblock_nr_pages - 1; >> continue; >> } >> __init_single_page(pfn_to_page(pfn), pfn, zone, node); > > I'd rather prefer to keep init_unavailable_range() and the assumption that > the memory map always covers an entire pageblock. > > Can you please try the below hack. Essentially, it makes arm with SPARSEMEM > to use the generic pfn_valid() and updates the freeing of the memory map to > have the entire pageblocks covered. > > If this works I'll send formal patches for those changes. 
>
>
> diff --git a/arch/arm/Kconfig b/arch/arm/Kconfig
> index 24804f11302d..86ee711a3fdb 100644
> --- a/arch/arm/Kconfig
> +++ b/arch/arm/Kconfig
> @@ -73,7 +73,7 @@ config ARM
>  	select HAVE_ARCH_KGDB if !CPU_ENDIAN_BE32 && MMU
>  	select HAVE_ARCH_KASAN if MMU && !XIP_KERNEL
>  	select HAVE_ARCH_MMAP_RND_BITS if MMU
> -	select HAVE_ARCH_PFN_VALID
> +#	select HAVE_ARCH_PFN_VALID
>  	select HAVE_ARCH_SECCOMP
>  	select HAVE_ARCH_SECCOMP_FILTER if AEABI && !OABI_COMPAT
>  	select HAVE_ARCH_THREAD_STRUCT_WHITELIST
> diff --git a/mm/memblock.c b/mm/memblock.c
> index 504435753259..0d7bef1b49c3 100644
> --- a/mm/memblock.c
> +++ b/mm/memblock.c
> @@ -1928,9 +1928,11 @@ static void __init free_unused_memmap(void)
>  	unsigned long start, end, prev_end = 0;
>  	int i;
>
> +#ifndef CONFIG_ARM
>  	if (!IS_ENABLED(CONFIG_HAVE_ARCH_PFN_VALID) ||
>  	    IS_ENABLED(CONFIG_SPARSEMEM_VMEMMAP))
>  		return;
> +#endif
>
>  	/*
>  	 * This relies on each bank being in address order.
> @@ -1943,14 +1945,13 @@ static void __init free_unused_memmap(void)
>  		 * due to SPARSEMEM sections which aren't present.
>  		 */
>  		start = min(start, ALIGN(prev_end, PAGES_PER_SECTION));
> -#else
> +#endif
>  		/*
>  		 * Align down here since the VM subsystem insists that the
>  		 * memmap entries are valid from the bank start aligned to
>  		 * MAX_ORDER_NR_PAGES.
>  		 */
>  		start = round_down(start, MAX_ORDER_NR_PAGES);
> -#endif
>
>  		/*
>  		 * If we had a previous bank, and there is a space
>
>

Without HAVE_ARCH_PFN_VALID, init_unavailable_range() marks those pages as
reserved, and yes, it works for the OOM test.

On node 0 totalpages: 311551
  Normal zone: 1230 pages used for memmap
  Normal zone: 0 pages reserved
  Normal zone: 157440 pages, LIFO batch:31
  Normal zone: 55552 pages in unavailable ranges
  HighMem zone: 154111 pages, LIFO batch:31
  HighMem zone: 41985 pages in unavailable ranges

Thanks for your kind guidance.

>> Before:
>> On node 0 totalpages: 311551
>>   Normal zone: 1230 pages used for memmap
>>   Normal zone: 0 pages reserved
>>   Normal zone: 157440 pages, LIFO batch:31
>>   Normal zone: 16384 pages in unavailable ranges
>>   HighMem zone: 154111 pages, LIFO batch:31
>>   HighMem zone: 1 pages in unavailable ranges
>>
>> page:ef3cc000 is uninitialized and poisoned
>> raw: ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff ffffffff
>>
>> After:
>> On node 0 totalpages: 311551
>>   Normal zone: 1230 pages used for memmap
>>   Normal zone: 0 pages reserved
>>   Normal zone: 157440 pages, LIFO batch:31
>>   Normal zone: 17152 pages in unavailable ranges
>>   HighMem zone: 154111 pages, LIFO batch:31
>>   HighMem zone: 513 pages in unavailable ranges
>> ...
>> page:(ptrval) refcount:1 mapcount:0 mapping:00000000 index:0x0 pfn:0xde600
>> flags: 0xdd001000(reserved)
>> raw: dd001000 ef3cc004 ef3cc004 00000000 00000000 00000000 ffffffff 00000001
>>
>
From: Mike Rapoport <rppt@linux.ibm.com>

Hi,

These patches aim to remove CONFIG_HOLES_IN_ZONE and essentially hardwire
pfn_valid_within() to 1.

The idea is to mark NOMAP pages as reserved in the memory map and restore
the intended semantics of pfn_valid() to designate availability of struct
page for a pfn.

With this the core mm will be able to cope with the fact that it cannot use
NOMAP pages and the holes created by NOMAP ranges within MAX_ORDER blocks
will be treated correctly even without the need for pfn_valid_within.

The patches are only boot tested on qemu-system-aarch64 so I'd really
appreciate memory stress tests on real hardware.

If this actually works we'll be one step closer to drop custom pfn_valid()
on arm64 altogether.

v2:
* Add check for PFN overflow in pfn_is_map_memory()
* Add Acked-by and Reviewed-by tags, thanks David.

v1: Link: https://lore.kernel.org/lkml/20210420090925.7457-1-rppt@kernel.org
* Add comment about the semantics of pfn_valid() as Anshuman suggested
* Extend comments about MEMBLOCK_NOMAP, per Anshuman
* Use pfn_is_map_memory() name for the exported wrapper for
  memblock_is_map_memory(). It is still local to arch/arm64 in the end
  because of header dependency issues.

rfc: Link: https://lore.kernel.org/lkml/20210407172607.8812-1-rppt@kernel.org

Mike Rapoport (4):
  include/linux/mmzone.h: add documentation for pfn_valid()
  memblock: update initialization of reserved pages
  arm64: decouple check whether pfn is in linear map from pfn_valid()
  arm64: drop pfn_valid_within() and simplify pfn_valid()

 arch/arm64/Kconfig              |  3 ---
 arch/arm64/include/asm/memory.h |  2 +-
 arch/arm64/include/asm/page.h   |  1 +
 arch/arm64/kvm/mmu.c            |  2 +-
 arch/arm64/mm/init.c            | 10 ++++++++--
 arch/arm64/mm/ioremap.c         |  4 ++--
 arch/arm64/mm/mmu.c             |  2 +-
 include/linux/memblock.h        |  4 +++-
 include/linux/mmzone.h          | 11 +++++++++++
 mm/memblock.c                   | 28 ++++++++++++++++++++++++++--
 10 files changed, 54 insertions(+), 13 deletions(-)

base-commit: e49d033bddf5b565044e2abe4241353959bc9120