Message ID | 20240601133402.2675-1-richard.weiyang@gmail.com
---|---
State | New
Series | mm: increase totalram_pages on freeing to buddy system
On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>On 01.06.24 17:32, David Hildenbrand wrote:
>> On 01.06.24 15:34, Wei Yang wrote:
>> > Total memory represents pages managed by buddy system.
>>
>> No, that's managed pages.
>>
>> > After the
>> > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > being managed.
>> >
>>
>> I recall one reason that is done, so other subsystem know the total
>> memory size even before deferred init is done.
>>
>> > free_low_memory_core_early() returns number of pages for all free pages,
>> > even at this moment only early initialized pages are freed to buddy
>> > system. This means the total memory at this moment is not correct.
>> >
>> > Let's increase it when pages are freed to buddy system.
>>
>> I'm missing the "why", and the very first sentence of this patch is wrong.
>
>Correction: your statement was correct :) That's why
>adjust_managed_page_count() adjusts that as well.
>
>__free_pages_core() only adjusts managed page count, because it assumes
>totalram has already been adjusted early during boot.
>
>The reason we have this split for now, I think, is because of subsystems that
>call totalram_pages() during init.
>
>So the "why" question remains, because this change has the potential to break
>other stuff.
>

Thanks, I didn't notice this.

>--
>Cheers,
>
>David / dhildenb
On 02.06.24 02:58, Wei Yang wrote:
> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> On 01.06.24 17:32, David Hildenbrand wrote:
>>> On 01.06.24 15:34, Wei Yang wrote:
>>>> Total memory represents pages managed by buddy system.
>>>
>>> No, that's managed pages.
>>>
[...]
>> __free_pages_core() only adjusts managed page count, because it assumes
>> totalram has already been adjusted early during boot.
>>
>> The reason we have this split for now, I think, is because of subsystems that
>> call totalram_pages() during init.
>>
>> So the "why" question remains, because this change has the potential to break
>> other stuff.
>>
>
> Thanks, I didn't notice this.

I think having your cleanup would be very nice, as I have patches in the
works that would benefit from being able to move the totalram update from
memory hotplug code to __free_pages_core().

We'd have to make sure that no code relies on totalram being sane/fixed
during boot for the initial memory. I think right now we might have such
code.

Further, we currently require only a single atomic RMW instruction to adjust
totalram during boot; moving it to __free_pages_core() would imply more
atomics, but usually only one per MAX_ORDER page, so I doubt this would make
a big difference.
On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>On 02.06.24 02:58, Wei Yang wrote:
>> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > Total memory represents pages managed by buddy system.
>> > >
>> > > No, that's managed pages.
>> > >
[...]
>> Thanks, I didn't notice this.
>
>I think having your cleanup would be very nice, as I have patches in the
>works that would benefit from being able to move the totalram update from
>memory hotplug code to __free_pages_core().
>

I got the same feeling.

>We'd have to make sure that no code relies on totalram being sane/fixed
>during boot for the initial memory. I think right now we might have such
>code.
>

One concern is that totalram would change when hotplug is enabled. That
sounds like such code should do some re-calculation after totalram changes?

>Further, we currently require only a single atomic RMW instruction to adjust
>totalram during boot; moving it to __free_pages_core() would imply more
>atomics, but usually only one per MAX_ORDER page, so I doubt this would make
>a big difference.
>

I did a rough calculation on this. One MAX_ORDER page accounts for 2MB, and
with defer_init only the low zone's memory is initialized during boot. Per my
understanding, the low zone is 4GB on x86, so the extra atomics come to
4GB / 2MB = 2K.

>--
>Cheers,
>
>David / dhildenb
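For concreteness, a throwaway user-space sketch of that estimate (the 4 GiB
low zone and 2 MiB MAX_ORDER page size are assumed figures for x86, not
measured values):

        #include <stdio.h>

        int main(void)
        {
                unsigned long long low_zone = 4ULL << 30; /* assumed 4 GiB low zone */
                unsigned long long chunk = 2ULL << 20;    /* assumed 2 MiB MAX_ORDER page */

                /* one extra atomic RMW per MAX_ORDER chunk freed during boot */
                printf("%llu extra atomics\n", low_zone / chunk); /* prints 2048 */
                return 0;
        }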
On 03.06.24 22:01, Wei Yang wrote:
> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> On 02.06.24 02:58, Wei Yang wrote:
>>> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
[...]
>> I think having your cleanup would be very nice, as I have patches in the
>> works that would benefit from being able to move the totalram update from
>> memory hotplug code to __free_pages_core().
>>
>
> I got the same feeling.
>
>> We'd have to make sure that no code relies on totalram being sane/fixed
>> during boot for the initial memory. I think right now we might have such
>> code.
>>
>
> One concern is that totalram would change when hotplug is enabled. That
> sounds like such code should do some re-calculation after totalram changes?

We don't have such code in place -- there were discussions regarding that
recently.

It would be reasonable to take a look at all totalram_pages() users and
determine if they could be affected by deferring updating it.

At least page_alloc_init_late()->deferred_init_memmap() happens before
do_basic_setup()->do_initcalls(), which is good.

So maybe it's not a big concern and this separate totalram pages accounting
is much rather some legacy leftover.

>> Further, we currently require only a single atomic RMW instruction to adjust
>> totalram during boot; moving it to __free_pages_core() would imply more
>> atomics, but usually only one per MAX_ORDER page, so I doubt this would make
>> a big difference.
>>
>
> I did a rough calculation on this. One MAX_ORDER page accounts for 2MB, and
> with defer_init only the low zone's memory is initialized during boot. Per my
> understanding, the low zone is 4GB on x86, so the extra atomics come to
> 4GB / 2MB = 2K.

Well, for all deferred-initialized memory you would now also require these --
or if deferred-init would be disabled. Sounds like an interesting measurement
if that would be measurable at all.
On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>On 03.06.24 22:01, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
[...]
>> One concern is that totalram would change when hotplug is enabled. That
>> sounds like such code should do some re-calculation after totalram changes?
>
>We don't have such code in place -- there were discussions regarding that
>recently.
>
>It would be reasonable to take a look at all totalram_pages() users and
>determine if they could be affected by deferring updating it.
>
>At least page_alloc_init_late()->deferred_init_memmap() happens before
>do_basic_setup()->do_initcalls(), which is good.
>

But deferred_init_memmap() will spawn threads to do the work. I am afraid
do_initcalls() won't wait for the completion of defer_init? Do I miss
something?

>So maybe it's not a big concern and this separate totalram pages accounting
>is much rather some legacy leftover.
>
[...]
>--
>Cheers,
>
>David / dhildenb
On 06.06.24 00:44, Wei Yang wrote:
> On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>> On 03.06.24 22:01, Wei Yang wrote:
[...]
>> It would be reasonable to take a look at all totalram_pages() users and
>> determine if they could be affected by deferring updating it.
>>
>> At least page_alloc_init_late()->deferred_init_memmap() happens before
>> do_basic_setup()->do_initcalls(), which is good.
>>
>
> But deferred_init_memmap() will spawn threads to do the work. I am afraid
> do_initcalls() won't wait for the completion of defer_init? Do I miss
> something?

Don't we wait for them to finish?

        /* Block until all are initialised */
        wait_for_completion(&pgdat_init_all_done_comp);
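A condensed sketch of that synchronization (simplified from mm/mm_init.c
around this kernel version; the structure below is approximate, not the
upstream source verbatim):

        static DECLARE_COMPLETION(pgdat_init_all_done_comp);
        static atomic_t pgdat_init_n_undone __initdata;

        /* runs in one kthread per node, spawned from page_alloc_init_late() */
        static int __init deferred_init_memmap(void *data)
        {
                /* ... initialize this node's deferred struct pages ... */

                /* the last node to finish signals the completion */
                if (atomic_dec_and_test(&pgdat_init_n_undone))
                        complete(&pgdat_init_all_done_comp);
                return 0;
        }

        void __init page_alloc_init_late(void)
        {
                /* ... spawn deferred_init_memmap() for each N_MEMORY node ... */

                /* Block until all are initialised */
                wait_for_completion(&pgdat_init_all_done_comp);
                /* ... */
        }

So, as noted above, do_initcalls() runs only after this wait completes, and
initcalls cannot race with deferred memmap init.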
On Thu, Jun 06, 2024 at 09:14:25AM +0200, David Hildenbrand wrote:
>On 06.06.24 00:44, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
[...]
>> But deferred_init_memmap() will spawn threads to do the work. I am afraid
>> do_initcalls() won't wait for the completion of defer_init? Do I miss
>> something?
>
>Don't we wait for them to finish?
>
>        /* Block until all are initialised */
>        wait_for_completion(&pgdat_init_all_done_comp);

You are right. I missed this.

So we still need to wait for mm to be fully initialized before initialising
the others. I thought everything would start in parallel.

>--
>Cheers,
>
>David / dhildenb
On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>On 03.06.24 22:01, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
[...]
>It would be reasonable to take a look at all totalram_pages() users and
>determine if they could be affected by deferring updating it.
>
>At least page_alloc_init_late()->deferred_init_memmap() happens before
>do_basic_setup()->do_initcalls(), which is good.
>
>So maybe it's not a big concern and this separate totalram pages accounting
>is much rather some legacy leftover.
>

I grepped the whole tree and found the following 4 call sites that may need
adjusting. Hope I didn't miss one.

  * arch/s390/mm/init.c:73: while (order > 2 && (totalram_pages() >> 10) < (1UL << order))
  * arch/um/kernel/mem.c:76: max_low_pfn = totalram_pages();
  * kernel/fork.c:1002: unsigned long nr_pages = totalram_pages();
  * mm/mm_init.c:2689: K(physpages - totalram_pages() - totalcma_pages),

* arch/s390/mm/init.c:73: while (order > 2 && (totalram_pages() >> 10) < (1UL << order))

      mem_init
        memblock_free_all
          setup_zero_pages

  This calculates the size of empty_zero_page. Not sure whether we can
  postpone this function call until after deferred init.

* arch/um/kernel/mem.c:76: max_low_pfn = totalram_pages();

      mem_init
        memblock_free_all
      max_low_pfn = totalram_pages

  The usage seems not correct: totalram_pages() returns a number of pages,
  but a pfn value seems to be needed here.

* kernel/fork.c:1002: unsigned long nr_pages = totalram_pages();

      start_kernel
        fork_init
          set_max_threads

  Not sure whether it would be fine to set the rlimit again after deferred
  init.

* mm/mm_init.c:2689: K(physpages - totalram_pages() - totalcma_pages),

  Per my understanding, we can print this info after deferred init.

>> I did a rough calculation on this. One MAX_ORDER page accounts for 2MB, and
>> with defer_init only the low zone's memory is initialized during boot. Per my
>> understanding, the low zone is 4GB on x86, so the extra atomics come to
>> 4GB / 2MB = 2K.
>
>Well, for all deferred-initialized memory you would now also require these --
>or if deferred-init would be disabled. Sounds like an interesting measurement
>if that would be measurable at all.
>
>--
>Cheers,
>
>David / dhildenb
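To illustrate the suspected arch/um mismatch (illustrative only; the
alternative shown is a hypothetical direction, not a tested or posted fix):

        /* arch/um/kernel/mem.c, as quoted above: assigns a page *count* */
        max_low_pfn = totalram_pages();

        /*
         * max_low_pfn is a page frame number, though. Whenever RAM does not
         * start at physical address 0 (or has holes), the page count and
         * the highest PFN differ. A hypothetical alternative would derive
         * the PFN from the end of memory instead, e.g.:
         */
        max_low_pfn = PFN_DOWN(memblock_end_of_DRAM());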
Hello,

kernel test robot noticed "xfstests.generic.224.fail" on:

commit: a4722639595dc3fa15c793a5b00bb4b023a34288 ("[PATCH] mm: increase totalram_pages on freeing to buddy system")
url: https://github.com/intel-lab-lkp/linux/commits/Wei-Yang/mm-increase-totalram_pages-on-freeing-to-buddy-system/20240601-213533
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20240601133402.2675-1-richard.weiyang@gmail.com/
patch subject: [PATCH] mm: increase totalram_pages on freeing to buddy system

in testcase: xfstests
version: xfstests-x86_64-98379713-1_20240603
with following parameters:

        disk: 4HDD
        fs: udf
        test: generic-224

compiler: gcc-13
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)

If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202406111644.931d379c-oliver.sang@intel.com

2024-06-09 06:39:08 export TEST_DIR=/fs/sda1
2024-06-09 06:39:08 export TEST_DEV=/dev/sda1
2024-06-09 06:39:08 export FSTYP=udf
2024-06-09 06:39:08 export SCRATCH_MNT=/fs/scratch
2024-06-09 06:39:08 mkdir /fs/scratch -p
2024-06-09 06:39:08 export DISABLE_UDF_TEST=1
2024-06-09 06:39:08 export SCRATCH_DEV=/dev/sda4
2024-06-09 06:39:08 echo generic/224
2024-06-09 06:39:08 ./check generic/224

FSTYP         -- udf
PLATFORM      -- Linux/x86_64 lkp-skl-d02 6.10.0-rc1-00201-ga4722639595d #1 SMP PREEMPT_DYNAMIC Fri Jun 7 19:31:50 CST 2024
MKFS_OPTIONS  -- /dev/sda4
MOUNT_OPTIONS -- /dev/sda4 /fs/scratch

generic/224 [failed, exit status 1]- output mismatch (see /lkp/benchmarks/xfstests/results//generic/224.out.bad)
    --- tests/generic/224.out       2024-06-03 12:10:00.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//generic/224.out.bad       2024-06-09 06:41:46.383320015 +0000
    @@ -1,2 +1,34 @@
     QA output created by 224
    -*** Silence is golden ***
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/generic/224.out /lkp/benchmarks/xfstests/results//generic/224.out.bad' to see the entire diff)

Ran: generic/224
Failures: generic/224
Failed 1 of 1 tests

The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240611/202406111644.931d379c-oliver.sang@intel.com
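The fork failures are plausibly connected to the kernel/fork.c call site
flagged earlier in the thread. A condensed sketch of that dependency
(simplified from kernel/fork.c; details approximate):

        static void set_max_threads(unsigned int max_threads_suggested)
        {
                u64 threads;
                /*
                 * Read once during early boot -- much smaller under deferred
                 * init if totalram is only bumped as pages actually reach
                 * the buddy system.
                 */
                unsigned long nr_pages = totalram_pages();

                threads = div64_u64((u64) nr_pages * (u64) PAGE_SIZE,
                                    (u64) THREAD_SIZE * 8UL);
                max_threads = clamp_t(u64, threads, MIN_THREADS, MAX_THREADS);
        }

A small totalram at fork_init() time yields a small max_threads, which would
match the "fork: retry: Resource temporarily unavailable" output above.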
diff --git a/mm/memblock.c b/mm/memblock.c
index 98d25689cf10..51dd143eabb4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1716,10 +1716,8 @@ void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 	cursor = PFN_UP(base);
 	end = PFN_DOWN(base + size);
 
-	for (; cursor < end; cursor++) {
+	for (; cursor < end; cursor++)
 		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
-		totalram_pages_inc();
-	}
 }
 
 /*
@@ -2128,7 +2126,7 @@ static void __init __free_pages_memory(unsigned long start, unsigned long end)
 	}
 }
 
-static unsigned long __init __free_memory_core(phys_addr_t start,
+static void __init __free_memory_core(phys_addr_t start,
 				 phys_addr_t end)
 {
 	unsigned long start_pfn = PFN_UP(start);
@@ -2136,11 +2134,9 @@ static unsigned long __init __free_memory_core(phys_addr_t start,
 				      PFN_DOWN(end), max_low_pfn);
 
 	if (start_pfn >= end_pfn)
-		return 0;
+		return;
 
 	__free_pages_memory(start_pfn, end_pfn);
-
-	return end_pfn - start_pfn;
 }
 
 static void __init memmap_init_reserved_pages(void)
 {
@@ -2182,9 +2178,8 @@ static void __init memmap_init_reserved_pages(void)
 	}
 }
 
-static unsigned long __init free_low_memory_core_early(void)
+static void __init free_low_memory_core_early(void)
 {
-	unsigned long count = 0;
 	phys_addr_t start, end;
 	u64 i;
 
@@ -2199,9 +2194,7 @@ static void __init free_low_memory_core_early(void)
 	 */
 	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
 				NULL)
-		count += __free_memory_core(start, end);
-
-	return count;
+		__free_memory_core(start, end);
 }
 
 static int reset_managed_pages_done __initdata;
@@ -2232,13 +2225,9 @@ void __init reset_all_zones_managed_pages(void)
  */
 void __init memblock_free_all(void)
 {
-	unsigned long pages;
-
 	free_unused_memmap();
 	reset_all_zones_managed_pages();
-
-	pages = free_low_memory_core_early();
-	totalram_pages_add(pages);
+	free_low_memory_core_early();
 }
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 431b1f6753c0..84bb4b0205d7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -637,7 +637,6 @@ void generic_online_page(struct page *page, unsigned int order)
 	 */
 	debug_pagealloc_map_pages(page, 1 << order);
 	__free_pages_core(page, order);
-	totalram_pages_add(1UL << order);
 }
 EXPORT_SYMBOL_GPL(generic_online_page);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00954d8c9c30..33cd15efac9e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1234,6 +1234,7 @@ void __free_pages_core(struct page *page, unsigned int order)
 	set_page_count(p, 0);
 
 	atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
+	totalram_pages_add(nr_pages);
 
 	if (page_contains_unaccepted(page, order)) {
 		if (order == MAX_PAGE_ORDER && __free_unaccepted(page))
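Condensing the diff's net effect on __free_pages_core() (a sketch of the
resulting accounting, not the full function body):

        void __free_pages_core(struct page *page, unsigned int order)
        {
                unsigned int nr_pages = 1 << order;

                /* ... clear PageReserved and zero refcounts on each page ... */

                atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
                totalram_pages_add(nr_pages); /* now the single accounting point */

                /* ... hand the block to the buddy allocator ... */
        }

With this, memblock_free_late(), free_low_memory_core_early() and
generic_online_page() no longer touch totalram themselves.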
Total memory represents pages managed by buddy system. After the
introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
being managed.

free_low_memory_core_early() returns number of pages for all free pages,
even at this moment only early initialized pages are freed to buddy
system. This means the total memory at this moment is not correct.

Let's increase it when pages are freed to buddy system.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/memblock.c       | 23 ++++++-----------------
 mm/memory_hotplug.c |  1 -
 mm/page_alloc.c     |  1 +
 3 files changed, 7 insertions(+), 18 deletions(-)