
mm: increase totalram_pages on freeing to buddy system

Message ID 20240601133402.2675-1-richard.weiyang@gmail.com (mailing list archive)
State New
Series mm: increase totalram_pages on freeing to buddy system

Commit Message

Wei Yang June 1, 2024, 1:34 p.m. UTC
Total memory represents the pages managed by the buddy system. After the
introduction of DEFERRED_STRUCT_PAGE_INIT, it may count pages before they
are managed.

free_low_memory_core_early() returns the number of all free pages, even
though at this moment only early-initialized pages have been freed to the
buddy system. This means the total memory count is not correct at this
moment.

Let's increase it when pages are freed to the buddy system.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/memblock.c       | 23 ++++++-----------------
 mm/memory_hotplug.c |  1 -
 mm/page_alloc.c     |  1 +
 3 files changed, 7 insertions(+), 18 deletions(-)
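
As a rough, self-contained illustration of the accounting described above
(plain C11 atomics standing in for the kernel's counters; the names and
sizes are illustrative only, this is not kernel code): totalram is bumped
up front for every free page found by the early scan, while the managed
count only grows as blocks are actually handed to the buddy system, so the
two can disagree until deferred init finishes. The patch moves the totalram
update to the same place as the managed-pages update.

/* Illustrative model only -- not kernel code. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_long total_ram;   /* stands in for _totalram_pages     */
static atomic_long managed;     /* stands in for zone->managed_pages */

/* Early boot: every free page is counted into totalram immediately. */
static void early_account(long free_pages)
{
        atomic_fetch_add(&total_ram, free_pages);
}

/* Freeing a block to the "buddy system": only the managed count grows. */
static void free_block_to_buddy(long pages_in_block)
{
        atomic_fetch_add(&managed, pages_in_block);
}

int main(void)
{
        long total_free = 1 << 20;      /* pretend 1M free pages exist   */
        long early_init = 1 << 18;      /* only a quarter is freed early */

        early_account(total_free);
        for (long done = 0; done < early_init; done += 1024)
                free_block_to_buddy(1024);

        /* The window the commit message describes: totalram > managed. */
        printf("totalram=%ld managed=%ld\n",
               atomic_load(&total_ram), atomic_load(&managed));
        return 0;
}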

Comments

Wei Yang June 2, 2024, 12:58 a.m. UTC | #1
On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>On 01.06.24 17:32, David Hildenbrand wrote:
>> On 01.06.24 15:34, Wei Yang wrote:
>> > Total memory represents pages managed by buddy system.
>> 
>> No, that's managed pages.
>> 
>> > After the
>> > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > being managed.
>> > 
>> 
>> I recall one reason that is done, so other subsystem know the total
>> memory size even before deferred init is done.
>> 
>> > free_low_memory_core_early() returns number of pages for all free pages,
>> > even at this moment only early initialized pages are freed to buddy
>> > system. This means the total memory at this moment is not correct.
>> > 
>> > Let's increase it when pages are freed to buddy system.
>> 
>> I'm missing the "why", and the very first sentence of this patch is wrong.
>
>Correction: your statement was correct :) That's why
>adjust_managed_page_count() adjusts that as well.
>
>__free_pages_core() only adjusts managed page count, because it assumes
>totalram has already been adjusted early during boot.
>
>The reason we have this split for now, I think, is because of subsystems that
>call totalram_pages() during init.
>
>So the "why" question remains, because this change has the potential to break
>other stuff.
>

Thanks, I didn't notice this.

>-- 
>Cheers,
>
>David / dhildenb
David Hildenbrand June 3, 2024, 8:55 a.m. UTC | #2
On 02.06.24 02:58, Wei Yang wrote:
> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> On 01.06.24 17:32, David Hildenbrand wrote:
>>> On 01.06.24 15:34, Wei Yang wrote:
>>>> Total memory represents pages managed by buddy system.
>>>
>>> No, that's managed pages.
>>>
>>>> After the
>>>> introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>>>> being managed.
>>>>
>>>
>>> I recall one reason that is done, so other subsystem know the total
>>> memory size even before deferred init is done.
>>>
>>>> free_low_memory_core_early() returns number of pages for all free pages,
>>>> even at this moment only early initialized pages are freed to buddy
>>>> system. This means the total memory at this moment is not correct.
>>>>
>>>> Let's increase it when pages are freed to buddy system.
>>>
>>> I'm missing the "why", and the very first sentence of this patch is wrong.
>>
>> Correction: your statement was correct :) That's why
>> adjust_managed_page_count() adjusts that as well.
>>
>> __free_pages_core() only adjusts managed page count, because it assumes
>> totalram has already been adjusted early during boot.
>>
>> The reason we have this split for now, I think, is because of subsystems that
>> call totalram_pages() during init.
>>
>> So the "why" question remains, because this change has the potential to break
>> other stuff.
>>
> 
> Thanks, I didn't notice this.

I think having your cleanup would be very nice, as I have patches in the 
works that would benefit from being able to move the totalram update 
from memory hotplug code to __free_pages_core().

We'd have to make sure that no code relies on totalram being sane/fixed 
during boot for the initial memory. I think right now we might have such 
code.

Further, we currently require only a single atomic RMW instruction to
adjust totalram during boot; moving it to __free_pages_core() would
imply more atomics, but usually only one per MAX_ORDER page, so I doubt
this would make a big difference.
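
As a minimal sketch of the trade-off mentioned here (plain C11 atomics;
PAGES_PER_BLOCK and the counts are illustrative assumptions, not the
kernel's values): today there is one bulk add at the end of
memblock_free_all(), whereas doing it in __free_pages_core() means one
atomic RMW per freed MAX_ORDER block. Both end up with the same total;
they differ only in the number of RMWs.

/* Sketch of the two update patterns discussed above; not kernel code. */
#include <stdatomic.h>
#include <stdio.h>

#define PAGES_PER_BLOCK (1UL << 10)     /* illustrative block size */

static atomic_ulong totalram;

/* Current scheme: count everything, then a single RMW at the end. */
static void bulk_update(unsigned long nblocks)
{
        atomic_fetch_add(&totalram, nblocks * PAGES_PER_BLOCK);
}

/* Proposed scheme: one RMW per block, from the per-block free path. */
static void per_block_update(unsigned long nblocks)
{
        for (unsigned long i = 0; i < nblocks; i++)
                atomic_fetch_add(&totalram, PAGES_PER_BLOCK);
}

int main(void)
{
        bulk_update(2048);              /* 1 atomic RMW     */
        per_block_update(2048);         /* 2048 atomic RMWs */
        printf("pages counted: %lu\n", atomic_load(&totalram));
        return 0;
}
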
Wei Yang June 3, 2024, 8:01 p.m. UTC | #3
On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>On 02.06.24 02:58, Wei Yang wrote:
>> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > Total memory represents pages managed by buddy system.
>> > > 
>> > > No, that's managed pages.
>> > > 
>> > > > After the
>> > > > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > > > being managed.
>> > > > 
>> > > 
>> > > I recall one reason that is done, so other subsystem know the total
>> > > memory size even before deferred init is done.
>> > > 
>> > > > free_low_memory_core_early() returns number of pages for all free pages,
>> > > > even at this moment only early initialized pages are freed to buddy
>> > > > system. This means the total memory at this moment is not correct.
>> > > > 
>> > > > Let's increase it when pages are freed to buddy system.
>> > > 
>> > > I'm missing the "why", and the very first sentence of this patch is wrong.
>> > 
>> > Correction: your statement was correct :) That's why
>> > adjust_managed_page_count() adjusts that as well.
>> > 
>> > __free_pages_core() only adjusts managed page count, because it assumes
>> > totalram has already been adjusted early during boot.
>> > 
>> > The reason we have this split for now, I think, is because of subsystems that
>> > call totalram_pages() during init.
>> > 
>> > So the "why" question remains, because this change has the potential to break
>> > other stuff.
>> > 
>> 
>> Thanks, I didn't notice this.
>
>I think having your cleanup would be very nice, as I have patches in the
>works that would benefit from being able to move the totalram update from
>memory hotplug code to __free_pages_core().
>

I got the same feeling.

>We'd have to make sure that no code relies on totalram being sane/fixed
>during boot for the initial memory. I think right now we might have such
>code.
>

One concern is that totalram would change when hotplug is enabled. That
sounds like such code should do some re-calculation after totalram changes?

>Further, we currently require only a single atomic RMW instruction to adjust
>totalram during boot, moving it to __free_pages_core() would imply more
>atomics: but usually only one per MAX_ORDER page, so I doubt this would make
>a big difference.
>

I did a rough calculation on this. One MAX_ORDER page accounts for 2MB, and
with deferred init only the low zone's memory is initialized during boot. Per
my understanding, the low zone's memory is 4GB on x86. So the extra cost is
4GB / 2MB = 2K atomic updates.
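
Spelling that estimate out (same assumptions as above: a 2MB MAX_ORDER block
and 4GB of early-initialized low-zone memory; whether those numbers hold on a
given config is a separate question), a trivial check of the arithmetic:

/* Rough count of the extra atomic updates, using the assumptions above. */
#include <stdio.h>

int main(void)
{
        unsigned long long low_zone = 4ULL << 30;  /* 4 GiB, as assumed above   */
        unsigned long long block_sz = 2ULL << 20;  /* 2 MiB per MAX_ORDER block */

        printf("extra atomic adds: %llu\n", low_zone / block_sz);  /* 2048 */
        return 0;
}
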
>-- 
>Cheers,
>
>David / dhildenb
David Hildenbrand June 3, 2024, 8:43 p.m. UTC | #4
On 03.06.24 22:01, Wei Yang wrote:
> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> On 02.06.24 02:58, Wei Yang wrote:
>>> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>>>> On 01.06.24 17:32, David Hildenbrand wrote:
>>>>> On 01.06.24 15:34, Wei Yang wrote:
>>>>>> Total memory represents pages managed by buddy system.
>>>>>
>>>>> No, that's managed pages.
>>>>>
>>>>>> After the
>>>>>> introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>>>>>> being managed.
>>>>>>
>>>>>
>>>>> I recall one reason that is done, so other subsystem know the total
>>>>> memory size even before deferred init is done.
>>>>>
>>>>>> free_low_memory_core_early() returns number of pages for all free pages,
>>>>>> even at this moment only early initialized pages are freed to buddy
>>>>>> system. This means the total memory at this moment is not correct.
>>>>>>
>>>>>> Let's increase it when pages are freed to buddy system.
>>>>>
>>>>> I'm missing the "why", and the very first sentence of this patch is wrong.
>>>>
>>>> Correction: your statement was correct :) That's why
>>>> adjust_managed_page_count() adjusts that as well.
>>>>
>>>> __free_pages_core() only adjusts managed page count, because it assumes
>>>> totalram has already been adjusted early during boot.
>>>>
>>>> The reason we have this split for now, I think, is because of subsystems that
>>>> call totalram_pages() during init.
>>>>
>>>> So the "why" question remains, because this change has the potential to break
>>>> other stuff.
>>>>
>>>
>>> Thanks, I didn't notice this.
>>
>> I think having your cleanup would be very nice, as I have patches in the
>> works that would benefit from being able to move the totalram update from
>> memory hotplug code to __free_pages_core().
>>
> 
> I got the same feeling.
> 
>> We'd have to make sure that no code relies on totalram being sane/fixed
>> during boot for the initial memory. I think right now we might have such
>> code.
>>
> 
> One concern is totalram would change when hotplug is enabled. That sounds
> those codes should do some re-calculation after totalram changes?

We don't have such code in place -- there were discussions regarding 
that recently.

It would be reasonable to take a look at all totalram_pages() users and 
determine if they could be affected by deferring updating it.

At least page_alloc_init_late()->deferred_init_memmap() happens before 
do_basic_setup()->do_initcalls(), which is good.

So maybe it's not a big concern and this separate totalram pages
accounting is rather just a legacy leftover.

> 
>> Further, we currently require only a single atomic RMW instruction to adjust
>> totalram during boot, moving it to __free_pages_core() would imply more
>> atomics: but usually only one per MAX_ORDER page, so I doubt this would make
>> a big difference.
>>
> 
> I took a rough calculation on this.One MAX_ORDER page accounts for 2MB, and
> with defer_init only low zone's memory is initialized during boot. Per my
> understanding, low zone's memory is 4GB for x86. So the extra calculation is
> 4GB / 2MB = 2K.

Well, for all deferred-initialized memory you would now also require
these -- or for all memory, if deferred init is disabled. Sounds like an
interesting measurement, if it is measurable at all.
Wei Yang June 5, 2024, 10:44 p.m. UTC | #5
On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>On 03.06.24 22:01, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> > On 02.06.24 02:58, Wei Yang wrote:
>> > > On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > > > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > > > Total memory represents pages managed by buddy system.
>> > > > > 
>> > > > > No, that's managed pages.
>> > > > > 
>> > > > > > After the
>> > > > > > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > > > > > being managed.
>> > > > > > 
>> > > > > 
>> > > > > I recall one reason that is done, so other subsystem know the total
>> > > > > memory size even before deferred init is done.
>> > > > > 
>> > > > > > free_low_memory_core_early() returns number of pages for all free pages,
>> > > > > > even at this moment only early initialized pages are freed to buddy
>> > > > > > system. This means the total memory at this moment is not correct.
>> > > > > > 
>> > > > > > Let's increase it when pages are freed to buddy system.
>> > > > > 
>> > > > > I'm missing the "why", and the very first sentence of this patch is wrong.
>> > > > 
>> > > > Correction: your statement was correct :) That's why
>> > > > adjust_managed_page_count() adjusts that as well.
>> > > > 
>> > > > __free_pages_core() only adjusts managed page count, because it assumes
>> > > > totalram has already been adjusted early during boot.
>> > > > 
>> > > > The reason we have this split for now, I think, is because of subsystems that
>> > > > call totalram_pages() during init.
>> > > > 
>> > > > So the "why" question remains, because this change has the potential to break
>> > > > other stuff.
>> > > > 
>> > > 
>> > > Thanks, I didn't notice this.
>> > 
>> > I think having your cleanup would be very nice, as I have patches in the
>> > works that would benefit from being able to move the totalram update from
>> > memory hotplug code to __free_pages_core().
>> > 
>> 
>> I got the same feeling.
>> 
>> > We'd have to make sure that no code relies on totalram being sane/fixed
>> > during boot for the initial memory. I think right now we might have such
>> > code.
>> > 
>> 
>> One concern is totalram would change when hotplug is enabled. That sounds
>> those codes should do some re-calculation after totalram changes?
>
>We don't have such code in place -- there were discussions regarding that
>recently.
>
>It would be reasonable to take a look at all totalram_pages() users and
>determine if they could be affected by deferring updating it.
>
>At least page_alloc_init_late()->deferred_init_memmap() happens before
>do_basic_setup()->do_initcalls(), which is good.
>

But deferred_init_memmap() will spawn threads to do the work. I am afraid
do_initcalls() won't wait for the completion of deferred init? Am I missing
something?

>So maybe it's not a big concern and this separate totalram pages accounting
>is much rather some legacy leftover.
>
>> 
>> > Further, we currently require only a single atomic RMW instruction to adjust
>> > totalram during boot, moving it to __free_pages_core() would imply more
>> > atomics: but usually only one per MAX_ORDER page, so I doubt this would make
>> > a big difference.
>> > 
>> 
>> I took a rough calculation on this.One MAX_ORDER page accounts for 2MB, and
>> with defer_init only low zone's memory is initialized during boot. Per my
>> understanding, low zone's memory is 4GB for x86. So the extra calculation is
>> 4GB / 2MB = 2K.
>
>Well, for all deferred-initialized memory you would now also require these --
>or if deferred-init would be disabled. Sounds like an interesting measurement
>if that would be measurable at all.
>
>-- 
>Cheers,
>
>David / dhildenb
David Hildenbrand June 6, 2024, 7:14 a.m. UTC | #6
On 06.06.24 00:44, Wei Yang wrote:
> On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>> On 03.06.24 22:01, Wei Yang wrote:
>>> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>>>> On 02.06.24 02:58, Wei Yang wrote:
>>>>> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>>>>>> On 01.06.24 17:32, David Hildenbrand wrote:
>>>>>>> On 01.06.24 15:34, Wei Yang wrote:
>>>>>>>> Total memory represents pages managed by buddy system.
>>>>>>>
>>>>>>> No, that's managed pages.
>>>>>>>
>>>>>>>> After the
>>>>>>>> introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>>>>>>>> being managed.
>>>>>>>>
>>>>>>>
>>>>>>> I recall one reason that is done, so other subsystem know the total
>>>>>>> memory size even before deferred init is done.
>>>>>>>
>>>>>>>> free_low_memory_core_early() returns number of pages for all free pages,
>>>>>>>> even at this moment only early initialized pages are freed to buddy
>>>>>>>> system. This means the total memory at this moment is not correct.
>>>>>>>>
>>>>>>>> Let's increase it when pages are freed to buddy system.
>>>>>>>
>>>>>>> I'm missing the "why", and the very first sentence of this patch is wrong.
>>>>>>
>>>>>> Correction: your statement was correct :) That's why
>>>>>> adjust_managed_page_count() adjusts that as well.
>>>>>>
>>>>>> __free_pages_core() only adjusts managed page count, because it assumes
>>>>>> totalram has already been adjusted early during boot.
>>>>>>
>>>>>> The reason we have this split for now, I think, is because of subsystems that
>>>>>> call totalram_pages() during init.
>>>>>>
>>>>>> So the "why" question remains, because this change has the potential to break
>>>>>> other stuff.
>>>>>>
>>>>>
>>>>> Thanks, I didn't notice this.
>>>>
>>>> I think having your cleanup would be very nice, as I have patches in the
>>>> works that would benefit from being able to move the totalram update from
>>>> memory hotplug code to __free_pages_core().
>>>>
>>>
>>> I got the same feeling.
>>>
>>>> We'd have to make sure that no code relies on totalram being sane/fixed
>>>> during boot for the initial memory. I think right now we might have such
>>>> code.
>>>>
>>>
>>> One concern is totalram would change when hotplug is enabled. That sounds
>>> those codes should do some re-calculation after totalram changes?
>>
>> We don't have such code in place -- there were discussions regarding that
>> recently.
>>
>> It would be reasonable to take a look at all totalram_pages() users and
>> determine if they could be affected by deferring updating it.
>>
>> At least page_alloc_init_late()->deferred_init_memmap() happens before
>> do_basic_setup()->do_initcalls(), which is good.
>>
> 
> But deferred_init_memmap() will spawn threads to do the work. I am afraid
> do_initcalls() won't wait for the completion of defer_init? Do I miss
> something?

Don't we wait for them to finish?

/* Block until all are initialised */
wait_for_completion(&pgdat_init_all_done_comp);
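
For reference, a self-contained userspace model of that pattern (spawn one
init worker per memory node, then block until they have all signalled
completion); pthreads stand in for the kernel threads and a counted condvar
stands in for the completion, and all names here are illustrative -- this is
a sketch, not the mm code itself.

/* Userspace model of "spawn workers, then block until all are done". */
#include <pthread.h>
#include <stdio.h>

#define NR_NODES 4      /* illustrative number of memory nodes */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static int remaining = NR_NODES;

/* Stand-in for a per-node deferred memmap init worker. */
static void *deferred_init_worker(void *arg)
{
        long nid = (long)arg;

        printf("node %ld: deferred memmap init done\n", nid);

        pthread_mutex_lock(&lock);
        if (--remaining == 0)
                pthread_cond_signal(&all_done); /* like complete() */
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t tid[NR_NODES];

        for (long nid = 0; nid < NR_NODES; nid++)
                pthread_create(&tid[nid], NULL, deferred_init_worker,
                               (void *)nid);

        /* Like wait_for_completion(&pgdat_init_all_done_comp). */
        pthread_mutex_lock(&lock);
        while (remaining)
                pthread_cond_wait(&all_done, &lock);
        pthread_mutex_unlock(&lock);

        for (long nid = 0; nid < NR_NODES; nid++)
                pthread_join(tid[nid], NULL);

        printf("all nodes initialised; initcalls can proceed\n");
        return 0;
}
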
Wei Yang June 6, 2024, 11:25 p.m. UTC | #7
On Thu, Jun 06, 2024 at 09:14:25AM +0200, David Hildenbrand wrote:
>On 06.06.24 00:44, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>> > On 03.06.24 22:01, Wei Yang wrote:
>> > > On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> > > > On 02.06.24 02:58, Wei Yang wrote:
>> > > > > On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > > > > > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > > > > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > > > > > Total memory represents pages managed by buddy system.
>> > > > > > > 
>> > > > > > > No, that's managed pages.
>> > > > > > > 
>> > > > > > > > After the
>> > > > > > > > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > > > > > > > being managed.
>> > > > > > > > 
>> > > > > > > 
>> > > > > > > I recall one reason that is done, so other subsystem know the total
>> > > > > > > memory size even before deferred init is done.
>> > > > > > > 
>> > > > > > > > free_low_memory_core_early() returns number of pages for all free pages,
>> > > > > > > > even at this moment only early initialized pages are freed to buddy
>> > > > > > > > system. This means the total memory at this moment is not correct.
>> > > > > > > > 
>> > > > > > > > Let's increase it when pages are freed to buddy system.
>> > > > > > > 
>> > > > > > > I'm missing the "why", and the very first sentence of this patch is wrong.
>> > > > > > 
>> > > > > > Correction: your statement was correct :) That's why
>> > > > > > adjust_managed_page_count() adjusts that as well.
>> > > > > > 
>> > > > > > __free_pages_core() only adjusts managed page count, because it assumes
>> > > > > > totalram has already been adjusted early during boot.
>> > > > > > 
>> > > > > > The reason we have this split for now, I think, is because of subsystems that
>> > > > > > call totalram_pages() during init.
>> > > > > > 
>> > > > > > So the "why" question remains, because this change has the potential to break
>> > > > > > other stuff.
>> > > > > > 
>> > > > > 
>> > > > > Thanks, I didn't notice this.
>> > > > 
>> > > > I think having your cleanup would be very nice, as I have patches in the
>> > > > works that would benefit from being able to move the totalram update from
>> > > > memory hotplug code to __free_pages_core().
>> > > > 
>> > > 
>> > > I got the same feeling.
>> > > 
>> > > > We'd have to make sure that no code relies on totalram being sane/fixed
>> > > > during boot for the initial memory. I think right now we might have such
>> > > > code.
>> > > > 
>> > > 
>> > > One concern is totalram would change when hotplug is enabled. That sounds
>> > > those codes should do some re-calculation after totalram changes?
>> > 
>> > We don't have such code in place -- there were discussions regarding that
>> > recently.
>> > 
>> > It would be reasonable to take a look at all totalram_pages() users and
>> > determine if they could be affected by deferring updating it.
>> > 
>> > At least page_alloc_init_late()->deferred_init_memmap() happens before
>> > do_basic_setup()->do_initcalls(), which is good.
>> > 
>> 
>> But deferred_init_memmap() will spawn threads to do the work. I am afraid
>> do_initcalls() won't wait for the completion of defer_init? Do I miss
>> something?
>
>Don't we wait for them to finish?
>
>/* Block until all are initialised */
>wait_for_completion(&pgdat_init_all_done_comp);

You are right. I missed this.

So we still need to wait for mm to be fully initialized before initializing
the others. I thought everything would start in parallel.

>
>-- 
>Cheers,
>
>David / dhildenb
Wei Yang June 7, 2024, 1:50 a.m. UTC | #8
On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>On 03.06.24 22:01, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> > On 02.06.24 02:58, Wei Yang wrote:
>> > > On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > > > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > > > Total memory represents pages managed by buddy system.
>> > > > > 
>> > > > > No, that's managed pages.
>> > > > > 
[...]
>> > > > So the "why" question remains, because this change has the potential to break
>> > > > other stuff.
>> > > > 
>> > > 
>> > > Thanks, I didn't notice this.
>> > 
>> > I think having your cleanup would be very nice, as I have patches in the
>> > works that would benefit from being able to move the totalram update from
>> > memory hotplug code to __free_pages_core().
>> > 
>> 
>> I got the same feeling.
>> 
>> > We'd have to make sure that no code relies on totalram being sane/fixed
>> > during boot for the initial memory. I think right now we might have such
>> > code.
>> > 
>> 
>> One concern is totalram would change when hotplug is enabled. That sounds
>> those codes should do some re-calculation after totalram changes?
>
>We don't have such code in place -- there were discussions regarding that
>recently.
>
>It would be reasonable to take a look at all totalram_pages() users and
>determine if they could be affected by deferring updating it.
>
>At least page_alloc_init_late()->deferred_init_memmap() happens before
>do_basic_setup()->do_initcalls(), which is good.
>
>So maybe it's not a big concern and this separate totalram pages accounting
>is much rather some legacy leftover.
>

I grepped the whole tree and found the following 4 call sites that may need
adjustment. Hope I didn't miss any.

   * arch/s390/mm/init.c:73:	while (order > 2 && (totalram_pages() >> 10) < (1UL << order))
   * arch/um/kernel/mem.c:76:	max_low_pfn = totalram_pages();
   * kernel/fork.c:1002:	unsigned long nr_pages = totalram_pages();
   * mm/mm_init.c:2689:		K(physpages - totalram_pages() - totalcma_pages),

* arch/s390/mm/init.c:73:	while (order > 2 && (totalram_pages() >> 10) < (1UL << order))

    mem_init
        memblock_free_all
        setup_zero_pages

This calculates the size of empty_zero_page. Not sure if we can postpone this
function call until after deferred init.

* arch/um/kernel/mem.c:76:	max_low_pfn = totalram_pages();

    mem_init
        memblock_free_all
        max_low_pfn = totalram_pages

The usage seems incorrect: totalram_pages() returns the number of pages, but
here the pfn value seems to be what is needed.

* kernel/fork.c:1002:	unsigned long nr_pages = totalram_pages();

    start_kernel
        fork_init
            set_max_threads

Not sure whether it would be fine to set the rlimit again after deferred init.

* mm/mm_init.c:2689:		K(physpages - totalram_pages() - totalcma_pages),

Per my understanding, we can print the info after defer init.
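
To make the concern with these early readers concrete, here is a small model
(plain C11; the names and the formula are illustrative only -- the real-world
analogue is set_max_threads() in kernel/fork.c from the list above) of a limit
computed once from an early totalram snapshot and never recomputed after
deferred init raises the counter:

/* Model of an init-time consumer sizing a limit from totalram once. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong totalram_pages_ctr;

static unsigned long totalram_snapshot(void)
{
        return atomic_load(&totalram_pages_ctr);
}

/* Like set_max_threads(): derive a limit from totalram and keep it. */
static unsigned long compute_max_threads(void)
{
        return totalram_snapshot() / 8; /* illustrative formula, not the kernel's */
}

int main(void)
{
        unsigned long max_threads;

        atomic_store(&totalram_pages_ctr, 1UL << 20);   /* early-boot view     */
        max_threads = compute_max_threads();            /* snapshot baked in   */

        atomic_store(&totalram_pages_ctr, 8UL << 20);   /* after deferred init */

        /* The limit still reflects the early, smaller value. */
        printf("max_threads=%lu, totalram now=%lu\n",
               max_threads, totalram_snapshot());
        return 0;
}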

>> 
>> > Further, we currently require only a single atomic RMW instruction to adjust
>> > totalram during boot, moving it to __free_pages_core() would imply more
>> > atomics: but usually only one per MAX_ORDER page, so I doubt this would make
>> > a big difference.
>> > 
>> 
>> I took a rough calculation on this.One MAX_ORDER page accounts for 2MB, and
>> with defer_init only low zone's memory is initialized during boot. Per my
>> understanding, low zone's memory is 4GB for x86. So the extra calculation is
>> 4GB / 2MB = 2K.
>
>Well, for all deferred-initialized memory you would now also require these --
>or if deferred-init would be disabled. Sounds like an interesting measurement
>if that would be measurable at all.
>
>-- 
>Cheers,
>
>David / dhildenb
kernel test robot June 11, 2024, 8:48 a.m. UTC | #9
Hello,

kernel test robot noticed "xfstests.generic.224.fail" on:

commit: a4722639595dc3fa15c793a5b00bb4b023a34288 ("[PATCH] mm: increase totalram_pages on freeing to buddy system")
url: https://github.com/intel-lab-lkp/linux/commits/Wei-Yang/mm-increase-totalram_pages-on-freeing-to-buddy-system/20240601-213533
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20240601133402.2675-1-richard.weiyang@gmail.com/
patch subject: [PATCH] mm: increase totalram_pages on freeing to buddy system

in testcase: xfstests
version: xfstests-x86_64-98379713-1_20240603
with following parameters:

	disk: 4HDD
	fs: udf
	test: generic-224



compiler: gcc-13
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202406111644.931d379c-oliver.sang@intel.com

2024-06-09 06:39:08 export TEST_DIR=/fs/sda1
2024-06-09 06:39:08 export TEST_DEV=/dev/sda1
2024-06-09 06:39:08 export FSTYP=udf
2024-06-09 06:39:08 export SCRATCH_MNT=/fs/scratch
2024-06-09 06:39:08 mkdir /fs/scratch -p
2024-06-09 06:39:08 export DISABLE_UDF_TEST=1
2024-06-09 06:39:08 export SCRATCH_DEV=/dev/sda4
2024-06-09 06:39:08 echo generic/224
2024-06-09 06:39:08 ./check generic/224
FSTYP         -- udf
PLATFORM      -- Linux/x86_64 lkp-skl-d02 6.10.0-rc1-00201-ga4722639595d #1 SMP PREEMPT_DYNAMIC Fri Jun  7 19:31:50 CST 2024
MKFS_OPTIONS  -- /dev/sda4
MOUNT_OPTIONS -- /dev/sda4 /fs/scratch

generic/224       [failed, exit status 1]- output mismatch (see /lkp/benchmarks/xfstests/results//generic/224.out.bad)
    --- tests/generic/224.out	2024-06-03 12:10:00.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//generic/224.out.bad	2024-06-09 06:41:46.383320015 +0000
    @@ -1,2 +1,34 @@
     QA output created by 224
    -*** Silence is golden ***
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/generic/224.out /lkp/benchmarks/xfstests/results//generic/224.out.bad'  to see the entire diff)
Ran: generic/224
Failures: generic/224
Failed 1 of 1 tests




The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240611/202406111644.931d379c-oliver.sang@intel.com

Patch

diff --git a/mm/memblock.c b/mm/memblock.c
index 98d25689cf10..51dd143eabb4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1716,10 +1716,8 @@  void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 	cursor = PFN_UP(base);
 	end = PFN_DOWN(base + size);
 
-	for (; cursor < end; cursor++) {
+	for (; cursor < end; cursor++)
 		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
-		totalram_pages_inc();
-	}
 }
 
 /*
@@ -2128,7 +2126,7 @@  static void __init __free_pages_memory(unsigned long start, unsigned long end)
 	}
 }
 
-static unsigned long __init __free_memory_core(phys_addr_t start,
+static void __init __free_memory_core(phys_addr_t start,
 				 phys_addr_t end)
 {
 	unsigned long start_pfn = PFN_UP(start);
@@ -2136,11 +2134,9 @@  static unsigned long __init __free_memory_core(phys_addr_t start,
 				      PFN_DOWN(end), max_low_pfn);
 
 	if (start_pfn >= end_pfn)
-		return 0;
+		return;
 
 	__free_pages_memory(start_pfn, end_pfn);
-
-	return end_pfn - start_pfn;
 }
 
 static void __init memmap_init_reserved_pages(void)
@@ -2182,9 +2178,8 @@  static void __init memmap_init_reserved_pages(void)
 	}
 }
 
-static unsigned long __init free_low_memory_core_early(void)
+static void __init free_low_memory_core_early(void)
 {
-	unsigned long count = 0;
 	phys_addr_t start, end;
 	u64 i;
 
@@ -2199,9 +2194,7 @@  static unsigned long __init free_low_memory_core_early(void)
 	 */
 	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
 				NULL)
-		count += __free_memory_core(start, end);
-
-	return count;
+		__free_memory_core(start, end);
 }
 
 static int reset_managed_pages_done __initdata;
@@ -2232,13 +2225,9 @@  void __init reset_all_zones_managed_pages(void)
  */
 void __init memblock_free_all(void)
 {
-	unsigned long pages;
-
 	free_unused_memmap();
 	reset_all_zones_managed_pages();
-
-	pages = free_low_memory_core_early();
-	totalram_pages_add(pages);
+	free_low_memory_core_early();
 }
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 431b1f6753c0..84bb4b0205d7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -637,7 +637,6 @@  void generic_online_page(struct page *page, unsigned int order)
 	 */
 	debug_pagealloc_map_pages(page, 1 << order);
 	__free_pages_core(page, order);
-	totalram_pages_add(1UL << order);
 }
 EXPORT_SYMBOL_GPL(generic_online_page);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00954d8c9c30..33cd15efac9e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1234,6 +1234,7 @@  void __free_pages_core(struct page *page, unsigned int order)
 	set_page_count(p, 0);
 
 	atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
+	totalram_pages_add(nr_pages);
 
 	if (page_contains_unaccepted(page, order)) {
 		if (order == MAX_PAGE_ORDER && __free_unaccepted(page))