
mm: increase totalram_pages on freeing to buddy system

Message ID 20240601133402.2675-1-richard.weiyang@gmail.com (mailing list archive)
State New
Series mm: increase totalram_pages on freeing to buddy system

Commit Message

Wei Yang June 1, 2024, 1:34 p.m. UTC
Total memory represents the pages managed by the buddy system. After the
introduction of DEFERRED_STRUCT_PAGE_INIT, it may count pages before they
are managed.

free_low_memory_core_early() returns the number of all free pages, even
though at this moment only early-initialized pages have been freed to the
buddy system. This means the total memory count is not correct at this
moment.

Let's increase it when pages are freed to the buddy system.

Signed-off-by: Wei Yang <richard.weiyang@gmail.com>
---
 mm/memblock.c       | 23 ++++++-----------------
 mm/memory_hotplug.c |  1 -
 mm/page_alloc.c     |  1 +
 3 files changed, 7 insertions(+), 18 deletions(-)
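
As a rough, self-contained illustration of the accounting described above
(plain C11 atomics standing in for the kernel's counters; the names and
sizes are illustrative only, this is not kernel code): totalram is bumped
up front for every free page found by the early scan, while the managed
count only grows as blocks are actually handed to the buddy system, so the
two can disagree until deferred init finishes. The patch moves the totalram
update to the same place as the managed-pages update.

/* Illustrative model only -- not kernel code. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_long total_ram;   /* stands in for _totalram_pages     */
static atomic_long managed;     /* stands in for zone->managed_pages */

/* Early boot: every free page is counted into totalram immediately. */
static void early_account(long free_pages)
{
        atomic_fetch_add(&total_ram, free_pages);
}

/* Freeing a block to the "buddy system": only the managed count grows. */
static void free_block_to_buddy(long pages_in_block)
{
        atomic_fetch_add(&managed, pages_in_block);
}

int main(void)
{
        long total_free = 1 << 20;      /* pretend 1M free pages exist   */
        long early_init = 1 << 18;      /* only a quarter is freed early */

        early_account(total_free);
        for (long done = 0; done < early_init; done += 1024)
                free_block_to_buddy(1024);

        /* The window the commit message describes: totalram > managed. */
        printf("totalram=%ld managed=%ld\n",
               atomic_load(&total_ram), atomic_load(&managed));
        return 0;
}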

Comments

Wei Yang June 2, 2024, 12:58 a.m. UTC | #1
On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>On 01.06.24 17:32, David Hildenbrand wrote:
>> On 01.06.24 15:34, Wei Yang wrote:
>> > Total memory represents pages managed by buddy system.
>> 
>> No, that's managed pages.
>> 
>> > After the
>> > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > being managed.
>> > 
>> 
>> I recall one reason that is done, so other subsystem know the total
>> memory size even before deferred init is done.
>> 
>> > free_low_memory_core_early() returns number of pages for all free pages,
>> > even at this moment only early initialized pages are freed to buddy
>> > system. This means the total memory at this moment is not correct.
>> > 
>> > Let's increase it when pages are freed to buddy system.
>> 
>> I'm missing the "why", and the very first sentence of this patch is wrong.
>
>Correction: your statement was correct :) That's why
>adjust_managed_page_count() adjusts that as well.
>
>__free_pages_core() only adjusts managed page count, because it assumes
>totalram has already been adjusted early during boot.
>
>The reason we have this split for now, I think, is because of subsystems that
>call totalram_pages() during init.
>
>So the "why" question remains, because this change has the potential to break
>other stuff.
>

Thanks, I didn't notice this.

>-- 
>Cheers,
>
>David / dhildenb
David Hildenbrand June 3, 2024, 8:55 a.m. UTC | #2
On 02.06.24 02:58, Wei Yang wrote:
> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> On 01.06.24 17:32, David Hildenbrand wrote:
>>> On 01.06.24 15:34, Wei Yang wrote:
>>>> Total memory represents pages managed by buddy system.
>>>
>>> No, that's managed pages.
>>>
>>>> After the
>>>> introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>>>> being managed.
>>>>
>>>
>>> I recall one reason that is done, so other subsystem know the total
>>> memory size even before deferred init is done.
>>>
>>>> free_low_memory_core_early() returns number of pages for all free pages,
>>>> even at this moment only early initialized pages are freed to buddy
>>>> system. This means the total memory at this moment is not correct.
>>>>
>>>> Let's increase it when pages are freed to buddy system.
>>>
>>> I'm missing the "why", and the very first sentence of this patch is wrong.
>>
>> Correction: your statement was correct :) That's why
>> adjust_managed_page_count() adjusts that as well.
>>
>> __free_pages_core() only adjusts managed page count, because it assumes
>> totalram has already been adjusted early during boot.
>>
>> The reason we have this split for now, I think, is because of subsystems that
>> call totalram_pages() during init.
>>
>> So the "why" question remains, because this change has the potential to break
>> other stuff.
>>
> 
> Thanks, I didn't notice this.

I think having your cleanup would be very nice, as I have patches in the 
works that would benefit from being able to move the totalram update 
from memory hotplug code to __free_pages_core().

We'd have to make sure that no code relies on totalram being sane/fixed 
during boot for the initial memory. I think right now we might have such 
code.

Further, we currently require only a single atomic RMW instruction to
adjust totalram during boot; moving it to __free_pages_core() would
imply more atomics, but usually only one per MAX_ORDER page, so I doubt
this would make a big difference.
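
As a minimal sketch of the trade-off mentioned here (plain C11 atomics;
PAGES_PER_BLOCK and the counts are illustrative assumptions, not the
kernel's values): today there is one bulk add at the end of
memblock_free_all(), whereas doing it in __free_pages_core() means one
atomic RMW per freed MAX_ORDER block. Both end up with the same total;
they differ only in the number of RMWs.

/* Sketch of the two update patterns discussed above; not kernel code. */
#include <stdatomic.h>
#include <stdio.h>

#define PAGES_PER_BLOCK (1UL << 10)     /* illustrative block size */

static atomic_ulong totalram;

/* Current scheme: count everything, then a single RMW at the end. */
static void bulk_update(unsigned long nblocks)
{
        atomic_fetch_add(&totalram, nblocks * PAGES_PER_BLOCK);
}

/* Proposed scheme: one RMW per block, from the per-block free path. */
static void per_block_update(unsigned long nblocks)
{
        for (unsigned long i = 0; i < nblocks; i++)
                atomic_fetch_add(&totalram, PAGES_PER_BLOCK);
}

int main(void)
{
        bulk_update(2048);              /* 1 atomic RMW     */
        per_block_update(2048);         /* 2048 atomic RMWs */
        printf("pages counted: %lu\n", atomic_load(&totalram));
        return 0;
}
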
Wei Yang June 3, 2024, 8:01 p.m. UTC | #3
On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>On 02.06.24 02:58, Wei Yang wrote:
>> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > Total memory represents pages managed by buddy system.
>> > > 
>> > > No, that's managed pages.
>> > > 
>> > > > After the
>> > > > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > > > being managed.
>> > > > 
>> > > 
>> > > I recall one reason that is done, so other subsystem know the total
>> > > memory size even before deferred init is done.
>> > > 
>> > > > free_low_memory_core_early() returns number of pages for all free pages,
>> > > > even at this moment only early initialized pages are freed to buddy
>> > > > system. This means the total memory at this moment is not correct.
>> > > > 
>> > > > Let's increase it when pages are freed to buddy system.
>> > > 
>> > > I'm missing the "why", and the very first sentence of this patch is wrong.
>> > 
>> > Correction: your statement was correct :) That's why
>> > adjust_managed_page_count() adjusts that as well.
>> > 
>> > __free_pages_core() only adjusts managed page count, because it assumes
>> > totalram has already been adjusted early during boot.
>> > 
>> > The reason we have this split for now, I think, is because of subsystems that
>> > call totalram_pages() during init.
>> > 
>> > So the "why" question remains, because this change has the potential to break
>> > other stuff.
>> > 
>> 
>> Thanks, I didn't notice this.
>
>I think having your cleanup would be very nice, as I have patches in the
>works that would benefit from being able to move the totalram update from
>memory hotplug code to __free_pages_core().
>

I got the same feeling.

>We'd have to make sure that no code relies on totalram being sane/fixed
>during boot for the initial memory. I think right now we might have such
>code.
>

One concern is that totalram would change when hotplug is enabled. That
sounds like such code should do some re-calculation after totalram changes?

>Further, we currently require only a single atomic RMW instruction to adjust
>totalram during boot, moving it to __free_pages_core() would imply more
>atomics: but usually only one per MAX_ORDER page, so I doubt this would make
>a big difference.
>

I did a rough calculation on this. One MAX_ORDER page accounts for 2MB, and
with deferred init only the low zone's memory is initialized during boot. Per
my understanding, the low zone's memory is 4GB on x86. So the extra cost is
4GB / 2MB = 2K atomic updates.
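
Spelling that estimate out (same assumptions as above: a 2MB MAX_ORDER block
and 4GB of early-initialized low-zone memory; whether those numbers hold on a
given config is a separate question), a trivial check of the arithmetic:

/* Rough count of the extra atomic updates, using the assumptions above. */
#include <stdio.h>

int main(void)
{
        unsigned long long low_zone = 4ULL << 30;  /* 4 GiB, as assumed above   */
        unsigned long long block_sz = 2ULL << 20;  /* 2 MiB per MAX_ORDER block */

        printf("extra atomic adds: %llu\n", low_zone / block_sz);  /* 2048 */
        return 0;
}
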
>-- 
>Cheers,
>
>David / dhildenb
David Hildenbrand June 3, 2024, 8:43 p.m. UTC | #4
On 03.06.24 22:01, Wei Yang wrote:
> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> On 02.06.24 02:58, Wei Yang wrote:
>>> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>>>> On 01.06.24 17:32, David Hildenbrand wrote:
>>>>> On 01.06.24 15:34, Wei Yang wrote:
>>>>>> Total memory represents pages managed by buddy system.
>>>>>
>>>>> No, that's managed pages.
>>>>>
>>>>>> After the
>>>>>> introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>>>>>> being managed.
>>>>>>
>>>>>
>>>>> I recall one reason that is done, so other subsystem know the total
>>>>> memory size even before deferred init is done.
>>>>>
>>>>>> free_low_memory_core_early() returns number of pages for all free pages,
>>>>>> even at this moment only early initialized pages are freed to buddy
>>>>>> system. This means the total memory at this moment is not correct.
>>>>>>
>>>>>> Let's increase it when pages are freed to buddy system.
>>>>>
>>>>> I'm missing the "why", and the very first sentence of this patch is wrong.
>>>>
>>>> Correction: your statement was correct :) That's why
>>>> adjust_managed_page_count() adjusts that as well.
>>>>
>>>> __free_pages_core() only adjusts managed page count, because it assumes
>>>> totalram has already been adjusted early during boot.
>>>>
>>>> The reason we have this split for now, I think, is because of subsystems that
>>>> call totalram_pages() during init.
>>>>
>>>> So the "why" question remains, because this change has the potential to break
>>>> other stuff.
>>>>
>>>
>>> Thanks, I didn't notice this.
>>
>> I think having your cleanup would be very nice, as I have patches in the
>> works that would benefit from being able to move the totalram update from
>> memory hotplug code to __free_pages_core().
>>
> 
> I got the same feeling.
> 
>> We'd have to make sure that no code relies on totalram being sane/fixed
>> during boot for the initial memory. I think right now we might have such
>> code.
>>
> 
> One concern is totalram would change when hotplug is enabled. That sounds
> those codes should do some re-calculation after totalram changes?

We don't have such code in place -- there were discussions regarding 
that recently.

It would be reasonable to take a look at all totalram_pages() users and 
determine if they could be affected by deferring updating it.

At least page_alloc_init_late()->deferred_init_memmap() happens before 
do_basic_setup()->do_initcalls(), which is good.

So maybe it's not a big concern and this separate totalram pages
accounting is rather just a legacy leftover.

> 
>> Further, we currently require only a single atomic RMW instruction to adjust
>> totalram during boot, moving it to __free_pages_core() would imply more
>> atomics: but usually only one per MAX_ORDER page, so I doubt this would make
>> a big difference.
>>
> 
> I took a rough calculation on this.One MAX_ORDER page accounts for 2MB, and
> with defer_init only low zone's memory is initialized during boot. Per my
> understanding, low zone's memory is 4GB for x86. So the extra calculation is
> 4GB / 2MB = 2K.

Well, for all deferred-initialized memory you would now also require
these -- or for all memory, if deferred init is disabled. Sounds like an
interesting measurement, if it is measurable at all.
Wei Yang June 5, 2024, 10:44 p.m. UTC | #5
On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>On 03.06.24 22:01, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> > On 02.06.24 02:58, Wei Yang wrote:
>> > > On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > > > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > > > Total memory represents pages managed by buddy system.
>> > > > > 
>> > > > > No, that's managed pages.
>> > > > > 
>> > > > > > After the
>> > > > > > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > > > > > being managed.
>> > > > > > 
>> > > > > 
>> > > > > I recall one reason that is done, so other subsystem know the total
>> > > > > memory size even before deferred init is done.
>> > > > > 
>> > > > > > free_low_memory_core_early() returns number of pages for all free pages,
>> > > > > > even at this moment only early initialized pages are freed to buddy
>> > > > > > system. This means the total memory at this moment is not correct.
>> > > > > > 
>> > > > > > Let's increase it when pages are freed to buddy system.
>> > > > > 
>> > > > > I'm missing the "why", and the very first sentence of this patch is wrong.
>> > > > 
>> > > > Correction: your statement was correct :) That's why
>> > > > adjust_managed_page_count() adjusts that as well.
>> > > > 
>> > > > __free_pages_core() only adjusts managed page count, because it assumes
>> > > > totalram has already been adjusted early during boot.
>> > > > 
>> > > > The reason we have this split for now, I think, is because of subsystems that
>> > > > call totalram_pages() during init.
>> > > > 
>> > > > So the "why" question remains, because this change has the potential to break
>> > > > other stuff.
>> > > > 
>> > > 
>> > > Thanks, I didn't notice this.
>> > 
>> > I think having your cleanup would be very nice, as I have patches in the
>> > works that would benefit from being able to move the totalram update from
>> > memory hotplug code to __free_pages_core().
>> > 
>> 
>> I got the same feeling.
>> 
>> > We'd have to make sure that no code relies on totalram being sane/fixed
>> > during boot for the initial memory. I think right now we might have such
>> > code.
>> > 
>> 
>> One concern is totalram would change when hotplug is enabled. That sounds
>> those codes should do some re-calculation after totalram changes?
>
>We don't have such code in place -- there were discussions regarding that
>recently.
>
>It would be reasonable to take a look at all totalram_pages() users and
>determine if they could be affected by deferring updating it.
>
>At least page_alloc_init_late()->deferred_init_memmap() happens before
>do_basic_setup()->do_initcalls(), which is good.
>

But deferred_init_memmap() will spawn threads to do the work. I am afraid
do_initcalls() won't wait for the completion of deferred init? Am I missing
something?

>So maybe it's not a big concern and this separate totalram pages accounting
>is much rather some legacy leftover.
>
>> 
>> > Further, we currently require only a single atomic RMW instruction to adjust
>> > totalram during boot, moving it to __free_pages_core() would imply more
>> > atomics: but usually only one per MAX_ORDER page, so I doubt this would make
>> > a big difference.
>> > 
>> 
>> I took a rough calculation on this.One MAX_ORDER page accounts for 2MB, and
>> with defer_init only low zone's memory is initialized during boot. Per my
>> understanding, low zone's memory is 4GB for x86. So the extra calculation is
>> 4GB / 2MB = 2K.
>
>Well, for all deferred-initialized memory you would now also require these --
>or if deferred-init would be disabled. Sounds like an interesting measurement
>if that would be measurable at all.
>
>-- 
>Cheers,
>
>David / dhildenb
David Hildenbrand June 6, 2024, 7:14 a.m. UTC | #6
On 06.06.24 00:44, Wei Yang wrote:
> On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>> On 03.06.24 22:01, Wei Yang wrote:
>>> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>>>> On 02.06.24 02:58, Wei Yang wrote:
>>>>> On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>>>>>> On 01.06.24 17:32, David Hildenbrand wrote:
>>>>>>> On 01.06.24 15:34, Wei Yang wrote:
>>>>>>>> Total memory represents pages managed by buddy system.
>>>>>>>
>>>>>>> No, that's managed pages.
>>>>>>>
>>>>>>>> After the
>>>>>>>> introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>>>>>>>> being managed.
>>>>>>>>
>>>>>>>
>>>>>>> I recall one reason that is done, so other subsystem know the total
>>>>>>> memory size even before deferred init is done.
>>>>>>>
>>>>>>>> free_low_memory_core_early() returns number of pages for all free pages,
>>>>>>>> even at this moment only early initialized pages are freed to buddy
>>>>>>>> system. This means the total memory at this moment is not correct.
>>>>>>>>
>>>>>>>> Let's increase it when pages are freed to buddy system.
>>>>>>>
>>>>>>> I'm missing the "why", and the very first sentence of this patch is wrong.
>>>>>>
>>>>>> Correction: your statement was correct :) That's why
>>>>>> adjust_managed_page_count() adjusts that as well.
>>>>>>
>>>>>> __free_pages_core() only adjusts managed page count, because it assumes
>>>>>> totalram has already been adjusted early during boot.
>>>>>>
>>>>>> The reason we have this split for now, I think, is because of subsystems that
>>>>>> call totalram_pages() during init.
>>>>>>
>>>>>> So the "why" question remains, because this change has the potential to break
>>>>>> other stuff.
>>>>>>
>>>>>
>>>>> Thanks, I didn't notice this.
>>>>
>>>> I think having your cleanup would be very nice, as I have patches in the
>>>> works that would benefit from being able to move the totalram update from
>>>> memory hotplug code to __free_pages_core().
>>>>
>>>
>>> I got the same feeling.
>>>
>>>> We'd have to make sure that no code relies on totalram being sane/fixed
>>>> during boot for the initial memory. I think right now we might have such
>>>> code.
>>>>
>>>
>>> One concern is totalram would change when hotplug is enabled. That sounds
>>> those codes should do some re-calculation after totalram changes?
>>
>> We don't have such code in place -- there were discussions regarding that
>> recently.
>>
>> It would be reasonable to take a look at all totalram_pages() users and
>> determine if they could be affected by deferring updating it.
>>
>> At least page_alloc_init_late()->deferred_init_memmap() happens before
>> do_basic_setup()->do_initcalls(), which is good.
>>
> 
> But deferred_init_memmap() will spawn threads to do the work. I am afraid
> do_initcalls() won't wait for the completion of defer_init? Do I miss
> something?

Don't we wait for them to finish?

/* Block until all are initialised */
wait_for_completion(&pgdat_init_all_done_comp);
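
For reference, a self-contained userspace model of that pattern (spawn one
init worker per memory node, then block until they have all signalled
completion); pthreads stand in for the kernel threads and a counted condvar
stands in for the completion, and all names here are illustrative -- this is
a sketch, not the mm code itself.

/* Userspace model of "spawn workers, then block until all are done". */
#include <pthread.h>
#include <stdio.h>

#define NR_NODES 4      /* illustrative number of memory nodes */

static pthread_mutex_t lock = PTHREAD_MUTEX_INITIALIZER;
static pthread_cond_t all_done = PTHREAD_COND_INITIALIZER;
static int remaining = NR_NODES;

/* Stand-in for a per-node deferred memmap init worker. */
static void *deferred_init_worker(void *arg)
{
        long nid = (long)arg;

        printf("node %ld: deferred memmap init done\n", nid);

        pthread_mutex_lock(&lock);
        if (--remaining == 0)
                pthread_cond_signal(&all_done); /* like complete() */
        pthread_mutex_unlock(&lock);
        return NULL;
}

int main(void)
{
        pthread_t tid[NR_NODES];

        for (long nid = 0; nid < NR_NODES; nid++)
                pthread_create(&tid[nid], NULL, deferred_init_worker,
                               (void *)nid);

        /* Like wait_for_completion(&pgdat_init_all_done_comp). */
        pthread_mutex_lock(&lock);
        while (remaining)
                pthread_cond_wait(&all_done, &lock);
        pthread_mutex_unlock(&lock);

        for (long nid = 0; nid < NR_NODES; nid++)
                pthread_join(tid[nid], NULL);

        printf("all nodes initialised; initcalls can proceed\n");
        return 0;
}
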
Wei Yang June 6, 2024, 11:25 p.m. UTC | #7
On Thu, Jun 06, 2024 at 09:14:25AM +0200, David Hildenbrand wrote:
>On 06.06.24 00:44, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>> > On 03.06.24 22:01, Wei Yang wrote:
>> > > On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> > > > On 02.06.24 02:58, Wei Yang wrote:
>> > > > > On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > > > > > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > > > > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > > > > > Total memory represents pages managed by buddy system.
>> > > > > > > 
>> > > > > > > No, that's managed pages.
>> > > > > > > 
>> > > > > > > > After the
>> > > > > > > > introduction of DEFERRED_STRUCT_PAGE_INIT, it may count the pages before
>> > > > > > > > being managed.
>> > > > > > > > 
>> > > > > > > 
>> > > > > > > I recall one reason that is done, so other subsystem know the total
>> > > > > > > memory size even before deferred init is done.
>> > > > > > > 
>> > > > > > > > free_low_memory_core_early() returns number of pages for all free pages,
>> > > > > > > > even at this moment only early initialized pages are freed to buddy
>> > > > > > > > system. This means the total memory at this moment is not correct.
>> > > > > > > > 
>> > > > > > > > Let's increase it when pages are freed to buddy system.
>> > > > > > > 
>> > > > > > > I'm missing the "why", and the very first sentence of this patch is wrong.
>> > > > > > 
>> > > > > > Correction: your statement was correct :) That's why
>> > > > > > adjust_managed_page_count() adjusts that as well.
>> > > > > > 
>> > > > > > __free_pages_core() only adjusts managed page count, because it assumes
>> > > > > > totalram has already been adjusted early during boot.
>> > > > > > 
>> > > > > > The reason we have this split for now, I think, is because of subsystems that
>> > > > > > call totalram_pages() during init.
>> > > > > > 
>> > > > > > So the "why" question remains, because this change has the potential to break
>> > > > > > other stuff.
>> > > > > > 
>> > > > > 
>> > > > > Thanks, I didn't notice this.
>> > > > 
>> > > > I think having your cleanup would be very nice, as I have patches in the
>> > > > works that would benefit from being able to move the totalram update from
>> > > > memory hotplug code to __free_pages_core().
>> > > > 
>> > > 
>> > > I got the same feeling.
>> > > 
>> > > > We'd have to make sure that no code relies on totalram being sane/fixed
>> > > > during boot for the initial memory. I think right now we might have such
>> > > > code.
>> > > > 
>> > > 
>> > > One concern is totalram would change when hotplug is enabled. That sounds
>> > > those codes should do some re-calculation after totalram changes?
>> > 
>> > We don't have such code in place -- there were discussions regarding that
>> > recently.
>> > 
>> > It would be reasonable to take a look at all totalram_pages() users and
>> > determine if they could be affected by deferring updating it.
>> > 
>> > At least page_alloc_init_late()->deferred_init_memmap() happens before
>> > do_basic_setup()->do_initcalls(), which is good.
>> > 
>> 
>> But deferred_init_memmap() will spawn threads to do the work. I am afraid
>> do_initcalls() won't wait for the completion of defer_init? Do I miss
>> something?
>
>Don't we wait for them to finish?
>
>/* Block until all are initialised */
>wait_for_completion(&pgdat_init_all_done_comp);

You are right. I missed this.

So we still need to wait for mm to be fully initialized before initializing
the others. I thought everything would start in parallel.

>
>-- 
>Cheers,
>
>David / dhildenb
Wei Yang June 7, 2024, 1:50 a.m. UTC | #8
On Mon, Jun 03, 2024 at 10:43:51PM +0200, David Hildenbrand wrote:
>On 03.06.24 22:01, Wei Yang wrote:
>> On Mon, Jun 03, 2024 at 10:55:10AM +0200, David Hildenbrand wrote:
>> > On 02.06.24 02:58, Wei Yang wrote:
>> > > On Sat, Jun 01, 2024 at 06:15:33PM +0200, David Hildenbrand wrote:
>> > > > On 01.06.24 17:32, David Hildenbrand wrote:
>> > > > > On 01.06.24 15:34, Wei Yang wrote:
>> > > > > > Total memory represents pages managed by buddy system.
>> > > > > 
>> > > > > No, that's managed pages.
>> > > > > 
[...]
>> > > > So the "why" question remains, because this change has the potential to break
>> > > > other stuff.
>> > > > 
>> > > 
>> > > Thanks, I didn't notice this.
>> > 
>> > I think having your cleanup would be very nice, as I have patches in the
>> > works that would benefit from being able to move the totalram update from
>> > memory hotplug code to __free_pages_core().
>> > 
>> 
>> I got the same feeling.
>> 
>> > We'd have to make sure that no code relies on totalram being sane/fixed
>> > during boot for the initial memory. I think right now we might have such
>> > code.
>> > 
>> 
>> One concern is totalram would change when hotplug is enabled. That sounds
>> those codes should do some re-calculation after totalram changes?
>
>We don't have such code in place -- there were discussions regarding that
>recently.
>
>It would be reasonable to take a look at all totalram_pages() users and
>determine if they could be affected by deferring updating it.
>
>At least page_alloc_init_late()->deferred_init_memmap() happens before
>do_basic_setup()->do_initcalls(), which is good.
>
>So maybe it's not a big concern and this separate totalram pages accounting
>is much rather some legacy leftover.
>

I grepped the whole tree and found the following 4 call sites that may need
adjustment. Hope I didn't miss any.

   * arch/s390/mm/init.c:73:	while (order > 2 && (totalram_pages() >> 10) < (1UL << order))
   * arch/um/kernel/mem.c:76:	max_low_pfn = totalram_pages();
   * kernel/fork.c:1002:	unsigned long nr_pages = totalram_pages();
   * mm/mm_init.c:2689:		K(physpages - totalram_pages() - totalcma_pages),

* arch/s390/mm/init.c:73:	while (order > 2 && (totalram_pages() >> 10) < (1UL << order))

    mem_init
        memblock_free_all
        setup_zero_pages

This calculates the size of empty_zero_page. Not sure if we can postpone this
function call until after deferred init.

* arch/um/kernel/mem.c:76:	max_low_pfn = totalram_pages();

    mem_init
        memblock_free_all
        max_low_pfn = totalram_pages

The usage seems incorrect: totalram_pages() returns the number of pages, but
here the pfn value seems to be what is needed.

* kernel/fork.c:1002:	unsigned long nr_pages = totalram_pages();

    start_kernel
        fork_init
            set_max_threads

Not sure whether it would be fine to set the rlimit again after deferred init.

* mm/mm_init.c:2689:		K(physpages - totalram_pages() - totalcma_pages),

Per my understanding, we can print the info after defer init.
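
To make the concern with these early readers concrete, here is a small model
(plain C11; the names and the formula are illustrative only -- the real-world
analogue is set_max_threads() in kernel/fork.c from the list above) of a limit
computed once from an early totalram snapshot and never recomputed after
deferred init raises the counter:

/* Model of an init-time consumer sizing a limit from totalram once. */
#include <stdatomic.h>
#include <stdio.h>

static atomic_ulong totalram_pages_ctr;

static unsigned long totalram_snapshot(void)
{
        return atomic_load(&totalram_pages_ctr);
}

/* Like set_max_threads(): derive a limit from totalram and keep it. */
static unsigned long compute_max_threads(void)
{
        return totalram_snapshot() / 8; /* illustrative formula, not the kernel's */
}

int main(void)
{
        unsigned long max_threads;

        atomic_store(&totalram_pages_ctr, 1UL << 20);   /* early-boot view     */
        max_threads = compute_max_threads();            /* snapshot baked in   */

        atomic_store(&totalram_pages_ctr, 8UL << 20);   /* after deferred init */

        /* The limit still reflects the early, smaller value. */
        printf("max_threads=%lu, totalram now=%lu\n",
               max_threads, totalram_snapshot());
        return 0;
}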

>> 
>> > Further, we currently require only a single atomic RMW instruction to adjust
>> > totalram during boot, moving it to __free_pages_core() would imply more
>> > atomics: but usually only one per MAX_ORDER page, so I doubt this would make
>> > a big difference.
>> > 
>> 
>> I took a rough calculation on this.One MAX_ORDER page accounts for 2MB, and
>> with defer_init only low zone's memory is initialized during boot. Per my
>> understanding, low zone's memory is 4GB for x86. So the extra calculation is
>> 4GB / 2MB = 2K.
>
>Well, for all deferred-initialized memory you would now also require these --
>or if deferred-init would be disabled. Sounds like an interesting measurement
>if that would be measurable at all.
>
>-- 
>Cheers,
>
>David / dhildenb
kernel test robot June 11, 2024, 8:48 a.m. UTC | #9
Hello,

kernel test robot noticed "xfstests.generic.224.fail" on:

commit: a4722639595dc3fa15c793a5b00bb4b023a34288 ("[PATCH] mm: increase totalram_pages on freeing to buddy system")
url: https://github.com/intel-lab-lkp/linux/commits/Wei-Yang/mm-increase-totalram_pages-on-freeing-to-buddy-system/20240601-213533
base: https://git.kernel.org/cgit/linux/kernel/git/akpm/mm.git mm-everything
patch link: https://lore.kernel.org/all/20240601133402.2675-1-richard.weiyang@gmail.com/
patch subject: [PATCH] mm: increase totalram_pages on freeing to buddy system

in testcase: xfstests
version: xfstests-x86_64-98379713-1_20240603
with following parameters:

	disk: 4HDD
	fs: udf
	test: generic-224



compiler: gcc-13
test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory

(please refer to attached dmesg/kmsg for entire log/backtrace)




If you fix the issue in a separate patch/commit (i.e. not just a new version of
the same patch/commit), kindly add following tags
| Reported-by: kernel test robot <oliver.sang@intel.com>
| Closes: https://lore.kernel.org/oe-lkp/202406111644.931d379c-oliver.sang@intel.com

2024-06-09 06:39:08 export TEST_DIR=/fs/sda1
2024-06-09 06:39:08 export TEST_DEV=/dev/sda1
2024-06-09 06:39:08 export FSTYP=udf
2024-06-09 06:39:08 export SCRATCH_MNT=/fs/scratch
2024-06-09 06:39:08 mkdir /fs/scratch -p
2024-06-09 06:39:08 export DISABLE_UDF_TEST=1
2024-06-09 06:39:08 export SCRATCH_DEV=/dev/sda4
2024-06-09 06:39:08 echo generic/224
2024-06-09 06:39:08 ./check generic/224
FSTYP         -- udf
PLATFORM      -- Linux/x86_64 lkp-skl-d02 6.10.0-rc1-00201-ga4722639595d #1 SMP PREEMPT_DYNAMIC Fri Jun  7 19:31:50 CST 2024
MKFS_OPTIONS  -- /dev/sda4
MOUNT_OPTIONS -- /dev/sda4 /fs/scratch

generic/224       [failed, exit status 1]- output mismatch (see /lkp/benchmarks/xfstests/results//generic/224.out.bad)
    --- tests/generic/224.out	2024-06-03 12:10:00.000000000 +0000
    +++ /lkp/benchmarks/xfstests/results//generic/224.out.bad	2024-06-09 06:41:46.383320015 +0000
    @@ -1,2 +1,34 @@
     QA output created by 224
    -*** Silence is golden ***
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    +/lkp/benchmarks/xfstests/tests/generic/224: fork: retry: Resource temporarily unavailable
    ...
    (Run 'diff -u /lkp/benchmarks/xfstests/tests/generic/224.out /lkp/benchmarks/xfstests/results//generic/224.out.bad'  to see the entire diff)
Ran: generic/224
Failures: generic/224
Failed 1 of 1 tests




The kernel config and materials to reproduce are available at:
https://download.01.org/0day-ci/archive/20240611/202406111644.931d379c-oliver.sang@intel.com

Patch

diff --git a/mm/memblock.c b/mm/memblock.c
index 98d25689cf10..51dd143eabb4 100644
--- a/mm/memblock.c
+++ b/mm/memblock.c
@@ -1716,10 +1716,8 @@  void __init memblock_free_late(phys_addr_t base, phys_addr_t size)
 	cursor = PFN_UP(base);
 	end = PFN_DOWN(base + size);
 
-	for (; cursor < end; cursor++) {
+	for (; cursor < end; cursor++)
 		memblock_free_pages(pfn_to_page(cursor), cursor, 0);
-		totalram_pages_inc();
-	}
 }
 
 /*
@@ -2128,7 +2126,7 @@  static void __init __free_pages_memory(unsigned long start, unsigned long end)
 	}
 }
 
-static unsigned long __init __free_memory_core(phys_addr_t start,
+static void __init __free_memory_core(phys_addr_t start,
 				 phys_addr_t end)
 {
 	unsigned long start_pfn = PFN_UP(start);
@@ -2136,11 +2134,9 @@  static unsigned long __init __free_memory_core(phys_addr_t start,
 				      PFN_DOWN(end), max_low_pfn);
 
 	if (start_pfn >= end_pfn)
-		return 0;
+		return;
 
 	__free_pages_memory(start_pfn, end_pfn);
-
-	return end_pfn - start_pfn;
 }
 
 static void __init memmap_init_reserved_pages(void)
@@ -2182,9 +2178,8 @@  static void __init memmap_init_reserved_pages(void)
 	}
 }
 
-static unsigned long __init free_low_memory_core_early(void)
+static void __init free_low_memory_core_early(void)
 {
-	unsigned long count = 0;
 	phys_addr_t start, end;
 	u64 i;
 
@@ -2199,9 +2194,7 @@  static unsigned long __init free_low_memory_core_early(void)
 	 */
 	for_each_free_mem_range(i, NUMA_NO_NODE, MEMBLOCK_NONE, &start, &end,
 				NULL)
-		count += __free_memory_core(start, end);
-
-	return count;
+		__free_memory_core(start, end);
 }
 
 static int reset_managed_pages_done __initdata;
@@ -2232,13 +2225,9 @@  void __init reset_all_zones_managed_pages(void)
  */
 void __init memblock_free_all(void)
 {
-	unsigned long pages;
-
 	free_unused_memmap();
 	reset_all_zones_managed_pages();
-
-	pages = free_low_memory_core_early();
-	totalram_pages_add(pages);
+	free_low_memory_core_early();
 }
 
 #if defined(CONFIG_DEBUG_FS) && defined(CONFIG_ARCH_KEEP_MEMBLOCK)
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index 431b1f6753c0..84bb4b0205d7 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -637,7 +637,6 @@  void generic_online_page(struct page *page, unsigned int order)
 	 */
 	debug_pagealloc_map_pages(page, 1 << order);
 	__free_pages_core(page, order);
-	totalram_pages_add(1UL << order);
 }
 EXPORT_SYMBOL_GPL(generic_online_page);
 
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 00954d8c9c30..33cd15efac9e 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -1234,6 +1234,7 @@  void __free_pages_core(struct page *page, unsigned int order)
 	set_page_count(p, 0);
 
 	atomic_long_add(nr_pages, &page_zone(page)->managed_pages);
+	totalram_pages_add(nr_pages);
 
 	if (page_contains_unaccepted(page, order)) {
 		if (order == MAX_PAGE_ORDER && __free_unaccepted(page))