
[v2] mm/page_alloc: Add lockdep assertion for pageblock type change

Message ID 20250303-pageblock-lockdep-v2-1-3fc0c37e9532@google.com (mailing list archive)
State New
Series [v2] mm/page_alloc: Add lockdep assertion for pageblock type change

Commit Message

Brendan Jackman March 3, 2025, 12:13 p.m. UTC
Since the migratetype hygiene patches [0], the locking here is
a bit more formalised.

For other stuff, it's pretty obvious that it would be protected by the
zone lock. But it didn't seem totally self-evident that it should
protect the pageblock type. So it seems particularly helpful to have it
written in the code.

[0] https://lore.kernel.org/lkml/20240320180429.678181-3-hannes@cmpxchg.org/T/

Signed-off-by: Brendan Jackman <jackmanb@google.com>
---
Note: Vlastimil acked the first version; I did not carry his ack since
I've changed the diff. Not sure if I'm being overly pedantic there?

Changes in v2:
- Fixed missing in_mem_hotplug() setup for !CONFIG_MEMORY_HOTPLUG
- Expanded commit message to include a bit more rationale
- Link to v1: https://lore.kernel.org/r/20250227-pageblock-lockdep-v1-1-3701efb331bb@google.com
---
 include/linux/memory_hotplug.h | 5 +++++
 mm/memory_hotplug.c            | 5 +++++
 mm/page_alloc.c                | 4 ++++
 3 files changed, 14 insertions(+)


---
base-commit: 83c34e70d8dd67525c2547e8c5ee1a4bf37ac06b
change-id: 20250227-pageblock-lockdep-9628c48d7e08

Best regards,

Comments

David Hildenbrand March 3, 2025, 1:11 p.m. UTC | #1
On 03.03.25 13:13, Brendan Jackman wrote:
> Since the migratetype hygiene patches [0], the locking here is
> a bit more formalised.
> 
> For other stuff, it's pretty obvious that it would be protected by the
> zone lock. But it didn't seem totally self-evident that it should
> protect the pageblock type. So it seems particularly helpful to have it
> written in the code.

[...]

> +
>   u64 max_mem_size = U64_MAX;
>   
>   /* add this memory to iomem resource */
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index 579789600a3c7bfb7b0d847d51af702a9d4b139a..1ed21179676d05c66f77f9dbebf88e36bbe402e9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
>   
>   void set_pageblock_migratetype(struct page *page, int migratetype)
>   {
> +	lockdep_assert_once(system_state == SYSTEM_BOOTING ||
> +		in_mem_hotplug() ||
> +		lockdep_is_held(&page_zone(page)->lock));
> +

I assume the call chain on the memory hotplug path is mostly

move_pfn_range_to_zone()->memmap_init_range()->set_pageblock_migratetype()

either when onlining a memory block, or from pagemap_range() while 
holding the hotplug lock.

But there is also the 
memmap_init_zone_device()->memmap_init_compound()->__init_zone_device_page()->set_pageblock_migratetype() 
one, called from pagemap_range() *without* holding the hotplug lock, and
your assertion would be missing that.

I'm not too happy about that assertion in general.
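[Editorial note: the condition being asserted can be modeled in plain userspace C. This is a hypothetical sketch, not kernel code; system_state, the hotplug lock, and lockdep are all mocked as simple values. It shows why the memmap_init_zone_device() path David describes trips the check: it runs after boot, outside mem_hotplug_begin()/mem_hotplug_done(), and without the zone lock.]

```c
#include <assert.h>
#include <stdbool.h>

/* Mock of the kernel's system_state; only the value the assertion
 * compares against matters for this model. */
enum system_state { SYSTEM_BOOTING, SYSTEM_RUNNING };

/* Userspace model of the condition inside the patch's
 * lockdep_assert_once() in set_pageblock_migratetype().
 * All three inputs are mocked booleans, not real kernel state. */
static bool pageblock_change_asserted_ok(enum system_state state,
					 bool in_mem_hotplug,
					 bool zone_lock_held)
{
	return state == SYSTEM_BOOTING || in_mem_hotplug || zone_lock_held;
}
```

On the move_pfn_range_to_zone() path the hotplug write lock is held, so the condition holds; a post-boot memremap_pages()->memmap_init_zone_device() call holds neither lock, so the condition is false, matching the WARN at mm/page_alloc.c:420 reported later in the thread.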
David Hildenbrand March 3, 2025, 1:48 p.m. UTC | #2
On 03.03.25 14:11, David Hildenbrand wrote:
> On 03.03.25 13:13, Brendan Jackman wrote:
>> Since the migratetype hygiene patches [0], the locking here is
>> a bit more formalised.
>>
>> For other stuff, it's pretty obvious that it would be protected by the
>> zone lock. But it didn't seem totally self-evident that it should
>> protect the pageblock type. So it seems particularly helpful to have it
>> written in the code.
> 
> [...]
> 
>> +
>>    u64 max_mem_size = U64_MAX;
>>    
>>    /* add this memory to iomem resource */
>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>> index 579789600a3c7bfb7b0d847d51af702a9d4b139a..1ed21179676d05c66f77f9dbebf88e36bbe402e9 100644
>> --- a/mm/page_alloc.c
>> +++ b/mm/page_alloc.c
>> @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
>>    
>>    void set_pageblock_migratetype(struct page *page, int migratetype)
>>    {
>> +	lockdep_assert_once(system_state == SYSTEM_BOOTING ||
>> +		in_mem_hotplug() ||
>> +		lockdep_is_held(&page_zone(page)->lock));
>> +
> 
> I assume the call chain on the memory hotplug path is mostly
> 
> move_pfn_range_to_zone()->memmap_init_range()->set_pageblock_migratetype()
> 
> either when onlining a memory block, or from pagemap_range() while
> holding the hotplug lock.
> 
> But there is also the
> memmap_init_zone_device()->memmap_init_compound()->__init_zone_device_page()->set_pageblock_migratetype()
> one, called from pagemap_range() *without* holding the hotplug lock, and
> your assertion would be missing that.

Heh, and I even ran into that right now by accident during boot:

[    9.790696][    T1] WARNING: CPU: 3 PID: 1 at mm/page_alloc.c:420 set_pageblock_migratetype+0xb3/0xf0
[    9.792672][    T1] Modules linked in:
[    9.793496][    T1] CPU: 3 UID: 0 PID: 1 Comm: swapper/0 Not tainted 6.14.0-rc3-00349-geaddff2b220c #164
[    9.795511][    T1] Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.3-2.fc40 04/01/2014
[    9.797471][    T1] RIP: 0010:set_pageblock_migratetype+0xb3/0xf0
[    9.798794][    T1] Code: 2c c5 c0 0b 6d 91 73 43 4d 69 e4 40 07 00 00 be ff ff ff ff 4b 8d bc 25 18 06 00 00 e8 46 09 1b 04 85 c0 0f 85 71 ff ff ff 90 <0f> 0b 90 e9 68 ff ff ff 31 db e9 74 ff ff ff 48 c7 c6 58 4a f7 86
[    9.802906][    T1] RSP: 0018:ffffc9000013bcc0 EFLAGS: 00010246
[    9.804198][    T1] RAX: 0000000000000000 RBX: 0000000000000001 RCX: 0000000000000001
[    9.805860][    T1] RDX: 0000000000000046 RSI: ffffffff8725b542 RDI: ffffffff872bd087
[    9.807528][    T1] RBP: ffffeaffffc00000 R08: 0000000000000005 R09: 0000000000000000
[    9.809186][    T1] R10: 0000000000000000 R11: 0000000000000002 R12: 0000000000001d00
[    9.810871][    T1] R13: ffff88847fffa540 R14: 0000000000000000 R15: 0000000000000000
[    9.812526][    T1] FS:  0000000000000000(0000) GS:ffff88846fcc0000(0000) knlGS:0000000000000000
[    9.814388][    T1] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
[    9.815751][    T1] CR2: 0000000000000000 CR3: 00000000076d8000 CR4: 0000000000750ef0
[    9.817410][    T1] PKRU: 55555554
[    9.818143][    T1] Call Trace:
[    9.818821][    T1]  <TASK>
[    9.819428][    T1]  ? set_pageblock_migratetype+0xb3/0xf0
[    9.820639][    T1]  ? __warn.cold+0x110/0x210
[    9.821618][    T1]  ? set_pageblock_migratetype+0xb3/0xf0
[    9.822809][    T1]  ? report_bug+0x1b9/0x320
[    9.823762][    T1]  ? handle_bug+0x54/0x90
[    9.824675][    T1]  ? exc_invalid_op+0x17/0x50
[    9.825659][    T1]  ? asm_exc_invalid_op+0x1a/0x20
[    9.826727][    T1]  ? set_pageblock_migratetype+0xb3/0xf0
[    9.827914][    T1]  __init_zone_device_page.constprop.0+0x20c/0x240
[    9.829293][    T1]  memmap_init_zone_device+0x191/0x330
[    9.830478][    T1]  memremap_pages+0x4b7/0xc80
[    9.831485][    T1]  dmirror_allocate_chunk+0x12b/0x400
[    9.832628][    T1]  hmm_dmirror_init+0x18f/0x260
[    9.833657][    T1]  ? __pfx_hmm_dmirror_init+0x10/0x10
[    9.834798][    T1]  do_one_initcall+0xa5/0x490
[    9.835789][    T1]  kernel_init_freeable+0x3b4/0x410
[    9.836897][    T1]  ? __pfx_kernel_init+0x10/0x10
[    9.837938][    T1]  kernel_init+0x1b/0x1d0
[    9.838856][    T1]  ret_from_fork+0x48/0x60
[    9.839796][    T1]  ? __pfx_kernel_init+0x10/0x10
[    9.840864][    T1]  ret_from_fork_asm+0x1a/0x30
[    9.841878][    T1]  </TASK>
Brendan Jackman March 3, 2025, 1:55 p.m. UTC | #3
On Mon, Mar 03, 2025 at 02:11:23PM +0100, David Hildenbrand wrote:
> On 03.03.25 13:13, Brendan Jackman wrote:
> > Since the migratetype hygiene patches [0], the locking here is
> > a bit more formalised.
> > 
> > For other stuff, it's pretty obvious that it would be protected by the
> > zone lock. But it didn't seem totally self-evident that it should
> > protect the pageblock type. So it seems particularly helpful to have it
> > written in the code.
> 
> [...]
> 
> > +
> >   u64 max_mem_size = U64_MAX;
> >   /* add this memory to iomem resource */
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index 579789600a3c7bfb7b0d847d51af702a9d4b139a..1ed21179676d05c66f77f9dbebf88e36bbe402e9 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
> >   void set_pageblock_migratetype(struct page *page, int migratetype)
> >   {
> > +	lockdep_assert_once(system_state == SYSTEM_BOOTING ||
> > +		in_mem_hotplug() ||
> > +		lockdep_is_held(&page_zone(page)->lock));
> > +
> 
> I assume the call chain on the memory hotplug path is mostly
> 
> move_pfn_range_to_zone()->memmap_init_range()->set_pageblock_migratetype()
> 
> either when onlining a memory block, or from pagemap_range() while holding
> the hotplug lock.
> 
> But there is also the memmap_init_zone_device()->memmap_init_compound()->__init_zone_device_page()->set_pageblock_migratetype()
> one, called from pagemap_range() *without* holding the hotplug lock, and your
> assertion would be missing that.
> 
> I'm not too happy about that assertion in general.

Hmm, thanks for pointing that out. 

I guess if we really wanted the assertion the approach would be to
replace in_mem_hotplug() with some more fine-grained logic about the
state of the pageblock? But that seems like it would require rework
that isn't really justified.

So yeah I guess this synchronization isn't as ready as I thought for
such a straightforward "you need one of these locks here" assertion.

My ulterior motive here is the series I'm working on where the
pageblock records the ASI mapping status, so maybe I can find a weaker
assertion that at least helps with that more specific logic.
David Hildenbrand March 3, 2025, 2:06 p.m. UTC | #4
On 03.03.25 14:55, Brendan Jackman wrote:
> On Mon, Mar 03, 2025 at 02:11:23PM +0100, David Hildenbrand wrote:
>> On 03.03.25 13:13, Brendan Jackman wrote:
>>> Since the migratetype hygiene patches [0], the locking here is
>>> a bit more formalised.
>>>
>>> For other stuff, it's pretty obvious that it would be protected by the
>>> zone lock. But it didn't seem totally self-evident that it should
>>> protect the pageblock type. So it seems particularly helpful to have it
>>> written in the code.
>>
>> [...]
>>
>>> +
>>>    u64 max_mem_size = U64_MAX;
>>>    /* add this memory to iomem resource */
>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>> index 579789600a3c7bfb7b0d847d51af702a9d4b139a..1ed21179676d05c66f77f9dbebf88e36bbe402e9 100644
>>> --- a/mm/page_alloc.c
>>> +++ b/mm/page_alloc.c
>>> @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
>>>    void set_pageblock_migratetype(struct page *page, int migratetype)
>>>    {
>>> +	lockdep_assert_once(system_state == SYSTEM_BOOTING ||
>>> +		in_mem_hotplug() ||
>>> +		lockdep_is_held(&page_zone(page)->lock));
>>> +
>>
>> I assume the call chain on the memory hotplug path is mostly
>>
>> move_pfn_range_to_zone()->memmap_init_range()->set_pageblock_migratetype()
>>
>> either when onlining a memory block, or from pagemap_range() while holding
>> the hotplug lock.
>>
>> But there is also the memmap_init_zone_device()->memmap_init_compound()->__init_zone_device_page()->set_pageblock_migratetype()
>> one, called from pagemap_range() *without* holding the hotplug lock, and your
>> assertion would be missing that.
>>
>> I'm not too happy about that assertion in general.
> 
> Hmm, thanks for pointing that out.
> 
> I guess if we really wanted the assertion the approach would be to
> replace in_mem_hotplug() with some more fine-grained logic about the
> state of the pageblock? But that seems like it would require rework
> that isn't really justified.

I was wondering if we could just grab the zone lock while initializing, then
assert that we either hold that or are in boot.

In move_pfn_range_to_zone() it should likely not cause too much harm, and we
could just grab it around all zone modification stuff.

memmap_init_zone_device() might take longer and be more problematic.

But I am not sure why memmap_init_zone_device() would have to set the
migratetype at all? Because migratetype is a buddy concept, and
ZONE_DEVICE does not interact with the buddy to that degree.

The comment in __init_zone_device_page states:

"Mark the block movable so that blocks are reserved for movable at
startup. This will force kernel allocations to reserve their blocks
rather than leaking throughout the address space during boot when
many long-lived kernel allocations are made."

But that just dates back to 966cf44f637e where we copy-pasted that code.


So I wonder if we could just

diff --git a/mm/mm_init.c b/mm/mm_init.c
index 57933683ed0d1..b95f545846e6e 100644
--- a/mm/mm_init.c
+++ b/mm/mm_init.c
@@ -1002,19 +1002,11 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
         page->zone_device_data = NULL;
  
         /*
-        * Mark the block movable so that blocks are reserved for
-        * movable at startup. This will force kernel allocations
-        * to reserve their blocks rather than leaking throughout
-        * the address space during boot when many long-lived
-        * kernel allocations are made.
-        *
-        * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
-        * because this is done early in section_activate()
+        * Note that we leave pageblock migratetypes uninitialized, because
+        * they don't apply to ZONE_DEVICE.
          */
-       if (pageblock_aligned(pfn)) {
-               set_pageblock_migratetype(page, MIGRATE_MOVABLE);
+       if (pageblock_aligned(pfn))
                 cond_resched();
-       }
  
         /*
          * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
Brendan Jackman March 3, 2025, 4 p.m. UTC | #5
On Mon, Mar 03, 2025 at 03:06:54PM +0100, David Hildenbrand wrote:
> On 03.03.25 14:55, Brendan Jackman wrote:
> > On Mon, Mar 03, 2025 at 02:11:23PM +0100, David Hildenbrand wrote:
> > > On 03.03.25 13:13, Brendan Jackman wrote:
> > > > Since the migratetype hygiene patches [0], the locking here is
> > > > a bit more formalised.
> > > > 
> > > > For other stuff, it's pretty obvious that it would be protected by the
> > > > zone lock. But it didn't seem totally self-evident that it should
> > > > protect the pageblock type. So it seems particularly helpful to have it
> > > > written in the code.
> > > 
> > > [...]
> > > 
> > > > +
> > > >    u64 max_mem_size = U64_MAX;
> > > >    /* add this memory to iomem resource */
> > > > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > > > index 579789600a3c7bfb7b0d847d51af702a9d4b139a..1ed21179676d05c66f77f9dbebf88e36bbe402e9 100644
> > > > --- a/mm/page_alloc.c
> > > > +++ b/mm/page_alloc.c
> > > > @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
> > > >    void set_pageblock_migratetype(struct page *page, int migratetype)
> > > >    {
> > > > +	lockdep_assert_once(system_state == SYSTEM_BOOTING ||
> > > > +		in_mem_hotplug() ||
> > > > +		lockdep_is_held(&page_zone(page)->lock));
> > > > +
> > > 
> > > I assume the call chain on the memory hotplug path is mostly
> > > 
> > > move_pfn_range_to_zone()->memmap_init_range()->set_pageblock_migratetype()
> > > 
> > > either when onlining a memory block, or from pagemap_range() while holding
> > > the hotplug lock.
> > > 
> > > But there is also the memmap_init_zone_device()->memmap_init_compound()->__init_zone_device_page()->set_pageblock_migratetype()
> > > one, called from pagemap_range() *without* holding the hotplug lock, and your
> > > assertion would be missing that.
> > > 
> > > I'm not too happy about that assertion in general.
> > 
> > Hmm, thanks for pointing that out.
> > 
> > I guess if we really wanted the assertion the approach would be to
> > replace in_mem_hotplug() with some more fine-grained logic about the
> > state of the pageblock? But that seems like it would require rework
> > that isn't really justified.
> 
> I was wondering if we could just grab the zone lock while initializing, then
> assert that we either hold that or are in boot.

Would that be because you want to avoid creating in_mem_hotplug()? Or
is it more about just simplifying the synchronization in general?

FWIW I don't think the in_mem_hotplug() is really that bad in the
assertion, it feels natural to me that memory hotplug would be an
exception to the locking rules in the same way that startup would be.

> In move_pfn_range_to_zone() it should likely not cause too much harm, and we
> could just grab it around all zone modification stuff.
> 
> memmap_init_zone_device() might take longer and be more problematic.
> 
> But I am not sure why memmap_init_zone_device() would have to set the
> migratetype at all? Because migratetype is a buddy concept, and
> ZONE_DEVICE does not interact with the buddy to that degree.
> 
> The comment in __init_zone_device_page states:
> 
> "Mark the block movable so that blocks are reserved for movable at
> startup. This will force kernel allocations to reserve their blocks
> rather than leaking throughout the address space during boot when
> many long-lived kernel allocations are made."

Uh, yeah I was pretty mystified by that. It would certainly be nice if
we can just get rid of this modification path.

> But that just dates back to 966cf44f637e where we copy-pasted that code.
>
> So I wonder if we could just
> 
> diff --git a/mm/mm_init.c b/mm/mm_init.c
> index 57933683ed0d1..b95f545846e6e 100644
> --- a/mm/mm_init.c
> +++ b/mm/mm_init.c
> @@ -1002,19 +1002,11 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>         page->zone_device_data = NULL;
>         /*
> -        * Mark the block movable so that blocks are reserved for
> -        * movable at startup. This will force kernel allocations
> -        * to reserve their blocks rather than leaking throughout
> -        * the address space during boot when many long-lived
> -        * kernel allocations are made.
> -        *
> -        * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
> -        * because this is done early in section_activate()
> +        * Note that we leave pageblock migratetypes uninitialized, because
> +        * they don't apply to ZONE_DEVICE.
>          */
> -       if (pageblock_aligned(pfn)) {
> -               set_pageblock_migratetype(page, MIGRATE_MOVABLE);
> +       if (pageblock_aligned(pfn))
>                 cond_resched();
> -       }
>         /*
>          * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released

memory-model.rst says:

> Since the
> page reference count never drops below 1 the page is never tracked as
> free memory and the page's `struct list_head lru` space is repurposed
> for back referencing to the host device / driver that mapped the memory.

And this code seems to assume that the whole pageblock is part of the
ZONE_DEVICE dance, it would certainly make sense to me...
David Hildenbrand March 3, 2025, 5:06 p.m. UTC | #6
On 03.03.25 17:00, Brendan Jackman wrote:
> On Mon, Mar 03, 2025 at 03:06:54PM +0100, David Hildenbrand wrote:
>> On 03.03.25 14:55, Brendan Jackman wrote:
>>> On Mon, Mar 03, 2025 at 02:11:23PM +0100, David Hildenbrand wrote:
>>>> On 03.03.25 13:13, Brendan Jackman wrote:
>>>>> Since the migratetype hygiene patches [0], the locking here is
>>>>> a bit more formalised.
>>>>>
>>>>> For other stuff, it's pretty obvious that it would be protected by the
>>>>> zone lock. But it didn't seem totally self-evident that it should
>>>>> protect the pageblock type. So it seems particularly helpful to have it
>>>>> written in the code.
>>>>
>>>> [...]
>>>>
>>>>> +
>>>>>     u64 max_mem_size = U64_MAX;
>>>>>     /* add this memory to iomem resource */
>>>>> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
>>>>> index 579789600a3c7bfb7b0d847d51af702a9d4b139a..1ed21179676d05c66f77f9dbebf88e36bbe402e9 100644
>>>>> --- a/mm/page_alloc.c
>>>>> +++ b/mm/page_alloc.c
>>>>> @@ -417,6 +417,10 @@ void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
>>>>>     void set_pageblock_migratetype(struct page *page, int migratetype)
>>>>>     {
>>>>> +	lockdep_assert_once(system_state == SYSTEM_BOOTING ||
>>>>> +		in_mem_hotplug() ||
>>>>> +		lockdep_is_held(&page_zone(page)->lock));
>>>>> +
>>>>
>>>> I assume the call chain on the memory hotplug path is mostly
>>>>
>>>> move_pfn_range_to_zone()->memmap_init_range()->set_pageblock_migratetype()
>>>>
>>>> either when onlining a memory block, or from pagemap_range() while holding
>>>> the hotplug lock.
>>>>
>>>> But there is also the memmap_init_zone_device()->memmap_init_compound()->__init_zone_device_page()->set_pageblock_migratetype()
>>>> one, called from pagemap_range() *without* holding the hotplug lock, and your
>>>> assertion would be missing that.
>>>>
>>>> I'm not too happy about that assertion in general.
>>>
>>> Hmm, thanks for pointing that out.
>>>
>>> I guess if we really wanted the assertion the approach would be to
>>> replace in_mem_hotplug() with some more fine-grained logic about the
>>> state of the pageblock? But that seems like it would require rework
>>> that isn't really justified.
>>
>> I was wondering if we could just grab the zone lock while initializing, then
>> assert that we either hold that or are in boot.
> 
> Would that be because you want to avoid creating in_mem_hotplug()? Or
> is it more about just simplifying the synchronization in general?

A little bit of both. The question is whether lockless resizing of the zone 
range today is a bug or a feature :)

I'm not aware of any known side-effects of that, but if we could add 
locking without causing noticeable overheads, that would certainly be 
easiest ...

> 
> FWIW I don't think the in_mem_hotplug() is really that bad in the
> assertion, it feels natural to me that memory hotplug would be an
> exception to the locking rules in the same way that startup would be.
>> In move_pfn_range_to_zone() it should likely not cause too much harm, and we
>> could just grab it around all zone modification stuff.
>>
>> memmap_init_zone_device() might take longer and be more problematic.
>>
>> But I am not sure why memmap_init_zone_device() would have to set the
>> migratetype at all? Because migratetype is a buddy concept, and
>> ZONE_DEVICE does not interact with the buddy to that degree.
>>
>> The comment in __init_zone_device_page states:
>>
>> "Mark the block movable so that blocks are reserved for movable at
>> startup. This will force kernel allocations to reserve their blocks
>> rather than leaking throughout the address space during boot when
>> many long-lived kernel allocations are made."
> 
> Uh, yeah I was pretty mystified by that. It would certainly be nice if
> we can just get rid of this modification path.
> 
>> But that just dates back to 966cf44f637e where we copy-pasted that code.
>>
>> So I wonder if we could just
>>
>> diff --git a/mm/mm_init.c b/mm/mm_init.c
>> index 57933683ed0d1..b95f545846e6e 100644
>> --- a/mm/mm_init.c
>> +++ b/mm/mm_init.c
>> @@ -1002,19 +1002,11 @@ static void __ref __init_zone_device_page(struct page *page, unsigned long pfn,
>>          page->zone_device_data = NULL;
>>          /*
>> -        * Mark the block movable so that blocks are reserved for
>> -        * movable at startup. This will force kernel allocations
>> -        * to reserve their blocks rather than leaking throughout
>> -        * the address space during boot when many long-lived
>> -        * kernel allocations are made.
>> -        *
>> -        * Please note that MEMINIT_HOTPLUG path doesn't clear memmap
>> -        * because this is done early in section_activate()
>> +        * Note that we leave pageblock migratetypes uninitialized, because
>> +        * they don't apply to ZONE_DEVICE.
>>           */
>> -       if (pageblock_aligned(pfn)) {
>> -               set_pageblock_migratetype(page, MIGRATE_MOVABLE);
>> +       if (pageblock_aligned(pfn))
>>                  cond_resched();
>> -       }
>>          /*
>>           * ZONE_DEVICE pages other than MEMORY_TYPE_GENERIC are released
> 
> memory-model.rst says:
> 
>> Since the
>> page reference count never drops below 1 the page is never tracked as
>> free memory and the page's `struct list_head lru` space is repurposed
>> for back referencing to the host device / driver that mapped the memory.

That comment will be stale soon. In general, ZONE_DEVICE refcounts can 
drop to 0, but the pages will never go to the buddy; they get intercepted 
on the freeing path.

> 
> And this code seems to assume that the whole pageblock is part of the
> ZONE_DEVICE dance, it would certainly make sense to me...

Sorry, I didn't get your final conclusion: do you think we don't have to 
initialize the migratetype, or do you think there is reason to still do it?

Patch

diff --git a/include/linux/memory_hotplug.h b/include/linux/memory_hotplug.h
index eaac5ae8c05c8ee2f2868d5bc1b04d1f68235b3f..508a1d074527a3349c1c9789ecf35d5ab08a5ba6 100644
--- a/include/linux/memory_hotplug.h
+++ b/include/linux/memory_hotplug.h
@@ -175,6 +175,7 @@  void put_online_mems(void);
 
 void mem_hotplug_begin(void);
 void mem_hotplug_done(void);
+bool in_mem_hotplug(void);
 
 /* See kswapd_is_running() */
 static inline void pgdat_kswapd_lock(pg_data_t *pgdat)
@@ -223,6 +224,10 @@  static inline void put_online_mems(void) {}
 
 static inline void mem_hotplug_begin(void) {}
 static inline void mem_hotplug_done(void) {}
+static inline bool in_mem_hotplug(void)
+{
+	return false;
+}
 
 static inline bool movable_node_is_enabled(void)
 {
diff --git a/mm/memory_hotplug.c b/mm/memory_hotplug.c
index e3655f07dd6e33efb3e811cab07f240649487441..c44c2765b21f1e9e4b2d68abda9ac161fb2869ec 100644
--- a/mm/memory_hotplug.c
+++ b/mm/memory_hotplug.c
@@ -267,6 +267,11 @@  void mem_hotplug_done(void)
 	cpus_read_unlock();
 }
 
+bool in_mem_hotplug(void)
+{
+	return percpu_is_write_locked(&mem_hotplug_lock);
+}
+
 u64 max_mem_size = U64_MAX;
 
 /* add this memory to iomem resource */
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index 579789600a3c7bfb7b0d847d51af702a9d4b139a..1ed21179676d05c66f77f9dbebf88e36bbe402e9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -417,6 +417,10 @@  void set_pfnblock_flags_mask(struct page *page, unsigned long flags,
 
 void set_pageblock_migratetype(struct page *page, int migratetype)
 {
+	lockdep_assert_once(system_state == SYSTEM_BOOTING ||
+		in_mem_hotplug() ||
+		lockdep_is_held(&page_zone(page)->lock));
+
 	if (unlikely(page_group_by_mobility_disabled &&
 		     migratetype < MIGRATE_PCPTYPES))
 		migratetype = MIGRATE_UNMOVABLE;
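[Editorial note: as a rough userspace model of the new in_mem_hotplug() helper above, with the percpu rwsem reduced to plain counters (all names here are stand-ins, not kernel APIs): the helper is true only while a writer holds mem_hotplug_lock, i.e. between mem_hotplug_begin() and mem_hotplug_done(). Readers taken via get_online_mems() do not count, mirroring percpu_is_write_locked().]

```c
#include <assert.h>
#include <stdbool.h>

/* Single-threaded stand-ins for the percpu rwsem's state; the real
 * lock is a percpu_rw_semaphore, mocked here as two depth counters. */
static int hotplug_write_depth;
static int hotplug_read_depth;

static void mem_hotplug_begin(void) { hotplug_write_depth++; }
static void mem_hotplug_done(void)  { hotplug_write_depth--; }
static void get_online_mems(void)   { hotplug_read_depth++; }
static void put_online_mems(void)   { hotplug_read_depth--; }

/* Models percpu_is_write_locked(&mem_hotplug_lock): only a write
 * hold means "we are in memory hotplug". */
static bool in_mem_hotplug(void)
{
	return hotplug_write_depth > 0;
}
```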