diff mbox series

[1/3] memory: extern memory_block_size_bytes and set_memory_block_size_order

Message ID 20241008044355.4325-2-gourry@gourry.net (mailing list archive)
State New
Headers show
Series memory,acpi: resize memory blocks based on CFMW alignment | expand

Commit Message

Gregory Price Oct. 8, 2024, 4:43 a.m. UTC
On CXL systems, block alignment may be as small as 256MB, which may
require a resize of the block size during initialization.  This is done
in the ACPI driver, so the resize function need to be made available.

Presently, only x86 provides the functionality to resize memory
block sizes.  Wire up a weak stub for set_memory_block_size_order,
similar to memory_block_size_bytes, that simply returns -ENODEV.

Since set_memory_block_size_order is now extern, we also need to
drop the __init macro.

Signed-off-by: Gregory Price <gourry@gourry.net>
---
 arch/x86/mm/init_64.c  | 2 +-
 drivers/base/memory.c  | 6 ++++++
 include/linux/memory.h | 4 ++--
 3 files changed, 9 insertions(+), 3 deletions(-)

Comments

David Hildenbrand Oct. 8, 2024, 2:03 p.m. UTC | #1
On 08.10.24 06:43, Gregory Price wrote:
> On CXL systems, block alignment may be as small as 256MB, which may
> require a resize of the block size during initialization.  This is done
> in the ACPI driver, so the resize function need to be made available.
> 
> Presently, only x86 provides the functionality to resize memory
> block sizes.  Wire up a weak stub for set_memory_block_size_order,
> similar to memory_block_size_bytes, that simply returns -ENODEV.
> 
> Since set_memory_block_size_order is now extern, we also need to
> drop the __init macro.
> 
> Signed-off-by: Gregory Price <gourry@gourry.net>
> ---
>   arch/x86/mm/init_64.c  | 2 +-
>   drivers/base/memory.c  | 6 ++++++
>   include/linux/memory.h | 4 ++--
>   3 files changed, 9 insertions(+), 3 deletions(-)
> 
> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> index ff253648706f..6086f99449fa 100644
> --- a/arch/x86/mm/init_64.c
> +++ b/arch/x86/mm/init_64.c
> @@ -1424,7 +1424,7 @@ void mark_rodata_ro(void)
>   
>   /* Adjustable memory block size */
>   static unsigned long set_memory_block_size;
> -int __init set_memory_block_size_order(unsigned int order)
> +int set_memory_block_size_order(unsigned int order)
>   {
>   	unsigned long size = 1UL << order;
>   
> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> index 67858eeb92ed..f9045642f69e 100644
> --- a/drivers/base/memory.c
> +++ b/drivers/base/memory.c
> @@ -110,6 +110,12 @@ static void memory_block_release(struct device *dev)
>   	kfree(mem);
>   }
>   
> +int __weak set_memory_block_size_order(unsigned int order)
> +{
> +	return -ENODEV;
> +}
> +EXPORT_SYMBOL_GPL(set_memory_block_size_order);

I can understand what you are trying to achieve, but letting arbitrary 
modules mess with this sounds like a bad idea.
Gregory Price Oct. 8, 2024, 2:51 p.m. UTC | #2
On Tue, Oct 08, 2024 at 04:03:37PM +0200, David Hildenbrand wrote:
> On 08.10.24 06:43, Gregory Price wrote:
> > On CXL systems, block alignment may be as small as 256MB, which may
> > require a resize of the block size during initialization.  This is done
> > in the ACPI driver, so the resize function need to be made available.
> > 
> > Presently, only x86 provides the functionality to resize memory
> > block sizes.  Wire up a weak stub for set_memory_block_size_order,
> > similar to memory_block_size_bytes, that simply returns -ENODEV.
> > 
> > Since set_memory_block_size_order is now extern, we also need to
> > drop the __init macro.
> > 
> > Signed-off-by: Gregory Price <gourry@gourry.net>
> > ---
> >   arch/x86/mm/init_64.c  | 2 +-
> >   drivers/base/memory.c  | 6 ++++++
> >   include/linux/memory.h | 4 ++--
> >   3 files changed, 9 insertions(+), 3 deletions(-)
> > 
> > diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
> > index ff253648706f..6086f99449fa 100644
> > --- a/arch/x86/mm/init_64.c
> > +++ b/arch/x86/mm/init_64.c
> > @@ -1424,7 +1424,7 @@ void mark_rodata_ro(void)
> >   /* Adjustable memory block size */
> >   static unsigned long set_memory_block_size;
> > -int __init set_memory_block_size_order(unsigned int order)
> > +int set_memory_block_size_order(unsigned int order)
> >   {
> >   	unsigned long size = 1UL << order;
> > diff --git a/drivers/base/memory.c b/drivers/base/memory.c
> > index 67858eeb92ed..f9045642f69e 100644
> > --- a/drivers/base/memory.c
> > +++ b/drivers/base/memory.c
> > @@ -110,6 +110,12 @@ static void memory_block_release(struct device *dev)
> >   	kfree(mem);
> >   }
> > +int __weak set_memory_block_size_order(unsigned int order)
> > +{
> > +	return -ENODEV;
> > +}
> > +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
> 
> I can understand what you are trying to achieve, but letting arbitrary
> modules mess with this sounds like a bad idea.
> 

I suppose the alternative is trying to scan the CEDT from inside each
machine, rather than the ACPI driver?  Seems less maintainable.

I don't entirely disagree with your comment.  I hummed and hawwed over
externing this - hence the warning in the x86 machine.

Open to better answers.

> -- 
> Cheers,
> 
> David / dhildenb
>
David Hildenbrand Oct. 8, 2024, 3:02 p.m. UTC | #3
On 08.10.24 16:51, Gregory Price wrote:
> On Tue, Oct 08, 2024 at 04:03:37PM +0200, David Hildenbrand wrote:
>> On 08.10.24 06:43, Gregory Price wrote:
>>> On CXL systems, block alignment may be as small as 256MB, which may
>>> require a resize of the block size during initialization.  This is done
>>> in the ACPI driver, so the resize function need to be made available.
>>>
>>> Presently, only x86 provides the functionality to resize memory
>>> block sizes.  Wire up a weak stub for set_memory_block_size_order,
>>> similar to memory_block_size_bytes, that simply returns -ENODEV.
>>>
>>> Since set_memory_block_size_order is now extern, we also need to
>>> drop the __init macro.
>>>
>>> Signed-off-by: Gregory Price <gourry@gourry.net>
>>> ---
>>>    arch/x86/mm/init_64.c  | 2 +-
>>>    drivers/base/memory.c  | 6 ++++++
>>>    include/linux/memory.h | 4 ++--
>>>    3 files changed, 9 insertions(+), 3 deletions(-)
>>>
>>> diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
>>> index ff253648706f..6086f99449fa 100644
>>> --- a/arch/x86/mm/init_64.c
>>> +++ b/arch/x86/mm/init_64.c
>>> @@ -1424,7 +1424,7 @@ void mark_rodata_ro(void)
>>>    /* Adjustable memory block size */
>>>    static unsigned long set_memory_block_size;
>>> -int __init set_memory_block_size_order(unsigned int order)
>>> +int set_memory_block_size_order(unsigned int order)
>>>    {
>>>    	unsigned long size = 1UL << order;
>>> diff --git a/drivers/base/memory.c b/drivers/base/memory.c
>>> index 67858eeb92ed..f9045642f69e 100644
>>> --- a/drivers/base/memory.c
>>> +++ b/drivers/base/memory.c
>>> @@ -110,6 +110,12 @@ static void memory_block_release(struct device *dev)
>>>    	kfree(mem);
>>>    }
>>> +int __weak set_memory_block_size_order(unsigned int order)
>>> +{
>>> +	return -ENODEV;
>>> +}
>>> +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
>>
>> I can understand what you are trying to achieve, but letting arbitrary
>> modules mess with this sounds like a bad idea.
>>
> 
> I suppose the alternative is trying to scan the CEDT from inside each
> machine, rather than the ACPI driver?  Seems less maintainable.
> 
> I don't entirely disagree with your comment.  I hummed and hawwed over
> externing this - hence the warning in the x86 machine.
> 
> Open to better answers.

Maybe an interface to add more restrictions on the maximum size might be 
better (instead of setting the size/order, you would impose another 
upper limit). Just imagine having various users of such an interface ..
Gregory Price Oct. 8, 2024, 3:21 p.m. UTC | #4
On Tue, Oct 08, 2024 at 05:02:33PM +0200, David Hildenbrand wrote:
> On 08.10.24 16:51, Gregory Price wrote:
> > > > +int __weak set_memory_block_size_order(unsigned int order)
> > > > +{
> > > > +	return -ENODEV;
> > > > +}
> > > > +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
> > > 
> > > I can understand what you are trying to achieve, but letting arbitrary
> > > modules mess with this sounds like a bad idea.
> > > 
> > 
> > I suppose the alternative is trying to scan the CEDT from inside each
> > machine, rather than the ACPI driver?  Seems less maintainable.
> > 
> > I don't entirely disagree with your comment.  I hummed and hawwed over
> > externing this - hence the warning in the x86 machine.
> > 
> > Open to better answers.
> 
> Maybe an interface to add more restrictions on the maximum size might be
> better (instead of setting the size/order, you would impose another upper
> limit).

That is effectively what set_memory_block_size_order is, though.  Once
blocks are exposed to the allocators, its no longer safe to change the
size (in part because it was built assuming it wouldn't change, but I
imagine there are other dragons waiting in the shadows to bite me).

So this would basically amount to a lock-bit being set in the architecture,
beyond which block size can no longer be changed and a big ol' splat
can be generated that says "NO TOUCH".

> Just imagine having various users of such an interface ..

I don't wanna D:

> 
> -- 
> Cheers,
> 
> David / dhildenb
>
Ira Weiny Oct. 8, 2024, 7:04 p.m. UTC | #5
Gregory Price wrote:
> On Tue, Oct 08, 2024 at 05:02:33PM +0200, David Hildenbrand wrote:
> > On 08.10.24 16:51, Gregory Price wrote:
> > > > > +int __weak set_memory_block_size_order(unsigned int order)
> > > > > +{
> > > > > +	return -ENODEV;
> > > > > +}
> > > > > +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
> > > > 
> > > > I can understand what you are trying to achieve, but letting arbitrary
> > > > modules mess with this sounds like a bad idea.
> > > > 
> > > 
> > > I suppose the alternative is trying to scan the CEDT from inside each
> > > machine, rather than the ACPI driver?  Seems less maintainable.
> > > 
> > > I don't entirely disagree with your comment.  I hummed and hawwed over
> > > externing this - hence the warning in the x86 machine.
> > > 
> > > Open to better answers.
> > 
> > Maybe an interface to add more restrictions on the maximum size might be
> > better (instead of setting the size/order, you would impose another upper
> > limit).
> 
> That is effectively what set_memory_block_size_order is, though.  Once
> blocks are exposed to the allocators, its no longer safe to change the
> size (in part because it was built assuming it wouldn't change, but I
> imagine there are other dragons waiting in the shadows to bite me).

Yea I think this is along the idea I had.  But much clearer.

Ira

> 
> So this would basically amount to a lock-bit being set in the architecture,
> beyond which block size can no longer be changed and a big ol' splat
> can be generated that says "NO TOUCH".
> 
> > Just imagine having various users of such an interface ..
> 
> I don't wanna D:
> 
> > 
> > -- 
> > Cheers,
> > 
> > David / dhildenb
> >
Gregory Price Oct. 8, 2024, 7:45 p.m. UTC | #6
On Tue, Oct 08, 2024 at 02:04:05PM -0500, Ira Weiny wrote:
> Gregory Price wrote:
> > On Tue, Oct 08, 2024 at 05:02:33PM +0200, David Hildenbrand wrote:
> > > On 08.10.24 16:51, Gregory Price wrote:
> > > > > > +int __weak set_memory_block_size_order(unsigned int order)
> > > > > > +{
> > > > > > +	return -ENODEV;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
> > > > > 
> > > > > I can understand what you are trying to achieve, but letting arbitrary
> > > > > modules mess with this sounds like a bad idea.
> > > > > 
> > > > 
> > > > I suppose the alternative is trying to scan the CEDT from inside each
> > > > machine, rather than the ACPI driver?  Seems less maintainable.
> > > > 
> > > > I don't entirely disagree with your comment.  I hummed and hawwed over
> > > > externing this - hence the warning in the x86 machine.
> > > > 
> > > > Open to better answers.
> > > 
> > > Maybe an interface to add more restrictions on the maximum size might be
> > > better (instead of setting the size/order, you would impose another upper
> > > limit).
> > 
> > That is effectively what set_memory_block_size_order is, though.  Once
> > blocks are exposed to the allocators, its no longer safe to change the
> > size (in part because it was built assuming it wouldn't change, but I
> > imagine there are other dragons waiting in the shadows to bite me).
> 
> Yea I think this is along the idea I had.  But much clearer.
> 
> Ira
> 

Dan seems to think I can just extern without EXPORT, so let me see if I can
get that working first.  Then I'll see if I can add a lock bit.

I'll see if i can make an arch_advise call that does away with some of the
ifdef spaghetti.

> > 
> > So this would basically amount to a lock-bit being set in the architecture,
> > beyond which block size can no longer be changed and a big ol' splat
> > can be generated that says "NO TOUCH".
> > 
> > > Just imagine having various users of such an interface ..
> > 
> > I don't wanna D:
> > 
> > > 
> > > -- 
> > > Cheers,
> > > 
> > > David / dhildenb
> > > 
> 
>
David Hildenbrand Oct. 14, 2024, 11:54 a.m. UTC | #7
On 08.10.24 17:21, Gregory Price wrote:
> On Tue, Oct 08, 2024 at 05:02:33PM +0200, David Hildenbrand wrote:
>> On 08.10.24 16:51, Gregory Price wrote:
>>>>> +int __weak set_memory_block_size_order(unsigned int order)
>>>>> +{
>>>>> +	return -ENODEV;
>>>>> +}
>>>>> +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
>>>>
>>>> I can understand what you are trying to achieve, but letting arbitrary
>>>> modules mess with this sounds like a bad idea.
>>>>
>>>
>>> I suppose the alternative is trying to scan the CEDT from inside each
>>> machine, rather than the ACPI driver?  Seems less maintainable.
>>>
>>> I don't entirely disagree with your comment.  I hummed and hawwed over
>>> externing this - hence the warning in the x86 machine.
>>>
>>> Open to better answers.
>>
>> Maybe an interface to add more restrictions on the maximum size might be
>> better (instead of setting the size/order, you would impose another upper
>> limit).
> 
> That is effectively what set_memory_block_size_order is, though.  Once
> blocks are exposed to the allocators, its no longer safe to change the
> size (in part because it was built assuming it wouldn't change, but I
> imagine there are other dragons waiting in the shadows to bite me).

Yes, we must run very early.

How is this supposed to interact with code like

set_block_size()

that also calls set_memory_block_size_order() on UV systems (assuming 
there will be CXL support sooner or later?)?


> 
> So this would basically amount to a lock-bit being set in the architecture,
> beyond which block size can no longer be changed and a big ol' splat
> can be generated that says "NO TOUCH".
> 
>> Just imagine having various users of such an interface ..
> 
> I don't wanna D:

Right, and it also doesn't make sense as explained in my other comment: 
this should never apply to loaded modules. :)
Gregory Price Oct. 14, 2024, 2:25 p.m. UTC | #8
On Mon, Oct 14, 2024 at 01:54:27PM +0200, David Hildenbrand wrote:
> On 08.10.24 17:21, Gregory Price wrote:
> > On Tue, Oct 08, 2024 at 05:02:33PM +0200, David Hildenbrand wrote:
> > > On 08.10.24 16:51, Gregory Price wrote:
> > > > > > +int __weak set_memory_block_size_order(unsigned int order)
> > > > > > +{
> > > > > > +	return -ENODEV;
> > > > > > +}
> > > > > > +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
> > > > > 
> > > > > I can understand what you are trying to achieve, but letting arbitrary
> > > > > modules mess with this sounds like a bad idea.
> > > > > 
> > > > 
> > > > I suppose the alternative is trying to scan the CEDT from inside each
> > > > machine, rather than the ACPI driver?  Seems less maintainable.
> > > > 
> > > > I don't entirely disagree with your comment.  I hummed and hawwed over
> > > > externing this - hence the warning in the x86 machine.
> > > > 
> > > > Open to better answers.
> > > 
> > > Maybe an interface to add more restrictions on the maximum size might be
> > > better (instead of setting the size/order, you would impose another upper
> > > limit).
> > 
> > That is effectively what set_memory_block_size_order is, though.  Once
> > blocks are exposed to the allocators, its no longer safe to change the
> > size (in part because it was built assuming it wouldn't change, but I
> > imagine there are other dragons waiting in the shadows to bite me).
> 
> Yes, we must run very early.
> 
> How is this supposed to interact with code like
> 
> set_block_size()
> 
> that also calls set_memory_block_size_order() on UV systems (assuming there
> will be CXL support sooner or later?)?
> 
> 

Tying the other email to this one - just clarifying the way forward here.

It sounds like you're saying at a minimum drop EXPORT tags to prevent
modules from calling it - but it also sounds like built-ins need to be
prevented from touching it as well after a certain point in early boot.

Do you think I should go down the advise() path as suggested by Ira,
just adding a arch_lock_blocksize() bit and have set_..._order check it,
or should we just move towards each architecture having to go through
the ACPI:CEDT itself?

Doesn't sound like we've quite hit a consensus on where the actual
adjustment logic should land - just that this shouldn't be touched by modules.

~Gregory
David Hildenbrand Oct. 14, 2024, 8:32 p.m. UTC | #9
On 14.10.24 16:25, Gregory Price wrote:
> On Mon, Oct 14, 2024 at 01:54:27PM +0200, David Hildenbrand wrote:
>> On 08.10.24 17:21, Gregory Price wrote:
>>> On Tue, Oct 08, 2024 at 05:02:33PM +0200, David Hildenbrand wrote:
>>>> On 08.10.24 16:51, Gregory Price wrote:
>>>>>>> +int __weak set_memory_block_size_order(unsigned int order)
>>>>>>> +{
>>>>>>> +	return -ENODEV;
>>>>>>> +}
>>>>>>> +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
>>>>>>
>>>>>> I can understand what you are trying to achieve, but letting arbitrary
>>>>>> modules mess with this sounds like a bad idea.
>>>>>>
>>>>>
>>>>> I suppose the alternative is trying to scan the CEDT from inside each
>>>>> machine, rather than the ACPI driver?  Seems less maintainable.
>>>>>
>>>>> I don't entirely disagree with your comment.  I hummed and hawwed over
>>>>> externing this - hence the warning in the x86 machine.
>>>>>
>>>>> Open to better answers.
>>>>
>>>> Maybe an interface to add more restrictions on the maximum size might be
>>>> better (instead of setting the size/order, you would impose another upper
>>>> limit).
>>>
>>> That is effectively what set_memory_block_size_order is, though.  Once
>>> blocks are exposed to the allocators, its no longer safe to change the
>>> size (in part because it was built assuming it wouldn't change, but I
>>> imagine there are other dragons waiting in the shadows to bite me).
>>
>> Yes, we must run very early.
>>
>> How is this supposed to interact with code like
>>
>> set_block_size()
>>
>> that also calls set_memory_block_size_order() on UV systems (assuming there
>> will be CXL support sooner or later?)?
>>
>>
> 
> Tying the other email to this one - just clarifying the way forward here.
> 
> It sounds like you're saying at a minimum drop EXPORT tags to prevent
> modules from calling it - but it also sounds like built-ins need to be
> prevented from touching it as well after a certain point in early boot.

Right, at least the EXPORT is not required.

> 
> Do you think I should go down the advise() path as suggested by Ira,
> just adding a arch_lock_blocksize() bit and have set_..._order check it,
> or should we just move towards each architecture having to go through
> the ACPI:CEDT itself?

Let's summarize what we currently have on x86 is:

1) probe_memory_block_size()

Triggered on first memory_block_size_bytes() invocation. Makes a 
decision based on:

a) Already set size using set_memory_block_size_order()
b) RAM size
c) Bare metal vs. virt (bare metal -> use max)
d) Virt: largest block size aligned to memory end


2) set_memory_block_size_order()

Triggered by set_block_size() on UV systems.


I don't think set_memory_block_size_order() is the right tool to use. We 
just want to leave that alone I think -- it's a direct translation of a 
kernel cmdline parameter that should win.

You essentially want to tweak the b)->d) logic to take other alignment 
into consideration.

Maybe have some simple callback mechanism probe_memory_block_size() that 
can consult other sources for alignment requirements?

If that's not an option, then another way to set further min-alignment 
requirements (whereby we take MIN(old_align, new_align))?
Gregory Price Oct. 14, 2024, 10:40 p.m. UTC | #10
On Mon, Oct 14, 2024 at 10:32:36PM +0200, David Hildenbrand wrote:
> On 14.10.24 16:25, Gregory Price wrote:
> > On Mon, Oct 14, 2024 at 01:54:27PM +0200, David Hildenbrand wrote:
> > > On 08.10.24 17:21, Gregory Price wrote:
> > > > On Tue, Oct 08, 2024 at 05:02:33PM +0200, David Hildenbrand wrote:
> > > > > On 08.10.24 16:51, Gregory Price wrote:
> > > > > > > > +int __weak set_memory_block_size_order(unsigned int order)
> > > > > > > > +{
> > > > > > > > +	return -ENODEV;
> > > > > > > > +}
> > > > > > > > +EXPORT_SYMBOL_GPL(set_memory_block_size_order);
> > > > > > > 
> > > > > > > I can understand what you are trying to achieve, but letting arbitrary
> > > > > > > modules mess with this sounds like a bad idea.
> > > > > > > 
> > > > > > 
> > > > > > I suppose the alternative is trying to scan the CEDT from inside each
> > > > > > machine, rather than the ACPI driver?  Seems less maintainable.
> > > > > > 
> > > > > > I don't entirely disagree with your comment.  I hummed and hawwed over
> > > > > > externing this - hence the warning in the x86 machine.
> > > > > > 
> > > > > > Open to better answers.
> > > > > 
> > > > > Maybe an interface to add more restrictions on the maximum size might be
> > > > > better (instead of setting the size/order, you would impose another upper
> > > > > limit).
> > > > 
> > > > That is effectively what set_memory_block_size_order is, though.  Once
> > > > blocks are exposed to the allocators, its no longer safe to change the
> > > > size (in part because it was built assuming it wouldn't change, but I
> > > > imagine there are other dragons waiting in the shadows to bite me).
> > > 
> > > Yes, we must run very early.
> > > 
> > > How is this supposed to interact with code like
> > > 
> > > set_block_size()
> > > 
> > > that also calls set_memory_block_size_order() on UV systems (assuming there
> > > will be CXL support sooner or later?)?
> > > 
> > > 
> > 
> > Tying the other email to this one - just clarifying the way forward here.
> > 
> > It sounds like you're saying at a minimum drop EXPORT tags to prevent
> > modules from calling it - but it also sounds like built-ins need to be
> > prevented from touching it as well after a certain point in early boot.
> 
> Right, at least the EXPORT is not required.
> 
> > 
> > Do you think I should go down the advise() path as suggested by Ira,
> > just adding a arch_lock_blocksize() bit and have set_..._order check it,
> > or should we just move towards each architecture having to go through
> > the ACPI:CEDT itself?
> 
> Let's summarize what we currently have on x86 is:
> 
> 1) probe_memory_block_size()
> 
> Triggered on first memory_block_size_bytes() invocation. Makes a decision
> based on:
> 
> a) Already set size using set_memory_block_size_order()
> b) RAM size
> c) Bare metal vs. virt (bare metal -> use max)
> d) Virt: largest block size aligned to memory end
> 
> 
> 2) set_memory_block_size_order()
> 
> Triggered by set_block_size() on UV systems.
> 
> 
> I don't think set_memory_block_size_order() is the right tool to use. We
> just want to leave that alone I think -- it's a direct translation of a
> kernel cmdline parameter that should win.
> 
> You essentially want to tweak the b)->d) logic to take other alignment into
> consideration.
> 
> Maybe have some simple callback mechanism probe_memory_block_size() that can
> consult other sources for alignment requirements?
>

Thanks for this - I'll cobble something together.

Probably this ends up falling out similar to what Ira suggested. 

drivers/acpi/numa/srat.c
    acpi_numa_init():
        order = parse_cfwm(...)
        memblock_advise_size(order);

drivers/base/memory.c
    static int memblock_size_order = 0; /* let arch choose */

    int memblock_advise_size(order)
        int old_order;
        int new_order;
        if (order <= 0)
            return -EINVAL;

        do {
            old_order = memblock_size_order;
            new_order = MIN(old_order, order);
        } while (!atomic_cmpxchg(&memblock_size_order, old_order, new_order));

        /* memblock_size_order is now <= order, if -1 then the probe won */
        return new_order;

    int memblock_probe_size()
        return atomic_xchg(&memblock_size_order, -1);

drivers/base/memblock.h
    #ifdef HOTPLUG
        export memblock_advise_size()
        export memblock_probe_size()
    #else
        static memblock_advice_size() { return -ENODEV; } /* always fail */
        static memblock_probe_size() { return 0; } /* arch chooses */
    #endif

arch/*/mm/...
    probe_block_size():
        memblock_probe_size();
        /* select minimum across above suggested values */

> If that's not an option, then another way to set further min-alignment
> requirements (whereby we take MIN(old_align, new_align))?
> 
> -- 
> Cheers,
> 
> David / dhildenb
>
diff mbox series

Patch

diff --git a/arch/x86/mm/init_64.c b/arch/x86/mm/init_64.c
index ff253648706f..6086f99449fa 100644
--- a/arch/x86/mm/init_64.c
+++ b/arch/x86/mm/init_64.c
@@ -1424,7 +1424,7 @@  void mark_rodata_ro(void)
 
 /* Adjustable memory block size */
 static unsigned long set_memory_block_size;
-int __init set_memory_block_size_order(unsigned int order)
+int set_memory_block_size_order(unsigned int order)
 {
 	unsigned long size = 1UL << order;
 
diff --git a/drivers/base/memory.c b/drivers/base/memory.c
index 67858eeb92ed..f9045642f69e 100644
--- a/drivers/base/memory.c
+++ b/drivers/base/memory.c
@@ -110,6 +110,12 @@  static void memory_block_release(struct device *dev)
 	kfree(mem);
 }
 
+int __weak set_memory_block_size_order(unsigned int order)
+{
+	return -ENODEV;
+}
+EXPORT_SYMBOL_GPL(set_memory_block_size_order);
+
 unsigned long __weak memory_block_size_bytes(void)
 {
 	return MIN_MEMORY_BLOCK_SIZE;
diff --git a/include/linux/memory.h b/include/linux/memory.h
index c0afee5d126e..c57706178354 100644
--- a/include/linux/memory.h
+++ b/include/linux/memory.h
@@ -86,8 +86,8 @@  struct memory_block {
 };
 
 int arch_get_memory_phys_device(unsigned long start_pfn);
-unsigned long memory_block_size_bytes(void);
-int set_memory_block_size_order(unsigned int order);
+extern unsigned long memory_block_size_bytes(void);
+extern int set_memory_block_size_order(unsigned int order);
 
 /* These states are exposed to userspace as text strings in sysfs */
 #define	MEM_ONLINE		(1<<0) /* exposed to userspace */