
[RFC] mm/slab: Avoid build bug for calls to kmalloc with a large constant

Message ID 44312f4a-8b9c-49ce-9277-5873a94ca1bb@oracle.com (mailing list archive)
State New

Commit Message

Dave Kleikamp Nov. 1, 2024, 8:16 p.m. UTC
When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
is no longer optimized out with a constant size, so a build bug may
occur on a path that won't be reached.

Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
---

Ryan,

Please consider incorporating this fix or something similar into your
mm patch in the boot-time page size patches.

  include/linux/slab.h | 3 ++-
  1 file changed, 2 insertions(+), 1 deletion(-)

Comments

Ryan Roberts Nov. 6, 2024, 11:44 a.m. UTC | #1
On 01/11/2024 20:16, Dave Kleikamp wrote:
> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
> is no longer optimized out with a constant size, so a build bug may
> occur on a path that won't be reached.
> 
> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
> 
> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> ---
> 
> Ryan,
> 
> Please consider incorporating this fix or something similar into your
> mm patch in the boot-time page size patches.
> 
>  include/linux/slab.h | 3 ++-
>  1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 9848296ca6ba..a4c7507ab8ec 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>      if (size <= 1024 * 1024) return 20;
>      if (size <=  2 * 1024 * 1024) return 21;
>  
> -    if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
> +    if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&

Thanks for the patch! I think this may be better as:

       if (PAGE_SHIFT_MIN == PAGE_SHIFT_MAX &&

Since that is independent of the architecture. Your approach wouldn't work if
another arch wanted to enable boot time page size, or if arm64 dropped the
Kconfig because it decided only boot time page size will be supported in future.

Thanks,
Ryan

> +        !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>          BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>      else
>          BUG();
Dave Kleikamp Nov. 6, 2024, 3:20 p.m. UTC | #2
On 11/6/24 5:44AM, Ryan Roberts wrote:
> On 01/11/2024 20:16, Dave Kleikamp wrote:
>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>> is no longer optimized out with a constant size, so a build bug may
>> occur on a path that won't be reached.
>>
>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
>>
>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>> ---
>>
>> Ryan,
>>
>> Please consider incorporating this fix or something similar into your
>> mm patch in the boot-time page size patches.
>>
>>   include/linux/slab.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>> index 9848296ca6ba..a4c7507ab8ec 100644
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>>       if (size <= 1024 * 1024) return 20;
>>       if (size <=  2 * 1024 * 1024) return 21;
>>   
>> -    if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>> +    if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
> 
> Thanks for the patch! I think this may be better as:
> 
>         if (PAGE_SHIFT_MIN == PAGE_SHIFT_MAX &&
> 
> Since that is independent of the architecture. Your approach wouldn't work if
> another arch wanted to enable boot time page size, or if arm64 dropped the
> Kconfig because it decided only boot time page size will be supported in future.

Absolutely. I may be sending some more. I haven't gotten to JFS yet, but 
that one is my responsibility.

Thanks,
Shaggy

> 
> Thanks,
> Ryan
> 
>> +        !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>           BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>>       else
>>           BUG();
>
Vlastimil Babka Nov. 14, 2024, 10:09 a.m. UTC | #3
On 11/1/24 21:16, Dave Kleikamp wrote:
> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
> is no longer optimized out with a constant size, so a build bug may
> occur on a path that won't be reached.

That's rather unfortunate, the __builtin_constant_p(size) part of
kmalloc_noprof() really expects things to resolve at compile time and it
would be better to keep it that way.

I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
PAGE_SHIFT_MAX and kept it constant, instead of introducing
KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.

So if the kernel was built to support 4k to 64k, but booted as 4k, it would
still create and use kmalloc caches up to 128k. SLUB should handle that fine
(if not, please report it :)

Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the
cache size is max 64k and not 128k, but that should probably be evaluated
separately from this series.

Vlastimil

> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
> 
> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
> ---
> 
> Ryan,
> 
> Please consider incorporating this fix or something similar into your
> mm patch in the boot-time page size patches.
> 
>   include/linux/slab.h | 3 ++-
>   1 file changed, 2 insertions(+), 1 deletion(-)
> 
> diff --git a/include/linux/slab.h b/include/linux/slab.h
> index 9848296ca6ba..a4c7507ab8ec 100644
> --- a/include/linux/slab.h
> +++ b/include/linux/slab.h
> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>   	if (size <= 1024 * 1024) return 20;
>   	if (size <=  2 * 1024 * 1024) return 21;
>   
> -	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
> +	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
> +	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>   		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>   	else
>   		BUG();
Ryan Roberts Nov. 26, 2024, 12:18 p.m. UTC | #4
On 14/11/2024 10:09, Vlastimil Babka wrote:
> On 11/1/24 21:16, Dave Kleikamp wrote:
>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>> is no longer optimized out with a constant size, so a build bug may
>> occur on a path that won't be reached.
> 
> That's rather unfortunate, the __builtin_constant_p(size) part of
> kmalloc_noprof() really expects things to resolve at compile time and it
> would be better to keep it that way.
> 
> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
> PAGE_SHIFT_MAX and kept it constant, instead of introducing
> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
> 
> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
> still create and use kmalloc caches up to 128k. SLUB should handle that fine
> (if not, please report it :)

So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
whereas before it only supported up to 8K. I was trying to avoid that since I
assumed that would be costly in terms of extra memory allocated for those higher
order buckets that will never be used. But I have no idea how SLUB works in
practice. Perhaps memory for the cache is only lazily allocated so we won't see
an issue in practice?

I'm happy to make this change if you're certain it's the right approach; please
confirm.

> 
> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the
> cache size is max 64k and not 128k, but that should probably be evaluated
> separately from this series.

I'm inferring from this that perhaps there is a memory cost with having the
higher orders defined but unused.

Thanks,
Ryan

> 
> Vlastimil
> 
>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
>>
>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>> ---
>>
>> Ryan,
>>
>> Please consider incorporating this fix or something similar into your
>> mm patch in the boot-time page size patches.
>>
>>   include/linux/slab.h | 3 ++-
>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>
>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>> index 9848296ca6ba..a4c7507ab8ec 100644
>> --- a/include/linux/slab.h
>> +++ b/include/linux/slab.h
>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>>   	if (size <= 1024 * 1024) return 20;
>>   	if (size <=  2 * 1024 * 1024) return 21;
>>   
>> -	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>> +	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
>> +	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>   		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>>   	else
>>   		BUG();
>
Vlastimil Babka Nov. 26, 2024, 12:36 p.m. UTC | #5
On 11/26/24 13:18, Ryan Roberts wrote:
> On 14/11/2024 10:09, Vlastimil Babka wrote:
>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>> is no longer optimized out with a constant size, so a build bug may
>>> occur on a path that won't be reached.
>> 
>> That's rather unfortunate, the __builtin_constant_p(size) part of
>> kmalloc_noprof() really expects things to resolve at compile time and it
>> would be better to keep it that way.
>> 
>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>> 
>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>> (if not, please report it :)
> 
> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
> whereas before it only supported up to 8K. I was trying to avoid that since I
> assumed that would be costly in terms of extra memory allocated for those higher
> order buckets that will never be used. But I have no idea how SLUB works in
> practice. Perhaps memory for the cache is only lazily allocated so we won't see
> an issue in practice?

Yes the e.g. 128k slabs themselves will be lazily allocated. There will be
some overhead with the management structures (struct kmem_cache etc) but
much smaller.
To be completely honest, some extra overhead might arise when the slabs
are allocated and the user later frees those allocations. kmalloc_large()
would return them immediately, while a regular kmem_cache will keep one or
more per cpu for reuse. But if that becomes a visible problem we can tune
those caches to discard slabs more aggressively.

> I'm happy to make this change if you're certain it's the right approach; please
> confirm.

Yes it's much better option than breaking the build-time-constant part of
kmalloc_noprof().

>> 
>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the
>> cache size is max 64k and not 128k, but that should probably be evaluated
>> separately from this series.
> 
> I'm inferring from this that perhaps there is a memory cost with having the
> higher orders defined but unused.

Yeah as per above, should not be too large and we could tune it down if
necessary.

> Thanks,
> Ryan
> 
>> 
>> Vlastimil
>> 
>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
>>>
>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>>> ---
>>>
>>> Ryan,
>>>
>>> Please consider incorporating this fix or something similar into your
>>> mm patch in the boot-time page size patches.
>>>
>>>   include/linux/slab.h | 3 ++-
>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>
>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>> index 9848296ca6ba..a4c7507ab8ec 100644
>>> --- a/include/linux/slab.h
>>> +++ b/include/linux/slab.h
>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>>>   	if (size <= 1024 * 1024) return 20;
>>>   	if (size <=  2 * 1024 * 1024) return 21;
>>>   
>>> -	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>> +	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
>>> +	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>   		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>>>   	else
>>>   		BUG();
>> 
>
Ryan Roberts Nov. 26, 2024, 2:26 p.m. UTC | #6
On 26/11/2024 12:36, Vlastimil Babka wrote:
> On 11/26/24 13:18, Ryan Roberts wrote:
>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>> is no longer optimized out with a constant size, so a build bug may
>>>> occur on a path that won't be reached.
>>>
>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>> would be better to keep it that way.
>>>
>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>
>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>> (if not, please report it :)
>>
>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>> whereas before it only supported up to 8K. I was trying to avoid that since I
>> assumed that would be costly in terms of extra memory allocated for those higher
>> order buckets that will never be used. But I have no idea how SLUB works in
>> practice. Perhaps memory for the cache is only lazily allocated so we won't see
>> an issue in practice?
> 
> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be
> some overhead with the management structures (struct kmem_cache etc) but
> much smaller.
> To be completely honest, some extra overhead might arise when the slabs
> are allocated and the user later frees those allocations. kmalloc_large()
> would return them immediately, while a regular kmem_cache will keep one or
> more per cpu for reuse. But if that becomes a visible problem we can tune
> those caches to discard slabs more aggressively.
> 
>> I'm happy to make this change if you're certain it's the right approach; please
>> confirm.
> 
> Yes it's much better option than breaking the build-time-constant part of
> kmalloc_noprof().

OK, I'll take this approach as you suggest.

Thanks,
Ryan

> 
>>>
>>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the
>>> cache size is max 64k and not 128k, but that should probably be evaluated
>>> separately from this series.
>>
>> I'm inferring from this that perhaps there is a memory cost with having the
>> higher orders defined but unused.
> 
> Yeah as per above, should not be too large and we could tune it down if
> necessary.
> 
>> Thanks,
>> Ryan
>>
>>>
>>> Vlastimil
>>>
>>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
>>>>
>>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>>>> ---
>>>>
>>>> Ryan,
>>>>
>>>> Please consider incorporating this fix or something similar into your
>>>> mm patch in the boot-time page size patches.
>>>>
>>>>   include/linux/slab.h | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>>> index 9848296ca6ba..a4c7507ab8ec 100644
>>>> --- a/include/linux/slab.h
>>>> +++ b/include/linux/slab.h
>>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>>>>   	if (size <= 1024 * 1024) return 20;
>>>>   	if (size <=  2 * 1024 * 1024) return 21;
>>>>   
>>>> -	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>> +	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
>>>> +	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>>   		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>>>>   	else
>>>>   		BUG();
>>>
>>
>
Ryan Roberts Nov. 26, 2024, 2:53 p.m. UTC | #7
On 26/11/2024 12:36, Vlastimil Babka wrote:
> On 11/26/24 13:18, Ryan Roberts wrote:
>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>> is no longer optimized out with a constant size, so a build bug may
>>>> occur on a path that won't be reached.
>>>
>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>> would be better to keep it that way.
>>>
>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>
>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>> (if not, please report it :)
>>
>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>> whereas before it only supported up to 8K. I was trying to avoid that since I
>> assumed that would be costly in terms of extra memory allocated for those higher
>> order buckets that will never be used. But I have no idea how SLUB works in
>> practice. Perhaps memory for the cache is only lazily allocated so we won't see
>> an issue in practice?
> 
> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be
> some overhead with the management structures (struct kmem_cache etc) but
> much smaller.
> To be completely honest, some extra overhead might arise when the slabs
> are allocated and the user later frees those allocations. kmalloc_large()
> would return them immediately, while a regular kmem_cache will keep one or
> more per cpu for reuse. But if that becomes a visible problem we can tune
> those caches to discard slabs more aggressively.

Sorry to keep pushing on this, now that I've actually looked at the code, I feel
I have a slightly better understanding:

void *kmalloc_noprof(size_t size, gfp_t flags)
{
	if (__builtin_constant_p(size) && size) {
		
		if (size > KMALLOC_MAX_CACHE_SIZE)
			return __kmalloc_large_noprof(size, flags); <<< (1)

		index = kmalloc_index(size);
		return __kmalloc_cache_noprof(...);   <<< (2)
	}
	return __kmalloc_noprof(size, flags);   <<< (3)
}

So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
resolving either to a call to (1) or (2), decided at compile time. If
KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
need to be kept in the function.

But intuitively, I would have guessed that, given the choice between the overhead
of keeping that runtime conditional and keeping per-cpu slab caches for the extra
sizes between 16K and 128K, the runtime conditional would be preferable. I
would guess that quite a bit of memory could get tied up in those caches?

Why is your preference the opposite? What am I not understanding?


> 
>> I'm happy to make this change if you're certain it's the right approach; please
>> confirm.
> 
> Yes it's much better option than breaking the build-time-constant part of
> kmalloc_noprof().
> 
>>>
>>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the
>>> cache size is max 64k and not 128k, but that should probably be evaluated
>>> separately from this series.
>>
>> I'm inferring from this that perhaps there is a memory cost with having the
>> higher orders defined but unused.
> 
> Yeah as per above, should not be too large and we could tune it down if
> necessary.
> 
>> Thanks,
>> Ryan
>>
>>>
>>> Vlastimil
>>>
>>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
>>>>
>>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>>>> ---
>>>>
>>>> Ryan,
>>>>
>>>> Please consider incorporating this fix or something similar into your
>>>> mm patch in the boot-time page size patches.
>>>>
>>>>   include/linux/slab.h | 3 ++-
>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>
>>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>>> index 9848296ca6ba..a4c7507ab8ec 100644
>>>> --- a/include/linux/slab.h
>>>> +++ b/include/linux/slab.h
>>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>>>>   	if (size <= 1024 * 1024) return 20;
>>>>   	if (size <=  2 * 1024 * 1024) return 21;
>>>>   
>>>> -	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>> +	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
>>>> +	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>>   		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>>>>   	else
>>>>   		BUG();
>>>
>>
>
Vlastimil Babka Nov. 26, 2024, 3:09 p.m. UTC | #8
On 11/26/24 15:53, Ryan Roberts wrote:
> On 26/11/2024 12:36, Vlastimil Babka wrote:
>> On 11/26/24 13:18, Ryan Roberts wrote:
>>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>>> is no longer optimized out with a constant size, so a build bug may
>>>>> occur on a path that won't be reached.
>>>>
>>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>>> would be better to keep it that way.
>>>>
>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>>
>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>>> (if not, please report it :)
>>>
>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>>> whereas before it only supported up to 8K. I was trying to avoid that since I
>>> assumed that would be costly in terms of extra memory allocated for those higher
>>> order buckets that will never be used. But I have no idea how SLUB works in
>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see
>>> an issue in practice?
>> 
>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be
>> some overhead with the management structures (struct kmem_cache etc) but
>> much smaller.
>> To be completely honest, some extra overhead might arise when the slabs
>> are allocated and the user later frees those allocations. kmalloc_large()
>> would return them immediately, while a regular kmem_cache will keep one or
>> more per cpu for reuse. But if that becomes a visible problem we can tune
>> those caches to discard slabs more aggressively.
> 
> Sorry to keep pushing on this, now that I've actually looked at the code, I feel
> I have a slightly better understanding:
> 
> void *kmalloc_noprof(size_t size, gfp_t flags)
> {
> 	if (__builtin_constant_p(size) && size) {
> 		
> 		if (size > KMALLOC_MAX_CACHE_SIZE)
> 			return __kmalloc_large_noprof(size, flags); <<< (1)
> 
> 		index = kmalloc_index(size);
> 		return __kmalloc_cache_noprof(...);   <<< (2)
> 	}
> 	return __kmalloc_noprof(size, flags);   <<< (3)
> }
> 
> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
> resolving either to a call to (1) or (2), decided at compile time. If
> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
> need to be kept in the function.
> 
> But intuitively, I would have guessed that, given the choice between the overhead
> of keeping that runtime conditional and keeping per-cpu slab caches for the extra
> sizes between 16K and 128K, the runtime conditional would be preferable. I
> would guess that quite a bit of memory could get tied up in those caches?
> 
> Why is your preference the opposite? What am I not understanding?

+CC more slab people.

So the above is an inline function, but constructed in a way that it should,
without further inline code, reduce to:
- a call to __kmalloc_large_noprof() for a build-time constant size larger
than KMALLOC_MAX_CACHE_SIZE
- a call to __kmalloc_cache_noprof() for a build-time constant size smaller
than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with a
compile-time calculated index
- a call to __kmalloc_noprof() otherwise, for non-constant sizes

If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way
to handle it would be to #ifdef or otherwise compile away the whole "if
__builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so
we don't bloat the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading
to a choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof().

I just don't believe we would waste that much memory with caches for the extra
sizes between 16K and 128K, so I would go for that suggestion only if
proven wrong. But I wouldn't mind it that much if you chose it right away.
The solution earlier in this thread to patch __kmalloc_index() would be
worse than either of those two alternatives though.

> 
>> 
>>> I'm happy to make this change if you're certain it's the right approach; please
>>> confirm.
>> 
>> Yes it's much better option than breaking the build-time-constant part of
>> kmalloc_noprof().
>> 
>>>>
>>>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the
>>>> cache size is max 64k and not 128k, but that should probably be evaluated
>>>> separately from this series.
>>>
>>> I'm inferring from this that perhaps there is a memory cost with having the
>>> higher orders defined but unused.
>> 
>> Yeah as per above, should not be too large and we could tune it down if
>> necessary.
>> 
>>> Thanks,
>>> Ryan
>>>
>>>>
>>>> Vlastimil
>>>>
>>>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c
>>>>>
>>>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
>>>>> ---
>>>>>
>>>>> Ryan,
>>>>>
>>>>> Please consider incorporating this fix or something similar into your
>>>>> mm patch in the boot-time page size patches.
>>>>>
>>>>>   include/linux/slab.h | 3 ++-
>>>>>   1 file changed, 2 insertions(+), 1 deletion(-)
>>>>>
>>>>> diff --git a/include/linux/slab.h b/include/linux/slab.h
>>>>> index 9848296ca6ba..a4c7507ab8ec 100644
>>>>> --- a/include/linux/slab.h
>>>>> +++ b/include/linux/slab.h
>>>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
>>>>>   	if (size <= 1024 * 1024) return 20;
>>>>>   	if (size <=  2 * 1024 * 1024) return 21;
>>>>>   
>>>>> -	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>>> +	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
>>>>> +	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
>>>>>   		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
>>>>>   	else
>>>>>   		BUG();
>>>>
>>>
>> 
>
Vlastimil Babka Nov. 26, 2024, 3:27 p.m. UTC | #9
On 11/26/24 16:09, Vlastimil Babka wrote:
> On 11/26/24 15:53, Ryan Roberts wrote:
>> On 26/11/2024 12:36, Vlastimil Babka wrote:
>>> On 11/26/24 13:18, Ryan Roberts wrote:
>>>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>>>> is no longer optimized out with a constant size, so a build bug may
>>>>>> occur on a path that won't be reached.
>>>>>
>>>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>>>> would be better to keep it that way.
>>>>>
>>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>>>
>>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>>>> (if not, please report it :)
>>>>
>>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>>>> whereas before it only supported up to 8K. I was trying to avoid that since I
>>>> assumed that would be costly in terms of extra memory allocated for those higher
>>>> order buckets that will never be used. But I have no idea how SLUB works in
>>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see
>>>> an issue in practice?
>>> 
>>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be
>>> some overhead with the management structures (struct kmem_cache etc) but
>>> much smaller.
>>> To be completely honest, some extra overhead might arise when the slabs
>>> are allocated and the user later frees those allocations. kmalloc_large()
>>> would return them immediately, while a regular kmem_cache will keep one or
>>> more per cpu for reuse. But if that becomes a visible problem we can tune
>>> those caches to discard slabs more aggressively.
>> 
>> Sorry to keep pushing on this, now that I've actually looked at the code, I feel
>> I have a slightly better understanding:
>> 
>> void *kmalloc_noprof(size_t size, gfp_t flags)
>> {
>> 	if (__builtin_constant_p(size) && size) {
>> 		
>> 		if (size > KMALLOC_MAX_CACHE_SIZE)
>> 			return __kmalloc_large_noprof(size, flags); <<< (1)
>> 
>> 		index = kmalloc_index(size);
>> 		return __kmalloc_cache_noprof(...);   <<< (2)
>> 	}
>> 	return __kmalloc_noprof(size, flags);   <<< (3)
>> }
>> 
>> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
>> resolving either to a call to (1) or (2), decided at compile time. If
>> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
>> need to be kept in the function.
>> 
>> But intuitively, I would have guessed that, given the choice between the overhead
>> of keeping that runtime conditional and keeping per-cpu slab caches for the extra
>> sizes between 16K and 128K, the runtime conditional would be preferable. I
>> would guess that quite a bit of memory could get tied up in those caches?
>> 
>> Why is your preference the opposite? What am I not understanding?
> 
> +CC more slab people.
> 
> So the above is an inline function, but constructed in a way that it should,
> without further inline code, reduce to:
> - a call to __kmalloc_large_noprof() for a build-time constant size larger
> than KMALLOC_MAX_CACHE_SIZE
> - a call to __kmalloc_cache_noprof() for a build-time constant size smaller
> than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with a
> compile-time calculated index
> - a call to __kmalloc_noprof() otherwise, for non-constant sizes
> 
> If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way
> to handle it would be to #ifdef or otherwise compile away the whole "if
> __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so
> we don't bloat the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading
> to a choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof().

Or maybe we could have a PAGE_SIZE_MAX-derived KMALLOC_MAX_CACHE_SIZE_MAX
behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and
additionally have a PAGE_SIZE_MIN-derived KMALLOC_MAX_CACHE_SIZE_MIN, where a
build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is a
compile-time test) is redirected to __kmalloc_noprof() for a run-time test.

That seems like the optimum solution :)

> I just don't believe we would waste that much memory with caches for the extra
> sizes between 16K and 128K, so I would go for that suggestion only if
> proven wrong. But I wouldn't mind it that much if you chose it right away.
> The solution earlier in this thread to patch __kmalloc_index() would be
> worse than either of those two alternatives though.
Ryan Roberts Nov. 26, 2024, 3:33 p.m. UTC | #10
On 26/11/2024 15:27, Vlastimil Babka wrote:
> On 11/26/24 16:09, Vlastimil Babka wrote:
>> On 11/26/24 15:53, Ryan Roberts wrote:
>>> On 26/11/2024 12:36, Vlastimil Babka wrote:
>>>> On 11/26/24 13:18, Ryan Roberts wrote:
>>>>> On 14/11/2024 10:09, Vlastimil Babka wrote:
>>>>>> On 11/1/24 21:16, Dave Kleikamp wrote:
>>>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
>>>>>>> is no longer optimized out with a constant size, so a build bug may
>>>>>>> occur on a path that won't be reached.
>>>>>>
>>>>>> That's rather unfortunate, the __builtin_constant_p(size) part of
>>>>>> kmalloc_noprof() really expects things to resolve at compile time and it
>>>>>> would be better to keep it that way.
>>>>>>
>>>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on
>>>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing
>>>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks.
>>>>>>
>>>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would
>>>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine
>>>>>> (if not, please report it :)
>>>>>
>>>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K
>>>>> whereas before it only supported up to 8K. I was trying to avoid that since I
>>>>> assumed that would be costly in terms of extra memory allocated for those higher
>>>>> order buckets that will never be used. But I have no idea how SLUB works in
>>>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see
>>>>> an issue in practice?
>>>>
>>>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be
>>>> some overhead with the management structures (struct kmem_cache etc) but
>>>> much smaller.
>>>> To be completely honest, some extra overhead might come to be when the slabs
>>>> are allocated and later the user frees those allocations. kmalloc_large()
>>>> would return them immediately, while a regular kmem_cache will keep one or
>>>> more per cpu for reuse. But if that becomes a visible problem we can tune
>>>> those caches to discard slabs more aggressively.
>>>
>>> Sorry to keep pushing on this; now that I've actually looked at the code, I feel
>>> I have a slightly better understanding:
>>>
>>> void *kmalloc_noprof(size_t size, gfp_t flags)
>>> {
>>> 	if (__builtin_constant_p(size) && size) {
>>> 		
>>> 		if (size > KMALLOC_MAX_CACHE_SIZE)
>>> 			return __kmalloc_large_noprof(size, flags); <<< (1)
>>>
>>> 		index = kmalloc_index(size);
>>> 		return __kmalloc_cache_noprof(...);   <<< (2)
>>> 	}
>>> 	return __kmalloc_noprof(size, flags);   <<< (3)
>>> }
>>>
>>> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this
>>> resolving either to a call to (1) or (2), decided at compile time. If
>>> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional
>>> need to be kept in the function.
>>>
>>> But intuitively, I would have guessed that given the choice between the overhead
>>> of keeping that runtime conditional vs keeping per-cpu slab caches for extra
>>> sizes between 16K and 128K, then the runtime conditional would be preferable. I
>>> would guess that quite a bit of memory could get tied up in those caches?
>>>
>>> Why is your preference the opposite? What am I not understanding?
>>
>> +CC more slab people.
>>
>> So the above is an inline function, but constructed in a way that it should,
>> without further inline code, become
>> - a call to __kmalloc_large_noprof() for build-time constant size larger
>> than KMALLOC_MAX_CACHE_SIZE
>> - a call to __kmalloc_cache_noprof() for build-time constant size smaller
>> than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with
>> compile-time calculated index
>> - a call to __kmalloc_noprof() for non-constant sizes otherwise
>>
>> If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way
>> to handle it would be to #ifdef or otherwise compile away the whole "if
>> __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so
>> we don't blow the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading
>> to choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof().
> 
> Or maybe we could have PAGE_SIZE_MAX derived KMALLOC_MAX_CACHE_SIZE_MAX
> behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and
> additionally have PAGE_SIZE_MIN derived KMALLOC_MAX_CACHE_SIZE_MIN, where
> build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is a
> compile-time test) is redirected to __kmalloc_noprof() for a run-time test.
> 
> That seems like the optimum solution :)

Yes; that feels like the better approach to me. I'll implement this by default
unless anyone else objects.
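
To make the direction concrete, here is a userspace model of how a constant-size
kmalloc() call could dispatch under the proposed MIN/MAX split. This is
illustrative only: the constant values assume a kernel built to support 4K..64K
boot-time page sizes, and everything except the KMALLOC_MAX_CACHE_SIZE_MIN/_MAX
names proposed above is hypothetical.

```c
#include <stddef.h>

/*
 * Illustrative userspace model -- not kernel code. Values assume a build
 * supporting 4K..64K page sizes; only the _MIN/_MAX names come from the
 * proposal above, the rest is a sketch.
 */
#define KMALLOC_MAX_CACHE_SIZE_MIN (8u * 1024)    /* from PAGE_SIZE_MIN = 4K  */
#define KMALLOC_MAX_CACHE_SIZE_MAX (128u * 1024)  /* from PAGE_SIZE_MAX = 64K */

enum kmalloc_path {
	PATH_CACHE,    /* __kmalloc_cache_noprof(): compile-time cache index  */
	PATH_LARGE,    /* __kmalloc_large_noprof(): too big on any page size  */
	PATH_RUNTIME,  /* __kmalloc_noprof(): needs the boot-time value       */
};

/* Dispatch decision for a build-time-constant size. */
static inline enum kmalloc_path kmalloc_dispatch(size_t size)
{
	if (size > KMALLOC_MAX_CACHE_SIZE_MAX)
		return PATH_LARGE;    /* large on every supported page size  */
	if (size <= KMALLOC_MAX_CACHE_SIZE_MIN)
		return PATH_CACHE;    /* cacheable on every page size        */
	return PATH_RUNTIME;          /* grey zone: defer to a runtime test  */
}
```

Only sizes in the grey zone between the two bounds pay for the runtime
conditional; the common small-constant and huge-constant cases still resolve at
compile time as they do today.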

> 
>> I just don't believe we would waste so much memory with caches for the extra
>> sizes between 16K and 128K, so I would go with that suggestion only if
>> proven wrong. But I wouldn't mind it that much if you chose it right away.
>> The solution earlier in this thread to patch __kmalloc_index() would be
>> worse than either of those two alternatives though.
> 
>

Patch

diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9848296ca6ba..a4c7507ab8ec 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -685,7 +685,8 @@  static __always_inline unsigned int __kmalloc_index(size_t size,
  	if (size <= 1024 * 1024) return 20;
  	if (size <=  2 * 1024 * 1024) return 21;
  
-	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
+	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
+	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
  		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
  	else
  		BUG();
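
For context on the hunk above: __kmalloc_index() is a chain of compile-time
size checks ending in the BUILD_BUG_ON_MSG() being patched. A simplified
userspace model of that size-to-bucket mapping (ignoring the kernel's 96- and
192-byte special cases; the function name and sentinel here are illustrative)
might look like:

```c
#include <stddef.h>

/*
 * Simplified model of __kmalloc_index(): map a size to a power-of-two
 * kmalloc cache index, mirroring the "if (size <= N) return i;" chain in
 * include/linux/slab.h. The real function also handles the 96/192-byte
 * caches and ends in BUILD_BUG_ON_MSG()/BUG() for sizes above 2MB.
 */
static unsigned int kmalloc_index_model(size_t size)
{
	unsigned int i;

	if (size == 0)
		return 0;
	if (size <= 8)
		return 3;                /* KMALLOC_SHIFT_LOW on many configs */
	for (i = 4; i <= 21; i++)        /* 16 bytes .. 2MB, as in the chain */
		if (size <= (1ul << i))
			return i;
	return (unsigned int)-1;         /* kernel: BUILD_BUG_ON_MSG()/BUG() */
}
```

With a constant size, the compiler folds the whole chain to a single constant,
which is why the final BUILD_BUG_ON_MSG() is normally unreachable at build
time; the patched condition compiles it out when the chain can no longer be
fully folded.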