| Message ID | 44312f4a-8b9c-49ce-9277-5873a94ca1bb@oracle.com (mailing list archive) |
|---|---|
| State | New |
| Series | [RFC] mm/slab: Avoid build bug for calls to kmalloc with a large constant |
On 01/11/2024 20:16, Dave Kleikamp wrote: > When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE > is no longer optimized out with a constant size, so a build bug may > occur on a path that won't be reached. > > Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c > > Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> > --- > > Ryan, > > Please consider incorporating this fix or something similar into your > mm patch in the boot-time pages size patches. > > include/linux/slab.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 9848296ca6ba..a4c7507ab8ec 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t > size, > if (size <= 1024 * 1024) return 20; > if (size <= 2 * 1024 * 1024) return 21; > > - if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) > + if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) && Thanks for the patch! I think this may be better as: if (PAGE_SHIFT_MIN == PAGE_SHIFT_MAX && Since that is independent of the architecture. Your approach wouldn't work if another arch wanted to enable boot time page size, or if arm64 dropped the Kconfig because it decided only boot time page size will be supported in future. Thanks, Ryan > + !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) > BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); > else > BUG();
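For reference, a minimal sketch of the hunk with Ryan's suggested condition in place of the Kconfig check (an illustration only, not the final patch; PAGE_SHIFT_MIN and PAGE_SHIFT_MAX are assumed to be compile-time constants provided by the boot-time page size series):

	/*
	 * Sketch: key the build-time assertion on whether the page size (and
	 * hence KMALLOC_MAX_CACHE_SIZE) is a single compile-time constant,
	 * rather than on an arm64-specific Kconfig option.
	 */
	if (PAGE_SHIFT_MIN == PAGE_SHIFT_MAX &&
	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
	else
		BUG();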
On 11/6/24 5:44AM, Ryan Roberts wrote: > On 01/11/2024 20:16, Dave Kleikamp wrote: >> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >> is no longer optimized out with a constant size, so a build bug may >> occur on a path that won't be reached. >> >> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c >> >> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> >> --- >> >> Ryan, >> >> Please consider incorporating this fix or something similar into your >> mm patch in the boot-time pages size patches. >> >> include/linux/slab.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/slab.h b/include/linux/slab.h >> index 9848296ca6ba..a4c7507ab8ec 100644 >> --- a/include/linux/slab.h >> +++ b/include/linux/slab.h >> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t >> size, >> if (size <= 1024 * 1024) return 20; >> if (size <= 2 * 1024 * 1024) return 21; >> >> - if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >> + if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) && > > Thanks for the patch! I think this may be better as: > > if (PAGE_SHIFT_MIN == PAGE_SHIFT_MAX && > > Since that is independent of the architecture. Your approach wouldn't work if > another arch wanted to enable boot time page size, or if arm64 dropped the > Kconfig because it decided only boot time page size will be supported in future. Absolutely. I may be sending some more. I haven't gotten to JFS yet, but that one is my responsibility. Thanks, Shaggy > > Thanks, > Ryan > >> + !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >> BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); >> else >> BUG(); >
On 11/1/24 21:16, Dave Kleikamp wrote: > When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE > is no longer optimized out with a constant size, so a build bug may > occur on a path that won't be reached. That's rather unfortunate; the __builtin_constant_p(size) part of kmalloc_noprof() really expects things to resolve at compile time, and it would be better to keep it that way. I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on PAGE_SHIFT_MAX and kept it constant, instead of introducing KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. So if the kernel was built to support 4k to 64k, but booted as 4k, it would still create and use kmalloc caches up to 128k. SLUB should handle that fine (if not, please report it :) Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the cache size is max 64k and not 128k, but that should probably be evaluated separately from this series. Vlastimil > Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c > > Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> > --- > > Ryan, > > Please consider incorporating this fix or something similar into your > mm patch in the boot-time pages size patches. > > include/linux/slab.h | 3 ++- > 1 file changed, 2 insertions(+), 1 deletion(-) > > diff --git a/include/linux/slab.h b/include/linux/slab.h > index 9848296ca6ba..a4c7507ab8ec 100644 > --- a/include/linux/slab.h > +++ b/include/linux/slab.h > @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size, > if (size <= 1024 * 1024) return 20; > if (size <= 2 * 1024 * 1024) return 21; > > - if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) > + if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) && > + !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) > BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); > else > BUG();
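As a rough illustration of Vlastimil's suggestion (a sketch under assumptions, not the actual series code: PAGE_SHIFT_MAX is assumed from the boot-time page size patches, and the real definitions around KMALLOC_SHIFT_HIGH in include/linux/slab.h carry more cases than shown here):

/*
 * Sketch: derive the kmalloc cache limit from the largest page size the
 * kernel can boot with, so KMALLOC_MAX_CACHE_SIZE stays a compile-time
 * constant even though PAGE_SHIFT itself is only known at boot.
 */
#define KMALLOC_SHIFT_HIGH	(PAGE_SHIFT_MAX + 1)
#define KMALLOC_MAX_CACHE_SIZE	(1UL << KMALLOC_SHIFT_HIGH)

With PAGE_SHIFT_MAX of 16 (64K pages) this yields 128K as the largest kmalloc cache, which is the "+ 1" Vlastimil suggests possibly dropping for >=64k page sizes.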
On 14/11/2024 10:09, Vlastimil Babka wrote: > On 11/1/24 21:16, Dave Kleikamp wrote: >> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >> is no longer optimized out with a constant size, so a build bug may >> occur on a path that won't be reached. > > That's rather unfortunate, the __builtin_constant_p(size) part of > kmalloc_noprof() really expects things to resolve at compile time and it > would be better to keep it that way. > > I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on > PAGE_SHIFT_MAX and kept it constant, instead of introducing > KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. > > So if the kernel was built to support 4k to 64k, but booted as 4k, it would > still create and use kmalloc caches up to 128k. SLUB should handle that fine > (if not, please report it :) So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K whereas before it only supported up to 8K. I was trying to avoid that since I assumed that would be costly in terms of extra memory allocated for those higher order buckets that will never be used. But I have no idea how SLUB works in practice. Perhaps memory for the cache is only lazily allocated so we won't see an issue in practice? I'm happy to make this change if you're certain it's the right approach; please confirm. > > Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the > cache size is max 64k and not 128k but that should be probably evaluated > separately from this series. I'm inferring from this that perhaps there is a memory cost with having the higher orders defined but unused. Thanks, Ryan > > Vlastimil > >> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c >> >> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> >> --- >> >> Ryan, >> >> Please consider incorporating this fix or something similar into your >> mm patch in the boot-time pages size patches. >> >> include/linux/slab.h | 3 ++- >> 1 file changed, 2 insertions(+), 1 deletion(-) >> >> diff --git a/include/linux/slab.h b/include/linux/slab.h >> index 9848296ca6ba..a4c7507ab8ec 100644 >> --- a/include/linux/slab.h >> +++ b/include/linux/slab.h >> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size, >> if (size <= 1024 * 1024) return 20; >> if (size <= 2 * 1024 * 1024) return 21; >> >> - if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >> + if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) && >> + !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >> BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); >> else >> BUG(); >
On 11/26/24 13:18, Ryan Roberts wrote: > On 14/11/2024 10:09, Vlastimil Babka wrote: >> On 11/1/24 21:16, Dave Kleikamp wrote: >>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >>> is no longer optimized out with a constant size, so a build bug may >>> occur on a path that won't be reached. >> >> That's rather unfortunate, the __builtin_constant_p(size) part of >> kmalloc_noprof() really expects things to resolve at compile time and it >> would be better to keep it that way. >> >> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on >> PAGE_SHIFT_MAX and kept it constant, instead of introducing >> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. >> >> So if the kernel was built to support 4k to 64k, but booted as 4k, it would >> still create and use kmalloc caches up to 128k. SLUB should handle that fine >> (if not, please report it :) > > So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K > whereas before it only supported up to 8K. I was trying to avoid that since I > assumed that would be costly in terms of extra memory allocated for those higher > order buckets that will never be used. But I have no idea how SLUB works in > practice. Perhaps memory for the cache is only lazily allocated so we won't see > an issue in practice? Yes, the e.g. 128k slabs themselves will be lazily allocated. There will be some overhead with the management structures (struct kmem_cache etc) but much smaller. To be completely honest, some extra overhead might arise when the slabs are allocated and the user later frees those allocations. kmalloc_large() would return them immediately, while a regular kmem_cache will keep one or more per cpu for reuse. But if that becomes a visible problem we can tune those caches to discard slabs more aggressively. > I'm happy to make this change if you're certain it's the right approach; please > confirm. Yes, it's a much better option than breaking the build-time-constant part of kmalloc_noprof(). >> >> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the >> cache size is max 64k and not 128k but that should be probably evaluated >> separately from this series. > > I'm inferring from this that perhaps there is a memory cost with having the > higher orders defined but unused. Yeah, as per above, it should not be too large and we could tune it down if necessary. > Thanks, > Ryan > >> >> Vlastimil >> >>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c >>> >>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> >>> --- >>> >>> Ryan, >>> >>> Please consider incorporating this fix or something similar into your >>> mm patch in the boot-time pages size patches. >>> >>> include/linux/slab.h | 3 ++- >>> 1 file changed, 2 insertions(+), 1 deletion(-) >>> >>> diff --git a/include/linux/slab.h b/include/linux/slab.h >>> index 9848296ca6ba..a4c7507ab8ec 100644 >>> --- a/include/linux/slab.h >>> +++ b/include/linux/slab.h >>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size, >>> if (size <= 1024 * 1024) return 20; >>> if (size <= 2 * 1024 * 1024) return 21; >>> >>> - if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >>> + if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) && >>> + !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >>> BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); >>> else >>> BUG(); >> >
On 26/11/2024 12:36, Vlastimil Babka wrote: > On 11/26/24 13:18, Ryan Roberts wrote: >> On 14/11/2024 10:09, Vlastimil Babka wrote: >>> On 11/1/24 21:16, Dave Kleikamp wrote: >>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >>>> is no longer optimized out with a constant size, so a build bug may >>>> occur on a path that won't be reached. >>> >>> That's rather unfortunate, the __builtin_constant_p(size) part of >>> kmalloc_noprof() really expects things to resolve at compile time and it >>> would be better to keep it that way. >>> >>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on >>> PAGE_SHIFT_MAX and kept it constant, instead of introducing >>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. >>> >>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would >>> still create and use kmalloc caches up to 128k. SLUB should handle that fine >>> (if not, please report it :) >> >> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K >> whereas before it only supported up to 8K. I was trying to avoid that since I >> assumed that would be costly in terms of extra memory allocated for those higher >> order buckets that will never be used. But I have no idea how SLUB works in >> practice. Perhaps memory for the cache is only lazily allocated so we won't see >> an issue in practice? > > Yes the e.g. 128k slabs themselves will be lazily allocated. There will be > some overhead with the management structures (struct kmem_cache etc) but > much smaller. > To be completely honest, some extra overhead might come to be when the slabs > are allocated ans later the user frees those allocations. kmalloc_large() > wwould return them immediately, while a regular kmem_cache will keep one or > more per cpu for reuse. But if that becomes a visible problem we can tune > those caches to discard slabs more aggressively. > >> I'm happy to make this change if you're certain it's the right approach; please >> confirm. > > Yes it's much better option than breaking the build-time-constant part of > kmalloc_noprof(). OK, I'll take this approach as you suggest. Thanks, Ryan > >>> >>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the >>> cache size is max 64k and not 128k but that should be probably evaluated >>> separately from this series. >> >> I'm inferring from this that perhaps there is a memory cost with having the >> higher orders defined but unused. > > Yeah as per above, should not be too large and we could tune it down if > necessary. > >> Thanks, >> Ryan >> >>> >>> Vlastimil >>> >>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c >>>> >>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> >>>> --- >>>> >>>> Ryan, >>>> >>>> Please consider incorporating this fix or something similar into your >>>> mm patch in the boot-time pages size patches. 
>>>> >>>> include/linux/slab.h | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/slab.h b/include/linux/slab.h >>>> index 9848296ca6ba..a4c7507ab8ec 100644 >>>> --- a/include/linux/slab.h >>>> +++ b/include/linux/slab.h >>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size, >>>> if (size <= 1024 * 1024) return 20; >>>> if (size <= 2 * 1024 * 1024) return 21; >>>> >>>> - if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >>>> + if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) && >>>> + !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >>>> BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); >>>> else >>>> BUG(); >>> >> >
On 26/11/2024 12:36, Vlastimil Babka wrote: > On 11/26/24 13:18, Ryan Roberts wrote: >> On 14/11/2024 10:09, Vlastimil Babka wrote: >>> On 11/1/24 21:16, Dave Kleikamp wrote: >>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >>>> is no longer optimized out with a constant size, so a build bug may >>>> occur on a path that won't be reached. >>> >>> That's rather unfortunate, the __builtin_constant_p(size) part of >>> kmalloc_noprof() really expects things to resolve at compile time and it >>> would be better to keep it that way. >>> >>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on >>> PAGE_SHIFT_MAX and kept it constant, instead of introducing >>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. >>> >>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would >>> still create and use kmalloc caches up to 128k. SLUB should handle that fine >>> (if not, please report it :) >> >> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K >> whereas before it only supported up to 8K. I was trying to avoid that since I >> assumed that would be costly in terms of extra memory allocated for those higher >> order buckets that will never be used. But I have no idea how SLUB works in >> practice. Perhaps memory for the cache is only lazily allocated so we won't see >> an issue in practice? > > Yes the e.g. 128k slabs themselves will be lazily allocated. There will be > some overhead with the management structures (struct kmem_cache etc) but > much smaller. > To be completely honest, some extra overhead might come to be when the slabs > are allocated ans later the user frees those allocations. kmalloc_large() > wwould return them immediately, while a regular kmem_cache will keep one or > more per cpu for reuse. But if that becomes a visible problem we can tune > those caches to discard slabs more aggressively. Sorry to keep pushing on this, but now that I've actually looked at the code, I feel I have a slightly better understanding:

void *kmalloc_noprof(size_t size, gfp_t flags)
{
	if (__builtin_constant_p(size) && size) {

		if (size > KMALLOC_MAX_CACHE_SIZE)
			return __kmalloc_large_noprof(size, flags);	<<< (1)

		index = kmalloc_index(size);
		return __kmalloc_cache_noprof(...);			<<< (2)
	}
	return __kmalloc_noprof(size, flags);				<<< (3)
}

So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this resolving either to a call to (1) or (2), decided at compile time. If KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional need to be kept in the function. But intuitively, I would have guessed that, given the choice between the overhead of keeping that runtime conditional vs keeping per-cpu slab caches for extra sizes between 16K and 128K, the runtime conditional would be preferable. I would guess that quite a bit of memory could get tied up in those caches? Why is your preference the opposite? What am I not understanding? > >> I'm happy to make this change if you're certain it's the right approach; please >> confirm. > > Yes it's much better option than breaking the build-time-constant part of > kmalloc_noprof(). > >>> >>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the >>> cache size is max 64k and not 128k but that should be probably evaluated >>> separately from this series. >> >> I'm inferring from this that perhaps there is a memory cost with having the >> higher orders defined but unused.
> > Yeah as per above, should not be too large and we could tune it down if > necessary. > >> Thanks, >> Ryan >> >>> >>> Vlastimil >>> >>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c >>>> >>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> >>>> --- >>>> >>>> Ryan, >>>> >>>> Please consider incorporating this fix or something similar into your >>>> mm patch in the boot-time pages size patches. >>>> >>>> include/linux/slab.h | 3 ++- >>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>> >>>> diff --git a/include/linux/slab.h b/include/linux/slab.h >>>> index 9848296ca6ba..a4c7507ab8ec 100644 >>>> --- a/include/linux/slab.h >>>> +++ b/include/linux/slab.h >>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size, >>>> if (size <= 1024 * 1024) return 20; >>>> if (size <= 2 * 1024 * 1024) return 21; >>>> >>>> - if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >>>> + if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) && >>>> + !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >>>> BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); >>>> else >>>> BUG(); >>> >> >
On 11/26/24 15:53, Ryan Roberts wrote: > On 26/11/2024 12:36, Vlastimil Babka wrote: >> On 11/26/24 13:18, Ryan Roberts wrote: >>> On 14/11/2024 10:09, Vlastimil Babka wrote: >>>> On 11/1/24 21:16, Dave Kleikamp wrote: >>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >>>>> is no longer optimized out with a constant size, so a build bug may >>>>> occur on a path that won't be reached. >>>> >>>> That's rather unfortunate, the __builtin_constant_p(size) part of >>>> kmalloc_noprof() really expects things to resolve at compile time and it >>>> would be better to keep it that way. >>>> >>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on >>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing >>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. >>>> >>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would >>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine >>>> (if not, please report it :) >>> >>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K >>> whereas before it only supported up to 8K. I was trying to avoid that since I >>> assumed that would be costly in terms of extra memory allocated for those higher >>> order buckets that will never be used. But I have no idea how SLUB works in >>> practice. Perhaps memory for the cache is only lazily allocated so we won't see >>> an issue in practice? >> >> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be >> some overhead with the management structures (struct kmem_cache etc) but >> much smaller. >> To be completely honest, some extra overhead might come to be when the slabs >> are allocated ans later the user frees those allocations. kmalloc_large() >> wwould return them immediately, while a regular kmem_cache will keep one or >> more per cpu for reuse. But if that becomes a visible problem we can tune >> those caches to discard slabs more aggressively. > > Sorry to keep pushing on this, now that I've actually looked at the code, I feel > I have a slightly better understanding: > > void *kmalloc_noprof(size_t size, gfp_t flags) > { > if (__builtin_constant_p(size) && size) { > > if (size > KMALLOC_MAX_CACHE_SIZE) > return __kmalloc_large_noprof(size, flags); <<< (1) > > index = kmalloc_index(size); > return __kmalloc_cache_noprof(...); <<< (2) > } > return __kmalloc_noprof(size, flags); <<< (3) > } > > So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this > resolving either to a call to (1) or (2), decided at compile time. If > KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional > need to be kept in the function. > > But intuatively, I would have guessed that given the choice between the overhead > of keeping that runtime conditional vs keeping per-cpu slab caches for extra > sizes between 16K and 128K, then the runtime conditional would be preferable. I > would guess that quite a bit of memory could get tied up in those caches? > > Why is your preference the opposite? What am I not understanding? +CC more slab people. 
So the above is an inline function, but constructed in a way that it should, without further inline code, become:

- a call to __kmalloc_large_noprof() for a build-time constant size larger than KMALLOC_MAX_CACHE_SIZE
- a call to __kmalloc_cache_noprof() for a build-time constant size smaller than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with a compile-time calculated index
- a call to __kmalloc_noprof() for non-constant sizes otherwise

If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way to handle it would be to #ifdef or otherwise compile away the whole "if __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so we don't blow the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading to a choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof(). I just don't believe we would waste that much memory on caches for the extra sizes between 16K and 128K, so I would go that way only if proven wrong. But I wouldn't mind it that much if you chose it right away. The solution earlier in this thread to patch __kmalloc_index() would be worse than either of those two alternatives though. > >> >>> I'm happy to make this change if you're certain it's the right approach; please >>> confirm. >> >> Yes it's much better option than breaking the build-time-constant part of >> kmalloc_noprof(). >> >>>> >>>> Maybe we could also stop adding + 1 to PAGE_SHIFT_MAX if it's >=64k, so the >>>> cache size is max 64k and not 128k but that should be probably evaluated >>>> separately from this series. >>> >>> I'm inferring from this that perhaps there is a memory cost with having the >>> higher orders defined but unused. >> >> Yeah as per above, should not be too large and we could tune it down if >> necessary. >> >>> Thanks, >>> Ryan >>> >>>> >>>> Vlastimil >>>> >>>>> Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c >>>>> >>>>> Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com> >>>>> --- >>>>> >>>>> Ryan, >>>>> >>>>> Please consider incorporating this fix or something similar into your >>>>> mm patch in the boot-time pages size patches. >>>>> >>>>> include/linux/slab.h | 3 ++- >>>>> 1 file changed, 2 insertions(+), 1 deletion(-) >>>>> >>>>> diff --git a/include/linux/slab.h b/include/linux/slab.h >>>>> index 9848296ca6ba..a4c7507ab8ec 100644 >>>>> --- a/include/linux/slab.h >>>>> +++ b/include/linux/slab.h >>>>> @@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size, >>>>> if (size <= 1024 * 1024) return 20; >>>>> if (size <= 2 * 1024 * 1024) return 21; >>>>> >>>>> - if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >>>>> + if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) && >>>>> + !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant) >>>>> BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()"); >>>>> else >>>>> BUG(); >>>> >>> >> >
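To make the #ifdef alternative concrete, a rough sketch (CONFIG_BOOT_TIME_PAGE_SIZE is a hypothetical stand-in for "PAGE_SIZE is not a compile-time constant", and the body is abbreviated as elsewhere in this thread):

void *kmalloc_noprof(size_t size, gfp_t flags)
{
#ifndef CONFIG_BOOT_TIME_PAGE_SIZE	/* hypothetical option name */
	if (__builtin_constant_p(size) && size) {
		/* constant-size fast path, exactly as today */
		...
	}
#endif
	/* with a boot-time page size, always take the out-of-line path */
	return __kmalloc_noprof(size, flags);
}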
On 11/26/24 16:09, Vlastimil Babka wrote: > On 11/26/24 15:53, Ryan Roberts wrote: >> On 26/11/2024 12:36, Vlastimil Babka wrote: >>> On 11/26/24 13:18, Ryan Roberts wrote: >>>> On 14/11/2024 10:09, Vlastimil Babka wrote: >>>>> On 11/1/24 21:16, Dave Kleikamp wrote: >>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >>>>>> is no longer optimized out with a constant size, so a build bug may >>>>>> occur on a path that won't be reached. >>>>> >>>>> That's rather unfortunate, the __builtin_constant_p(size) part of >>>>> kmalloc_noprof() really expects things to resolve at compile time and it >>>>> would be better to keep it that way. >>>>> >>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on >>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing >>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. >>>>> >>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would >>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine >>>>> (if not, please report it :) >>>> >>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K >>>> whereas before it only supported up to 8K. I was trying to avoid that since I >>>> assumed that would be costly in terms of extra memory allocated for those higher >>>> order buckets that will never be used. But I have no idea how SLUB works in >>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see >>>> an issue in practice? >>> >>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be >>> some overhead with the management structures (struct kmem_cache etc) but >>> much smaller. >>> To be completely honest, some extra overhead might come to be when the slabs >>> are allocated ans later the user frees those allocations. kmalloc_large() >>> wwould return them immediately, while a regular kmem_cache will keep one or >>> more per cpu for reuse. But if that becomes a visible problem we can tune >>> those caches to discard slabs more aggressively. >> >> Sorry to keep pushing on this, now that I've actually looked at the code, I feel >> I have a slightly better understanding: >> >> void *kmalloc_noprof(size_t size, gfp_t flags) >> { >> if (__builtin_constant_p(size) && size) { >> >> if (size > KMALLOC_MAX_CACHE_SIZE) >> return __kmalloc_large_noprof(size, flags); <<< (1) >> >> index = kmalloc_index(size); >> return __kmalloc_cache_noprof(...); <<< (2) >> } >> return __kmalloc_noprof(size, flags); <<< (3) >> } >> >> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this >> resolving either to a call to (1) or (2), decided at compile time. If >> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional >> need to be kept in the function. >> >> But intuatively, I would have guessed that given the choice between the overhead >> of keeping that runtime conditional vs keeping per-cpu slab caches for extra >> sizes between 16K and 128K, then the runtime conditional would be preferable. I >> would guess that quite a bit of memory could get tied up in those caches? >> >> Why is your preference the opposite? What am I not understanding? > > +CC more slab people. 
> > So the above is an inline function, but constructed in a way that it should, > without further inline code, become > - a call to __kmalloc_large_noprof() for build-time constant size larger > than KMALLOC_MAX_CACHE_SIZE > - a call to __kmalloc_cache_noprof() for build-time constant size smaller > than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with > compile-time calculated index > - call to __kmalloc_noprof() for non-constant sizes otherwise > > If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way > to handle it would be to #ifdef or otherwise compile out away the whole "if > __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so > we don't blow the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading > to choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof(). Or maybe we could have PAGE_SIZE_MAX derived KMALLOC_MAX_CACHE_SIZE_MAX behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and additionally have PAGE_SIZE_MIN derived KMALLOC_MAX_CACHE_SIZE_MIN, where build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is a compile-time test) is redirected to __kmalloc_noprof() for a run-time test. That seems like the optimum solution :) > I just don't believe we would waste so much memory with caches the extra > sizes for sizes between 16K and 128K, so would do that suggestion only if > proven wrong. But I wouldn't mind it that much if you chose it right away. > The solution earlier in this thread to patch __kmalloc_index() would be > worse than either of those two alternatives though.
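A rough sketch of the dispatch Vlastimil describes, using hypothetical KMALLOC_MAX_CACHE_SIZE_MIN/_MAX names derived from PAGE_SIZE_MIN/PAGE_SIZE_MAX (names assumed, not from the actual series; arguments abbreviated as in Ryan's summary above):

void *kmalloc_noprof(size_t size, gfp_t flags)
{
	if (__builtin_constant_p(size) && size) {

		/* compile-time test: too big for a kmalloc cache at any page size */
		if (size > KMALLOC_MAX_CACHE_SIZE_MAX)
			return __kmalloc_large_noprof(size, flags);

		/*
		 * compile-time test: may or may not fit depending on the
		 * boot-time page size, so defer to the out-of-line path and
		 * its run-time KMALLOC_MAX_CACHE_SIZE check
		 */
		if (size > KMALLOC_MAX_CACHE_SIZE_MIN)
			return __kmalloc_noprof(size, flags);

		/* fits a kmalloc cache whatever page size was chosen at boot */
		index = kmalloc_index(size);
		return __kmalloc_cache_noprof(...);
	}
	return __kmalloc_noprof(size, flags);
}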
On 26/11/2024 15:27, Vlastimil Babka wrote: > On 11/26/24 16:09, Vlastimil Babka wrote: >> On 11/26/24 15:53, Ryan Roberts wrote: >>> On 26/11/2024 12:36, Vlastimil Babka wrote: >>>> On 11/26/24 13:18, Ryan Roberts wrote: >>>>> On 14/11/2024 10:09, Vlastimil Babka wrote: >>>>>> On 11/1/24 21:16, Dave Kleikamp wrote: >>>>>>> When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE >>>>>>> is no longer optimized out with a constant size, so a build bug may >>>>>>> occur on a path that won't be reached. >>>>>> >>>>>> That's rather unfortunate, the __builtin_constant_p(size) part of >>>>>> kmalloc_noprof() really expects things to resolve at compile time and it >>>>>> would be better to keep it that way. >>>>>> >>>>>> I think it would be better if we based KMALLOC_MAX_CACHE_SIZE itself on >>>>>> PAGE_SHIFT_MAX and kept it constant, instead of introducing >>>>>> KMALLOC_SHIFT_HIGH_MAX only for some sanity checks. >>>>>> >>>>>> So if the kernel was built to support 4k to 64k, but booted as 4k, it would >>>>>> still create and use kmalloc caches up to 128k. SLUB should handle that fine >>>>>> (if not, please report it :) >>>>> >>>>> So when PAGE_SIZE_MAX=64K and PAGE_SIZE=4K, kmalloc will support up to 128K >>>>> whereas before it only supported up to 8K. I was trying to avoid that since I >>>>> assumed that would be costly in terms of extra memory allocated for those higher >>>>> order buckets that will never be used. But I have no idea how SLUB works in >>>>> practice. Perhaps memory for the cache is only lazily allocated so we won't see >>>>> an issue in practice? >>>> >>>> Yes the e.g. 128k slabs themselves will be lazily allocated. There will be >>>> some overhead with the management structures (struct kmem_cache etc) but >>>> much smaller. >>>> To be completely honest, some extra overhead might come to be when the slabs >>>> are allocated ans later the user frees those allocations. kmalloc_large() >>>> wwould return them immediately, while a regular kmem_cache will keep one or >>>> more per cpu for reuse. But if that becomes a visible problem we can tune >>>> those caches to discard slabs more aggressively. >>> >>> Sorry to keep pushing on this, now that I've actually looked at the code, I feel >>> I have a slightly better understanding: >>> >>> void *kmalloc_noprof(size_t size, gfp_t flags) >>> { >>> if (__builtin_constant_p(size) && size) { >>> >>> if (size > KMALLOC_MAX_CACHE_SIZE) >>> return __kmalloc_large_noprof(size, flags); <<< (1) >>> >>> index = kmalloc_index(size); >>> return __kmalloc_cache_noprof(...); <<< (2) >>> } >>> return __kmalloc_noprof(size, flags); <<< (3) >>> } >>> >>> So if size and KMALLOC_MAX_CACHE_SIZE are constant, we end up with this >>> resolving either to a call to (1) or (2), decided at compile time. If >>> KMALLOC_MAX_CACHE_SIZE is not constant, (1), (2) and the runtime conditional >>> need to be kept in the function. >>> >>> But intuatively, I would have guessed that given the choice between the overhead >>> of keeping that runtime conditional vs keeping per-cpu slab caches for extra >>> sizes between 16K and 128K, then the runtime conditional would be preferable. I >>> would guess that quite a bit of memory could get tied up in those caches? >>> >>> Why is your preference the opposite? What am I not understanding? >> >> +CC more slab people. 
>> >> So the above is an inline function, but constructed in a way that it should, >> without further inline code, become >> - a call to __kmalloc_large_noprof() for build-time constant size larger >> than KMALLOC_MAX_CACHE_SIZE >> - a call to __kmalloc_cache_noprof() for build-time constant size smaller >> than KMALLOC_MAX_CACHE_SIZE, where the cache is picked from an array with >> compile-time calculated index >> - call to __kmalloc_noprof() for non-constant sizes otherwise >> >> If KMALLOC_MAX_CACHE_SIZE stops being build-time constant, the sensible way >> to handle it would be to #ifdef or otherwise compile out away the whole "if >> __builtin_constant_p(size)" part and just call __kmalloc_noprof() always, so >> we don't blow the inline paths with a KMALLOC_MAX_CACHE_SIZE check leading >> to choice between calling __kmalloc_large_noprof() or __kmalloc_cache_noprof(). > > Or maybe we could have PAGE_SIZE_MAX derived KMALLOC_MAX_CACHE_SIZE_MAX > behave as the code above currently does with KMALLOC_MAX_CACHE_SIZE, and > additionally have PAGE_SIZE_MIN derived KMALLOC_MAX_CACHE_SIZE_MIN, where > build-time-constant size larger than KMALLOC_MAX_CACHE_SIZE_MIN (which is a > compile-time test) is redirected to __kmalloc_noprof() for a run-time test. > > That seems like the optimum solution :) Yes; that feels like the better approach to me. I'll implement this by default unless anyone else objects. > >> I just don't believe we would waste so much memory with caches the extra >> sizes for sizes between 16K and 128K, so would do that suggestion only if >> proven wrong. But I wouldn't mind it that much if you chose it right away. >> The solution earlier in this thread to patch __kmalloc_index() would be >> worse than either of those two alternatives though. > >
diff --git a/include/linux/slab.h b/include/linux/slab.h
index 9848296ca6ba..a4c7507ab8ec 100644
--- a/include/linux/slab.h
+++ b/include/linux/slab.h
@@ -685,7 +685,8 @@ static __always_inline unsigned int __kmalloc_index(size_t size,
 	if (size <= 1024 * 1024) return 20;
 	if (size <= 2 * 1024 * 1024) return 21;
 
-	if (!IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
+	if (!IS_ENABLED(CONFIG_ARM64_BOOT_TIME_PAGE_SIZE) &&
+	    !IS_ENABLED(CONFIG_PROFILE_ALL_BRANCHES) && size_is_constant)
 		BUILD_BUG_ON_MSG(1, "unexpected size in kmalloc_index()");
 	else
 		BUG();
When boot-time page size is enabled, the test against KMALLOC_MAX_CACHE_SIZE
is no longer optimized out with a constant size, so a build bug may
occur on a path that won't be reached.

Found compiling drivers/net/ethernet/qlogic/qed/qed_sriov.c

Signed-off-by: Dave Kleikamp <dave.kleikamp@oracle.com>
---

Ryan,

Please consider incorporating this fix or something similar into your
mm patch in the boot-time page size patches.

 include/linux/slab.h | 3 ++-
 1 file changed, 2 insertions(+), 1 deletion(-)