Message ID: 20190319211108.15495-1-vbabka@suse.cz (mailing list archive)
Series: guarantee natural alignment for kmalloc()
On Tue, 19 Mar 2019, Vlastimil Babka wrote:

> The recent thread [1] inspired me to look into guaranteeing alignment for
> kmalloc() for power-of-two sizes. Turns out it's not difficult and in most
> configurations nothing really changes as it happens implicitly. More details
> in the first patch. If we agree we want to do this, I will see where to
> update documentation and perhaps if there are any workarounds in the tree
> that can be converted to plain kmalloc() afterwards.

This means that the alignments are no longer uniform for all kmalloc
caches and we get back to code making all sorts of assumptions about
kmalloc alignments.

Currently all kmalloc objects are aligned to KMALLOC_MIN_ALIGN. That will
no longer be the case and alignments will become inconsistent.

I think it's valuable that alignment requirements need to be explicitly
requested.

Let's add an array of power-of-two aligned kmalloc caches if that is really
necessary. Add some GFP_XXX flag to kmalloc to make it ^2 aligned maybe?
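A minimal sketch of what that suggestion could look like, assuming invented names (__GFP_POW2_ALIGN and kmalloc_caches_aligned do not exist in the tree; kmalloc_index() and KMALLOC_SHIFT_HIGH are real, the former intended for compile-time-constant sizes):

	#include <linux/slab.h>

	/*
	 * Hypothetical sketch of the suggestion above: a parallel array of
	 * kmalloc caches created with align == size, selected by an invented
	 * allocation flag.  Both __GFP_POW2_ALIGN and kmalloc_caches_aligned
	 * are made-up names for illustration only.
	 */
	#define __GFP_POW2_ALIGN	((gfp_t)0x800000u)	/* invented flag bit */

	static struct kmem_cache *kmalloc_caches_aligned[KMALLOC_SHIFT_HIGH + 1];

	static inline void *kmalloc_maybe_aligned(size_t size, gfp_t flags)
	{
		if (flags & __GFP_POW2_ALIGN)
			return kmem_cache_alloc(
				kmalloc_caches_aligned[kmalloc_index(size)],
				flags & ~__GFP_POW2_ALIGN);
		return kmalloc(size, flags);
	}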
On Wed, 20 Mar 2019, Christopher Lameter wrote:

> > The recent thread [1] inspired me to look into guaranteeing alignment for
> > kmalloc() for power-of-two sizes. Turns out it's not difficult and in most
> > configurations nothing really changes as it happens implicitly. More
> > details in the first patch. If we agree we want to do this, I will see
> > where to update documentation and perhaps if there are any workarounds in
> > the tree that can be converted to plain kmalloc() afterwards.
>
> This means that the alignments are no longer uniform for all kmalloc
> caches and we get back to code making all sorts of assumptions about
> kmalloc alignments.
>
> Currently all kmalloc objects are aligned to KMALLOC_MIN_ALIGN. That will
> no longer be the case and alignments will become inconsistent.
>
> I think it's valuable that alignment requirements need to be explicitly
> requested.
>
> Let's add an array of power-of-two aligned kmalloc caches if that is really
> necessary. Add some GFP_XXX flag to kmalloc to make it ^2 aligned maybe?

No objection, but I think the GFP flags should remain what they are for: to
Get Free Pages. If we are to add additional flags to specify characteristics
of slab objects, can we add a kmalloc_flags() variant that will take a new
set of flags? SLAB_OBJ_ALIGN_POW2?
On 3/20/19 1:43 AM, Christopher Lameter wrote:
> On Tue, 19 Mar 2019, Vlastimil Babka wrote:
>
>> The recent thread [1] inspired me to look into guaranteeing alignment for
>> kmalloc() for power-of-two sizes. Turns out it's not difficult and in most
>> configurations nothing really changes as it happens implicitly. More
>> details in the first patch. If we agree we want to do this, I will see
>> where to update documentation and perhaps if there are any workarounds in
>> the tree that can be converted to plain kmalloc() afterwards.
>
> This means that the alignments are no longer uniform for all kmalloc
> caches and we get back to code making all sorts of assumptions about
> kmalloc alignments.

Natural alignment to size is rather well defined, no? Would anyone ever
assume a larger one, for what reason?
Right now some already make assumptions (even unknowingly) of natural
alignment. There are two 'odd' sizes, 96 and 192, which will keep cacheline
size alignment; would anyone really expect more than 64 bytes?

> Currently all kmalloc objects are aligned to KMALLOC_MIN_ALIGN. That will
> no longer be the case and alignments will become inconsistent.

KMALLOC_MIN_ALIGN is still the minimum, but in practice it's larger, which
is not a problem.
Also let me stress again that nothing really changes except for SLOB, and
SLUB with debug options. The natural alignment for power-of-two sizes
already happens, as SLAB and SLUB both allocate objects starting on the page
boundary. So people make assumptions based on that, and then break with
SLOB, or SLUB with debug. This patch just prevents that breakage by
guaranteeing those natural assumptions at all times.

> I think it's valuable that alignment requirements need to be explicitly
> requested.

That's still possible for named caches created by kmem_cache_create().

> Let's add an array of power-of-two aligned kmalloc caches if that is really
> necessary. Add some GFP_XXX flag to kmalloc to make it ^2 aligned maybe?

That's unnecessary and wasteful, as the existing caches are already aligned
in the common configurations. Requiring a flag doesn't help with the
implicit assumptions going wrong. I really don't think it needs to get more
complicated than adjusting the uncommon configuration, as this patch does.
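A sketch of the kind of implicit assumption at stake here (illustrative only, not code from the patch): a power-of-two buffer whose user expects it not to straddle a boundary of its own size, e.g. for sector I/O.

	#include <linux/slab.h>
	#include <linux/kernel.h>

	/*
	 * Illustrative sketch: a 512-byte buffer used for sector I/O that
	 * must not cross a 512-byte boundary.  With SLAB, or SLUB without
	 * debug options, this holds today because objects are packed from
	 * the page boundary; with SLOB, or SLUB with red zones/debugging,
	 * the object can be shifted and the WARN_ON can fire.
	 */
	static void *alloc_sector_buf(gfp_t gfp)
	{
		void *buf = kmalloc(512, gfp);

		/* The caller's (currently implicit) assumption: */
		WARN_ON(buf && !IS_ALIGNED((unsigned long)buf, 512));
		return buf;
	}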
On Wed, 20 Mar 2019, Vlastimil Babka wrote:

> > This means that the alignments are no longer uniform for all kmalloc
> > caches and we get back to code making all sorts of assumptions about
> > kmalloc alignments.
>
> Natural alignment to size is rather well defined, no? Would anyone ever
> assume a larger one, for what reason?
> Right now some already make assumptions (even unknowingly) of natural
> alignment. There are two 'odd' sizes, 96 and 192, which will keep cacheline
> size alignment; would anyone really expect more than 64 bytes?

I think one would expect one set of alignments for any kmalloc object.

> > Currently all kmalloc objects are aligned to KMALLOC_MIN_ALIGN. That will
> > no longer be the case and alignments will become inconsistent.
>
> KMALLOC_MIN_ALIGN is still the minimum, but in practice it's larger, which
> is not a problem.

"In practice" refers to the current way that slab allocators arrange
objects within the page. They are free to do otherwise if new ideas come
up for object arrangements etc.

The slab allocators already may have to store data in addition to the user
accessible part (f.e. for RCU or ctor). The "natural alignment" of a
power of 2 cache is no longer as you expect for these cases. Debugging is
not the only case where we extend the object.

> Also let me stress again that nothing really changes except for SLOB, and
> SLUB with debug options. The natural alignment for power-of-two sizes
> already happens, as SLAB and SLUB both allocate objects starting on the
> page boundary. So people make assumptions based on that, and then break
> with SLOB, or SLUB with debug. This patch just prevents that breakage by
> guaranteeing those natural assumptions at all times.

As explained before there is nothing "natural" here. Doing so restricts
future features and creates a mess within the allocator of exceptions for
debugging etc etc (see what happened to SLAB). "Natural" is just a
simplistic thought of a user how he would arrange power of 2 objects.
These assumptions should not be made but specified explicitly.

> > I think it's valuable that alignment requirements need to be explicitly
> > requested.
>
> That's still possible for named caches created by kmem_cache_create().

So let's leave it as it is now then.
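For reference, the explicit path both sides agree already works is kmem_cache_create(), whose third argument is the requested alignment (real API; the cache name and size below are only examples):

	#include <linux/slab.h>
	#include <linux/init.h>

	/*
	 * Explicitly requesting alignment via a named cache -- the route
	 * that works today regardless of allocator or debug options.  The
	 * name and size are illustrative.
	 */
	static struct kmem_cache *sector_cache;

	static int __init sector_cache_init(void)
	{
		/* size 512, align 512: objects never straddle a 512-byte boundary */
		sector_cache = kmem_cache_create("sector-512", 512, 512, 0, NULL);
		return sector_cache ? 0 : -ENOMEM;
	}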
On Wed, Mar 20, 2019 at 09:48:47AM +0100, Vlastimil Babka wrote:
> Natural alignment to size is rather well defined, no? Would anyone ever
> assume a larger one, for what reason?
> Right now some already make assumptions (even unknowingly) of natural
> alignment. There are two 'odd' sizes, 96 and 192, which will keep cacheline
> size alignment; would anyone really expect more than 64 bytes?

Presumably 96 will keep being aligned to 32 bytes, as aligning 96 to 64
just results in 128-byte allocations.
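(The 32-byte figure follows from packing: 96-byte objects laid out from a page boundary land at offsets 0, 96, 192, 288, ...; every such offset is a multiple of 32, but 96 itself is not a multiple of 64, so 32 bytes is the strongest alignment that packing can guarantee.)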
On 3/20/2019 7:53 PM, Matthew Wilcox wrote:
> On Wed, Mar 20, 2019 at 09:48:47AM +0100, Vlastimil Babka wrote:
>> Natural alignment to size is rather well defined, no? Would anyone ever
>> assume a larger one, for what reason?
>> Right now some already make assumptions (even unknowingly) of natural
>> alignment. There are two 'odd' sizes, 96 and 192, which will keep
>> cacheline size alignment; would anyone really expect more than 64 bytes?
>
> Presumably 96 will keep being aligned to 32 bytes, as aligning 96 to 64
> just results in 128-byte allocations.

Well, looks like that's what happens. This is with SLAB, but the alignment
calculations should be common:

slabinfo - version: 2.1
# name          <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
kmalloc-96           2611      4896       128        32            1 : tunables  120   60    8 : slabdata    153    153      0
kmalloc-128          4798      5536       128        32            1 : tunables  120   60    8 : slabdata    173    173      0
On Wed, Mar 20, 2019 at 10:48:03PM +0100, Vlastimil Babka wrote:
> On 3/20/2019 7:53 PM, Matthew Wilcox wrote:
> > On Wed, Mar 20, 2019 at 09:48:47AM +0100, Vlastimil Babka wrote:
> >> Natural alignment to size is rather well defined, no? Would anyone ever
> >> assume a larger one, for what reason?
> >> Right now some already make assumptions (even unknowingly) of natural
> >> alignment. There are two 'odd' sizes, 96 and 192, which will keep
> >> cacheline size alignment; would anyone really expect more than 64 bytes?
> >
> > Presumably 96 will keep being aligned to 32 bytes, as aligning 96 to 64
> > just results in 128-byte allocations.
>
> Well, looks like that's what happens. This is with SLAB, but the alignment
> calculations should be common:
>
> slabinfo - version: 2.1
> # name          <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
> kmalloc-96           2611      4896       128        32            1 : tunables  120   60    8 : slabdata    153    153      0
> kmalloc-128          4798      5536       128        32            1 : tunables  120   60    8 : slabdata    173    173      0

Hmm. On my laptop, I see:

kmalloc-96          28050     35364        96        42            1 : tunables    0    0    0 : slabdata    842    842      0

That'd take me from 842 * 4k pages to 1105 4k pages -- an extra megabyte of
memory.

This is running Debian's 4.19 kernel:

# CONFIG_SLAB is not set
CONFIG_SLUB=y
# CONFIG_SLOB is not set
CONFIG_SLAB_MERGE_DEFAULT=y
CONFIG_SLAB_FREELIST_RANDOM=y
CONFIG_SLAB_FREELIST_HARDENED=y
CONFIG_SLUB_CPU_PARTIAL=y
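(The arithmetic behind that estimate: a 4096-byte page holds 4096 / 96 = 42 objects at 96 bytes but only 4096 / 128 = 32 at 128 bytes, so the 35364 objects above go from 842 pages to 35364 / 32 ≈ 1105 pages; the difference of roughly 263 pages is about 1 MiB.)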
On 3/21/19 3:23 AM, Matthew Wilcox wrote:
> On Wed, Mar 20, 2019 at 10:48:03PM +0100, Vlastimil Babka wrote:
>>
>> Well, looks like that's what happens. This is with SLAB, but the alignment
>> calculations should be common:
>>
>> slabinfo - version: 2.1
>> # name          <active_objs> <num_objs> <objsize> <objperslab> <pagesperslab> : tunables <limit> <batchcount> <sharedfactor> : slabdata <active_slabs> <num_slabs> <sharedavail>
>> kmalloc-96           2611      4896       128        32            1 : tunables  120   60    8 : slabdata    153    153      0
>> kmalloc-128          4798      5536       128        32            1 : tunables  120   60    8 : slabdata    173    173      0
>
> Hmm. On my laptop, I see:
>
> kmalloc-96          28050     35364        96        42            1 : tunables    0    0    0 : slabdata    842    842      0
>
> That'd take me from 842 * 4k pages to 1105 4k pages -- an extra megabyte
> of memory.
>
> This is running Debian's 4.19 kernel:
>
> # CONFIG_SLAB is not set
> CONFIG_SLUB=y

Ah, you're right. SLAB creates kmalloc caches with:

#ifndef ARCH_KMALLOC_FLAGS
#define ARCH_KMALLOC_FLAGS SLAB_HWCACHE_ALIGN
#endif

	create_kmalloc_caches(ARCH_KMALLOC_FLAGS);

While SLUB just does:

	create_kmalloc_caches(0);

even though it uses SLAB_HWCACHE_ALIGN for the kmem_cache_node and
kmem_cache caches.
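The rounding itself happens in mm/slab_common.c's calculate_alignment(); a simplified paraphrase (from memory of the code of this era, not a verbatim copy):

	/*
	 * Simplified paraphrase of mm/slab_common.c:calculate_alignment().
	 * With SLAB_HWCACHE_ALIGN and a 64-byte cache line, a 96-byte object
	 * gets align = 64, and its size is then rounded up to
	 * ALIGN(96, 64) = 128 -- which is why SLAB's kmalloc-96 shows
	 * objsize 128 above, while SLUB (no SLAB_HWCACHE_ALIGN) keeps 96.
	 */
	static unsigned long calc_align(unsigned long flags, unsigned long align,
					unsigned long size)
	{
		if (flags & SLAB_HWCACHE_ALIGN) {
			unsigned long ralign = cache_line_size();

			/* Small objects keep a proportionally smaller alignment:
			 * halve the cache-line alignment while the object still
			 * fits in half of it. */
			while (size <= ralign / 2)
				ralign /= 2;
			align = max(align, ralign);
		}
		if (align < ARCH_SLAB_MINALIGN)
			align = ARCH_SLAB_MINALIGN;

		return ALIGN(align, sizeof(void *));
	}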
On 3/20/19 7:20 PM, Christopher Lameter wrote:
>>> Currently all kmalloc objects are aligned to KMALLOC_MIN_ALIGN. That will
>>> no longer be the case and alignments will become inconsistent.
>>
>> KMALLOC_MIN_ALIGN is still the minimum, but in practice it's larger, which
>> is not a problem.
>
> "In practice" refers to the current way that slab allocators arrange
> objects within the page. They are free to do otherwise if new ideas come
> up for object arrangements etc.
>
> The slab allocators already may have to store data in addition to the user
> accessible part (f.e. for RCU or ctor). The "natural alignment" of a
> power of 2 cache is no longer as you expect for these cases. Debugging is
> not the only case where we extend the object.

For plain kmalloc() caches, RCU and ctors don't apply, right.

>> Also let me stress again that nothing really changes except for SLOB, and
>> SLUB with debug options. The natural alignment for power-of-two sizes
>> already happens, as SLAB and SLUB both allocate objects starting on the
>> page boundary. So people make assumptions based on that, and then break
>> with SLOB, or SLUB with debug. This patch just prevents that breakage by
>> guaranteeing those natural assumptions at all times.
>
> As explained before there is nothing "natural" here. Doing so restricts
> future features

Well, future features will have to deal with the existing named caches
created with specific alignment.

> and creates a mess within the allocator of exceptions for debugging etc
> etc (see what happened to SLAB).

SLAB could be fixed, just nobody cares enough I guess. If I want to debug
wrong SL*B usage I'll use SLUB.

> "Natural" is just a simplistic thought of a user how he would arrange
> power of 2 objects. These assumptions should not be made but specified
> explicitly.

Patch 1 does this explicitly for plain kmalloc(). It's unrealistic to add
an 'align' parameter to plain kmalloc(), as that would have to create
caches on-demand for 'new' values of the align parameter.

>>> I think it's valuable that alignment requirements need to be explicitly
>>> requested.
>>
>> That's still possible for named caches created by kmem_cache_create().
>
> So let's leave it as it is now then.

That however doesn't work well for the xfs/IO case where block sizes are
not known in advance:

https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/T/#ec3a292c358d05a6b29cc4a9ce3ae6b2faf31a23f
On Thu, 21 Mar 2019, Vlastimil Babka wrote:

> That however doesn't work well for the xfs/IO case where block sizes are
> not known in advance:
>
> https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/T/#ec3a292c358d05a6b29cc4a9ce3ae6b2faf31a23f

I thought we agreed to use custom slab caches for that?
On 3/22/19 6:52 PM, Christopher Lameter wrote:
> On Thu, 21 Mar 2019, Vlastimil Babka wrote:
>
>> That however doesn't work well for the xfs/IO case where block sizes are
>> not known in advance:
>>
>> https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/T/#ec3a292c358d05a6b29cc4a9ce3ae6b2faf31a23f
>
> I thought we agreed to use custom slab caches for that?

Hm, maybe I missed something, but my impression was that the xfs/IO folks
would have to create lots of them for various sizes not known in advance,
that it wasn't practical, and that they would welcome it if kmalloc just
guaranteed the alignment. But so far they haven't chimed in here in this
thread, so I guess I'm wrong.
On Fri, Apr 05, 2019 at 07:11:17PM +0200, Vlastimil Babka wrote:
> On 3/22/19 6:52 PM, Christopher Lameter wrote:
> > On Thu, 21 Mar 2019, Vlastimil Babka wrote:
> >
> >> That however doesn't work well for the xfs/IO case where block sizes are
> >> not known in advance:
> >>
> >> https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/T/#ec3a292c358d05a6b29cc4a9ce3ae6b2faf31a23f
> >
> > I thought we agreed to use custom slab caches for that?
>
> Hm, maybe I missed something, but my impression was that the xfs/IO folks
> would have to create lots of them for various sizes not known in advance,
> that it wasn't practical, and that they would welcome it if kmalloc just
> guaranteed the alignment. But so far they haven't chimed in here in this
> thread, so I guess I'm wrong.

Yes, in XFS we might have quite a few. Never mind all the other block level
consumers that might have similar reasonable expectations but haven't
triggered the problematic drivers yet.
On 4/7/19 10:00 AM, Christoph Hellwig wrote:
> On Fri, Apr 05, 2019 at 07:11:17PM +0200, Vlastimil Babka wrote:
>> On 3/22/19 6:52 PM, Christopher Lameter wrote:
>> > On Thu, 21 Mar 2019, Vlastimil Babka wrote:
>> >
>> >> That however doesn't work well for the xfs/IO case where block sizes
>> >> are not known in advance:
>> >>
>> >> https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/T/#ec3a292c358d05a6b29cc4a9ce3ae6b2faf31a23f
>> >
>> > I thought we agreed to use custom slab caches for that?
>>
>> Hm, maybe I missed something, but my impression was that the xfs/IO folks
>> would have to create lots of them for various sizes not known in advance,
>> that it wasn't practical, and that they would welcome it if kmalloc just
>> guaranteed the alignment. But so far they haven't chimed in here in this
>> thread, so I guess I'm wrong.
>
> Yes, in XFS we might have quite a few. Never mind all the other block
> level consumers that might have similar reasonable expectations but
> haven't triggered the problematic drivers yet.

What about an LSF session/BoF to sort this out, then? Would need to have
people from all three MM+FS+IO groups, I suppose.
On Tue 09-04-19 10:07:42, Vlastimil Babka wrote:
> On 4/7/19 10:00 AM, Christoph Hellwig wrote:
> > On Fri, Apr 05, 2019 at 07:11:17PM +0200, Vlastimil Babka wrote:
> >> On 3/22/19 6:52 PM, Christopher Lameter wrote:
> >> > On Thu, 21 Mar 2019, Vlastimil Babka wrote:
> >> >
> >> >> That however doesn't work well for the xfs/IO case where block sizes
> >> >> are not known in advance:
> >> >>
> >> >> https://lore.kernel.org/linux-fsdevel/20190225040904.5557-1-ming.lei@redhat.com/T/#ec3a292c358d05a6b29cc4a9ce3ae6b2faf31a23f
> >> >
> >> > I thought we agreed to use custom slab caches for that?
> >>
> >> Hm, maybe I missed something, but my impression was that the xfs/IO
> >> folks would have to create lots of them for various sizes not known in
> >> advance, that it wasn't practical, and that they would welcome it if
> >> kmalloc just guaranteed the alignment. But so far they haven't chimed
> >> in here in this thread, so I guess I'm wrong.
> >
> > Yes, in XFS we might have quite a few. Never mind all the other block
> > level consumers that might have similar reasonable expectations but
> > haven't triggered the problematic drivers yet.
>
> What about an LSF session/BoF to sort this out, then? Would need to have
> people from all three MM+FS+IO groups, I suppose.

Sounds like a good plan. Care to send an email to the lsf-pc mailing list
so that it doesn't fall through the cracks, please?