Message ID | 20240308031325.218676-1-kent.overstreet@linux.dev (mailing list archive)
---|---
State | New
Series | mm/slub: Make __ksize() faster
On Thu, Mar 07, 2024 at 10:13:22PM -0500, Kent Overstreet wrote:
> This steals it to make __ksize() faster; it's now a single dependent
> load, instead of two. This is going to be important for tracking the
> amount of memory stranded by RCU, which we want to be able to do if
> we're going to be freeing all pagecache folios (and perhaps all folios)
> via RCU.

Do you have any measurements?  I don't necessarily object, it's just
that I would have thought we were already bringing slab->slab_cache
into cache as part of the freeing operation.
On Fri, Mar 08, 2024 at 04:48:11AM +0000, Matthew Wilcox wrote:
> On Thu, Mar 07, 2024 at 10:13:22PM -0500, Kent Overstreet wrote:
> > This steals it to make __ksize() faster; it's now a single dependent
> > load, instead of two. This is going to be important for tracking the
> > amount of memory stranded by RCU, which we want to be able to do if
> > we're going to be freeing all pagecache folios (and perhaps all folios)
> > via RCU.
>
> Do you have any measurements?  I don't necessarily object, it's just
> that I would have thought we were already bringing slab->slab_cache
> into cache as part of the freeing operation.

for kfree() yes, but for kfree_rcu() no - that'll happen significantly
later.

I don't have measurements that it /matters/, but since we have the u32
there now this seems like a good use for it.
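For readers following along, the pointer chase being discussed looks roughly like this (a simplified sketch; the _sketch types are invented here for illustration and are not kernel identifiers):

struct kmem_cache_sketch {
	unsigned int object_size;	/* size handed back by ksize() */
};

struct slab_sketch {
	struct kmem_cache_sketch *slab_cache;
	unsigned int object_size;	/* copy cached at allocation time */
};

/* Before the patch: two dependent loads - first slab->slab_cache,
 * then slab_cache->object_size, each potentially a cache miss. */
static unsigned int ksize_two_loads(const struct slab_sketch *slab)
{
	return slab->slab_cache->object_size;
}

/* After the patch: a single dependent load from struct slab itself.
 * For kfree_rcu(), struct slab is touched long after the allocation
 * went cold, so avoiding the second miss is the point of the patch. */
static unsigned int ksize_one_load(const struct slab_sketch *slab)
{
	return slab->object_size;
}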
On Fri, Mar 08, 2024 at 12:16:09AM -0500, Kent Overstreet wrote:
> On Fri, Mar 08, 2024 at 04:48:11AM +0000, Matthew Wilcox wrote:
> > On Thu, Mar 07, 2024 at 10:13:22PM -0500, Kent Overstreet wrote:
> > > This steals it to make __ksize() faster; it's now a single dependent
> > > load, instead of two. This is going to be important for tracking the
> > > amount of memory stranded by RCU, which we want to be able to do if
> > > we're going to be freeing all pagecache folios (and perhaps all folios)
> > > via RCU.
> >
> > Do you have any measurements?  I don't necessarily object, it's just
> > that I would have thought we were already bringing slab->slab_cache
> > into cache as part of the freeing operation.
>
> for kfree() yes, but for kfree_rcu() no - that'll happen significantly
> later.
>
> I don't have measurements that it /matters/, but since we have the u32
> there now this seems like a good use for it.

There are potentially better uses for those bits.  We could turn
folio_test_slab() into a PageType test, freeing up a page flag.
On Fri, Mar 08, 2024 at 02:58:48PM +0000, Matthew Wilcox wrote:
> On Fri, Mar 08, 2024 at 12:16:09AM -0500, Kent Overstreet wrote:
> > On Fri, Mar 08, 2024 at 04:48:11AM +0000, Matthew Wilcox wrote:
> > > On Thu, Mar 07, 2024 at 10:13:22PM -0500, Kent Overstreet wrote:
> > > > This steals it to make __ksize() faster; it's now a single dependent
> > > > load, instead of two. This is going to be important for tracking the
> > > > amount of memory stranded by RCU, which we want to be able to do if
> > > > we're going to be freeing all pagecache folios (and perhaps all folios)
> > > > via RCU.
> > >
> > > Do you have any measurements?  I don't necessarily object, it's just
> > > that I would have thought we were already bringing slab->slab_cache
> > > into cache as part of the freeing operation.
> >
> > for kfree() yes, but for kfree_rcu() no - that'll happen significantly
> > later.
> >
> > I don't have measurements that it /matters/, but since we have the u32
> > there now this seems like a good use for it.
>
> There are potentially better uses for those bits.  We could turn
> folio_test_slab() into a PageType test, freeing up a page flag.

They overlap _mapcount, did you figure out how to use that for a
PageType enum?
On Fri, Mar 08, 2024 at 11:27:32AM -0500, Kent Overstreet wrote:
> On Fri, Mar 08, 2024 at 02:58:48PM +0000, Matthew Wilcox wrote:
> > There are potentially better uses for those bits.  We could turn
> > folio_test_slab() into a PageType test, freeing up a page flag.
>
> They overlap _mapcount, did you figure out how to use that for a
> PageType enum?

In 2018 ... 6e292b9be7f4358985ce33ae1f59ab30a8c09e08
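For reference, the scheme that commit introduced looks roughly like this (a sketch from memory of include/linux/page-flags.h as of 2018; consult the tree for the authoritative definitions and the current set of type bits):

/* page_type overlays _mapcount, which is -1 (0xffffffff) for pages
 * that are not mapped to userspace.  Values whose top nibble is 0xf
 * can never be a legitimate mapcount, so those bit patterns are free
 * to encode a page "type" instead. */
#define PAGE_TYPE_BASE	0xf0000000
/* Low bits stay reserved so mapcount over/underflow is still caught. */
#define PAGE_MAPCOUNT_RESERVE	-128
#define PG_buddy	0x00000080
#define PG_offline	0x00000100

/* A type is set by *clearing* its bit: the test checks that the base
 * pattern is intact and the flag bit is zero. */
#define PageType(page, flag)					\
	((page->page_type & (PAGE_TYPE_BASE | flag)) == PAGE_TYPE_BASE)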
> On Fri, Mar 08, 2024 at 11:27:32AM -0500, Kent Overstreet wrote:
>> On Fri, Mar 08, 2024 at 02:58:48PM +0000, Matthew Wilcox wrote:
>>> There are potentially better uses for those bits.  We could turn
>>> folio_test_slab() into a PageType test, freeing up a page flag.
>>
>> They overlap _mapcount, did you figure out how to use that for a
>> PageType enum?
>
> In 2018 ... 6e292b9be7f4358985ce33ae1f59ab30a8c09e08
>

This seems to be a 32-bit field. We could segment that into two unsigned
shorts. In fact any operation on a slab larger than 2*PAGE_SIZE is
directly turned into a page allocator call, bypassing SLUB. So you only
need 0 ... 2 * PAGE_SIZE for the range of the int.
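A sketch of the split Christoph is proposing (hypothetical; this is not from the patch):

struct slab_field_split {
	union {
		unsigned int __unused;		/* the free u32 today */
		struct {
			/* Hypothetical: halve the field so the object size
			 * and some future user each get 16 bits.  Values up
			 * to 2 * PAGE_SIZE must then fit in 16 bits, which
			 * already fails for 64K pages - exactly what the
			 * reply below pokes at. */
			unsigned short object_size;
			unsigned short spare;
		};
	};
};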
On Fri, Mar 08, 2024 at 09:12:06AM -0800, Christoph Lameter (Ampere) wrote:
>
> > On Fri, Mar 08, 2024 at 11:27:32AM -0500, Kent Overstreet wrote:
> > > On Fri, Mar 08, 2024 at 02:58:48PM +0000, Matthew Wilcox wrote:
> > > > There are potentially better uses for those bits.  We could turn
> > > > folio_test_slab() into a PageType test, freeing up a page flag.
> > >
> > > They overlap _mapcount, did you figure out how to use that for a
> > > PageType enum?
> >
> > In 2018 ... 6e292b9be7f4358985ce33ae1f59ab30a8c09e08
> >
>
> This seems to be a 32-bit field. We could segment that into two unsigned
> shorts. In fact any operation on a slab larger than 2*PAGE_SIZE is
> directly turned into a page allocator call, bypassing SLUB. So you only
> need 0 ... 2 * PAGE_SIZE for the range of the int.

Are there any CPUs with PAGE_SIZE > 65535?  ;-)

It could, just about, be done.  Although not on Hexagon with its crazy
256kB page.  Right now, I reserve 0xf000007f to catch over/underflow,
although this is perhaps excessive and I could get away with just
0x8000007f.  That leaves 24 bits.  We've currently got 4 in use, and I
want to add two more (Slab and HugeTLB), so there's 18 bits remaining.

So is it really worth burning all the remaining bits on implementing
ksize with one fewer pointer dereference, given that struct kmem_cache
is read-mostly and should live in the CPU cache quite well?
On Fri, Mar 08, 2024 at 06:26:11PM +0000, Matthew Wilcox wrote:
> On Fri, Mar 08, 2024 at 09:12:06AM -0800, Christoph Lameter (Ampere) wrote:
> >
> > > On Fri, Mar 08, 2024 at 11:27:32AM -0500, Kent Overstreet wrote:
> > > > On Fri, Mar 08, 2024 at 02:58:48PM +0000, Matthew Wilcox wrote:
> > > > > There are potentially better uses for those bits.  We could turn
> > > > > folio_test_slab() into a PageType test, freeing up a page flag.
> > > >
> > > > They overlap _mapcount, did you figure out how to use that for a
> > > > PageType enum?
> > >
> > > In 2018 ... 6e292b9be7f4358985ce33ae1f59ab30a8c09e08
> > >
> >
> > This seems to be a 32-bit field. We could segment that into two unsigned
> > shorts. In fact any operation on a slab larger than 2*PAGE_SIZE is
> > directly turned into a page allocator call, bypassing SLUB. So you only
> > need 0 ... 2 * PAGE_SIZE for the range of the int.
>
> Are there any CPUs with PAGE_SIZE > 65535?  ;-)
>
> It could, just about, be done.  Although not on Hexagon with its crazy
> 256kB page.  Right now, I reserve 0xf000007f to catch over/underflow,
> although this is perhaps excessive and I could get away with just
> 0x8000007f.  That leaves 24 bits.  We've currently got 4 in use, and I
> want to add two more (Slab and HugeTLB), so there's 18 bits remaining.
>
> So is it really worth burning all the remaining bits on implementing
> ksize with one fewer pointer dereference, given that struct kmem_cache
> is read-mostly and should live in the CPU cache quite well?

OK, this is the first I'd seen of these PageType shenanigans :)

That definitely takes priority over shaving cycles here; but could we
clean that up and expand it? Does it really need to be a bitmask? And
really, enums should be enums (that we can give a proper type name to),
not #defines and raw ints.

If we did that, and killed PG_slab and made it part of the PageType
enum, that frees up a few bits.
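A sketch of the direction Kent is gesturing at (purely illustrative; none of this exists in the tree):

/* If no page ever needs two types at once, the field could hold an
 * enumerated value rather than a bitmask, so each new type costs one
 * value instead of one bit out of the remaining budget. */
enum page_type {
	PT_mapcount,	/* field is a mapcount, not a type */
	PT_buddy,
	PT_offline,
	PT_table,
	PT_guard,
	PT_slab,	/* would replace the PG_slab page flag */
	PT_hugetlb,
};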
On Fri, 8 Mar 2024, Matthew Wilcox wrote:

> On Fri, Mar 08, 2024 at 09:12:06AM -0800, Christoph Lameter (Ampere) wrote:
>>
>>> On Fri, Mar 08, 2024 at 11:27:32AM -0500, Kent Overstreet wrote:
>>>> On Fri, Mar 08, 2024 at 02:58:48PM +0000, Matthew Wilcox wrote:
>>>>> There are potentially better uses for those bits.  We could turn
>>>>> folio_test_slab() into a PageType test, freeing up a page flag.
>>>>
>>>> They overlap _mapcount, did you figure out how to use that for a
>>>> PageType enum?
>>>
>>> In 2018 ... 6e292b9be7f4358985ce33ae1f59ab30a8c09e08
>>>
>>
>> This seems to be a 32-bit field. We could segment that into two unsigned
>> shorts. In fact any operation on a slab larger than 2*PAGE_SIZE is
>> directly turned into a page allocator call, bypassing SLUB. So you only
>> need 0 ... 2 * PAGE_SIZE for the range of the int.
>
> Are there any CPUs with PAGE_SIZE > 65535?  ;-)

Yes. PPC has something like that, I believe. Otherwise 64K is popular on
ARM64, f.e.

> So is it really worth burning all the remaining bits on implementing
> ksize with one fewer pointer dereference, given that struct kmem_cache
> is read-mostly and should live in the CPU cache quite well?

No idea. Yes, struct kmem_cache must be cache hot anyway for operations
on a cache. So make sure to move the object size to the beginning of
kmem_cache so that it shares a cacheline with the other hot-path fields
(this should already be the case??). Then there should really be no win
from relocating the size.

But we have 32 precious bits to play around with. That is an important
discussion on what to do with them.
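For context, the field in question does sit near the top of SLUB's struct kmem_cache (a heavily trimmed sketch; the authoritative layout is in the tree and varies with config options such as CONFIG_SLUB_TINY):

/* Trimmed: hot fields, including the object size, lead the struct, so a
 * ksize() that dereferences slab->slab_cache touches the same cacheline
 * the allocator's fast path keeps warm. */
struct kmem_cache {
	struct kmem_cache_cpu __percpu *cpu_slab;
	slab_flags_t flags;
	unsigned long min_partial;
	unsigned int size;		/* object size including metadata */
	unsigned int object_size;	/* object size without metadata */
	/* ... many colder fields follow ... */
};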
diff --git a/mm/slab.h b/mm/slab.h
index 54deeb0428c6..64f06431cc97 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -84,7 +84,7 @@ struct slab {
 		};
 		struct rcu_head rcu_head;
 	};
-	unsigned int __unused;
+	unsigned int object_size;
 
 	atomic_t __page_refcount;
 #ifdef CONFIG_MEMCG
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 6ec0f6543f34..f209b8cf4965 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -963,13 +963,12 @@ size_t __ksize(const void *object)
 		if (WARN_ON(object != folio_address(folio)))
 			return 0;
 		return folio_size(folio);
-	}
-
+	} else {
 #ifdef CONFIG_SLUB_DEBUG
-	skip_orig_size_check(folio_slab(folio)->slab_cache, object);
+		skip_orig_size_check(folio_slab(folio)->slab_cache, object);
 #endif
-
-	return slab_ksize(folio_slab(folio)->slab_cache);
+		return folio_slab(folio)->object_size;
+	}
 }
 
 /**
diff --git a/mm/slub.c b/mm/slub.c
index 2ef88bbf56a3..37fe5774c110 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2366,6 +2366,7 @@ static struct slab *allocate_slab(struct kmem_cache *s, gfp_t flags, int node)
 	}
 
 	slab->objects = oo_objects(oo);
+	slab->object_size = slab_ksize(slab->slab_cache);
 	slab->inuse = 0;
 	slab->frozen = 0;
this is part of a larger in-progress patch series to add tracking of
the amount of memory stranded via RCU. I want to base this on willy's
series that rearranges folio batch freeing - that's got __folio_put()
cleanups I want. Once he's got that up in a git tree I'll do that and
finish the rest.

-- >8 --

With SLAB gone, we now have a free u32 in struct slab.

This steals it to make __ksize() faster; it's now a single dependent
load, instead of two. This is going to be important for tracking the
amount of memory stranded by RCU, which we want to be able to do if
we're going to be freeing all pagecache folios (and perhaps all folios)
via RCU.

Signed-off-by: Kent Overstreet <kent.overstreet@linux.dev>
Cc: linux-mm@kvack.org
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: Christoph Lameter <cl@linux.com>
Cc: Pekka Enberg <penberg@kernel.org>
Cc: David Rientjes <rientjes@google.com>
Cc: Roman Gushchin <roman.gushchin@linux.dev>
Cc: Hyeonggon Yoo <42.hyeyoo@gmail.com>
Cc: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 mm/slab.h        | 2 +-
 mm/slab_common.c | 9 ++++-----
 mm/slub.c        | 1 +
 3 files changed, 6 insertions(+), 6 deletions(-)
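To make the motivation concrete, the kind of accounting this patch enables might look like the following (entirely hypothetical; the counter and helpers are invented here and are not part of this patch or the kernel):

static atomic_long_t rcu_stranded_bytes;	/* hypothetical counter */

/* When an object is queued for RCU freeing, its memory is "stranded"
 * until the grace period expires.  With the size cached in struct slab,
 * charging it costs one dependent load instead of two. */
static void rcu_strand_account(const void *obj)
{
	atomic_long_add(__ksize(obj), &rcu_stranded_bytes);
}

static void rcu_strand_unaccount(const void *obj)
{
	atomic_long_sub(__ksize(obj), &rcu_stranded_bytes);
}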