Message ID | 20240117-slab-misc-v2-1-81766907896e@bytedance.com |
---|---|
State | New |
Series | mm/slub: some minor optimization and cleanup |
On 1/23/24 10:33, Chengming Zhou wrote:
> The likely case is that we get a usable slab from the cpu partial list,
> so we can directly load its freelist and return, instead of going the
> slower path that needs more work, like re-enabling interrupts and
> rechecking.
>
> But we need to remove the "VM_BUG_ON(!new.frozen)" in get_freelist()
> to reuse it, since a cpu partial slab is not frozen. That seems
> acceptable, since it's only for debugging purposes.
>
> get_freelist() also assumes it can return NULL if the freelist is
> empty, which is not possible for the cpu partial slab case, so we add
> "VM_BUG_ON(!freelist)" after get_freelist() to make that explicit.
>
> There is a small performance improvement too, shown by:
>
>   perf bench sched messaging -g 5 -t -l 100000
>
>              mm-stable   slub-optimize
>   Total time     7.473           7.209
>
> Signed-off-by: Chengming Zhou <zhouchengming@bytedance.com>

Reviewed-by: Vlastimil Babka <vbabka@suse.cz>

I'm including the series in slab/for-6.9/optimize-get-freelist and
slab/for-next, thanks!

> ---
>  mm/slub.c | 21 +++++++++++----------
>  1 file changed, 11 insertions(+), 10 deletions(-)
>
> diff --git a/mm/slub.c b/mm/slub.c
> index 2ef88bbf56a3..fda402b2d649 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -3326,7 +3326,6 @@ static inline void *get_freelist(struct kmem_cache *s, struct slab *slab)
>  		counters = slab->counters;
>
>  		new.counters = counters;
> -		VM_BUG_ON(!new.frozen);
>
>  		new.inuse = slab->objects;
>  		new.frozen = freelist != NULL;
> @@ -3498,18 +3497,20 @@ static void *___slab_alloc(struct kmem_cache *s, gfp_t gfpflags, int node,
>
>  		slab = slub_percpu_partial(c);
>  		slub_set_percpu_partial(c, slab);
> -		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> -		stat(s, CPU_PARTIAL_ALLOC);
>
> -		if (unlikely(!node_match(slab, node) ||
> -			     !pfmemalloc_match(slab, gfpflags))) {
> -			slab->next = NULL;
> -			__put_partials(s, slab);
> -			continue;
> +		if (likely(node_match(slab, node) &&
> +			   pfmemalloc_match(slab, gfpflags))) {
> +			c->slab = slab;
> +			freelist = get_freelist(s, slab);
> +			VM_BUG_ON(!freelist);
> +			stat(s, CPU_PARTIAL_ALLOC);
> +			goto load_freelist;
>  		}
>
> -		freelist = freeze_slab(s, slab);
> -		goto retry_load_slab;
> +		local_unlock_irqrestore(&s->cpu_slab->lock, flags);
> +
> +		slab->next = NULL;
> +		__put_partials(s, slab);
>  	}
>  #endif
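To make the bookkeeping concrete, here is a minimal userspace C model of what get_freelist() does once the patch is applied. It is a sketch, not kernel code: the names `slab_model` and `get_freelist_model` are invented for illustration, and the real function performs this update inside a cmpxchg loop on the packed (freelist, counters) pair, which the model collapses into plain single-threaded stores.

```c
/* Userspace model of the SLUB freelist detach in get_freelist() after this
 * patch. Hypothetical types/names; the kernel does this via cmpxchg on the
 * packed (freelist, counters) pair, elided here. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct object { struct object *next; };

/* Stand-in for the relevant struct slab fields. */
struct slab_model {
	struct object *freelist;	/* first free object, linked via ->next */
	unsigned int inuse;		/* objects handed out (or detached) */
	unsigned int objects;		/* total objects in the slab */
	bool frozen;			/* "active cpu slab" flag */
};

/*
 * Detach the whole freelist: afterwards every object counts as in use and
 * frozen records whether a freelist was actually there. With the
 * VM_BUG_ON(!new.frozen) removed, an unfrozen cpu partial slab
 * (frozen == false on entry) is now acceptable input.
 */
static struct object *get_freelist_model(struct slab_model *slab)
{
	struct object *freelist = slab->freelist;

	slab->freelist = NULL;
	slab->inuse = slab->objects;
	slab->frozen = (freelist != NULL);
	return freelist;
}

int main(void)
{
	struct object objs[4] = {
		{ &objs[1] }, { &objs[2] }, { &objs[3] }, { NULL },
	};
	/* A cpu partial slab: not frozen, all four objects free. */
	struct slab_model slab = { &objs[0], 0, 4, false };

	struct object *fl = get_freelist_model(&slab);

	/* A cpu partial slab always has free objects, so the result must be
	 * non-NULL -- this is what the added VM_BUG_ON(!freelist) asserts. */
	printf("freelist %s, inuse=%u, frozen=%d\n",
	       fl ? "non-NULL" : "NULL", slab.inuse, slab.frozen);
	return 0;
}
```

Run against a cpu partial slab (frozen == false, all objects free), the detach leaves inuse == objects and a non-NULL return, which is why the new VM_BUG_ON(!freelist) can never fire on that path.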
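In the same hedged spirit, the sketch below models the control flow that the second hunk gives the cpu partial branch of ___slab_alloc(): the likely (matching) path conceptually keeps the local lock held and jumps straight to load_freelist, and only the mismatch path unlocks and discards the slab before retrying. The `_m`/`_model` names and the `node_ok`/`pfmemalloc_ok` booleans are stand-ins for node_match()/pfmemalloc_match(); locking appears only as comments.

```c
/* Userspace model of the reordered cpu partial branch in ___slab_alloc()
 * after this patch. Stub names are hypothetical; only the branch structure
 * mirrors the diff. */
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>

struct slab_m {
	struct slab_m *next;	/* cpu partial list linkage (slab->next) */
	bool node_ok;		/* stands in for node_match() */
	bool pfmemalloc_ok;	/* stands in for pfmemalloc_match() */
};

static struct slab_m *percpu_partial;	/* models slub_percpu_partial(c) */

static void put_partials_model(struct slab_m *slab)
{
	puts("mismatch: unlock, then __put_partials()");
	(void)slab;
}

static struct slab_m *alloc_from_partial_model(void)
{
	while (percpu_partial) {
		/* local_lock_irqsave(&s->cpu_slab->lock, flags); */
		struct slab_m *slab = percpu_partial;

		percpu_partial = slab->next;	/* slub_set_percpu_partial() */

		if (slab->node_ok && slab->pfmemalloc_ok) {
			/* likely path: c->slab = slab;
			 * freelist = get_freelist(s, slab);
			 * VM_BUG_ON(!freelist);
			 * goto load_freelist -- lock still held. */
			puts("match: load freelist, goto load_freelist");
			return slab;
		}

		/* unlikely path: drop the lock before the slower cleanup */
		/* local_unlock_irqrestore(&s->cpu_slab->lock, flags); */
		slab->next = NULL;
		put_partials_model(slab);
	}
	return NULL;	/* fall through to allocating a new slab */
}

int main(void)
{
	struct slab_m good = { NULL, true, true };
	struct slab_m bad = { &good, false, true };	/* wrong node */

	percpu_partial = &bad;	/* list: bad -> good */

	struct slab_m *slab = alloc_from_partial_model();
	printf("got %s\n", slab ? "a usable slab" : "nothing");
	return 0;
}
```

This mirrors the commit message's point: the common case no longer re-enables interrupts and rechecks via freeze_slab()/retry_load_slab; that extra work is now confined to the rare mismatch path.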