| Field | Value |
|---|---|
| Message ID | 20220221105336.522086-6-42.hyeyoo@gmail.com (mailing list archive) |
| State | New |
| Series | slab cleanups |
On 2/21/22 11:53, Hyeonggon Yoo wrote:
> Simplify deactivate_slab() by removing variable 'lock' and replacing
> 'l' and 'm' with 'mode'. Instead, remove slab from list and unlock
> n->list_lock when cmpxchg_double() fails, and then retry.
>
> One slight functional change is releasing and taking n->list_lock again
> when cmpxchg_double() fails. This is not harmful because SLUB avoids
> deactivating slabs as much as possible.
>
> Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>

Hm I wonder if we could simplify even a bit more. Do we have to actually
place the slab on a partial (full) list before the cmpxchg, only to remove
it when cmpxchg fails? Seems it's to avoid anyone else seeing the slab
un-frozen, but not on the list, which would be unexpected. However if anyone
sees such slab, they have to take the list_lock first to start working with
the slab... so this should be safe, because we hold the list_lock here, and
will place the slab on the list before we release it. But it thus shouldn't
matter if the placement happens before or after a successful cmpxchg, no? So
we can only do it once after a successful cmpxchg and need no undo's?

Specifically AFAIK the only possible race should be with a __slab_free()
which might observe !was_frozen after we succeed an unfreezing cmpxchg and
go through the
"} else { /* Needs to be taken off a list */"
branch but then it takes the list_lock as the first thing, so will be able
to proceed only after the slab is actually on the list.

Do I miss anything or would you agree?

> ---
>  mm/slub.c | 74 +++++++++++++++++++++++++------------------------------
>  1 file changed, 33 insertions(+), 41 deletions(-)
>
> [patch diff snipped; the full diff appears at the bottom of this page]
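Vlastimil's ordering argument can be modeled in plain userspace C. In the
sketch below, a pthread mutex stands in for n->list_lock, a C11
compare-exchange stands in for cmpxchg_double_slab(), and all toy_* names
are invented for illustration; none of this is actual SLUB code. The
producer holds the lock across the unfreezing compare-exchange and inserts
only on success; the consumer models the "Needs to be taken off a list"
branch, taking the lock before it may touch the list. Compile with -pthread.

    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdbool.h>
    #include <stdio.h>

    #define FROZEN 1UL                     /* models slab->frozen */

    struct toy_slab {
            _Atomic unsigned long counters; /* models slab->counters */
            struct toy_slab *next;          /* models the list linkage */
    };

    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct toy_slab *partial_list;   /* models n->partial */

    /*
     * Producer: hold list_lock across the unfreezing compare-exchange
     * and put the slab on the list only after it succeeds, so a failed
     * attempt needs no undo, just unlock and retry.
     */
    static void toy_deactivate(struct toy_slab *slab)
    {
            unsigned long old, new;

            for (;;) {
                    old = atomic_load(&slab->counters);
                    new = old & ~FROZEN;            /* new.frozen = 0 */

                    pthread_mutex_lock(&list_lock);
                    if (atomic_compare_exchange_strong(&slab->counters,
                                                       &old, new)) {
                            slab->next = partial_list; /* publish on list */
                            partial_list = slab;
                            pthread_mutex_unlock(&list_lock);
                            return;
                    }
                    pthread_mutex_unlock(&list_lock); /* changed; retry */
            }
    }

    /*
     * Consumer: models the "Needs to be taken off a list" case. It may
     * see the slab unfrozen, but it must take list_lock before touching
     * the list, and the producer only drops that lock after inserting.
     */
    static bool toy_take_off_list(struct toy_slab *slab)
    {
            bool found = false;

            if (atomic_load(&slab->counters) & FROZEN)
                    return false;   /* frozen slabs are not on the list */

            pthread_mutex_lock(&list_lock);
            if (partial_list == slab) {     /* one-entry toy list */
                    partial_list = slab->next;
                    found = true;
            }
            pthread_mutex_unlock(&list_lock);
            return found;
    }

    int main(void)
    {
            struct toy_slab s = { .counters = FROZEN };

            toy_deactivate(&s);
            printf("taken off list: %s\n",
                   toy_take_off_list(&s) ? "yes" : "no");
            return 0;
    }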
On Thu, Feb 24, 2022 at 07:16:11PM +0100, Vlastimil Babka wrote:
> On 2/21/22 11:53, Hyeonggon Yoo wrote:
> > Simplify deactivate_slab() by removing variable 'lock' and replacing
> > 'l' and 'm' with 'mode'. Instead, remove slab from list and unlock
> > n->list_lock when cmpxchg_double() fails, and then retry.
> >
> > One slight functional change is releasing and taking n->list_lock again
> > when cmpxchg_double() fails. This is not harmful because SLUB avoids
> > deactivating slabs as much as possible.
> >
> > Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
>
> Hm I wonder if we could simplify even a bit more. Do we have to actually
> place the slab on a partial (full) list before the cmpxchg, only to remove
> it when cmpxchg fails? Seems it's to avoid anyone else seeing the slab
> un-frozen, but not on the list, which would be unexpected. However if anyone
> sees such slab, they have to take the list_lock first to start working with
> the slab... so this should be safe, because we hold the list_lock here, and
> will place the slab on the list before we release it. But it thus shouldn't
> matter if the placement happens before or after a successful cmpxchg, no? So
> we can only do it once after a successful cmpxchg and need no undo's?

My thought was similar. But after testing I noticed that &n->list_lock
prevents a race between __slab_free() and deactivate_slab().

> Specifically AFAIK the only possible race should be with a __slab_free()
> which might observe !was_frozen after we succeed an unfreezing cmpxchg and
> go through the
> "} else { /* Needs to be taken off a list */"
> branch but then it takes the list_lock as the first thing, so will be able
> to proceed only after the slab is actually on the list.
>
> Do I miss anything or would you agree?

It's so tricky.

I tried to simplify more as you said. Seeing a frozen slab on the list was
not the problem. The problem was that something might interfere between
cmpxchg_double() and taking the spinlock.

This is what I faced:

    CPU A                           CPU B
    deactivate_slab() {             __slab_free() {
        /* slab is full */
        slab.frozen = 0;
        cmpxchg_double();
                                        /* Hmm...
                                           slab->frozen == 0 &&
                                           slab->freelist != NULL?
                                           Oh, this must be on the list.. */
                                        spin_lock_irqsave();
                                        cmpxchg_double();
                                        /* Corruption: slab was not yet
                                         * inserted to list but try removing */
                                        remove_full();
                                        spin_unlock_irqrestore();
                                    }
        spin_lock_irqsave();
        add_full();
        spin_unlock_irqrestore();
    }

I think it's quite confusing because it's protecting code, not data.

Would you have an idea to solve it, or should we add a comment for this?
> > ---
> >  mm/slub.c | 74 +++++++++++++++++++++++++------------------------------
> >  1 file changed, 33 insertions(+), 41 deletions(-)
> >
> > [quoted patch diff snipped; the full diff appears at the bottom of this page]
On Fri, Feb 25, 2022 at 09:34:09AM +0000, Hyeonggon Yoo wrote:
> On Thu, Feb 24, 2022 at 07:16:11PM +0100, Vlastimil Babka wrote:
> > On 2/21/22 11:53, Hyeonggon Yoo wrote:
> > > [commit message and earlier discussion snipped]
>
> This is what I faced:
>
>     CPU A                           CPU B
>     deactivate_slab() {             __slab_free() {
>         /* slab is full */
>         slab.frozen = 0;
>         cmpxchg_double();
>                                         /* Hmm...
>                                            slab->frozen == 0 &&
>                                            slab->freelist != NULL?
>                                            Oh, this must be on the list.. */

Oh, this is wrong. slab->freelist must be NULL because it's a full slab.

It's more complex than I thought...

>                                         spin_lock_irqsave();
>                                         cmpxchg_double();
>                                         /* Corruption: slab was not yet
>                                          * inserted to list but try removing */
>                                         remove_full();
>                                         spin_unlock_irqrestore();
>                                     }
>         spin_lock_irqsave();
>         add_full();
>         spin_unlock_irqrestore();
>     }

So it was...

    CPU A                           CPU B
    deactivate_slab() {             __slab_free() {
        /* slab is full */
        slab.frozen = 0;
        cmpxchg_double();
                                        /* Hmm...
                                           !was_frozen &&
                                           prior == NULL?
                                           Let's freeze this! */
                                        put_cpu_partial();
                                    }
        spin_lock_irqsave();
        add_full();
        /* It's now frozen by CPU B and
         * at the same time on the full list */
        spin_unlock_irqrestore();
    }

And &n->list_lock prevents such a race.

> I think it's quite confusing because it's protecting code, not data.
>
> Would you have an idea to solve it, or should we add a comment for this?
> > > ---
> > >  mm/slub.c | 74 +++++++++++++++++++++++++------------------------------
> > >  1 file changed, 33 insertions(+), 41 deletions(-)
> > >
> > > [quoted patch diff snipped; the full diff appears at the bottom of this page]
>
> --
> Thank you, You are awesome!
> Hyeonggon :-)
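The ordering Hyeonggon hit can be written down the same way. The sketch
below uses the same kind of invented userspace stand-ins as earlier in
the thread (a pthread mutex for n->list_lock, a C11 compare-exchange for
cmpxchg_double()); it is not SLUB code, just the shape of the bug: the
compare-exchange publishes the unfrozen state before the lock is taken,
opening exactly the window the diagram above marks.

    #include <pthread.h>
    #include <stdatomic.h>

    #define FROZEN 1UL

    struct toy_slab {
            _Atomic unsigned long counters;
            struct toy_slab *next;
    };

    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;
    static struct toy_slab *full_list;

    /*
     * Racy shape: the unfreezing compare-exchange is done before taking
     * list_lock, so the slab is globally visible as unfrozen while it
     * is on no list and no lock is held.
     */
    static void toy_deactivate_racy(struct toy_slab *slab)
    {
            unsigned long old = atomic_load(&slab->counters);

            atomic_compare_exchange_strong(&slab->counters, &old,
                                           old & ~FROZEN);
            /*
             * Window: a concurrent __slab_free() equivalent can now
             * observe !was_frozen && prior == NULL and decide to freeze
             * the slab itself (put_cpu_partial() in the diagram above),
             * before we ever reach the lock below.
             */
            pthread_mutex_lock(&list_lock);
            slab->next = full_list;  /* inserting a slab another CPU */
            full_list = slab;        /* may already consider frozen  */
            pthread_mutex_unlock(&list_lock);
    }

    int main(void)
    {
            struct toy_slab s = { .counters = FROZEN };

            /* Single-threaded call; actually triggering the window
             * needs a second thread racing in the marked gap. */
            toy_deactivate_racy(&s);
            return 0;
    }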
On 2/25/22 10:50, Hyeonggon Yoo wrote:
> On Fri, Feb 25, 2022 at 09:34:09AM +0000, Hyeonggon Yoo wrote:
>> On Thu, Feb 24, 2022 at 07:16:11PM +0100, Vlastimil Babka wrote:
>> > [commit message and earlier discussion snipped]
>
> So it was...
>
>     CPU A                           CPU B
>     deactivate_slab() {             __slab_free() {
>         /* slab is full */
>         slab.frozen = 0;
>         cmpxchg_double();
>                                         /* Hmm...
>                                            !was_frozen &&
>                                            prior == NULL?
>                                            Let's freeze this! */
>                                         put_cpu_partial();
>                                     }
>         spin_lock_irqsave();

Yeah, in my proposal I didn't intend to only take spin_lock_irqsave() here.
My idea for CPU A would be something like:

	spin_lock_irqsave();
	slab.frozen = 0;
	if (cmpxchg_double()) {
		/* success */
		add_partial(); // (or add_full())
		spin_unlock_irqrestore();
	} else {
		/* fail */
		spin_unlock_irqrestore();
		goto redo;
	}

So we would still have the list_lock protection around the cmpxchg as in
the current code. We just wouldn't do e.g. add_partial() before the
cmpxchg, only to remove_partial() when the cmpxchg failed.

>         add_full();
>         /* It's now frozen by CPU B and
>          * at the same time on the full list */
>         spin_unlock_irqrestore();
>
> And &n->list_lock prevents such a race.
On Fri, Feb 25, 2022 at 11:07:45AM +0100, Vlastimil Babka wrote:
> On 2/25/22 10:50, Hyeonggon Yoo wrote:
> > On Fri, Feb 25, 2022 at 09:34:09AM +0000, Hyeonggon Yoo wrote:
> > > [commit message and earlier discussion snipped]
>
> Yeah, in my proposal I didn't intend to only take spin_lock_irqsave() here.
> My idea for CPU A would be something like:

Oh, I misunderstood your proposal.
I spent hours figuring out what was wrong, haha.

> 	spin_lock_irqsave();
> 	slab.frozen = 0;
> 	if (cmpxchg_double()) {
> 		/* success */
> 		add_partial(); // (or add_full())
> 		spin_unlock_irqrestore();
> 	} else {
> 		/* fail */
> 		spin_unlock_irqrestore();
> 		goto redo;
> 	}
>
> So we would still have the list_lock protection around the cmpxchg as in
> the current code. We just wouldn't do e.g. add_partial() before the
> cmpxchg, only to remove_partial() when the cmpxchg failed.

Now I got what you mean... I think that would work. Will try that.
Thank you for the nice proposal!
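As a closing illustration, here is a small runnable model of the ordering
the thread converges on, with two threads playing CPU A and CPU B. Every
name in it is an invented stand-in (a pthread mutex for n->list_lock, a
C11 compare-exchange for cmpxchg_double()), not the actual kernel code.
Because CPU A performs the unfreezing compare-exchange with the lock
already held and inserts before unlocking, CPU B can observe the unfrozen
state only while the lock is held or after the insertion is complete, so
the assert cannot fire. Compile with -pthread.

    #include <assert.h>
    #include <pthread.h>
    #include <stdatomic.h>
    #include <stdio.h>

    #define FROZEN 1UL

    struct toy_slab {
            _Atomic unsigned long counters;
            int on_list;            /* protected by list_lock */
    };

    static pthread_mutex_t list_lock = PTHREAD_MUTEX_INITIALIZER;

    /* CPU A: the agreed ordering, lock -> cmpxchg -> insert on success. */
    static void *cpu_a(void *arg)
    {
            struct toy_slab *slab = arg;
            unsigned long old, new;

            for (;;) {
                    old = atomic_load(&slab->counters);
                    new = old & ~FROZEN;

                    pthread_mutex_lock(&list_lock);
                    if (atomic_compare_exchange_strong(&slab->counters,
                                                       &old, new)) {
                            slab->on_list = 1; /* add_partial()/add_full() */
                            pthread_mutex_unlock(&list_lock);
                            return NULL;
                    }
                    pthread_mutex_unlock(&list_lock);
            }
    }

    /* CPU B: may see the slab unfrozen, but must take list_lock before
     * acting on list state, like the "taken off a list" branch. */
    static void *cpu_b(void *arg)
    {
            struct toy_slab *slab = arg;

            while (atomic_load(&slab->counters) & FROZEN)
                    ;               /* wait until CPU A unfreezes it */

            pthread_mutex_lock(&list_lock);
            assert(slab->on_list);  /* the invariant the lock buys us */
            pthread_mutex_unlock(&list_lock);
            return NULL;
    }

    int main(void)
    {
            struct toy_slab s = { .counters = FROZEN };
            pthread_t a, b;

            pthread_create(&b, NULL, cpu_b, &s);
            pthread_create(&a, NULL, cpu_a, &s);
            pthread_join(a, NULL);
            pthread_join(b, NULL);
            puts("unfrozen slab was already on the list");
            return 0;
    }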
diff --git a/mm/slub.c b/mm/slub.c
index a4964deccb61..2d0663befb9e 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2350,8 +2350,8 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
 {
 	enum slab_modes { M_NONE, M_PARTIAL, M_FULL, M_FREE };
 	struct kmem_cache_node *n = get_node(s, slab_nid(slab));
-	int lock = 0, free_delta = 0;
-	enum slab_modes l = M_NONE, m = M_NONE;
+	int free_delta = 0;
+	enum slab_modes mode = M_NONE;
 	void *nextfree, *freelist_iter, *freelist_tail;
 	int tail = DEACTIVATE_TO_HEAD;
 	unsigned long flags = 0;
@@ -2420,57 +2420,49 @@ static void deactivate_slab(struct kmem_cache *s, struct slab *slab,
 	new.frozen = 0;
 
 	if (!new.inuse && n->nr_partial >= s->min_partial)
-		m = M_FREE;
+		mode = M_FREE;
 	else if (new.freelist) {
-		m = M_PARTIAL;
-		if (!lock) {
-			lock = 1;
-			/*
-			 * Taking the spinlock removes the possibility that
-			 * acquire_slab() will see a slab that is frozen
-			 */
-			spin_lock_irqsave(&n->list_lock, flags);
-		}
-	} else {
-		m = M_FULL;
-		if (kmem_cache_debug_flags(s, SLAB_STORE_USER) && !lock) {
-			lock = 1;
-			/*
-			 * This also ensures that the scanning of full
-			 * slabs from diagnostic functions will not see
-			 * any frozen slabs.
-			 */
-			spin_lock_irqsave(&n->list_lock, flags);
-		}
+		mode = M_PARTIAL;
+		/*
+		 * Taking the spinlock removes the possibility that
+		 * acquire_slab() will see a slab that is frozen
+		 */
+		spin_lock_irqsave(&n->list_lock, flags);
+		add_partial(n, slab, tail);
+	} else if (kmem_cache_debug_flags(s, SLAB_STORE_USER)) {
+		mode = M_FULL;
+		/*
+		 * This also ensures that the scanning of full
+		 * slabs from diagnostic functions will not see
+		 * any frozen slabs.
+		 */
+		spin_lock_irqsave(&n->list_lock, flags);
+		add_full(s, n, slab);
 	}
 
-	if (l != m) {
-		if (l == M_PARTIAL)
-			remove_partial(n, slab);
-		else if (l == M_FULL)
-			remove_full(s, n, slab);
-
-		if (m == M_PARTIAL)
-			add_partial(n, slab, tail);
-		else if (m == M_FULL)
-			add_full(s, n, slab);
-	}
-
-	l = m;
 	if (!cmpxchg_double_slab(s, slab,
 				old.freelist, old.counters,
 				new.freelist, new.counters,
-				"unfreezing slab"))
+				"unfreezing slab")) {
+		if (mode == M_PARTIAL) {
+			remove_partial(n, slab);
+			spin_unlock_irqrestore(&n->list_lock, flags);
+		} else if (mode == M_FULL) {
+			remove_full(s, n, slab);
+			spin_unlock_irqrestore(&n->list_lock, flags);
+		}
 		goto redo;
+	}
 
-	if (lock)
-		spin_unlock_irqrestore(&n->list_lock, flags);
-
-	if (m == M_PARTIAL)
+	if (mode == M_PARTIAL) {
+		spin_unlock_irqrestore(&n->list_lock, flags);
 		stat(s, tail);
-	else if (m == M_FULL)
+	} else if (mode == M_FULL) {
+		spin_unlock_irqrestore(&n->list_lock, flags);
 		stat(s, DEACTIVATE_FULL);
-	else if (m == M_FREE) {
+	} else if (mode == M_FREE) {
 		stat(s, DEACTIVATE_EMPTY);
 		discard_slab(s, slab);
 		stat(s, FREE_SLAB);
Simplify deactivate_slab() by removing variable 'lock' and replacing
'l' and 'm' with 'mode'. Instead, remove the slab from the list and unlock
n->list_lock when cmpxchg_double() fails, and then retry.

One slight functional change is releasing and taking n->list_lock again
when cmpxchg_double() fails. This is not harmful because SLUB avoids
deactivating slabs as much as possible.

Signed-off-by: Hyeonggon Yoo <42.hyeyoo@gmail.com>
---
 mm/slub.c | 74 +++++++++++++++++++++++++------------------------------
 1 file changed, 33 insertions(+), 41 deletions(-)