Message ID: 85afd876-d8bb-0804-b2c5-48ed3055e702@joelfernandes.org (mailing list archive)
State: New, archived
Series: How to increase rcu_head size without breaking -mm ?
On 8/25/22 04:56, Joel Fernandes wrote:
> Hi,
> I am trying to add some debug information to rcu_head for some patches,
> and need your help! At least this used to work some time ago (> 1 year).

Hmm, got a patch of how it worked back then? Because what I see (in
v5.12) in the struct page definition:

	struct rcu_head rcu_head;

is in union with the whole of

	struct {	/* slab, slob and slub */ ... }

which starts with

	union { struct list_head slab_list; ... }
	struct kmem_cache *slab_cache; /* not slob */

and there's e.g. kmem_rcu_free() in mm/slab.c that does

	page = container_of(head, struct page, rcu_head);
	cachep = page->slab_cache;

and a very similar thing in mm/slub.c rcu_free_slab() - we assume that
the rcu_head and slab_cache fields can coexist.

So this looks silently fragile to me? In case rcu_head becomes larger
than list_head, e.g. with your buf[4], it would start overlapping the
page->slab_cache field, no?

AFAICS the way it's structured now in struct slab in mm/slab.h, it makes
the overlap impossible, thus struct slab will grow as rcu_head grows, and
the offsets of the later fields stop matching their struct page
counterparts, which doesn't grow. Hmm.

> I get various SLAB_MATCH() errors with a change like this:
>
> diff --git a/include/linux/types.h b/include/linux/types.h
> index ea8cf60a8a79..daf7682a7a3b 100644
> --- a/include/linux/types.h
> +++ b/include/linux/types.h
> @@ -220,6 +220,7 @@ struct ustat {
>  struct callback_head {
>  	struct callback_head *next;
>  	void (*func)(struct callback_head *head);
> +	char buf[4];
>  } __attribute__((aligned(sizeof(void *))));
>  #define rcu_head callback_head
> ---
>
> Errors:
> mm/slab.h:67:9: note: in expansion of macro ‘static_assert’
>    67 |         static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>       |         ^~~~~~~~~~~~~
> mm/slab.h:73:1: note: in expansion of macro ‘SLAB_MATCH’
>    73 | SLAB_MATCH(_refcount, __page_refcount);
>       | ^~~~~~~~~~
>
> ----------
>
> I tried various things like increasing the struct slab or struct page
> size using dummy variables, but nothing gives.
>
> I tried to print the offsets of the struct using the compiler, but that
> requires compilation to actually succeed. The same exercise with a
> pre-processor also does not help, because nested structures could also
> include rcu_heads.
>
> If I at least knew how the offsets were off, I could hack it for my
> needs (of course upstreaming such a change is a different story, but I
> do want to upstream some variant of rcu_head debug info).
>
> Any thoughts on how we can accomplish this?
>
> Thanks!
>
>  - Joel
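To make the overlap concrete, here is a minimal self-contained sketch
(userspace C, 64-bit pointers assumed; fake_page is a stand-in for the
v5.12 layout, not kernel code). With buf[4], callback_head pads to 24
bytes while list_head stays at 16, so rcu_head reaches into slab_cache:

#include <stddef.h>

struct list_head { struct list_head *next, *prev; };

/* callback_head with the extra debug field from Joel's patch above */
struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *head);
	char buf[4];
} __attribute__((aligned(sizeof(void *))));

/* rough stand-in for the v5.12 struct page slab fields */
struct fake_page {
	unsigned long flags;
	union {
		struct {			/* slab, slob and slub */
			struct list_head slab_list;
			void *slab_cache;	/* not slob */
		};
		struct callback_head rcu_head;
	};
};

/* the 24-byte rcu_head at offset 8 now spans bytes 8..32, while
 * slab_cache occupies 24..32, so call_rcu() would scribble over it: */
_Static_assert(offsetof(struct fake_page, rcu_head) + sizeof(struct callback_head)
	       > offsetof(struct fake_page, slab_cache),
	       "rcu_head overlaps slab_cache once it outgrows list_head");

int main(void) { return 0; }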
On 8/25/22 09:13, Vlastimil Babka wrote:
> On 8/25/22 04:56, Joel Fernandes wrote:
>> Hi,
>> I am trying to add some debug information to rcu_head for some patches,
>> and need your help! At least this used to work some time ago (> 1 year).
>
> Hmm, got a patch of how it worked back then? Because what I see (in
> v5.12) in the struct page definition:
>
> 	struct rcu_head rcu_head;
>
> is in union with the whole of
>
> 	struct {	/* slab, slob and slub */ ... }
>
> which starts with
>
> 	union { struct list_head slab_list; ... }
> 	struct kmem_cache *slab_cache; /* not slob */
>
> and there's e.g. kmem_rcu_free() in mm/slab.c that does
>
> 	page = container_of(head, struct page, rcu_head);
> 	cachep = page->slab_cache;
>
> and a very similar thing in mm/slub.c rcu_free_slab() - we assume that
> the rcu_head and slab_cache fields can coexist.
>
> So this looks silently fragile to me? In case rcu_head becomes larger
> than list_head, e.g. with your buf[4], it would start overlapping the
> page->slab_cache field, no?
>
> AFAICS the way it's structured now in struct slab in mm/slab.h, it makes
> the overlap impossible, thus struct slab will grow as rcu_head grows,
> and the offsets of the later fields stop matching their struct page
> counterparts, which doesn't grow. Hmm.

The following has made things compile for both SLUB and SLAB (didn't
adjust SLOB). The idea is to state more explicitly that everything can
overlap with rcu_head except slab_cache, so the structures don't even grow
in the end. That requires moving the slab_cache field around, which would
be fine except for SLUB and its cmpxchg_double usage: it requires the
freelist+counters fields to be 128-bit aligned, and we can't fit it all
with these constraints. So that part is broken with the patch and I don't
see an easy way to fix it. But maybe it can unblock your current debugging
efforts if you use SLAB, or disable the cmpxchg_double in SLUB (at some
perf cost).
----8<----
diff --git a/include/linux/types.h b/include/linux/types.h
index ea8cf60a8a79..daf7682a7a3b 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -220,6 +220,7 @@ struct ustat {
 struct callback_head {
 	struct callback_head *next;
 	void (*func)(struct callback_head *head);
+	char buf[4];
 } __attribute__((aligned(sizeof(void *))));
 #define rcu_head callback_head
 
diff --git a/mm/slab.h b/mm/slab.h
index 4ec82bec15ec..9fdbfcc401c0 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -12,37 +12,42 @@ struct slab {
 #if defined(CONFIG_SLAB)
 
 	union {
-		struct list_head slab_list;
+		struct {
+			struct list_head slab_list;
+			void *freelist;	/* array of free object indexes */
+			void *s_mem;	/* first object */
+		};
 		struct rcu_head rcu_head;
 	};
 	struct kmem_cache *slab_cache;
-	void *freelist;	/* array of free object indexes */
-	void *s_mem;	/* first object */
 	unsigned int active;
 
 #elif defined(CONFIG_SLUB)
 
 	union {
-		struct list_head slab_list;
-		struct rcu_head rcu_head;
-#ifdef CONFIG_SLUB_CPU_PARTIAL
 		struct {
-			struct slab *next;
-			int slabs;	/* Nr of slabs left */
-		};
+			union {
+				struct list_head slab_list;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+				struct {
+					struct slab *next;
+					int slabs;	/* Nr of slabs left */
+				};
 #endif
-	};
-	struct kmem_cache *slab_cache;
-	/* Double-word boundary */
-	void *freelist;		/* first free object */
-	union {
-		unsigned long counters;
-		struct {
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
+			};
+			void *freelist;		/* first free object */
+			union {
+				unsigned long counters;
+				struct {
+					unsigned inuse:16;
+					unsigned objects:15;
+					unsigned frozen:1;
+				};
+			};
 		};
+		struct rcu_head rcu_head;
 	};
+	struct kmem_cache *slab_cache;
 	unsigned int __unused;
 
 #elif defined(CONFIG_SLOB)
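To see the cmpxchg_double problem concretely, here is a minimal sketch of
the patched SLUB layout (userspace C, 64-bit, CONFIG_SLUB_CPU_PARTIAL
ignored; slab_sketch is a hypothetical stand-in, not the kernel struct):

#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct callback_head { struct callback_head *next; void (*func)(struct callback_head *); };
struct kmem_cache;

/* the layout from the patch above: slab_cache moved after the union */
struct slab_sketch {
	unsigned long __page_flags;			/* offset 0  */
	union {
		struct {
			struct list_head slab_list;	/* offset 8  */
			void *freelist;			/* offset 24 */
			unsigned long counters;		/* offset 32 */
		};
		struct callback_head rcu_head;
	};
	struct kmem_cache *slab_cache;
	unsigned int __unused;
};

/* SLUB's lockless fast path updates freelist and counters together via
 * cmpxchg_double (cmpxchg16b on x86-64), which needs the pair to start
 * on a double-word (16-byte) boundary; here it lands at offset 24: */
_Static_assert(offsetof(struct slab_sketch, freelist) % (2 * sizeof(void *)) != 0,
	       "freelist/counters pair misaligned; cmpxchg_double cannot be used");

int main(void) { return 0; }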
On 8/25/22 09:50, Vlastimil Babka wrote:
>
> On 8/25/22 09:13, Vlastimil Babka wrote:
>> On 8/25/22 04:56, Joel Fernandes wrote:
>>> Hi,
>>> I am trying to add some debug information to rcu_head for some patches,
>>> and need your help! At least this used to work some time ago (> 1 year).
>>
>> Hmm, got a patch of how it worked back then? Because what I see (in
>> v5.12) in the struct page definition:
>>
>> 	struct rcu_head rcu_head;
>>
>> is in union with the whole of
>>
>> 	struct {	/* slab, slob and slub */ ... }
>>
>> which starts with
>>
>> 	union { struct list_head slab_list; ... }
>> 	struct kmem_cache *slab_cache; /* not slob */
>>
>> and there's e.g. kmem_rcu_free() in mm/slab.c that does
>>
>> 	page = container_of(head, struct page, rcu_head);
>> 	cachep = page->slab_cache;
>>
>> and a very similar thing in mm/slub.c rcu_free_slab() - we assume that
>> the rcu_head and slab_cache fields can coexist.
>>
>> So this looks silently fragile to me? In case rcu_head becomes larger
>> than list_head, e.g. with your buf[4], it would start overlapping the
>> page->slab_cache field, no?
>>
>> AFAICS the way it's structured now in struct slab in mm/slab.h, it makes
>> the overlap impossible, thus struct slab will grow as rcu_head grows,
>> and the offsets of the later fields stop matching their struct page
>> counterparts, which doesn't grow. Hmm.
>
> The following has made things compile for both SLUB and SLAB (didn't
> adjust SLOB). The idea is to state more explicitly that everything can
> overlap with rcu_head except slab_cache, so the structures don't even
> grow in the end. That requires moving the slab_cache field around, which
> would be fine except for SLUB and its cmpxchg_double usage: it requires
> the freelist+counters fields to be 128-bit aligned, and we can't fit it
> all with these constraints. So that part is broken with the patch and I
> don't see an easy way to fix it.

Ah, maybe I found a way. It requires that rcu_head in struct slab be at a
different offset than in struct page, and that we drop the particular
SLAB_MATCH check. I think that should be fine: we don't cast struct slab
to struct page in the rcu calls/callbacks, so the check was probably there
just for the intermediate steps of the struct slab series and can be
dropped now.

So this should allow you to use up to 32 bytes of rcu_head.
----8<----
diff --git a/include/linux/types.h b/include/linux/types.h
index ea8cf60a8a79..daf7682a7a3b 100644
--- a/include/linux/types.h
+++ b/include/linux/types.h
@@ -220,6 +220,7 @@ struct ustat {
 struct callback_head {
 	struct callback_head *next;
 	void (*func)(struct callback_head *head);
+	char buf[4];
 } __attribute__((aligned(sizeof(void *))));
 #define rcu_head callback_head
 
diff --git a/mm/slab.h b/mm/slab.h
index 4ec82bec15ec..dd49851f9814 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -12,37 +12,45 @@ struct slab {
 #if defined(CONFIG_SLAB)
 
 	union {
-		struct list_head slab_list;
+		struct {
+			struct list_head slab_list;
+			void *freelist;	/* array of free object indexes */
+			void *s_mem;	/* first object */
+		};
 		struct rcu_head rcu_head;
 	};
 	struct kmem_cache *slab_cache;
-	void *freelist;	/* array of free object indexes */
-	void *s_mem;	/* first object */
 	unsigned int active;
 
 #elif defined(CONFIG_SLUB)
 
-	union {
-		struct list_head slab_list;
-		struct rcu_head rcu_head;
-#ifdef CONFIG_SLUB_CPU_PARTIAL
-		struct {
-			struct slab *next;
-			int slabs;	/* Nr of slabs left */
-		};
-#endif
-	};
 	struct kmem_cache *slab_cache;
-	/* Double-word boundary */
-	void *freelist;		/* first free object */
+
 	union {
-		unsigned long counters;
 		struct {
-			unsigned inuse:16;
-			unsigned objects:15;
-			unsigned frozen:1;
+			union {
+				struct list_head slab_list;
+#ifdef CONFIG_SLUB_CPU_PARTIAL
+				struct {
+					struct slab *next;
+					int slabs;	/* Nr of slabs left */
+				};
+#endif
+			};
+			/* Double-word boundary */
+			void *freelist;		/* first free object */
+			union {
+				unsigned long counters;
+				struct {
+					unsigned inuse:16;
+					unsigned objects:15;
+					unsigned frozen:1;
+				};
+			};
 		};
+		struct rcu_head rcu_head;
 	};
+
 	unsigned int __unused;
 
 #elif defined(CONFIG_SLOB)
@@ -66,9 +74,13 @@ struct slab {
 #define SLAB_MATCH(pg, sl)						\
 	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
 SLAB_MATCH(flags, __page_flags);
+#ifdef CONFIG_SLUB
+SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
+#else
 SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
+#endif
 #ifndef CONFIG_SLOB
-SLAB_MATCH(rcu_head, rcu_head);
+// SLAB_MATCH(rcu_head, rcu_head);	/* not necessary, hopefully? */
 #endif
 SLAB_MATCH(_refcount, __page_refcount);
 #ifdef CONFIG_MEMCG
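The arithmetic behind "up to 32 bytes" can be checked with the same kind
of sketch (again a hypothetical stand-in struct, 64-bit,
CONFIG_SLUB_CPU_PARTIAL ignored):

#include <stddef.h>

struct list_head { struct list_head *next, *prev; };
struct callback_head { struct callback_head *next; void (*func)(struct callback_head *); };
struct kmem_cache;

/* the reworked SLUB layout: slab_cache first, the big union after it */
struct slab_sketch {
	unsigned long __page_flags;			/* offset 0  */
	struct kmem_cache *slab_cache;			/* offset 8  */
	union {
		struct {
			struct list_head slab_list;	/* offset 16 */
			/* Double-word boundary */
			void *freelist;			/* offset 32 */
			unsigned long counters;		/* offset 40 */
		};
		struct callback_head rcu_head;		/* offset 16 */
	};
	unsigned int __unused;
};

/* the cmpxchg_double constraint holds again... */
_Static_assert(offsetof(struct slab_sketch, freelist) % (2 * sizeof(void *)) == 0,
	       "freelist/counters pair back on a double-word boundary");

/* ...and rcu_head may grow to 32 bytes (offsets 16..48) before spilling
 * past the fields it is allowed to overlap; slab_cache at offset 8 stays
 * safe and lines up with page->compound_head (bit 0 of a pointer clear): */
_Static_assert(offsetof(struct slab_sketch, counters) + sizeof(unsigned long)
	       - offsetof(struct slab_sketch, rcu_head) == 32,
	       "up to 32 bytes available for rcu_head");

int main(void) { return 0; }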
On Thu, Aug 25, 2022 at 10:14:16AM +0200, Vlastimil Babka wrote:
>
> On 8/25/22 09:50, Vlastimil Babka wrote:
> >
> > On 8/25/22 09:13, Vlastimil Babka wrote:
> >> On 8/25/22 04:56, Joel Fernandes wrote:
> >>> Hi,
> >>> I am trying to add some debug information to rcu_head for some
> >>> patches, and need your help! At least this used to work some time
> >>> ago (> 1 year).
> >>
> >> Hmm, got a patch of how it worked back then? Because what I see (in
> >> v5.12) in the struct page definition:
> >>
> >> 	struct rcu_head rcu_head;
> >>
> >> is in union with the whole of
> >>
> >> 	struct {	/* slab, slob and slub */ ... }
> >>
> >> which starts with
> >>
> >> 	union { struct list_head slab_list; ... }
> >> 	struct kmem_cache *slab_cache; /* not slob */
> >>
> >> and there's e.g. kmem_rcu_free() in mm/slab.c that does
> >>
> >> 	page = container_of(head, struct page, rcu_head);
> >> 	cachep = page->slab_cache;
> >>
> >> and a very similar thing in mm/slub.c rcu_free_slab() - we assume that
> >> the rcu_head and slab_cache fields can coexist.
> >>
> >> So this looks silently fragile to me? In case rcu_head becomes larger
> >> than list_head, e.g. with your buf[4], it would start overlapping the
> >> page->slab_cache field, no?
> >>
> >> AFAICS the way it's structured now in struct slab in mm/slab.h, it
> >> makes the overlap impossible, thus struct slab will grow as rcu_head
> >> grows, and the offsets of the later fields stop matching their struct
> >> page counterparts, which doesn't grow. Hmm.
> >
> > The following has made things compile for both SLUB and SLAB (didn't
> > adjust SLOB). The idea is to state more explicitly that everything can
> > overlap with rcu_head except slab_cache, so the structures don't even
> > grow in the end. That requires moving the slab_cache field around,
> > which would be fine except for SLUB and its cmpxchg_double usage: it
> > requires the freelist+counters fields to be 128-bit aligned, and we
> > can't fit it all with these constraints. So that part is broken with
> > the patch and I don't see an easy way to fix it.
>
> Ah, maybe I found a way. It requires that rcu_head in struct slab be at
> a different offset than in struct page, and that we drop the particular
> SLAB_MATCH check. I think that should be fine: we don't cast struct slab
> to struct page in the rcu calls/callbacks, so the check was probably
> there just for the intermediate steps of the struct slab series and can
> be dropped now.
>
> So this should allow you to use up to 32 bytes of rcu_head.

Thanks a lot, Vlastimil! I can confirm it builds; some comments below:

> ----8<----
> diff --git a/include/linux/types.h b/include/linux/types.h
> index ea8cf60a8a79..daf7682a7a3b 100644
> --- a/include/linux/types.h
> +++ b/include/linux/types.h
> @@ -220,6 +220,7 @@ struct ustat {
>  struct callback_head {
>  	struct callback_head *next;
>  	void (*func)(struct callback_head *head);
> +	char buf[4];
>  } __attribute__((aligned(sizeof(void *))));
>  #define rcu_head callback_head
>
> diff --git a/mm/slab.h b/mm/slab.h
> index 4ec82bec15ec..dd49851f9814 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -12,37 +12,45 @@ struct slab {
>  #if defined(CONFIG_SLAB)
>
>  	union {
> -		struct list_head slab_list;
> +		struct {
> +			struct list_head slab_list;
> +			void *freelist;	/* array of free object indexes */
> +			void *s_mem;	/* first object */
> +		};
>  		struct rcu_head rcu_head;

Does this present any problems for SLAB_TYPESAFE_BY_RCU, such as the slab
being in use before the end of a grace period? I was concerned about the
freelist being corrupted by the rcu_head storage, since it's now in a
union with it. If this does not matter, then that's all good.

> @@ -66,9 +74,13 @@ struct slab {
>  #define SLAB_MATCH(pg, sl)						\
>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>  SLAB_MATCH(flags, __page_flags);
> +#ifdef CONFIG_SLUB
> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
> +#else
>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
> +#endif
>  #ifndef CONFIG_SLOB
> -SLAB_MATCH(rcu_head, rcu_head);
> +// SLAB_MATCH(rcu_head, rcu_head);	/* not necessary, hopefully? */

I think if it helps, I can make this check depend on a default-off debug
config option.

Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>

thanks,

 - Joel
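A gate of the kind Joel suggests might look roughly like this (sketch
only; CONFIG_RCU_HEAD_DEBUG is a hypothetical symbol, and as Vlastimil
explains below it would not be sufficient for SLUB, where the rcu_head
offset changes unconditionally):

/* mm/slab.h, sketch: keep the offset check enforced unless the rcu_head
 * debug layout is explicitly opted into */
#if !defined(CONFIG_SLOB) && !defined(CONFIG_RCU_HEAD_DEBUG)
SLAB_MATCH(rcu_head, rcu_head);
#endif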
On 8/25/22 17:00, Joel Fernandes wrote:
> On Thu, Aug 25, 2022 at 10:14:16AM +0200, Vlastimil Babka wrote:
>>
>> On 8/25/22 09:50, Vlastimil Babka wrote:
>> >
>> > On 8/25/22 09:13, Vlastimil Babka wrote:
>> >> On 8/25/22 04:56, Joel Fernandes wrote:
>> >>> Hi,
>> >>> I am trying to add some debug information to rcu_head for some
>> >>> patches, and need your help! At least this used to work some time
>> >>> ago (> 1 year).
>> >>
>> >> Hmm, got a patch of how it worked back then? Because what I see (in
>> >> v5.12) in the struct page definition:
>> >>
>> >> 	struct rcu_head rcu_head;
>> >>
>> >> is in union with the whole of
>> >>
>> >> 	struct {	/* slab, slob and slub */ ... }
>> >>
>> >> which starts with
>> >>
>> >> 	union { struct list_head slab_list; ... }
>> >> 	struct kmem_cache *slab_cache; /* not slob */
>> >>
>> >> and there's e.g. kmem_rcu_free() in mm/slab.c that does
>> >>
>> >> 	page = container_of(head, struct page, rcu_head);
>> >> 	cachep = page->slab_cache;
>> >>
>> >> and a very similar thing in mm/slub.c rcu_free_slab() - we assume
>> >> that the rcu_head and slab_cache fields can coexist.
>> >>
>> >> So this looks silently fragile to me? In case rcu_head becomes
>> >> larger than list_head, e.g. with your buf[4], it would start
>> >> overlapping the page->slab_cache field, no?
>> >>
>> >> AFAICS the way it's structured now in struct slab in mm/slab.h, it
>> >> makes the overlap impossible, thus struct slab will grow as rcu_head
>> >> grows, and the offsets of the later fields stop matching their
>> >> struct page counterparts, which doesn't grow. Hmm.
>> >
>> > The following has made things compile for both SLUB and SLAB (didn't
>> > adjust SLOB). The idea is to state more explicitly that everything
>> > can overlap with rcu_head except slab_cache, so the structures don't
>> > even grow in the end. That requires moving the slab_cache field
>> > around, which would be fine except for SLUB and its cmpxchg_double
>> > usage: it requires the freelist+counters fields to be 128-bit
>> > aligned, and we can't fit it all with these constraints. So that part
>> > is broken with the patch and I don't see an easy way to fix it.
>>
>> Ah, maybe I found a way. It requires that rcu_head in struct slab be at
>> a different offset than in struct page, and that we drop the particular
>> SLAB_MATCH check. I think that should be fine: we don't cast struct
>> slab to struct page in the rcu calls/callbacks, so the check was
>> probably there just for the intermediate steps of the struct slab
>> series and can be dropped now.
>>
>> So this should allow you to use up to 32 bytes of rcu_head.
>
> Thanks a lot, Vlastimil! I can confirm it builds; some comments below:
>
>> ----8<----
>> diff --git a/include/linux/types.h b/include/linux/types.h
>> index ea8cf60a8a79..daf7682a7a3b 100644
>> --- a/include/linux/types.h
>> +++ b/include/linux/types.h
>> @@ -220,6 +220,7 @@ struct ustat {
>>  struct callback_head {
>>  	struct callback_head *next;
>>  	void (*func)(struct callback_head *head);
>> +	char buf[4];
>>  } __attribute__((aligned(sizeof(void *))));
>>  #define rcu_head callback_head
>>
>> diff --git a/mm/slab.h b/mm/slab.h
>> index 4ec82bec15ec..dd49851f9814 100644
>> --- a/mm/slab.h
>> +++ b/mm/slab.h
>> @@ -12,37 +12,45 @@ struct slab {
>>  #if defined(CONFIG_SLAB)
>>
>>  	union {
>> -		struct list_head slab_list;
>> +		struct {
>> +			struct list_head slab_list;
>> +			void *freelist;	/* array of free object indexes */
>> +			void *s_mem;	/* first object */
>> +		};
>>  		struct rcu_head rcu_head;
>
> Does this present any problems for SLAB_TYPESAFE_BY_RCU, such as the
> slab being in use before the end of a grace period? I was concerned
> about the freelist being corrupted by the rcu_head storage, since it's
> now in a union with it. If this does not matter, then that's all good.

Good point, I didn't check that. If we're freeing the slab then nothing
should be using it at that point, including the freelist pointer. Only the
actual object storage is guaranteed not to go away or become overwritten
for SLAB_TYPESAFE_BY_RCU caches.

But we need to check the slab freeing functions themselves.
It looks like SLAB is fine: slab_destroy() there takes care to copy the
freelist pointer away before call_rcu() already, and then it only works
with the copy. Nothing should need s_mem either, in case you need more
bytes for the rcu_head debug.

SLUB seems fine too, as the freeing doesn't try to look at the freelist
even with the debugging consistency checks enabled. It could be a problem
in case of a double-free, but that's already a problem anyway.
In case you need more than the 8 bytes of the freelist pointer, then
overwriting the next 'counters' union would be a problem for the
consistency checks, as they read slab->objects from there.
But I guess we could do those checks before call_rcu(), so the callback
would then only free the slab page. The state expected by those checks
doesn't depend on the grace period expiring.

>> @@ -66,9 +74,13 @@ struct slab {
>>  #define SLAB_MATCH(pg, sl)						\
>>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>>  SLAB_MATCH(flags, __page_flags);
>> +#ifdef CONFIG_SLUB
>> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
>> +#else
>>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
>> +#endif
>>  #ifndef CONFIG_SLOB
>> -SLAB_MATCH(rcu_head, rcu_head);
>> +// SLAB_MATCH(rcu_head, rcu_head);	/* not necessary, hopefully? */
>
> I think if it helps, I can make this check depend on a default-off debug
> config option.

It wouldn't help: with this patch, for SLUB the rcu_head would start at a
different offset regardless of whether you enable the option to increase
its size.

> Tested-by: Joel Fernandes (Google) <joel@joelfernandes.org>
>
> thanks,
>
>  - Joel
>
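The copy-before-call_rcu() pattern Vlastimil describes, as a
self-contained sketch (userspace C; fake_call_rcu(), fake_slab and the
free path are stand-ins for call_rcu(), struct slab and the mm/slab.c
code, not the actual kernel implementation):

#include <stddef.h>
#include <stdlib.h>

struct callback_head {
	struct callback_head *next;
	void (*func)(struct callback_head *head);
};

struct fake_slab {
	union {
		struct {
			void *freelist;	/* shares storage with rcu_head */
			void *s_mem;
		};
		struct callback_head rcu_head;
	};
	void *slab_cache;	/* outside the union: stays valid */
};

/* stand-in for call_rcu(): the kernel invokes the callback only after a
 * grace period; calling it directly keeps the sketch self-contained */
static void fake_call_rcu(struct callback_head *head,
			  void (*func)(struct callback_head *))
{
	head->next = NULL;	/* the RCU machinery clobbers this storage... */
	head->func = func;
	func(head);
}

static void rcu_free_cb(struct callback_head *head)
{
	/* ...so only fields outside the union are still usable here */
	struct fake_slab *slab = (struct fake_slab *)
		((char *)head - offsetof(struct fake_slab, rcu_head));
	free(slab);	/* the kernel frees the pages via slab->slab_cache */
}

static void fake_slab_destroy(struct fake_slab *slab)
{
	void *freelist = slab->freelist;	/* copy before handing to RCU */
	(void)freelist;	/* teardown works only with this copy from here on */
	fake_call_rcu(&slab->rcu_head, rcu_free_cb);
}

int main(void)
{
	fake_slab_destroy(calloc(1, sizeof(struct fake_slab)));
	return 0;
}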
On Thu, Aug 25, 2022 at 05:33:49PM +0200, Vlastimil Babka wrote:
> On 8/25/22 17:00, Joel Fernandes wrote:
> > On Thu, Aug 25, 2022 at 10:14:16AM +0200, Vlastimil Babka wrote:
> >>
> >> On 8/25/22 09:50, Vlastimil Babka wrote:
> >> >
> >> > On 8/25/22 09:13, Vlastimil Babka wrote:
> >> >> On 8/25/22 04:56, Joel Fernandes wrote:
> >> >>> Hi,
> >> >>> I am trying to add some debug information to rcu_head for some
> >> >>> patches, and need your help! At least this used to work some time
> >> >>> ago (> 1 year).
> >> >>
> >> >> Hmm, got a patch of how it worked back then? Because what I see
> >> >> (in v5.12) in the struct page definition:
> >> >>
> >> >> 	struct rcu_head rcu_head;
> >> >>
> >> >> is in union with the whole of
> >> >>
> >> >> 	struct {	/* slab, slob and slub */ ... }
> >> >>
> >> >> which starts with
> >> >>
> >> >> 	union { struct list_head slab_list; ... }
> >> >> 	struct kmem_cache *slab_cache; /* not slob */
> >> >>
> >> >> and there's e.g. kmem_rcu_free() in mm/slab.c that does
> >> >>
> >> >> 	page = container_of(head, struct page, rcu_head);
> >> >> 	cachep = page->slab_cache;
> >> >>
> >> >> and a very similar thing in mm/slub.c rcu_free_slab() - we assume
> >> >> that the rcu_head and slab_cache fields can coexist.
> >> >>
> >> >> So this looks silently fragile to me? In case rcu_head becomes
> >> >> larger than list_head, e.g. with your buf[4], it would start
> >> >> overlapping the page->slab_cache field, no?
> >> >>
> >> >> AFAICS the way it's structured now in struct slab in mm/slab.h, it
> >> >> makes the overlap impossible, thus struct slab will grow as
> >> >> rcu_head grows, and the offsets of the later fields stop matching
> >> >> their struct page counterparts, which doesn't grow. Hmm.
> >> >
> >> > The following has made things compile for both SLUB and SLAB
> >> > (didn't adjust SLOB). The idea is to state more explicitly that
> >> > everything can overlap with rcu_head except slab_cache, so the
> >> > structures don't even grow in the end. That requires moving the
> >> > slab_cache field around, which would be fine except for SLUB and
> >> > its cmpxchg_double usage: it requires the freelist+counters fields
> >> > to be 128-bit aligned, and we can't fit it all with these
> >> > constraints. So that part is broken with the patch and I don't see
> >> > an easy way to fix it.
> >>
> >> Ah, maybe I found a way. It requires that rcu_head in struct slab be
> >> at a different offset than in struct page, and that we drop the
> >> particular SLAB_MATCH check. I think that should be fine: we don't
> >> cast struct slab to struct page in the rcu calls/callbacks, so the
> >> check was probably there just for the intermediate steps of the
> >> struct slab series and can be dropped now.
> >>
> >> So this should allow you to use up to 32 bytes of rcu_head.
> >
> > Thanks a lot, Vlastimil! I can confirm it builds; some comments below:
> >
> >> ----8<----
> >> diff --git a/include/linux/types.h b/include/linux/types.h
> >> index ea8cf60a8a79..daf7682a7a3b 100644
> >> --- a/include/linux/types.h
> >> +++ b/include/linux/types.h
> >> @@ -220,6 +220,7 @@ struct ustat {
> >>  struct callback_head {
> >>  	struct callback_head *next;
> >>  	void (*func)(struct callback_head *head);
> >> +	char buf[4];
> >>  } __attribute__((aligned(sizeof(void *))));
> >>  #define rcu_head callback_head
> >>
> >> diff --git a/mm/slab.h b/mm/slab.h
> >> index 4ec82bec15ec..dd49851f9814 100644
> >> --- a/mm/slab.h
> >> +++ b/mm/slab.h
> >> @@ -12,37 +12,45 @@ struct slab {
> >>  #if defined(CONFIG_SLAB)
> >>
> >>  	union {
> >> -		struct list_head slab_list;
> >> +		struct {
> >> +			struct list_head slab_list;
> >> +			void *freelist;	/* array of free object indexes */
> >> +			void *s_mem;	/* first object */
> >> +		};
> >>  		struct rcu_head rcu_head;
> >
> > Does this present any problems for SLAB_TYPESAFE_BY_RCU, such as the
> > slab being in use before the end of a grace period? I was concerned
> > about the freelist being corrupted by the rcu_head storage, since it's
> > now in a union with it. If this does not matter, then that's all good.
>
> Good point, I didn't check that. If we're freeing the slab then nothing
> should be using it at that point, including the freelist pointer. Only
> the actual object storage is guaranteed not to go away or become
> overwritten for SLAB_TYPESAFE_BY_RCU caches.

Cool, sounds good.

> But we need to check the slab freeing functions themselves.
> It looks like SLAB is fine: slab_destroy() there takes care to copy the
> freelist pointer away before call_rcu() already, and then it only works
> with the copy. Nothing should need s_mem either, in case you need more
> bytes for the rcu_head debug.
>
> SLUB seems fine too, as the freeing doesn't try to look at the freelist
> even with the debugging consistency checks enabled. It could be a
> problem in case of a double-free, but that's already a problem anyway.
> In case you need more than the 8 bytes of the freelist pointer, then
> overwriting the next 'counters' union would be a problem for the
> consistency checks, as they read slab->objects from there.
> But I guess we could do those checks before call_rcu(), so the callback
> would then only free the slab page. The state expected by those checks
> doesn't depend on the grace period expiring.

Ok, sounds good. I can run it for a few days on real hardware and see if
something breaks or such. The most I can get with your patch is 12 bytes
before the FOLIO checks start complaining. I can stick to 8 bytes, and
perhaps we can allocate a separate debug object if we need more or
something.

> >> @@ -66,9 +74,13 @@ struct slab {
> >>  #define SLAB_MATCH(pg, sl)						\
> >>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
> >>  SLAB_MATCH(flags, __page_flags);
> >> +#ifdef CONFIG_SLUB
> >> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
> >> +#else
> >>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
> >> +#endif
> >>  #ifndef CONFIG_SLOB
> >> -SLAB_MATCH(rcu_head, rcu_head);
> >> +// SLAB_MATCH(rcu_head, rcu_head);	/* not necessary, hopefully? */
> >
> > I think if it helps, I can make this check depend on a default-off
> > debug config option.
>
> It wouldn't help: with this patch, for SLUB the rcu_head would start at
> a different offset regardless of whether you enable the option to
> increase its size.

True, unless we maintained 2 versions of the structs based on CONFIGs,
but that's really ugly.

Thanks so much for this patch, it's surely helpful for debugging purposes
and I look forward to testing it more and providing you any feedback!

thanks,

 - Joel
On 8/25/22 21:51, Joel Fernandes wrote:
>> But we need to check the slab freeing functions themselves.
>> It looks like SLAB is fine: slab_destroy() there takes care to copy the
>> freelist pointer away before call_rcu() already, and then it only works
>> with the copy. Nothing should need s_mem either, in case you need more
>> bytes for the rcu_head debug.
>>
>> SLUB seems fine too, as the freeing doesn't try to look at the freelist
>> even with the debugging consistency checks enabled. It could be a
>> problem in case of a double-free, but that's already a problem anyway.
>> In case you need more than the 8 bytes of the freelist pointer, then
>> overwriting the next 'counters' union would be a problem for the
>> consistency checks, as they read slab->objects from there.
>> But I guess we could do those checks before call_rcu(), so the callback
>> would then only free the slab page. The state expected by those checks
>> doesn't depend on the grace period expiring.
>
> Ok, sounds good. I can run it for a few days on real hardware and see if
> something breaks or such. The most I can get with your patch is 12 bytes
> before the FOLIO checks start complaining. I can stick to 8 bytes, and
> perhaps we can allocate a separate debug object if we need more or
> something.

Weird, it works for me with 16 bytes, both SLAB and SLUB.

>> >> @@ -66,9 +74,13 @@ struct slab {
>> >>  #define SLAB_MATCH(pg, sl)						\
>> >>  	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>> >>  SLAB_MATCH(flags, __page_flags);
>> >> +#ifdef CONFIG_SLUB
>> >> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
>> >> +#else
>> >>  SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
>> >> +#endif
>> >>  #ifndef CONFIG_SLOB
>> >> -SLAB_MATCH(rcu_head, rcu_head);
>> >> +// SLAB_MATCH(rcu_head, rcu_head);	/* not necessary, hopefully? */
>> >
>> > I think if it helps, I can make this check depend on a default-off
>> > debug config option.
>>
>> It wouldn't help: with this patch, for SLUB the rcu_head would start at
>> a different offset regardless of whether you enable the option to
>> increase its size.
>
> True, unless we maintained 2 versions of the structs based on CONFIGs,
> but that's really ugly.
>
> Thanks so much for this patch, it's surely helpful for debugging
> purposes and I look forward to testing it more and providing you any
> feedback!

I'll send a formal RFC soon.
On 8/26/22 10:35, Vlastimil Babka wrote:
> On 8/25/22 21:51, Joel Fernandes wrote:
>>> But we need to check the slab freeing functions themselves.
>>> It looks like SLAB is fine: slab_destroy() there takes care to copy
>>> the freelist pointer away before call_rcu() already, and then it only
>>> works with the copy. Nothing should need s_mem either, in case you
>>> need more bytes for the rcu_head debug.
>>>
>>> SLUB seems fine too, as the freeing doesn't try to look at the
>>> freelist even with the debugging consistency checks enabled. It could
>>> be a problem in case of a double-free, but that's already a problem
>>> anyway.
>>> In case you need more than the 8 bytes of the freelist pointer, then
>>> overwriting the next 'counters' union would be a problem for the
>>> consistency checks, as they read slab->objects from there.
>>> But I guess we could do those checks before call_rcu(), so the
>>> callback would then only free the slab page. The state expected by
>>> those checks doesn't depend on the grace period expiring.
>>
>> Ok, sounds good. I can run it for a few days on real hardware and see
>> if something breaks or such. The most I can get with your patch is 12
>> bytes before the FOLIO checks start complaining. I can stick to 8
>> bytes, and perhaps we can allocate a separate debug object if we need
>> more or something.
>
> Weird, it works for me with 16 bytes, both SLAB and SLUB.

Oh, I see, it breaks in some 32-bit vdso object, in struct page's asserts,
not slab's. Likely a false positive due to header creep; I doubt the vdso
really works with 32-bit struct pages, as those don't exist on a 64-bit
kernel. Maybe we should move the asserts from the header to e.g.
page_alloc.c, willy?

>
>>> >> @@ -66,9 +74,13 @@ struct slab {
>>> >> #define SLAB_MATCH(pg, sl)						\
>>> >> 	static_assert(offsetof(struct page, pg) == offsetof(struct slab, sl))
>>> >> SLAB_MATCH(flags, __page_flags);
>>> >> +#ifdef CONFIG_SLUB
>>> >> +SLAB_MATCH(compound_head, slab_cache);	/* Ensure bit 0 is clear */
>>> >> +#else
>>> >> SLAB_MATCH(compound_head, slab_list);	/* Ensure bit 0 is clear */
>>> >> +#endif
>>> >> #ifndef CONFIG_SLOB
>>> >> -SLAB_MATCH(rcu_head, rcu_head);
>>> >> +// SLAB_MATCH(rcu_head, rcu_head);	/* not necessary, hopefully? */
>>> >
>>> > I think if it helps, I can make this check depend on a default-off
>>> > debug config option.
>>>
>>> It wouldn't help: with this patch, for SLUB the rcu_head would start
>>> at a different offset regardless of whether you enable the option to
>>> increase its size.
>>
>> True, unless we maintained 2 versions of the structs based on CONFIGs,
>> but that's really ugly.
>>
>> Thanks so much for this patch, it's surely helpful for debugging
>> purposes and I look forward to testing it more and providing you any
>> feedback!
>
> I'll send a formal RFC soon.
>