Message ID | 20220919031241.1358001-1-feng.tang@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm/slab_common: fix possiable double free of kmem_cache | expand |
On 9/19/22 05:12, Feng Tang wrote: > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu' > kunit test case cause a use-after-free error: > > BUG: KASAN: use-after-free in kobject_del+0x14/0x30 > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261 > > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17 > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > Call Trace: > <TASK> > dump_stack_lvl+0x34/0x48 > print_address_description.constprop.0+0x87/0x2a5 > print_report+0x103/0x1ed > kasan_report+0xb7/0x140 > kobject_del+0x14/0x30 > kmem_cache_destroy+0x130/0x170 > test_exit+0x1a/0x30 > kunit_try_run_case+0xad/0xc0 > kunit_generic_run_threadfn_adapter+0x26/0x50 > kthread+0x17b/0x1b0 > </TASK> > > The cause is inside kmem_cache_destroy(): > > kmem_cache_destroy > acquire lock/mutex > shutdown_cache > schedule_work(kmem_cache_release) (if RCU flag set) > release lock/mutex > kmem_cache_release (if RCU flag set) ^ not set I've fixed that up. > > in some certain timing, the scheduled work could be run before > the next RCU flag checking which will get a wrong state. > > Fix it by caching the RCU flag inside protected area, just like 'refcnt' > > Signed-off-by: Feng Tang <feng.tang@intel.com> Thanks! > --- > > note: > > The error only happens on linux-next tree, and not in Linus' tree, > which already has Waiman's commit: > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() > without holding slab_mutex/cpu_hotplug_lock") Actually that commit is already in Linus' rc5 too, so I will send your fix this week too. Added a Fixes: 0495e337b703 (...) too. > mm/slab_common.c | 5 ++++- > 1 file changed, 4 insertions(+), 1 deletion(-) > > diff --git a/mm/slab_common.c b/mm/slab_common.c > index 07b948288f84..ccc02573588f 100644 > --- a/mm/slab_common.c > +++ b/mm/slab_common.c > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s) > void kmem_cache_destroy(struct kmem_cache *s) > { > int refcnt; > + bool rcu_set; > > if (unlikely(!s) || !kasan_check_byte(s)) > return; > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s) > cpus_read_lock(); > mutex_lock(&slab_mutex); > > + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU; > + > refcnt = --s->refcount; > if (refcnt) > goto out_unlock; > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s) > out_unlock: > mutex_unlock(&slab_mutex); > cpus_read_unlock(); > - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU)) > + if (!refcnt && !rcu_set) > kmem_cache_release(s); > } > EXPORT_SYMBOL(kmem_cache_destroy);
On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote: > On 9/19/22 05:12, Feng Tang wrote: > > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu' > > kunit test case cause a use-after-free error: > > If I'm not mistaken, I think the subject should be: s/double free/use after free/g > > BUG: KASAN: use-after-free in kobject_del+0x14/0x30 > > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261 > > > > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17 > > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > > Call Trace: > > <TASK> > > dump_stack_lvl+0x34/0x48 > > print_address_description.constprop.0+0x87/0x2a5 > > print_report+0x103/0x1ed > > kasan_report+0xb7/0x140 > > kobject_del+0x14/0x30 > > kmem_cache_destroy+0x130/0x170 > > test_exit+0x1a/0x30 > > kunit_try_run_case+0xad/0xc0 > > kunit_generic_run_threadfn_adapter+0x26/0x50 > > kthread+0x17b/0x1b0 > > </TASK> > > > > The cause is inside kmem_cache_destroy(): > > > > kmem_cache_destroy > > acquire lock/mutex > > shutdown_cache > > schedule_work(kmem_cache_release) (if RCU flag set) > > release lock/mutex > > kmem_cache_release (if RCU flag set) > > ^ not set > > I've fixed that up. > > > > > in some certain timing, the scheduled work could be run before > > the next RCU flag checking which will get a wrong state. > > > > Fix it by caching the RCU flag inside protected area, just like 'refcnt' Very nice catch, thanks! Otherwise (and with Vlastimil's fix): Looks good to me. Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > Thanks! > > > --- > > > > note: > > > > The error only happens on linux-next tree, and not in Linus' tree, > > which already has Waiman's commit: > > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() > > without holding slab_mutex/cpu_hotplug_lock") > > Actually that commit is already in Linus' rc5 too, so I will send your fix > this week too. Added a Fixes: 0495e337b703 (...) too. > > > mm/slab_common.c | 5 ++++- > > 1 file changed, 4 insertions(+), 1 deletion(-) > > > > diff --git a/mm/slab_common.c b/mm/slab_common.c > > index 07b948288f84..ccc02573588f 100644 > > --- a/mm/slab_common.c > > +++ b/mm/slab_common.c > > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s) > > void kmem_cache_destroy(struct kmem_cache *s) > > { > > int refcnt; > > + bool rcu_set; > > > > if (unlikely(!s) || !kasan_check_byte(s)) > > return; > > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s) > > cpus_read_lock(); > > mutex_lock(&slab_mutex); > > > > + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU; > > + > > refcnt = --s->refcount; > > if (refcnt) > > goto out_unlock; > > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s) > > out_unlock: > > mutex_unlock(&slab_mutex); > > cpus_read_unlock(); > > - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU)) > > + if (!refcnt && !rcu_set) > > kmem_cache_release(s); > > } > > EXPORT_SYMBOL(kmem_cache_destroy); >
On 9/19/22 13:56, Hyeonggon Yoo wrote: > On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote: >> On 9/19/22 05:12, Feng Tang wrote: >> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu' >> > kunit test case cause a use-after-free error: >> > > > If I'm not mistaken, I think the subject should be: > s/double free/use after free/g Well, it's both AFAICS. By the initial use-after-free we can read a wrong s->flags that was modified since we freed for the first time, and it can lead to another kmem_cache_release() which is basically a double free. >> > BUG: KASAN: use-after-free in kobject_del+0x14/0x30 >> > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261 >> > >> > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17 >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 >> > Call Trace: >> > <TASK> >> > dump_stack_lvl+0x34/0x48 >> > print_address_description.constprop.0+0x87/0x2a5 >> > print_report+0x103/0x1ed >> > kasan_report+0xb7/0x140 >> > kobject_del+0x14/0x30 >> > kmem_cache_destroy+0x130/0x170 >> > test_exit+0x1a/0x30 >> > kunit_try_run_case+0xad/0xc0 >> > kunit_generic_run_threadfn_adapter+0x26/0x50 >> > kthread+0x17b/0x1b0 >> > </TASK> >> > >> > The cause is inside kmem_cache_destroy(): >> > >> > kmem_cache_destroy >> > acquire lock/mutex >> > shutdown_cache >> > schedule_work(kmem_cache_release) (if RCU flag set) >> > release lock/mutex >> > kmem_cache_release (if RCU flag set) >> >> ^ not set >> >> I've fixed that up. >> >> > >> > in some certain timing, the scheduled work could be run before >> > the next RCU flag checking which will get a wrong state. >> > >> > Fix it by caching the RCU flag inside protected area, just like 'refcnt' > > Very nice catch, thanks! > > Otherwise (and with Vlastimil's fix): > > Looks good to me. > Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> > >> > >> > Signed-off-by: Feng Tang <feng.tang@intel.com> >> >> Thanks! >> >> > --- >> > >> > note: >> > >> > The error only happens on linux-next tree, and not in Linus' tree, >> > which already has Waiman's commit: >> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() >> > without holding slab_mutex/cpu_hotplug_lock") >> >> Actually that commit is already in Linus' rc5 too, so I will send your fix >> this week too. Added a Fixes: 0495e337b703 (...) too. >> >> > mm/slab_common.c | 5 ++++- >> > 1 file changed, 4 insertions(+), 1 deletion(-) >> > >> > diff --git a/mm/slab_common.c b/mm/slab_common.c >> > index 07b948288f84..ccc02573588f 100644 >> > --- a/mm/slab_common.c >> > +++ b/mm/slab_common.c >> > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s) >> > void kmem_cache_destroy(struct kmem_cache *s) >> > { >> > int refcnt; >> > + bool rcu_set; >> > >> > if (unlikely(!s) || !kasan_check_byte(s)) >> > return; >> > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s) >> > cpus_read_lock(); >> > mutex_lock(&slab_mutex); >> > >> > + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU; >> > + >> > refcnt = --s->refcount; >> > if (refcnt) >> > goto out_unlock; >> > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s) >> > out_unlock: >> > mutex_unlock(&slab_mutex); >> > cpus_read_unlock(); >> > - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU)) >> > + if (!refcnt && !rcu_set) >> > kmem_cache_release(s); >> > } >> > EXPORT_SYMBOL(kmem_cache_destroy); >> >
On Mon, Sep 19, 2022 at 02:03:15PM +0200, Vlastimil Babka wrote: > On 9/19/22 13:56, Hyeonggon Yoo wrote: > > On Mon, Sep 19, 2022 at 11:12:38AM +0200, Vlastimil Babka wrote: > >> On 9/19/22 05:12, Feng Tang wrote: > >> > When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu' > >> > kunit test case cause a use-after-free error: > >> > > > > > If I'm not mistaken, I think the subject should be: > > s/double free/use after free/g > > Well, it's both AFAICS. By the initial use-after-free we can read a wrong > s->flags that was modified since we freed for the first time, and it can > lead to another kmem_cache_release() which is basically a double free. > Yeah, I realized that just after sending the mail ;) it is use-after-free bug that can potentially lead to double free. Thank you for correction! > >> > BUG: KASAN: use-after-free in kobject_del+0x14/0x30 > >> > Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261 > >> > > >> > CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17 > >> > Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 > >> > Call Trace: > >> > <TASK> > >> > dump_stack_lvl+0x34/0x48 > >> > print_address_description.constprop.0+0x87/0x2a5 > >> > print_report+0x103/0x1ed > >> > kasan_report+0xb7/0x140 > >> > kobject_del+0x14/0x30 > >> > kmem_cache_destroy+0x130/0x170 > >> > test_exit+0x1a/0x30 > >> > kunit_try_run_case+0xad/0xc0 > >> > kunit_generic_run_threadfn_adapter+0x26/0x50 > >> > kthread+0x17b/0x1b0 > >> > </TASK> > >> > > >> > The cause is inside kmem_cache_destroy(): > >> > > >> > kmem_cache_destroy > >> > acquire lock/mutex > >> > shutdown_cache > >> > schedule_work(kmem_cache_release) (if RCU flag set) > >> > release lock/mutex > >> > kmem_cache_release (if RCU flag set) > >> > >> ^ not set > >> > >> I've fixed that up. > >> > >> > > >> > in some certain timing, the scheduled work could be run before > >> > the next RCU flag checking which will get a wrong state. > >> > > >> > Fix it by caching the RCU flag inside protected area, just like 'refcnt' > > > > Very nice catch, thanks! > > > > Otherwise (and with Vlastimil's fix): > > > > Looks good to me. > > Reviewed-by: Hyeonggon Yoo <42.hyeyoo@gmail.com> > > > >> > > >> > Signed-off-by: Feng Tang <feng.tang@intel.com> > >> > >> Thanks! > >> > >> > --- > >> > > >> > note: > >> > > >> > The error only happens on linux-next tree, and not in Linus' tree, > >> > which already has Waiman's commit: > >> > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() > >> > without holding slab_mutex/cpu_hotplug_lock") > >> > >> Actually that commit is already in Linus' rc5 too, so I will send your fix > >> this week too. Added a Fixes: 0495e337b703 (...) too. > >> > >> > mm/slab_common.c | 5 ++++- > >> > 1 file changed, 4 insertions(+), 1 deletion(-) > >> > > >> > diff --git a/mm/slab_common.c b/mm/slab_common.c > >> > index 07b948288f84..ccc02573588f 100644 > >> > --- a/mm/slab_common.c > >> > +++ b/mm/slab_common.c > >> > @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s) > >> > void kmem_cache_destroy(struct kmem_cache *s) > >> > { > >> > int refcnt; > >> > + bool rcu_set; > >> > > >> > if (unlikely(!s) || !kasan_check_byte(s)) > >> > return; > >> > @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s) > >> > cpus_read_lock(); > >> > mutex_lock(&slab_mutex); > >> > > >> > + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU; > >> > + > >> > refcnt = --s->refcount; > >> > if (refcnt) > >> > goto out_unlock; > >> > @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s) > >> > out_unlock: > >> > mutex_unlock(&slab_mutex); > >> > cpus_read_unlock(); > >> > - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU)) > >> > + if (!refcnt && !rcu_set) > >> > kmem_cache_release(s); > >> > } > >> > EXPORT_SYMBOL(kmem_cache_destroy); > >> > > >
On Mon, Sep 19, 2022 at 05:12:38PM +0800, Vlastimil Babka wrote: > On 9/19/22 05:12, Feng Tang wrote: [...] > > The cause is inside kmem_cache_destroy(): > > > > kmem_cache_destroy > > acquire lock/mutex > > shutdown_cache > > schedule_work(kmem_cache_release) (if RCU flag set) > > release lock/mutex > > kmem_cache_release (if RCU flag set) > > ^ not set > > I've fixed that up. Oops.. Thanks for catching it! > > > > in some certain timing, the scheduled work could be run before > > the next RCU flag checking which will get a wrong state. > > > > Fix it by caching the RCU flag inside protected area, just like 'refcnt' > > > > Signed-off-by: Feng Tang <feng.tang@intel.com> > > Thanks! > > > --- > > > > note: > > > > The error only happens on linux-next tree, and not in Linus' tree, > > which already has Waiman's commit: > > 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() > > without holding slab_mutex/cpu_hotplug_lock") > > Actually that commit is already in Linus' rc5 too, so I will send your fix > this week too. Added a Fixes: 0495e337b703 (...) too. Got it, thanks - Feng
On 9/19/22 08:50, Feng Tang wrote: > On Mon, Sep 19, 2022 at 05:12:38PM +0800, Vlastimil Babka wrote: >> On 9/19/22 05:12, Feng Tang wrote: > [...] >>> The cause is inside kmem_cache_destroy(): >>> >>> kmem_cache_destroy >>> acquire lock/mutex >>> shutdown_cache >>> schedule_work(kmem_cache_release) (if RCU flag set) >>> release lock/mutex >>> kmem_cache_release (if RCU flag set) >> ^ not set >> >> I've fixed that up. > Oops.. Thanks for catching it! > >>> in some certain timing, the scheduled work could be run before >>> the next RCU flag checking which will get a wrong state. >>> >>> Fix it by caching the RCU flag inside protected area, just like 'refcnt' >>> >>> Signed-off-by: Feng Tang <feng.tang@intel.com> >> Thanks! >> >>> --- >>> >>> note: >>> >>> The error only happens on linux-next tree, and not in Linus' tree, >>> which already has Waiman's commit: >>> 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() >>> without holding slab_mutex/cpu_hotplug_lock") >> Actually that commit is already in Linus' rc5 too, so I will send your fix >> this week too. Added a Fixes: 0495e337b703 (...) too. > Got it, thanks > > - Feng Thanks for catching this bug. Reviewed-by: Waiman Long <longman@redhat.com> Cheers, Longman
diff --git a/mm/slab_common.c b/mm/slab_common.c index 07b948288f84..ccc02573588f 100644 --- a/mm/slab_common.c +++ b/mm/slab_common.c @@ -475,6 +475,7 @@ void slab_kmem_cache_release(struct kmem_cache *s) void kmem_cache_destroy(struct kmem_cache *s) { int refcnt; + bool rcu_set; if (unlikely(!s) || !kasan_check_byte(s)) return; @@ -482,6 +483,8 @@ void kmem_cache_destroy(struct kmem_cache *s) cpus_read_lock(); mutex_lock(&slab_mutex); + rcu_set = s->flags & SLAB_TYPESAFE_BY_RCU; + refcnt = --s->refcount; if (refcnt) goto out_unlock; @@ -492,7 +495,7 @@ void kmem_cache_destroy(struct kmem_cache *s) out_unlock: mutex_unlock(&slab_mutex); cpus_read_unlock(); - if (!refcnt && !(s->flags & SLAB_TYPESAFE_BY_RCU)) + if (!refcnt && !rcu_set) kmem_cache_release(s); } EXPORT_SYMBOL(kmem_cache_destroy);
When doing slub_debug test, kfence's 'test_memcache_typesafe_by_rcu' kunit test case cause a use-after-free error: BUG: KASAN: use-after-free in kobject_del+0x14/0x30 Read of size 8 at addr ffff888007679090 by task kunit_try_catch/261 CPU: 1 PID: 261 Comm: kunit_try_catch Tainted: G B N 6.0.0-rc5-next-20220916 #17 Hardware name: QEMU Standard PC (i440FX + PIIX, 1996), BIOS 1.15.0-1 04/01/2014 Call Trace: <TASK> dump_stack_lvl+0x34/0x48 print_address_description.constprop.0+0x87/0x2a5 print_report+0x103/0x1ed kasan_report+0xb7/0x140 kobject_del+0x14/0x30 kmem_cache_destroy+0x130/0x170 test_exit+0x1a/0x30 kunit_try_run_case+0xad/0xc0 kunit_generic_run_threadfn_adapter+0x26/0x50 kthread+0x17b/0x1b0 </TASK> The cause is inside kmem_cache_destroy(): kmem_cache_destroy acquire lock/mutex shutdown_cache schedule_work(kmem_cache_release) (if RCU flag set) release lock/mutex kmem_cache_release (if RCU flag set) in some certain timing, the scheduled work could be run before the next RCU flag checking which will get a wrong state. Fix it by caching the RCU flag inside protected area, just like 'refcnt' Signed-off-by: Feng Tang <feng.tang@intel.com> --- note: The error only happens on linux-next tree, and not in Linus' tree, which already has Waiman's commit: 0495e337b703 ("mm/slab_common: Deleting kobject in kmem_cache_destroy() without holding slab_mutex/cpu_hotplug_lock") mm/slab_common.c | 5 ++++- 1 file changed, 4 insertions(+), 1 deletion(-)