| Message ID | 432a89fafce11244287c8af757e73a2eb22a5354.1702339432.git.andreyknvl@google.com |
|---|---|
| State | New |
| Series | lib/stackdepot, kasan: fixes for stack eviction series |
On Tue, 12 Dec 2023 at 01:14, <andrey.konovalov@linux.dev> wrote:
>
> From: Andrey Konovalov <andreyknvl@google.com>
>
> kasan_record_aux_stack can be called concurrently on the same object.
> This might lead to a race condition when rotating the saved aux stack
> trace handles.
>
> Fix by introducing a spinlock to protect the aux stack trace handles
> in kasan_record_aux_stack.
>
> Reported-by: Tetsuo Handa <penguin-kernel@i-love.sakura.ne.jp>
> Reported-by: syzbot+186b55175d8360728234@syzkaller.appspotmail.com
> Closes: https://lore.kernel.org/all/000000000000784b1c060b0074a2@google.com/
> Signed-off-by: Andrey Konovalov <andreyknvl@google.com>
>
> ---
>
> This can be squashed into "kasan: use stack_depot_put for Generic mode"
> or left standalone.
> ---
>  mm/kasan/generic.c | 15 +++++++++++++--
>  1 file changed, 13 insertions(+), 2 deletions(-)
>
> diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
> index 54e20b2bc3e1..ca5c75a1866c 100644
> --- a/mm/kasan/generic.c
> +++ b/mm/kasan/generic.c
> @@ -25,6 +25,7 @@
>  #include <linux/sched.h>
>  #include <linux/sched/task_stack.h>
>  #include <linux/slab.h>
> +#include <linux/spinlock.h>
>  #include <linux/stackdepot.h>
>  #include <linux/stacktrace.h>
>  #include <linux/string.h>
> @@ -35,6 +36,8 @@
>  #include "kasan.h"
>  #include "../slab.h"
>
> +DEFINE_SPINLOCK(aux_lock);

No, please don't.

>  /*
>   * All functions below always inlined so compiler could
>   * perform better optimizations in each of __asan_loadX/__assn_storeX
> @@ -502,6 +505,8 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
>  	struct kmem_cache *cache;
>  	struct kasan_alloc_meta *alloc_meta;
>  	void *object;
> +	depot_stack_handle_t new_handle, old_handle;
> +	unsigned long flags;
>
>  	if (is_kfence_address(addr) || !slab)
>  		return;
> @@ -512,9 +517,15 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
>  	if (!alloc_meta)
>  		return;
>
> -	stack_depot_put(alloc_meta->aux_stack[1]);
> +	new_handle = kasan_save_stack(0, depot_flags);
> +
> +	spin_lock_irqsave(&aux_lock, flags);

This is an unnecessary global lock. What's the problem here? As far as
I can understand a race is possible where we may end up with duplicated
or lost stack handles.

Since storing this information is best effort anyway, and bugs are rare,
a global lock protecting this is overkill.

I'd just accept the raciness and use READ_ONCE() / WRITE_ONCE() just to
make sure we don't tear any reads/writes and the depot handles are
valid.

There are other more complex schemes [1], but I think they are overkill
as well.

[1]: Since a depot stack handle is just an u32, we can have a

	union {
		depot_stack_handle_t handles[2];
		atomic64_t atomic_handle;
	} aux_stack;

(BUILD_BUG_ON somewhere if sizeof handles and atomic_handle mismatch.)

Then in the code here create the same union and load atomic_handle.
Swap handle[1] into handle[0] and write the new one in handles[1].
Then do a cmpxchg loop to store the new atomic_handle.

> +	old_handle = alloc_meta->aux_stack[1];
>  	alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
> -	alloc_meta->aux_stack[0] = kasan_save_stack(0, depot_flags);
> +	alloc_meta->aux_stack[0] = new_handle;
> +	spin_unlock_irqrestore(&aux_lock, flags);
> +
> +	stack_depot_put(old_handle);
>  }
>
>  void kasan_record_aux_stack(void *addr)
> --
> 2.25.1
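[Editorial note: the cmpxchg scheme from footnote [1] can be sketched in userspace C11 atomics. This is a hedged illustration, not kernel code: `_Atomic uint64_t` and `atomic_compare_exchange_weak` stand in for the kernel's `atomic64_t`/`atomic64_cmpxchg`, and the union is replaced by explicit shift-based packing; `pack` and `rotate_aux_stack` are illustrative names. The rotation direction follows the actual patch (aux_stack[0] demoted into aux_stack[1]).]

```c
#include <stdatomic.h>
#include <stdint.h>

typedef uint32_t depot_stack_handle_t;

/* Pack two u32 handles into one u64: handles[0] in the low 32 bits,
 * handles[1] in the high 32 bits. */
static uint64_t pack(depot_stack_handle_t h0, depot_stack_handle_t h1)
{
	return (uint64_t)h1 << 32 | h0;
}

/* Lock-free rotation: demote the current handles[0] to handles[1],
 * install new_handle as handles[0], and return the evicted old
 * handles[1] so the caller can drop its depot reference. The CAS loop
 * retries if another recorder updated the word concurrently, so no
 * handle is ever duplicated or lost. */
static depot_stack_handle_t
rotate_aux_stack(_Atomic uint64_t *aux, depot_stack_handle_t new_handle)
{
	uint64_t old = atomic_load(aux);
	uint64_t next;

	do {
		/* Recompute the desired word from the latest snapshot. */
		depot_stack_handle_t h0 = (depot_stack_handle_t)old;
		next = pack(new_handle, h0);
	} while (!atomic_compare_exchange_weak(aux, &old, next));

	return (depot_stack_handle_t)(old >> 32); /* evicted handle */
}
```

Because the whole two-handle state fits in one 64-bit word, the read-modify-write is atomic end to end; the losing racer simply retries against the winner's result.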
On Tue, Dec 12, 2023 at 8:29 PM Marco Elver <elver@google.com> wrote:
>
> > -	stack_depot_put(alloc_meta->aux_stack[1]);
> > +	new_handle = kasan_save_stack(0, depot_flags);
> > +
> > +	spin_lock_irqsave(&aux_lock, flags);
>
> This is an unnecessary global lock. What's the problem here? As far as
> I can understand a race is possible where we may end up with
> duplicated or lost stack handles.

Yes, this is the problem. And this leads to refcount underflows in the
stack depot code, as we fail to keep precise track of the stack traces.

> Since storing this information is best effort anyway, and bugs are
> rare, a global lock protecting this is overkill.
>
> I'd just accept the raciness and use READ_ONCE() / WRITE_ONCE() just
> to make sure we don't tear any reads/writes and the depot handles are
> valid.

This will help with the potential tears but will not help with the
refcount issues.

> There are other more complex schemes [1], but I think they are
> overkill as well.
>
> [1]: Since a depot stack handle is just an u32, we can have a
>
> 	union {
> 		depot_stack_handle_t handles[2];
> 		atomic64_t atomic_handle;
> 	} aux_stack;
>
> (BUILD_BUG_ON somewhere if sizeof handles and atomic_handle mismatch.)
>
> Then in the code here create the same union and load atomic_handle.
> Swap handle[1] into handle[0] and write the new one in handles[1].
> Then do a cmpxchg loop to store the new atomic_handle.

This approach should work. If you prefer, I can do this instead of a
spinlock.

But we do need some kind of atomicity while rotating the aux handles
to make sure nothing gets lost.

Thanks!
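[Editorial note: the refcount underflow can be made concrete by replaying one bad interleaving of the unlocked rotation sequentially in a single thread. The handle values and the helper name `racy_rotation_demo` are hypothetical; in the kernel, each "thread" would then call `stack_depot_put()` on the handle it believes it evicted.]

```c
#include <stdint.h>

typedef uint32_t depot_stack_handle_t;

/* Replays one interleaving of two unlocked rotations on the same
 * object: threads A and B each want to evict aux[1], demote aux[0],
 * and install their own new handle (a_new / b_new). a_put and b_put
 * receive what each thread would pass to stack_depot_put(). */
static void racy_rotation_demo(depot_stack_handle_t aux[2],
                               depot_stack_handle_t a_new,
                               depot_stack_handle_t b_new,
                               depot_stack_handle_t *a_put,
                               depot_stack_handle_t *b_put)
{
	/* Both threads read aux[1] before either writes: the same
	 * handle is selected for eviction twice. */
	*a_put = aux[1];
	*b_put = aux[1];

	/* A's writes complete first, then B's. B's demotion copies A's
	 * freshly installed handle, and A's original aux[0] is
	 * overwritten without ever being evicted. */
	aux[1] = aux[0]; aux[0] = a_new;   /* A finishes */
	aux[1] = aux[0]; aux[0] = b_new;   /* B finishes */
}
```

Starting from handles {10, 20}, both callers decide to put handle 20 (its depot refcount is dropped twice, hence the underflow), while handle 10 disappears from the metadata without a put (its reference leaks).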
On Wed, 13 Dec 2023 at 15:40, Andrey Konovalov <andreyknvl@gmail.com> wrote:
>
> On Tue, Dec 12, 2023 at 8:29 PM Marco Elver <elver@google.com> wrote:
> >
> > > -	stack_depot_put(alloc_meta->aux_stack[1]);
> > > +	new_handle = kasan_save_stack(0, depot_flags);
> > > +
> > > +	spin_lock_irqsave(&aux_lock, flags);
> >
> > This is an unnecessary global lock. What's the problem here? As far as
> > I can understand a race is possible where we may end up with
> > duplicated or lost stack handles.
>
> Yes, this is the problem. And this leads to refcount underflows in the
> stack depot code, as we fail to keep precise track of the stack
> traces.
>
> > Since storing this information is best effort anyway, and bugs are
> > rare, a global lock protecting this is overkill.
> >
> > I'd just accept the raciness and use READ_ONCE() / WRITE_ONCE() just
> > to make sure we don't tear any reads/writes and the depot handles are
> > valid.
>
> This will help with the potential tears but will not help with the
> refcount issues.
>
> > There are other more complex schemes [1], but I think they are
> > overkill as well.
> >
> > [1]: Since a depot stack handle is just an u32, we can have a
> >
> > 	union {
> > 		depot_stack_handle_t handles[2];
> > 		atomic64_t atomic_handle;
> > 	} aux_stack;
> >
> > (BUILD_BUG_ON somewhere if sizeof handles and atomic_handle mismatch.)
> >
> > Then in the code here create the same union and load atomic_handle.
> > Swap handle[1] into handle[0] and write the new one in handles[1].
> > Then do a cmpxchg loop to store the new atomic_handle.
>
> This approach should work. If you prefer, I can do this instead of a
> spinlock.
>
> But we do need some kind of atomicity while rotating the aux handles
> to make sure nothing gets lost.

Yes, I think that'd be preferable. Although note that not all 32-bit
architectures have 64-bit atomics, so that may be an issue. Another
alternative is to have a spinlock next to the aux_stack (it needs to be
initialized properly). It'll use up a little more space, but that's for
KASAN configs only, so I think it's ok. Certainly better than a global
lock.
On Wed, Dec 13, 2023 at 5:51 PM Marco Elver <elver@google.com> wrote:
>
> > > [1]: Since a depot stack handle is just an u32, we can have a
> > >
> > > 	union {
> > > 		depot_stack_handle_t handles[2];
> > > 		atomic64_t atomic_handle;
> > > 	} aux_stack;
> > >
> > > (BUILD_BUG_ON somewhere if sizeof handles and atomic_handle mismatch.)
> > >
> > > Then in the code here create the same union and load atomic_handle.
> > > Swap handle[1] into handle[0] and write the new one in handles[1].
> > > Then do a cmpxchg loop to store the new atomic_handle.
> >
> > This approach should work. If you prefer, I can do this instead of a
> > spinlock.
> >
> > But we do need some kind of atomicity while rotating the aux handles
> > to make sure nothing gets lost.
>
> Yes, I think that'd be preferable. Although note that not all 32-bit
> architectures have 64-bit atomics, so that may be an issue. Another
> alternative is to have a spinlock next to the aux_stack (it needs to
> be initialized properly). It'll use up a little more space, but that's
> for KASAN configs only, so I think it's ok. Certainly better than a
> global lock.

Ah, hm, actually this is what I intended to do with this change. But
somehow my brain glitched out and decided to use a global lock :)

I'll change this into a local spinlock in v2. Thank you!
diff --git a/mm/kasan/generic.c b/mm/kasan/generic.c
index 54e20b2bc3e1..ca5c75a1866c 100644
--- a/mm/kasan/generic.c
+++ b/mm/kasan/generic.c
@@ -25,6 +25,7 @@
 #include <linux/sched.h>
 #include <linux/sched/task_stack.h>
 #include <linux/slab.h>
+#include <linux/spinlock.h>
 #include <linux/stackdepot.h>
 #include <linux/stacktrace.h>
 #include <linux/string.h>
@@ -35,6 +36,8 @@
 #include "kasan.h"
 #include "../slab.h"
 
+DEFINE_SPINLOCK(aux_lock);
+
 /*
  * All functions below always inlined so compiler could
  * perform better optimizations in each of __asan_loadX/__assn_storeX
@@ -502,6 +505,8 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
 	struct kmem_cache *cache;
 	struct kasan_alloc_meta *alloc_meta;
 	void *object;
+	depot_stack_handle_t new_handle, old_handle;
+	unsigned long flags;
 
 	if (is_kfence_address(addr) || !slab)
 		return;
@@ -512,9 +517,15 @@ static void __kasan_record_aux_stack(void *addr, depot_flags_t depot_flags)
 	if (!alloc_meta)
 		return;
 
-	stack_depot_put(alloc_meta->aux_stack[1]);
+	new_handle = kasan_save_stack(0, depot_flags);
+
+	spin_lock_irqsave(&aux_lock, flags);
+	old_handle = alloc_meta->aux_stack[1];
 	alloc_meta->aux_stack[1] = alloc_meta->aux_stack[0];
-	alloc_meta->aux_stack[0] = kasan_save_stack(0, depot_flags);
+	alloc_meta->aux_stack[0] = new_handle;
+	spin_unlock_irqrestore(&aux_lock, flags);
+
+	stack_depot_put(old_handle);
 }
 
 void kasan_record_aux_stack(void *addr)