Message ID | 20220414025925.2423818-1-qiang1.zhang@intel.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | kasan: Prevent cpu_quarantine corruption when CPU offline and cache shrink occur at same time | expand |
On Thu, 14 Apr 2022 10:59:25 +0800 Zqiang <qiang1.zhang@intel.com> wrote: > The kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/ > destroy(), the kasan_quarantine_remove_cache() call is protected by > cpuslock in kmem_cache_destroy(), can ensure serialization with > kasan_cpu_offline(). however the kasan_quarantine_remove_cache() call > is not protected by cpuslock in kmem_cache_shrink(), when CPU going > offline and cache shrink occur at same time, the cpu_quarantine may be > corrupted by interrupt(per_cpu_remove_cache operation). so add > cpu_quarantine offline flags check in per_cpu_remove_cache(). > > ... > Could we please have some reviewer input here? > --- a/mm/kasan/quarantine.c > +++ b/mm/kasan/quarantine.c > @@ -330,6 +330,8 @@ static void per_cpu_remove_cache(void *arg) > struct cpu_shrink_qlist *sq; > #endif > q = this_cpu_ptr(&cpu_quarantine); > + if (READ_ONCE(q->offline)) > + return; > #ifndef CONFIG_PREEMPT_RT > qlist_move_cache(q, &to_free, cache); > qlist_free_all(&to_free, cache); It might be helpful to have a little comment which explains why we're doing this?
On Fri, 22 Apr 2022 at 00:07, Andrew Morton <akpm@linux-foundation.org> wrote: > > On Thu, 14 Apr 2022 10:59:25 +0800 Zqiang <qiang1.zhang@intel.com> wrote: > > > The kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/ > > destroy(), the kasan_quarantine_remove_cache() call is protected by > > cpuslock in kmem_cache_destroy(), can ensure serialization with > > kasan_cpu_offline(). however the kasan_quarantine_remove_cache() call > > is not protected by cpuslock in kmem_cache_shrink(), when CPU going > > offline and cache shrink occur at same time, the cpu_quarantine may be > > corrupted by interrupt(per_cpu_remove_cache operation). so add > > cpu_quarantine offline flags check in per_cpu_remove_cache(). > > > > ... > > > > Could we please have some reviewer input here? This is very tricky, I think can follow this: Reviewed-by: Dmitry Vyukov <dvyukov@google.com> If q->offline is set, then kasan_cpu_offline() will or has already removed everything from cpu_quarantine and freed, so we can return early in per_cpu_remove_cache(). If kasan_cpu_offline() hasn't yet removed everything from cpu_quarantine already, it's actually problematic for the kmem_cache_destroy() case. But since both kmem_cache_destroy() and kasan_cpu_offline() are serialized by cpus lock, this case must not happen. > > --- a/mm/kasan/quarantine.c > > +++ b/mm/kasan/quarantine.c > > @@ -330,6 +330,8 @@ static void per_cpu_remove_cache(void *arg) > > struct cpu_shrink_qlist *sq; > > #endif > > q = this_cpu_ptr(&cpu_quarantine); > > + if (READ_ONCE(q->offline)) > > + return; > > #ifndef CONFIG_PREEMPT_RT > > qlist_move_cache(q, &to_free, cache); > > qlist_free_all(&to_free, cache); > > It might be helpful to have a little comment which explains why we're > doing this?
On Thu, 14 Apr 2022 10:59:25 +0800 Zqiang <qiang1.zhang@intel.com> wrote: > The kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/ > destroy(), the kasan_quarantine_remove_cache() call is protected by > cpuslock in kmem_cache_destroy(), can ensure serialization with > kasan_cpu_offline(). however the kasan_quarantine_remove_cache() call > is not protected by cpuslock in kmem_cache_shrink(), when CPU going > offline and cache shrink occur at same time, the cpu_quarantine may be > corrupted by interrupt(per_cpu_remove_cache operation). so add > cpu_quarantine offline flags check in per_cpu_remove_cache(). > > ... > >Could we please have some reviewer input here? > --- a/mm/kasan/quarantine.c > +++ b/mm/kasan/quarantine.c > @@ -330,6 +330,8 @@ static void per_cpu_remove_cache(void *arg) > struct cpu_shrink_qlist *sq; > #endif > q = this_cpu_ptr(&cpu_quarantine); > + if (READ_ONCE(q->offline)) > + return; > #ifndef CONFIG_PREEMPT_RT > qlist_move_cache(q, &to_free, cache); > qlist_free_all(&to_free, cache); >It might be helpful to have a little comment which explains why we're doing this? Sorry for late reply, may be add some comment: Ensure the ordering between the writing to q->offline and per_cpu_remove_cache. prevent cpu_quarantine be corrupted by interrupt. Is this OK ? Thanks Zqiang
diff --git a/mm/kasan/quarantine.c b/mm/kasan/quarantine.c index 0e33d30abb8d..51a8192d49cf 100644 --- a/mm/kasan/quarantine.c +++ b/mm/kasan/quarantine.c @@ -330,6 +330,8 @@ static void per_cpu_remove_cache(void *arg) struct cpu_shrink_qlist *sq; #endif q = this_cpu_ptr(&cpu_quarantine); + if (READ_ONCE(q->offline)) + return; #ifndef CONFIG_PREEMPT_RT qlist_move_cache(q, &to_free, cache); qlist_free_all(&to_free, cache);
The kasan_quarantine_remove_cache() is called in kmem_cache_shrink()/ destroy(), the kasan_quarantine_remove_cache() call is protected by cpuslock in kmem_cache_destroy(), can ensure serialization with kasan_cpu_offline(). however the kasan_quarantine_remove_cache() call is not protected by cpuslock in kmem_cache_shrink(), when CPU going offline and cache shrink occur at same time, the cpu_quarantine may be corrupted by interrupt(per_cpu_remove_cache operation). so add cpu_quarantine offline flags check in per_cpu_remove_cache(). Signed-off-by: Zqiang <qiang1.zhang@intel.com> --- mm/kasan/quarantine.c | 2 ++ 1 file changed, 2 insertions(+)