diff mbox series

[1/3] mm: security: introduce the init_allocations=1 boot option

Message ID 20190418154208.131118-2-glider@google.com (mailing list archive)
State New, archived
Headers show
Series RFC: add init_allocations=1 boot option | expand

Commit Message

Alexander Potapenko April 18, 2019, 3:42 p.m. UTC
This option adds the possibility to initialize newly allocated pages and
heap objects with zeroes. This is needed to prevent possible information
leaks and make the control-flow bugs that depend on uninitialized values
more deterministic.

Initialization is done at allocation time at the places where checks for
__GFP_ZERO are performed. We don't initialize slab caches with
constructors to preserve their semantics. To reduce runtime costs of
checking cachep->ctor we replace a call to memset with a call to
cachep->poison_fn, which is only executed if the memory block needs to
be initialized.

For kernel testing purposes filling allocations with a nonzero pattern
would be more suitable, but may require platform-specific code. To have
a simple baseline we've decided to start with zero-initialization.

No performance optimizations are done at the moment to reduce double
initialization of memory regions.

Signed-off-by: Alexander Potapenko <glider@google.com>
Cc: Andrew Morton <akpm@linux-foundation.org>
Cc: James Morris <jmorris@namei.org>
Cc: "Serge E. Hallyn" <serge@hallyn.com>
Cc: Nick Desaulniers <ndesaulniers@google.com>
Cc: Kostya Serebryany <kcc@google.com>
Cc: Dmitry Vyukov <dvyukov@google.com>
Cc: Kees Cook <keescook@chromium.org>
Cc: Sandeep Patil <sspatil@android.com>
Cc: Laura Abbott <labbott@redhat.com>
Cc: Randy Dunlap <rdunlap@infradead.org>
Cc: Jann Horn <jannh@google.com>
Cc: Mark Rutland <mark.rutland@arm.com>
Cc: Qian Cai <cai@lca.pw>
Cc: Vlastimil Babka <vbabka@suse.cz>
Cc: linux-mm@kvack.org
Cc: linux-security-module@vger.kernel.org
Cc: kernel-hardening@lists.openwall.com
---
 drivers/infiniband/core/uverbs_ioctl.c |  2 +-
 include/linux/mm.h                     |  8 ++++++++
 include/linux/slab_def.h               |  1 +
 include/linux/slub_def.h               |  1 +
 kernel/kexec_core.c                    |  2 +-
 mm/dmapool.c                           |  2 +-
 mm/page_alloc.c                        | 18 +++++++++++++++++-
 mm/slab.c                              | 12 ++++++------
 mm/slab.h                              |  1 +
 mm/slab_common.c                       | 15 +++++++++++++++
 mm/slob.c                              |  2 +-
 mm/slub.c                              |  8 ++++----
 net/core/sock.c                        |  2 +-
 13 files changed, 58 insertions(+), 16 deletions(-)

Comments

Dave Hansen April 18, 2019, 4:35 p.m. UTC | #1
On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.

Isn't it better to do this at free time rather than allocation time?  If
doing it at free, you can't even have information leaks for pages that
are in the allocator.
Alexander Potapenko April 18, 2019, 4:43 p.m. UTC | #2
On Thu, Apr 18, 2019 at 6:35 PM Dave Hansen <dave.hansen@intel.com> wrote:
>
> On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > This option adds the possibility to initialize newly allocated pages and
> > heap objects with zeroes. This is needed to prevent possible information
> > leaks and make the control-flow bugs that depend on uninitialized values
> > more deterministic.
>
> Isn't it better to do this at free time rather than allocation time?  If
> doing it at free, you can't even have information leaks for pages that
> are in the allocator.
I should have mentioned this in the patch description, as this
question is being asked every time I send a patch :)
If we want to avoid double initialization and take advantage of
__GFP_NOINIT (see the second and third patches in the series) we need
to do initialize the memory at allocation time, because free() and
free_pages() don't accept GFP flags.
Alexander Potapenko April 18, 2019, 4:50 p.m. UTC | #3
On Thu, Apr 18, 2019 at 6:43 PM Alexander Potapenko <glider@google.com> wrote:
>
> On Thu, Apr 18, 2019 at 6:35 PM Dave Hansen <dave.hansen@intel.com> wrote:
> >
> > On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > > This option adds the possibility to initialize newly allocated pages and
> > > heap objects with zeroes. This is needed to prevent possible information
> > > leaks and make the control-flow bugs that depend on uninitialized values
> > > more deterministic.
> >
> > Isn't it better to do this at free time rather than allocation time?  If
> > doing it at free, you can't even have information leaks for pages that
> > are in the allocator.
> I should have mentioned this in the patch description, as this
> question is being asked every time I send a patch :)
> If we want to avoid double initialization and take advantage of
> __GFP_NOINIT (see the second and third patches in the series) we need
> to do initialize the memory at allocation time, because free() and
> free_pages() don't accept GFP flags.

On a second thought, double zeroing on memory reclaim should be quite rare.
Most of the speedup we gain with __GFP_NOINIT is because we assume
it's safe to not initialize memory that'll be overwritten anyway.
I'll need to check how e.g. hackbench behaves if we choose to zero
memory on free() (my guess would be it'll be slower than with
__GFP_NOINIT hack, albeit a little safer)
>
>
> --
> Alexander Potapenko
> Software Engineer
>
> Google Germany GmbH
> Erika-Mann-Straße, 33
> 80636 München
>
> Geschäftsführer: Paul Manicle, Halimah DeLaine Prado
> Registergericht und -nummer: Hamburg, HRB 86891
> Sitz der Gesellschaft: Hamburg
Randy Dunlap April 18, 2019, 10:08 p.m. UTC | #4
On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.
> 
> Initialization is done at allocation time at the places where checks for
> __GFP_ZERO are performed. We don't initialize slab caches with
> constructors to preserve their semantics. To reduce runtime costs of
> checking cachep->ctor we replace a call to memset with a call to
> cachep->poison_fn, which is only executed if the memory block needs to
> be initialized.
> 
> For kernel testing purposes filling allocations with a nonzero pattern
> would be more suitable, but may require platform-specific code. To have
> a simple baseline we've decided to start with zero-initialization.
> 
> No performance optimizations are done at the moment to reduce double
> initialization of memory regions.
> 
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> ---
>  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
>  include/linux/mm.h                     |  8 ++++++++
>  include/linux/slab_def.h               |  1 +
>  include/linux/slub_def.h               |  1 +
>  kernel/kexec_core.c                    |  2 +-
>  mm/dmapool.c                           |  2 +-
>  mm/page_alloc.c                        | 18 +++++++++++++++++-
>  mm/slab.c                              | 12 ++++++------
>  mm/slab.h                              |  1 +
>  mm/slab_common.c                       | 15 +++++++++++++++
>  mm/slob.c                              |  2 +-
>  mm/slub.c                              |  8 ++++----
>  net/core/sock.c                        |  2 +-
>  13 files changed, 58 insertions(+), 16 deletions(-)
> 

Hi,
Please document init_allocations=N in Documentation/admin-guide/kernel-parameters.txt.

thanks.
Michal Hocko April 23, 2019, 8:31 a.m. UTC | #5
On Thu 18-04-19 09:35:32, Dave Hansen wrote:
> On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> > This option adds the possibility to initialize newly allocated pages and
> > heap objects with zeroes. This is needed to prevent possible information
> > leaks and make the control-flow bugs that depend on uninitialized values
> > more deterministic.
> 
> Isn't it better to do this at free time rather than allocation time?  If
> doing it at free, you can't even have information leaks for pages that
> are in the allocator.

I would tend to agree here. Free path is usually less performance sensitive
than the allocation. Those really hot paths tend to defer the work.

I am also worried that an opt-out gfp flag would tend to be used
incorrectly as the history has shown for others - e.g. __GFP_TEMPORARY.
So I would rather see this robust without a fine tuning unless there is
real use case that would suffer from this and we can think of a
background scrubbing or something similar.
Kees Cook April 23, 2019, 7 p.m. UTC | #6
On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.
>
> Initialization is done at allocation time at the places where checks for
> __GFP_ZERO are performed. We don't initialize slab caches with
> constructors to preserve their semantics. To reduce runtime costs of
> checking cachep->ctor we replace a call to memset with a call to
> cachep->poison_fn, which is only executed if the memory block needs to
> be initialized.
>
> For kernel testing purposes filling allocations with a nonzero pattern
> would be more suitable, but may require platform-specific code. To have
> a simple baseline we've decided to start with zero-initialization.

The memory tagging future may be worth mentioning here too. We'll
always need an allocation-time hook. What we do in that hook (tag,
zero, poison) can be manage from there.

> No performance optimizations are done at the moment to reduce double
> initialization of memory regions.

Isn't this addressed in later patches?

>
> Signed-off-by: Alexander Potapenko <glider@google.com>
> Cc: Andrew Morton <akpm@linux-foundation.org>
> Cc: James Morris <jmorris@namei.org>
> Cc: "Serge E. Hallyn" <serge@hallyn.com>
> Cc: Nick Desaulniers <ndesaulniers@google.com>
> Cc: Kostya Serebryany <kcc@google.com>
> Cc: Dmitry Vyukov <dvyukov@google.com>
> Cc: Kees Cook <keescook@chromium.org>
> Cc: Sandeep Patil <sspatil@android.com>
> Cc: Laura Abbott <labbott@redhat.com>
> Cc: Randy Dunlap <rdunlap@infradead.org>
> Cc: Jann Horn <jannh@google.com>
> Cc: Mark Rutland <mark.rutland@arm.com>
> Cc: Qian Cai <cai@lca.pw>
> Cc: Vlastimil Babka <vbabka@suse.cz>
> Cc: linux-mm@kvack.org
> Cc: linux-security-module@vger.kernel.org
> Cc: kernel-hardening@lists.openwall.com
> ---
>  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
>  include/linux/mm.h                     |  8 ++++++++
>  include/linux/slab_def.h               |  1 +
>  include/linux/slub_def.h               |  1 +
>  kernel/kexec_core.c                    |  2 +-
>  mm/dmapool.c                           |  2 +-
>  mm/page_alloc.c                        | 18 +++++++++++++++++-
>  mm/slab.c                              | 12 ++++++------
>  mm/slab.h                              |  1 +
>  mm/slab_common.c                       | 15 +++++++++++++++
>  mm/slob.c                              |  2 +-
>  mm/slub.c                              |  8 ++++----
>  net/core/sock.c                        |  2 +-
>  13 files changed, 58 insertions(+), 16 deletions(-)
>
> diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
> index e1379949e663..f31234906be2 100644
> --- a/drivers/infiniband/core/uverbs_ioctl.c
> +++ b/drivers/infiniband/core/uverbs_ioctl.c
> @@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
>         res = (void *)pbundle->internal_buffer + pbundle->internal_used;
>         pbundle->internal_used =
>                 ALIGN(new_used, sizeof(*pbundle->internal_buffer));
> -       if (flags & __GFP_ZERO)
> +       if (want_init_memory(flags))
>                 memset(res, 0, size);
>         return res;
>  }
> diff --git a/include/linux/mm.h b/include/linux/mm.h
> index 76769749b5a5..b38b71a5efaa 100644
> --- a/include/linux/mm.h
> +++ b/include/linux/mm.h
> @@ -2597,6 +2597,14 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
>                                         int enable) { }
>  #endif
>
> +DECLARE_STATIC_KEY_FALSE(init_allocations);

I'd like a CONFIG to control this default. We can keep the boot-time
option to change it, but I think a CONFIG is warranted here.

> +static inline bool want_init_memory(gfp_t flags)
> +{
> +       if (static_branch_unlikely(&init_allocations))
> +               return true;
> +       return flags & __GFP_ZERO;
> +}
> +
>  #ifdef CONFIG_DEBUG_PAGEALLOC
>  extern bool _debug_pagealloc_enabled;
>  extern void __kernel_map_pages(struct page *page, int numpages, int enable);
> diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> index 9a5eafb7145b..9dfe9eb639d7 100644
> --- a/include/linux/slab_def.h
> +++ b/include/linux/slab_def.h
> @@ -37,6 +37,7 @@ struct kmem_cache {
>
>         /* constructor func */
>         void (*ctor)(void *obj);
> +       void (*poison_fn)(struct kmem_cache *c, void *object);
>
>  /* 4) cache creation/removal */
>         const char *name;
> diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> index d2153789bd9f..afb928cb7c20 100644
> --- a/include/linux/slub_def.h
> +++ b/include/linux/slub_def.h
> @@ -99,6 +99,7 @@ struct kmem_cache {
>         gfp_t allocflags;       /* gfp flags to use on each alloc */
>         int refcount;           /* Refcount for slab cache destroy */
>         void (*ctor)(void *);
> +       void (*poison_fn)(struct kmem_cache *c, void *object);
>         unsigned int inuse;             /* Offset to metadata */
>         unsigned int align;             /* Alignment */
>         unsigned int red_left_pad;      /* Left redzone padding size */
> diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> index d7140447be75..be84f5f95c97 100644
> --- a/kernel/kexec_core.c
> +++ b/kernel/kexec_core.c
> @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
>                 arch_kexec_post_alloc_pages(page_address(pages), count,
>                                             gfp_mask);
>
> -               if (gfp_mask & __GFP_ZERO)
> +               if (want_init_memory(gfp_mask))
>                         for (i = 0; i < count; i++)
>                                 clear_highpage(pages + i);
>         }
> diff --git a/mm/dmapool.c b/mm/dmapool.c
> index 76a160083506..796e38160d39 100644
> --- a/mm/dmapool.c
> +++ b/mm/dmapool.c
> @@ -381,7 +381,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
>  #endif
>         spin_unlock_irqrestore(&pool->lock, flags);
>
> -       if (mem_flags & __GFP_ZERO)
> +       if (want_init_memory(mem_flags))
>                 memset(retval, 0, pool->size);
>
>         return retval;
> diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> index d96ca5bc555b..e2a21d866ac9 100644
> --- a/mm/page_alloc.c
> +++ b/mm/page_alloc.c
> @@ -133,6 +133,22 @@ unsigned long totalcma_pages __read_mostly;
>
>  int percpu_pagelist_fraction;
>  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> +bool want_init_allocations __read_mostly;

This can be a stack variable in early_init_allocations() -- it's never
used again.

> +EXPORT_SYMBOL(want_init_allocations);
> +DEFINE_STATIC_KEY_FALSE(init_allocations);
> +
> +static int __init early_init_allocations(char *buf)
> +{
> +       int ret;
> +
> +       if (!buf)
> +               return -EINVAL;
> +       ret = kstrtobool(buf, &want_init_allocations);
> +       if (want_init_allocations)
> +               static_branch_enable(&init_allocations);

With the CONFIG, this should have a _disable on an "else" here...

> +       return ret;
> +}
> +early_param("init_allocations", early_init_allocations);

Does early_init_allocations() get called before things like
prep_new_page() are up and running to call want_init_memory()?

>
>  /*
>   * A cached value of the page's pageblock's migratetype, used when the page is
> @@ -2014,7 +2030,7 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
>
>         post_alloc_hook(page, order, gfp_flags);
>
> -       if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
> +       if (!free_pages_prezeroed() && want_init_memory(gfp_flags))
>                 for (i = 0; i < (1 << order); i++)
>                         clear_highpage(page + i);
>
> diff --git a/mm/slab.c b/mm/slab.c
> index 47a380a486ee..dcc5b73cf767 100644
> --- a/mm/slab.c
> +++ b/mm/slab.c
> @@ -3331,8 +3331,8 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
>         local_irq_restore(save_flags);
>         ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
>
> -       if (unlikely(flags & __GFP_ZERO) && ptr)
> -               memset(ptr, 0, cachep->object_size);
> +       if (unlikely(want_init_memory(flags)) && ptr)
> +               cachep->poison_fn(cachep, ptr);

So... this _must_ zero when __GFP_ZERO is present, so I'm not sure
"poison_fn" is the right name, and it likely needs to take the "flags"
argument.

>
>         slab_post_alloc_hook(cachep, flags, 1, &ptr);
>         return ptr;
> @@ -3388,8 +3388,8 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
>         objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
>         prefetchw(objp);
>
> -       if (unlikely(flags & __GFP_ZERO) && objp)
> -               memset(objp, 0, cachep->object_size);
> +       if (unlikely(want_init_memory(flags)) && objp)
> +               cachep->poison_fn(cachep, objp);

Same.

>
>         slab_post_alloc_hook(cachep, flags, 1, &objp);
>         return objp;
> @@ -3596,9 +3596,9 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>         cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
>
>         /* Clear memory outside IRQ disabled section */
> -       if (unlikely(flags & __GFP_ZERO))
> +       if (unlikely(want_init_memory(flags)))
>                 for (i = 0; i < size; i++)
> -                       memset(p[i], 0, s->object_size);
> +                       s->poison_fn(s, p[i]);

Same.

>
>         slab_post_alloc_hook(s, flags, size, p);
>         /* FIXME: Trace call missing. Christoph would like a bulk variant */
> diff --git a/mm/slab.h b/mm/slab.h
> index 43ac818b8592..3b541e8970ee 100644
> --- a/mm/slab.h
> +++ b/mm/slab.h
> @@ -27,6 +27,7 @@ struct kmem_cache {
>         const char *name;       /* Slab name for sysfs */
>         int refcount;           /* Use counter */
>         void (*ctor)(void *);   /* Called on object slot creation */
> +       void (*poison_fn)(struct kmem_cache *c, void *object);

How about naming it just "initialize"?

>         struct list_head list;  /* List of all slab caches on the system */
>  };
>
> diff --git a/mm/slab_common.c b/mm/slab_common.c
> index 58251ba63e4a..37810114b2ea 100644
> --- a/mm/slab_common.c
> +++ b/mm/slab_common.c
> @@ -360,6 +360,16 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
>         return NULL;
>  }
>
> +static void poison_zero(struct kmem_cache *c, void *object)
> +{
> +       memset(object, 0, c->object_size);
> +}
> +
> +static void poison_dont(struct kmem_cache *c, void *object)
> +{
> +       /* Do nothing. Use for caches with constructors. */
> +}
> +
>  static struct kmem_cache *create_cache(const char *name,
>                 unsigned int object_size, unsigned int align,
>                 slab_flags_t flags, unsigned int useroffset,
> @@ -381,6 +391,10 @@ static struct kmem_cache *create_cache(const char *name,
>         s->size = s->object_size = object_size;
>         s->align = align;
>         s->ctor = ctor;
> +       if (ctor)
> +               s->poison_fn = poison_dont;
> +       else
> +               s->poison_fn = poison_zero;

As mentioned, we must still always zero when __GFP_ZERO is present.

>         s->useroffset = useroffset;
>         s->usersize = usersize;
>
> @@ -974,6 +988,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>         s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
>         s->useroffset = useroffset;
>         s->usersize = usersize;
> +       s->poison_fn = poison_zero;
>
>         slab_init_memcg_params(s);
>
> diff --git a/mm/slob.c b/mm/slob.c
> index 307c2c9feb44..18981a71e962 100644
> --- a/mm/slob.c
> +++ b/mm/slob.c
> @@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
>                 BUG_ON(!b);
>                 spin_unlock_irqrestore(&slob_lock, flags);
>         }
> -       if (unlikely(gfp & __GFP_ZERO))
> +       if (unlikely(want_init_memory(gfp)))
>                 memset(b, 0, size);
>         return b;
>  }
> diff --git a/mm/slub.c b/mm/slub.c
> index d30ede89f4a6..e4efb6575510 100644
> --- a/mm/slub.c
> +++ b/mm/slub.c
> @@ -2750,8 +2750,8 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
>                 stat(s, ALLOC_FASTPATH);
>         }
>
> -       if (unlikely(gfpflags & __GFP_ZERO) && object)
> -               memset(object, 0, s->object_size);
> +       if (unlikely(want_init_memory(gfpflags)) && object)
> +               s->poison_fn(s, object);
>
>         slab_post_alloc_hook(s, gfpflags, 1, &object);
>
> @@ -3172,11 +3172,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
>         local_irq_enable();
>
>         /* Clear memory outside IRQ disabled fastpath loop */
> -       if (unlikely(flags & __GFP_ZERO)) {
> +       if (unlikely(want_init_memory(flags))) {
>                 int j;
>
>                 for (j = 0; j < i; j++)
> -                       memset(p[j], 0, s->object_size);
> +                       s->poison_fn(s, p[j]);
>         }
>
>         /* memcg and kmem_cache debug support */
> diff --git a/net/core/sock.c b/net/core/sock.c
> index 782343bb925b..99b288a19b39 100644
> --- a/net/core/sock.c
> +++ b/net/core/sock.c
> @@ -1601,7 +1601,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
>                 sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
>                 if (!sk)
>                         return sk;
> -               if (priority & __GFP_ZERO)
> +               if (want_init_memory(priority))
>                         sk_prot_clear_nulls(sk, prot->obj_size);
>         } else
>                 sk = kmalloc(prot->obj_size, priority);
> --
> 2.21.0.392.gf8f6787159e-goog
>

Looking good. :) Thanks for working on this!
Dave Hansen April 23, 2019, 8:36 p.m. UTC | #7
On 4/18/19 8:42 AM, Alexander Potapenko wrote:
> +static void poison_dont(struct kmem_cache *c, void *object)
> +{
> +	/* Do nothing. Use for caches with constructors. */
> +}
> +
>  static struct kmem_cache *create_cache(const char *name,
>  		unsigned int object_size, unsigned int align,
>  		slab_flags_t flags, unsigned int useroffset,
> @@ -381,6 +391,10 @@ static struct kmem_cache *create_cache(const char *name,
>  	s->size = s->object_size = object_size;
>  	s->align = align;
>  	s->ctor = ctor;
> +	if (ctor)
> +		s->poison_fn = poison_dont;
> +	else
> +		s->poison_fn = poison_zero;
>  	s->useroffset = useroffset;
>  	s->usersize = usersize;
>  
> @@ -974,6 +988,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
>  	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
>  	s->useroffset = useroffset;
>  	s->usersize = usersize;
> +	s->poison_fn = poison_zero;

An empty indirect call is probably a pretty bad idea on systems with
retpoline.  Isn't this just a bool anyway for either calling poison_dont
or poison_zero?  Can it call anything else?
Alexander Potapenko April 26, 2019, 12:12 p.m. UTC | #8
On Tue, Apr 23, 2019 at 9:00 PM Kees Cook <keescook@chromium.org> wrote:
>
> On Thu, Apr 18, 2019 at 8:42 AM Alexander Potapenko <glider@google.com> wrote:
> > This option adds the possibility to initialize newly allocated pages and
> > heap objects with zeroes. This is needed to prevent possible information
> > leaks and make the control-flow bugs that depend on uninitialized values
> > more deterministic.
> >
> > Initialization is done at allocation time at the places where checks for
> > __GFP_ZERO are performed. We don't initialize slab caches with
> > constructors to preserve their semantics. To reduce runtime costs of
> > checking cachep->ctor we replace a call to memset with a call to
> > cachep->poison_fn, which is only executed if the memory block needs to
> > be initialized.
> >
> > For kernel testing purposes filling allocations with a nonzero pattern
> > would be more suitable, but may require platform-specific code. To have
> > a simple baseline we've decided to start with zero-initialization.
>
> The memory tagging future may be worth mentioning here too. We'll
> always need an allocation-time hook. What we do in that hook (tag,
> zero, poison) can be manage from there.
Shall we factor out the allocation hook in this patch? Note that I'll
be probably dropping this poison_fn() stuff, as it should be enough to
just call memset() under a static branch.
> > No performance optimizations are done at the moment to reduce double
> > initialization of memory regions.
>
> Isn't this addressed in later patches?
Agreed.
> >
> > Signed-off-by: Alexander Potapenko <glider@google.com>
> > Cc: Andrew Morton <akpm@linux-foundation.org>
> > Cc: James Morris <jmorris@namei.org>
> > Cc: "Serge E. Hallyn" <serge@hallyn.com>
> > Cc: Nick Desaulniers <ndesaulniers@google.com>
> > Cc: Kostya Serebryany <kcc@google.com>
> > Cc: Dmitry Vyukov <dvyukov@google.com>
> > Cc: Kees Cook <keescook@chromium.org>
> > Cc: Sandeep Patil <sspatil@android.com>
> > Cc: Laura Abbott <labbott@redhat.com>
> > Cc: Randy Dunlap <rdunlap@infradead.org>
> > Cc: Jann Horn <jannh@google.com>
> > Cc: Mark Rutland <mark.rutland@arm.com>
> > Cc: Qian Cai <cai@lca.pw>
> > Cc: Vlastimil Babka <vbabka@suse.cz>
> > Cc: linux-mm@kvack.org
> > Cc: linux-security-module@vger.kernel.org
> > Cc: kernel-hardening@lists.openwall.com
> > ---
> >  drivers/infiniband/core/uverbs_ioctl.c |  2 +-
> >  include/linux/mm.h                     |  8 ++++++++
> >  include/linux/slab_def.h               |  1 +
> >  include/linux/slub_def.h               |  1 +
> >  kernel/kexec_core.c                    |  2 +-
> >  mm/dmapool.c                           |  2 +-
> >  mm/page_alloc.c                        | 18 +++++++++++++++++-
> >  mm/slab.c                              | 12 ++++++------
> >  mm/slab.h                              |  1 +
> >  mm/slab_common.c                       | 15 +++++++++++++++
> >  mm/slob.c                              |  2 +-
> >  mm/slub.c                              |  8 ++++----
> >  net/core/sock.c                        |  2 +-
> >  13 files changed, 58 insertions(+), 16 deletions(-)
> >
> > diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
> > index e1379949e663..f31234906be2 100644
> > --- a/drivers/infiniband/core/uverbs_ioctl.c
> > +++ b/drivers/infiniband/core/uverbs_ioctl.c
> > @@ -127,7 +127,7 @@ __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
> >         res = (void *)pbundle->internal_buffer + pbundle->internal_used;
> >         pbundle->internal_used =
> >                 ALIGN(new_used, sizeof(*pbundle->internal_buffer));
> > -       if (flags & __GFP_ZERO)
> > +       if (want_init_memory(flags))
> >                 memset(res, 0, size);
> >         return res;
> >  }
> > diff --git a/include/linux/mm.h b/include/linux/mm.h
> > index 76769749b5a5..b38b71a5efaa 100644
> > --- a/include/linux/mm.h
> > +++ b/include/linux/mm.h
> > @@ -2597,6 +2597,14 @@ static inline void kernel_poison_pages(struct page *page, int numpages,
> >                                         int enable) { }
> >  #endif
> >
> > +DECLARE_STATIC_KEY_FALSE(init_allocations);
>
> I'd like a CONFIG to control this default. We can keep the boot-time
> option to change it, but I think a CONFIG is warranted here.
Ok, will do.
> > +static inline bool want_init_memory(gfp_t flags)
> > +{
> > +       if (static_branch_unlikely(&init_allocations))
> > +               return true;
> > +       return flags & __GFP_ZERO;
> > +}
> > +
> >  #ifdef CONFIG_DEBUG_PAGEALLOC
> >  extern bool _debug_pagealloc_enabled;
> >  extern void __kernel_map_pages(struct page *page, int numpages, int enable);
> > diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
> > index 9a5eafb7145b..9dfe9eb639d7 100644
> > --- a/include/linux/slab_def.h
> > +++ b/include/linux/slab_def.h
> > @@ -37,6 +37,7 @@ struct kmem_cache {
> >
> >         /* constructor func */
> >         void (*ctor)(void *obj);
> > +       void (*poison_fn)(struct kmem_cache *c, void *object);
> >
> >  /* 4) cache creation/removal */
> >         const char *name;
> > diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
> > index d2153789bd9f..afb928cb7c20 100644
> > --- a/include/linux/slub_def.h
> > +++ b/include/linux/slub_def.h
> > @@ -99,6 +99,7 @@ struct kmem_cache {
> >         gfp_t allocflags;       /* gfp flags to use on each alloc */
> >         int refcount;           /* Refcount for slab cache destroy */
> >         void (*ctor)(void *);
> > +       void (*poison_fn)(struct kmem_cache *c, void *object);
> >         unsigned int inuse;             /* Offset to metadata */
> >         unsigned int align;             /* Alignment */
> >         unsigned int red_left_pad;      /* Left redzone padding size */
> > diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
> > index d7140447be75..be84f5f95c97 100644
> > --- a/kernel/kexec_core.c
> > +++ b/kernel/kexec_core.c
> > @@ -315,7 +315,7 @@ static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
> >                 arch_kexec_post_alloc_pages(page_address(pages), count,
> >                                             gfp_mask);
> >
> > -               if (gfp_mask & __GFP_ZERO)
> > +               if (want_init_memory(gfp_mask))
> >                         for (i = 0; i < count; i++)
> >                                 clear_highpage(pages + i);
> >         }
> > diff --git a/mm/dmapool.c b/mm/dmapool.c
> > index 76a160083506..796e38160d39 100644
> > --- a/mm/dmapool.c
> > +++ b/mm/dmapool.c
> > @@ -381,7 +381,7 @@ void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
> >  #endif
> >         spin_unlock_irqrestore(&pool->lock, flags);
> >
> > -       if (mem_flags & __GFP_ZERO)
> > +       if (want_init_memory(mem_flags))
> >                 memset(retval, 0, pool->size);
> >
> >         return retval;
> > diff --git a/mm/page_alloc.c b/mm/page_alloc.c
> > index d96ca5bc555b..e2a21d866ac9 100644
> > --- a/mm/page_alloc.c
> > +++ b/mm/page_alloc.c
> > @@ -133,6 +133,22 @@ unsigned long totalcma_pages __read_mostly;
> >
> >  int percpu_pagelist_fraction;
> >  gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
> > +bool want_init_allocations __read_mostly;
>
> This can be a stack variable in early_init_allocations() -- it's never
> used again.
Ack.
> > +EXPORT_SYMBOL(want_init_allocations);
> > +DEFINE_STATIC_KEY_FALSE(init_allocations);
> > +
> > +static int __init early_init_allocations(char *buf)
> > +{
> > +       int ret;
> > +
> > +       if (!buf)
> > +               return -EINVAL;
> > +       ret = kstrtobool(buf, &want_init_allocations);
> > +       if (want_init_allocations)
> > +               static_branch_enable(&init_allocations);
>
> With the CONFIG, this should have a _disable on an "else" here...
Ack.
> > +       return ret;
> > +}
> > +early_param("init_allocations", early_init_allocations);
>
> Does early_init_allocations() get called before things like
> prep_new_page() are up and running to call want_init_memory()?
Yes, IIUC early params are initialized before the memory subsystem is
initialized.

> >
> >  /*
> >   * A cached value of the page's pageblock's migratetype, used when the page is
> > @@ -2014,7 +2030,7 @@ static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
> >
> >         post_alloc_hook(page, order, gfp_flags);
> >
> > -       if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
> > +       if (!free_pages_prezeroed() && want_init_memory(gfp_flags))
> >                 for (i = 0; i < (1 << order); i++)
> >                         clear_highpage(page + i);
> >
> > diff --git a/mm/slab.c b/mm/slab.c
> > index 47a380a486ee..dcc5b73cf767 100644
> > --- a/mm/slab.c
> > +++ b/mm/slab.c
> > @@ -3331,8 +3331,8 @@ slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
> >         local_irq_restore(save_flags);
> >         ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
> >
> > -       if (unlikely(flags & __GFP_ZERO) && ptr)
> > -               memset(ptr, 0, cachep->object_size);
> > +       if (unlikely(want_init_memory(flags)) && ptr)
> > +               cachep->poison_fn(cachep, ptr);
>
> So... this _must_ zero when __GFP_ZERO is present, so I'm not sure
> "poison_fn" is the right name, and it likely needs to take the "flags"
> argument.
I'll rework this part, as it had been pointed out that an empty
indirect call still costs something.
We're basically choosing between two options:
 - flags & __GFP_ZERO when initialization is disabled;
 - and cachep->ctr when it's enabled
A static branch should be enough to switch between those without
introducing extra checks or indirections.
>
> >
> >         slab_post_alloc_hook(cachep, flags, 1, &ptr);
> >         return ptr;
> > @@ -3388,8 +3388,8 @@ slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
> >         objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
> >         prefetchw(objp);
> >
> > -       if (unlikely(flags & __GFP_ZERO) && objp)
> > -               memset(objp, 0, cachep->object_size);
> > +       if (unlikely(want_init_memory(flags)) && objp)
> > +               cachep->poison_fn(cachep, objp);
>
> Same.
>
> >
> >         slab_post_alloc_hook(cachep, flags, 1, &objp);
> >         return objp;
> > @@ -3596,9 +3596,9 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> >         cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
> >
> >         /* Clear memory outside IRQ disabled section */
> > -       if (unlikely(flags & __GFP_ZERO))
> > +       if (unlikely(want_init_memory(flags)))
> >                 for (i = 0; i < size; i++)
> > -                       memset(p[i], 0, s->object_size);
> > +                       s->poison_fn(s, p[i]);
>
> Same.
>
> >
> >         slab_post_alloc_hook(s, flags, size, p);
> >         /* FIXME: Trace call missing. Christoph would like a bulk variant */
> > diff --git a/mm/slab.h b/mm/slab.h
> > index 43ac818b8592..3b541e8970ee 100644
> > --- a/mm/slab.h
> > +++ b/mm/slab.h
> > @@ -27,6 +27,7 @@ struct kmem_cache {
> >         const char *name;       /* Slab name for sysfs */
> >         int refcount;           /* Use counter */
> >         void (*ctor)(void *);   /* Called on object slot creation */
> > +       void (*poison_fn)(struct kmem_cache *c, void *object);
>
> How about naming it just "initialize"?
DItto.
> >         struct list_head list;  /* List of all slab caches on the system */
> >  };
> >
> > diff --git a/mm/slab_common.c b/mm/slab_common.c
> > index 58251ba63e4a..37810114b2ea 100644
> > --- a/mm/slab_common.c
> > +++ b/mm/slab_common.c
> > @@ -360,6 +360,16 @@ struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
> >         return NULL;
> >  }
> >
> > +static void poison_zero(struct kmem_cache *c, void *object)
> > +{
> > +       memset(object, 0, c->object_size);
> > +}
> > +
> > +static void poison_dont(struct kmem_cache *c, void *object)
> > +{
> > +       /* Do nothing. Use for caches with constructors. */
> > +}
> > +
> >  static struct kmem_cache *create_cache(const char *name,
> >                 unsigned int object_size, unsigned int align,
> >                 slab_flags_t flags, unsigned int useroffset,
> > @@ -381,6 +391,10 @@ static struct kmem_cache *create_cache(const char *name,
> >         s->size = s->object_size = object_size;
> >         s->align = align;
> >         s->ctor = ctor;
> > +       if (ctor)
> > +               s->poison_fn = poison_dont;
> > +       else
> > +               s->poison_fn = poison_zero;
>
> As mentioned, we must still always zero when __GFP_ZERO is present.
This part will be gone, but in general __GFP_ZERO is incompatible with
constructors, see a check in new_slab_objects().
I don't think we must support erroneous behaviour.

> >         s->useroffset = useroffset;
> >         s->usersize = usersize;
> >
> > @@ -974,6 +988,7 @@ void __init create_boot_cache(struct kmem_cache *s, const char *name,
> >         s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
> >         s->useroffset = useroffset;
> >         s->usersize = usersize;
> > +       s->poison_fn = poison_zero;
> >
> >         slab_init_memcg_params(s);
> >
> > diff --git a/mm/slob.c b/mm/slob.c
> > index 307c2c9feb44..18981a71e962 100644
> > --- a/mm/slob.c
> > +++ b/mm/slob.c
> > @@ -330,7 +330,7 @@ static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
> >                 BUG_ON(!b);
> >                 spin_unlock_irqrestore(&slob_lock, flags);
> >         }
> > -       if (unlikely(gfp & __GFP_ZERO))
> > +       if (unlikely(want_init_memory(gfp)))
> >                 memset(b, 0, size);
> >         return b;
> >  }
> > diff --git a/mm/slub.c b/mm/slub.c
> > index d30ede89f4a6..e4efb6575510 100644
> > --- a/mm/slub.c
> > +++ b/mm/slub.c
> > @@ -2750,8 +2750,8 @@ static __always_inline void *slab_alloc_node(struct kmem_cache *s,
> >                 stat(s, ALLOC_FASTPATH);
> >         }
> >
> > -       if (unlikely(gfpflags & __GFP_ZERO) && object)
> > -               memset(object, 0, s->object_size);
> > +       if (unlikely(want_init_memory(gfpflags)) && object)
> > +               s->poison_fn(s, object);
> >
> >         slab_post_alloc_hook(s, gfpflags, 1, &object);
> >
> > @@ -3172,11 +3172,11 @@ int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
> >         local_irq_enable();
> >
> >         /* Clear memory outside IRQ disabled fastpath loop */
> > -       if (unlikely(flags & __GFP_ZERO)) {
> > +       if (unlikely(want_init_memory(flags))) {
> >                 int j;
> >
> >                 for (j = 0; j < i; j++)
> > -                       memset(p[j], 0, s->object_size);
> > +                       s->poison_fn(s, p[j]);
> >         }
> >
> >         /* memcg and kmem_cache debug support */
> > diff --git a/net/core/sock.c b/net/core/sock.c
> > index 782343bb925b..99b288a19b39 100644
> > --- a/net/core/sock.c
> > +++ b/net/core/sock.c
> > @@ -1601,7 +1601,7 @@ static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
> >                 sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
> >                 if (!sk)
> >                         return sk;
> > -               if (priority & __GFP_ZERO)
> > +               if (want_init_memory(priority))
> >                         sk_prot_clear_nulls(sk, prot->obj_size);
> >         } else
> >                 sk = kmalloc(prot->obj_size, priority);
> > --
> > 2.21.0.392.gf8f6787159e-goog
> >
>
> Looking good. :) Thanks for working on this!
>
> --
> Kees Cook
Christoph Lameter (Ampere) April 26, 2019, 2:14 p.m. UTC | #9
On Thu, 18 Apr 2019, Alexander Potapenko wrote:

> This option adds the possibility to initialize newly allocated pages and
> heap objects with zeroes. This is needed to prevent possible information
> leaks and make the control-flow bugs that depend on uninitialized values
> more deterministic.
>
> Initialization is done at allocation time at the places where checks for
> __GFP_ZERO are performed. We don't initialize slab caches with
> constructors to preserve their semantics. To reduce runtime costs of
> checking cachep->ctor we replace a call to memset with a call to
> cachep->poison_fn, which is only executed if the memory block needs to
> be initialized.

Just check for a ctor and then zero or use whatever pattern ? Why add a
new function?
Christoph Lameter (Ampere) April 26, 2019, 3:24 p.m. UTC | #10
Hmmmm.... Maybe its better to zero on free? That way you dont need to
initialize the allocations. You could even check if someone mucked with
the object during allocation. This is a replication of some of the
inherent debugging facilities in the allocator though.
Alexander Potapenko April 26, 2019, 3:48 p.m. UTC | #11
On Fri, Apr 26, 2019 at 5:24 PM Christopher Lameter <cl@linux.com> wrote:
>
> Hmmmm.... Maybe its better to zero on free? That way you dont need to
> initialize the allocations. You could even check if someone mucked with
> the object during allocation.
As mentioned somewhere in the neighboring threads, Kees requested the
options to initialize on both allocation and free, because we'll need
this functionality for memory tagging someday.
> This is a replication of some of the
> inherent debugging facilities in the allocator though.
I'm curious, how often do people use SL[AU]B poisoning and redzones these days?
Guess those should be superseded by KASAN already?
I agree it makes sense to reuse the existing debugging code, but on
the other hand we probably want to keep the initialization fast enough
to be used in production.
>
Re: poison_fn, it wasn't a good idea, so I'll be dropping it.
diff mbox series

Patch

diff --git a/drivers/infiniband/core/uverbs_ioctl.c b/drivers/infiniband/core/uverbs_ioctl.c
index e1379949e663..f31234906be2 100644
--- a/drivers/infiniband/core/uverbs_ioctl.c
+++ b/drivers/infiniband/core/uverbs_ioctl.c
@@ -127,7 +127,7 @@  __malloc void *_uverbs_alloc(struct uverbs_attr_bundle *bundle, size_t size,
 	res = (void *)pbundle->internal_buffer + pbundle->internal_used;
 	pbundle->internal_used =
 		ALIGN(new_used, sizeof(*pbundle->internal_buffer));
-	if (flags & __GFP_ZERO)
+	if (want_init_memory(flags))
 		memset(res, 0, size);
 	return res;
 }
diff --git a/include/linux/mm.h b/include/linux/mm.h
index 76769749b5a5..b38b71a5efaa 100644
--- a/include/linux/mm.h
+++ b/include/linux/mm.h
@@ -2597,6 +2597,14 @@  static inline void kernel_poison_pages(struct page *page, int numpages,
 					int enable) { }
 #endif
 
+DECLARE_STATIC_KEY_FALSE(init_allocations);
+static inline bool want_init_memory(gfp_t flags)
+{
+	if (static_branch_unlikely(&init_allocations))
+		return true;
+	return flags & __GFP_ZERO;
+}
+
 #ifdef CONFIG_DEBUG_PAGEALLOC
 extern bool _debug_pagealloc_enabled;
 extern void __kernel_map_pages(struct page *page, int numpages, int enable);
diff --git a/include/linux/slab_def.h b/include/linux/slab_def.h
index 9a5eafb7145b..9dfe9eb639d7 100644
--- a/include/linux/slab_def.h
+++ b/include/linux/slab_def.h
@@ -37,6 +37,7 @@  struct kmem_cache {
 
 	/* constructor func */
 	void (*ctor)(void *obj);
+	void (*poison_fn)(struct kmem_cache *c, void *object);
 
 /* 4) cache creation/removal */
 	const char *name;
diff --git a/include/linux/slub_def.h b/include/linux/slub_def.h
index d2153789bd9f..afb928cb7c20 100644
--- a/include/linux/slub_def.h
+++ b/include/linux/slub_def.h
@@ -99,6 +99,7 @@  struct kmem_cache {
 	gfp_t allocflags;	/* gfp flags to use on each alloc */
 	int refcount;		/* Refcount for slab cache destroy */
 	void (*ctor)(void *);
+	void (*poison_fn)(struct kmem_cache *c, void *object);
 	unsigned int inuse;		/* Offset to metadata */
 	unsigned int align;		/* Alignment */
 	unsigned int red_left_pad;	/* Left redzone padding size */
diff --git a/kernel/kexec_core.c b/kernel/kexec_core.c
index d7140447be75..be84f5f95c97 100644
--- a/kernel/kexec_core.c
+++ b/kernel/kexec_core.c
@@ -315,7 +315,7 @@  static struct page *kimage_alloc_pages(gfp_t gfp_mask, unsigned int order)
 		arch_kexec_post_alloc_pages(page_address(pages), count,
 					    gfp_mask);
 
-		if (gfp_mask & __GFP_ZERO)
+		if (want_init_memory(gfp_mask))
 			for (i = 0; i < count; i++)
 				clear_highpage(pages + i);
 	}
diff --git a/mm/dmapool.c b/mm/dmapool.c
index 76a160083506..796e38160d39 100644
--- a/mm/dmapool.c
+++ b/mm/dmapool.c
@@ -381,7 +381,7 @@  void *dma_pool_alloc(struct dma_pool *pool, gfp_t mem_flags,
 #endif
 	spin_unlock_irqrestore(&pool->lock, flags);
 
-	if (mem_flags & __GFP_ZERO)
+	if (want_init_memory(mem_flags))
 		memset(retval, 0, pool->size);
 
 	return retval;
diff --git a/mm/page_alloc.c b/mm/page_alloc.c
index d96ca5bc555b..e2a21d866ac9 100644
--- a/mm/page_alloc.c
+++ b/mm/page_alloc.c
@@ -133,6 +133,22 @@  unsigned long totalcma_pages __read_mostly;
 
 int percpu_pagelist_fraction;
 gfp_t gfp_allowed_mask __read_mostly = GFP_BOOT_MASK;
+bool want_init_allocations __read_mostly;
+EXPORT_SYMBOL(want_init_allocations);
+DEFINE_STATIC_KEY_FALSE(init_allocations);
+
+static int __init early_init_allocations(char *buf)
+{
+	int ret;
+
+	if (!buf)
+		return -EINVAL;
+	ret = kstrtobool(buf, &want_init_allocations);
+	if (want_init_allocations)
+		static_branch_enable(&init_allocations);
+	return ret;
+}
+early_param("init_allocations", early_init_allocations);
 
 /*
  * A cached value of the page's pageblock's migratetype, used when the page is
@@ -2014,7 +2030,7 @@  static void prep_new_page(struct page *page, unsigned int order, gfp_t gfp_flags
 
 	post_alloc_hook(page, order, gfp_flags);
 
-	if (!free_pages_prezeroed() && (gfp_flags & __GFP_ZERO))
+	if (!free_pages_prezeroed() && want_init_memory(gfp_flags))
 		for (i = 0; i < (1 << order); i++)
 			clear_highpage(page + i);
 
diff --git a/mm/slab.c b/mm/slab.c
index 47a380a486ee..dcc5b73cf767 100644
--- a/mm/slab.c
+++ b/mm/slab.c
@@ -3331,8 +3331,8 @@  slab_alloc_node(struct kmem_cache *cachep, gfp_t flags, int nodeid,
 	local_irq_restore(save_flags);
 	ptr = cache_alloc_debugcheck_after(cachep, flags, ptr, caller);
 
-	if (unlikely(flags & __GFP_ZERO) && ptr)
-		memset(ptr, 0, cachep->object_size);
+	if (unlikely(want_init_memory(flags)) && ptr)
+		cachep->poison_fn(cachep, ptr);
 
 	slab_post_alloc_hook(cachep, flags, 1, &ptr);
 	return ptr;
@@ -3388,8 +3388,8 @@  slab_alloc(struct kmem_cache *cachep, gfp_t flags, unsigned long caller)
 	objp = cache_alloc_debugcheck_after(cachep, flags, objp, caller);
 	prefetchw(objp);
 
-	if (unlikely(flags & __GFP_ZERO) && objp)
-		memset(objp, 0, cachep->object_size);
+	if (unlikely(want_init_memory(flags)) && objp)
+		cachep->poison_fn(cachep, objp);
 
 	slab_post_alloc_hook(cachep, flags, 1, &objp);
 	return objp;
@@ -3596,9 +3596,9 @@  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	cache_alloc_debugcheck_after_bulk(s, flags, size, p, _RET_IP_);
 
 	/* Clear memory outside IRQ disabled section */
-	if (unlikely(flags & __GFP_ZERO))
+	if (unlikely(want_init_memory(flags)))
 		for (i = 0; i < size; i++)
-			memset(p[i], 0, s->object_size);
+			s->poison_fn(s, p[i]);
 
 	slab_post_alloc_hook(s, flags, size, p);
 	/* FIXME: Trace call missing. Christoph would like a bulk variant */
diff --git a/mm/slab.h b/mm/slab.h
index 43ac818b8592..3b541e8970ee 100644
--- a/mm/slab.h
+++ b/mm/slab.h
@@ -27,6 +27,7 @@  struct kmem_cache {
 	const char *name;	/* Slab name for sysfs */
 	int refcount;		/* Use counter */
 	void (*ctor)(void *);	/* Called on object slot creation */
+	void (*poison_fn)(struct kmem_cache *c, void *object);
 	struct list_head list;	/* List of all slab caches on the system */
 };
 
diff --git a/mm/slab_common.c b/mm/slab_common.c
index 58251ba63e4a..37810114b2ea 100644
--- a/mm/slab_common.c
+++ b/mm/slab_common.c
@@ -360,6 +360,16 @@  struct kmem_cache *find_mergeable(unsigned int size, unsigned int align,
 	return NULL;
 }
 
+static void poison_zero(struct kmem_cache *c, void *object)
+{
+	memset(object, 0, c->object_size);
+}
+
+static void poison_dont(struct kmem_cache *c, void *object)
+{
+	/* Do nothing. Use for caches with constructors. */
+}
+
 static struct kmem_cache *create_cache(const char *name,
 		unsigned int object_size, unsigned int align,
 		slab_flags_t flags, unsigned int useroffset,
@@ -381,6 +391,10 @@  static struct kmem_cache *create_cache(const char *name,
 	s->size = s->object_size = object_size;
 	s->align = align;
 	s->ctor = ctor;
+	if (ctor)
+		s->poison_fn = poison_dont;
+	else
+		s->poison_fn = poison_zero;
 	s->useroffset = useroffset;
 	s->usersize = usersize;
 
@@ -974,6 +988,7 @@  void __init create_boot_cache(struct kmem_cache *s, const char *name,
 	s->align = calculate_alignment(flags, ARCH_KMALLOC_MINALIGN, size);
 	s->useroffset = useroffset;
 	s->usersize = usersize;
+	s->poison_fn = poison_zero;
 
 	slab_init_memcg_params(s);
 
diff --git a/mm/slob.c b/mm/slob.c
index 307c2c9feb44..18981a71e962 100644
--- a/mm/slob.c
+++ b/mm/slob.c
@@ -330,7 +330,7 @@  static void *slob_alloc(size_t size, gfp_t gfp, int align, int node)
 		BUG_ON(!b);
 		spin_unlock_irqrestore(&slob_lock, flags);
 	}
-	if (unlikely(gfp & __GFP_ZERO))
+	if (unlikely(want_init_memory(gfp)))
 		memset(b, 0, size);
 	return b;
 }
diff --git a/mm/slub.c b/mm/slub.c
index d30ede89f4a6..e4efb6575510 100644
--- a/mm/slub.c
+++ b/mm/slub.c
@@ -2750,8 +2750,8 @@  static __always_inline void *slab_alloc_node(struct kmem_cache *s,
 		stat(s, ALLOC_FASTPATH);
 	}
 
-	if (unlikely(gfpflags & __GFP_ZERO) && object)
-		memset(object, 0, s->object_size);
+	if (unlikely(want_init_memory(gfpflags)) && object)
+		s->poison_fn(s, object);
 
 	slab_post_alloc_hook(s, gfpflags, 1, &object);
 
@@ -3172,11 +3172,11 @@  int kmem_cache_alloc_bulk(struct kmem_cache *s, gfp_t flags, size_t size,
 	local_irq_enable();
 
 	/* Clear memory outside IRQ disabled fastpath loop */
-	if (unlikely(flags & __GFP_ZERO)) {
+	if (unlikely(want_init_memory(flags))) {
 		int j;
 
 		for (j = 0; j < i; j++)
-			memset(p[j], 0, s->object_size);
+			s->poison_fn(s, p[j]);
 	}
 
 	/* memcg and kmem_cache debug support */
diff --git a/net/core/sock.c b/net/core/sock.c
index 782343bb925b..99b288a19b39 100644
--- a/net/core/sock.c
+++ b/net/core/sock.c
@@ -1601,7 +1601,7 @@  static struct sock *sk_prot_alloc(struct proto *prot, gfp_t priority,
 		sk = kmem_cache_alloc(slab, priority & ~__GFP_ZERO);
 		if (!sk)
 			return sk;
-		if (priority & __GFP_ZERO)
+		if (want_init_memory(priority))
 			sk_prot_clear_nulls(sk, prot->obj_size);
 	} else
 		sk = kmalloc(prot->obj_size, priority);