diff mbox series

mm: make __GFP_SKIP_ZERO visible to skip zero operation

Message ID 20230831105252.1385911-1-zhaoyang.huang@unisoc.com (mailing list archive)
State New
Headers show
Series mm: make __GFP_SKIP_ZERO visible to skip zero operation | expand

Commit Message

zhaoyang.huang Aug. 31, 2023, 10:52 a.m. UTC
From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>

There is no explicit gfp flags to let the allocation skip zero
operation when CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y. I would like to make
__GFP_SKIP_ZERO be visible even if kasan is not configured.

Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
---
 include/linux/gfp_types.h | 3 +--
 1 file changed, 1 insertion(+), 2 deletions(-)
 mode change 100644 => 100755 include/linux/gfp_types.h

Comments

Matthew Wilcox (Oracle) Aug. 31, 2023, 12:16 p.m. UTC | #1
On Thu, Aug 31, 2023 at 06:52:52PM +0800, zhaoyang.huang wrote:
> From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> 
> There is no explicit gfp flags to let the allocation skip zero
> operation when CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y. I would like to make
> __GFP_SKIP_ZERO be visible even if kasan is not configured.

This bypasses a security feature so you're going to have to do a little
better than "I want it".

> Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> ---
>  include/linux/gfp_types.h | 3 +--
>  1 file changed, 1 insertion(+), 2 deletions(-)
>  mode change 100644 => 100755 include/linux/gfp_types.h

What is this garbage?  Header files should not be executable.
Zhaoyang Huang Sept. 1, 2023, 10:29 a.m. UTC | #2
loop alex

On Thu, Aug 31, 2023 at 8:16 PM Matthew Wilcox <willy@infradead.org> wrote:
>
> On Thu, Aug 31, 2023 at 06:52:52PM +0800, zhaoyang.huang wrote:
> > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> >
> > There is no explicit gfp flags to let the allocation skip zero
> > operation when CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y. I would like to make
> > __GFP_SKIP_ZERO be visible even if kasan is not configured.
>
> This bypasses a security feature so you're going to have to do a little
> better than "I want it".
Thanks for pointing this out. What I want to do is to give the user a
way to exempt some types of pages from being zeroed, which could help
on performance issues.  Could we have the most safety concern admin
use INIT_ON_FREE while the less concerned use INIT_ON_ALLOC &
__GFP_SKIP_ZERO as a light version method?
>
> > Signed-off-by: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > ---
> >  include/linux/gfp_types.h | 3 +--
> >  1 file changed, 1 insertion(+), 2 deletions(-)
> >  mode change 100644 => 100755 include/linux/gfp_types.h
>
> What is this garbage?  Header files should not be executable.
sorry for that, will remove
Alexander Potapenko Sept. 1, 2023, 12:55 p.m. UTC | #3
On Fri, Sep 1, 2023 at 12:29 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
>
> loop alex

(adding more people who took part in the previous discussions)

>
> On Thu, Aug 31, 2023 at 8:16 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Aug 31, 2023 at 06:52:52PM +0800, zhaoyang.huang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > >
> > > There is no explicit gfp flags to let the allocation skip zero
> > > operation when CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y. I would like to make
> > > __GFP_SKIP_ZERO be visible even if kasan is not configured.

Hi all,

This is a recurring question, as people keep encountering performance
problems on systems with init_on_alloc=1
(https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1862822 being
one of the examples).

Brad Spengler has also pointed out
(https://twitter.com/spendergrsec/status/1296461651659694082) that
there are cases where there is no security vs. performance tradeoff
(e.g. kmemdup() and kstrdup()).

An opt-out flag was included in the initial init_on_alloc series, but
back then Michal Hocko has noted that it might easily get out of
control: https://patchwork.kernel.org/project/linux-hardening/patch/20190418154208.131118-2-glider@google.com/#22600229.

Now that init_on_alloc is actually being used by people which may have
different preferences wrt. security and performance (in the cases
where this tradeoff exists), we must be very careful with the opt-out
GFP flag. Not initializing a particular allocation site in the
upstream kernel will affect every downstream user, and some may
consider this a security regression.
Another problematic case is an OS vendor mandating init_on_alloc
everywhere, but a third party driver vendor doing kmalloc(...,
__GFP_SKIP_ZERO) for their allocations.

So I think a working opt-out scheme for the heap initialization should
be two-step:
1. The code owner may decide that a particular allocation site needs
an opt-out, and make the upstream code change;
2. The OS vendor has the ability to override that decision for the
kernel they ship without the need to patch the source.

Let me quoute the idea briefly outlined at
https://lore.kernel.org/lkml/CAG_fn=UQEuvJ9WXou_sW3moHcVQZJ9NvJ5McNcsYE8xw_WEYGw@mail.gmail.com/
(unfortunately the discussion got derailed a bit):

"""
1. Add a ___GFP_SKIP_ZERO flag that is not intended to be used
explicitly (e.g. disallow passing it to kmalloc(), add a checkpatch.pl
warning). Explicitly passing an opt-out flag to allocation functions
was considered harmful previously:
https://lore.kernel.org/kernel-hardening/20190423083148.GF25106@dhcp22.suse.cz/

2. Define new allocation API that will allow opt-out:
  struct page *alloc_pages_uninit(gfp_t gfp, unsigned int order, const
char *key);
  void *kmalloc_uninit(size_t size, gfp_t flags, const char *key);
  void *kmem_cache_alloc_uninit(struct kmem_cache *, gfp_t flags,
const char *key);
, where @key is an arbitrary string that identifies a single
allocation or a group of allocations.

3. Provide a boot-time flag that holds a comma-separated list of
opt-out keys that actually take effect:
  init_on_alloc.skip="xyz-camera-driver,some_big_buffer".
"""

A draft implementation at
https://github.com/ramosian-glider/linux/commit/00791be14eb1113eae615c74b652f94b5cc3c336
(which probably does not apply anymore) may give some insight into how
this is supposed to work.
There's plenty of room for bikeshedding here (does the command-line
flag opt-in or opt-out? should we use function names instead of some
"keys"? can we allow overriding every allocation site without the need
for alloc_pages_uninit()?), but if the overall scheme is viable I can
probably proceed with an RFC.

> >
> > This bypasses a security feature so you're going to have to do a little
> > better than "I want it".
> Thanks for pointing this out. What I want to do is to give the user a
> way to exempt some types of pages from being zeroed, which could help
> on performance issues.  Could we have the most safety concern admin
> use INIT_ON_FREE while the less concerned use INIT_ON_ALLOC &
> __GFP_SKIP_ZERO as a light version method?

init_on_free has a more significant performance impact, and might be
more problematic for production use (even more opt-outs would've been
needed).

As a side note, I don't think we should repurpose the same
__GFP_SKIP_ZERO flag used by KASAN, because the semantics of the flags
may not match 1:1.
Kees Cook Sept. 1, 2023, 6:32 p.m. UTC | #4
On Fri, Sep 01, 2023 at 02:55:17PM +0200, Alexander Potapenko wrote:
> A draft implementation at
> https://github.com/ramosian-glider/linux/commit/00791be14eb1113eae615c74b652f94b5cc3c336
> (which probably does not apply anymore) may give some insight into how
> this is supposed to work.
> There's plenty of room for bikeshedding here (does the command-line
> flag opt-in or opt-out? should we use function names instead of some
> "keys"? can we allow overriding every allocation site without the need
> for alloc_pages_uninit()?), but if the overall scheme is viable I can
> probably proceed with an RFC.

This is my preferred direction to go with this idea (though I agree some
internals could be partially whitelisted: the "dup" functions need to
wipe the trailing rounded-up bucket size bytes still).
Michal Hocko Sept. 4, 2023, 7:31 a.m. UTC | #5
On Fri 01-09-23 18:29:07, Zhaoyang Huang wrote:
> loop alex
> 
> On Thu, Aug 31, 2023 at 8:16 PM Matthew Wilcox <willy@infradead.org> wrote:
> >
> > On Thu, Aug 31, 2023 at 06:52:52PM +0800, zhaoyang.huang wrote:
> > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > >
> > > There is no explicit gfp flags to let the allocation skip zero
> > > operation when CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y. I would like to make
> > > __GFP_SKIP_ZERO be visible even if kasan is not configured.
> >
> > This bypasses a security feature so you're going to have to do a little
> > better than "I want it".
> Thanks for pointing this out. What I want to do is to give the user a
> way to exempt some types of pages from being zeroed, which could help
> on performance issues.

Could you be more explicit about those users? Your patch doesn't add
any.

> Could we have the most safety concern admin
> use INIT_ON_FREE while the less concerned use INIT_ON_ALLOC &
> __GFP_SKIP_ZERO as a light version method?

Are you suggesting the __GFP_SKIP_ZERO would be ignored in any setups
except for init_on_alloc?
Michal Hocko Sept. 4, 2023, 7:54 a.m. UTC | #6
On Fri 01-09-23 14:55:17, Alexander Potapenko wrote:
> On Fri, Sep 1, 2023 at 12:29 PM Zhaoyang Huang <huangzhaoyang@gmail.com> wrote:
> >
> > loop alex
> 
> (adding more people who took part in the previous discussions)
> 
> >
> > On Thu, Aug 31, 2023 at 8:16 PM Matthew Wilcox <willy@infradead.org> wrote:
> > >
> > > On Thu, Aug 31, 2023 at 06:52:52PM +0800, zhaoyang.huang wrote:
> > > > From: Zhaoyang Huang <zhaoyang.huang@unisoc.com>
> > > >
> > > > There is no explicit gfp flags to let the allocation skip zero
> > > > operation when CONFIG_INIT_ON_ALLOC_DEFAULT_ON=y. I would like to make
> > > > __GFP_SKIP_ZERO be visible even if kasan is not configured.
> 
> Hi all,
> 
> This is a recurring question, as people keep encountering performance
> problems on systems with init_on_alloc=1
> (https://bugs.launchpad.net/ubuntu/+source/linux/+bug/1862822 being
> one of the examples).
> 
> Brad Spengler has also pointed out
> (https://twitter.com/spendergrsec/status/1296461651659694082) that
> there are cases where there is no security vs. performance tradeoff
> (e.g. kmemdup() and kstrdup()).
> 
> An opt-out flag was included in the initial init_on_alloc series, but
> back then Michal Hocko has noted that it might easily get out of
> control: https://patchwork.kernel.org/project/linux-hardening/patch/20190418154208.131118-2-glider@google.com/#22600229.

I still maintain my opinion. I really do not like the idea of mixing
concepts of init_on_alloc (which is pretty much security oriented) and
an opt out flag which bypasses it. Sooner or later this will become an
unreviewable mess so the value of init_on_alloc will become very
dubious.

> Now that init_on_alloc is actually being used by people which may have
> different preferences wrt. security and performance (in the cases
> where this tradeoff exists), we must be very careful with the opt-out
> GFP flag. Not initializing a particular allocation site in the
> upstream kernel will affect every downstream user, and some may
> consider this a security regression.

Fully agreed!

> Another problematic case is an OS vendor mandating init_on_alloc
> everywhere, but a third party driver vendor doing kmalloc(...,
> __GFP_SKIP_ZERO) for their allocations.

Exactly. This allows to sniff into driver unrelated proper and allow a
whole class of isssues.

> So I think a working opt-out scheme for the heap initialization should
> be two-step:
> 1. The code owner may decide that a particular allocation site needs
> an opt-out, and make the upstream code change;
> 2. The OS vendor has the ability to override that decision for the
> kernel they ship without the need to patch the source.

Practically speaking this would require a new mode
init_on_alloc_but_potentially_unsafe

Another option would be to provide a simple page allocator wrapper that
would allow to recycle pages for a particular user or providing a slab
cache that would achieve the same thing. This would be still a bit
quetiongable because the user could be seeing stale data but less of a
problem than crossing propers and potentially security domains.
Linus Torvalds Sept. 4, 2023, 5:34 p.m. UTC | #7
On Mon, 4 Sept 2023 at 00:55, Michal Hocko <mhocko@suse.com> wrote:
>
>       Sooner or later this will become an
> unreviewable mess so the value of init_on_alloc will become very
> dubious.

The value of init_on_alloc is *already* very dubious.

Exactly because people will turn it off, because it hurts performance
so much - and in pointless ways.

You do realize that distributions - well, at least Fedora - simply
don't turn INIT_ON_ALLOC_DEFAULT_ON on at all?

So the current state of init_on_alloc is that nobody sane uses it. You
have to think you're special to enable it, because it is *so* bad.

Security people need to realize that the primary point of computing is
NEVER EVER security. Security is entirely pointless without a usable
system.

Unless security people realize that they are always secondary, they
aren't security people, they are just random wankers.

And people who state this truism had better not get shamed for
standing up to stupidity.

             Linus
Eric Biggers Sept. 4, 2023, 6:22 p.m. UTC | #8
On Mon, Sep 04, 2023 at 10:34:25AM -0700, Linus Torvalds wrote:
> On Mon, 4 Sept 2023 at 00:55, Michal Hocko <mhocko@suse.com> wrote:
> >
> >       Sooner or later this will become an
> > unreviewable mess so the value of init_on_alloc will become very
> > dubious.
> 
> The value of init_on_alloc is *already* very dubious.
> 
> Exactly because people will turn it off, because it hurts performance
> so much - and in pointless ways.
> 
> You do realize that distributions - well, at least Fedora - simply
> don't turn INIT_ON_ALLOC_DEFAULT_ON on at all?
> 
> So the current state of init_on_alloc is that nobody sane uses it. You
> have to think you're special to enable it, because it is *so* bad.
> 
> Security people need to realize that the primary point of computing is
> NEVER EVER security. Security is entirely pointless without a usable
> system.
> 
> Unless security people realize that they are always secondary, they
> aren't security people, they are just random wankers.
> 
> And people who state this truism had better not get shamed for
> standing up to stupidity.
> 

Android and Ubuntu both set CONFIG_INIT_ON_ALLOC_DEFAULT_ON.  I think this makes
it clear that the init-on-alloc feature is useful for a substantial amount of
users even in its current form.

I would caution against checking the kernel config for the distro you happen to
be using and extrapolating that to all Linux systems.

Regardless, I'm in favor of a per allocation opt-out flag like GFP_SKIP_ZERO.
There are clear cases where it makes sense, for example some places in the VFS
where the performance impact is large and the code has been carefully reviewed.

I don't really like the idea
(https://lore.kernel.org/lkml/CAG_fn=UQEuvJ9WXou_sW3moHcVQZJ9NvJ5McNcsYE8xw_WEYGw@mail.gmail.com/)
of making the system administrator have to opt out allocation sites individually
via a kernel command-line parameter.  Yes, it makes the opt out less likely to
be abused as two parties (developer and system administrator) have to consent to
each individual opt out.  So it theory it sounds good.  But I feel that doesn't
outweigh the fact that it would be complicated and hard to use.  How about just
having two options: one to always honor GFP_SKIP_ZERO in the code and one to
always ignore it.

- Eric
Alistair Popple Sept. 5, 2023, 2:25 a.m. UTC | #9
Eric Biggers <ebiggers@kernel.org> writes:

> On Mon, Sep 04, 2023 at 10:34:25AM -0700, Linus Torvalds wrote:
>> On Mon, 4 Sept 2023 at 00:55, Michal Hocko <mhocko@suse.com> wrote:
>> >
>> >       Sooner or later this will become an
>> > unreviewable mess so the value of init_on_alloc will become very
>> > dubious.
>> 
>> The value of init_on_alloc is *already* very dubious.
>> 
>> Exactly because people will turn it off, because it hurts performance
>> so much - and in pointless ways.
>> 
>> You do realize that distributions - well, at least Fedora - simply
>> don't turn INIT_ON_ALLOC_DEFAULT_ON on at all?
>> 
>> So the current state of init_on_alloc is that nobody sane uses it. You
>> have to think you're special to enable it, because it is *so* bad.
>> 
>> Security people need to realize that the primary point of computing is
>> NEVER EVER security. Security is entirely pointless without a usable
>> system.
>> 
>> Unless security people realize that they are always secondary, they
>> aren't security people, they are just random wankers.
>> 
>> And people who state this truism had better not get shamed for
>> standing up to stupidity.
>> 
>
> Android and Ubuntu both set CONFIG_INIT_ON_ALLOC_DEFAULT_ON.  I think this makes
> it clear that the init-on-alloc feature is useful for a substantial amount of
> users even in its current form.
>
> I would caution against checking the kernel config for the distro you happen to
> be using and extrapolating that to all Linux systems.
>
> Regardless, I'm in favor of a per allocation opt-out flag like GFP_SKIP_ZERO.
> There are clear cases where it makes sense, for example some places in the VFS
> where the performance impact is large and the code has been carefully reviewed.

The performance impact for some drivers is also large. This came up
recently where we've seen some large (2-3x) slowdowns with our GPU
driver on some systems with Ubuntu due to having init_on_alloc
enabled. This is exacerbated by the fact drivers can't rely on having
this option set and would generally rather zero memory themselves with
faster DMA engines or by overwitting it directly with valid data. So the
end result is memory gets zeroed/initialised twice and we have to
recommend our users turn this off anyway.

So a per-allocation flag would be nice. Of course I'm not claiming
driver code has been as carefully reviewed as VFS code so maybe there is
need for an override, but the performance impact is large. If it would
be helpful/motivating I can get actual performance numbers that show the
impact to post here.

> I don't really like the idea
> (https://lore.kernel.org/lkml/CAG_fn=UQEuvJ9WXou_sW3moHcVQZJ9NvJ5McNcsYE8xw_WEYGw@mail.gmail.com/)
> of making the system administrator have to opt out allocation sites individually
> via a kernel command-line parameter.  Yes, it makes the opt out less likely to
> be abused as two parties (developer and system administrator) have to consent to
> each individual opt out.  So it theory it sounds good.  But I feel that doesn't
> outweigh the fact that it would be complicated and hard to use.  How about just
> having two options: one to always honor GFP_SKIP_ZERO in the code and one to
> always ignore it.
>
> - Eric
Alexander Potapenko Sept. 6, 2023, 2:17 p.m. UTC | #10
On Mon, Sep 4, 2023 at 8:22 PM Eric Biggers <ebiggers@kernel.org> wrote:
>
> On Mon, Sep 04, 2023 at 10:34:25AM -0700, Linus Torvalds wrote:
> > On Mon, 4 Sept 2023 at 00:55, Michal Hocko <mhocko@suse.com> wrote:
> > >
> > >       Sooner or later this will become an
> > > unreviewable mess so the value of init_on_alloc will become very
> > > dubious.
> >
> > The value of init_on_alloc is *already* very dubious.
> >
> > Exactly because people will turn it off, because it hurts performance
> > so much - and in pointless ways.
> >
> > You do realize that distributions - well, at least Fedora - simply
> > don't turn INIT_ON_ALLOC_DEFAULT_ON on at all?
> >
> > So the current state of init_on_alloc is that nobody sane uses it. You
> > have to think you're special to enable it, because it is *so* bad.
> >
> > Security people need to realize that the primary point of computing is
> > NEVER EVER security. Security is entirely pointless without a usable
> > system.
> >
> > Unless security people realize that they are always secondary, they
> > aren't security people, they are just random wankers.
> >
> > And people who state this truism had better not get shamed for
> > standing up to stupidity.
> >
>
> Android and Ubuntu both set CONFIG_INIT_ON_ALLOC_DEFAULT_ON.  I think this makes
> it clear that the init-on-alloc feature is useful for a substantial amount of
> users even in its current form.
>
> I would caution against checking the kernel config for the distro you happen to
> be using and extrapolating that to all Linux systems.
>
> Regardless, I'm in favor of a per allocation opt-out flag like GFP_SKIP_ZERO.
> There are clear cases where it makes sense, for example some places in the VFS
> where the performance impact is large and the code has been carefully reviewed.

What are our options to prevent this flag from spreading uncontrollably?
Would it make sense to still provide a separate API for such
allocations, so that the flag doesn't get added into some module-level
`gfp` variable?

>
> I don't really like the idea
> (https://lore.kernel.org/lkml/CAG_fn=UQEuvJ9WXou_sW3moHcVQZJ9NvJ5McNcsYE8xw_WEYGw@mail.gmail.com/)
> of making the system administrator have to opt out allocation sites individually
> via a kernel command-line parameter.  Yes, it makes the opt out less likely to
> be abused as two parties (developer and system administrator) have to consent to
> each individual opt out.  So it theory it sounds good.  But I feel that doesn't
> outweigh the fact that it would be complicated and hard to use.  How about just
> having two options: one to always honor GFP_SKIP_ZERO in the code and one to
> always ignore it.

I am afraid we still need some level of granularity here.
E.g. we definitely want to skip initialization for kstrdup(),
kmemdup() and friends, some would say even on the systems running with
init_on_alloc=1.
For e.g. 3rd party driver allocations we also need an opt-out flag,
but the need for a kill switch to disable that flag is higher. On the
other hand, that kill switch does not have to disable the carefully
reviewed opt-outs in the upstream code.

Assuming that OS vendors usually control their kernel source, we can
probably distinguish between opting out the allocations done within
statically linked code and the modules, introducing the following
levels of initialization that could be controlled by init_on_alloc:
 - init_on_alloc=1 - initialize all allocations, like it's done currently
 - init_on_alloc=except_static_optouts - initialize all allocations,
allow GFP_SKIP_ZERO in the statically linked code
 - init_on_alloc=except_optouts - initialize all allocations, allow
GFP_SKIP_ZERO in the kernel and the modules
 - init_on_alloc=0 - do not initialize allocations by default

In this scheme, the system administrator may choose to either be
paranoid, or choose to trust just their OS vendor, or trust the driver
vendors as well.
In any case, it will be possible to dynamically pull the plug on the opt-outs.

>
> - Eric




--
Alexander Potapenko
Software Engineer

Google Germany GmbH
Erika-Mann-Straße, 33
80636 München

Geschäftsführer: Paul Manicle, Liana Sebastian
Registergericht und -nummer: Hamburg, HRB 86891
Sitz der Gesellschaft: Hamburg
diff mbox series

Patch

diff --git a/include/linux/gfp_types.h b/include/linux/gfp_types.h
old mode 100644
new mode 100755
index d88c46ca82e1..4e9d50bba269
--- a/include/linux/gfp_types.h
+++ b/include/linux/gfp_types.h
@@ -46,12 +46,11 @@  typedef unsigned int __bitwise gfp_t;
 #define ___GFP_THISNODE		0x200000u
 #define ___GFP_ACCOUNT		0x400000u
 #define ___GFP_ZEROTAGS		0x800000u
+#define ___GFP_SKIP_ZERO	0x1000000u
 #ifdef CONFIG_KASAN_HW_TAGS
-#define ___GFP_SKIP_ZERO		0x1000000u
 #define ___GFP_SKIP_KASAN_UNPOISON	0x2000000u
 #define ___GFP_SKIP_KASAN_POISON	0x4000000u
 #else
-#define ___GFP_SKIP_ZERO		0
 #define ___GFP_SKIP_KASAN_UNPOISON	0
 #define ___GFP_SKIP_KASAN_POISON	0
 #endif