mm: Make kvmalloc refuse to allocate more than 2GB

Message ID: 20210721184131.2264356-1-willy@infradead.org
State: New

Commit Message

Matthew Wilcox (Oracle) July 21, 2021, 6:41 p.m. UTC
It's generally dangerous to allocate such large quantities of memory
within the kernel owing to our propensity to use 'int' to represent
a length.  If somebody really needs it, we can add a kvmalloc_large()
later, but let's default to "You can't allocate that much memory".

Signed-off-by: Matthew Wilcox (Oracle) <willy@infradead.org>
---
 fs/seq_file.c                                     | 3 ---
 mm/util.c                                         | 7 +++++++
 2 files changed, 7 insertions(+), 3 deletions(-)

Comments

Linus Torvalds July 21, 2021, 8:46 p.m. UTC | #1
On Wed, Jul 21, 2021 at 11:42 AM Matthew Wilcox (Oracle)
<willy@infradead.org> wrote:
>
> It's generally dangerous to allocate such large quantities of memory
> within the kernel owing to our propensity to use 'int' to represent
> a length.  If somebody really needs it, we can add a kvmalloc_large()
> later, but let's default to "You can't allocate that much memory".

I really think that without the WARN_ON_ONCE(), this is just moving
that failure point from a known good place ("we know this must not
succeed") to a possibly bad place ("this might cause silent and
hard-to-understand failures elsewhere").

IOW, in seq_buf_alloc() there's no need to warn. It's clear that a
bigger allocation can never be valid.

But in kvmalloc(), it needs to warn, because if it ever triggers we
need to check what triggered it.

So this is not just moving code from one place to another equivalent one.

                 Linus

Matthew Wilcox (Oracle) July 22, 2021, 12:14 a.m. UTC | #2
On Wed, Jul 21, 2021 at 01:46:09PM -0700, Linus Torvalds wrote:
> On Wed, Jul 21, 2021 at 11:42 AM Matthew Wilcox (Oracle)
> <willy@infradead.org> wrote:
> >
> > It's generally dangerous to allocate such large quantities of memory
> > within the kernel owing to our propensity to use 'int' to represent
> > a length.  If somebody really needs it, we can add a kvmalloc_large()
> > later, but let's default to "You can't allocate that much memory".
> 
> I really think that without the WARN_ON_ONCE(), this is just moving
> that failure point from a known good place ("we know this must not
> succeed") to a possibly bad place ("this might cause silent and
> hard-to-understand failures elsewhere").

To a certain extent, yes.  On the other hand, if you don't have any
error handling on your kvmalloc of 2GB, Qualys seems to have a reliable
way to run you out of vmalloc space, and that's going to get exercised.

My initial thought was to leverage the existing __GFP_NOWARN code:

        if (size > PAGE_SIZE) {
-               kmalloc_flags |= __GFP_NOWARN;
+               if (size <= INT_MAX)
+                       kmalloc_flags |= __GFP_NOWARN;

because that dumps some interesting information (ratelimited), which
might help the sysadmin realise they're under attack.  A WARN_ON_ONCE
is one-and-done, so an attacker can hide their tracks.  Unfortunately,
we actually bail out before getting there:

        if (unlikely(order >= MAX_ORDER)) {
                WARN_ON_ONCE(!(gfp & __GFP_NOWARN));
                return NULL;
        }

... maybe that should call warn_alloc() too.

So I'm now thinking (relative to the earlier patch):

-       if (size > INT_MAX)
+       if (size > INT_MAX) {
+               warn_alloc(flags, NULL, "oversized allocation:%zu", size);
                return NULL;
+       }
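
The difference between a one-and-done warning and a ratelimited one can be sketched in plain userspace C. This is only an illustration of the argument above, not the kernel's WARN_ON_ONCE() or warn_alloc() implementation; the names once_gate, rate_gate and their fields are hypothetical, and "ticks" stands in for whatever clock a real limiter would use.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical one-shot gate: lets the first report through, then stays silent. */
struct once_gate {
	bool fired;
};

bool once_gate_allow(struct once_gate *g)
{
	assert(g != NULL);
	if (g->fired)
		return false;
	g->fired = true;
	return true;
}

/* Hypothetical rate limiter: at most `burst` reports per `interval` ticks. */
struct rate_gate {
	long interval;      /* window length in ticks */
	int burst;          /* reports allowed per window */
	long window_start;  /* tick at which the current window began */
	int reported;       /* reports already emitted in this window */
};

bool rate_gate_allow(struct rate_gate *g, long now)
{
	assert(g != NULL);
	/* Start a fresh window once the previous one has expired. */
	if (now - g->window_start >= g->interval) {
		g->window_start = now;
		g->reported = 0;
	}
	if (g->reported >= g->burst)
		return false;
	g->reported++;
	return true;
}
```

With the one-shot gate, a second oversized request is never reported, however much later it arrives. With the rate limiter, a request made after the window expires is reported again, so repeated attempts keep leaving evidence in the log.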

Theodore Ts'o July 22, 2021, 2:43 p.m. UTC | #3
On Wed, Jul 21, 2021 at 07:41:31PM +0100, Matthew Wilcox (Oracle) wrote:
> It's generally dangerous to allocate such large quantities of memory
> within the kernel owing to our propensity to use 'int' to represent
> a length.  If somebody really needs it, we can add a kvmalloc_large()
> later, but let's default to "You can't allocate that much memory".

If we really need it, maybe we can add a GFP_LARGE_ALLOC to allow
allocations larger than 2GB later on?  I can't quite see why that
would ever be needed, but that's probably a failure of my imagination.  :-)

                 - Ted

Michal Hocko July 27, 2021, 7:38 a.m. UTC | #4
On Wed 21-07-21 19:41:31, Matthew Wilcox wrote:
> It's generally dangerous to allocate such large quantities of memory
> within the kernel owing to our propensity to use 'int' to represent
> a length.  If somebody really needs it, we can add a kvmalloc_large()
> later, but let's default to "You can't allocate that much memory".

I do agree that limiting kvmalloc allocation size is a reasonable thing
to do but I do not really see why we should remove the check from
seq_buf_alloc. Implicitly relying on kvmalloc to work around a bug that
was in seq_buf code seems like a step backwards to me.

Patch

diff --git a/fs/seq_file.c b/fs/seq_file.c
index 4a2cda04d3e2..b117b212ef28 100644
--- a/fs/seq_file.c
+++ b/fs/seq_file.c
@@ -32,9 +32,6 @@  static void seq_set_overflow(struct seq_file *m)
 
 static void *seq_buf_alloc(unsigned long size)
 {
-	if (unlikely(size > MAX_RW_COUNT))
-		return NULL;
-
 	return kvmalloc(size, GFP_KERNEL_ACCOUNT);
 }
 
diff --git a/mm/util.c b/mm/util.c
index 9043d03750a7..8ff2a8924d5f 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -593,6 +593,13 @@  void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	if (ret || size <= PAGE_SIZE)
 		return ret;
 
+	/*
+	 * Succeeding for sizes above 2GiB can lead to truncation if
+	 * someone casts the size to an int.
+	 */
+	if (size > INT_MAX)
+		return NULL;
+
 	return __vmalloc_node(size, 1, flags, node,
 			__builtin_return_address(0));
 }
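
For readers who want to see the hazard the new comment describes, here is a minimal userspace sketch, not kernel code. It assumes a plain malloc() as the underlying allocator; size_to_int() and bounded_alloc() are hypothetical names invented for this illustration. The first helper shows what a caller storing a length in an int would have to check; the second mirrors the idea of the patch by refusing the request up front, so no caller can end up with a buffer whose size does not fit in an int.

```c
#include <assert.h>
#include <limits.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdlib.h>

/*
 * Hypothetical helper: store size in an int only when it fits.
 * Returns false instead of silently truncating.
 */
bool size_to_int(size_t size, int *out)
{
	assert(out != NULL);
	if (size > (size_t)INT_MAX)
		return false;
	*out = (int)size;
	return true;
}

/*
 * Hypothetical userspace stand-in for the new check: refuse any
 * request above INT_MAX before reaching the underlying allocator.
 */
void *bounded_alloc(size_t size)
{
	if (size > (size_t)INT_MAX)
		return NULL;
	return malloc(size);
}
```

A caller that receives a non-NULL pointer from bounded_alloc() knows the size it asked for is representable as an int, which is the property the patch tries to guarantee for kvmalloc() users.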