diff mbox

mm: kvmalloc does not fallback to vmalloc for incompatible gfp flags

Message ID 20180601115329.27807-1-mhocko@kernel.org (mailing list archive)
State New, archived
Headers show

Commit Message

Michal Hocko June 1, 2018, 11:53 a.m. UTC
From: Michal Hocko <mhocko@suse.com>

kvmalloc warned about incompatible gfp_mask to catch abusers (mostly
GFP_NOFS) with an intention that this will motivate authors of the code
to fix those. Linus argues that this just motivates people to do even
more hacks like
	if (gfp == GFP_KERNEL)
		kvmalloc
	else
		kmalloc

I haven't seen this happening much (Linus pointed to bucket_lock special
cases an atomic allocation but my git foo hasn't found much more) but
it is true that we can grow those in future. Therefore Linus suggested
to simply not fallback to vmalloc for incompatible gfp flags and rather
stick with the kmalloc path.

Requested-by: Linus Torvalds <torvalds@linux-foundation.org>
Signed-off-by: Michal Hocko <mhocko@suse.com>
---

Hi Andrew,
for more context. Linus has pointed out [1] that our (well mine)
insisting on GFP_KERNEL compatible gfp flags for kvmalloc* can actually
lead to a worse code because people will work around the restriction.
So this patch allows kvmalloc to be more permissive and silently skip
vmalloc path for incompatible gfp flags. This will not help my original
plan to enforce people to think about GFP_NOFS usage more deeply but
I can live with that obviously...

alloc_bucket_spinlocks is the only place I could find which special
cases kvmalloc based on the gfp mask.

[1] http://lkml.kernel.org/r/CA+55aFxvNCEBQsxfr=yL3jgxiC8M8wY2MHwVBH+T8qSWyP-WPg@mail.gmail.com

 lib/bucket_locks.c | 5 +----
 mm/util.c          | 6 ++++--
 2 files changed, 5 insertions(+), 6 deletions(-)

Comments

Linus Torvalds June 2, 2018, 4:43 p.m. UTC | #1
On Fri, Jun 1, 2018 at 4:53 AM Michal Hocko <mhocko@kernel.org> wrote:
>
> for more context. Linus has pointed out [1] that our (well mine)
> insisting on GFP_KERNEL compatible gfp flags for kvmalloc* can actually
> lead to a worse code because people will work around the restriction.
> So this patch allows kvmalloc to be more permissive and silently skip
> vmalloc path for incompatible gfp flags.

Ack.

> This will not help my original
> plan to enforce people to think about GFP_NOFS usage more deeply but
> I can live with that obviously...

Is it NOFS in particular you care about? The only reason for that
should be the whole "don't recurse", and I think the naming is
historical and slightly odd.

It was historically just about allocations that were in the writeout
path for a block layer or filesystem - and the name made sense in that
context. These days, I think it's just shorthand for "you can do
simple direct reclaim from the mm itself, but you can't  block or call
anything else".

So I think the name and the semantics are a bit unclear, but it's
obviously still useful.

It's entirely possible that direct reclaim should never do any of the
more complicated callback cases anyway, but we'd still need the whole
"don't wait for the complex case" logic to avoid deadlocks.

           Linus
Michal Hocko June 4, 2018, 6:37 a.m. UTC | #2
On Sat 02-06-18 09:43:56, Linus Torvalds wrote:
> On Fri, Jun 1, 2018 at 4:53 AM Michal Hocko <mhocko@kernel.org> wrote:
> >
> > for more context. Linus has pointed out [1] that our (well mine)
> > insisting on GFP_KERNEL compatible gfp flags for kvmalloc* can actually
> > lead to a worse code because people will work around the restriction.
> > So this patch allows kvmalloc to be more permissive and silently skip
> > vmalloc path for incompatible gfp flags.
> 
> Ack.
> 
> > This will not help my original
> > plan to enforce people to think about GFP_NOFS usage more deeply but
> > I can live with that obviously...
> 
> Is it NOFS in particular you care about?

Yes, mostly.

> The only reason for that
> should be the whole "don't recurse", and I think the naming is
> historical and slightly odd.
> 
> It was historically just about allocations that were in the writeout
> path for a block layer or filesystem - and the name made sense in that
> context. These days, I think it's just shorthand for "you can do
> simple direct reclaim from the mm itself, but you can't  block or call
> anything else".

It is still mostly used by fs code these days. There are few exceptions
though. David Chinner mentioned some DRM code which does use NOFS to
prevent recursing into their slab shrinkers.

> So I think the name and the semantics are a bit unclear, but it's
> obviously still useful.

agreed
 
> It's entirely possible that direct reclaim should never do any of the
> more complicated callback cases anyway, but we'd still need the whole
> "don't wait for the complex case" logic to avoid deadlocks.

This is problematic because we can sit on a huge amount of reclaimable
memory and the direct reclaim is the only context to trigger the oom
killer so we have to either find some other way to do the same or invoke
even the complex reclaimers. My long term plan was to convert direct NOFS
users to the scope API (see memalloc_no{fs,io}_{save,restore}) which
would mark "reclaim recursion critical sections" and all allocations
within that scope would not trigger shrinkers that could deadlock. The
current API is quite coarse but there are plans to make it more fine
grained.

Anyway, this is not directly related to this patch. Current kvmalloc
users seem to be GFP_KERNEL compliant. Let's hope it stays that way.
diff mbox

Patch

diff --git a/lib/bucket_locks.c b/lib/bucket_locks.c
index 266a97c5708b..ade3ce6c4af6 100644
--- a/lib/bucket_locks.c
+++ b/lib/bucket_locks.c
@@ -30,10 +30,7 @@  int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask,
 	}
 
 	if (sizeof(spinlock_t) != 0) {
-		if (gfpflags_allow_blocking(gfp))
-			tlocks = kvmalloc(size * sizeof(spinlock_t), gfp);
-		else
-			tlocks = kmalloc_array(size, sizeof(spinlock_t), gfp);
+		tlocks = kvmalloc_array(size, sizeof(spinlock_t), gfp);
 		if (!tlocks)
 			return -ENOMEM;
 		for (i = 0; i < size; i++)
diff --git a/mm/util.c b/mm/util.c
index 45fc3169e7b0..c6586c146995 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -391,7 +391,8 @@  EXPORT_SYMBOL(vm_mmap);
  * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is
  * preferable to the vmalloc fallback, due to visible performance drawbacks.
  *
- * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people.
+ * Please note that any use of gfp flags outside of GFP_KERNEL is careful to not
+ * fall back to vmalloc.
  */
 void *kvmalloc_node(size_t size, gfp_t flags, int node)
 {
@@ -402,7 +403,8 @@  void *kvmalloc_node(size_t size, gfp_t flags, int node)
 	 * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables)
 	 * so the given set of flags has to be compatible.
 	 */
-	WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL);
+	if ((flags & GFP_KERNEL) != GFP_KERNEL)
+		return kmalloc_node(size, flags, node);
 
 	/*
 	 * We want to attempt a large physically contiguous block first because