Message ID | 20180601115329.27807-1-mhocko@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Fri, Jun 1, 2018 at 4:53 AM Michal Hocko <mhocko@kernel.org> wrote: > > for more context. Linus has pointed out [1] that our (well mine) > insisting on GFP_KERNEL compatible gfp flags for kvmalloc* can actually > lead to a worse code because people will work around the restriction. > So this patch allows kvmalloc to be more permissive and silently skip > vmalloc path for incompatible gfp flags. Ack. > This will not help my original > plan to enforce people to think about GFP_NOFS usage more deeply but > I can live with that obviously... Is it NOFS in particular you care about? The only reason for that should be the whole "don't recurse", and I think the naming is historical and slightly odd. It was historically just about allocations that were in the writeout path for a block layer or filesystem - and the name made sense in that context. These days, I think it's just shorthand for "you can do simple direct reclaim from the mm itself, but you can't block or call anything else". So I think the name and the semantics are a bit unclear, but it's obviously still useful. It's entirely possible that direct reclaim should never do any of the more complicated callback cases anyway, but we'd still need the whole "don't wait for the complex case" logic to avoid deadlocks. Linus
On Sat 02-06-18 09:43:56, Linus Torvalds wrote: > On Fri, Jun 1, 2018 at 4:53 AM Michal Hocko <mhocko@kernel.org> wrote: > > > > for more context. Linus has pointed out [1] that our (well mine) > > insisting on GFP_KERNEL compatible gfp flags for kvmalloc* can actually > > lead to a worse code because people will work around the restriction. > > So this patch allows kvmalloc to be more permissive and silently skip > > vmalloc path for incompatible gfp flags. > > Ack. > > > This will not help my original > > plan to enforce people to think about GFP_NOFS usage more deeply but > > I can live with that obviously... > > Is it NOFS in particular you care about? Yes, mostly. > The only reason for that > should be the whole "don't recurse", and I think the naming is > historical and slightly odd. > > It was historically just about allocations that were in the writeout > path for a block layer or filesystem - and the name made sense in that > context. These days, I think it's just shorthand for "you can do > simple direct reclaim from the mm itself, but you can't block or call > anything else". It is still mostly used by fs code these days. There are few exceptions though. David Chinner mentioned some DRM code which does use NOFS to prevent recursing into their slab shrinkers. > So I think the name and the semantics are a bit unclear, but it's > obviously still useful. agreed > It's entirely possible that direct reclaim should never do any of the > more complicated callback cases anyway, but we'd still need the whole > "don't wait for the complex case" logic to avoid deadlocks. This is problematic because we can sit on a huge amount of reclaimable memory and the direct reclaim is the only context to trigger the oom killer so we have to either find some other way to do the same or invoke even the complex reclaimers. My long term plan was to convert direct NOFS users to the scope API (see memalloc_no{fs,io}_{save,restore}) which would mark "reclaim recursion critical sections" and all allocations within that scope would not trigger shrinkers that could deadlock. The current API is quite coarse but there are plans to make it more fine grained. Anyway, this is not directly related to this patch. Current kvmalloc users seem to be GFP_KERNEL compliant. Let's hope it stays that way.
diff --git a/lib/bucket_locks.c b/lib/bucket_locks.c index 266a97c5708b..ade3ce6c4af6 100644 --- a/lib/bucket_locks.c +++ b/lib/bucket_locks.c @@ -30,10 +30,7 @@ int alloc_bucket_spinlocks(spinlock_t **locks, unsigned int *locks_mask, } if (sizeof(spinlock_t) != 0) { - if (gfpflags_allow_blocking(gfp)) - tlocks = kvmalloc(size * sizeof(spinlock_t), gfp); - else - tlocks = kmalloc_array(size, sizeof(spinlock_t), gfp); + tlocks = kvmalloc_array(size, sizeof(spinlock_t), gfp); if (!tlocks) return -ENOMEM; for (i = 0; i < size; i++) diff --git a/mm/util.c b/mm/util.c index 45fc3169e7b0..c6586c146995 100644 --- a/mm/util.c +++ b/mm/util.c @@ -391,7 +391,8 @@ EXPORT_SYMBOL(vm_mmap); * __GFP_RETRY_MAYFAIL is supported, and it should be used only if kmalloc is * preferable to the vmalloc fallback, due to visible performance drawbacks. * - * Any use of gfp flags outside of GFP_KERNEL should be consulted with mm people. + * Please note that any use of gfp flags outside of GFP_KERNEL is careful to not + * fall back to vmalloc. */ void *kvmalloc_node(size_t size, gfp_t flags, int node) { @@ -402,7 +403,8 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node) * vmalloc uses GFP_KERNEL for some internal allocations (e.g page tables) * so the given set of flags has to be compatible. */ - WARN_ON_ONCE((flags & GFP_KERNEL) != GFP_KERNEL); + if ((flags & GFP_KERNEL) != GFP_KERNEL) + return kmalloc_node(size, flags, node); /* * We want to attempt a large physically contiguous block first because