mm: check for sleepable context in kvfree

Message ID 20190723131212.445-1-jlayton@kernel.org (mailing list archive)
State New, archived

Commit Message

Jeff Layton July 23, 2019, 1:12 p.m. UTC
A lot of callers of kvfree only go down the vfree path under very rare
circumstances, and so may never end up hitting the might_sleep_if in it.
Ensure that when kvfree is called, that it is operating in a context
where it is allowed to sleep.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 mm/util.c | 2 ++
 1 file changed, 2 insertions(+)

Comments

Jeff Layton July 23, 2019, 5:52 p.m. UTC | #1
On Tue, 2019-07-23 at 09:12 -0400, Jeff Layton wrote:
> A lot of callers of kvfree only go down the vfree path under very rare
> circumstances, and so may never end up hitting the might_sleep_if in it.
> Ensure that when kvfree is called, that it is operating in a context
> where it is allowed to sleep.
> 
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Luis Henriques <lhenriques@suse.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  mm/util.c | 2 ++
>  1 file changed, 2 insertions(+)
> 

FWIW, I started looking at this after Luis sent me some ceph patches
that fixed a few of these problems. I have not done extensive testing
with this patch, so maybe consider this an RFC for now.

HCH points out that xfs uses kvfree as a generic "free this no matter
what it is" sort of wrapper and expects the callers to work out whether
they might be freeing a vmalloc'ed address. If that sort of usage turns
out to be prevalent, then we may need another approach to clean this up.

> diff --git a/mm/util.c b/mm/util.c
> index e6351a80f248..81ec2a003c86 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -482,6 +482,8 @@ EXPORT_SYMBOL(kvmalloc_node);
>   */
>  void kvfree(const void *addr)
>  {
> +	might_sleep_if(!in_interrupt());
> +
>  	if (is_vmalloc_addr(addr))
>  		vfree(addr);
>  	else

Matthew Wilcox (Oracle) July 23, 2019, 5:55 p.m. UTC | #2
On Tue, Jul 23, 2019 at 01:52:36PM -0400, Jeff Layton wrote:
> On Tue, 2019-07-23 at 09:12 -0400, Jeff Layton wrote:
> > A lot of callers of kvfree only go down the vfree path under very rare
> > circumstances, and so may never end up hitting the might_sleep_if in it.
> > Ensure that when kvfree is called, that it is operating in a context
> > where it is allowed to sleep.
> > 
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Luis Henriques <lhenriques@suse.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  mm/util.c | 2 ++
> >  1 file changed, 2 insertions(+)
> > 
> 
> FWIW, I started looking at this after Luis sent me some ceph patches
> that fixed a few of these problems. I have not done extensive testing
> with this patch, so maybe consider this an RFC for now.
> 
> HCH points out that xfs uses kvfree as a generic "free this no matter
> what it is" sort of wrapper and expects the callers to work out whether
> they might be freeing a vmalloc'ed address. If that sort of usage turns
> out to be prevalent, then we may need another approach to clean this up.

I think it's a bit of a landmine, to be honest.  How about we have kvfree()
call vfree_atomic() instead?
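
For readers unfamiliar with the idea behind vfree_atomic(): the free is not performed on the spot but queued, and a context that is allowed to sleep performs it later. The following is only a single-threaded userland sketch of that deferral pattern, with malloc()/free() standing in for the kernel allocators. The names deferred_list, model_free_deferred() and model_drain() are hypothetical and do not exist in the kernel, whose implementation may well differ.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical model: the dying buffer itself is reused as the list node,
 * so queueing a free never needs to allocate. Buffers must therefore be
 * at least sizeof(struct deferred_node) bytes. */
struct deferred_node {
	struct deferred_node *next;
};

struct deferred_list {
	struct deferred_node *head;
	size_t pending;
};

/* Safe to call from the modelled "atomic" context: only links the buffer. */
static void model_free_deferred(struct deferred_list *q, void *addr)
{
	struct deferred_node *n = addr;

	if (!addr)
		return;
	n->next = q->head;
	q->head = n;
	q->pending++;
}

/* Runs later from the modelled "sleepable" context and does the real free.
 * Returns the number of buffers released. */
static size_t model_drain(struct deferred_list *q)
{
	size_t freed = 0;

	while (q->head) {
		struct deferred_node *n = q->head;

		assert(q->pending > 0);
		q->head = n->next;
		q->pending--;
		free(n);
		freed++;
	}
	return freed;
}
```

The extra queueing and the later drain pass are work that a direct free would not have to do.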

Jeff Layton July 23, 2019, 6:05 p.m. UTC | #3
On Tue, 2019-07-23 at 10:55 -0700, Matthew Wilcox wrote:
> On Tue, Jul 23, 2019 at 01:52:36PM -0400, Jeff Layton wrote:
> > On Tue, 2019-07-23 at 09:12 -0400, Jeff Layton wrote:
> > > A lot of callers of kvfree only go down the vfree path under very rare
> > > circumstances, and so may never end up hitting the might_sleep_if in it.
> > > Ensure that when kvfree is called, that it is operating in a context
> > > where it is allowed to sleep.
> > > 
> > > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > > Cc: Luis Henriques <lhenriques@suse.com>
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  mm/util.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > > 
> > 
> > FWIW, I started looking at this after Luis sent me some ceph patches
> > that fixed a few of these problems. I have not done extensive testing
> > with this patch, so maybe consider this an RFC for now.
> > 
> > HCH points out that xfs uses kvfree as a generic "free this no matter
> > what it is" sort of wrapper and expects the callers to work out whether
> > they might be freeing a vmalloc'ed address. If that sort of usage turns
> > out to be prevalent, then we may need another approach to clean this up.
> 
> I think it's a bit of a landmine, to be honest.  How about we have kvfree()
> call vfree_atomic() instead?

Not a bad idea, though it means more overhead for the vfree case.

Since we're spitballing here...could we have kvfree figure out whether
it's running in a context where it would need to queue it instead and
only do it in that case?

We currently have to figure that out for the might_sleep_if anyway. We
could just have it DTRT instead of printk'ing and dumping the stack in
that case.

Matthew Wilcox (Oracle) July 23, 2019, 6:11 p.m. UTC | #4
On Tue, Jul 23, 2019 at 02:05:11PM -0400, Jeff Layton wrote:
> On Tue, 2019-07-23 at 10:55 -0700, Matthew Wilcox wrote:
> > > HCH points out that xfs uses kvfree as a generic "free this no matter
> > > what it is" sort of wrapper and expects the callers to work out whether
> > > they might be freeing a vmalloc'ed address. If that sort of usage turns
> > > out to be prevalent, then we may need another approach to clean this up.
> > 
> > I think it's a bit of a landmine, to be honest.  How about we have kvfree()
> > call vfree_atomic() instead?
> 
> Not a bad idea, though it means more overhead for the vfree case.
> 
> Since we're spitballing here...could we have kvfree figure out whether
> it's running in a context where it would need to queue it instead and
> only do it in that case?
> 
> We currently have to figure that out for the might_sleep_if anyway. We
> could just have it DTRT instead of printk'ing and dumping the stack in
> that case.

I don't think we have a generic way to determine if we're currently
holding a spinlock.  ie this can fail:

spin_lock(&my_lock);
kvfree(p);
spin_unlock(&my_lock);

If we're preemptible, we can check the preempt count, but !CONFIG_PREEMPT
doesn't record the number of spinlocks currently taken.
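
The point about the preempt count can be illustrated with a small userland model. This is a sketch under stated assumptions, not kernel code: struct model_cpu, the counts_spinlocks flag and the model_* helpers are all hypothetical, and the flag simply stands in for whether the running configuration makes spin_lock() bump the count.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userland model; not the kernel's data structures. */
struct model_cpu {
	bool counts_spinlocks;	/* assumption: true models a config where
				 * taking a spinlock bumps the count */
	int preempt_count;
	int locks_held;		/* ground truth, kept only for comparison */
};

static void model_spin_lock(struct model_cpu *cpu)
{
	cpu->locks_held++;
	if (cpu->counts_spinlocks)
		cpu->preempt_count++;
}

static void model_spin_unlock(struct model_cpu *cpu)
{
	assert(cpu->locks_held > 0);
	cpu->locks_held--;
	if (cpu->counts_spinlocks)
		cpu->preempt_count--;
}

/* The only question a kvfree()-like helper could ask at run time. */
static bool model_looks_atomic(const struct model_cpu *cpu)
{
	return cpu->preempt_count != 0;
}
```

With counts_spinlocks set to false, model_looks_atomic() reports false even while locks_held is 1, which is the case where a run-time "queue only if needed" decision would pick the sleeping path by mistake.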

Jeff Layton July 23, 2019, 6:19 p.m. UTC | #5
On Tue, 2019-07-23 at 11:11 -0700, Matthew Wilcox wrote:
> On Tue, Jul 23, 2019 at 02:05:11PM -0400, Jeff Layton wrote:
> > On Tue, 2019-07-23 at 10:55 -0700, Matthew Wilcox wrote:
> > > > HCH points out that xfs uses kvfree as a generic "free this no matter
> > > > what it is" sort of wrapper and expects the callers to work out whether
> > > > they might be freeing a vmalloc'ed address. If that sort of usage turns
> > > > out to be prevalent, then we may need another approach to clean this up.
> > > 
> > > I think it's a bit of a landmine, to be honest.  How about we have kvfree()
> > > call vfree_atomic() instead?
> > 
> > Not a bad idea, though it means more overhead for the vfree case.
> > 
> > Since we're spitballing here...could we have kvfree figure out whether
> > it's running in a context where it would need to queue it instead and
> > only do it in that case?
> > 
> > We currently have to figure that out for the might_sleep_if anyway. We
> > could just have it DTRT instead of printk'ing and dumping the stack in
> > that case.
> 
> I don't think we have a generic way to determine if we're currently
> holding a spinlock.  ie this can fail:
> 
> spin_lock(&my_lock);
> kvfree(p);
> spin_unlock(&my_lock);
> 
> If we're preemptible, we can check the preempt count, but !CONFIG_PREEMPT
> doesn't record the number of spinlocks currently taken.


Ahh right...that makes sense.

Al also suggested on IRC that we could add a kvfree_atomic if that were
useful. That might be good for new callers, but we'd probably need a
patch like this one to suss out which of the existing kvfree callers
would need to switch to using it.

I think you're quite right that this is a landmine. That said, this
seems like something we ought to try to clean up.

Matthew Wilcox (Oracle) July 23, 2019, 6:29 p.m. UTC | #6
On Tue, Jul 23, 2019 at 02:19:03PM -0400, Jeff Layton wrote:
> On Tue, 2019-07-23 at 11:11 -0700, Matthew Wilcox wrote:
> > On Tue, Jul 23, 2019 at 02:05:11PM -0400, Jeff Layton wrote:
> > > On Tue, 2019-07-23 at 10:55 -0700, Matthew Wilcox wrote:
> > > > I think it's a bit of a landmine, to be honest.  How about we have kvfree()
> > > > call vfree_atomic() instead?
> > > 
> > > Not a bad idea, though it means more overhead for the vfree case.
> > > 
> > > Since we're spitballing here...could we have kvfree figure out whether
> > > it's running in a context where it would need to queue it instead and
> > > only do it in that case?
> > > 
> > > We currently have to figure that out for the might_sleep_if anyway. We
> > > could just have it DTRT instead of printk'ing and dumping the stack in
> > > that case.
> > 
> > I don't think we have a generic way to determine if we're currently
> > holding a spinlock.  ie this can fail:
> > 
> > spin_lock(&my_lock);
> > kvfree(p);
> > spin_unlock(&my_lock);
> > 
> > If we're preemptible, we can check the preempt count, but !CONFIG_PREEMPT
> > doesn't record the number of spinlocks currently taken.
> 
> Ahh right...that makes sense.
> 
> Al also suggested on IRC that we could add a kvfree_atomic if that were
> useful. That might be good for new callers, but we'd probably need a
> patch like this one to suss out which of the existing kvfree callers
> would need to switch to using it.
> 
> I think you're quite right that this is a landmine. That said, this
> seems like something we ought to try to clean up.

I'd rather add a kvfree_fast().  So something like this:

diff --git a/mm/util.c b/mm/util.c
index bab284d69c8c..992f0332dced 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -470,6 +470,28 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 }
 EXPORT_SYMBOL(kvmalloc_node);
 
+/**
+ * kvfree_fast() - Free memory.
+ * @addr: Pointer to allocated memory.
+ *
+ * kvfree_fast frees memory allocated by any of vmalloc(), kmalloc() or
+ * kvmalloc().  It is slightly more efficient to use kfree() or vfree() if
+ * you are certain that you know which one to use.
+ *
+ * Context: Either preemptible task context or not-NMI interrupt.  Must not
+ * hold a spinlock as it can sleep.
+ */
+void kvfree_fast(const void *addr)
+{
+	might_sleep();
+
+	if (is_vmalloc_addr(addr))
+		vfree(addr);
+	else
+		kfree(addr);
+}
+EXPORT_SYMBOL(kvfree_fast);
+
 /**
  * kvfree() - Free memory.
  * @addr: Pointer to allocated memory.
@@ -478,12 +500,12 @@ EXPORT_SYMBOL(kvmalloc_node);
  * It is slightly more efficient to use kfree() or vfree() if you are certain
  * that you know which one to use.
  *
- * Context: Either preemptible task context or not-NMI interrupt.
+ * Context: Any context except NMI.
  */
 void kvfree(const void *addr)
 {
 	if (is_vmalloc_addr(addr))
-		vfree(addr);
+		vfree_atomic(addr);
 	else
 		kfree(addr);
 }

Patch

diff --git a/mm/util.c b/mm/util.c
index e6351a80f248..81ec2a003c86 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -482,6 +482,8 @@  EXPORT_SYMBOL(kvmalloc_node);
  */
 void kvfree(const void *addr)
 {
+	might_sleep_if(!in_interrupt());
+
 	if (is_vmalloc_addr(addr))
 		vfree(addr);
 	else
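
The gap this patch targets can be shown with a small userland model. It is only a sketch: struct model_ctx and the model_* functions are hypothetical stand-ins, the "warned" flag represents the debug splat that might_sleep_if() would produce on a kernel built with the relevant debugging enabled, and the assumption that vfree() carries an equivalent check is taken from the commit message.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical model of the calling context. */
struct model_ctx {
	bool in_interrupt;	/* models in_interrupt() */
	bool atomic;		/* caller must not sleep, e.g. holds a spinlock */
	bool warned;		/* set when the modelled check would complain */
};

static void model_might_sleep_if(struct model_ctx *ctx, bool cond)
{
	assert(ctx != NULL);
	if (cond && ctx->atomic)
		ctx->warned = true;
}

/* Assumed from the commit message: the vfree path has its own check. */
static void model_vfree(struct model_ctx *ctx)
{
	model_might_sleep_if(ctx, !ctx->in_interrupt);
}

/* Before the patch: only vmalloc-backed buffers ever reach a check. */
static void model_kvfree_before(struct model_ctx *ctx, bool is_vmalloc)
{
	if (is_vmalloc)
		model_vfree(ctx);
}

/* After the patch: the check runs on entry, whatever backs the buffer. */
static void model_kvfree_after(struct model_ctx *ctx, bool is_vmalloc)
{
	model_might_sleep_if(ctx, !ctx->in_interrupt);
	if (is_vmalloc)
		model_vfree(ctx);
}
```

In this model a caller that frees a kmalloc-backed buffer from atomic context goes unnoticed by model_kvfree_before() but is flagged by model_kvfree_after(), so the misuse shows up on every call rather than only on the rare vmalloc path.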