Message ID | 20190723131212.445-1-jlayton@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Series | mm: check for sleepable context in kvfree |
On Tue, 2019-07-23 at 09:12 -0400, Jeff Layton wrote:
> A lot of callers of kvfree only go down the vfree path under very rare
> circumstances, and so may never end up hitting the might_sleep_if in it.
> Ensure that when kvfree is called, that it is operating in a context
> where it is allowed to sleep.
>
> Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> Cc: Luis Henriques <lhenriques@suse.com>
> Signed-off-by: Jeff Layton <jlayton@kernel.org>
> ---
>  mm/util.c | 2 ++
>  1 file changed, 2 insertions(+)
>

FWIW, I started looking at this after Luis sent me some ceph patches
that fixed a few of these problems. I have not done extensive testing
with this patch, so maybe consider this an RFC for now.

HCH points out that xfs uses kvfree as a generic "free this no matter
what it is" sort of wrapper and expects the callers to work out whether
they might be freeing a vmalloc'ed address. If that sort of usage turns
out to be prevalent, then we may need another approach to clean this up.

> diff --git a/mm/util.c b/mm/util.c
> index e6351a80f248..81ec2a003c86 100644
> --- a/mm/util.c
> +++ b/mm/util.c
> @@ -482,6 +482,8 @@ EXPORT_SYMBOL(kvmalloc_node);
>   */
>  void kvfree(const void *addr)
>  {
> +	might_sleep_if(!in_interrupt());
> +
>  	if (is_vmalloc_addr(addr))
>  		vfree(addr);
>  	else
On Tue, Jul 23, 2019 at 01:52:36PM -0400, Jeff Layton wrote:
> On Tue, 2019-07-23 at 09:12 -0400, Jeff Layton wrote:
> > A lot of callers of kvfree only go down the vfree path under very rare
> > circumstances, and so may never end up hitting the might_sleep_if in it.
> > Ensure that when kvfree is called, that it is operating in a context
> > where it is allowed to sleep.
> >
> > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > Cc: Luis Henriques <lhenriques@suse.com>
> > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > ---
> >  mm/util.c | 2 ++
> >  1 file changed, 2 insertions(+)
> >
>
> FWIW, I started looking at this after Luis sent me some ceph patches
> that fixed a few of these problems. I have not done extensive testing
> with this patch, so maybe consider this an RFC for now.
>
> HCH points out that xfs uses kvfree as a generic "free this no matter
> what it is" sort of wrapper and expects the callers to work out whether
> they might be freeing a vmalloc'ed address. If that sort of usage turns
> out to be prevalent, then we may need another approach to clean this up.

I think it's a bit of a landmine, to be honest.  How about we have kvfree()
call vfree_atomic() instead?
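For reference, the reason vfree_atomic() can be called with spinlocks held is that it frees nothing directly: it hands the address off to process context. The following is a conceptual sketch of that kind of deferral built only from generic llist/workqueue primitives; it is an illustration, not the actual mm/vmalloc.c implementation, and the names deferred_vfree_list, deferred_vfree_work and queue_vfree are made up.

/*
 * Conceptual sketch of a deferred vfree: in atomic context the address
 * is pushed onto a lockless list -- the freed memory itself serves as
 * the list node -- and a work item later performs the real vfree()
 * where sleeping is allowed.
 */
static LLIST_HEAD(deferred_vfree_list);

static void deferred_vfree_work(struct work_struct *work)
{
	struct llist_node *node = llist_del_all(&deferred_vfree_list);

	while (node) {
		struct llist_node *next = node->next;

		vfree(node);	/* safe: workqueue context may sleep */
		node = next;
	}
}
static DECLARE_WORK(deferred_vfree_wq, deferred_vfree_work);

static void queue_vfree(const void *addr)
{
	/* schedule the worker only when the list was previously empty */
	if (llist_add((struct llist_node *)addr, &deferred_vfree_list))
		schedule_work(&deferred_vfree_wq);
}

The cost of this scheme -- a list pass and a worker wakeup instead of an immediate free -- is the extra overhead for the vfree case that comes up in the next message.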
On Tue, 2019-07-23 at 10:55 -0700, Matthew Wilcox wrote:
> On Tue, Jul 23, 2019 at 01:52:36PM -0400, Jeff Layton wrote:
> > On Tue, 2019-07-23 at 09:12 -0400, Jeff Layton wrote:
> > > A lot of callers of kvfree only go down the vfree path under very rare
> > > circumstances, and so may never end up hitting the might_sleep_if in it.
> > > Ensure that when kvfree is called, that it is operating in a context
> > > where it is allowed to sleep.
> > >
> > > Cc: Alexander Viro <viro@zeniv.linux.org.uk>
> > > Cc: Luis Henriques <lhenriques@suse.com>
> > > Signed-off-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > >  mm/util.c | 2 ++
> > >  1 file changed, 2 insertions(+)
> > >
> >
> > FWIW, I started looking at this after Luis sent me some ceph patches
> > that fixed a few of these problems. I have not done extensive testing
> > with this patch, so maybe consider this an RFC for now.
> >
> > HCH points out that xfs uses kvfree as a generic "free this no matter
> > what it is" sort of wrapper and expects the callers to work out whether
> > they might be freeing a vmalloc'ed address. If that sort of usage turns
> > out to be prevalent, then we may need another approach to clean this up.
>
> I think it's a bit of a landmine, to be honest.  How about we have kvfree()
> call vfree_atomic() instead?

Not a bad idea, though it means more overhead for the vfree case.

Since we're spitballing here...could we have kvfree figure out whether
it's running in a context where it would need to queue it instead and
only do it in that case?

We currently have to figure that out for the might_sleep_if anyway. We
could just have it DTRT instead of printk'ing and dumping the stack in
that case.
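A rough sketch of what that "do the right thing" kvfree could look like is below; it is purely hypothetical, and kvfree_may_sleep() is a made-up helper. Its weak point is the context test itself, which is exactly the problem spelled out in the next message.

/*
 * Hypothetical "DTRT" kvfree.  The weak point is kvfree_may_sleep():
 * with !CONFIG_PREEMPT, taking a spinlock does not touch preempt_count()
 * (and preemptible() is hard-coded to 0), so there is no reliable way to
 * tell plain process context apart from process context with a spinlock
 * held.
 */
static bool kvfree_may_sleep(void)
{
	return !in_interrupt() && preemptible();
}

void kvfree(const void *addr)
{
	if (is_vmalloc_addr(addr)) {
		if (kvfree_may_sleep())
			vfree(addr);
		else
			vfree_atomic(addr);	/* defer to process context */
	} else {
		kfree(addr);
	}
}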
On Tue, Jul 23, 2019 at 02:05:11PM -0400, Jeff Layton wrote:
> On Tue, 2019-07-23 at 10:55 -0700, Matthew Wilcox wrote:
> > > HCH points out that xfs uses kvfree as a generic "free this no matter
> > > what it is" sort of wrapper and expects the callers to work out whether
> > > they might be freeing a vmalloc'ed address. If that sort of usage turns
> > > out to be prevalent, then we may need another approach to clean this up.
> >
> > I think it's a bit of a landmine, to be honest.  How about we have kvfree()
> > call vfree_atomic() instead?
>
> Not a bad idea, though it means more overhead for the vfree case.
>
> Since we're spitballing here...could we have kvfree figure out whether
> it's running in a context where it would need to queue it instead and
> only do it in that case?
>
> We currently have to figure that out for the might_sleep_if anyway. We
> could just have it DTRT instead of printk'ing and dumping the stack in
> that case.

I don't think we have a generic way to determine if we're currently
holding a spinlock.  ie this can fail:

	spin_lock(&my_lock);
	kvfree(p);
	spin_unlock(&my_lock);

If we're preemptible, we can check the preempt count, but !CONFIG_PREEMPT
doesn't record the number of spinlocks currently taken.
On Tue, 2019-07-23 at 11:11 -0700, Matthew Wilcox wrote:
> On Tue, Jul 23, 2019 at 02:05:11PM -0400, Jeff Layton wrote:
> > On Tue, 2019-07-23 at 10:55 -0700, Matthew Wilcox wrote:
> > > > HCH points out that xfs uses kvfree as a generic "free this no matter
> > > > what it is" sort of wrapper and expects the callers to work out whether
> > > > they might be freeing a vmalloc'ed address. If that sort of usage turns
> > > > out to be prevalent, then we may need another approach to clean this up.
> > >
> > > I think it's a bit of a landmine, to be honest.  How about we have kvfree()
> > > call vfree_atomic() instead?
> >
> > Not a bad idea, though it means more overhead for the vfree case.
> >
> > Since we're spitballing here...could we have kvfree figure out whether
> > it's running in a context where it would need to queue it instead and
> > only do it in that case?
> >
> > We currently have to figure that out for the might_sleep_if anyway. We
> > could just have it DTRT instead of printk'ing and dumping the stack in
> > that case.
>
> I don't think we have a generic way to determine if we're currently
> holding a spinlock.  ie this can fail:
>
> 	spin_lock(&my_lock);
> 	kvfree(p);
> 	spin_unlock(&my_lock);
>
> If we're preemptible, we can check the preempt count, but !CONFIG_PREEMPT
> doesn't record the number of spinlocks currently taken.

Ahh right...that makes sense.

Al also suggested on IRC that we could add a kvfree_atomic if that were
useful. That might be good for new callers, but we'd probably need a
patch like this one to suss out which of the existing kvfree callers
would need to switch to using it.

I think you're quite right that this is a landmine. That said, this
seems like something we ought to try to clean up.
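For illustration, the kvfree_atomic() being floated here would presumably be a thin wrapper over existing primitives, along these lines. This is an assumption based on the discussion, not a function that exists in the tree at this point.

/*
 * Hypothetical kvfree_atomic(): callable with spinlocks held (but not
 * from NMI).  kfree() is already safe in atomic context, and
 * vfree_atomic() defers the unmap to process context.
 */
void kvfree_atomic(const void *addr)
{
	if (is_vmalloc_addr(addr))
		vfree_atomic(addr);
	else
		kfree(addr);
}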
On Tue, Jul 23, 2019 at 02:19:03PM -0400, Jeff Layton wrote:
> On Tue, 2019-07-23 at 11:11 -0700, Matthew Wilcox wrote:
> > On Tue, Jul 23, 2019 at 02:05:11PM -0400, Jeff Layton wrote:
> > > On Tue, 2019-07-23 at 10:55 -0700, Matthew Wilcox wrote:
> > > > I think it's a bit of a landmine, to be honest.  How about we have kvfree()
> > > > call vfree_atomic() instead?
> > >
> > > Not a bad idea, though it means more overhead for the vfree case.
> > >
> > > Since we're spitballing here...could we have kvfree figure out whether
> > > it's running in a context where it would need to queue it instead and
> > > only do it in that case?
> > >
> > > We currently have to figure that out for the might_sleep_if anyway. We
> > > could just have it DTRT instead of printk'ing and dumping the stack in
> > > that case.
> >
> > I don't think we have a generic way to determine if we're currently
> > holding a spinlock.  ie this can fail:
> >
> > 	spin_lock(&my_lock);
> > 	kvfree(p);
> > 	spin_unlock(&my_lock);
> >
> > If we're preemptible, we can check the preempt count, but !CONFIG_PREEMPT
> > doesn't record the number of spinlocks currently taken.
>
> Ahh right...that makes sense.
>
> Al also suggested on IRC that we could add a kvfree_atomic if that were
> useful. That might be good for new callers, but we'd probably need a
> patch like this one to suss out which of the existing kvfree callers
> would need to switch to using it.
>
> I think you're quite right that this is a landmine. That said, this
> seems like something we ought to try to clean up.

I'd rather add a kvfree_fast().  So something like this:

diff --git a/mm/util.c b/mm/util.c
index bab284d69c8c..992f0332dced 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -470,6 +470,28 @@ void *kvmalloc_node(size_t size, gfp_t flags, int node)
 }
 EXPORT_SYMBOL(kvmalloc_node);
 
+/**
+ * kvfree_fast() - Free memory.
+ * @addr: Pointer to allocated memory.
+ *
+ * kvfree_fast frees memory allocated by any of vmalloc(), kmalloc() or
+ * kvmalloc().  It is slightly more efficient to use kfree() or vfree() if
+ * you are certain that you know which one to use.
+ *
+ * Context: Either preemptible task context or not-NMI interrupt.  Must not
+ * hold a spinlock as it can sleep.
+ */
+void kvfree_fast(const void *addr)
+{
+	might_sleep();
+
+	if (is_vmalloc_addr(addr))
+		vfree(addr);
+	else
+		kfree(addr);
+}
+EXPORT_SYMBOL(kvfree_fast);
+
 /**
  * kvfree() - Free memory.
  * @addr: Pointer to allocated memory.
@@ -478,12 +500,12 @@ EXPORT_SYMBOL(kvmalloc_node);
  * It is slightly more efficient to use kfree() or vfree() if you are certain
  * that you know which one to use.
  *
- * Context: Either preemptible task context or not-NMI interrupt.
+ * Context: Any context except NMI.
  */
 void kvfree(const void *addr)
 {
 	if (is_vmalloc_addr(addr))
-		vfree(addr);
+		vfree_atomic(addr);
 	else
 		kfree(addr);
 }
diff --git a/mm/util.c b/mm/util.c
index e6351a80f248..81ec2a003c86 100644
--- a/mm/util.c
+++ b/mm/util.c
@@ -482,6 +482,8 @@ EXPORT_SYMBOL(kvmalloc_node);
  */
 void kvfree(const void *addr)
 {
+	might_sleep_if(!in_interrupt());
+
 	if (is_vmalloc_addr(addr))
 		vfree(addr);
 	else
A lot of callers of kvfree only go down the vfree path under very rare
circumstances, and so may never end up hitting the might_sleep_if in it.
Ensure that when kvfree is called, that it is operating in a context
where it is allowed to sleep.

Cc: Alexander Viro <viro@zeniv.linux.org.uk>
Cc: Luis Henriques <lhenriques@suse.com>
Signed-off-by: Jeff Layton <jlayton@kernel.org>
---
 mm/util.c | 2 ++
 1 file changed, 2 insertions(+)
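To make the target of the annotation concrete, here is a hypothetical caller pattern (all names are made up, not taken from any subsystem): the buffer usually comes from kmalloc, so the sleeping vfree() path is almost never exercised in testing, and freeing under a spinlock goes unnoticed until someone allocates a buffer large enough to have been vmalloc'ed.

/* Hypothetical caller -- struct and function names are illustrative. */
struct foo {
	spinlock_t	lock;
	void		*buf;	/* kvmalloc'ed: kmalloc when small, vmalloc when large */
};

static void foo_reset(struct foo *f)
{
	spin_lock(&f->lock);
	kvfree(f->buf);		/* sleeps only if buf was vmalloc'ed, so this
				 * rarely trips in testing; the patch's
				 * might_sleep_if() flags it regardless of
				 * which allocator backed buf */
	f->buf = NULL;
	spin_unlock(&f->lock);
}

With this patch and CONFIG_DEBUG_ATOMIC_SLEEP enabled, the might_sleep_if() warns on every such call rather than only when the rare vmalloc case is hit; the usual fix is to detach the pointer under the lock and kvfree() it after unlocking.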