diff mbox series

mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited()

Message ID 20211014082433.30733-1-qiang.zhang1211@gmail.com (mailing list archive)
State New
Headers show
Series mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited() | expand

Commit Message

Zqiang Oct. 14, 2021, 8:24 a.m. UTC
<IRQ>
 __init_work+0x2d/0x50
 synchronize_rcu_expedited+0x3af/0x650
 bdi_remove_from_list [inline]
 bdi_unregister+0x17f/0x5c0
 release_bdi+0xa1/0xc0
 kref_put [inline]
 bdi_put+0x72/0xa0
 bdev_free_inode+0x11e/0x220
 i_callback+0x3f/0x70
 rcu_do_batch [inline]
 rcu_core+0x76d/0x16c0
 __do_softirq+0x1d7/0x93b
 invoke_softirq [inline]
 __irq_exit_rcu [inline]
 irq_exit_rcu+0xf2/0x130
 sysvec_apic_timer_interrupt+0x93/0xc0

The bdi_remove_from_list() is called in RCU softirq, however the
synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
instead of it.

Reported-by: Hao Sun <sunhao.th@gmail.com>
Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
---
 include/linux/backing-dev-defs.h | 1 +
 mm/backing-dev.c                 | 4 +---
 2 files changed, 2 insertions(+), 3 deletions(-)

Comments

Matthew Wilcox Oct. 14, 2021, 11:24 a.m. UTC | #1
On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> The bdi_remove_from_list() is called in RCU softirq, however the
> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
> instead of it.
> 
> Reported-by: Hao Sun <sunhao.th@gmail.com>
> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> ---
>  include/linux/backing-dev-defs.h | 1 +
>  mm/backing-dev.c                 | 4 +---
>  2 files changed, 2 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
> index 33207004cfde..35a093384518 100644
> --- a/include/linux/backing-dev-defs.h
> +++ b/include/linux/backing-dev-defs.h
> @@ -202,6 +202,7 @@ struct backing_dev_info {
>  #ifdef CONFIG_DEBUG_FS
>  	struct dentry *debug_dir;
>  #endif
> +	struct rcu_head rcu;
>  };

Instead of growing struct backing_dev_info, it seems to me this rcu_head
could be placed in a union with rb_node, since it will have been removed
from the bdi_tree by this point and the tree is never walked under
RCU protection?
Christoph Hellwig Oct. 14, 2021, 12:06 p.m. UTC | #2
On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> <IRQ>
>  __init_work+0x2d/0x50
>  synchronize_rcu_expedited+0x3af/0x650
>  bdi_remove_from_list [inline]
>  bdi_unregister+0x17f/0x5c0
>  release_bdi+0xa1/0xc0
>  kref_put [inline]
>  bdi_put+0x72/0xa0
>  bdev_free_inode+0x11e/0x220
>  i_callback+0x3f/0x70
>  rcu_do_batch [inline]
>  rcu_core+0x76d/0x16c0
>  __do_softirq+0x1d7/0x93b
>  invoke_softirq [inline]
>  __irq_exit_rcu [inline]
>  irq_exit_rcu+0xf2/0x130
>  sysvec_apic_timer_interrupt+0x93/0xc0
> 
> The bdi_remove_from_list() is called in RCU softirq, however the
> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
> instead of it.

What workload is this running?  If we hit the RCU unregister path
from inode freeing we have a lifetime problem somewhere that we need
to fix first.
Zqiang Oct. 15, 2021, 2:57 a.m. UTC | #3
Matthew Wilcox <willy@infradead.org> 于2021年10月14日周四 下午7:26写道:

> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> > The bdi_remove_from_list() is called in RCU softirq, however the
> > synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
> > instead of it.
> >
> > Reported-by: Hao Sun <sunhao.th@gmail.com>
> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
> > ---
> >  include/linux/backing-dev-defs.h | 1 +
> >  mm/backing-dev.c                 | 4 +---
> >  2 files changed, 2 insertions(+), 3 deletions(-)
> >
> > diff --git a/include/linux/backing-dev-defs.h
> b/include/linux/backing-dev-defs.h
> > index 33207004cfde..35a093384518 100644
> > --- a/include/linux/backing-dev-defs.h
> > +++ b/include/linux/backing-dev-defs.h
> > @@ -202,6 +202,7 @@ struct backing_dev_info {
> >  #ifdef CONFIG_DEBUG_FS
> >       struct dentry *debug_dir;
> >  #endif
> > +     struct rcu_head rcu;
> >  };
>
> >Instead of growing struct backing_dev_info, it seems to me this rcu_head
> >could be placed in a union with rb_node, since it will have been removed
> >from the bdi_tree by this point and the tree is never walked under
> >RCU protection?
>
>
Thanks for your advice, I find this bdi_tree is traversed under the
protection of a spin lock, not under the protection of RCU.
I find this modification does not avoid the problem described in patch,
the flush_delayed_work() may be called in release_bdi()
The same will cause problems.

may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
i_callback) or do you have any better suggestions?

Thanks
Zqiang
Zqiang Oct. 15, 2021, 3:34 a.m. UTC | #4
Qiang Zhang <qiang.zhang1211@gmail.com> 于2021年10月15日周五 上午10:57写道:

>
>
> Matthew Wilcox <willy@infradead.org> 于2021年10月14日周四 下午7:26写道:
>
>> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
>> > The bdi_remove_from_list() is called in RCU softirq, however the
>> > synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
>> > instead of it.
>> >
>> > Reported-by: Hao Sun <sunhao.th@gmail.com>
>> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
>> > ---
>> >  include/linux/backing-dev-defs.h | 1 +
>> >  mm/backing-dev.c                 | 4 +---
>> >  2 files changed, 2 insertions(+), 3 deletions(-)
>> >
>> > diff --git a/include/linux/backing-dev-defs.h
>> b/include/linux/backing-dev-defs.h
>> > index 33207004cfde..35a093384518 100644
>> > --- a/include/linux/backing-dev-defs.h
>> > +++ b/include/linux/backing-dev-defs.h
>> > @@ -202,6 +202,7 @@ struct backing_dev_info {
>> >  #ifdef CONFIG_DEBUG_FS
>> >       struct dentry *debug_dir;
>> >  #endif
>> > +     struct rcu_head rcu;
>> >  };
>>
>> >Instead of growing struct backing_dev_info, it seems to me this rcu_head
>> >could be placed in a union with rb_node, since it will have been removed
>> >from the bdi_tree by this point and the tree is never walked under
>> >RCU protection?
>>
>>
> Thanks for your advice, I find this bdi_tree is traversed under the
> protection of a spin lock, not under the protection of RCU.
> I find this modification does not avoid the problem described in patch,
> the flush_delayed_work() may be called in release_bdi()
> The same will cause problems.
>
> may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
> i_callback) or do you have any better suggestions?
>
> Thanks
> Zqiang
>
>
zhangqiang Oct. 15, 2021, 3:39 a.m. UTC | #5
On 2021/10/14 下午7:24, Matthew Wilcox wrote:
> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
>> The bdi_remove_from_list() is called in RCU softirq, however the
>> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu()
>> instead of it.
>>
>> Reported-by: Hao Sun <sunhao.th@gmail.com>
>> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com>
>> ---
>>   include/linux/backing-dev-defs.h | 1 +
>>   mm/backing-dev.c                 | 4 +---
>>   2 files changed, 2 insertions(+), 3 deletions(-)
>>
>> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
>> index 33207004cfde..35a093384518 100644
>> --- a/include/linux/backing-dev-defs.h
>> +++ b/include/linux/backing-dev-defs.h
>> @@ -202,6 +202,7 @@ struct backing_dev_info {
>>   #ifdef CONFIG_DEBUG_FS
>>   	struct dentry *debug_dir;
>>   #endif
>> +	struct rcu_head rcu;
>>   };
> Instead of growing struct backing_dev_info, it seems to me this rcu_head
> could be placed in a union with rb_node, since it will have been removed
> from the bdi_tree by this point and the tree is never walked under
> RCU protection?
>
Thanks for your advice, I find this bdi_tree is traversed under the 
protection of a spin lock, not under the protection of RCU.
I find this modification does not avoid the problem described in patch, 
the flush_delayed_work() may be called in release_bdi()
The same will cause problems.

may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu, 
i_callback) or do you have any better suggestions?

Thanks
Zqiang
Zqiang Oct. 15, 2021, 5:06 a.m. UTC | #6
On 2021/10/15 上午10:57, Qiang Zhang wrote:
>
>
> Matthew Wilcox <willy@infradead.org <mailto:willy@infradead.org>> 
> 于2021年10月14日周四 下午7:26写道:
>
>     On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
>     > The bdi_remove_from_list() is called in RCU softirq, however the
>     > synchronize_rcu_expedited() will produce sleep action, use
>     kfree_rcu()
>     > instead of it.
>     >
>     > Reported-by: Hao Sun <sunhao.th@gmail.com
>     <mailto:sunhao.th@gmail.com>>
>     > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com
>     <mailto:qiang.zhang1211@gmail.com>>
>     > ---
>     >  include/linux/backing-dev-defs.h | 1 +
>     >  mm/backing-dev.c                 | 4 +---
>     >  2 files changed, 2 insertions(+), 3 deletions(-)
>     >
>     > diff --git a/include/linux/backing-dev-defs.h
>     b/include/linux/backing-dev-defs.h
>     > index 33207004cfde..35a093384518 100644
>     > --- a/include/linux/backing-dev-defs.h
>     > +++ b/include/linux/backing-dev-defs.h
>     > @@ -202,6 +202,7 @@ struct backing_dev_info {
>     >  #ifdef CONFIG_DEBUG_FS
>     >       struct dentry *debug_dir;
>     >  #endif
>     > +     struct rcu_head rcu;
>     >  };
>
>     >Instead of growing struct backing_dev_info, it seems to me this
>     rcu_head
>     >could be placed in a union with rb_node, since it will have been
>     removed
>     >from the bdi_tree by this point and the tree is never walked under
>     >RCU protection?
>
>
> Thanks for your advice, I find this bdi_tree is traversed under the 
> protection of a spin lock, not under the protection of RCU.
> I find this modification does not avoid the problem described in 
> patch, the flush_delayed_work() may be called in release_bdi()
> The same will cause problems.
> may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu, 
> i_callback) or do you have any better suggestions?
>
> Thanks
> Zqiang


diff --git a/fs/inode.c b/fs/inode.c
index a49695f57e1e..300beb19aed6 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -219,9 +219,9 @@ void free_inode_nonrcu(struct inode *inode)
  }
  EXPORT_SYMBOL(free_inode_nonrcu);

-static void i_callback(struct rcu_head *head)
+static void i_callback(struct work_struct *work)
  {
-       struct inode *inode = container_of(head, struct inode, i_rcu);
+       struct inode *inode = container_of(to_rcu_work(work), struct 
inode, rwork);
         if (inode->free_inode)
                 inode->free_inode(inode);
         else
@@ -248,7 +248,7 @@ static struct inode *alloc_inode(struct super_block *sb)
                                 return NULL;
                 }
                 inode->free_inode = ops->free_inode;
-               i_callback(&inode->i_rcu);
+               i_callback(&inode->rwork.work);
                 return NULL;
         }

@@ -289,7 +289,8 @@ static void destroy_inode(struct inode *inode)
                         return;
         }
         inode->free_inode = ops->free_inode;
-       call_rcu(&inode->i_rcu, i_callback);
+       INIT_RCU_WORK(&inode->rwork, i_callback);
+       queue_rcu_work(system_wq, &inode->rwork);
  }

  /**
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 8903a95611a2..006d769791a8 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -686,7 +686,7 @@ struct inode {
         struct list_head        i_wb_list;      /* backing dev 
writeback list */
         union {
                 struct hlist_head       i_dentry;
-               struct rcu_head         i_rcu;
+               struct rcu_work         rwork;
         };
         atomic64_t              i_version;
         atomic64_t              i_sequence; /* see futex */
Matthew Wilcox Oct. 15, 2021, 12:35 p.m. UTC | #7
On Fri, Oct 15, 2021 at 01:06:02PM +0800, Zqiang wrote:
> 
> On 2021/10/15 上午10:57, Qiang Zhang wrote:
> > 
> > 
> > Matthew Wilcox <willy@infradead.org <mailto:willy@infradead.org>>
> > 于2021年10月14日周四 下午7:26写道:
> > 
> >     On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
> >     > The bdi_remove_from_list() is called in RCU softirq, however the
> >     > synchronize_rcu_expedited() will produce sleep action, use
> >     kfree_rcu()
> >     > instead of it.
> >     >
> >     > Reported-by: Hao Sun <sunhao.th@gmail.com
> >     <mailto:sunhao.th@gmail.com>>
> >     > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com
> >     <mailto:qiang.zhang1211@gmail.com>>
> >     > ---
> >     >  include/linux/backing-dev-defs.h | 1 +
> >     >  mm/backing-dev.c                 | 4 +---
> >     >  2 files changed, 2 insertions(+), 3 deletions(-)
> >     >
> >     > diff --git a/include/linux/backing-dev-defs.h
> >     b/include/linux/backing-dev-defs.h
> >     > index 33207004cfde..35a093384518 100644
> >     > --- a/include/linux/backing-dev-defs.h
> >     > +++ b/include/linux/backing-dev-defs.h
> >     > @@ -202,6 +202,7 @@ struct backing_dev_info {
> >     >  #ifdef CONFIG_DEBUG_FS
> >     >       struct dentry *debug_dir;
> >     >  #endif
> >     > +     struct rcu_head rcu;
> >     >  };
> > 
> >     >Instead of growing struct backing_dev_info, it seems to me this
> >     rcu_head
> >     >could be placed in a union with rb_node, since it will have been
> >     removed
> >     >from the bdi_tree by this point and the tree is never walked under
> >     >RCU protection?
> > 
> > 
> > Thanks for your advice, I find this bdi_tree is traversed under the
> > protection of a spin lock, not under the protection of RCU.
> > I find this modification does not avoid the problem described in patch,
> > the flush_delayed_work() may be called in release_bdi()
> > The same will cause problems.
> > may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
> > i_callback) or do you have any better suggestions?

What?  All I was suggesting was:

+++ b/include/linux/backing-dev-defs.h
@@ -168,7 +168,10 @@ struct bdi_writeback {
 
 struct backing_dev_info {
        u64 id;
-       struct rb_node rb_node; /* keyed by ->id */
+       union {
+               struct rb_node rb_node; /* keyed by ->id */
+               struct rcu_head rcu;
+       };
        struct list_head bdi_list;
        unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
        unsigned long io_pages; /* max allowed IO size */


Christoph, independent of the inode lifetime problem, this actually seems
like a good approach to take.  I don't see why we should synchronize_rcu()
here?  Adding Jens (original introducer of the synchronize_rcu()), Mikulas
(converted it to use _expedited) and Tejun (worked around a problem when
using _expedited).
Christoph Hellwig Oct. 15, 2021, 1:19 p.m. UTC | #8
On Fri, Oct 15, 2021 at 01:35:56PM +0100, Matthew Wilcox wrote:
>  struct backing_dev_info {
>         u64 id;
> -       struct rb_node rb_node; /* keyed by ->id */
> +       union {
> +               struct rb_node rb_node; /* keyed by ->id */
> +               struct rcu_head rcu;
> +       };
>         struct list_head bdi_list;
>         unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
>         unsigned long io_pages; /* max allowed IO size */
> 
> 
> Christoph, independent of the inode lifetime problem, this actually seems
> like a good approach to take.  I don't see why we should synchronize_rcu()
> here?  Adding Jens (original introducer of the synchronize_rcu()), Mikulas
> (converted it to use _expedited) and Tejun (worked around a problem when
> using _expedited).

The kfree+rcu + your suggestion does seem like a good idea in general to
me.  But I'd still like to fix the actual bug being reported before
optimizing the area in a way that papers over the bug.
Zqiang Oct. 18, 2021, 2:15 a.m. UTC | #9
On 2021/10/15 下午8:35, Matthew Wilcox wrote:
> On Fri, Oct 15, 2021 at 01:06:02PM +0800, Zqiang wrote:
>> On 2021/10/15 上午10:57, Qiang Zhang wrote:
>>>
>>> Matthew Wilcox <willy@infradead.org <mailto:willy@infradead.org>>
>>> 于2021年10月14日周四 下午7:26写道:
>>>
>>>      On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote:
>>>      > The bdi_remove_from_list() is called in RCU softirq, however the
>>>      > synchronize_rcu_expedited() will produce sleep action, use
>>>      kfree_rcu()
>>>      > instead of it.
>>>      >
>>>      > Reported-by: Hao Sun <sunhao.th@gmail.com
>>>      <mailto:sunhao.th@gmail.com>>
>>>      > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com
>>>      <mailto:qiang.zhang1211@gmail.com>>
>>>      > ---
>>>      >  include/linux/backing-dev-defs.h | 1 +
>>>      >  mm/backing-dev.c                 | 4 +---
>>>      >  2 files changed, 2 insertions(+), 3 deletions(-)
>>>      >
>>>      > diff --git a/include/linux/backing-dev-defs.h
>>>      b/include/linux/backing-dev-defs.h
>>>      > index 33207004cfde..35a093384518 100644
>>>      > --- a/include/linux/backing-dev-defs.h
>>>      > +++ b/include/linux/backing-dev-defs.h
>>>      > @@ -202,6 +202,7 @@ struct backing_dev_info {
>>>      >  #ifdef CONFIG_DEBUG_FS
>>>      >       struct dentry *debug_dir;
>>>      >  #endif
>>>      > +     struct rcu_head rcu;
>>>      >  };
>>>
>>>      >Instead of growing struct backing_dev_info, it seems to me this
>>>      rcu_head
>>>      >could be placed in a union with rb_node, since it will have been
>>>      removed
>>>      >from the bdi_tree by this point and the tree is never walked under
>>>      >RCU protection?
>>>
>>>
>>> Thanks for your advice, I find this bdi_tree is traversed under the
>>> protection of a spin lock, not under the protection of RCU.
>>> I find this modification does not avoid the problem described in patch,
>>> the flush_delayed_work() may be called in release_bdi()
>>> The same will cause problems.
>>> may be  we can replace queue_rcu_work() of call_rcu(&inode->i_rcu,
>>> i_callback) or do you have any better suggestions?
> What?  All I was suggesting was:
>
> +++ b/include/linux/backing-dev-defs.h
> @@ -168,7 +168,10 @@ struct bdi_writeback {
>   
>   struct backing_dev_info {
>          u64 id;
> -       struct rb_node rb_node; /* keyed by ->id */
> +       union {
> +               struct rb_node rb_node; /* keyed by ->id */
> +               struct rcu_head rcu;
> +       };
>          struct list_head bdi_list;
>          unsigned long ra_pages; /* max readahead in PAGE_SIZE units */
>          unsigned long io_pages; /* max allowed IO size */
>
>
> Christoph, independent of the inode lifetime problem, this actually seems
> like a good approach to take.  I don't see why we should synchronize_rcu()
> here?  Adding Jens (original introducer of the synchronize_rcu()), Mikulas
> (converted it to use _expedited) and Tejun (worked around a problem when
> using _expedited).

Sorry,this my mistake.   this problem and the inode lifetime cycle are 
two different problems

Can this modification which use kfree_rcu() instead of synchronize_rcu() 
be accepted?


Thanks

Zqiang
diff mbox series

Patch

diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h
index 33207004cfde..35a093384518 100644
--- a/include/linux/backing-dev-defs.h
+++ b/include/linux/backing-dev-defs.h
@@ -202,6 +202,7 @@  struct backing_dev_info {
 #ifdef CONFIG_DEBUG_FS
 	struct dentry *debug_dir;
 #endif
+	struct rcu_head rcu;
 };
 
 enum {
diff --git a/mm/backing-dev.c b/mm/backing-dev.c
index c878d995af06..45d866a3a4a2 100644
--- a/mm/backing-dev.c
+++ b/mm/backing-dev.c
@@ -935,8 +935,6 @@  static void bdi_remove_from_list(struct backing_dev_info *bdi)
 	rb_erase(&bdi->rb_node, &bdi_tree);
 	list_del_rcu(&bdi->bdi_list);
 	spin_unlock_bh(&bdi_lock);
-
-	synchronize_rcu_expedited();
 }
 
 void bdi_unregister(struct backing_dev_info *bdi)
@@ -969,7 +967,7 @@  static void release_bdi(struct kref *ref)
 		bdi_unregister(bdi);
 	WARN_ON_ONCE(bdi->dev);
 	wb_exit(&bdi->wb);
-	kfree(bdi);
+	kfree_rcu(bdi, rcu);
 }
 
 void bdi_put(struct backing_dev_info *bdi)