Message ID | 20211014082433.30733-1-qiang.zhang1211@gmail.com (mailing list archive) |
---|---|
State | New |
Headers | show |
Series | mm: backing-dev: use kfree_rcu() instead of synchronize_rcu_expedited() | expand |
On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote: > The bdi_remove_from_list() is called in RCU softirq, however the > synchronize_rcu_expedited() will produce sleep action, use kfree_rcu() > instead of it. > > Reported-by: Hao Sun <sunhao.th@gmail.com> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > --- > include/linux/backing-dev-defs.h | 1 + > mm/backing-dev.c | 4 +--- > 2 files changed, 2 insertions(+), 3 deletions(-) > > diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h > index 33207004cfde..35a093384518 100644 > --- a/include/linux/backing-dev-defs.h > +++ b/include/linux/backing-dev-defs.h > @@ -202,6 +202,7 @@ struct backing_dev_info { > #ifdef CONFIG_DEBUG_FS > struct dentry *debug_dir; > #endif > + struct rcu_head rcu; > }; Instead of growing struct backing_dev_info, it seems to me this rcu_head could be placed in a union with rb_node, since it will have been removed from the bdi_tree by this point and the tree is never walked under RCU protection?
On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote: > <IRQ> > __init_work+0x2d/0x50 > synchronize_rcu_expedited+0x3af/0x650 > bdi_remove_from_list [inline] > bdi_unregister+0x17f/0x5c0 > release_bdi+0xa1/0xc0 > kref_put [inline] > bdi_put+0x72/0xa0 > bdev_free_inode+0x11e/0x220 > i_callback+0x3f/0x70 > rcu_do_batch [inline] > rcu_core+0x76d/0x16c0 > __do_softirq+0x1d7/0x93b > invoke_softirq [inline] > __irq_exit_rcu [inline] > irq_exit_rcu+0xf2/0x130 > sysvec_apic_timer_interrupt+0x93/0xc0 > > The bdi_remove_from_list() is called in RCU softirq, however the > synchronize_rcu_expedited() will produce sleep action, use kfree_rcu() > instead of it. What workload is this running? If we hit the RCU unregister path from inode freeing we have a lifetime problem somewhere that we need to fix first.
Matthew Wilcox <willy@infradead.org> 于2021年10月14日周四 下午7:26写道: > On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote: > > The bdi_remove_from_list() is called in RCU softirq, however the > > synchronize_rcu_expedited() will produce sleep action, use kfree_rcu() > > instead of it. > > > > Reported-by: Hao Sun <sunhao.th@gmail.com> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> > > --- > > include/linux/backing-dev-defs.h | 1 + > > mm/backing-dev.c | 4 +--- > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/backing-dev-defs.h > b/include/linux/backing-dev-defs.h > > index 33207004cfde..35a093384518 100644 > > --- a/include/linux/backing-dev-defs.h > > +++ b/include/linux/backing-dev-defs.h > > @@ -202,6 +202,7 @@ struct backing_dev_info { > > #ifdef CONFIG_DEBUG_FS > > struct dentry *debug_dir; > > #endif > > + struct rcu_head rcu; > > }; > > >Instead of growing struct backing_dev_info, it seems to me this rcu_head > >could be placed in a union with rb_node, since it will have been removed > >from the bdi_tree by this point and the tree is never walked under > >RCU protection? > > Thanks for your advice, I find this bdi_tree is traversed under the protection of a spin lock, not under the protection of RCU. I find this modification does not avoid the problem described in patch, the flush_delayed_work() may be called in release_bdi() The same will cause problems. may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu, i_callback) or do you have any better suggestions? Thanks Zqiang
Qiang Zhang <qiang.zhang1211@gmail.com> 于2021年10月15日周五 上午10:57写道: > > > Matthew Wilcox <willy@infradead.org> 于2021年10月14日周四 下午7:26写道: > >> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote: >> > The bdi_remove_from_list() is called in RCU softirq, however the >> > synchronize_rcu_expedited() will produce sleep action, use kfree_rcu() >> > instead of it. >> > >> > Reported-by: Hao Sun <sunhao.th@gmail.com> >> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> >> > --- >> > include/linux/backing-dev-defs.h | 1 + >> > mm/backing-dev.c | 4 +--- >> > 2 files changed, 2 insertions(+), 3 deletions(-) >> > >> > diff --git a/include/linux/backing-dev-defs.h >> b/include/linux/backing-dev-defs.h >> > index 33207004cfde..35a093384518 100644 >> > --- a/include/linux/backing-dev-defs.h >> > +++ b/include/linux/backing-dev-defs.h >> > @@ -202,6 +202,7 @@ struct backing_dev_info { >> > #ifdef CONFIG_DEBUG_FS >> > struct dentry *debug_dir; >> > #endif >> > + struct rcu_head rcu; >> > }; >> >> >Instead of growing struct backing_dev_info, it seems to me this rcu_head >> >could be placed in a union with rb_node, since it will have been removed >> >from the bdi_tree by this point and the tree is never walked under >> >RCU protection? >> >> > Thanks for your advice, I find this bdi_tree is traversed under the > protection of a spin lock, not under the protection of RCU. > I find this modification does not avoid the problem described in patch, > the flush_delayed_work() may be called in release_bdi() > The same will cause problems. > > may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu, > i_callback) or do you have any better suggestions? > > Thanks > Zqiang > >
On 2021/10/14 下午7:24, Matthew Wilcox wrote: > On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote: >> The bdi_remove_from_list() is called in RCU softirq, however the >> synchronize_rcu_expedited() will produce sleep action, use kfree_rcu() >> instead of it. >> >> Reported-by: Hao Sun <sunhao.th@gmail.com> >> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> >> --- >> include/linux/backing-dev-defs.h | 1 + >> mm/backing-dev.c | 4 +--- >> 2 files changed, 2 insertions(+), 3 deletions(-) >> >> diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h >> index 33207004cfde..35a093384518 100644 >> --- a/include/linux/backing-dev-defs.h >> +++ b/include/linux/backing-dev-defs.h >> @@ -202,6 +202,7 @@ struct backing_dev_info { >> #ifdef CONFIG_DEBUG_FS >> struct dentry *debug_dir; >> #endif >> + struct rcu_head rcu; >> }; > Instead of growing struct backing_dev_info, it seems to me this rcu_head > could be placed in a union with rb_node, since it will have been removed > from the bdi_tree by this point and the tree is never walked under > RCU protection? > Thanks for your advice, I find this bdi_tree is traversed under the protection of a spin lock, not under the protection of RCU. I find this modification does not avoid the problem described in patch, the flush_delayed_work() may be called in release_bdi() The same will cause problems. may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu, i_callback) or do you have any better suggestions? Thanks Zqiang
On 2021/10/15 上午10:57, Qiang Zhang wrote: > > > Matthew Wilcox <willy@infradead.org <mailto:willy@infradead.org>> > 于2021年10月14日周四 下午7:26写道: > > On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote: > > The bdi_remove_from_list() is called in RCU softirq, however the > > synchronize_rcu_expedited() will produce sleep action, use > kfree_rcu() > > instead of it. > > > > Reported-by: Hao Sun <sunhao.th@gmail.com > <mailto:sunhao.th@gmail.com>> > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com > <mailto:qiang.zhang1211@gmail.com>> > > --- > > include/linux/backing-dev-defs.h | 1 + > > mm/backing-dev.c | 4 +--- > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > diff --git a/include/linux/backing-dev-defs.h > b/include/linux/backing-dev-defs.h > > index 33207004cfde..35a093384518 100644 > > --- a/include/linux/backing-dev-defs.h > > +++ b/include/linux/backing-dev-defs.h > > @@ -202,6 +202,7 @@ struct backing_dev_info { > > #ifdef CONFIG_DEBUG_FS > > struct dentry *debug_dir; > > #endif > > + struct rcu_head rcu; > > }; > > >Instead of growing struct backing_dev_info, it seems to me this > rcu_head > >could be placed in a union with rb_node, since it will have been > removed > >from the bdi_tree by this point and the tree is never walked under > >RCU protection? > > > Thanks for your advice, I find this bdi_tree is traversed under the > protection of a spin lock, not under the protection of RCU. > I find this modification does not avoid the problem described in > patch, the flush_delayed_work() may be called in release_bdi() > The same will cause problems. > may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu, > i_callback) or do you have any better suggestions? > > Thanks > Zqiang diff --git a/fs/inode.c b/fs/inode.c index a49695f57e1e..300beb19aed6 100644 --- a/fs/inode.c +++ b/fs/inode.c @@ -219,9 +219,9 @@ void free_inode_nonrcu(struct inode *inode) } EXPORT_SYMBOL(free_inode_nonrcu); -static void i_callback(struct rcu_head *head) +static void i_callback(struct work_struct *work) { - struct inode *inode = container_of(head, struct inode, i_rcu); + struct inode *inode = container_of(to_rcu_work(work), struct inode, rwork); if (inode->free_inode) inode->free_inode(inode); else @@ -248,7 +248,7 @@ static struct inode *alloc_inode(struct super_block *sb) return NULL; } inode->free_inode = ops->free_inode; - i_callback(&inode->i_rcu); + i_callback(&inode->rwork.work); return NULL; } @@ -289,7 +289,8 @@ static void destroy_inode(struct inode *inode) return; } inode->free_inode = ops->free_inode; - call_rcu(&inode->i_rcu, i_callback); + INIT_RCU_WORK(&inode->rwork, i_callback); + queue_rcu_work(system_wq, &inode->rwork); } /** diff --git a/include/linux/fs.h b/include/linux/fs.h index 8903a95611a2..006d769791a8 100644 --- a/include/linux/fs.h +++ b/include/linux/fs.h @@ -686,7 +686,7 @@ struct inode { struct list_head i_wb_list; /* backing dev writeback list */ union { struct hlist_head i_dentry; - struct rcu_head i_rcu; + struct rcu_work rwork; }; atomic64_t i_version; atomic64_t i_sequence; /* see futex */
On Fri, Oct 15, 2021 at 01:06:02PM +0800, Zqiang wrote: > > On 2021/10/15 上午10:57, Qiang Zhang wrote: > > > > > > Matthew Wilcox <willy@infradead.org <mailto:willy@infradead.org>> > > 于2021年10月14日周四 下午7:26写道: > > > > On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote: > > > The bdi_remove_from_list() is called in RCU softirq, however the > > > synchronize_rcu_expedited() will produce sleep action, use > > kfree_rcu() > > > instead of it. > > > > > > Reported-by: Hao Sun <sunhao.th@gmail.com > > <mailto:sunhao.th@gmail.com>> > > > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com > > <mailto:qiang.zhang1211@gmail.com>> > > > --- > > > include/linux/backing-dev-defs.h | 1 + > > > mm/backing-dev.c | 4 +--- > > > 2 files changed, 2 insertions(+), 3 deletions(-) > > > > > > diff --git a/include/linux/backing-dev-defs.h > > b/include/linux/backing-dev-defs.h > > > index 33207004cfde..35a093384518 100644 > > > --- a/include/linux/backing-dev-defs.h > > > +++ b/include/linux/backing-dev-defs.h > > > @@ -202,6 +202,7 @@ struct backing_dev_info { > > > #ifdef CONFIG_DEBUG_FS > > > struct dentry *debug_dir; > > > #endif > > > + struct rcu_head rcu; > > > }; > > > > >Instead of growing struct backing_dev_info, it seems to me this > > rcu_head > > >could be placed in a union with rb_node, since it will have been > > removed > > >from the bdi_tree by this point and the tree is never walked under > > >RCU protection? > > > > > > Thanks for your advice, I find this bdi_tree is traversed under the > > protection of a spin lock, not under the protection of RCU. > > I find this modification does not avoid the problem described in patch, > > the flush_delayed_work() may be called in release_bdi() > > The same will cause problems. > > may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu, > > i_callback) or do you have any better suggestions? What? All I was suggesting was: +++ b/include/linux/backing-dev-defs.h @@ -168,7 +168,10 @@ struct bdi_writeback { struct backing_dev_info { u64 id; - struct rb_node rb_node; /* keyed by ->id */ + union { + struct rb_node rb_node; /* keyed by ->id */ + struct rcu_head rcu; + }; struct list_head bdi_list; unsigned long ra_pages; /* max readahead in PAGE_SIZE units */ unsigned long io_pages; /* max allowed IO size */ Christoph, independent of the inode lifetime problem, this actually seems like a good approach to take. I don't see why we should synchronize_rcu() here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas (converted it to use _expedited) and Tejun (worked around a problem when using _expedited).
On Fri, Oct 15, 2021 at 01:35:56PM +0100, Matthew Wilcox wrote: > struct backing_dev_info { > u64 id; > - struct rb_node rb_node; /* keyed by ->id */ > + union { > + struct rb_node rb_node; /* keyed by ->id */ > + struct rcu_head rcu; > + }; > struct list_head bdi_list; > unsigned long ra_pages; /* max readahead in PAGE_SIZE units */ > unsigned long io_pages; /* max allowed IO size */ > > > Christoph, independent of the inode lifetime problem, this actually seems > like a good approach to take. I don't see why we should synchronize_rcu() > here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas > (converted it to use _expedited) and Tejun (worked around a problem when > using _expedited). The kfree+rcu + your suggestion does seem like a good idea in general to me. But I'd still like to fix the actual bug being reported before optimizing the area in a way that papers over the bug.
On 2021/10/15 下午8:35, Matthew Wilcox wrote: > On Fri, Oct 15, 2021 at 01:06:02PM +0800, Zqiang wrote: >> On 2021/10/15 上午10:57, Qiang Zhang wrote: >>> >>> Matthew Wilcox <willy@infradead.org <mailto:willy@infradead.org>> >>> 于2021年10月14日周四 下午7:26写道: >>> >>> On Thu, Oct 14, 2021 at 04:24:33PM +0800, Zqiang wrote: >>> > The bdi_remove_from_list() is called in RCU softirq, however the >>> > synchronize_rcu_expedited() will produce sleep action, use >>> kfree_rcu() >>> > instead of it. >>> > >>> > Reported-by: Hao Sun <sunhao.th@gmail.com >>> <mailto:sunhao.th@gmail.com>> >>> > Signed-off-by: Zqiang <qiang.zhang1211@gmail.com >>> <mailto:qiang.zhang1211@gmail.com>> >>> > --- >>> > include/linux/backing-dev-defs.h | 1 + >>> > mm/backing-dev.c | 4 +--- >>> > 2 files changed, 2 insertions(+), 3 deletions(-) >>> > >>> > diff --git a/include/linux/backing-dev-defs.h >>> b/include/linux/backing-dev-defs.h >>> > index 33207004cfde..35a093384518 100644 >>> > --- a/include/linux/backing-dev-defs.h >>> > +++ b/include/linux/backing-dev-defs.h >>> > @@ -202,6 +202,7 @@ struct backing_dev_info { >>> > #ifdef CONFIG_DEBUG_FS >>> > struct dentry *debug_dir; >>> > #endif >>> > + struct rcu_head rcu; >>> > }; >>> >>> >Instead of growing struct backing_dev_info, it seems to me this >>> rcu_head >>> >could be placed in a union with rb_node, since it will have been >>> removed >>> >from the bdi_tree by this point and the tree is never walked under >>> >RCU protection? >>> >>> >>> Thanks for your advice, I find this bdi_tree is traversed under the >>> protection of a spin lock, not under the protection of RCU. >>> I find this modification does not avoid the problem described in patch, >>> the flush_delayed_work() may be called in release_bdi() >>> The same will cause problems. >>> may be we can replace queue_rcu_work() of call_rcu(&inode->i_rcu, >>> i_callback) or do you have any better suggestions? > What? All I was suggesting was: > > +++ b/include/linux/backing-dev-defs.h > @@ -168,7 +168,10 @@ struct bdi_writeback { > > struct backing_dev_info { > u64 id; > - struct rb_node rb_node; /* keyed by ->id */ > + union { > + struct rb_node rb_node; /* keyed by ->id */ > + struct rcu_head rcu; > + }; > struct list_head bdi_list; > unsigned long ra_pages; /* max readahead in PAGE_SIZE units */ > unsigned long io_pages; /* max allowed IO size */ > > > Christoph, independent of the inode lifetime problem, this actually seems > like a good approach to take. I don't see why we should synchronize_rcu() > here? Adding Jens (original introducer of the synchronize_rcu()), Mikulas > (converted it to use _expedited) and Tejun (worked around a problem when > using _expedited). Sorry,this my mistake. this problem and the inode lifetime cycle are two different problems Can this modification which use kfree_rcu() instead of synchronize_rcu() be accepted? Thanks Zqiang
diff --git a/include/linux/backing-dev-defs.h b/include/linux/backing-dev-defs.h index 33207004cfde..35a093384518 100644 --- a/include/linux/backing-dev-defs.h +++ b/include/linux/backing-dev-defs.h @@ -202,6 +202,7 @@ struct backing_dev_info { #ifdef CONFIG_DEBUG_FS struct dentry *debug_dir; #endif + struct rcu_head rcu; }; enum { diff --git a/mm/backing-dev.c b/mm/backing-dev.c index c878d995af06..45d866a3a4a2 100644 --- a/mm/backing-dev.c +++ b/mm/backing-dev.c @@ -935,8 +935,6 @@ static void bdi_remove_from_list(struct backing_dev_info *bdi) rb_erase(&bdi->rb_node, &bdi_tree); list_del_rcu(&bdi->bdi_list); spin_unlock_bh(&bdi_lock); - - synchronize_rcu_expedited(); } void bdi_unregister(struct backing_dev_info *bdi) @@ -969,7 +967,7 @@ static void release_bdi(struct kref *ref) bdi_unregister(bdi); WARN_ON_ONCE(bdi->dev); wb_exit(&bdi->wb); - kfree(bdi); + kfree_rcu(bdi, rcu); } void bdi_put(struct backing_dev_info *bdi)
<IRQ> __init_work+0x2d/0x50 synchronize_rcu_expedited+0x3af/0x650 bdi_remove_from_list [inline] bdi_unregister+0x17f/0x5c0 release_bdi+0xa1/0xc0 kref_put [inline] bdi_put+0x72/0xa0 bdev_free_inode+0x11e/0x220 i_callback+0x3f/0x70 rcu_do_batch [inline] rcu_core+0x76d/0x16c0 __do_softirq+0x1d7/0x93b invoke_softirq [inline] __irq_exit_rcu [inline] irq_exit_rcu+0xf2/0x130 sysvec_apic_timer_interrupt+0x93/0xc0 The bdi_remove_from_list() is called in RCU softirq, however the synchronize_rcu_expedited() will produce sleep action, use kfree_rcu() instead of it. Reported-by: Hao Sun <sunhao.th@gmail.com> Signed-off-by: Zqiang <qiang.zhang1211@gmail.com> --- include/linux/backing-dev-defs.h | 1 + mm/backing-dev.c | 4 +--- 2 files changed, 2 insertions(+), 3 deletions(-)