Message ID | 841d0e0f-f04c-9611-2eea-0bcc40e5b084@redhat.com (mailing list archive)
---|---
State | New, archived
Series | fs: avoid softlockups in s_inodes iterators
On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
> Anything that walks all inodes on sb->s_inodes list without rescheduling
> risks softlockups.
>
> Previous efforts were made in 2 functions, see:
>
> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> ac05fbb inode: don't softlockup when evicting inodes
>
> but there hasn't been an audit of all walkers, so do that now. This
> also consistently moves the cond_resched() calls to the bottom of each
> loop.
>
> One remains: remove_dquot_ref(), because I'm not quite sure how to deal
> with that one w/o taking the i_lock.
>
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

You've got iput cleanups in here and cond_resched()'s. I feel like this is a
missed opportunity to pad your patch count. Thanks,

Josef
On 10/11/19 12:29 PM, Josef Bacik wrote:
> On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
>> Anything that walks all inodes on sb->s_inodes list without rescheduling
>> risks softlockups.
>>
>> Previous efforts were made in 2 functions, see:
>>
>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
>> ac05fbb inode: don't softlockup when evicting inodes
>>
>> but there hasn't been an audit of all walkers, so do that now. This
>> also consistently moves the cond_resched() calls to the bottom of each
>> loop.
>>
>> One remains: remove_dquot_ref(), because I'm not quite sure how to deal
>> with that one w/o taking the i_lock.
>>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>
> You've got iput cleanups in here and cond_resched()'s. I feel like this is a
> missed opportunity to pad your patch count. Thanks,

yeah, I was going to suggest that I could split it out into 3
(move cond_rescheds, clean up iputs, add new rescheds) if there was a
request. But it seemed a bit ridiculously granular. Fine by me
if desired, tho.

So, was that a request?

-Eric
On Fri, Oct 11, 2019 at 12:34:45PM -0500, Eric Sandeen wrote:
> On 10/11/19 12:29 PM, Josef Bacik wrote:
> > On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
> >> Anything that walks all inodes on sb->s_inodes list without rescheduling
> >> risks softlockups.
> >>
> >> Previous efforts were made in 2 functions, see:
> >>
> >> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> >> ac05fbb inode: don't softlockup when evicting inodes
> >>
> >> but there hasn't been an audit of all walkers, so do that now. This
> >> also consistently moves the cond_resched() calls to the bottom of each
> >> loop.
> >>
> >> One remains: remove_dquot_ref(), because I'm not quite sure how to deal
> >> with that one w/o taking the i_lock.
> >>
> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >
> > You've got iput cleanups in here and cond_resched()'s. I feel like this is a
> > missed opportunity to pad your patch count. Thanks,
>
> yeah, I was going to suggest that I could split it out into 3
> (move cond_rescheds, clean up iputs, add new rescheds) if there was a
> request. But it seemed a bit ridiculously granular. Fine by me
> if desired, tho.
>
> So, was that a request?

I think just two patches, one for the iputs and one for the resched changes.
Thanks,

Josef
On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>  			inode_lru_list_del(inode);
>  		spin_unlock(&inode->i_lock);
>  		list_add(&inode->i_lru, &dispose);
> +
> +		if (need_resched()) {
> +			spin_unlock(&sb->s_inode_list_lock);
> +			cond_resched();
> +			dispose_list(&dispose);
> +			goto again;
> +		}
>  	}
>  	spin_unlock(&sb->s_inode_list_lock);

Is this equivalent to:

+		cond_resched_lock(&sb->s_inode_list_lock);

or is disposing of the list a crucial part here?
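For reference, cond_resched_lock() is an existing scheduler helper (declared
in linux/sched.h): it drops the given spinlock when a reschedule or lock
contention is pending, yields, and reacquires the lock. A simplified,
hedged sketch of its behavior (a paraphrase, not the exact implementation,
which varies by kernel version):

	#include <linux/spinlock.h>
	#include <linux/sched.h>

	/*
	 * Simplified paraphrase of the kernel's cond_resched_lock();
	 * illustrative only, details differ across kernel versions.
	 */
	static int cond_resched_lock_sketch(spinlock_t *lock)
	{
		int ret = 0;

		if (spin_needbreak(lock) || need_resched()) {
			spin_unlock(lock);
			cond_resched();		/* may schedule away here */
			spin_lock(lock);
			ret = 1;		/* lock was dropped and retaken */
		}
		return ret;
	}

The point that matters for the replies below: whenever the lock is dropped,
any list cursor held across the call can no longer be trusted.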
On 10/11/19 1:32 PM, Matthew Wilcox wrote:
> On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
>> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>>  			inode_lru_list_del(inode);
>>  		spin_unlock(&inode->i_lock);
>>  		list_add(&inode->i_lru, &dispose);
>> +
>> +		if (need_resched()) {
>> +			spin_unlock(&sb->s_inode_list_lock);
>> +			cond_resched();
>> +			dispose_list(&dispose);
>> +			goto again;
>> +		}
>>  	}
>>  	spin_unlock(&sb->s_inode_list_lock);
>
> Is this equivalent to:
>
> +		cond_resched_lock(&sb->s_inode_list_lock);
>
> or is disposing of the list a crucial part here?

I think we need to dispose, or we'll start with the entire ~unmodified list
again after the goto.

-Eric
On 10/11/19 1:45 PM, Eric Sandeen wrote:
> On 10/11/19 1:32 PM, Matthew Wilcox wrote:
>> On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
>>> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>>>  			inode_lru_list_del(inode);
>>>  		spin_unlock(&inode->i_lock);
>>>  		list_add(&inode->i_lru, &dispose);
>>> +
>>> +		if (need_resched()) {
>>> +			spin_unlock(&sb->s_inode_list_lock);
>>> +			cond_resched();
>>> +			dispose_list(&dispose);
>>> +			goto again;
>>> +		}
>>>  	}
>>>  	spin_unlock(&sb->s_inode_list_lock);
>>
>> Is this equivalent to:
>>
>> +		cond_resched_lock(&sb->s_inode_list_lock);
>>
>> or is disposing of the list a crucial part here?
>
> I think we need to dispose, or we'll start with the entire ~unmodified list
> again after the goto.

Oh, if you meant in lieu of the goto, we can't drop that lock and
expect to pick up our traversal where we left off, can we?

-Eric
On Fri, Oct 11, 2019 at 04:14:02PM -0500, Eric Sandeen wrote:
> On 10/11/19 1:45 PM, Eric Sandeen wrote:
> > On 10/11/19 1:32 PM, Matthew Wilcox wrote:
> >> On Fri, Oct 11, 2019 at 11:49:38AM -0500, Eric Sandeen wrote:
> >>> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
> >>>  			inode_lru_list_del(inode);
> >>>  		spin_unlock(&inode->i_lock);
> >>>  		list_add(&inode->i_lru, &dispose);
> >>> +
> >>> +		if (need_resched()) {
> >>> +			spin_unlock(&sb->s_inode_list_lock);
> >>> +			cond_resched();
> >>> +			dispose_list(&dispose);
> >>> +			goto again;
> >>> +		}
> >>>  	}
> >>>  	spin_unlock(&sb->s_inode_list_lock);
> >>
> >> Is this equivalent to:
> >>
> >> +		cond_resched_lock(&sb->s_inode_list_lock);
> >>
> >> or is disposing of the list a crucial part here?
> >
> > I think we need to dispose, or we'll start with the entire ~unmodified list
> > again after the goto.
>
> Oh, if you meant in lieu of the goto, we can't drop that lock and
> expect to pick up our traversal where we left off, can we?

No, we can't. Unless you're doing the iget/iput game (which we can't
here!) the moment the s_inode_list_lock is dropped we cannot rely on
the inode or its next pointer to still be valid. Hence we have to
restart the traversal.

And we dispose of the list before restarting because there's nothing
to gain by waiting until we've done the entire sb inode list walk
(could be hundreds of millions of inodes) before we start actually
freeing them....

Cheers,

Dave.
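Dave's two constraints, restart after dropping the lock and dispose eagerly
so the pending list stays bounded, are exactly what the invalidate_inodes()
hunk in the patch encodes. Distilled into a schematic loop (an illustrative
sketch of the pattern, not the patch itself):

	again:
		spin_lock(&sb->s_inode_list_lock);
		list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
			/* ... skip busy inodes, move dead ones onto 'dispose' ... */

			if (need_resched()) {
				/*
				 * Once the list lock is dropped, 'next' may be
				 * freed behind our back, so the walk cannot
				 * simply continue where it left off.
				 */
				spin_unlock(&sb->s_inode_list_lock);
				cond_resched();
				/*
				 * Free what has been collected so far, so the
				 * dispose list stays bounded even on huge
				 * s_inodes lists, then restart from the head.
				 */
				dispose_list(&dispose);
				goto again;
			}
		}
		spin_unlock(&sb->s_inode_list_lock);

		dispose_list(&dispose);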
On Fri 11-10-19 11:49:38, Eric Sandeen wrote:
> One remains: remove_dquot_ref(), because I'm not quite sure how to deal
> with that one w/o taking the i_lock.

Yeah, that will be somewhat tricky. But I think we can modify the standard
iget-iput dance like:

	if (need_resched()) {
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
			/*
			 * Cannot break on this inode, need to do one
			 * more.
			 */
			spin_unlock(&inode->i_lock);
			continue;
		}
		__iget(inode);
		spin_unlock(&inode->i_lock);
		spin_unlock(&sb->s_inode_list_lock);
		iput(put_inode);
		put_inode = inode;
		cond_resched();
		spin_lock(&sb->s_inode_list_lock);
	}
	...
	iput(put_inode);

Will you transform this into a proper patch in your series or should I do
it?

								Honza
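Jan's snippet is the same iget/iput pinning pattern the patch already relies
on in drop_pagecache_sb(), taken only when a break is actually needed.
Sketched into a generic s_inodes walker for illustration; walk_one_inode()
is a made-up placeholder for the per-inode work, not a kernel function:

	/* Illustrative sketch only; walk_one_inode() is hypothetical. */
	struct inode *inode, *put_inode = NULL;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		walk_one_inode(inode);	/* the per-inode work */

		if (need_resched()) {
			spin_lock(&inode->i_lock);
			if (inode->i_state & (I_FREEING|I_WILL_FREE|I_NEW)) {
				/* can't pin this inode; break on a later one */
				spin_unlock(&inode->i_lock);
				continue;
			}
			__iget(inode);	/* pin so our cursor stays valid */
			spin_unlock(&inode->i_lock);
			spin_unlock(&sb->s_inode_list_lock);
			iput(put_inode);	/* iput(NULL) is a no-op */
			put_inode = inode;
			cond_resched();
			spin_lock(&sb->s_inode_list_lock);
		}
	}
	spin_unlock(&sb->s_inode_list_lock);
	iput(put_inode);	/* drop the final pin */

The pinned inode keeps the cursor's list linkage stable while the lock is
dropped, and deferring the iput() avoids calling it under
s_inode_list_lock.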
diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d31b6c72b476..dc1a1d5d825b 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -35,11 +35,11 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&sb->s_inode_list_lock);
 
-		cond_resched();
 		invalidate_mapping_pages(inode->i_mapping, 0, -1);
 		iput(toput_inode);
 		toput_inode = inode;
 
+		cond_resched();
 		spin_lock(&sb->s_inode_list_lock);
 	}
 	spin_unlock(&sb->s_inode_list_lock);
diff --git a/fs/inode.c b/fs/inode.c
index fef457a42882..b0c789bb3dba 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -676,6 +676,7 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 	struct inode *inode, *next;
 	LIST_HEAD(dispose);
 
+again:
 	spin_lock(&sb->s_inode_list_lock);
 	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
 		spin_lock(&inode->i_lock);
@@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
 			inode_lru_list_del(inode);
 		spin_unlock(&inode->i_lock);
 		list_add(&inode->i_lru, &dispose);
+
+		if (need_resched()) {
+			spin_unlock(&sb->s_inode_list_lock);
+			cond_resched();
+			dispose_list(&dispose);
+			goto again;
+		}
 	}
 	spin_unlock(&sb->s_inode_list_lock);
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 2ecef6155fc0..15f77cbbd188 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -67,22 +67,19 @@ static void fsnotify_unmount_inodes(struct super_block *sb)
 		spin_unlock(&inode->i_lock);
 		spin_unlock(&sb->s_inode_list_lock);
 
-		if (iput_inode)
-			iput(iput_inode);
-
+		iput(iput_inode);
 		/* for each watch, send FS_UNMOUNT and then remove it */
 		fsnotify(inode, FS_UNMOUNT, inode, FSNOTIFY_EVENT_INODE, NULL, 0);
-
 		fsnotify_inode_delete(inode);
 		iput_inode = inode;
 
+		cond_resched();
 		spin_lock(&sb->s_inode_list_lock);
 	}
 	spin_unlock(&sb->s_inode_list_lock);
+	iput(iput_inode);
 
-	if (iput_inode)
-		iput(iput_inode);
 	/* Wait for outstanding inode references from connectors */
 	wait_var_event(&sb->s_fsnotify_inode_refs,
 		       !atomic_long_read(&sb->s_fsnotify_inode_refs));
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 6e826b454082..4a085b3c7cac 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -985,6 +985,7 @@ static int add_dquot_ref(struct super_block *sb, int type)
 		 * later.
 		 */
 		old_inode = inode;
+		cond_resched();
 		spin_lock(&sb->s_inode_list_lock);
 	}
 	spin_unlock(&sb->s_inode_list_lock);
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.

Previous efforts were made in 2 functions, see:

c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes

but there hasn't been an audit of all walkers, so do that now. This
also consistently moves the cond_resched() calls to the bottom of each
loop.

One remains: remove_dquot_ref(), because I'm not quite sure how to deal
with that one w/o taking the i_lock.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---