
[V2] fs: avoid softlockups in s_inodes iterators

Message ID: a26fae1d-a741-6eb1-b460-968a3b97e238@redhat.com
State: New, archived
Series: [V2] fs: avoid softlockups in s_inodes iterators

Commit Message

Eric Sandeen Oct. 14, 2019, 9:30 p.m. UTC
Anything that walks all inodes on sb->s_inodes list without rescheduling
risks softlockups.

Previous efforts were made in 2 functions, see:

c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
ac05fbb inode: don't softlockup when evicting inodes

but there hasn't been an audit of all walkers, so do that now.  This
also consistently moves the cond_resched() calls to the bottom of each
loop in cases where one already exists.

One loop remains: remove_dquot_ref(), because I'm not quite sure how
to deal with that one w/o taking the i_lock.

Signed-off-by: Eric Sandeen <sandeen@redhat.com>
---

V2: Drop unrelated iput cleanups in fsnotify
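
The shape each converted walker ends up with is the usual one: pin the
current inode, drop the list lock, do the per-inode work, reschedule with
no locks held, then retake the lock.  A minimal sketch of that pattern,
modeled on drop_pagecache_sb(); the function name is illustrative, not
code from the patch:

static void walk_sb_inodes(struct super_block *sb)
{
	struct inode *inode, *toput_inode = NULL;

	spin_lock(&sb->s_inode_list_lock);
	list_for_each_entry(inode, &sb->s_inodes, i_sb_list) {
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
			spin_unlock(&inode->i_lock);
			continue;
		}
		__iget(inode);	/* pin it so dropping the list lock is safe */
		spin_unlock(&inode->i_lock);
		spin_unlock(&sb->s_inode_list_lock);

		/* per-inode work goes here; sleeping is allowed */

		iput(toput_inode);	/* previous pin, safe to drop now */
		toput_inode = inode;

		cond_resched();		/* bottom of the loop, no locks held */
		spin_lock(&sb->s_inode_list_lock);
	}
	spin_unlock(&sb->s_inode_list_lock);
	iput(toput_inode);
}

The deferred iput() matters: iput() can do real work (including eviction),
so it must run with no spinlocks held, and the reference that keeps our
list position valid across the lock drop can only be released once the
next inode is pinned.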

Comments

Eric Sandeen Oct. 14, 2019, 9:36 p.m. UTC | #1
On 10/14/19 4:30 PM, Eric Sandeen wrote:
> Anything that walks all inodes on sb->s_inodes list without rescheduling
> risks softlockups.
> 
> Previous efforts were made in 2 functions, see:
> 
> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> ac05fbb inode: don't softlockup when evicting inodes
> 
> but there hasn't been an audit of all walkers, so do that now.  This
> also consistently moves the cond_resched() calls to the bottom of each
> loop in cases where one already exists.
> 
> One loop remains: remove_dquot_ref(), because I'm not quite sure how
> to deal with that one w/o taking the i_lock.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Whoops, cc: Jan & Josef as on original, sorry.
Jan Kara Oct. 15, 2019, 7:37 a.m. UTC | #2
On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
> Anything that walks all inodes on sb->s_inodes list without rescheduling
> risks softlockups.
> 
> Previous efforts were made in 2 functions, see:
> 
> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> ac05fbb inode: don't softlockup when evicting inodes
> 
> but there hasn't been an audit of all walkers, so do that now.  This
> also consistently moves the cond_resched() calls to the bottom of each
> loop in cases where one already exists.
> 
> One loop remains: remove_dquot_ref(), because I'm not quite sure how
> to deal with that one w/o taking the i_lock.
> 
> Signed-off-by: Eric Sandeen <sandeen@redhat.com>

Thanks Eric. The patch looks good to me. You can add:

Reviewed-by: Jan Kara <jack@suse.cz>

BTW, I suppose you need to add Al to pick up the patch?

								Honza

> ---
> 
> V2: Drop unrelated iput cleanups in fsnotify
> 
> diff --git a/fs/drop_caches.c b/fs/drop_caches.c
> index d31b6c72b476..dc1a1d5d825b 100644
> --- a/fs/drop_caches.c
> +++ b/fs/drop_caches.c
> @@ -35,11 +35,11 @@ static void drop_pagecache_sb(struct super_block *sb, void *unused)
>  		spin_unlock(&inode->i_lock);
>  		spin_unlock(&sb->s_inode_list_lock);
> -		cond_resched();
>  		invalidate_mapping_pages(inode->i_mapping, 0, -1);
>  		iput(toput_inode);
>  		toput_inode = inode;
> +		cond_resched();
>  		spin_lock(&sb->s_inode_list_lock);
>  	}
>  	spin_unlock(&sb->s_inode_list_lock);
> diff --git a/fs/inode.c b/fs/inode.c
> index fef457a42882..b0c789bb3dba 100644
> --- a/fs/inode.c
> +++ b/fs/inode.c
> @@ -676,6 +676,7 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>  	struct inode *inode, *next;
>  	LIST_HEAD(dispose);
> +again:
>  	spin_lock(&sb->s_inode_list_lock);
>  	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
>  		spin_lock(&inode->i_lock);
> @@ -698,6 +699,13 @@ int invalidate_inodes(struct super_block *sb, bool kill_dirty)
>  		inode_lru_list_del(inode);
>  		spin_unlock(&inode->i_lock);
>  		list_add(&inode->i_lru, &dispose);
> +
> +		if (need_resched()) {
> +			spin_unlock(&sb->s_inode_list_lock);
> +			cond_resched();
> +			dispose_list(&dispose);
> +			goto again;
> +		}
>  	}
>  	spin_unlock(&sb->s_inode_list_lock);
> diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
> index 2ecef6155fc0..ac9eb273e28c 100644
> --- a/fs/notify/fsnotify.c
> +++ b/fs/notify/fsnotify.c
> @@ -77,6 +77,7 @@ static void fsnotify_unmount_inodes(struct super_block *sb)
>  		iput_inode = inode;
> +		cond_resched();
>  		spin_lock(&sb->s_inode_list_lock);
>  	}
>  	spin_unlock(&sb->s_inode_list_lock);
> diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
> index 6e826b454082..4a085b3c7cac 100644
> --- a/fs/quota/dquot.c
> +++ b/fs/quota/dquot.c
> @@ -985,6 +985,7 @@ static int add_dquot_ref(struct super_block *sb, int type)
>  		 * later.
>  		 */
>  		old_inode = inode;
> +		cond_resched();
>  		spin_lock(&sb->s_inode_list_lock);
>  	}
>  	spin_unlock(&sb->s_inode_list_lock);
>
Eric Sandeen Oct. 16, 2019, 2:36 a.m. UTC | #3
On 10/15/19 2:37 AM, Jan Kara wrote:
> On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
>> Anything that walks all inodes on sb->s_inodes list without rescheduling
>> risks softlockups.
>>
>> Previous efforts were made in 2 functions, see:
>>
>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
>> ac05fbb inode: don't softlockup when evicting inodes
>>
>> but there hasn't been an audit of all walkers, so do that now.  This
>> also consistently moves the cond_resched() calls to the bottom of each
>> loop in cases where one already exists.
>>
>> One loop remains: remove_dquot_ref(), because I'm not quite sure how
>> to deal with that one w/o taking the i_lock.
>>
>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> 
> Thanks Eric. The patch looks good to me. You can add:
> 
> Reviewed-by: Jan Kara <jack@suse.cz>

thanks

> BTW, I suppose you need to add Al to pick up the patch?

Yeah (cc'd now)

But it was just pointed out to me that if/when the majority of inodes
at umount time have i_count == 0, we'll never hit the resched in 
fsnotify_unmount_inodes() and may still have an issue ...

-Eric
Jan Kara Oct. 16, 2019, 9:42 a.m. UTC | #4
On Tue 15-10-19 21:36:08, Eric Sandeen wrote:
> On 10/15/19 2:37 AM, Jan Kara wrote:
> > On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
> >> Anything that walks all inodes on sb->s_inodes list without rescheduling
> >> risks softlockups.
> >>
> >> Previous efforts were made in 2 functions, see:
> >>
> >> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> >> ac05fbb inode: don't softlockup when evicting inodes
> >>
> >> but there hasn't been an audit of all walkers, so do that now.  This
> >> also consistently moves the cond_resched() calls to the bottom of each
> >> loop in cases where one already exists.
> >>
> >> One loop remains: remove_dquot_ref(), because I'm not quite sure how
> >> to deal with that one w/o taking the i_lock.
> >>
> >> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> > 
> > Thanks Eric. The patch looks good to me. You can add:
> > 
> > Reviewed-by: Jan Kara <jack@suse.cz>
> 
> thanks
> 
> > BTW, I suppose you need to add Al to pick up the patch?
> 
> Yeah (cc'd now)
> 
> But it was just pointed out to me that if/when the majority of inodes
> at umount time have i_count == 0, we'll never hit the resched in 
> fsnotify_unmount_inodes() and may still have an issue ...

Yeah, that's a good point. So that loop will need some further tweaking
(like doing an iget-iput dance in the need_resched() case, as in some
other places).

								Honza
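
Zero-count inodes take an early "continue" in fsnotify_unmount_inodes()
and never reach the iput()/cond_resched() at the bottom of the loop, so
the resched added by this patch cannot fire for them.  The iget-iput
dance Jan mentions would look roughly like the fragment below; this is a
sketch, with "pinned" a hypothetical local initialized to NULL before the
walk.  It also assumes __iget() is safe, i.e. SB_ACTIVE is still set,
which is exactly what does not hold at unmount time, as the following
messages work out:

	/*
	 * Inside list_for_each_entry(inode, &sb->s_inodes, i_sb_list),
	 * with sb->s_inode_list_lock held.
	 */
	if (need_resched()) {
		spin_lock(&inode->i_lock);
		if (inode->i_state & (I_FREEING | I_WILL_FREE | I_NEW)) {
			spin_unlock(&inode->i_lock);
			continue;	/* the walk skips these anyway */
		}
		__iget(inode);		/* keeps our list position valid */
		spin_unlock(&inode->i_lock);
		spin_unlock(&sb->s_inode_list_lock);

		iput(pinned);		/* release the previous pin */
		pinned = inode;

		cond_resched();
		spin_lock(&sb->s_inode_list_lock);
	}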
Eric Sandeen Oct. 16, 2019, 1:23 p.m. UTC | #5
On 10/16/19 4:42 AM, Jan Kara wrote:
> On Tue 15-10-19 21:36:08, Eric Sandeen wrote:
>> On 10/15/19 2:37 AM, Jan Kara wrote:
>>> On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
>>>> Anything that walks all inodes on sb->s_inodes list without rescheduling
>>>> risks softlockups.
>>>>
>>>> Previous efforts were made in 2 functions, see:
>>>>
>>>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
>>>> ac05fbb inode: don't softlockup when evicting inodes
>>>>
>>>> but there hasn't been an audit of all walkers, so do that now.  This
>>>> also consistently moves the cond_resched() calls to the bottom of each
>>>> loop in cases where one already exists.
>>>>
>>>> One loop remains: remove_dquot_ref(), because I'm not quite sure how
>>>> to deal with that one w/o taking the i_lock.
>>>>
>>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>>>
>>> Thanks Eric. The patch looks good to me. You can add:
>>>
>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>
>> thanks
>>
>>> BTW, I suppose you need to add Al to pick up the patch?
>>
>> Yeah (cc'd now)
>>
>> But it was just pointed out to me that if/when the majority of inodes
>> at umount time have i_count == 0, we'll never hit the resched in 
>> fsnotify_unmount_inodes() and may still have an issue ...
> 
> Yeah, that's a good point. So that loop will need some further tweaking
> (like doing an iget-iput dance in the need_resched() case, as in some
> other places).

Well, it's already got an iget/iput for anything with i_count > 0.  But
as the comment says (and I think it's right...) doing an iget/iput
on i_count == 0 inodes at this point would happen without SB_ACTIVE set, and the final
iput here would actually start evicting inodes in /this/ loop, right?

I think we could (ab)use the lru list to construct a "dispose" list for
fsnotify processing as was done in evict_inodes...

or maybe the two should be merged, and fsnotify watches could be handled
directly in evict_inodes.  But that doesn't feel quite right.

-Eric
Jan Kara Oct. 16, 2019, 1:49 p.m. UTC | #6
On Wed 16-10-19 08:23:51, Eric Sandeen wrote:
> On 10/16/19 4:42 AM, Jan Kara wrote:
> > On Tue 15-10-19 21:36:08, Eric Sandeen wrote:
> >> On 10/15/19 2:37 AM, Jan Kara wrote:
> >>> On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
> >>>> Anything that walks all inodes on sb->s_inodes list without rescheduling
> >>>> risks softlockups.
> >>>>
> >>>> Previous efforts were made in 2 functions, see:
> >>>>
> >>>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> >>>> ac05fbb inode: don't softlockup when evicting inodes
> >>>>
> >>>> but there hasn't been an audit of all walkers, so do that now.  This
> >>>> also consistently moves the cond_resched() calls to the bottom of each
> >>> loop in cases where one already exists.
> >>>>
> >>>> One loop remains: remove_dquot_ref(), because I'm not quite sure how
> >>>> to deal with that one w/o taking the i_lock.
> >>>>
> >>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >>>
> >>> Thanks Eric. The patch looks good to me. You can add:
> >>>
> >>> Reviewed-by: Jan Kara <jack@suse.cz>
> >>
> >> thanks
> >>
> >>> BTW, I suppose you need to add Al to pick up the patch?
> >>
> >> Yeah (cc'd now)
> >>
> >> But it was just pointed out to me that if/when the majority of inodes
> >> at umount time have i_count == 0, we'll never hit the resched in 
> >> fsnotify_unmount_inodes() and may still have an issue ...
> > 
> > Yeah, that's a good point. So that loop will need some further tweaking
> > (like doing an iget-iput dance in the need_resched() case, as in some
> > other places).
> 
> Well, it's already got an iget/iput for anything with i_count > 0.  But
> as the comment says (and I think it's right...) doing an iget/iput
> on i_count == 0 inodes at this point would happen without SB_ACTIVE set, and the final
> iput here would actually start evicting inodes in /this/ loop, right?

Yes, it would, but since this is just before calling evict_inodes(), I
currently have a hard time remembering why evicting inodes like that would be an
issue.

> I think we could (ab)use the lru list to construct a "dispose" list for
> fsnotify processing as was done in evict_inodes...
> 
> or maybe the two should be merged, and fsnotify watches could be handled
> directly in evict_inodes.  But that doesn't feel quite right.

Merging the two would be possible (and faster!) as well but I agree it
feels a bit dirty :)
								Honza
Eric Sandeen Oct. 16, 2019, 2:39 p.m. UTC | #7
On 10/16/19 8:49 AM, Jan Kara wrote:
> On Wed 16-10-19 08:23:51, Eric Sandeen wrote:
>> On 10/16/19 4:42 AM, Jan Kara wrote:
>>> On Tue 15-10-19 21:36:08, Eric Sandeen wrote:
>>>> On 10/15/19 2:37 AM, Jan Kara wrote:
>>>>> On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
>>>>>> Anything that walks all inodes on sb->s_inodes list without rescheduling
>>>>>> risks softlockups.
>>>>>>
>>>>>> Previous efforts were made in 2 functions, see:
>>>>>>
>>>>>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
>>>>>> ac05fbb inode: don't softlockup when evicting inodes
>>>>>>
>>>>>> but there hasn't been an audit of all walkers, so do that now.  This
>>>>>> also consistently moves the cond_resched() calls to the bottom of each
>>>>>> loop in cases where one already exists.
>>>>>>
>>>>>> One loop remains: remove_dquot_ref(), because I'm not quite sure how
>>>>>> to deal with that one w/o taking the i_lock.
>>>>>>
>>>>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>>>>>
>>>>> Thanks Eric. The patch looks good to me. You can add:
>>>>>
>>>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>>>
>>>> thanks
>>>>
>>>>> BTW, I suppose you need to add Al to pick up the patch?
>>>>
>>>> Yeah (cc'd now)
>>>>
>>>> But it was just pointed out to me that if/when the majority of inodes
>>>> at umount time have i_count == 0, we'll never hit the resched in 
>>>> fsnotify_unmount_inodes() and may still have an issue ...
>>>
>>> Yeah, that's a good point. So that loop will need some further tweaking
>>> (like doing an iget-iput dance in the need_resched() case, as in some
>>> other places).
>>
>> Well, it's already got an iget/iput for anything with i_count > 0.  But
>> as the comment says (and I think it's right...) doing an iget/iput
>> on i_count == 0 inodes at this point would happen without SB_ACTIVE set, and the final
>> iput here would actually start evicting inodes in /this/ loop, right?
> 
> Yes, it would, but since this is just before calling evict_inodes(), I
> currently have a hard time remembering why evicting inodes like that would be an
> issue.

Probably just weird to effectively evict all inodes prior to evict_inodes() ;)

>> I think we could (ab)use the lru list to construct a "dispose" list for
>> fsnotify processing as was done in evict_inodes...

[narrator: Eric's idea here is dumb and it won't work]

>> or maybe the two should be merged, and fsnotify watches could be handled
>> directly in evict_inodes.  But that doesn't feel quite right.
> 
> Merging the two would be possible (and faster!) as well but I agree it
> feels a bit dirty :)

It's starting to look like maybe the only option...

I'll see if Al is willing to merge this patch as is for the simple "schedule
the big loops" and see about a 2nd patch on top to do more surgery for this
case.

Thanks,
-Eric

> 								Honza
>
Eric Sandeen Oct. 16, 2019, 3:26 p.m. UTC | #8
On 10/16/19 9:39 AM, Eric Sandeen wrote:
> On 10/16/19 8:49 AM, Jan Kara wrote:
>> On Wed 16-10-19 08:23:51, Eric Sandeen wrote:
>>> On 10/16/19 4:42 AM, Jan Kara wrote:
>>>> On Tue 15-10-19 21:36:08, Eric Sandeen wrote:
>>>>> On 10/15/19 2:37 AM, Jan Kara wrote:
>>>>>> On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
>>>>>>> Anything that walks all inodes on sb->s_inodes list without rescheduling
>>>>>>> risks softlockups.
>>>>>>>
>>>>>>> Previous efforts were made in 2 functions, see:
>>>>>>>
>>>>>>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
>>>>>>> ac05fbb inode: don't softlockup when evicting inodes
>>>>>>>
>>>>>>> but there hasn't been an audit of all walkers, so do that now.  This
>>>>>>> also consistently moves the cond_resched() calls to the bottom of each
>>>>>>> loop in cases where one already exists.
>>>>>>>
>>>>>>> One loop remains: remove_dquot_ref(), because I'm not quite sure how
>>>>>>> to deal with that one w/o taking the i_lock.
>>>>>>>
>>>>>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
>>>>>>
>>>>>> Thanks Eric. The patch looks good to me. You can add:
>>>>>>
>>>>>> Reviewed-by: Jan Kara <jack@suse.cz>
>>>>>
>>>>> thanks
>>>>>
>>>>>> BTW, I suppose you need to add Al to pick up the patch?
>>>>>
>>>>> Yeah (cc'd now)
>>>>>
>>>>> But it was just pointed out to me that if/when the majority of inodes
>>>>> at umount time have i_count == 0, we'll never hit the resched in 
>>>>> fsnotify_unmount_inodes() and may still have an issue ...
>>>>
>>>> Yeah, that's a good point. So that loop will need some further tweaking
>>>> (like doing an iget-iput dance in the need_resched() case, as in some
>>>> other places).
>>>
>>> Well, it's already got an iget/iput for anything with i_count > 0.  But
>>> as the comment says (and I think it's right...) doing an iget/iput
>>> on i_count == 0 inodes at this point would happen without SB_ACTIVE set, and the final
>>> iput here would actually start evicting inodes in /this/ loop, right?
>>
>> Yes, it would, but since this is just before calling evict_inodes(), I
>> currently have a hard time remembering why evicting inodes like that would be an
>> issue.
> 
> Probably just weird to effectively evict all inodes prior to evict_inodes() ;)
> 
>>> I think we could (ab)use the lru list to construct a "dispose" list for
>>> fsnotify processing as was done in evict_inodes...
> 
> [narrator: Eric's idea here is dumb and it won't work]
> 
>>> or maybe the two should be merged, and fsnotify watches could be handled
>>> directly in evict_inodes.  But that doesn't feel quite right.
>>
>> Merging the two would be possible (and faster!) as well but I agree it
>> feels a bit dirty :)
> 
> It's starting to look like maybe the only option...
> 
> I'll see if Al is willing to merge this patch as is for the simple "schedule
> the big loops" and see about a 2nd patch on top to do more surgery for this
> case.

Sorry for thinking out loud in public but I'm not too familiar with fsnotify, so
I'm being timid.  However, since fsnotify_sb_delete() and evict_inodes() are working
on orthogonal sets of inodes (fsnotify_sb_delete only cares about nonzero refcount,
and evict_inodes only cares about zero refcount), I think we can just swap the order
of the calls.  The fsnotify call will then have a much smaller list to walk
(any refcounted inodes) as well.

I'll try to give this a test.

diff --git a/fs/super.c b/fs/super.c
index cfadab2cbf35..cd352530eca9 100644
--- a/fs/super.c
+++ b/fs/super.c
@@ -448,10 +448,12 @@ void generic_shutdown_super(struct super_block *sb)
 		sync_filesystem(sb);
 		sb->s_flags &= ~SB_ACTIVE;
 
-		fsnotify_sb_delete(sb);
 		cgroup_writeback_umount();
 
+		/* evict all inodes with zero refcount */
 		evict_inodes(sb);
+		/* only nonzero refcount inodes can have marks */
+		fsnotify_sb_delete(sb);
 
 		if (sb->s_dio_done_wq) {
 			destroy_workqueue(sb->s_dio_done_wq);
Jan Kara Oct. 16, 2019, 3:35 p.m. UTC | #9
On Wed 16-10-19 10:26:16, Eric Sandeen wrote:
> On 10/16/19 9:39 AM, Eric Sandeen wrote:
> > On 10/16/19 8:49 AM, Jan Kara wrote:
> >> On Wed 16-10-19 08:23:51, Eric Sandeen wrote:
> >>> On 10/16/19 4:42 AM, Jan Kara wrote:
> >>>> On Tue 15-10-19 21:36:08, Eric Sandeen wrote:
> >>>>> On 10/15/19 2:37 AM, Jan Kara wrote:
> >>>>>> On Mon 14-10-19 16:30:24, Eric Sandeen wrote:
> >>>>>>> Anything that walks all inodes on sb->s_inodes list without rescheduling
> >>>>>>> risks softlockups.
> >>>>>>>
> >>>>>>> Previous efforts were made in 2 functions, see:
> >>>>>>>
> >>>>>>> c27d82f fs/drop_caches.c: avoid softlockups in drop_pagecache_sb()
> >>>>>>> ac05fbb inode: don't softlockup when evicting inodes
> >>>>>>>
> >>>>>>> but there hasn't been an audit of all walkers, so do that now.  This
> >>>>>>> also consistently moves the cond_resched() calls to the bottom of each
> >>>>>> loop in cases where one already exists.
> >>>>>>>
> >>>>>>> One loop remains: remove_dquot_ref(), because I'm not quite sure how
> >>>>>>> to deal with that one w/o taking the i_lock.
> >>>>>>>
> >>>>>>> Signed-off-by: Eric Sandeen <sandeen@redhat.com>
> >>>>>>
> >>>>>> Thanks Eric. The patch looks good to me. You can add:
> >>>>>>
> >>>>>> Reviewed-by: Jan Kara <jack@suse.cz>
> >>>>>
> >>>>> thanks
> >>>>>
> >>>>>> BTW, I suppose you need to add Al to pick up the patch?
> >>>>>
> >>>>> Yeah (cc'd now)
> >>>>>
> >>>>> But it was just pointed out to me that if/when the majority of inodes
> >>>>> at umount time have i_count == 0, we'll never hit the resched in 
> >>>>> fsnotify_unmount_inodes() and may still have an issue ...
> >>>>
> >>>> Yeah, that's a good point. So that loop will need some further tweaking
> >>>> (like doing an iget-iput dance in the need_resched() case, as in some
> >>>> other places).
> >>>
> >>> Well, it's already got an iget/iput for anything with i_count > 0.  But
> >>> as the comment says (and I think it's right...) doing an iget/iput
> >>> on i_count == 0 inodes at this point would happen without SB_ACTIVE set, and the final
> >>> iput here would actually start evicting inodes in /this/ loop, right?
> >>
> > Yes, it would, but since this is just before calling evict_inodes(), I
> > currently have a hard time remembering why evicting inodes like that would be an
> >> issue.
> > 
> > Probably just weird to effectively evict all inodes prior to evict_inodes() ;)
> > 
> >>> I think we could (ab)use the lru list to construct a "dispose" list for
> >>> fsnotify processing as was done in evict_inodes...
> > 
> > [narrator: Eric's idea here is dumb and it won't work]
> > 
> >>> or maybe the two should be merged, and fsnotify watches could be handled
> >>> directly in evict_inodes.  But that doesn't feel quite right.
> >>
> >> Merging the two would be possible (and faster!) as well but I agree it
> >> feels a bit dirty :)
> > 
> > It's starting to look like maybe the only option...
> > 
> > I'll see if Al is willing to merge this patch as is for the simple "schedule
> > the big loops" and see about a 2nd patch on top to do more surgery for this
> > case.
> 
> Sorry for thinking out loud in public but I'm not too familiar with fsnotify, so
> I'm being timid.  However, since fsnotify_sb_delete() and evict_inodes() are working
> on orthogonal sets of inodes (fsnotify_sb_delete only cares about nonzero refcount,
> and evict_inodes only cares about zero refcount), I think we can just swap the order
> of the calls.  The fsnotify call will then have a much smaller list to walk
> (any refcounted inodes) as well.
> 
> I'll try to give this a test.

Yes, this should make the softlockup impossible to trigger in practice. So
agreed.

								Honza

> 
> diff --git a/fs/super.c b/fs/super.c
> index cfadab2cbf35..cd352530eca9 100644
> --- a/fs/super.c
> +++ b/fs/super.c
> @@ -448,10 +448,12 @@ void generic_shutdown_super(struct super_block *sb)
>  		sync_filesystem(sb);
>  		sb->s_flags &= ~SB_ACTIVE;
>  
> -		fsnotify_sb_delete(sb);
>  		cgroup_writeback_umount();
>  
> +		/* evict all inodes with zero refcount */
>  		evict_inodes(sb);
> +		/* only nonzero refcount inodes can have marks */
> +		fsnotify_sb_delete(sb);
>  
>  		if (sb->s_dio_done_wq) {
>  			destroy_workqueue(sb->s_dio_done_wq);
> 
>

Patch

diff --git a/fs/drop_caches.c b/fs/drop_caches.c
index d31b6c72b476..dc1a1d5d825b 100644
--- a/fs/drop_caches.c
+++ b/fs/drop_caches.c
@@ -35,11 +35,11 @@  static void drop_pagecache_sb(struct super_block *sb, void *unused)
  		spin_unlock(&inode->i_lock);
  		spin_unlock(&sb->s_inode_list_lock);
  
-		cond_resched();
  		invalidate_mapping_pages(inode->i_mapping, 0, -1);
  		iput(toput_inode);
  		toput_inode = inode;
  
+		cond_resched();
  		spin_lock(&sb->s_inode_list_lock);
  	}
  	spin_unlock(&sb->s_inode_list_lock);
diff --git a/fs/inode.c b/fs/inode.c
index fef457a42882..b0c789bb3dba 100644
--- a/fs/inode.c
+++ b/fs/inode.c
@@ -676,6 +676,7 @@  int invalidate_inodes(struct super_block *sb, bool kill_dirty)
  	struct inode *inode, *next;
  	LIST_HEAD(dispose);
  
+again:
  	spin_lock(&sb->s_inode_list_lock);
  	list_for_each_entry_safe(inode, next, &sb->s_inodes, i_sb_list) {
  		spin_lock(&inode->i_lock);
@@ -698,6 +699,13 @@  int invalidate_inodes(struct super_block *sb, bool kill_dirty)
  		inode_lru_list_del(inode);
  		spin_unlock(&inode->i_lock);
  		list_add(&inode->i_lru, &dispose);
+
+		if (need_resched()) {
+			spin_unlock(&sb->s_inode_list_lock);
+			cond_resched();
+			dispose_list(&dispose);
+			goto again;
+		}
  	}
  	spin_unlock(&sb->s_inode_list_lock);
  
diff --git a/fs/notify/fsnotify.c b/fs/notify/fsnotify.c
index 2ecef6155fc0..ac9eb273e28c 100644
--- a/fs/notify/fsnotify.c
+++ b/fs/notify/fsnotify.c
@@ -77,6 +77,7 @@  static void fsnotify_unmount_inodes(struct super_block *sb)
  
  		iput_inode = inode;
  
+		cond_resched();
  		spin_lock(&sb->s_inode_list_lock);
  	}
  	spin_unlock(&sb->s_inode_list_lock);
diff --git a/fs/quota/dquot.c b/fs/quota/dquot.c
index 6e826b454082..4a085b3c7cac 100644
--- a/fs/quota/dquot.c
+++ b/fs/quota/dquot.c
@@ -985,6 +985,7 @@  static int add_dquot_ref(struct super_block *sb, int type)
  		 * later.
  		 */
  		old_inode = inode;
+		cond_resched();
  		spin_lock(&sb->s_inode_list_lock);
  	}
  	spin_unlock(&sb->s_inode_list_lock);