diff mbox series

[RFC,v2,2/5] libfs: Check dentry before locking in simple_offset_empty()

Message ID 20241126155444.2556-3-cel@kernel.org (mailing list archive)
State New
Headers show
Series Improve simple directory offset wrap behavior | expand

Commit Message

Chuck Lever Nov. 26, 2024, 3:54 p.m. UTC
From: Chuck Lever <chuck.lever@oracle.com>

Defensive change: Don't try to lock a dentry unless it is positive.
Trying to lock a negative entry will generate a refcount underflow.

The underflow has been seen only while testing.

Fixes: ecba88a3b32d ("libfs: Add simple_offset_empty()")
Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
---
 fs/libfs.c | 9 +++++----
 1 file changed, 5 insertions(+), 4 deletions(-)

Comments

yangerkun Nov. 27, 2024, 3:09 a.m. UTC | #1
Thank you very much for your efforts on this issue!

在 2024/11/26 23:54, cel@kernel.org 写道:
> From: Chuck Lever <chuck.lever@oracle.com>
> 
> Defensive change: Don't try to lock a dentry unless it is positive.
> Trying to lock a negative entry will generate a refcount underflow.

Which member trigger this underflow?

> 
> The underflow has been seen only while testing.
> 
> Fixes: ecba88a3b32d ("libfs: Add simple_offset_empty()")
> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> ---
>   fs/libfs.c | 9 +++++----
>   1 file changed, 5 insertions(+), 4 deletions(-)
> 
> diff --git a/fs/libfs.c b/fs/libfs.c
> index bf67954b525b..c88ed15437c7 100644
> --- a/fs/libfs.c
> +++ b/fs/libfs.c
> @@ -347,13 +347,14 @@ int simple_offset_empty(struct dentry *dentry)
>   	index = DIR_OFFSET_MIN;
>   	octx = inode->i_op->get_offset_ctx(inode);
>   	mt_for_each(&octx->mt, child, index, LONG_MAX) {
> -		spin_lock(&child->d_lock);
>   		if (simple_positive(child)) {
> +			spin_lock(&child->d_lock);
> +			if (simple_positive(child))
> +				ret = 0;
>   			spin_unlock(&child->d_lock);
> -			ret = 0;
> -			break;
> +			if (!ret)
> +				break;
>   		}
> -		spin_unlock(&child->d_lock);
>   	}

Calltrace arrived here means this is a active dir(a dentry with positive 
inode), and nowdays only .rmdir / .rename2 for shmem can reach this 
point. Lock for this dir inode has already been held, maybe this can 
protect child been negative or active? So d_lock here is no need?

>   
>   	return ret;
Chuck Lever Nov. 27, 2024, 2:36 p.m. UTC | #2
On Wed, Nov 27, 2024 at 11:09:11AM +0800, yangerkun wrote:
> Thank you very much for your efforts on this issue!
> 
> 在 2024/11/26 23:54, cel@kernel.org 写道:
> > From: Chuck Lever <chuck.lever@oracle.com>
> > 
> > Defensive change: Don't try to lock a dentry unless it is positive.
> > Trying to lock a negative entry will generate a refcount underflow.
> 
> Which member trigger this underflow?

dput() encounters a dentry refcount underflow because a negative
dentry's refcount is already zero.

But perhaps it would be more accurate to say this patch attempts to
avoid triggering the DEBUG_LOCKS_WARN_ON in hlock_class.


> > The underflow has been seen only while testing.
> > 
> > Fixes: ecba88a3b32d ("libfs: Add simple_offset_empty()")
> > Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
> > ---
> >   fs/libfs.c | 9 +++++----
> >   1 file changed, 5 insertions(+), 4 deletions(-)
> > 
> > diff --git a/fs/libfs.c b/fs/libfs.c
> > index bf67954b525b..c88ed15437c7 100644
> > --- a/fs/libfs.c
> > +++ b/fs/libfs.c
> > @@ -347,13 +347,14 @@ int simple_offset_empty(struct dentry *dentry)
> >   	index = DIR_OFFSET_MIN;
> >   	octx = inode->i_op->get_offset_ctx(inode);
> >   	mt_for_each(&octx->mt, child, index, LONG_MAX) {
> > -		spin_lock(&child->d_lock);
> >   		if (simple_positive(child)) {
> > +			spin_lock(&child->d_lock);
> > +			if (simple_positive(child))
> > +				ret = 0;
> >   			spin_unlock(&child->d_lock);
> > -			ret = 0;
> > -			break;
> > +			if (!ret)
> > +				break;
> >   		}
> > -		spin_unlock(&child->d_lock);
> >   	}
> 
> Calltrace arrived here means this is a active dir(a dentry with positive
> inode), and nowdays only .rmdir / .rename2 for shmem can reach this point.
> Lock for this dir inode has already been held, maybe this can protect child
> been negative or active? So d_lock here is no need?

My assumption was that child->d_lock was necessary for an
authoritative determination of whether "child" is positive or
negative. If holding that lock isn't necessary, then I agree,
there's no need to take child->d_lock here at all... There's
clearly nothing else to protect in this code path.


> >   	return ret;
yangerkun Nov. 29, 2024, 8:17 a.m. UTC | #3
在 2024/11/27 22:36, Chuck Lever 写道:
> On Wed, Nov 27, 2024 at 11:09:11AM +0800, yangerkun wrote:
>> Thank you very much for your efforts on this issue!
>>
>> 在 2024/11/26 23:54, cel@kernel.org 写道:
>>> From: Chuck Lever <chuck.lever@oracle.com>
>>>
>>> Defensive change: Don't try to lock a dentry unless it is positive.
>>> Trying to lock a negative entry will generate a refcount underflow.
>>
>> Which member trigger this underflow?
> 
> dput() encounters a dentry refcount underflow because a negative
> dentry's refcount is already zero.

OK, so you mean dentry->d_lockref here.

But I cannot see why we can trigger this, if it is, it's actually a bug...

Can you give a more detail why can trigger this?

> 
> But perhaps it would be more accurate to say this patch attempts to
> avoid triggering the DEBUG_LOCKS_WARN_ON in hlock_class.

DEBUG_LOCKS_WARN_ON in hlock_class means this class_idx not in use. I 
prefer this cannot happen. Am I think wrong..

> 
> 
>>> The underflow has been seen only while testing.
>>>
>>> Fixes: ecba88a3b32d ("libfs: Add simple_offset_empty()")
>>> Signed-off-by: Chuck Lever <chuck.lever@oracle.com>
>>> ---
>>>    fs/libfs.c | 9 +++++----
>>>    1 file changed, 5 insertions(+), 4 deletions(-)
>>>
>>> diff --git a/fs/libfs.c b/fs/libfs.c
>>> index bf67954b525b..c88ed15437c7 100644
>>> --- a/fs/libfs.c
>>> +++ b/fs/libfs.c
>>> @@ -347,13 +347,14 @@ int simple_offset_empty(struct dentry *dentry)
>>>    	index = DIR_OFFSET_MIN;
>>>    	octx = inode->i_op->get_offset_ctx(inode);
>>>    	mt_for_each(&octx->mt, child, index, LONG_MAX) {
>>> -		spin_lock(&child->d_lock);
>>>    		if (simple_positive(child)) {
>>> +			spin_lock(&child->d_lock);
>>> +			if (simple_positive(child))
>>> +				ret = 0;
>>>    			spin_unlock(&child->d_lock);
>>> -			ret = 0;
>>> -			break;
>>> +			if (!ret)
>>> +				break;
>>>    		}
>>> -		spin_unlock(&child->d_lock);
>>>    	}
>>
>> Calltrace arrived here means this is a active dir(a dentry with positive
>> inode), and nowdays only .rmdir / .rename2 for shmem can reach this point.
>> Lock for this dir inode has already been held, maybe this can protect child
>> been negative or active? So d_lock here is no need?
> 
> My assumption was that child->d_lock was necessary for an
> authoritative determination of whether "child" is positive or
> negative. If holding that lock isn't necessary, then I agree,
> there's no need to take child->d_lock here at all... There's
> clearly nothing else to protect in this code path.
> 
> 
>>>    	return ret;
>
Chuck Lever Nov. 30, 2024, 5:03 p.m. UTC | #4
On Fri, Nov 29, 2024 at 04:17:56PM +0800, yangerkun wrote:
> 
> 
> 在 2024/11/27 22:36, Chuck Lever 写道:
> > On Wed, Nov 27, 2024 at 11:09:11AM +0800, yangerkun wrote:
> > > Thank you very much for your efforts on this issue!
> > > 
> > > 在 2024/11/26 23:54, cel@kernel.org 写道:
> > > > From: Chuck Lever <chuck.lever@oracle.com>
> > > > 
> > > > Defensive change: Don't try to lock a dentry unless it is positive.
> > > > Trying to lock a negative entry will generate a refcount underflow.
> > > 
> > > Which member trigger this underflow?
> > 
> > dput() encounters a dentry refcount underflow because a negative
> > dentry's refcount is already zero.
> 
> OK, so you mean dentry->d_lockref here.
> 
> But I cannot see why we can trigger this, if it is, it's actually a bug...
> 
> Can you give a more detail why can trigger this?

I think it is possible for the mtree lookup in simple_offset_empty()
to return a pointer to a negative dentry, perhaps due to a race.
There is nothing explicit that prevents a dentry from going negative
while it is still stored in the mtree, although that condition would
be quite brief. So, best to harden the "is dentry positive" check
there so that it acts like other dentry locking code in fs/libfs.c.

Perhaps labeling this patch as a "clean up" would be more clear. I
will remove the "Fixes:" tag -- it does not fix a bug in the current
code.
diff mbox series

Patch

diff --git a/fs/libfs.c b/fs/libfs.c
index bf67954b525b..c88ed15437c7 100644
--- a/fs/libfs.c
+++ b/fs/libfs.c
@@ -347,13 +347,14 @@  int simple_offset_empty(struct dentry *dentry)
 	index = DIR_OFFSET_MIN;
 	octx = inode->i_op->get_offset_ctx(inode);
 	mt_for_each(&octx->mt, child, index, LONG_MAX) {
-		spin_lock(&child->d_lock);
 		if (simple_positive(child)) {
+			spin_lock(&child->d_lock);
+			if (simple_positive(child))
+				ret = 0;
 			spin_unlock(&child->d_lock);
-			ret = 0;
-			break;
+			if (!ret)
+				break;
 		}
-		spin_unlock(&child->d_lock);
 	}
 
 	return ret;