diff mbox

[02/19] fs: don't take the i_lock in inode_inc_iversion

Message ID 20171213142017.23653-3-jlayton@kernel.org (mailing list archive)
State New, archived
Headers show

Commit Message

Jeff Layton Dec. 13, 2017, 2:20 p.m. UTC
From: Jeff Layton <jlayton@redhat.com>

The rationale for taking the i_lock when incrementing this value is
lost in antiquity. The readers of the field don't take it (at least
not universally), so my assumption is that it was only done here to
serialize incrementors.

If that is indeed the case, then we can drop the i_lock from this
codepath and treat it as a atomic64_t for the purposes of
incrementing it. This allows us to use inode_inc_iversion without
any danger of lock inversion.

Note that the read side is not fetched atomically with this change.
The assumption here is that that is not a critical issue since the
i_version is not fully synchronized with anything else anyway.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)

Comments

Jeff Layton Dec. 13, 2017, 9:52 p.m. UTC | #1
On Wed, 2017-12-13 at 09:20 -0500, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> The rationale for taking the i_lock when incrementing this value is
> lost in antiquity. The readers of the field don't take it (at least
> not universally), so my assumption is that it was only done here to
> serialize incrementors.
> 
> If that is indeed the case, then we can drop the i_lock from this
> codepath and treat it as a atomic64_t for the purposes of
> incrementing it. This allows us to use inode_inc_iversion without
> any danger of lock inversion.
> 
> Note that the read side is not fetched atomically with this change.
> The assumption here is that that is not a critical issue since the
> i_version is not fully synchronized with anything else anyway.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  include/linux/fs.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 5001e77342fd..c234fac4bb77 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2136,9 +2136,9 @@ inode_set_iversion_queried(struct inode *inode, const u64 new)
>  static inline bool
>  inode_maybe_inc_iversion(struct inode *inode, bool force)
>  {
> -	spin_lock(&inode->i_lock);
> -	inode->i_version++;
> -	spin_unlock(&inode->i_lock);
> +	atomic64_t *ivp = (atomic64_t *)&inode->i_version;
> +
> +	atomic64_inc(ivp);
>  	return true;
>  }
>  

FWIW, I'm not sure this patch is strictly necessary as an interim step.

Adding the i_lock into the all of the places where we currently just do
inode->i_version++ without properly auditing all of them gave me pause
though.

In any case, the last patch in the series cleans this nastiness up.
NeilBrown Dec. 13, 2017, 10:07 p.m. UTC | #2
On Wed, Dec 13 2017, Jeff Layton wrote:

> On Wed, 2017-12-13 at 09:20 -0500, Jeff Layton wrote:
>> From: Jeff Layton <jlayton@redhat.com>
>> 
>> The rationale for taking the i_lock when incrementing this value is
>> lost in antiquity. The readers of the field don't take it (at least
>> not universally), so my assumption is that it was only done here to
>> serialize incrementors.
>> 
>> If that is indeed the case, then we can drop the i_lock from this
>> codepath and treat it as a atomic64_t for the purposes of
>> incrementing it. This allows us to use inode_inc_iversion without
>> any danger of lock inversion.
>> 
>> Note that the read side is not fetched atomically with this change.
>> The assumption here is that that is not a critical issue since the
>> i_version is not fully synchronized with anything else anyway.
>> 
>> Signed-off-by: Jeff Layton <jlayton@redhat.com>
>> ---
>>  include/linux/fs.h | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 5001e77342fd..c234fac4bb77 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -2136,9 +2136,9 @@ inode_set_iversion_queried(struct inode *inode, const u64 new)
>>  static inline bool
>>  inode_maybe_inc_iversion(struct inode *inode, bool force)
>>  {
>> -	spin_lock(&inode->i_lock);
>> -	inode->i_version++;
>> -	spin_unlock(&inode->i_lock);
>> +	atomic64_t *ivp = (atomic64_t *)&inode->i_version;
>> +
>> +	atomic64_inc(ivp);
>>  	return true;
>>  }
>>  
>
> FWIW, I'm not sure this patch is strictly necessary as an interim step.
>
> Adding the i_lock into the all of the places where we currently just do
> inode->i_version++ without properly auditing all of them gave me pause
> though.
>
> In any case, the last patch in the series cleans this nastiness up.

Yes, I thought "nastiness" too, and was happy to see it cleaned up.

I would have guessed that the purpose of the spinlock was to avoid the
risk for torn-reads/writes on 32bit platforms that cannot access a 64bit
value atomically.  In either case, using atomic64_t is the right thing
to do.

Thanks,
NeilBrown
diff mbox

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5001e77342fd..c234fac4bb77 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2136,9 +2136,9 @@  inode_set_iversion_queried(struct inode *inode, const u64 new)
 static inline bool
 inode_maybe_inc_iversion(struct inode *inode, bool force)
 {
-	spin_lock(&inode->i_lock);
-	inode->i_version++;
-	spin_unlock(&inode->i_lock);
+	atomic64_t *ivp = (atomic64_t *)&inode->i_version;
+
+	atomic64_inc(ivp);
 	return true;
 }