[02/19] fs: don't take the i_lock in inode_inc_iversion

Message ID	20171213142017.23653-3-jlayton@kernel.org (mailing list archive)
State	New, archived

Headers

DMARC-Filter: OpenDMARC Filter v1.3.2 mail.kernel.org 1004B218B4
From: Jeff Layton <jlayton@kernel.org>
To: linux-fsdevel@vger.kernel.org
Cc: linux-kernel@vger.kernel.org, hch@lst.de, neilb@suse.de,
	bfields@fieldses.org, amir73il@gmail.com, jack@suse.de,
	viro@zeniv.linux.org.uk
Subject: [PATCH 02/19] fs: don't take the i_lock in inode_inc_iversion
Date: Wed, 13 Dec 2017 09:20:00 -0500
Message-Id: <20171213142017.23653-3-jlayton@kernel.org>
In-Reply-To: <20171213142017.23653-1-jlayton@kernel.org>
References: <20171213142017.23653-1-jlayton@kernel.org>
Sender: linux-fsdevel-owner@vger.kernel.org
Precedence: bulk

Commit Message

Jeff Layton Dec. 13, 2017, 2:20 p.m. UTC

From: Jeff Layton <jlayton@redhat.com>

The rationale for taking the i_lock when incrementing this value is
lost in antiquity. The readers of the field don't take it (at least
not universally), so my assumption is that it was only done here to
serialize incrementors.

If that is indeed the case, then we can drop the i_lock from this
codepath and treat it as an atomic64_t for the purposes of
incrementing it. This allows us to use inode_inc_iversion without
any danger of lock inversion.

Note that the read side is not fetched atomically with this change.
The assumption here is that that is not a critical issue since the
i_version is not fully synchronized with anything else anyway.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h | 6 +++---
 1 file changed, 3 insertions(+), 3 deletions(-)
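
The commit message above describes replacing a lock-protected increment with an atomic one. A minimal userspace sketch of that idea, using C11 <stdatomic.h> rather than the kernel's atomic64_t API, might look like the following. The names demo_inode, demo_maybe_inc_iversion, demo_peek_iversion and demo_bump_n are hypothetical and exist only for illustration. Unlike the patch, which casts a plain u64 field and leaves the read side non-atomic, this sketch declares the field atomic and reads it atomically.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>

/* Hypothetical stand-in for struct inode; only the counter is modelled. */
struct demo_inode {
	_Atomic uint64_t i_version;
};

/*
 * Lock-free increment: concurrent callers cannot lose an update, so no
 * lock is needed just to serialize incrementors. Relaxed ordering is an
 * assumption of this sketch; it orders nothing except the counter itself.
 */
static inline bool demo_maybe_inc_iversion(struct demo_inode *inode, bool force)
{
	(void)force;	/* unused here, as in the interim patch */
	atomic_fetch_add_explicit(&inode->i_version, 1, memory_order_relaxed);
	return true;
}

/* Atomic read of the counter (the patch itself does not make reads atomic). */
static inline uint64_t demo_peek_iversion(struct demo_inode *inode)
{
	return atomic_load_explicit(&inode->i_version, memory_order_relaxed);
}

/* Bump a fresh counter n times and return the final value. */
static inline uint64_t demo_bump_n(unsigned int n)
{
	struct demo_inode inode;
	unsigned int i;

	atomic_init(&inode.i_version, 0);
	for (i = 0; i < n; i++) {
		bool changed = demo_maybe_inc_iversion(&inode, false);

		assert(changed);
	}
	return demo_peek_iversion(&inode);
}
```

Calling demo_bump_n(3) increments a zero-initialized counter three times and returns the result; no lock is taken anywhere, which is what removes the lock-inversion concern.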

Comments

Jeff Layton Dec. 13, 2017, 9:52 p.m. UTC | #1

On Wed, 2017-12-13 at 09:20 -0500, Jeff Layton wrote:
> From: Jeff Layton <jlayton@redhat.com>
> 
> The rationale for taking the i_lock when incrementing this value is
> lost in antiquity. The readers of the field don't take it (at least
> not universally), so my assumption is that it was only done here to
> serialize incrementors.
> 
> If that is indeed the case, then we can drop the i_lock from this
> codepath and treat it as an atomic64_t for the purposes of
> incrementing it. This allows us to use inode_inc_iversion without
> any danger of lock inversion.
> 
> Note that the read side is not fetched atomically with this change.
> The assumption here is that that is not a critical issue since the
> i_version is not fully synchronized with anything else anyway.
> 
> Signed-off-by: Jeff Layton <jlayton@redhat.com>
> ---
>  include/linux/fs.h | 6 +++---
>  1 file changed, 3 insertions(+), 3 deletions(-)
> 
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 5001e77342fd..c234fac4bb77 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2136,9 +2136,9 @@ inode_set_iversion_queried(struct inode *inode, const u64 new)
>  static inline bool
>  inode_maybe_inc_iversion(struct inode *inode, bool force)
>  {
> -	spin_lock(&inode->i_lock);
> -	inode->i_version++;
> -	spin_unlock(&inode->i_lock);
> +	atomic64_t *ivp = (atomic64_t *)&inode->i_version;
> +
> +	atomic64_inc(ivp);
>  	return true;
>  }
>  

FWIW, I'm not sure this patch is strictly necessary as an interim step.

Adding the i_lock to all of the places where we currently just do
inode->i_version++ without properly auditing all of them gave me pause
though.

In any case, the last patch in the series cleans this nastiness up.

NeilBrown Dec. 13, 2017, 10:07 p.m. UTC | #2

On Wed, Dec 13 2017, Jeff Layton wrote:

> On Wed, 2017-12-13 at 09:20 -0500, Jeff Layton wrote:
>> From: Jeff Layton <jlayton@redhat.com>
>> 
>> The rationale for taking the i_lock when incrementing this value is
>> lost in antiquity. The readers of the field don't take it (at least
>> not universally), so my assumption is that it was only done here to
>> serialize incrementors.
>> 
>> If that is indeed the case, then we can drop the i_lock from this
>> codepath and treat it as an atomic64_t for the purposes of
>> incrementing it. This allows us to use inode_inc_iversion without
>> any danger of lock inversion.
>> 
>> Note that the read side is not fetched atomically with this change.
>> The assumption here is that that is not a critical issue since the
>> i_version is not fully synchronized with anything else anyway.
>> 
>> Signed-off-by: Jeff Layton <jlayton@redhat.com>
>> ---
>>  include/linux/fs.h | 6 +++---
>>  1 file changed, 3 insertions(+), 3 deletions(-)
>> 
>> diff --git a/include/linux/fs.h b/include/linux/fs.h
>> index 5001e77342fd..c234fac4bb77 100644
>> --- a/include/linux/fs.h
>> +++ b/include/linux/fs.h
>> @@ -2136,9 +2136,9 @@ inode_set_iversion_queried(struct inode *inode, const u64 new)
>>  static inline bool
>>  inode_maybe_inc_iversion(struct inode *inode, bool force)
>>  {
>> -	spin_lock(&inode->i_lock);
>> -	inode->i_version++;
>> -	spin_unlock(&inode->i_lock);
>> +	atomic64_t *ivp = (atomic64_t *)&inode->i_version;
>> +
>> +	atomic64_inc(ivp);
>>  	return true;
>>  }
>>  
>
> FWIW, I'm not sure this patch is strictly necessary as an interim step.
>
> Adding the i_lock to all of the places where we currently just do
> inode->i_version++ without properly auditing all of them gave me pause
> though.
>
> In any case, the last patch in the series cleans this nastiness up.

Yes, I thought "nastiness" too, and was happy to see it cleaned up.

I would have guessed that the purpose of the spinlock was to avoid the
risk for torn-reads/writes on 32bit platforms that cannot access a 64bit
value atomically.  In either case, using atomic64_t is the right thing
to do.

Thanks,
NeilBrown
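
The torn-read concern raised in the reply above can be illustrated with a small deterministic simulation. This is only a sketch under stated assumptions: it models a 32-bit reader that fetches the low half of a 64-bit value before a writer updates it and the high half afterwards. The function names are hypothetical, and the order in which real hardware or a compiler would fetch the two halves is an assumption, not something the thread specifies.

```c
#include <assert.h>
#include <stdint.h>

/*
 * Simulate a non-atomic 64-bit read split into two 32-bit loads.
 * 'before' is the value when the low half is loaded, 'after' is the
 * value when the high half is loaded (a writer ran in between).
 */
static inline uint64_t demo_torn_read(uint64_t before, uint64_t after)
{
	uint32_t lo = (uint32_t)before;		/* low half, old value */
	uint32_t hi = (uint32_t)(after >> 32);	/* high half, new value */

	return ((uint64_t)hi << 32) | lo;
}

/* With no intervening write, the two halves agree and nothing is torn. */
static inline void demo_torn_read_selfcheck(void)
{
	assert(demo_torn_read(5, 5) == 5);
}
```

For example, if the counter is incremented from 0xFFFFFFFF to 0x100000000 between the two loads, demo_torn_read(0xFFFFFFFFULL, 0x100000000ULL) yields 0x1FFFFFFFF, a value the counter never held. An atomic 64-bit access rules this out.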

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 5001e77342fd..c234fac4bb77 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2136,9 +2136,9 @@  inode_set_iversion_queried(struct inode *inode, const u64 new)
 static inline bool
 inode_maybe_inc_iversion(struct inode *inode, bool force)
 {
-	spin_lock(&inode->i_lock);
-	inode->i_version++;
-	spin_unlock(&inode->i_lock);
+	atomic64_t *ivp = (atomic64_t *)&inode->i_version;
+
+	atomic64_inc(ivp);
 	return true;
 }
