[v3,19/19] fs: handle inode->i_version more efficiently

On Wed, 2017-12-20 at 17:41 +0100, Jan Kara wrote:
> On Wed 20-12-17 09:03:06, Jeff Layton wrote:
> > On Tue, 2017-12-19 at 09:07 +1100, Dave Chinner wrote:
> > > On Mon, Dec 18, 2017 at 02:35:20PM -0500, Jeff Layton wrote:
> > > > [PATCH] SQUASH: add memory barriers around i_version accesses
> > > 
> > > Why explicit memory barriers rather than annotating the operations
> > > with the required semantics and getting the barriers the arch
> > > requires automatically?  I suspect this should be using
> > > atomic_read_acquire() and atomic_cmpxchg_release(), because AFAICT
> > > the atomic_cmpxchg needs to have release semantics to match the
> > > acquire semantics needed for the load of the current value.
> > > 
> > > From include/linux/atomics.h:
> > > 
> > >  * For compound atomics performing both a load and a store, ACQUIRE
> > >  * semantics apply only to the load and RELEASE semantics only to the
> > >  * store portion of the operation. Note that a failed cmpxchg_acquire
> > >  * does -not- imply any memory ordering constraints.
> > > 
> > > Memory barriers hurt my brain. :/
> > > 
> > > At minimum, shouldn't the atomic op specific barriers be used rather
> > > than full memory barriers? i.e:
> > > 
> > 
> > They hurt my brain too. Yes, definitely atomic-specific barriers should
> > be used here instead, since this is an atomic64_t now.
> > 
> > After going over the docs again...my understanding has always been that
> > you primarily need memory barriers to order accesses to different areas
> > of memory.
> 
> That is correct.
> 
> > As Jan and I were discussing in the other thread, i_version is not
> > synchronized with anything else. In this code, we're only dealing with a
> > single 64-bit word. I don't think there are any races there wrt the API
> > itself.
> 
> There are not but it is like saying that lock implementation is correct
> because the lock state does not get corrupted ;). Who cares about protected
> data...
> 
> > The "legacy" inode_inc_iversion() however _does_ have implied memory
> > barriers from the i_lock. There could be some subtle de-facto ordering
> > there, so I think we probably do want some barriers in here if only to
> > preserve that. It's not likely to cost much, and may save us tracking
> > down some fiddly bugs.
> > 
> > What about this patch? Note that I've only added barriers to
> > inode_maybe_inc_iversion. I don't see that we need it for the other
> > functions, but please do tell me if I'm wrong there:
> > 
> > --------------------8<---------------------------
> > 
> > [PATCH] SQUASH: add memory barriers around i_version accesses
> > 
> > Signed-off-by: Jeff Layton <jlayton@redhat.com>
> > ---
> >  include/linux/iversion.h | 53 +++++++++++++++++++++++++++++-------------------
> >  1 file changed, 32 insertions(+), 21 deletions(-)
> > 
> > diff --git a/include/linux/iversion.h b/include/linux/iversion.h
> > index a9fbf99709df..02187a3bec3b 100644
> > --- a/include/linux/iversion.h
> > +++ b/include/linux/iversion.h
> > @@ -89,6 +89,23 @@ inode_set_iversion_raw(struct inode *inode, const u64 val)
> >  	atomic64_set(&inode->i_version, val);
> >  }
> >  
> > +/**
> > + * inode_peek_iversion_raw - grab a "raw" iversion value
> > + * @inode: inode from which i_version should be read
> > + *
> > + * Grab a "raw" inode->i_version value and return it. The i_version is not
> > + * flagged or converted in any way. This is mostly used to access a self-managed
> > + * i_version.
> > + *
> > + * With those filesystems, we want to treat the i_version as an entirely
> > + * opaque value.
> > + */
> > +static inline u64
> > +inode_peek_iversion_raw(const struct inode *inode)
> > +{
> > +	return atomic64_read(&inode->i_version);
> > +}
> > +
> >  /**
> >   * inode_set_iversion - set i_version to a particular value
> >   * @inode: inode to set
> > @@ -152,7 +169,16 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
> >  {
> >  	u64 cur, old, new;
> >  
> > -	cur = (u64)atomic64_read(&inode->i_version);
> > +	/*
> > +	 * The i_version field is not strictly ordered with any other inode
> > +	 * information, but the legacy inode_inc_iversion code used a spinlock
> > +	 * to serialize increments.
> > +	 *
> > +	 * This code adds full memory barriers to ensure that any de-facto
> > +	 * ordering with other info is preserved.
> > +	 */
> > +	smp_mb__before_atomic();
> 
> This should be just smp_mb(). __before_atomic() pairs with atomic
> operations like atomic_inc(). atomic_read() is completely unordered
> operation (happens to be plain memory read on x86) and so __before_atomic()
> is not enough.
> 
> > +	cur = inode_peek_iversion_raw(inode);
> >  	for (;;) {
> >  		/* If flag is clear then we needn't do anything */
> >  		if (!force && !(cur & I_VERSION_QUERIED))
> > @@ -162,8 +188,10 @@ inode_maybe_inc_iversion(struct inode *inode, bool force)
> >  		new = (cur & ~I_VERSION_QUERIED) + I_VERSION_INCREMENT;
> >  
> >  		old = atomic64_cmpxchg(&inode->i_version, cur, new);
> > -		if (likely(old == cur))
> > +		if (likely(old == cur)) {
> > +			smp_mb__after_atomic();
> 
> I don't think you need this. Cmpxchg is guaranteed to be full memory
> barrier - from Documentation/atomic_t.txt:
>   - RMW operations that have a return value are fully ordered;
> 
> >  			break;
> > +		}
> >  		cur = old;
> >  	}
> >  	return true;
> 
> ...
> 
> > @@ -248,7 +259,7 @@ inode_query_iversion(struct inode *inode)
> >  {
> >  	u64 cur, old, new;
> >  
> > -	cur = atomic64_read(&inode->i_version);
> > +	cur = inode_peek_iversion_raw(inode);
> >  	for (;;) {
> >  		/* If flag is already set, then no need to swap */
> >  		if (cur & I_VERSION_QUERIED)
> 
> And here I'd expect smp_mb() after inode_peek_iversion_raw() (actually be
> needed only if you are not going to do cmpxchg as that implies barrier as
> well). "Safe" use of i_version would be:
> 
> Update:
> 
> modify inode
> inode_maybe_inc_iversion(inode)
> 
> Read:
> 
> my_version = inode_query_iversion(inode)
> get inode data
> 
> And you need to make sure 'get inode data' does not get speculatively
> evaluated before you actually sample i_version so that you are guaranteed
> that if data changes, you will observe larger i_version in the future.
> 
> Also please add a comment smp_mb() in inode_maybe_inc_iversion() like:
> 
> /* This barrier pairs with the barrier in inode_query_iversion() */
> 
> and a similar comment to inode_query_iversion(). Because memory barriers
> make sense only in pairs (see SMP BARRIER PAIRING in
> Documentation/memory-barriers.txt).
> 

Got it, I think. How about this (sorry for the unrelated deltas here):

[PATCH] SQUASH: add memory barriers around i_version accesses

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/iversion.h | 60 +++++++++++++++++++++++++++++++-----------------
 1 file changed, 39 insertions(+), 21 deletions(-)

[v3,19/19] fs: handle inode->i_version more efficiently

Commit Message

Comments

Patch