[01/19] fs: new API for handling inode->i_version

Message ID 20171213142017.23653-2-jlayton@kernel.org (mailing list archive)
State New, archived

Commit Message

Jeff Layton Dec. 13, 2017, 2:19 p.m. UTC
From: Jeff Layton <jlayton@redhat.com>

Add a documentation blob that explains what the i_version field is, how
it is expected to work, and how it is currently implemented by various
filesystems.

We already have inode_inc_iversion. Add several other functions for
manipulating and accessing the i_version counter. For now, the
implementation is trivial and basically works the way that all of the
open-coded i_version accesses work today.

Future patches will convert existing users of i_version to use the new
API, and then convert the backend implementation to do things more
efficiently.

Signed-off-by: Jeff Layton <jlayton@redhat.com>
---
 include/linux/fs.h | 200 ++++++++++++++++++++++++++++++++++++++++++++++++++---
 1 file changed, 192 insertions(+), 8 deletions(-)

Comments

NeilBrown Dec. 13, 2017, 10:04 p.m. UTC | #1
On Wed, Dec 13 2017, Jeff Layton wrote:

> +/*
> + * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> + * appear different to observers if there was a change to the inode's data or
> + * metadata since it was last queried.
> + *
> + * It should be considered an opaque value by observers. If it remains the same
> + * since it was last checked, then nothing has changed in the inode. If it's
> + * different then something has changed. Observers cannot infer anything about
> + * the nature or magnitude of the changes from the value, only that the inode
> + * has changed in some fashion.

I agree that it "should be" considered opaque, but I have a suspicion
that NFSv4 doesn't consider it opaque.
There is something about write delegations and the server performing a
GETATTR callback to the delegated client so that it can answer GETATTR
from other clients without recalling the delegation.

Specifically section "10.4.3 Handling of CB_GETATTR" of RFC5661 contains
the text:

   o  The client will create a value greater than c that will be used
      for communicating that modified data is held at the client.  Let
      this value be represented by d.

"c" here is a 'change' attribute.

Then:

   While the change attribute is opaque to the client in the sense that
   it has no idea what units of time, if any, the server is counting
   change with, it is not opaque in that the client has to treat it as
   an unsigned integer, and the server has to be able to see the results
   of the client's changes to that integer.  Therefore, the server MUST
   encode the change attribute in network order when sending it to the
   client.  The client MUST decode it from network order to its native
   order when receiving it, and the client MUST encode it in network
   order when sending it to the server.  For this reason, change is
   defined as an unsigned integer rather than an opaque array of bytes.

This all suggests that nfsd needs to be certain that "incrementing" the
change id will produce a new changeid, which has not been used before,
and also suggests that nfsd needs to be able to control the changeid
stored after writes that result from a delegation being returned.

I'd just like to say that this is one of the most annoying dumb features
of NFSv4, because it is trivial to fix and I suggested a fix before
NFSv4.0 was finalized.  Grumble.

Otherwise the patch set looks good.  I haven't gone over the code
closely, but the approach is spot-on.

NeilBrown
Jeff Layton Dec. 14, 2017, 12:27 a.m. UTC | #2
On Thu, 2017-12-14 at 09:04 +1100, NeilBrown wrote:
> On Wed, Dec 13 2017, Jeff Layton wrote:
> 
> > +/*
> > + * The change attribute (i_version) is mandated by NFSv4 and is mostly for
> > + * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
> > + * appear different to observers if there was a change to the inode's data or
> > + * metadata since it was last queried.
> > + *
> > + * It should be considered an opaque value by observers. If it remains the same
> > + * since it was last checked, then nothing has changed in the inode. If it's
> > + * different then something has changed. Observers cannot infer anything about
> > + * the nature or magnitude of the changes from the value, only that the inode
> > + * has changed in some fashion.
> 
> I agree that it "should be" considered opaque, but I have a suspicion
> that NFSv4 doesn't consider it opaque.
> There is something about write delegations and the server performing a
> GETATTR callback to the delegated client so that it can answer GETATTR
> from other clients without recalling the delegation.
> 
> Specifically section "10.4.3 Handling of CB_GETATTR" of RFC5661 contains
> the text:
> 
>    o  The client will create a value greater than c that will be used
>       for communicating that modified data is held at the client.  Let
>       this value be represented by d.
> 
> "c" here is a 'change' attribute.
> 
> Then:
> 
>    While the change attribute is opaque to the client in the sense that
>    it has no idea what units of time, if any, the server is counting
>    change with, it is not opaque in that the client has to treat it as
>    an unsigned integer, and the server has to be able to see the results
>    of the client's changes to that integer.  Therefore, the server MUST
>    encode the change attribute in network order when sending it to the
>    client.  The client MUST decode it from network order to its native
>    order when receiving it, and the client MUST encode it in network
>    order when sending it to the server.  For this reason, change is
>    defined as an unsigned integer rather than an opaque array of bytes.
> 
> This all suggests that nfsd needs to be certain that "incrementing" the
> change id will produce a new changeid, which has not been used before,
> and also suggests that nfsd needs to be able to control the changeid
> stored after writes that result from a delegation being returned.
> 
> I'd just like to say that this is one of the most annoying dumb features
> of NFSv4, because it is trivial to fix and I suggested a fix before
> NFSv4.0 was finalized.  Grumble.
> 
> Otherwise the patch set looks good.  I haven't gone over the code
> closely, but the approach is spot-on.

I don't think we have to do that. There are really only two states with
a client holding a write delegation, as far as the server is concerned.
Either:

a) the client has done no writes to the file, in which case it'll return
the same i_version that the server has when issued a CB_GETATTR

...or...

b) it has written to the file while holding the delegation, in which
case it'll return a different change attribute in its CB_GETATTR reply

The simplest thing for the server to do is to just increment the change
attribute _once_ when it gets back a CB_GETATTR with a different change
attr than it has.

That's sufficient to tell another client issuing a GETATTR that the
file has changed without needing to recall the delegation.

Prior to the delegation being returned, the client will send at least
one WRITE RPC, and that's enough to ensure that the next stat will
see the thing increase.
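
The two cases above can be modelled in a few lines. This is a user-space sketch only, not nfsd code: `struct deleg_file`, its fields, and `getattr_change()` are hypothetical names, and the `bumped` flag is an assumption about how a server might remember that it has already reported a change for this delegation.

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Hypothetical per-file state kept while a write delegation is
 * outstanding. Not taken from nfsd.
 */
struct deleg_file {
	uint64_t change;	/* change attribute the server reports */
	bool bumped;		/* already bumped once for this delegation? */
};

/*
 * Pick the change attribute to return to a third-party GETATTR, given
 * the value the delegation holder sent back in its CB_GETATTR reply.
 */
uint64_t getattr_change(struct deleg_file *f, uint64_t cb_change)
{
	/* Case (a): the holder reports the value we have, so no writes. */
	if (cb_change == f->change)
		return f->change;

	/* Case (b): the holder has modified data; bump exactly once. */
	if (!f->bumped) {
		f->change++;
		f->bumped = true;
	}
	return f->change;
}
```

In this model the server only tests the holder's value for equality, so it does not depend on how far the client moved the value.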
NeilBrown Dec. 16, 2017, 4:17 a.m. UTC | #3
On Wed, Dec 13 2017, Jeff Layton wrote:

> I don't think we have to do that. There are really only two states with
> a client holding a write delegation, as far as the server is concerned.
> Either:
>
> a) the client has done no writes to the file, in which case it'll return
> the same i_version that the server has when issued a CB_GETATTR
>
> ...or...
>
> b) it has written to the file while holding the delegation, in which
> case it'll return a different change attribute in its CB_GETATTR reply
>
> The simplest thing for the server to do is to just increment the change
> attribute _once_ when it gets back a CB_GETATTR with a different change
> attr than it has.
>
> That's sufficient to tell another client issuing a GETATTR that the
> file has changed without needing to recall the delegation.
>
> Prior to the delegation being returned, the client will send at least
> one WRITE RPC, and that's enough to ensure that the next stat will
> see the thing increase.

"increment" and "increase" are not words that mean anything for an
"opaque value".
NFSd is, presumably, an "observer" of i_version (as it isn't the
filesystem that controls it), so your text says it must treat i_version as
opaque.  That means it cannot detect an "increase" (only a change), and
it certainly cannot "increment" the value.

I think you need to allow observers to treat i_version as a 64 bit number
which will monotonically increase.  Any change to the file will result
in an increment of at least '1'.

Thanks,
NeilBrown
Jeff Layton Dec. 17, 2017, 1:01 p.m. UTC | #4
On Sat, 2017-12-16 at 15:17 +1100, NeilBrown wrote:
> "increment" and "increase" are not words that mean anything for an
> "opaque value".
> NFSd is, presumably, an "observer" of i_version (as it isn't the
> filesystem that controls it), so your text says it must treat i_version as
> opaque.  That means it cannot detect an "increase" (only a change), and
> it certainly cannot "increment" the value.
> 
> I think you need to allow observers to treat i_version as a 64 bit number
> which will monotonically increase.  Any change to the file will result
> in an increment of at least '1'.

Here, I was mostly speaking about NFS in general. I think the above
method is the cheapest/best way to ensure that you don't end up with
reused change attributes, within the confines of the protocol.

With this implementation, it's probably safe enough to make a guarantee
that the value will increase wrt a previously sampled value if there was
a change. I'll have to think about how best to document that.

Thanks,
Jeff Layton Dec. 18, 2017, 2:03 p.m. UTC | #5
On Sat, 2017-12-16 at 15:17 +1100, NeilBrown wrote:
> "increment" and "increase" are not words that mean anything for an
> "opaque value".
> NFSd is, presumably, an "observer" of i_version (as it isn't the
> filesystem that controls it), so your text says it must treat i_version as
> opaque.  That means it cannot detect an "increase" (only a change), and
> it certainly cannot "increment" the value.
> 
> I think you need to allow observers to treat i_version as a 64 bit number
> which will monotonically increase.  Any change to the file will result
> in an increment of at least '1'.
> 

One thing here...

I'm currently doing this:

static inline s64
inode_cmp_iversion(const struct inode *inode, const u64 old)
{
        return (s64)inode_peek_iversion(inode) - (s64)old;
}

But I don't think that'll handle wraparound correctly if we want to 
allow people to determine whether it's older or newer. I'll probably
change this to shift the old value left by one bit, and mask off the low
bit of the current inode->i_version.

That'll always give you a difference of 2 or more if they're different,
but it should return the correct sign, which is really all we care about
anyway.

Granted, we're unlikely to wrap around with a 64 bit value, but it's
hard to know for sure what values might be stored on disk on existing
filesystems.
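
A minimal user-space sketch of the shifted comparison described above, for illustration only. It assumes the raw value keeps a flag in its low bit and the counter in the remaining bits, which is not how the field is laid out in this patch. `IVERSION_FLAG_BIT` and `cmp_iversion_sketch()` are hypothetical names. The subtraction is done on unsigned values so that it wraps modulo 2^64 rather than overflowing a signed type.

```c
#include <stdint.h>

/* Assumption: low bit of the raw value is a flag, the rest a counter. */
#define IVERSION_FLAG_BIT 1ULL

/*
 * raw_cur is the raw field, flag bit included. old is a previously
 * sampled counter value with the flag already stripped and shifted out.
 *
 * Returns 0 if unchanged, a positive value if raw_cur is newer and a
 * negative value if it is older. Only the sign is meaningful.
 */
int64_t cmp_iversion_sketch(uint64_t raw_cur, uint64_t old)
{
	uint64_t cur = raw_cur & ~IVERSION_FLAG_BIT;	/* mask off low bit */
	uint64_t prev = old << 1;			/* realign old value */

	/*
	 * Unsigned subtraction wraps, so a counter that has rolled over
	 * still compares as newer. Converting an out-of-range result to
	 * int64_t is implementation-defined in ISO C; the usual
	 * two's-complement compilers simply reinterpret the bits.
	 */
	return (int64_t)(cur - prev);
}
```

With this layout any two distinct counter values differ by at least 2 after realignment, and a counter that wrapped from its 63-bit maximum to zero still yields a positive result.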

Patch

diff --git a/include/linux/fs.h b/include/linux/fs.h
index 511fbaabf624..5001e77342fd 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2036,19 +2036,203 @@  static inline void inode_dec_link_count(struct inode *inode)
 	mark_inode_dirty(inode);
 }
 
+/*
+ * The change attribute (i_version) is mandated by NFSv4 and is mostly for
+ * knfsd, but is also used for other purposes (e.g. IMA). The i_version must
+ * appear different to observers if there was a change to the inode's data or
+ * metadata since it was last queried.
+ *
+ * It should be considered an opaque value by observers. If it remains the same
+ * since it was last checked, then nothing has changed in the inode. If it's
+ * different then something has changed. Observers cannot infer anything about
+ * the nature or magnitude of the changes from the value, only that the inode
+ * has changed in some fashion.
+ *
+ * Not all filesystems properly implement the i_version counter. Subsystems that
+ * want to use i_version field on an inode should first check whether the
+ * filesystem sets the SB_I_VERSION flag (usually via the IS_I_VERSION macro).
+ *
+ * Those that set SB_I_VERSION will automatically have their i_version counter
+ * incremented on writes to normal files. If the SB_I_VERSION is not set, then
+ * the VFS will not touch it on writes, and the filesystem can use it how it
+ * wishes. Note that the filesystem is always responsible for updating the
+ * i_version on namespace changes in directories (mkdir, rmdir, unlink, etc.).
+ * We consider these sorts of filesystems to have a kernel-managed i_version.
+ *
+ * Note that some filesystems (e.g. NFS and AFS) just use the field to store
+ * a server-provided value (for the most part). For that reason, those
+ * filesystems do not set SB_I_VERSION. These filesystems are considered to
+ * have a self-managed i_version.
+ */
+
+/**
+ * inode_set_iversion_raw - set i_version to the specified raw value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for use by
+ * filesystems that self-manage the i_version.
+ *
+ * For example, the NFS client stores its NFSv4 change attribute in this way,
+ * and the AFS client stores the data_version from the server here.
+ */
+static inline void
+inode_set_iversion_raw(struct inode *inode, const u64 new)
+{
+	inode->i_version = new;
+}
+
+/**
+ * inode_set_iversion - set i_version to a particular value
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * Set @inode's i_version field to @new. This function is for filesystems with
+ * a kernel-managed i_version.
+ *
+ * For now, this just does the same thing as the _raw variant.
+ */
+static inline void
+inode_set_iversion(struct inode *inode, const u64 new)
+{
+	inode_set_iversion_raw(inode, new);
+}
+
+/**
+ * inode_set_iversion_queried - set i_version to a particular value and set
+ *                              flag to indicate that it has been viewed
+ * @inode: inode to set
+ * @new: new i_version value to set
+ *
+ * When loading in an i_version value from a backing store, we typically don't
+ * know whether it was previously viewed before being stored or not. Thus, we
+ * must assume that it was, to ensure that any changes will result in the
+ * value changing.
+ *
+ * This function will set the inode's i_version, and possibly flag the value
+ * as if it has already been viewed at least once.
+ *
+ * For now, this just does what inode_set_iversion does.
+ */
+static inline void
+inode_set_iversion_queried(struct inode *inode, const u64 new)
+{
+	inode_set_iversion(inode, new);
+}
+
+/**
+ * inode_maybe_inc_iversion - increments i_version
+ * @inode: inode with the i_version that should be updated
+ * @force: increment the counter even if it's not necessary
+ *
+ * Every time the inode is modified, the i_version field must be seen to have
+ * changed by any observer.
+ *
+ * In this implementation, we always increment it after taking the i_lock to
+ * ensure that we don't race with other incrementors.
+ *
+ * Returns true if counter was bumped, and false if it wasn't.
+ */
+static inline bool
+inode_maybe_inc_iversion(struct inode *inode, bool force)
+{
+	spin_lock(&inode->i_lock);
+	inode->i_version++;
+	spin_unlock(&inode->i_lock);
+	return true;
+}
+
+/**
+ * inode_inc_iversion - forcibly increment i_version
+ * @inode: inode that needs to be updated
+ *
+ * Forcibly increment the i_version field. This always results in a change to
+ * the observable value.
+ */
+static inline void
+inode_inc_iversion(struct inode *inode)
+{
+	inode_maybe_inc_iversion(inode, true);
+}
+
 /**
- * inode_inc_iversion - increments i_version
- * @inode: inode that need to be updated
+ * inode_iversion_need_inc - is the i_version in need of being incremented?
+ * @inode: inode to check
  *
- * Every time the inode is modified, the i_version field will be incremented.
- * The filesystem has to be mounted with i_version flag
+ * Returns whether the inode->i_version counter needs incrementing on the next
+ * change.
+ *
+ * For now, we assume that it always does.
  */
+static inline bool
+inode_iversion_need_inc(struct inode *inode)
+{
+	return true;
+}
 
-static inline void inode_inc_iversion(struct inode *inode)
+/**
+ * inode_peek_iversion_raw - grab a "raw" iversion value
+ * @inode: inode from which i_version should be read
+ *
+ * Grab a "raw" inode->i_version value and return it. The i_version is not
+ * flagged or converted in any way. This is mostly used to access a self-managed
+ * i_version.
+ *
+ * With those filesystems, we want to treat the i_version as an entirely
+ * opaque value.
+ */
+static inline u64
+inode_peek_iversion_raw(const struct inode *inode)
+{
+	return inode->i_version;
+}
+
+/**
+ * inode_peek_iversion - read i_version without flagging it to be incremented
+ * @inode: inode from which i_version should be read
+ *
+ * Read the inode i_version counter for an inode without registering it as a
+ * query.
+ *
+ * This is typically used by local filesystems that need to store an i_version
+ * on disk. In that situation, it's not necessary to flag it as having been
+ * viewed, as the result won't be used to gauge changes from that point.
+ */
+static inline u64
+inode_peek_iversion(const struct inode *inode)
+{
+	return inode_peek_iversion_raw(inode);
+}
+
+/**
+ * inode_query_iversion - read i_version for later use
+ * @inode: inode from which i_version should be read
+ *
+ * Read the inode i_version counter. This should be used by callers that wish
+ * to store the returned i_version for later comparison. This will guarantee
+ * that a later query of the i_version will result in a different value if
+ * anything has changed.
+ *
+ * This implementation just does a peek.
+ */
+static inline u64
+inode_query_iversion(struct inode *inode)
+{
+	return inode_peek_iversion(inode);
+}
+
+/**
+ * inode_cmp_iversion - check whether the i_version counter has changed
+ * @inode: inode to check
+ * @old: old value to check against its i_version
+ *
+ * Compare an i_version counter with a previous one. Returns 0 if they are
+ * the same or non-zero if they are different.
+ */
+static inline s64
+inode_cmp_iversion(const struct inode *inode, const u64 old)
 {
-       spin_lock(&inode->i_lock);
-       inode->i_version++;
-       spin_unlock(&inode->i_lock);
+	return (s64)inode_peek_iversion(inode) - (s64)old;
 }
 
 enum file_time_flags {
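
For readers who want to see how an observer might use the query/compare pair, here is a hedged user-space sketch. `struct inode` below is a one-field stand-in for the kernel structure, the three `inode_*` helpers mimic the trivial implementations in this patch without the `i_lock` locking, and `struct observer` with its two functions is hypothetical. The comparison uses unsigned subtraction instead of the patch's two signed casts so that it is well-defined outside the kernel build.

```c
#include <stdint.h>

/* Stand-in for the kernel's struct inode; only the field used here. */
struct inode {
	uint64_t i_version;
};

/* Mimics the patch: for now a query is just a read. */
uint64_t inode_query_iversion(struct inode *inode)
{
	return inode->i_version;
}

/* The kernel version takes inode->i_lock around the increment. */
void inode_inc_iversion(struct inode *inode)
{
	inode->i_version++;
}

/* 0 if unchanged, non-zero otherwise. */
int64_t inode_cmp_iversion(const struct inode *inode, uint64_t old)
{
	return (int64_t)(inode->i_version - old);
}

/* Hypothetical observer that remembers the last value it sampled. */
struct observer {
	uint64_t seen;
};

void observer_sample(struct observer *o, struct inode *inode)
{
	o->seen = inode_query_iversion(inode);
}

/* Non-zero if the inode changed since the last observer_sample(). */
int observer_changed(const struct observer *o, const struct inode *inode)
{
	return inode_cmp_iversion(inode, o->seen) != 0;
}
```

The observer tests the result only against zero, which matches the documented contract that the value is opaque apart from equality.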