Message ID | 20220812123727.46397-1-lczerner@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v3,1/3] ext4: don't increase iversion counter for ea_inodes |

On Fri, Aug 12, 2022 at 02:37:25PM +0200, Lukas Czerner wrote:
> ea_inodes are using i_version for storing part of the reference count so
> we really need to leave it alone.
>
> The problem can be reproduced by xfstest ext4/026 when iversion is
> enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> inodes in ext4_mark_iloc_dirty().
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---

Reviewed-by: Christian Brauner (Microsoft) <brauner@kernel.org>

On Fri, 2022-08-12 at 14:37 +0200, Lukas Czerner wrote:
> ea_inodes are using i_version for storing part of the reference count so
> we really need to leave it alone.
>
> The problem can be reproduced by xfstest ext4/026 when iversion is
> enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> inodes in ext4_mark_iloc_dirty().
>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Reviewed-by: Jan Kara <jack@suse.cz>
> Reviewed-by: Jeff Layton <jlayton@kernel.org>
> ---
> v2, v3: no change
>
> fs/ext4/inode.c | 7 ++++++-
> 1 file changed, 6 insertions(+), 1 deletion(-)
>
> diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> index 601214453c3a..2a220be34caa 100644
> --- a/fs/ext4/inode.c
> +++ b/fs/ext4/inode.c
> @@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
>  	}
>  	ext4_fc_track_inode(handle, inode);
>
> -	if (IS_I_VERSION(inode))
> +	/*
> +	 * ea_inodes are using i_version for storing reference count, don't
> +	 * mess with it
> +	 */
> +	if (IS_I_VERSION(inode) &&
> +	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
>  		inode_inc_iversion(inode);
>
>  	/* the do_update_inode consumes one bh->b_count */

I've spent some time writing tests for the i_version counter (still
quite rough right now), and what I've found is that this particular
inode_inc_iversion results in the counter being bumped on _reads_ as
well as writes, due to the atime changing. This call to
inode_inc_iversion seems to make no sense, as we aren't bumping the
mtime here.

I'm still working on and testing this, but I think we'll probably just
want to remove this inode_inc_iversion entirely, and leave the i_version
bumping for normal files to happen when the timestamps are updated. So
far, my testing seems to indicate that that does the right thing.

Hopefully I'll have some testcases + patches for this next week sometime.

Cheers,
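The difference between bumping i_version whenever the inode is dirtied and bumping it only alongside mtime/ctime updates can be shown with a toy userspace model. This is a sketch under stated assumptions, not ext4 code: `struct sim_file`, `struct sim_ts_change`, `bump_on_mark_dirty()` and `bump_on_timestamp_update()` are hypothetical names, every timestamp change is assumed to dirty the inode, and a read is assumed to change only atime.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of which timestamps an operation changed. */
struct sim_ts_change {
	bool atime; /* e.g. a read that updates atime */
	bool mtime; /* data modification */
	bool ctime; /* inode change (data or metadata) */
};

/* Hypothetical file with a change counter standing in for i_version. */
struct sim_file {
	uint64_t version;
};

/*
 * Models an unconditional bump in the mark-dirty path: any timestamp
 * change dirties the inode, so the counter moves even for an atime-only
 * update caused by a read.
 */
static void bump_on_mark_dirty(struct sim_file *f, struct sim_ts_change c)
{
	assert(f != NULL);
	if (c.atime || c.mtime || c.ctime)
		f->version++;
}

/*
 * Models the proposal in the thread: bump only when mtime or ctime is
 * updated, so atime-only updates leave the counter alone.
 */
static void bump_on_timestamp_update(struct sim_file *f, struct sim_ts_change c)
{
	assert(f != NULL);
	if (c.mtime || c.ctime)
		f->version++;
}
```

Feeding an atime-only change to both functions moves the counter in the first model but not in the second; a write that touches mtime and ctime advances both.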

On Fri 12-08-22 14:42:36, Jeff Layton wrote:
> On Fri, 2022-08-12 at 14:37 +0200, Lukas Czerner wrote:
> > ea_inodes are using i_version for storing part of the reference count so
> > we really need to leave it alone.
> >
> > The problem can be reproduced by xfstest ext4/026 when iversion is
> > enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> > inodes in ext4_mark_iloc_dirty().
> >
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Reviewed-by: Jan Kara <jack@suse.cz>
> > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > ---
> > v2, v3: no change
> >
> > fs/ext4/inode.c | 7 ++++++-
> > 1 file changed, 6 insertions(+), 1 deletion(-)
> >
> > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > index 601214453c3a..2a220be34caa 100644
> > --- a/fs/ext4/inode.c
> > +++ b/fs/ext4/inode.c
> > @@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> >  	}
> >  	ext4_fc_track_inode(handle, inode);
> >
> > -	if (IS_I_VERSION(inode))
> > +	/*
> > +	 * ea_inodes are using i_version for storing reference count, don't
> > +	 * mess with it
> > +	 */
> > +	if (IS_I_VERSION(inode) &&
> > +	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> >  		inode_inc_iversion(inode);
> >
> >  	/* the do_update_inode consumes one bh->b_count */
>
> I've spent some time writing tests for the i_version counter (still
> quite rough right now), and what I've found is that this particular
> inode_inc_iversion results in the counter being bumped on _reads_ as
> well as writes, due to the atime changing. This call to
> inode_inc_iversion seems to make no sense, as we aren't bumping the
> mtime here.
>
> I'm still working on and testing this, but I think we'll probably just
> want to remove this inode_inc_iversion entirely, and leave the i_version
> bumping for normal files to happen when the timestamps are updated. So
> far, my testing seems to indicate that that does the right thing.

I agree that inode_inc_iversion() may be overly aggressive here but where
else does iversion get updated for things like inode owner update or
permission changes?

Honza

On Tue, 2022-08-16 at 13:52 +0200, Jan Kara wrote:
> On Fri 12-08-22 14:42:36, Jeff Layton wrote:
> > On Fri, 2022-08-12 at 14:37 +0200, Lukas Czerner wrote:
> > > ea_inodes are using i_version for storing part of the reference count so
> > > we really need to leave it alone.
> > >
> > > The problem can be reproduced by xfstest ext4/026 when iversion is
> > > enabled. Fix it by not calling inode_inc_iversion() for EXT4_EA_INODE_FL
> > > inodes in ext4_mark_iloc_dirty().
> > >
> > > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > > Reviewed-by: Jan Kara <jack@suse.cz>
> > > Reviewed-by: Jeff Layton <jlayton@kernel.org>
> > > ---
> > > v2, v3: no change
> > >
> > > fs/ext4/inode.c | 7 ++++++-
> > > 1 file changed, 6 insertions(+), 1 deletion(-)
> > >
> > > diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
> > > index 601214453c3a..2a220be34caa 100644
> > > --- a/fs/ext4/inode.c
> > > +++ b/fs/ext4/inode.c
> > > @@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
> > >  	}
> > >  	ext4_fc_track_inode(handle, inode);
> > >
> > > -	if (IS_I_VERSION(inode))
> > > +	/*
> > > +	 * ea_inodes are using i_version for storing reference count, don't
> > > +	 * mess with it
> > > +	 */
> > > +	if (IS_I_VERSION(inode) &&
> > > +	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
> > >  		inode_inc_iversion(inode);
> > >
> > >  	/* the do_update_inode consumes one bh->b_count */
> >
> > I've spent some time writing tests for the i_version counter (still
> > quite rough right now), and what I've found is that this particular
> > inode_inc_iversion results in the counter being bumped on _reads_ as
> > well as writes, due to the atime changing. This call to
> > inode_inc_iversion seems to make no sense, as we aren't bumping the
> > mtime here.
> >
> > I'm still working on and testing this, but I think we'll probably just
> > want to remove this inode_inc_iversion entirely, and leave the i_version
> > bumping for normal files to happen when the timestamps are updated. So
> > far, my testing seems to indicate that that does the right thing.
>
> I agree that inode_inc_iversion() may be overly aggressive here but where
> else does iversion get updated for things like inode owner update or
> permission changes?
>
> Honza

If we remove it here, then both the setattr and setxattr codepaths will
need to explicitly bump the iversion counter. Note that we update the
ctime in those paths too, so that gives us a guidepost as to when we
should update i_version.

xfs will need similar changes, but btrfs turns out to already do the
right thing.

I'm planning to post my latest patches in just a bit.
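Using ctime as the guidepost could look roughly like this in a simplified model: every path that updates ctime also bumps the change counter, so owner and permission changes are still reflected even without a bump in the mark-dirty path. This is a hedged sketch: `struct meta_inode`, `meta_setattr()`, `meta_setxattr()` and `touch_ctime_and_version()` are hypothetical and do not correspond to ext4's actual setattr or xattr code.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical, simplified inode; not ext4's in-memory inode. */
struct meta_inode {
	uint32_t uid;
	uint32_t mode;
	int64_t  ctime_sec;
	uint64_t version; /* stand-in for i_version */
};

/* Single helper so ctime and the change counter always move together. */
static void touch_ctime_and_version(struct meta_inode *ino, int64_t now)
{
	ino->ctime_sec = now;
	ino->version++;
}

/* setattr-like path: owner and permission changes. */
static void meta_setattr(struct meta_inode *ino, uint32_t uid, uint32_t mode,
			 int64_t now)
{
	assert(ino != NULL);
	ino->uid = uid;
	ino->mode = mode;
	touch_ctime_and_version(ino, now);
}

/* setxattr-like path; the xattr name and value are left out of the model. */
static void meta_setxattr(struct meta_inode *ino, int64_t now)
{
	assert(ino != NULL);
	touch_ctime_and_version(ino, now);
}
```

Routing both paths through one helper is a design choice of the sketch: it makes it hard to update ctime without also bumping the counter.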

diff --git a/fs/ext4/inode.c b/fs/ext4/inode.c
index 601214453c3a..2a220be34caa 100644
--- a/fs/ext4/inode.c
+++ b/fs/ext4/inode.c
@@ -5731,7 +5731,12 @@ int ext4_mark_iloc_dirty(handle_t *handle,
 	}
 	ext4_fc_track_inode(handle, inode);
 
-	if (IS_I_VERSION(inode))
+	/*
+	 * ea_inodes are using i_version for storing reference count, don't
+	 * mess with it
+	 */
+	if (IS_I_VERSION(inode) &&
+	    !(EXT4_I(inode)->i_flags & EXT4_EA_INODE_FL))
 		inode_inc_iversion(inode);
 
 	/* the do_update_inode consumes one bh->b_count */
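Why the unconditional bump is harmful for EA inodes can be sketched in userspace C. This is a hypothetical model, not ext4 code: `struct sim_inode`, `sim_get_ref()`, `sim_set_ref()` and `sim_mark_dirty()` are invented for illustration, and `SIM_EA_INODE_FL` stands in for EXT4_EA_INODE_FL. Splitting a 64-bit reference count into a high half stored elsewhere and a low half stored in i_version is an assumption chosen to match the statement that i_version holds *part* of the count. The increment is a plain `++`, not the real inode_inc_iversion() logic.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical flag standing in for EXT4_EA_INODE_FL. */
#define SIM_EA_INODE_FL 0x1u

/* Hypothetical, simplified inode; not the kernel's struct inode. */
struct sim_inode {
	uint32_t flags;
	bool     iversion_enabled; /* stand-in for IS_I_VERSION(inode) */
	uint32_t ref_hi;           /* assumed: high half of an EA inode's refcount */
	uint32_t version;          /* stand-in for i_version; low half of the refcount */
};

/* Reassemble the 64-bit reference count from its two halves. */
static uint64_t sim_get_ref(const struct sim_inode *ino)
{
	return ((uint64_t)ino->ref_hi << 32) | ino->version;
}

/* Split a 64-bit reference count across ref_hi and version. */
static void sim_set_ref(struct sim_inode *ino, uint64_t ref)
{
	ino->ref_hi = (uint32_t)(ref >> 32);
	ino->version = (uint32_t)ref;
}

/*
 * Models the i_version check in ext4_mark_iloc_dirty().
 * skip_ea_inodes == false: pre-patch behaviour (bump every inode).
 * skip_ea_inodes == true:  patched behaviour (leave EA inodes alone).
 */
static void sim_mark_dirty(struct sim_inode *ino, bool skip_ea_inodes)
{
	bool is_ea;

	assert(ino != NULL);
	is_ea = (ino->flags & SIM_EA_INODE_FL) != 0;

	if (ino->iversion_enabled && !(skip_ea_inodes && is_ea))
		ino->version++; /* plain increment, not the real inode_inc_iversion() */
}
```

With the pre-patch check, marking an EA inode with a reference count of 3 dirty leaves the count reading back as 4; with the patched check it stays at 3, while a regular inode's counter still advances.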