Message ID | 20220803105340.17377-2-lczerner@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Series | [v2,1/3] ext4: don't increase iversion counter for ea_inodes |
On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> diff --git a/include/linux/fs.h b/include/linux/fs.h
> index 9ad5e3520fae..2243797badf2 100644
> --- a/include/linux/fs.h
> +++ b/include/linux/fs.h
> @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
>   * The inode itself only has dirty timestamps, and the
>   * lazytime mount option is enabled. We keep track of this
>   * separately from I_DIRTY_SYNC in order to implement
>   * lazytime. This gets cleared if I_DIRTY_INODE
> - * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
> - * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> - * i_state, but not both. I_DIRTY_PAGES may still be set.
> + * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> + * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> + * in place.

I'm still having a hard time understanding the new semantics. The first
sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
itself only has dirty timestamps", right?

Also, have you checked all the places that I_DIRTY_TIME is used and verified
they do the right thing now? What about inode_is_dirtytime_only()?

Also what is the precise meaning of the flags argument to ->dirty_inode now?

	sb->s_op->dirty_inode(inode,
		flags & (I_DIRTY_INODE | I_DIRTY_TIME));

Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.

- Eric

On Fri, Aug 05, 2022 at 01:05:45AM -0700, Eric Biggers wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > diff --git a/include/linux/fs.h b/include/linux/fs.h
> > index 9ad5e3520fae..2243797badf2 100644
> > --- a/include/linux/fs.h
> > +++ b/include/linux/fs.h
> > @@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
> >   * The inode itself only has dirty timestamps, and the
> >   * lazytime mount option is enabled. We keep track of this
> >   * separately from I_DIRTY_SYNC in order to implement
> >   * lazytime. This gets cleared if I_DIRTY_INODE
> > - * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
> > - * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
> > - * i_state, but not both. I_DIRTY_PAGES may still be set.
> > + * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
> > + * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
> > + * in place.
>
> I'm still having a hard time understanding the new semantics. The first
> sentence above needs to be updated since I_DIRTY_TIME no longer means "the inode
> itself only has dirty timestamps", right?

The problem is that it was always assumed that I_DIRTY_INODE supersedes
I_DIRTY_TIME and so it would get cleared in __mark_inode_dirty() when we
have I_DIRTY_INODE. That's true, we call sb->s_op->dirty_inode(), the
time update gets pushed into the on-disk inode structure, I_DIRTY_TIME is
cleared and it will get queued for writeback. Any subsequent dirtying
with I_DIRTY_TIME gets ignored simply because I_DIRTY_INODE is already
set in i_state. But in ext4 this time update will never get pushed into
the on-disk inode and there is no I_DIRTY_TIME, so once the writeback is
done we've lost all those I_DIRTY_TIME updates in between even if there
was a sync.

Now, we still clear I_DIRTY_TIME when we get I_DIRTY_INODE, but any
subsequent I_DIRTY_TIME-only updates won't be ignored and we set it into
i_state. After the writeback is done it'll be moved to the b_dirty_time
list.

So I am not sure how you would like it to be re-worded, simply removing
the 'only' would be ok?

>
> Also, have you checked all the places that I_DIRTY_TIME is used and verified
> they do the right thing now? What about inode_is_dirtytime_only()?

Yes, that's fine, despite the slightly misleading name ;)

>
> Also what is the precise meaning of the flags argument to ->dirty_inode now?
>
> 	sb->s_op->dirty_inode(inode,
> 		flags & (I_DIRTY_INODE | I_DIRTY_TIME));
>
> Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.

Don't know. It already doesn't mention I_DIRTY_SYNC that can be there as
well. Additionally it can have I_DIRTY_TIME to inform the fs we have a
dirty timestamp as well (in case of lazytime).

Perhaps we can add:

If the inode has dirty timestamp and lazytime is enabled I_DIRTY_TIME
will be set in the flags.

-Lukas

>
> - Eric
>

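The flag handling discussed in this reply can be modelled in userspace. The sketch below is only an illustration and not kernel code: `struct model_inode`, `model_mark_dirty()` and the numeric flag values are assumptions made for the example, and locking, tracing and the writeback lists are left out. It returns the flags that would be handed to ->dirty_inode() so the "piggyback" case is visible.

```c
#include <assert.h>
#include <stddef.h>

/* Illustrative values only; the real definitions are in include/linux/fs.h. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_TIME     (1u << 11)
#define I_DIRTY_INODE    (I_DIRTY_SYNC | I_DIRTY_DATASYNC)

/* Hypothetical stand-in for struct inode: only the state word is modelled. */
struct model_inode {
	unsigned int i_state;
};

/*
 * Model of the proposed __mark_inode_dirty() flag handling.
 * Returns the flags that would be passed to ->dirty_inode(),
 * or 0 if the callback would not be invoked at all.
 */
static unsigned int model_mark_dirty(struct model_inode *inode,
				     unsigned int flags)
{
	unsigned int passed = 0;

	assert(inode != NULL);

	if (flags & I_DIRTY_INODE) {
		/* A pending timestamp update piggybacks on this dirtying. */
		if (inode->i_state & I_DIRTY_TIME) {
			inode->i_state &= ~I_DIRTY_TIME;
			flags |= I_DIRTY_TIME;
		}
		passed = flags & (I_DIRTY_INODE | I_DIRTY_TIME);

		/* I_DIRTY_INODE supersedes I_DIRTY_TIME for this call. */
		flags &= ~I_DIRTY_TIME;
	}

	/* With the patch, I_DIRTY_TIME is recorded even if I_DIRTY_INODE is set. */
	inode->i_state |= flags;
	return passed;
}
```

Under this model, dirtying with I_DIRTY_TIME, then I_DIRTY_SYNC, then I_DIRTY_TIME again leaves both I_DIRTY_SYNC and I_DIRTY_TIME in i_state, which is the combination the old comment in fs.h ruled out.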
On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> Currently the I_DIRTY_TIME will never get set if the inode already has
> I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's
> true, however ext4 will only update the on-disk inode in
> ->dirty_inode(), not on actual writeback. As a result if the inode
> already has I_DIRTY_INODE state by the time we get to
> __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> into on-disk inode and will not get updated until the next I_DIRTY_INODE
> update, which might never come if we crash or get a power failure.
>
> The problem can be reproduced on ext4 by running xfstest generic/622
> with -o iversion mount option.
>
> Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> I_DIRTY_INODE. Also make sure that the case is properly handled in
> writeback_single_inode() as well. Additionally changes in
> xfs_fs_dirty_inode() were made to accommodate for I_DIRTY_TIME in flag.
>
> Thanks Jan Kara for suggestions on how to make this work properly.
>
> Cc: Dave Chinner <david@fromorbit.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> Suggested-by: Jan Kara <jack@suse.cz>
> ---
> v2: Reworked according to suggestions from Jan

....

> diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> index aa977c7ea370..cff05a4771b5 100644
> --- a/fs/xfs/xfs_super.c
> +++ b/fs/xfs/xfs_super.c
> @@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
>
>  	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
>  		return;
> -	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> +	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
> +	    !((inode->i_state | flag) & I_DIRTY_TIME))
>  		return;

My eyes, they bleed. The dirty time code was already a horrid
abomination, and this makes it worse.

From looking at the code, I cannot work out what the new semantics
for I_DIRTY_TIME and I_DIRTY_SYNC are supposed to be, nor can I work
out what the condition in this new code is supposed to be doing. I
*can't verify it is correct* by reading the code.

Can you please add a comment here explaining the conditions where we
don't have to log a new timestamp update?

Also, if "flag" now contains multiple flags, can you rename it
"flags"?

Cheers,

Dave.

On Mon, Aug 08, 2022 at 09:08:10AM +1000, Dave Chinner wrote:
> On Wed, Aug 03, 2022 at 12:53:39PM +0200, Lukas Czerner wrote:
> > Currently the I_DIRTY_TIME will never get set if the inode already has
> > I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's
> > true, however ext4 will only update the on-disk inode in
> > ->dirty_inode(), not on actual writeback. As a result if the inode
> > already has I_DIRTY_INODE state by the time we get to
> > __mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
> > into on-disk inode and will not get updated until the next I_DIRTY_INODE
> > update, which might never come if we crash or get a power failure.
> >
> > The problem can be reproduced on ext4 by running xfstest generic/622
> > with -o iversion mount option.
> >
> > Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
> > I_DIRTY_INODE. Also make sure that the case is properly handled in
> > writeback_single_inode() as well. Additionally changes in
> > xfs_fs_dirty_inode() were made to accommodate for I_DIRTY_TIME in flag.
> >
> > Thanks Jan Kara for suggestions on how to make this work properly.
> >
> > Cc: Dave Chinner <david@fromorbit.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Signed-off-by: Lukas Czerner <lczerner@redhat.com>
> > Suggested-by: Jan Kara <jack@suse.cz>
> > ---
> > v2: Reworked according to suggestions from Jan
>
> ....
>
> > diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
> > index aa977c7ea370..cff05a4771b5 100644
> > --- a/fs/xfs/xfs_super.c
> > +++ b/fs/xfs/xfs_super.c
> > @@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
> >
> >  	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
> >  		return;
> > -	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
> > +	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
> > +	    !((inode->i_state | flag) & I_DIRTY_TIME))
> >  		return;
>
> My eyes, they bleed. The dirty time code was already a horrid
> abomination, and this makes it worse.
>
> From looking at the code, I cannot work out what the new semantics
> for I_DIRTY_TIME and I_DIRTY_SYNC are supposed to be, nor can I work

Hi Dave,

please see the other thread for this patch with Eric Biggers, where I
try to explain and give some suggestions to change the doc. Does it make
sense to you, or am I missing something?

https://marc.info/?l=linux-ext4&m=165970194205621&w=2

> out what the condition in this new code is supposed to be doing. I
> *can't verify it is correct* by reading the code.

The ->dirty_inode() needed to be changed to clear I_DIRTY_TIME from
i_state *before* we call ->dirty_inode() to avoid a race where we would
lose a timestamp update that comes just a little later, after the
->dirty_inode() call with I_DIRTY_INODE. But that would break xfs, so I
decided to keep the condition and loosen the requirement so that
I_DIRTY_TIME can also be set in 'flag', not just in i_state.

Hence the abomination.

>
> Can you please add a comment here explaining the conditions where we
> don't have to log a new timestamp update?

How about something like this?

Only do the timestamp update if the inode is dirty (I_DIRTY_SYNC) and
has dirty timestamp (I_DIRTY_TIME). I_DIRTY_TIME can be either already
set in i_state, or passed in flags possibly together with I_DIRTY_SYNC.

>
> Also, if "flag" now contains multiple flags, can you rename it
> "flags"?

Sure, I can do that.

Thanks!
-Lukas

>
> Cheers,
>
> Dave.
>
> --
> Dave Chinner
> david@fromorbit.com
>

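The XFS condition under discussion can be spelled out as a small predicate and checked case by case. This is a sketch under stated assumptions, not the XFS implementation: `model_xfs_logs_timestamp()` is a hypothetical name, the flag values are illustrative, and the SB_LAZYTIME check that precedes the condition is not modelled.

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real definitions are in include/linux/fs.h. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_TIME     (1u << 11)
#define I_DIRTY_INODE    (I_DIRTY_SYNC | I_DIRTY_DATASYNC)

/*
 * Returns true when the patched condition in xfs_fs_dirty_inode() would
 * fall through and log a timestamp update, false when it would return
 * early.
 */
static bool model_xfs_logs_timestamp(unsigned int flags, unsigned int i_state)
{
	/* The caller masks flags with (I_DIRTY_INODE | I_DIRTY_TIME). */
	assert((flags & ~(I_DIRTY_INODE | I_DIRTY_TIME)) == 0);

	/* Ignoring I_DIRTY_TIME, the request must be exactly I_DIRTY_SYNC. */
	if ((flags & ~I_DIRTY_TIME) != I_DIRTY_SYNC)
		return false;

	/* A dirty timestamp must be pending in i_state or passed in flags. */
	if (!((i_state | flags) & I_DIRTY_TIME))
		return false;

	return true;
}
```

Read this way, the two accepted cases are I_DIRTY_SYNC with I_DIRTY_TIME already in i_state (the pre-patch case) and I_DIRTY_SYNC | I_DIRTY_TIME passed in flags (the new case); anything involving I_DIRTY_DATASYNC, or no dirty timestamp at all, returns early.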
On Fri, Aug 05, 2022 at 02:23:06PM +0200, Lukas Czerner wrote:
> >
> > Also what is the precise meaning of the flags argument to ->dirty_inode now?
> >
> > 	sb->s_op->dirty_inode(inode,
> > 		flags & (I_DIRTY_INODE | I_DIRTY_TIME));
> >
> > Note that dirty_inode is documented in Documentation/filesystems/vfs.rst.
>
> Don't know. It already doesn't mention I_DIRTY_SYNC that can be there as
> well.

Well, it didn't really need to because there were only two possibilities:
datasync and not datasync. This patch changes that.

> Additionally it can have I_DIRTY_TIME to inform the fs we have a
> dirty timestamp as well (in case of lazytime).

This is introduced by this patch.

- Eric

diff --git a/fs/fs-writeback.c b/fs/fs-writeback.c
index 05221366a16d..638dbf143727 100644
--- a/fs/fs-writeback.c
+++ b/fs/fs-writeback.c
@@ -1718,9 +1718,14 @@ static int writeback_single_inode(struct inode *inode,
 	 */
 	if (!(inode->i_state & I_DIRTY_ALL))
 		inode_cgwb_move_to_attached(inode, wb);
-	else if (!(inode->i_state & I_SYNC_QUEUED) &&
-		 (inode->i_state & I_DIRTY))
-		redirty_tail_locked(inode, wb);
+	else if (!(inode->i_state & I_SYNC_QUEUED)) {
+		if ((inode->i_state & I_DIRTY))
+			redirty_tail_locked(inode, wb);
+		else if (inode->i_state & I_DIRTY_TIME) {
+			inode->dirtied_when = jiffies;
+			inode_io_list_move_locked(inode, wb, &wb->b_dirty_time);
+		}
+	}
 
 	spin_unlock(&wb->list_lock);
 	inode_sync_complete(inode);
@@ -2369,6 +2374,17 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	trace_writeback_mark_inode_dirty(inode, flags);
 
 	if (flags & I_DIRTY_INODE) {
+
+		/* Inode timestamp update will piggyback on this dirtying */
+		if (inode->i_state & I_DIRTY_TIME) {
+			spin_lock(&inode->i_lock);
+			if (inode->i_state & I_DIRTY_TIME) {
+				inode->i_state &= ~I_DIRTY_TIME;
+				flags |= I_DIRTY_TIME;
+			}
+			spin_unlock(&inode->i_lock);
+		}
+
 		/*
 		 * Notify the filesystem about the inode being dirtied, so that
 		 * (if needed) it can update on-disk fields and journal the
@@ -2378,7 +2394,8 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 		 */
 		trace_writeback_dirty_inode_start(inode, flags);
 		if (sb->s_op->dirty_inode)
-			sb->s_op->dirty_inode(inode, flags & I_DIRTY_INODE);
+			sb->s_op->dirty_inode(inode,
+				flags & (I_DIRTY_INODE | I_DIRTY_TIME));
 		trace_writeback_dirty_inode(inode, flags);
 
 		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
@@ -2399,21 +2416,15 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 	 */
 	smp_mb();
 
-	if (((inode->i_state & flags) == flags) ||
-	    (dirtytime && (inode->i_state & I_DIRTY_INODE)))
+	if ((inode->i_state & flags) == flags)
 		return;
 
 	spin_lock(&inode->i_lock);
-	if (dirtytime && (inode->i_state & I_DIRTY_INODE))
-		goto out_unlock_inode;
 	if ((inode->i_state & flags) != flags) {
 		const int was_dirty = inode->i_state & I_DIRTY;
 
 		inode_attach_wb(inode, NULL);
 
-		/* I_DIRTY_INODE supersedes I_DIRTY_TIME. */
-		if (flags & I_DIRTY_INODE)
-			inode->i_state &= ~I_DIRTY_TIME;
 		inode->i_state |= flags;
 
 		/*
@@ -2486,7 +2497,6 @@ void __mark_inode_dirty(struct inode *inode, int flags)
 out_unlock:
 	if (wb)
 		spin_unlock(&wb->list_lock);
-out_unlock_inode:
 	spin_unlock(&inode->i_lock);
 }
 EXPORT_SYMBOL(__mark_inode_dirty);
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index aa977c7ea370..cff05a4771b5 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -658,7 +658,8 @@ xfs_fs_dirty_inode(
 
 	if (!(inode->i_sb->s_flags & SB_LAZYTIME))
 		return;
-	if (flag != I_DIRTY_SYNC || !(inode->i_state & I_DIRTY_TIME))
+	if ((flag & ~I_DIRTY_TIME) != I_DIRTY_SYNC ||
+	    !((inode->i_state | flag) & I_DIRTY_TIME))
 		return;
 
 	if (xfs_trans_alloc(mp, &M_RES(mp)->tr_fsyncts, 0, 0, 0, &tp))
diff --git a/include/linux/fs.h b/include/linux/fs.h
index 9ad5e3520fae..2243797badf2 100644
--- a/include/linux/fs.h
+++ b/include/linux/fs.h
@@ -2245,9 +2245,9 @@ static inline void kiocb_clone(struct kiocb *kiocb, struct kiocb *kiocb_src,
  * lazytime mount option is enabled. We keep track of this
  * separately from I_DIRTY_SYNC in order to implement
  * lazytime. This gets cleared if I_DIRTY_INODE
- * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. I.e.
- * either I_DIRTY_TIME *or* I_DIRTY_INODE can be set in
- * i_state, but not both. I_DIRTY_PAGES may still be set.
+ * (I_DIRTY_SYNC and/or I_DIRTY_DATASYNC) gets set. But
+ * I_DIRTY_TIME can still be set if I_DIRTY_SYNC is already
+ * in place.
  * I_NEW Serves as both a mutex and completion notification.
  * New inodes set I_NEW. If two processes both create
  * the same inode, one of them will release its inode and

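The first hunk changes what writeback_single_inode() does with an inode once its writeback has finished. As a reading aid, the decision can be written as a pure function of i_state. This is a hedged sketch: the enum, `model_requeue()` and the flag values are made up for the example, and the real code moves the inode between writeback lists under wb->list_lock instead of returning a value.

```c
#include <assert.h>

/* Illustrative values only; the real definitions are in include/linux/fs.h. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_PAGES    (1u << 2)
#define I_DIRTY_TIME     (1u << 11)
#define I_SYNC_QUEUED    (1u << 17)
#define I_DIRTY_INODE    (I_DIRTY_SYNC | I_DIRTY_DATASYNC)
#define I_DIRTY          (I_DIRTY_INODE | I_DIRTY_PAGES)
#define I_DIRTY_ALL      (I_DIRTY | I_DIRTY_TIME)

/* Hypothetical names for the possible outcomes after writeback. */
enum model_requeue_action {
	MOVE_TO_ATTACHED,	/* inode_cgwb_move_to_attached() */
	LEAVE_ALONE,		/* already queued for sync, nothing to do */
	REDIRTY_TAIL,		/* redirty_tail_locked() */
	MOVE_TO_DIRTY_TIME,	/* new: move to wb->b_dirty_time */
};

static enum model_requeue_action model_requeue(unsigned int i_state)
{
	if (!(i_state & I_DIRTY_ALL))
		return MOVE_TO_ATTACHED;
	if (i_state & I_SYNC_QUEUED)
		return LEAVE_ALONE;
	if (i_state & I_DIRTY)
		return REDIRTY_TAIL;

	/* I_DIRTY_ALL is set but I_DIRTY is not, so only I_DIRTY_TIME remains. */
	assert(i_state & I_DIRTY_TIME);
	return MOVE_TO_DIRTY_TIME;
}
```

The last branch is the case the patch adds: an inode that picked up only I_DIRTY_TIME while it was being written back is timestamped and parked on the dirty-time list rather than being left where it was.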
Currently the I_DIRTY_TIME will never get set if the inode already has
I_DIRTY_INODE with assumption that it supersedes I_DIRTY_TIME. That's
true, however ext4 will only update the on-disk inode in
->dirty_inode(), not on actual writeback. As a result if the inode
already has I_DIRTY_INODE state by the time we get to
__mark_inode_dirty() only with I_DIRTY_TIME, the time was already filled
into on-disk inode and will not get updated until the next I_DIRTY_INODE
update, which might never come if we crash or get a power failure.

The problem can be reproduced on ext4 by running xfstest generic/622
with -o iversion mount option.

Fix it by allowing I_DIRTY_TIME to be set even if the inode already has
I_DIRTY_INODE. Also make sure that the case is properly handled in
writeback_single_inode() as well. Additionally changes in
xfs_fs_dirty_inode() were made to accommodate for I_DIRTY_TIME in flag.

Thanks Jan Kara for suggestions on how to make this work properly.

Cc: Dave Chinner <david@fromorbit.com>
Cc: Christoph Hellwig <hch@infradead.org>
Signed-off-by: Lukas Czerner <lczerner@redhat.com>
Suggested-by: Jan Kara <jack@suse.cz>
---
v2: Reworked according to suggestions from Jan

 fs/fs-writeback.c  | 34 ++++++++++++++++++++++------------
 fs/xfs/xfs_super.c |  3 ++-
 include/linux/fs.h |  6 +++---
 3 files changed, 27 insertions(+), 16 deletions(-)
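The lost-update scenario from the commit message can be walked through with a toy model that contrasts the old and the patched behaviour. Everything here is an assumption made for illustration: the function names are hypothetical, the flag values are illustrative, and writeback is reduced to "clear I_DIRTY_INODE", which mirrors the ext4 case where the on-disk inode is only updated from ->dirty_inode().

```c
#include <assert.h>
#include <stdbool.h>

/* Illustrative values only; the real definitions are in include/linux/fs.h. */
#define I_DIRTY_SYNC     (1u << 0)
#define I_DIRTY_DATASYNC (1u << 1)
#define I_DIRTY_TIME     (1u << 11)
#define I_DIRTY_INODE    (I_DIRTY_SYNC | I_DIRTY_DATASYNC)

/* Timestamp-only dirtying; returns the resulting i_state. */
static unsigned int model_dirty_time(unsigned int i_state, bool patched)
{
	/* This toy model only tracks the inode and timestamp dirty bits. */
	assert((i_state & ~(I_DIRTY_INODE | I_DIRTY_TIME)) == 0);

	/* Old behaviour: the request is dropped if I_DIRTY_INODE is set. */
	if (!patched && (i_state & I_DIRTY_INODE))
		return i_state;
	return i_state | I_DIRTY_TIME;
}

/* Simplified writeback: the inode itself is written, timestamps are not. */
static unsigned int model_writeback_done(unsigned int i_state)
{
	return i_state & ~I_DIRTY_INODE;
}

/* True if a timestamp update is still tracked once writeback finishes. */
static bool model_timestamp_survives(bool patched)
{
	unsigned int state = I_DIRTY_SYNC;	/* inode is already dirty */

	state = model_dirty_time(state, patched);
	state = model_writeback_done(state);
	return (state & I_DIRTY_TIME) != 0;
}
```

In this model `model_timestamp_survives(false)` is false, meaning nothing remembers the timestamp after writeback, while `model_timestamp_survives(true)` is true because I_DIRTY_TIME stays in i_state and the inode can be moved to the dirty-time list.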