Message ID | 20241112221920.1105007-3-david@fromorbit.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | xfs: miscellaneous bug fixes | expand |
On Wed, Nov 13, 2024 at 09:05:15AM +1100, Dave Chinner wrote: > From: Supriya Wickrematillake <sup@sgi.com> Wow, there are still people working at SGI?? > I've been seeing this failure on during xfs/050 recently: > > XFS: Assertion failed: dst->d_spc_timer != 0, file: fs/xfs/xfs_qm_syscalls.c, line: 435 > .... > Call Trace: > <TASK> > xfs_qm_scall_getquota_fill_qc+0x2a2/0x2b0 > xfs_qm_scall_getquota_next+0x69/0xa0 > xfs_fs_get_nextdqblk+0x62/0xf0 > quota_getnextxquota+0xbf/0x320 > do_quotactl+0x1a1/0x410 > __se_sys_quotactl+0x126/0x310 > __x64_sys_quotactl+0x21/0x30 > x64_sys_call+0x2819/0x2ee0 > do_syscall_64+0x68/0x130 > entry_SYSCALL_64_after_hwframe+0x76/0x7e > > It turns out that the _qmount call has silently been failing to > unmount and mount the filesystem, so when the softlimit is pushed > past with a buffered write, it is not getting synced to disk before > the next quota report is being run. > > Hence when the quota report runs, we have 300 blocks of delalloc > data on an inode, with a soft limit of 200 blocks. XFS dquots > account delalloc reservations as used space, hence the dquot is over > the soft limit. > > However, we don't update the soft limit timers until we do a > transactional update of the dquot. That is, the dquot sits over the > soft limit without a softlimit timer being started until writeback > occurs and the allocation modifies the dquot and we call > xfs_qm_adjust_dqtimers() from xfs_trans_apply_dquot_deltas() in > xfs_trans_commit() context. > > This isn't really a problem, except for this debug code in > xfs_qm_scall_getquota_fill_qc(): > > #ifdef DEBUG > if (xfs_dquot_is_enforced(dqp) && dqp->q_id != 0) { > if ((dst->d_space > dst->d_spc_softlimit) && > (dst->d_spc_softlimit > 0)) { > ASSERT(dst->d_spc_timer != 0); > } > .... > > It asserts taht if the used block count is over the soft limit, > it *must* have a soft limit timer running. This is clearly not > the case, because we haven't committed the delalloc space to disk > yet. Hence the soft limit is only exceeded temporarily in memory > (which isn't an issue) and we start the timer the moment we exceed > the soft limit in journalled metadata. > > This debug was introduced in: > > commit 0d5ad8383061fbc0a9804fbb98218750000fe032 > Author: Supriya Wickrematillake <sup@sgi.com> > Date: Wed May 15 22:44:44 1996 +0000 > > initial checkin > quotactl syscall functions. > > The very first quota support commit back in 1996. This is zero-day > debug for Irix and, as it turns out, a zero-day bug in the debug > code because the delalloc code on Irix didn't update the softlimit > timers, either. > > IOWs, this issue has been in the code for 28 years. > > We obviously don't care if soft limit timers are a bit rubbery when > we have delalloc reservations in memory. Production systems running > quota reports have been exposed to this situation for 28 years and > nobody has noticed it, so the debug code is essentially worthless at > this point in time. > > We also have the on-disk dquot verifiers checking that the soft > limit timer is running whenever the dquot is over the soft limit > before we write it to disk and after we read it from disk. These > aren't firing, so it is clear the issue is purely a temporary > in-memory incoherency that I never would have noticed had the test > not silently failed to unmount the filesystem. > > Hence I'm simply going to trash this runtime debug because it isn't > useful in the slightest for catching quota bugs. > > Signed-off-by: Dave Chinner <dchinner@redhat.com> Agreed! I've hit this once in a blue moon and didn't think it was especially useful either. Reviewed-by: Darrick J. Wong <djwong@kernel.org> --D > --- > fs/xfs/xfs_qm_syscalls.c | 13 ------------- > 1 file changed, 13 deletions(-) > > diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c > index 4eda50ae2d1c..0c78f30fa4a3 100644 > --- a/fs/xfs/xfs_qm_syscalls.c > +++ b/fs/xfs/xfs_qm_syscalls.c > @@ -427,19 +427,6 @@ xfs_qm_scall_getquota_fill_qc( > dst->d_ino_timer = 0; > dst->d_rt_spc_timer = 0; > } > - > -#ifdef DEBUG > - if (xfs_dquot_is_enforced(dqp) && dqp->q_id != 0) { > - if ((dst->d_space > dst->d_spc_softlimit) && > - (dst->d_spc_softlimit > 0)) { > - ASSERT(dst->d_spc_timer != 0); > - } > - if ((dst->d_ino_count > dqp->q_ino.softlimit) && > - (dqp->q_ino.softlimit > 0)) { > - ASSERT(dst->d_ino_timer != 0); > - } > - } > -#endif > } > > /* Return the quota information for the dquot matching id. */ > -- > 2.45.2 > >
On Tue, Nov 12, 2024 at 03:48:35PM -0800, Darrick J. Wong wrote: > On Wed, Nov 13, 2024 at 09:05:15AM +1100, Dave Chinner wrote: > > From: Supriya Wickrematillake <sup@sgi.com> > > Wow, there are still people working at SGI?? huh, that's a git-send-email screwup. It must have pulled that from the quote of the initial quota commit: > > This debug was introduced in: > > > > commit 0d5ad8383061fbc0a9804fbb98218750000fe032 > > Author: Supriya Wickrematillake <sup@sgi.com> > > Date: Wed May 15 22:44:44 1996 +0000 > > > > initial checkin > > quotactl syscall functions. Here, and decided to ignore the actual "From: dchinner@redhat" tag in the local commit message. Great work, git! > > The very first quota support commit back in 1996. This is zero-day > > debug for Irix and, as it turns out, a zero-day bug in the debug > > code because the delalloc code on Irix didn't update the softlimit > > timers, either. > > > > IOWs, this issue has been in the code for 28 years. > > > > We obviously don't care if soft limit timers are a bit rubbery when > > we have delalloc reservations in memory. Production systems running > > quota reports have been exposed to this situation for 28 years and > > nobody has noticed it, so the debug code is essentially worthless at > > this point in time. > > > > We also have the on-disk dquot verifiers checking that the soft > > limit timer is running whenever the dquot is over the soft limit > > before we write it to disk and after we read it from disk. These > > aren't firing, so it is clear the issue is purely a temporary > > in-memory incoherency that I never would have noticed had the test > > not silently failed to unmount the filesystem. > > > > Hence I'm simply going to trash this runtime debug because it isn't > > useful in the slightest for catching quota bugs. > > > > Signed-off-by: Dave Chinner <dchinner@redhat.com> > > Agreed! I've hit this once in a blue moon and didn't think it was > especially useful either. > > Reviewed-by: Darrick J. Wong <djwong@kernel.org> Thanks! -Dave.
Looks good:
Reviewed-by: Christoph Hellwig <hch@lst.de>
diff --git a/fs/xfs/xfs_qm_syscalls.c b/fs/xfs/xfs_qm_syscalls.c index 4eda50ae2d1c..0c78f30fa4a3 100644 --- a/fs/xfs/xfs_qm_syscalls.c +++ b/fs/xfs/xfs_qm_syscalls.c @@ -427,19 +427,6 @@ xfs_qm_scall_getquota_fill_qc( dst->d_ino_timer = 0; dst->d_rt_spc_timer = 0; } - -#ifdef DEBUG - if (xfs_dquot_is_enforced(dqp) && dqp->q_id != 0) { - if ((dst->d_space > dst->d_spc_softlimit) && - (dst->d_spc_softlimit > 0)) { - ASSERT(dst->d_spc_timer != 0); - } - if ((dst->d_ino_count > dqp->q_ino.softlimit) && - (dqp->q_ino.softlimit > 0)) { - ASSERT(dst->d_ino_timer != 0); - } - } -#endif } /* Return the quota information for the dquot matching id. */