diff mbox series

[v2,2/3] xfs: transaction subsystem quiesce mechanism

Message ID 20210406144238.814558-3-bfoster@redhat.com
State Deferred, archived
Series xfs: rework quotaoff to avoid log deadlock

Commit Message

Brian Foster April 6, 2021, 2:42 p.m. UTC
The updated quotaoff logging algorithm depends on a runtime quiesce
of the transaction subsystem to guarantee all transactions after a
certain point detect quota subsystem changes. Implement this
mechanism using an internal lock, similar to the external filesystem
freeze mechanism. This is also somewhat analogous to the old percpu
transaction counter mechanism, but we don't actually need a counter.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/xfs_aops.c  |  2 ++
 fs/xfs/xfs_mount.h |  3 +++
 fs/xfs/xfs_super.c |  8 ++++++++
 fs/xfs/xfs_trans.c |  4 ++--
 fs/xfs/xfs_trans.h | 20 ++++++++++++++++++++
 5 files changed, 35 insertions(+), 2 deletions(-)

Comments

Christoph Hellwig April 7, 2021, 8 a.m. UTC | #1
On Tue, Apr 06, 2021 at 10:42:37AM -0400, Brian Foster wrote:
> The updated quotaoff logging algorithm depends on a runtime quiesce
> of the transaction subsystem to guarantee all transactions after a
> certain point detect quota subsystem changes. Implement this
> mechanism using an internal lock, similar to the external filesystem
> freeze mechanism. This is also somewhat analogous to the old percpu
> transaction counter mechanism, but we don't actually need a counter.

Stupid question that already came up when seeing the replies to my
s_inodes patch:  Why do we even care about quotaoff?  Is there any
real life use case for quotaoff, at least the kind that disables
accounting (vs enforcement)?  IMHO we spend a lot of effort on this
corner case that has no practical value, and just removing support
for quotaoff might serve us much better in the long run.

Brian Foster April 7, 2021, 11:36 a.m. UTC | #2
On Wed, Apr 07, 2021 at 09:00:41AM +0100, Christoph Hellwig wrote:
> On Tue, Apr 06, 2021 at 10:42:37AM -0400, Brian Foster wrote:
> > The updated quotaoff logging algorithm depends on a runtime quiesce
> > of the transaction subsystem to guarantee all transactions after a
> > certain point detect quota subsystem changes. Implement this
> > mechanism using an internal lock, similar to the external filesystem
> > freeze mechanism. This is also somewhat analogous to the old percpu
> > transaction counter mechanism, but we don't actually need a counter.
> 
> Stupid question that already came up when seeing the replies to my
> s_inodes patch:  Why do we even care about quotaoff?  Is there any
> real life use case for quotaoff, at least the kind that disables
> accounting (vs enforcement)?  IMHO we spend a lot of effort on this
> corner case that has no practical value, and just removing support
> for quotaoff might serve us much better in the long run.
> 

Hm, fair point. I think the historical fragility and complexity makes it
reasonable to question whether it's worth continued support. Looking
back through my notes, ISTM that the original report of the log
reservation deadlock came from fstests, so not necessarily an end user
report. I'm not aware of any real user reports around quotaoff, but then
again it's fairly boring functionality that probably just works most of
the time. It's kind of hard to surmise external dependencies from that
alone.

Personally, I'd probably have to think about it some more, but initially
I don't have any strong objection to removing quotaoff support. More
practically, I suspect we'd have to deprecate it for some period of time
given that it's a generic interface, has userspace tools, regression
tests, etc., and may or may not have real users who might want the
opportunity to object (or adjust).

Though perhaps potentially avoiding that mess is what you mean by "...
disables accounting vs.  enforcement." I.e., retain the interface and
general ability to turn off enforcement, but require a mount cycle in
the future to disable accounting..? Hmm... that seems like a potentially
nicer/easier path forward and a less disruptive change. I wonder even if
we could just (eventually) ignore the accounting disablement flags from
userspace and if any users would have reason to care about that change
in behavior.

Brian

Christoph Hellwig April 7, 2021, 1:24 p.m. UTC | #3
On Wed, Apr 07, 2021 at 07:36:37AM -0400, Brian Foster wrote:
> Personally, I'd probably have to think about it some more, but initially
> I don't have any strong objection to removing quotaoff support. More
> practically, I suspect we'd have to deprecate it for some period of time
> given that it's a generic interface, has userspace tools, regression
> tests, etc., and may or may not have real users who might want the
> opportunity to object (or adjust).
> 
> Though perhaps potentially avoiding that mess is what you mean by "...
> disables accounting vs.  enforcement." I.e., retain the interface and
> general ability to turn off enforcement, but require a mount cycle in
> the future to disable accounting..? Hmm... that seems like a potentially
> nicer/easier path forward and a less disruptive change. I wonder even if
> we could just (eventually) ignore the accounting disablement flags from
> userspace and if any users would have reason to care about that change
> in behavior.

I'm currently testing a series that just ignores disabling of accounting
and logs a message and that seems to do ok so far.  I'll check if
clearing the on-disk flags as well could work out even better.

Darrick J. Wong April 7, 2021, 3:50 p.m. UTC | #4
On Wed, Apr 07, 2021 at 02:24:55PM +0100, Christoph Hellwig wrote:
> On Wed, Apr 07, 2021 at 07:36:37AM -0400, Brian Foster wrote:
> > Personally, I'd probably have to think about it some more, but initially
> > I don't have any strong objection to removing quotaoff support. More
> > practically, I suspect we'd have to deprecate it for some period of time
> > given that it's a generic interface, has userspace tools, regression
> > tests, etc., and may or may not have real users who might want the
> > opportunity to object (or adjust).
> > 
> > Though perhaps potentially avoiding that mess is what you mean by "...
> > disables accounting vs.  enforcement." I.e., retain the interface and
> > general ability to turn off enforcement, but require a mount cycle in
> > the future to disable accounting..? Hmm... that seems like a potentially
> > nicer/easier path forward and a less disruptive change. I wonder even if
> > we could just (eventually) ignore the accounting disablement flags from
> > userspace and if any users would have reason to care about that change
> > in behavior.
> 
> I'm currently testing a series that just ignores disabling of accounting
> and logs a message and that seems to do ok so far.  I'll check if
> clearing the on-disk flags as well could work out even better.

While I was rejiggering the inode walk parts of quotaoff I did wonder
why it even mattered to dqpurge the affected dquots **now**.  With patch
1 applied, we could just turn off the _ACTIVE flag and let reclaim
erase them slowly.

--D

Patch

diff --git a/fs/xfs/xfs_aops.c b/fs/xfs/xfs_aops.c
index 1cc7c36d98e9..dce52943e5a7 100644
--- a/fs/xfs/xfs_aops.c
+++ b/fs/xfs/xfs_aops.c
@@ -58,6 +58,7 @@  xfs_setfilesize_trans_alloc(
 	 * we released it.
 	 */
 	__sb_writers_release(ioend->io_inode->i_sb, SB_FREEZE_FS);
+	percpu_rwsem_release(&mp->m_trans_rwsem, true, _THIS_IP_);
 	/*
 	 * We hand off the transaction to the completion thread now, so
 	 * clear the flag here.
@@ -127,6 +128,7 @@  xfs_setfilesize_ioend(
 	 */
 	xfs_trans_set_context(tp);
 	__sb_writers_acquired(VFS_I(ip)->i_sb, SB_FREEZE_FS);
+	percpu_rwsem_acquire(&ip->i_mount->m_trans_rwsem, true, _THIS_IP_);
 
 	/* we abort the update if there was an IO error */
 	if (error) {
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 81829d19596e..27a2a53abb4f 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -171,6 +171,9 @@  typedef struct xfs_mount {
 	 */
 	struct percpu_counter	m_delalloc_blks;
 
+	/* lock for transaction quiesce (used by quotaoff) */
+	struct percpu_rw_semaphore	m_trans_rwsem;
+
 	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
 	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
 	uint64_t		m_resblks;	/* total reserved blocks */
diff --git a/fs/xfs/xfs_super.c b/fs/xfs/xfs_super.c
index 8d079c5e7099..64feab042dea 100644
--- a/fs/xfs/xfs_super.c
+++ b/fs/xfs/xfs_super.c
@@ -1001,8 +1001,15 @@  xfs_init_percpu_counters(
 	if (error)
 		goto free_fdblocks;
 
+	/* not a counter, but close enough... */
+	error = percpu_init_rwsem(&mp->m_trans_rwsem);
+	if (error)
+		goto free_delalloc;
+
 	return 0;
 
+free_delalloc:
+	percpu_counter_destroy(&mp->m_delalloc_blks);
 free_fdblocks:
 	percpu_counter_destroy(&mp->m_fdblocks);
 free_ifree:
@@ -1025,6 +1032,7 @@  static void
 xfs_destroy_percpu_counters(
 	struct xfs_mount	*mp)
 {
+	percpu_free_rwsem(&mp->m_trans_rwsem);
 	percpu_counter_destroy(&mp->m_icount);
 	percpu_counter_destroy(&mp->m_ifree);
 	percpu_counter_destroy(&mp->m_fdblocks);
diff --git a/fs/xfs/xfs_trans.c b/fs/xfs/xfs_trans.c
index bc25afc10245..c46943f0fc77 100644
--- a/fs/xfs/xfs_trans.c
+++ b/fs/xfs/xfs_trans.c
@@ -74,7 +74,7 @@  xfs_trans_free(
 	trace_xfs_trans_free(tp, _RET_IP_);
 	xfs_trans_clear_context(tp);
 	if (!(tp->t_flags & XFS_TRANS_NO_WRITECOUNT))
-		sb_end_intwrite(tp->t_mountp->m_super);
+		xfs_trans_end(tp->t_mountp);
 	xfs_trans_free_dqinfo(tp);
 	kmem_cache_free(xfs_trans_zone, tp);
 }
@@ -265,7 +265,7 @@  xfs_trans_alloc(
 retry:
 	tp = kmem_cache_zalloc(xfs_trans_zone, GFP_KERNEL | __GFP_NOFAIL);
 	if (!(flags & XFS_TRANS_NO_WRITECOUNT))
-		sb_start_intwrite(mp->m_super);
+		xfs_trans_start(mp);
 	xfs_trans_set_context(tp);
 
 	/*
diff --git a/fs/xfs/xfs_trans.h b/fs/xfs/xfs_trans.h
index 9dd745cf77c9..95da3e179150 100644
--- a/fs/xfs/xfs_trans.h
+++ b/fs/xfs/xfs_trans.h
@@ -226,6 +226,26 @@  xfs_trans_read_buf(
 				      flags, bpp, ops);
 }
 
+/*
+ * Context tracking helpers for external (i.e. fs freeze) and internal
+ * transaction quiesce.
+ */
+static inline void
+xfs_trans_start(
+	struct xfs_mount	*mp)
+{
+	sb_start_intwrite(mp->m_super);
+	percpu_down_read(&mp->m_trans_rwsem);
+}
+
+static inline void
+xfs_trans_end(
+	struct xfs_mount	*mp)
+{
+	percpu_up_read(&mp->m_trans_rwsem);
+	sb_end_intwrite(mp->m_super);
+}
+
 struct xfs_buf	*xfs_trans_getsb(struct xfs_trans *);
 
 void		xfs_trans_brelse(xfs_trans_t *, struct xfs_buf *);