Message ID | 20210423131050.141140-3-bfoster@redhat.com (mailing list archive)
State      | Superseded, archived
Series     | xfs: set aside allocation btree blocks from block reservation
On 23 Apr 2021 at 18:40, Brian Foster wrote: > Introduce an in-core counter to track the sum of all allocbt blocks > used by the filesystem. This value is currently tracked per-ag via > the ->agf_btreeblks field in the AGF, which also happens to include > rmapbt blocks. A global, in-core count of allocbt blocks is required > to identify the subset of global ->m_fdblocks that consists of > unavailable blocks currently used for allocation btrees. To support > this calculation at block reservation time, construct a similar > global counter for allocbt blocks, populate it on first read of each > AGF and update it as allocbt blocks are used and released. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > fs/xfs/libxfs/xfs_alloc.c | 12 ++++++++++++ > fs/xfs/libxfs/xfs_alloc_btree.c | 2 ++ > fs/xfs/xfs_mount.h | 6 ++++++ > 3 files changed, 20 insertions(+) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index aaa19101bb2a..144e2d68245c 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -3036,6 +3036,7 @@ xfs_alloc_read_agf( > struct xfs_agf *agf; /* ag freelist header */ > struct xfs_perag *pag; /* per allocation group data */ > int error; > + uint32_t allocbt_blks; > > trace_xfs_alloc_read_agf(mp, agno); > > @@ -3066,6 +3067,17 @@ xfs_alloc_read_agf( > pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); > pag->pagf_init = 1; > pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); > + > + /* > + * Update the global in-core allocbt block counter. Filter > + * rmapbt blocks from the on-disk counter because those are > + * managed by perag reservation. > + */ > + if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) {

pag->pagf_btreeblks gets incremented every time a block is allocated to refill the AGFL (via xfs_alloc_get_freelist()). Apart from the allocbt trees, blocks for the rmap btree also get allocated from the AGFL. Hence pag->pagf_btreeblks must be larger than agf->agf_rmap_blocks.
Can you please describe the scenario in which pag->pagf_btreeblks has a value that is <= agf->agf_rmap_blocks? -- chandan
On Tue, Apr 27, 2021 at 03:58:16PM +0530, Chandan Babu R wrote: > On 23 Apr 2021 at 18:40, Brian Foster wrote: > > Introduce an in-core counter to track the sum of all allocbt blocks > > used by the filesystem. This value is currently tracked per-ag via > > the ->agf_btreeblks field in the AGF, which also happens to include > > rmapbt blocks. A global, in-core count of allocbt blocks is required > > to identify the subset of global ->m_fdblocks that consists of > > unavailable blocks currently used for allocation btrees. To support > > this calculation at block reservation time, construct a similar > > global counter for allocbt blocks, populate it on first read of each > > AGF and update it as allocbt blocks are used and released. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > --- > > fs/xfs/libxfs/xfs_alloc.c | 12 ++++++++++++ > > fs/xfs/libxfs/xfs_alloc_btree.c | 2 ++ > > fs/xfs/xfs_mount.h | 6 ++++++ > > 3 files changed, 20 insertions(+) > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > index aaa19101bb2a..144e2d68245c 100644 > > --- a/fs/xfs/libxfs/xfs_alloc.c > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > @@ -3036,6 +3036,7 @@ xfs_alloc_read_agf( > > struct xfs_agf *agf; /* ag freelist header */ > > struct xfs_perag *pag; /* per allocation group data */ > > int error; > > + uint32_t allocbt_blks; > > > > trace_xfs_alloc_read_agf(mp, agno); > > > > @@ -3066,6 +3067,17 @@ xfs_alloc_read_agf( > > pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); > > pag->pagf_init = 1; > > pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); > > + > > + /* > > + * Update the global in-core allocbt block counter. Filter > > + * rmapbt blocks from the on-disk counter because those are > > + * managed by perag reservation. > > + */ > > + if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) { > > pag->pagf_btreeblks gets incremented everytime a block is allocated to refill > AGFL (via xfs_alloc_get_freelist()). 
Apart from the allobt trees, blocks for > Rmap btree also get allocated from AGFL. Hence pag->pagf_btreeblks must be > larger than agf->agf_rmap_blocks. > This function actually consumes a block from the AGFL (as opposed to refilling it). > Can you please describe the scenario in which pag->pagf_btreeblks has a value > that is <= agf->agf_rmap_blocks? > Ah, this was just an initialization quirk. I originally had an assert here and based the logic on the assumption that pagf_btreeblks >= agf_rmap_blocks, but alas:

# mkfs.xfs -f -mrmapbt <dev>
...
# xfs_db -c "agf 0" -c "p rmapblocks" -c "p btreeblks" <dev>
rmapblocks = 1
btreeblks = 0
#

Brian > -- > chandan >
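Brian's xfs_db output above shows the initialization quirk concretely: a freshly made rmapbt filesystem has btreeblks = 0 but rmapblocks = 1, so an unguarded unsigned subtraction would wrap around to a huge value. The following is a minimal standalone sketch of why the patch guards the subtraction; allocbt_delta() is a hypothetical helper written for illustration, not the kernel function:

```c
#include <stdint.h>

/*
 * Standalone sketch of the guarded subtraction in the patch's
 * xfs_alloc_read_agf() hunk. Returns the allocbt contribution to
 * m_allocbt_blks for one AG, or 0 when btreeblks does not exceed
 * rmap_blocks (e.g. a fresh filesystem where rmapblocks = 1 and
 * btreeblks = 0, as shown in the xfs_db output above).
 */
static inline uint64_t
allocbt_delta(uint32_t btreeblks, uint32_t rmap_blocks)
{
	/* Guard first: an unguarded uint32_t subtraction would wrap. */
	if (btreeblks > rmap_blocks)
		return btreeblks - rmap_blocks;
	return 0;
}
```

Without the `>` check, the fresh-filesystem case (0 - 1) would add roughly 4 billion to the in-core counter.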
On 27 Apr 2021 at 17:03, Brian Foster wrote: > On Tue, Apr 27, 2021 at 03:58:16PM +0530, Chandan Babu R wrote: >> On 23 Apr 2021 at 18:40, Brian Foster wrote: >> > Introduce an in-core counter to track the sum of all allocbt blocks >> > used by the filesystem. This value is currently tracked per-ag via >> > the ->agf_btreeblks field in the AGF, which also happens to include >> > rmapbt blocks. A global, in-core count of allocbt blocks is required >> > to identify the subset of global ->m_fdblocks that consists of >> > unavailable blocks currently used for allocation btrees. To support >> > this calculation at block reservation time, construct a similar >> > global counter for allocbt blocks, populate it on first read of each >> > AGF and update it as allocbt blocks are used and released. >> > >> > Signed-off-by: Brian Foster <bfoster@redhat.com> >> > --- >> > fs/xfs/libxfs/xfs_alloc.c | 12 ++++++++++++ >> > fs/xfs/libxfs/xfs_alloc_btree.c | 2 ++ >> > fs/xfs/xfs_mount.h | 6 ++++++ >> > 3 files changed, 20 insertions(+) >> > >> > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c >> > index aaa19101bb2a..144e2d68245c 100644 >> > --- a/fs/xfs/libxfs/xfs_alloc.c >> > +++ b/fs/xfs/libxfs/xfs_alloc.c >> > @@ -3036,6 +3036,7 @@ xfs_alloc_read_agf( >> > struct xfs_agf *agf; /* ag freelist header */ >> > struct xfs_perag *pag; /* per allocation group data */ >> > int error; >> > + uint32_t allocbt_blks; >> > >> > trace_xfs_alloc_read_agf(mp, agno); >> > >> > @@ -3066,6 +3067,17 @@ xfs_alloc_read_agf( >> > pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); >> > pag->pagf_init = 1; >> > pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); >> > + >> > + /* >> > + * Update the global in-core allocbt block counter. Filter >> > + * rmapbt blocks from the on-disk counter because those are >> > + * managed by perag reservation. 
>> > + */ >> > + if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) { >> >> pag->pagf_btreeblks gets incremented everytime a block is allocated to refill >> AGFL (via xfs_alloc_get_freelist()). Apart from the allobt trees, blocks for >> Rmap btree also get allocated from AGFL. Hence pag->pagf_btreeblks must be >> larger than agf->agf_rmap_blocks. >> > > This function is actually to consume a block from the AGFL (as opposed > to refill it). Sorry, I meant to say "allocate a block from AGFL". > >> Can you please describe the scenario in which pag->pagf_btreeblks has a value >> that is <= agf->agf_rmap_blocks? >> > > Ah, this was just an initialization quirk. I originally had an assert > here and based the logic on the assumption that pagf_btreeblks >= > agf_rmap_blocks, but alas: > > # mkfs.xfs -f -mrmapbt <dev> > ... > # xfs_db -c "agf 0" -c "p rmapblocks" -c "p btreeblks" <dev> > rmapblocks = 1 > btreeblks = 0 > # Thanks for clarifying my doubt. The patch looks good to me. Reviewed-by: Chandan Babu R <chandanrlinux@gmail.com>
On 4/23/21 6:10 AM, Brian Foster wrote: > Introduce an in-core counter to track the sum of all allocbt blocks > used by the filesystem. This value is currently tracked per-ag via > the ->agf_btreeblks field in the AGF, which also happens to include > rmapbt blocks. A global, in-core count of allocbt blocks is required > to identify the subset of global ->m_fdblocks that consists of > unavailable blocks currently used for allocation btrees. To support > this calculation at block reservation time, construct a similar > global counter for allocbt blocks, populate it on first read of each > AGF and update it as allocbt blocks are used and released. > > Signed-off-by: Brian Foster <bfoster@redhat.com> OK, makes sense Reviewed-by: Allison Henderson <allison.henderson@oracle.com> > --- > fs/xfs/libxfs/xfs_alloc.c | 12 ++++++++++++ > fs/xfs/libxfs/xfs_alloc_btree.c | 2 ++ > fs/xfs/xfs_mount.h | 6 ++++++ > 3 files changed, 20 insertions(+) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index aaa19101bb2a..144e2d68245c 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -3036,6 +3036,7 @@ xfs_alloc_read_agf( > struct xfs_agf *agf; /* ag freelist header */ > struct xfs_perag *pag; /* per allocation group data */ > int error; > + uint32_t allocbt_blks; > > trace_xfs_alloc_read_agf(mp, agno); > > @@ -3066,6 +3067,17 @@ xfs_alloc_read_agf( > pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); > pag->pagf_init = 1; > pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); > + > + /* > + * Update the global in-core allocbt block counter. Filter > + * rmapbt blocks from the on-disk counter because those are > + * managed by perag reservation. 
> + */ > + if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) { > + allocbt_blks = pag->pagf_btreeblks - > + be32_to_cpu(agf->agf_rmap_blocks); > + atomic64_add(allocbt_blks, &mp->m_allocbt_blks); > + } > } > #ifdef DEBUG > else if (!XFS_FORCED_SHUTDOWN(mp)) { > diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c > index 8e01231b308e..9f5a45f7baed 100644 > --- a/fs/xfs/libxfs/xfs_alloc_btree.c > +++ b/fs/xfs/libxfs/xfs_alloc_btree.c > @@ -71,6 +71,7 @@ xfs_allocbt_alloc_block( > return 0; > } > > + atomic64_inc(&cur->bc_mp->m_allocbt_blks); > xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.agno, bno, 1, false); > > xfs_trans_agbtree_delta(cur->bc_tp, 1); > @@ -95,6 +96,7 @@ xfs_allocbt_free_block( > if (error) > return error; > > + atomic64_dec(&cur->bc_mp->m_allocbt_blks); > xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1, > XFS_EXTENT_BUSY_SKIP_DISCARD); > xfs_trans_agbtree_delta(cur->bc_tp, -1); > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > index 81829d19596e..bb67274ee23f 100644 > --- a/fs/xfs/xfs_mount.h > +++ b/fs/xfs/xfs_mount.h > @@ -170,6 +170,12 @@ typedef struct xfs_mount { > * extents or anything related to the rt device. > */ > struct percpu_counter m_delalloc_blks; > + /* > + * Global count of allocation btree blocks in use across all AGs. Only > + * used when perag reservation is enabled. Helps prevent block > + * reservation from attempting to reserve allocation btree blocks. > + */ > + atomic64_t m_allocbt_blks; > > struct radix_tree_root m_perag_tree; /* per-ag accounting info */ > spinlock_t m_perag_lock; /* lock for m_perag_tree */ >
On Fri, Apr 23, 2021 at 09:10:49AM -0400, Brian Foster wrote: > Introduce an in-core counter to track the sum of all allocbt blocks > used by the filesystem. This value is currently tracked per-ag via > the ->agf_btreeblks field in the AGF, which also happens to include > rmapbt blocks. A global, in-core count of allocbt blocks is required > to identify the subset of global ->m_fdblocks that consists of > unavailable blocks currently used for allocation btrees. To support > this calculation at block reservation time, construct a similar > global counter for allocbt blocks, populate it on first read of each > AGF and update it as allocbt blocks are used and released. > > Signed-off-by: Brian Foster <bfoster@redhat.com> > --- > fs/xfs/libxfs/xfs_alloc.c | 12 ++++++++++++ > fs/xfs/libxfs/xfs_alloc_btree.c | 2 ++ > fs/xfs/xfs_mount.h | 6 ++++++ > 3 files changed, 20 insertions(+) > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > index aaa19101bb2a..144e2d68245c 100644 > --- a/fs/xfs/libxfs/xfs_alloc.c > +++ b/fs/xfs/libxfs/xfs_alloc.c > @@ -3036,6 +3036,7 @@ xfs_alloc_read_agf( > struct xfs_agf *agf; /* ag freelist header */ > struct xfs_perag *pag; /* per allocation group data */ > int error; > + uint32_t allocbt_blks; > > trace_xfs_alloc_read_agf(mp, agno); > > @@ -3066,6 +3067,17 @@ xfs_alloc_read_agf( > pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); > pag->pagf_init = 1; > pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); > + > + /* > + * Update the global in-core allocbt block counter. Filter > + * rmapbt blocks from the on-disk counter because those are > + * managed by perag reservation. > + */ > + if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) { As pointed out elsewhere in the thread, agf_rmap_blocks counts the total number of blocks in the rmapbt (whereas agf_btreeblks counts the number of non-root blocks in all three free space btrees). Does this need a change? 
int delta = (int)pag->pagf_btreeblks - (be32_to_cpu(...) - 1); if (delta > 0) atomic64_add(delta, &mp->m_allocbt_blks); --D > + allocbt_blks = pag->pagf_btreeblks - > + be32_to_cpu(agf->agf_rmap_blocks); > + atomic64_add(allocbt_blks, &mp->m_allocbt_blks); > + } > } > #ifdef DEBUG > else if (!XFS_FORCED_SHUTDOWN(mp)) { > diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c > index 8e01231b308e..9f5a45f7baed 100644 > --- a/fs/xfs/libxfs/xfs_alloc_btree.c > +++ b/fs/xfs/libxfs/xfs_alloc_btree.c > @@ -71,6 +71,7 @@ xfs_allocbt_alloc_block( > return 0; > } > > + atomic64_inc(&cur->bc_mp->m_allocbt_blks); > xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.agno, bno, 1, false); > > xfs_trans_agbtree_delta(cur->bc_tp, 1); > @@ -95,6 +96,7 @@ xfs_allocbt_free_block( > if (error) > return error; > > + atomic64_dec(&cur->bc_mp->m_allocbt_blks); > xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1, > XFS_EXTENT_BUSY_SKIP_DISCARD); > xfs_trans_agbtree_delta(cur->bc_tp, -1); > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > index 81829d19596e..bb67274ee23f 100644 > --- a/fs/xfs/xfs_mount.h > +++ b/fs/xfs/xfs_mount.h > @@ -170,6 +170,12 @@ typedef struct xfs_mount { > * extents or anything related to the rt device. > */ > struct percpu_counter m_delalloc_blks; > + /* > + * Global count of allocation btree blocks in use across all AGs. Only > + * used when perag reservation is enabled. Helps prevent block > + * reservation from attempting to reserve allocation btree blocks. > + */ > + atomic64_t m_allocbt_blks; > > struct radix_tree_root m_perag_tree; /* per-ag accounting info */ > spinlock_t m_perag_lock; /* lock for m_perag_tree */ > -- > 2.26.3 >
On Tue, Apr 27, 2021 at 09:15:09PM -0700, Darrick J. Wong wrote: > On Fri, Apr 23, 2021 at 09:10:49AM -0400, Brian Foster wrote: > > Introduce an in-core counter to track the sum of all allocbt blocks > > used by the filesystem. This value is currently tracked per-ag via > > the ->agf_btreeblks field in the AGF, which also happens to include > > rmapbt blocks. A global, in-core count of allocbt blocks is required > > to identify the subset of global ->m_fdblocks that consists of > > unavailable blocks currently used for allocation btrees. To support > > this calculation at block reservation time, construct a similar > > global counter for allocbt blocks, populate it on first read of each > > AGF and update it as allocbt blocks are used and released. > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > --- > > fs/xfs/libxfs/xfs_alloc.c | 12 ++++++++++++ > > fs/xfs/libxfs/xfs_alloc_btree.c | 2 ++ > > fs/xfs/xfs_mount.h | 6 ++++++ > > 3 files changed, 20 insertions(+) > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > index aaa19101bb2a..144e2d68245c 100644 > > --- a/fs/xfs/libxfs/xfs_alloc.c > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > @@ -3036,6 +3036,7 @@ xfs_alloc_read_agf( > > struct xfs_agf *agf; /* ag freelist header */ > > struct xfs_perag *pag; /* per allocation group data */ > > int error; > > + uint32_t allocbt_blks; > > > > trace_xfs_alloc_read_agf(mp, agno); > > > > @@ -3066,6 +3067,17 @@ xfs_alloc_read_agf( > > pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); > > pag->pagf_init = 1; > > pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); > > + > > + /* > > + * Update the global in-core allocbt block counter. Filter > > + * rmapbt blocks from the on-disk counter because those are > > + * managed by perag reservation. 
> > + */ > > + if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) { > > As pointed out elsewhere in the thread, agf_rmap_blocks counts the total > number of blocks in the rmapbt (whereas agf_btreeblks counts the number > of non-root blocks in all three free space btrees). Does this need a > change? > > int delta = (int)pag->pagf_btreeblks - (be32_to_cpu(...) - 1); > if (delta > 0) > atomic64_add(delta, &mp->m_allocbt_blks); > Hm yes, this makes more sense. Will fix and update the comment.. Brian > --D > > > + allocbt_blks = pag->pagf_btreeblks - > > + be32_to_cpu(agf->agf_rmap_blocks); > > + atomic64_add(allocbt_blks, &mp->m_allocbt_blks); > > + } > > } > > #ifdef DEBUG > > else if (!XFS_FORCED_SHUTDOWN(mp)) { > > diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c > > index 8e01231b308e..9f5a45f7baed 100644 > > --- a/fs/xfs/libxfs/xfs_alloc_btree.c > > +++ b/fs/xfs/libxfs/xfs_alloc_btree.c > > @@ -71,6 +71,7 @@ xfs_allocbt_alloc_block( > > return 0; > > } > > > > + atomic64_inc(&cur->bc_mp->m_allocbt_blks); > > xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.agno, bno, 1, false); > > > > xfs_trans_agbtree_delta(cur->bc_tp, 1); > > @@ -95,6 +96,7 @@ xfs_allocbt_free_block( > > if (error) > > return error; > > > > + atomic64_dec(&cur->bc_mp->m_allocbt_blks); > > xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1, > > XFS_EXTENT_BUSY_SKIP_DISCARD); > > xfs_trans_agbtree_delta(cur->bc_tp, -1); > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > > index 81829d19596e..bb67274ee23f 100644 > > --- a/fs/xfs/xfs_mount.h > > +++ b/fs/xfs/xfs_mount.h > > @@ -170,6 +170,12 @@ typedef struct xfs_mount { > > * extents or anything related to the rt device. > > */ > > struct percpu_counter m_delalloc_blks; > > + /* > > + * Global count of allocation btree blocks in use across all AGs. Only > > + * used when perag reservation is enabled. 
Helps prevent block > > + * reservation from attempting to reserve allocation btree blocks. > > + */ > > + atomic64_t m_allocbt_blks; > > > > struct radix_tree_root m_perag_tree; /* per-ag accounting info */ > > spinlock_t m_perag_lock; /* lock for m_perag_tree */ > > -- > > 2.26.3 > > >
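Darrick's suggestion in this email avoids the underflow by doing the arithmetic signed, and also corrects for the rmapbt root block: agf_rmap_blocks counts every rmapbt block including the root, while agf_btreeblks counts only non-root blocks. A hedged standalone sketch of that arithmetic (allocbt_delta_signed() is a hypothetical illustration helper, not kernel code):

```c
#include <stdint.h>

/*
 * Sketch of the signed-delta arithmetic suggested in review. The "- 1"
 * subtracts the rmapbt root block, which agf_rmap_blocks counts but
 * pagf_btreeblks does not. Doing the math in a signed 64-bit type
 * makes a negative intermediate result harmless.
 */
static inline int64_t
allocbt_delta_signed(uint32_t btreeblks, uint32_t rmap_blocks)
{
	int64_t delta = (int64_t)btreeblks - ((int64_t)rmap_blocks - 1);

	return delta > 0 ? delta : 0;
}
```

For the fresh-filesystem case (btreeblks = 0, rmap_blocks = 1) the delta is exactly 0 rather than an underflowed unsigned value.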
On Wed, Apr 28, 2021 at 11:01:24AM -0400, Brian Foster wrote: > On Tue, Apr 27, 2021 at 09:15:09PM -0700, Darrick J. Wong wrote: > > On Fri, Apr 23, 2021 at 09:10:49AM -0400, Brian Foster wrote: > > > Introduce an in-core counter to track the sum of all allocbt blocks > > > used by the filesystem. This value is currently tracked per-ag via > > > the ->agf_btreeblks field in the AGF, which also happens to include > > > rmapbt blocks. A global, in-core count of allocbt blocks is required > > > to identify the subset of global ->m_fdblocks that consists of > > > unavailable blocks currently used for allocation btrees. To support > > > this calculation at block reservation time, construct a similar > > > global counter for allocbt blocks, populate it on first read of each > > > AGF and update it as allocbt blocks are used and released. > > > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > > --- > > > fs/xfs/libxfs/xfs_alloc.c | 12 ++++++++++++ > > > fs/xfs/libxfs/xfs_alloc_btree.c | 2 ++ > > > fs/xfs/xfs_mount.h | 6 ++++++ > > > 3 files changed, 20 insertions(+) > > > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > > index aaa19101bb2a..144e2d68245c 100644 > > > --- a/fs/xfs/libxfs/xfs_alloc.c > > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > > @@ -3036,6 +3036,7 @@ xfs_alloc_read_agf( > > > struct xfs_agf *agf; /* ag freelist header */ > > > struct xfs_perag *pag; /* per allocation group data */ > > > int error; > > > + uint32_t allocbt_blks; > > > > > > trace_xfs_alloc_read_agf(mp, agno); > > > > > > @@ -3066,6 +3067,17 @@ xfs_alloc_read_agf( > > > pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); > > > pag->pagf_init = 1; > > > pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); > > > + > > > + /* > > > + * Update the global in-core allocbt block counter. Filter > > > + * rmapbt blocks from the on-disk counter because those are > > > + * managed by perag reservation. 
> > > + */ > > > + if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) { > > > > As pointed out elsewhere in the thread, agf_rmap_blocks counts the total > > number of blocks in the rmapbt (whereas agf_btreeblks counts the number > > of non-root blocks in all three free space btrees). Does this need a > > change? > > > > int delta = (int)pag->pagf_btreeblks - (be32_to_cpu(...) - 1); > > if (delta > 0) > > atomic64_add(delta, &mp->m_allocbt_blks); > > > > Hm yes, this makes more sense. Will fix and update the comment.. >

I ended up with the following:

...
		/*
		 * Update the in-core allocbt counter. Filter out the rmapbt
		 * subset of the btreeblks counter because the rmapbt is managed
		 * by perag reservation. Subtract one for the rmapbt root block
		 * because the rmap counter includes it while the btreeblks
		 * counter only tracks non-root blocks.
		 */
		allocbt_blks = pag->pagf_btreeblks;
		if (xfs_sb_version_hasrmapbt(&mp->m_sb))
			allocbt_blks -= be32_to_cpu(agf->agf_rmap_blocks) - 1;
		if (allocbt_blks > 0)
			atomic64_add(allocbt_blks, &mp->m_allocbt_blks);
...

Any thoughts before I post v5?
Brian > Brian > > > --D > > > > > + allocbt_blks = pag->pagf_btreeblks - > > > + be32_to_cpu(agf->agf_rmap_blocks); > > > + atomic64_add(allocbt_blks, &mp->m_allocbt_blks); > > > + } > > > } > > > #ifdef DEBUG > > > else if (!XFS_FORCED_SHUTDOWN(mp)) { > > > diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c > > > index 8e01231b308e..9f5a45f7baed 100644 > > > --- a/fs/xfs/libxfs/xfs_alloc_btree.c > > > +++ b/fs/xfs/libxfs/xfs_alloc_btree.c > > > @@ -71,6 +71,7 @@ xfs_allocbt_alloc_block( > > > return 0; > > > } > > > > > > + atomic64_inc(&cur->bc_mp->m_allocbt_blks); > > > xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.agno, bno, 1, false); > > > > > > xfs_trans_agbtree_delta(cur->bc_tp, 1); > > > @@ -95,6 +96,7 @@ xfs_allocbt_free_block( > > > if (error) > > > return error; > > > > > > + atomic64_dec(&cur->bc_mp->m_allocbt_blks); > > > xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1, > > > XFS_EXTENT_BUSY_SKIP_DISCARD); > > > xfs_trans_agbtree_delta(cur->bc_tp, -1); > > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > > > index 81829d19596e..bb67274ee23f 100644 > > > --- a/fs/xfs/xfs_mount.h > > > +++ b/fs/xfs/xfs_mount.h > > > @@ -170,6 +170,12 @@ typedef struct xfs_mount { > > > * extents or anything related to the rt device. > > > */ > > > struct percpu_counter m_delalloc_blks; > > > + /* > > > + * Global count of allocation btree blocks in use across all AGs. Only > > > + * used when perag reservation is enabled. Helps prevent block > > > + * reservation from attempting to reserve allocation btree blocks. > > > + */ > > > + atomic64_t m_allocbt_blks; > > > > > > struct radix_tree_root m_perag_tree; /* per-ag accounting info */ > > > spinlock_t m_perag_lock; /* lock for m_perag_tree */ > > > -- > > > 2.26.3 > > > > >
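The v5 logic Brian settled on above only adjusts for the rmapbt when the feature is actually enabled on the filesystem. A standalone sketch of that computation, where the `has_rmapbt` parameter stands in for the xfs_sb_version_hasrmapbt() superblock check and allocbt_blks_v5() is a hypothetical helper for illustration:

```c
#include <stdbool.h>
#include <stdint.h>

/*
 * Sketch of the v5 computation: start from pagf_btreeblks and, only when
 * the rmapbt feature is enabled, subtract the rmapbt block count minus
 * one for its root block (counted by agf_rmap_blocks but not by
 * pagf_btreeblks). A non-positive result contributes nothing to the
 * global m_allocbt_blks counter.
 */
static inline int64_t
allocbt_blks_v5(uint32_t btreeblks, uint32_t rmap_blocks, bool has_rmapbt)
{
	int64_t allocbt_blks = btreeblks;

	if (has_rmapbt)
		allocbt_blks -= (int64_t)rmap_blocks - 1;

	return allocbt_blks > 0 ? allocbt_blks : 0;
}
```

This handles both review findings: the fresh-filesystem case (0 btreeblks, 1 rmap block) yields 0, and filesystems without an rmapbt skip the adjustment entirely.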
On Wed, Apr 28, 2021 at 11:29:57AM -0400, Brian Foster wrote: > On Wed, Apr 28, 2021 at 11:01:24AM -0400, Brian Foster wrote: > > On Tue, Apr 27, 2021 at 09:15:09PM -0700, Darrick J. Wong wrote: > > > On Fri, Apr 23, 2021 at 09:10:49AM -0400, Brian Foster wrote: > > > > Introduce an in-core counter to track the sum of all allocbt blocks > > > > used by the filesystem. This value is currently tracked per-ag via > > > > the ->agf_btreeblks field in the AGF, which also happens to include > > > > rmapbt blocks. A global, in-core count of allocbt blocks is required > > > > to identify the subset of global ->m_fdblocks that consists of > > > > unavailable blocks currently used for allocation btrees. To support > > > > this calculation at block reservation time, construct a similar > > > > global counter for allocbt blocks, populate it on first read of each > > > > AGF and update it as allocbt blocks are used and released. > > > > > > > > Signed-off-by: Brian Foster <bfoster@redhat.com> > > > > --- > > > > fs/xfs/libxfs/xfs_alloc.c | 12 ++++++++++++ > > > > fs/xfs/libxfs/xfs_alloc_btree.c | 2 ++ > > > > fs/xfs/xfs_mount.h | 6 ++++++ > > > > 3 files changed, 20 insertions(+) > > > > > > > > diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c > > > > index aaa19101bb2a..144e2d68245c 100644 > > > > --- a/fs/xfs/libxfs/xfs_alloc.c > > > > +++ b/fs/xfs/libxfs/xfs_alloc.c > > > > @@ -3036,6 +3036,7 @@ xfs_alloc_read_agf( > > > > struct xfs_agf *agf; /* ag freelist header */ > > > > struct xfs_perag *pag; /* per allocation group data */ > > > > int error; > > > > + uint32_t allocbt_blks; > > > > > > > > trace_xfs_alloc_read_agf(mp, agno); > > > > > > > > @@ -3066,6 +3067,17 @@ xfs_alloc_read_agf( > > > > pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level); > > > > pag->pagf_init = 1; > > > > pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf); > > > > + > > > > + /* > > > > + * Update the global in-core allocbt block counter. 
Filter > > > > + * rmapbt blocks from the on-disk counter because those are > > > > + * managed by perag reservation. > > > > + */ > > > > + if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) { > > > > > > As pointed out elsewhere in the thread, agf_rmap_blocks counts the total > > > number of blocks in the rmapbt (whereas agf_btreeblks counts the number > > > of non-root blocks in all three free space btrees). Does this need a > > > change? > > > > > > int delta = (int)pag->pagf_btreeblks - (be32_to_cpu(...) - 1); > > > if (delta > 0) > > > atomic64_add(delta, &mp->m_allocbt_blks); > > > > > > > Hm yes, this makes more sense. Will fix and update the comment.. > > > > I ended up with the following: > > ... > /* > * Update the in-core allocbt counter. Filter out the rmapbt > * subset of the btreeblks counter because the rmapbt is managed > * by perag reservation. Subtract one for the rmapbt root block > * because the rmap counter includes it while the btreeblks > * counter only tracks non-root blocks. > */ > allocbt_blks = pag->pagf_btreeblks; > if (xfs_sb_version_hasrmapbt(&mp->m_sb)) > allocbt_blks -= be32_to_cpu(agf->agf_rmap_blocks) - 1; > if (allocbt_blks > 0) > atomic64_add(allocbt_blks, &mp->m_allocbt_blks); > ... > > Any thoughts before I post v5? Looks reasonable to me. 
:) --D > Brian > > > Brian > > > > > --D > > > > > > > + allocbt_blks = pag->pagf_btreeblks - > > > > + be32_to_cpu(agf->agf_rmap_blocks); > > > > + atomic64_add(allocbt_blks, &mp->m_allocbt_blks); > > > > + } > > > > } > > > > #ifdef DEBUG > > > > else if (!XFS_FORCED_SHUTDOWN(mp)) { > > > > diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c > > > > index 8e01231b308e..9f5a45f7baed 100644 > > > > --- a/fs/xfs/libxfs/xfs_alloc_btree.c > > > > +++ b/fs/xfs/libxfs/xfs_alloc_btree.c > > > > @@ -71,6 +71,7 @@ xfs_allocbt_alloc_block( > > > > return 0; > > > > } > > > > > > > > + atomic64_inc(&cur->bc_mp->m_allocbt_blks); > > > > xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.agno, bno, 1, false); > > > > > > > > xfs_trans_agbtree_delta(cur->bc_tp, 1); > > > > @@ -95,6 +96,7 @@ xfs_allocbt_free_block( > > > > if (error) > > > > return error; > > > > > > > > + atomic64_dec(&cur->bc_mp->m_allocbt_blks); > > > > xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1, > > > > XFS_EXTENT_BUSY_SKIP_DISCARD); > > > > xfs_trans_agbtree_delta(cur->bc_tp, -1); > > > > diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h > > > > index 81829d19596e..bb67274ee23f 100644 > > > > --- a/fs/xfs/xfs_mount.h > > > > +++ b/fs/xfs/xfs_mount.h > > > > @@ -170,6 +170,12 @@ typedef struct xfs_mount { > > > > * extents or anything related to the rt device. > > > > */ > > > > struct percpu_counter m_delalloc_blks; > > > > + /* > > > > + * Global count of allocation btree blocks in use across all AGs. Only > > > > + * used when perag reservation is enabled. Helps prevent block > > > > + * reservation from attempting to reserve allocation btree blocks. > > > > + */ > > > > + atomic64_t m_allocbt_blks; > > > > > > > > struct radix_tree_root m_perag_tree; /* per-ag accounting info */ > > > > spinlock_t m_perag_lock; /* lock for m_perag_tree */ > > > > -- > > > > 2.26.3 > > > > > > > >
diff --git a/fs/xfs/libxfs/xfs_alloc.c b/fs/xfs/libxfs/xfs_alloc.c
index aaa19101bb2a..144e2d68245c 100644
--- a/fs/xfs/libxfs/xfs_alloc.c
+++ b/fs/xfs/libxfs/xfs_alloc.c
@@ -3036,6 +3036,7 @@ xfs_alloc_read_agf(
 	struct xfs_agf		*agf;		/* ag freelist header */
 	struct xfs_perag	*pag;		/* per allocation group data */
 	int			error;
+	uint32_t		allocbt_blks;
 
 	trace_xfs_alloc_read_agf(mp, agno);
 
@@ -3066,6 +3067,17 @@ xfs_alloc_read_agf(
 		pag->pagf_refcount_level = be32_to_cpu(agf->agf_refcount_level);
 		pag->pagf_init = 1;
 		pag->pagf_agflreset = xfs_agfl_needs_reset(mp, agf);
+
+		/*
+		 * Update the global in-core allocbt block counter. Filter
+		 * rmapbt blocks from the on-disk counter because those are
+		 * managed by perag reservation.
+		 */
+		if (pag->pagf_btreeblks > be32_to_cpu(agf->agf_rmap_blocks)) {
+			allocbt_blks = pag->pagf_btreeblks -
+					be32_to_cpu(agf->agf_rmap_blocks);
+			atomic64_add(allocbt_blks, &mp->m_allocbt_blks);
+		}
 	}
 #ifdef DEBUG
 	else if (!XFS_FORCED_SHUTDOWN(mp)) {
diff --git a/fs/xfs/libxfs/xfs_alloc_btree.c b/fs/xfs/libxfs/xfs_alloc_btree.c
index 8e01231b308e..9f5a45f7baed 100644
--- a/fs/xfs/libxfs/xfs_alloc_btree.c
+++ b/fs/xfs/libxfs/xfs_alloc_btree.c
@@ -71,6 +71,7 @@ xfs_allocbt_alloc_block(
 		return 0;
 	}
 
+	atomic64_inc(&cur->bc_mp->m_allocbt_blks);
 	xfs_extent_busy_reuse(cur->bc_mp, cur->bc_ag.agno, bno, 1, false);
 
 	xfs_trans_agbtree_delta(cur->bc_tp, 1);
@@ -95,6 +96,7 @@ xfs_allocbt_free_block(
 	if (error)
 		return error;
 
+	atomic64_dec(&cur->bc_mp->m_allocbt_blks);
 	xfs_extent_busy_insert(cur->bc_tp, be32_to_cpu(agf->agf_seqno), bno, 1,
 			      XFS_EXTENT_BUSY_SKIP_DISCARD);
 	xfs_trans_agbtree_delta(cur->bc_tp, -1);
diff --git a/fs/xfs/xfs_mount.h b/fs/xfs/xfs_mount.h
index 81829d19596e..bb67274ee23f 100644
--- a/fs/xfs/xfs_mount.h
+++ b/fs/xfs/xfs_mount.h
@@ -170,6 +170,12 @@ typedef struct xfs_mount {
 	 * extents or anything related to the rt device.
 	 */
 	struct percpu_counter	m_delalloc_blks;
+	/*
+	 * Global count of allocation btree blocks in use across all AGs. Only
+	 * used when perag reservation is enabled. Helps prevent block
+	 * reservation from attempting to reserve allocation btree blocks.
+	 */
+	atomic64_t		m_allocbt_blks;
 
 	struct radix_tree_root	m_perag_tree;	/* per-ag accounting info */
 	spinlock_t		m_perag_lock;	/* lock for m_perag_tree */
Introduce an in-core counter to track the sum of all allocbt blocks used by the filesystem. This value is currently tracked per-ag via the ->agf_btreeblks field in the AGF, which also happens to include rmapbt blocks. A global, in-core count of allocbt blocks is required to identify the subset of global ->m_fdblocks that consists of unavailable blocks currently used for allocation btrees. To support this calculation at block reservation time, construct a similar global counter for allocbt blocks, populate it on first read of each AGF and update it as allocbt blocks are used and released.

Signed-off-by: Brian Foster <bfoster@redhat.com>
---
 fs/xfs/libxfs/xfs_alloc.c       | 12 ++++++++++++
 fs/xfs/libxfs/xfs_alloc_btree.c |  2 ++
 fs/xfs/xfs_mount.h              |  6 ++++++
 3 files changed, 20 insertions(+)