Message ID: 20240118222216.4131379-4-david@fromorbit.com (mailing list archive)
State:      New
Series:     xfs: use large folios for buffers
On Fri, Jan 19, 2024 at 09:19:41AM +1100, Dave Chinner wrote:
> From: Dave Chinner <dchinner@redhat.com>
>
> Now that we have the buffer cache using the folio API, we can extend
> the use of folios to allocate high order folios for multi-page
> buffers rather than an array of single pages that are then vmapped
> into a contiguous range.
>
> This creates two types of buffers: single folio buffers that can
> have arbitrary order, and multi-folio buffers made up of many single
> page folios that get vmapped. The latter is essentially the existing
> code, so there are no logic changes to handle this case.
>
> There are a few places where we iterate the folios on a buffer.
> These need to be converted to handle the high order folio case.
> Luckily, this only occurs when bp->b_folio_count == 1, and the code
> for handling this case is just a simple application of the folio API
> to the operations that need to be performed.
>
> The code that allocates buffers will optimistically attempt a high
> order folio allocation as a fast path. If this high order allocation
> fails, then we fall back to the existing multi-folio allocation
> code. This now forms the slow allocation path, and hopefully will be
> largely unused in normal conditions.
>
> This should improve performance of large buffer operations (e.g.
> large directory block sizes) as we should now mostly avoid the
> expense of vmapping large buffers (and the vmap lock contention that
> can occur) as well as avoid the runtime pressure that frequently
> accessing kernel vmapped pages puts on the TLBs.
>
> Signed-off-by: Dave Chinner <dchinner@redhat.com>
> ---
>  fs/xfs/xfs_buf.c | 150 +++++++++++++++++++++++++++++++++++++----------
>  1 file changed, 119 insertions(+), 31 deletions(-)
>
> diff --git a/fs/xfs/xfs_buf.c b/fs/xfs/xfs_buf.c
> index 15907e92d0d3..df363f17ea1a 100644
> --- a/fs/xfs/xfs_buf.c
> +++ b/fs/xfs/xfs_buf.c
> @@ -74,6 +74,10 @@ xfs_buf_is_vmapped(
>  	return bp->b_addr && bp->b_folio_count > 1;
>  }
>
> +/*
> + * See comment above xfs_buf_alloc_folios() about the constraints placed on
> + * allocating vmapped buffers.
> + */
>  static inline int
>  xfs_buf_vmap_len(
>  	struct xfs_buf	*bp)
> @@ -344,14 +348,72 @@ xfs_buf_alloc_kmem(
>  		bp->b_addr = NULL;
>  		return -ENOMEM;
>  	}
> -	bp->b_offset = offset_in_page(bp->b_addr);
>  	bp->b_folios = bp->b_folio_array;
>  	bp->b_folios[0] = kmem_to_folio(bp->b_addr);
> +	bp->b_offset = offset_in_folio(bp->b_folios[0], bp->b_addr);
>  	bp->b_folio_count = 1;
>  	bp->b_flags |= _XBF_KMEM;
>  	return 0;
>  }
>
> +/*
> + * Allocating a high order folio makes the assumption that buffers are a
> + * power-of-2 size so that ilog2() returns the exact order needed to fit
> + * the contents of the buffer. Buffer lengths are mostly a power of two,
> + * so this is not an unreasonable approach to take by default.
> + *
> + * The exceptions here are user xattr data buffers, which can be arbitrarily
> + * sized up to 64kB plus structure metadata. In that case, round up the order.
> + */
> +static bool
> +xfs_buf_alloc_folio(
> +	struct xfs_buf	*bp,
> +	gfp_t		gfp_mask)
> +{
> +	int		length = BBTOB(bp->b_length);
> +	int		order;
> +
> +	order = ilog2(length);
> +	if ((1 << order) < length)
> +		order = ilog2(length - 1) + 1;
> +
> +	if (order <= PAGE_SHIFT)
> +		order = 0;
> +	else
> +		order -= PAGE_SHIFT;
> +
> +	bp->b_folio_array[0] = folio_alloc(gfp_mask, order);
> +	if (!bp->b_folio_array[0])
> +		return false;
> +
> +	bp->b_folios = bp->b_folio_array;
> +	bp->b_folio_count = 1;
> +	bp->b_flags |= _XBF_FOLIOS;
> +	return true;

Hmm. So I guess with this patchset, either we get one multi-page large
folio for the whole buffer, or we fall back to the array of basepage
sized folios?

/me wonders if the extra flexibility from alloc_folio_bulk_array and a
folioized vm_map_ram would eliminate all this special casing?

Uhoh, lights flickering again and ice crashing off the roof. I better
go before the lights go out again. :(

--D

> +}
> +
> +/*
> + * When we allocate folios for a buffer, we end up with one of two types of
> + * buffer.
> + *
> + * The first type is a single folio buffer - this may be a high order
> + * folio or just a single page sized folio, but either way they get treated the
> + * same way by the rest of the code - the buffer memory spans a single
> + * contiguous memory region that we don't have to map and unmap to access the
> + * data directly.
> + *
> + * The second type of buffer is the multi-folio buffer. These are *always* made
> + * up of single page folios so that they can be fed to vm_map_ram() to return a
> + * contiguous memory region we can access the data through, or mark it as
> + * XBF_UNMAPPED and access the data directly through individual folio_address()
> + * calls.
> + *
> + * We don't use high order folios for this second type of buffer (yet) because
> + * having variable size folios makes offset-to-folio indexing and iteration of
> + * the data range more complex than if they are fixed size. This case should now
> + * be the slow path, though, so unless we regularly fail to allocate high order
> + * folios, there should be little need to optimise this path.
> + */
>  static int
>  xfs_buf_alloc_folios(
>  	struct xfs_buf	*bp,
> @@ -363,7 +425,15 @@ xfs_buf_alloc_folios(
>  	if (flags & XBF_READ_AHEAD)
>  		gfp_mask |= __GFP_NORETRY;
>
> -	/* Make sure that we have a page list */
> +	/* Assure zeroed buffer for non-read cases. */
> +	if (!(flags & XBF_READ))
> +		gfp_mask |= __GFP_ZERO;
> +
> +	/* Optimistically attempt a single high order folio allocation. */
> +	if (xfs_buf_alloc_folio(bp, gfp_mask))
> +		return 0;
> +
> +	/* Fall back to allocating an array of single page folios. */
>  	bp->b_folio_count = DIV_ROUND_UP(BBTOB(bp->b_length), PAGE_SIZE);
>  	if (bp->b_folio_count <= XB_FOLIOS) {
>  		bp->b_folios = bp->b_folio_array;
> @@ -375,9 +445,6 @@ xfs_buf_alloc_folios(
>  	}
>  	bp->b_flags |= _XBF_FOLIOS;
>
> -	/* Assure zeroed buffer for non-read cases. */
> -	if (!(flags & XBF_READ))
> -		gfp_mask |= __GFP_ZERO;
>
>  	/*
>  	 * Bulk filling of pages can take multiple calls. Not filling the entire
> @@ -418,7 +485,7 @@ _xfs_buf_map_folios(
>  {
>  	ASSERT(bp->b_flags & _XBF_FOLIOS);
>  	if (bp->b_folio_count == 1) {
> -		/* A single page buffer is always mappable */
> +		/* A single folio buffer is always mappable */
>  		bp->b_addr = folio_address(bp->b_folios[0]);
>  	} else if (flags & XBF_UNMAPPED) {
>  		bp->b_addr = NULL;
> @@ -1465,20 +1532,28 @@ xfs_buf_ioapply_map(
>  	int		*count,
>  	blk_opf_t	op)
>  {
> -	int		page_index;
> -	unsigned int	total_nr_pages = bp->b_folio_count;
> -	int		nr_pages;
> +	int		folio_index;
> +	unsigned int	total_nr_folios = bp->b_folio_count;
> +	int		nr_folios;
>  	struct bio	*bio;
>  	sector_t	sector = bp->b_maps[map].bm_bn;
>  	int		size;
>  	int		offset;
>
> -	/* skip the pages in the buffer before the start offset */
> -	page_index = 0;
> +	/*
> +	 * If the start offset is larger than a single page, we need to be
> +	 * careful. We might have a high order folio, in which case the indexing
> +	 * is from the start of the buffer. However, if we have more than one
> +	 * single page folio in the buffer, we need to skip the folios in
> +	 * the buffer before the start offset.
> +	 */
> +	folio_index = 0;
>  	offset = *buf_offset;
> -	while (offset >= PAGE_SIZE) {
> -		page_index++;
> -		offset -= PAGE_SIZE;
> +	if (bp->b_folio_count > 1) {
> +		while (offset >= PAGE_SIZE) {
> +			folio_index++;
> +			offset -= PAGE_SIZE;
> +		}
>  	}
>
>  	/*
> @@ -1491,28 +1566,28 @@ xfs_buf_ioapply_map(
>
>  next_chunk:
>  	atomic_inc(&bp->b_io_remaining);
> -	nr_pages = bio_max_segs(total_nr_pages);
> +	nr_folios = bio_max_segs(total_nr_folios);
>
> -	bio = bio_alloc(bp->b_target->bt_bdev, nr_pages, op, GFP_NOIO);
> +	bio = bio_alloc(bp->b_target->bt_bdev, nr_folios, op, GFP_NOIO);
>  	bio->bi_iter.bi_sector = sector;
>  	bio->bi_end_io = xfs_buf_bio_end_io;
>  	bio->bi_private = bp;
>
> -	for (; size && nr_pages; nr_pages--, page_index++) {
> -		int	rbytes, nbytes = PAGE_SIZE - offset;
> +	for (; size && nr_folios; nr_folios--, folio_index++) {
> +		struct folio	*folio = bp->b_folios[folio_index];
> +		int		nbytes = folio_size(folio) - offset;
>
>  		if (nbytes > size)
>  			nbytes = size;
>
> -		rbytes = bio_add_folio(bio, bp->b_folios[page_index], nbytes,
> -				offset);
> -		if (rbytes < nbytes)
> +		if (!bio_add_folio(bio, folio, nbytes,
> +				offset_in_folio(folio, offset)))
>  			break;
>
>  		offset = 0;
>  		sector += BTOBB(nbytes);
>  		size -= nbytes;
> -		total_nr_pages--;
> +		total_nr_folios--;
>  	}
>
>  	if (likely(bio->bi_iter.bi_size)) {
> @@ -1722,6 +1797,13 @@ xfs_buf_offset(
>  	if (bp->b_addr)
>  		return bp->b_addr + offset;
>
> +	/* Single folio buffers may use large folios. */
> +	if (bp->b_folio_count == 1) {
> +		folio = bp->b_folios[0];
> +		return folio_address(folio) + offset_in_folio(folio, offset);
> +	}
> +
> +	/* Multi-folio buffers always use PAGE_SIZE folios */
>  	folio = bp->b_folios[offset >> PAGE_SHIFT];
>  	return folio_address(folio) + (offset & (PAGE_SIZE-1));
>  }
> @@ -1737,18 +1819,24 @@ xfs_buf_zero(
>  	bend = boff + bsize;
>  	while (boff < bend) {
>  		struct folio	*folio;
> -		int		page_index, page_offset, csize;
> +		int		folio_index, folio_offset, csize;
>
> -		page_index = (boff + bp->b_offset) >> PAGE_SHIFT;
> -		page_offset = (boff + bp->b_offset) & ~PAGE_MASK;
> -		folio = bp->b_folios[page_index];
> -		csize = min_t(size_t, PAGE_SIZE - page_offset,
> +		/* Single folio buffers may use large folios. */
> +		if (bp->b_folio_count == 1) {
> +			folio = bp->b_folios[0];
> +			folio_offset = offset_in_folio(folio,
> +						bp->b_offset + boff);
> +		} else {
> +			folio_index = (boff + bp->b_offset) >> PAGE_SHIFT;
> +			folio_offset = (boff + bp->b_offset) & ~PAGE_MASK;
> +			folio = bp->b_folios[folio_index];
> +		}
> +
> +		csize = min_t(size_t, folio_size(folio) - folio_offset,
>  			      BBTOB(bp->b_length) - boff);
> +		ASSERT((csize + folio_offset) <= folio_size(folio));
>
> -		ASSERT((csize + page_offset) <= PAGE_SIZE);
> -
> -		memset(folio_address(folio) + page_offset, 0, csize);
> -
> +		memset(folio_address(folio) + folio_offset, 0, csize);
>  		boff += csize;
>  	}
>  }
> --
> 2.43.0
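[Editor's note: to make the rounding in the quoted xfs_buf_alloc_folio()
concrete, here is a minimal userspace sketch of the same order computation.
The helper names and the 4kB PAGE_SHIFT are assumptions for illustration;
this is not kernel code.]

#include <stdio.h>

#define PAGE_SHIFT	12	/* assume 4kB pages */

/* floor(log2(n)) for runtime values, mirroring the kernel's ilog2() */
static int ilog2_rt(unsigned int n)
{
	int log = -1;

	while (n) {
		log++;
		n >>= 1;
	}
	return log;
}

/* the order computation from the patch, lifted verbatim */
static int buf_alloc_order(int length)
{
	int order = ilog2_rt(length);

	/* round up non-power-of-2 lengths (e.g. odd-sized xattr buffers) */
	if ((1 << order) < length)
		order = ilog2_rt(length - 1) + 1;

	/* convert log2(bytes) to a page allocation order */
	if (order <= PAGE_SHIFT)
		return 0;
	return order - PAGE_SHIFT;
}

int main(void)
{
	/* prints "0 4 5": 4kB -> order 0, 64kB -> order 4, 68kB -> order 5 */
	printf("%d %d %d\n", buf_alloc_order(4096),
			buf_alloc_order(65536), buf_alloc_order(68 << 10));
	return 0;
}

[So a 64kB directory block gets exactly an order-4 folio, while a 68kB
xattr buffer rounds up to an order-5 (128kB) allocation, which is the
round-up case the comment above the function describes.]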
On Thu, Jan 18, 2024 at 05:31:00PM -0800, Darrick J. Wong wrote:
> On Fri, Jan 19, 2024 at 09:19:41AM +1100, Dave Chinner wrote:
> > +/*
> > + * Allocating a high order folio makes the assumption that buffers are a
> > + * power-of-2 size so that ilog2() returns the exact order needed to fit
> > + * the contents of the buffer. Buffer lengths are mostly a power of two,
> > + * so this is not an unreasonable approach to take by default.
> > + *
> > + * The exceptions here are user xattr data buffers, which can be arbitrarily
> > + * sized up to 64kB plus structure metadata. In that case, round up the order.
> > + */
> > +static bool
> > +xfs_buf_alloc_folio(
> > +	struct xfs_buf	*bp,
> > +	gfp_t		gfp_mask)
> > +{
> > +	int		length = BBTOB(bp->b_length);
> > +	int		order;
> > +
> > +	order = ilog2(length);
> > +	if ((1 << order) < length)
> > +		order = ilog2(length - 1) + 1;
> > +
> > +	if (order <= PAGE_SHIFT)
> > +		order = 0;
> > +	else
> > +		order -= PAGE_SHIFT;
> > +
> > +	bp->b_folio_array[0] = folio_alloc(gfp_mask, order);
> > +	if (!bp->b_folio_array[0])
> > +		return false;
> > +
> > +	bp->b_folios = bp->b_folio_array;
> > +	bp->b_folio_count = 1;
> > +	bp->b_flags |= _XBF_FOLIOS;
> > +	return true;
>
> Hmm. So I guess with this patchset, either we get one multi-page large
> folio for the whole buffer, or we fall back to the array of basepage
> sized folios?

Yes.

> /me wonders if the extra flexibility from alloc_folio_bulk_array and a
> folioized vm_map_ram would eliminate all this special casing?

IMO, no. The basic requirement for a buffer is to provide contiguous
memory space unless XBF_UNMAPPED is specified.

In the case of contiguous space, we either get a single contiguous
range allocated or we have to vmap multiple segments. The moment we
can't get a contiguous memory range, we've lost, we're in the slow
path and we should just do what we know will reliably work as
efficiently as possible.

In the case of XBF_UNMAPPED, if we get a single contiguous range we
can ignore XBF_UNMAPPED, otherwise we should just do what we know
will reliably work as efficiently as possible.

Hence I don't think it's worth trying to optimise the "we didn't get
a contiguous memory allocation for the buffer" case for smaller
multi-page folios, because it just adds complexity and doesn't
provide any real advantage over the PAGE_SIZE allocation path we do
now.

> Uhoh, lights flickering again and ice crashing off the roof. I better
> go before the lights go out again. :(

Fun fun!

-Dave.
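[Editor's note: a sketch of the indexing trade-off Dave describes. With
fixed PAGE_SIZE folios the offset-to-folio lookup is constant-time
arithmetic, as xfs_buf_offset() in the patch does; variable size folios
would need a linear walk or an extra index. These are hypothetical
helpers written for illustration, not the kernel implementation.]

/* Fixed PAGE_SIZE folios: O(1) index arithmetic. */
static struct folio *
folio_for_offset_fixed(
	struct xfs_buf	*bp,
	size_t		offset)
{
	return bp->b_folios[offset >> PAGE_SHIFT];
}

/* Variable size folios: no direct index, so walk the array. */
static struct folio *
folio_for_offset_variable(
	struct xfs_buf	*bp,
	size_t		offset)
{
	int		i;

	for (i = 0; i < bp->b_folio_count; i++) {
		if (offset < folio_size(bp->b_folios[i]))
			return bp->b_folios[i];
		offset -= folio_size(bp->b_folios[i]);
	}
	return NULL;
}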
> +	int		length = BBTOB(bp->b_length);
> +	int		order;
> +
> +	order = ilog2(length);
> +	if ((1 << order) < length)
> +		order = ilog2(length - 1) + 1;
> +
> +	if (order <= PAGE_SHIFT)
> +		order = 0;
> +	else
> +		order -= PAGE_SHIFT;

Shouldn't this simply use get_order()?
On Sun, Jan 21, 2024 at 10:45:55PM -0800, Christoph Hellwig wrote:
> > +	int		length = BBTOB(bp->b_length);
> > +	int		order;
> > +
> > +	order = ilog2(length);
> > +	if ((1 << order) < length)
> > +		order = ilog2(length - 1) + 1;
> > +
> > +	if (order <= PAGE_SHIFT)
> > +		order = 0;
> > +	else
> > +		order -= PAGE_SHIFT;
>
> Shouldn't this simply use get_order()?

Huh. Yes, it should. I went looking for a helper and didn't find one
in the mm or folio code. Now you point it out, I find that it is in
its own asm header (include/asm-generic/getorder.h) so it's no wonder
I didn't find it.

Why is it in include/asm-generic? There's nothing asm related to that
function or its implementation....

-Dave.
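[Editor's note: get_order() (declared in include/asm-generic/getorder.h,
normally reached via asm/page.h) rounds a byte size up to a power-of-2
page allocation order, which is exactly the computation open-coded in the
patch. A sketch of the simplification Christoph is suggesting, written
against the patch above and untested:]

static bool
xfs_buf_alloc_folio(
	struct xfs_buf	*bp,
	gfp_t		gfp_mask)
{
	/* get_order() rounds up, covering the odd-sized xattr case too */
	int		order = get_order(BBTOB(bp->b_length));

	bp->b_folio_array[0] = folio_alloc(gfp_mask, order);
	if (!bp->b_folio_array[0])
		return false;

	bp->b_folios = bp->b_folio_array;
	bp->b_folio_count = 1;
	bp->b_flags |= _XBF_FOLIOS;
	return true;
}

[On 4kB pages, get_order(4096) is 0 and get_order(68 << 10) is 5,
matching the open-coded ilog2() version in both the exact power-of-2
and the round-up cases.]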