Message ID | 51905c4fcb222e14a1d5cb676364c1b4f177f582.1607477897.git.asml.silence@gmail.com (mailing list archive)
---|---
State | New, archived
Series | nocopy bvec for direct IO
> +	/*
> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> +	 * approximate it by looking at the first page and inducing it to the
> +	 * whole bio
> +	 */
> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> +		bio_set_flag(bio, BIO_WORKINGSET);

IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
at all for direct I/O.

> +	bio_set_flag(bio, BIO_NO_PAGE_REF);
> +
> +	iter->count = 0;
> +	return 0;
> +}

This helper should go into bio.c, next to bio_iov_iter_get_pages.
And please convert the other callers of bio_iov_iter_get_pages to this
scheme as well.
On 09/12/2020 08:40, Christoph Hellwig wrote:
>> +	/*
>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>> +	 * approximate it by looking at the first page and inducing it to the
>> +	 * whole bio
>> +	 */
>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>> +		bio_set_flag(bio, BIO_WORKINGSET);
>
> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> at all for direct I/O.

I was waiting for the conversation to unfold, i.e. for Johannes to
answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
iomap?

>
>> +	bio_set_flag(bio, BIO_NO_PAGE_REF);
>> +
>> +	iter->count = 0;
>> +	return 0;
>> +}
>
> This helper should go into bio.c, next to bio_iov_iter_get_pages.
> And please convert the other callers of bio_iov_iter_get_pages to this
> scheme as well.

Agree. In the end I want to merge that into bio_iov_iter_get_pages().
On 09/12/2020 12:05, Christoph Hellwig wrote:
> On Wed, Dec 09, 2020 at 12:01:27PM +0000, Pavel Begunkov wrote:
>> On 09/12/2020 08:40, Christoph Hellwig wrote:
>>>> +	/*
>>>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>>> +	 * approximate it by looking at the first page and inducing it to the
>>>> +	 * whole bio
>>>> +	 */
>>>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>>> +		bio_set_flag(bio, BIO_WORKINGSET);
>>>
>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>>> at all for direct I/O.
>>
>> I was waiting for the conversation to unfold, i.e. for Johannes to
>> answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
>> iomap?
>
> iomap direct I/O: yes.

That one, got it
On Wed, Dec 09, 2020 at 12:01:27PM +0000, Pavel Begunkov wrote:
> On 09/12/2020 08:40, Christoph Hellwig wrote:
> >> +	/*
> >> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >> +	 * approximate it by looking at the first page and inducing it to the
> >> +	 * whole bio
> >> +	 */
> >> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >> +		bio_set_flag(bio, BIO_WORKINGSET);
> >
> > IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> > at all for direct I/O.
>
> I was waiting for the conversation to unfold, i.e. for Johannes to
> answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
> iomap?

iomap direct I/O: yes.
From: Pavel Begunkov
> Sent: 09 December 2020 02:20
>
> The block layer spends quite a while in blkdev_direct_IO() to copy and
> initialise bio's bvec. However, if we've already got a bvec in the input
> iterator it might be reused in some cases, i.e. when new
> ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
> performance boost, and it also reduces memory footprint.
...
> @@ -398,7 +422,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
>  	bio->bi_end_io = blkdev_bio_end_io;
>  	bio->bi_ioprio = iocb->ki_ioprio;
>
> -	ret = bio_iov_iter_get_pages(bio, iter);
> +	if (iov_iter_is_bvec(iter) && iov_iter_bvec_fixed(iter))
> +		ret = bio_iov_fixed_bvec_get_pages(bio, iter);
> +	else
> +		ret = bio_iov_iter_get_pages(bio, iter);
> +

Is it necessary to check iov_iter_is_bvec() as well as
iov_iter_bvec_fixed() ?

If so it is probably worth using & not && so the compiler stands
a chance of generating a & (B | C) == B instead of 2 conditionals.
(I think I saw the bits being tested in the same field.)

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> > +	/*
> > +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> > +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> > +	 * approximate it by looking at the first page and inducing it to the
> > +	 * whole bio
> > +	 */
> > +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> > +		bio_set_flag(bio, BIO_WORKINGSET);
>
> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> at all for direct I/O.

Yes, this hunk is incorrect. We must not use this flag for direct IO.
It's only for paging IO, when you bring in the data at page->mapping +
page->index. Otherwise you tell the pressure accounting code that you
are paging in a thrashing page, when really you're just reading new
data into a page frame that happens to be hot.

(As per the other thread, bio_add_page() currently makes that same
mistake for direct IO. I'm fixing that.)
On 11/12/2020 14:06, Johannes Weiner wrote:
> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
>>> +	/*
>>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>> +	 * approximate it by looking at the first page and inducing it to the
>>> +	 * whole bio
>>> +	 */
>>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>> +		bio_set_flag(bio, BIO_WORKINGSET);
>>
>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>> at all for direct I/O.
>
> Yes, this hunk is incorrect. We must not use this flag for direct IO.
> It's only for paging IO, when you bring in the data at page->mapping +
> page->index. Otherwise you tell the pressure accounting code that you
> are paging in a thrashing page, when really you're just reading new
> data into a page frame that happens to be hot.
>
> (As per the other thread, bio_add_page() currently makes that same
> mistake for direct IO. I'm fixing that.)

I have that stuff fixed, it just didn't go into the RFC. That's basically
removing replacing add_page() with its version without BIO_WORKINGSET
in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
some.
On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
> On 11/12/2020 14:06, Johannes Weiner wrote:
> > On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> >>> +	/*
> >>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >>> +	 * approximate it by looking at the first page and inducing it to the
> >>> +	 * whole bio
> >>> +	 */
> >>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >>> +		bio_set_flag(bio, BIO_WORKINGSET);
> >>
> >> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> >> at all for direct I/O.
> >
> > Yes, this hunk is incorrect. We must not use this flag for direct IO.
> > It's only for paging IO, when you bring in the data at page->mapping +
> > page->index. Otherwise you tell the pressure accounting code that you
> > are paging in a thrashing page, when really you're just reading new
> > data into a page frame that happens to be hot.
> >
> > (As per the other thread, bio_add_page() currently makes that same
> > mistake for direct IO. I'm fixing that.)
>
> I have that stuff fixed, it just didn't go into the RFC. That's basically
> removing replacing add_page() with its version without BIO_WORKINGSET
> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
> some.

Ah, that's fantastic! Thanks for clarifying.
On 11/12/2020 15:38, Johannes Weiner wrote:
> On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
>> On 11/12/2020 14:06, Johannes Weiner wrote:
>>> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
>>>>> +	/*
>>>>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>>>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>>>> +	 * approximate it by looking at the first page and inducing it to the
>>>>> +	 * whole bio
>>>>> +	 */
>>>>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>>>> +		bio_set_flag(bio, BIO_WORKINGSET);
>>>>
>>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>>>> at all for direct I/O.
>>>
>>> Yes, this hunk is incorrect. We must not use this flag for direct IO.
>>> It's only for paging IO, when you bring in the data at page->mapping +
>>> page->index. Otherwise you tell the pressure accounting code that you
>>> are paging in a thrashing page, when really you're just reading new
>>> data into a page frame that happens to be hot.
>>>
>>> (As per the other thread, bio_add_page() currently makes that same
>>> mistake for direct IO. I'm fixing that.)
>>
>> I have that stuff fixed, it just didn't go into the RFC. That's basically
>> removing replacing add_page() with its version without BIO_WORKINGSET

I wrote something strange... Should have been "replacing add_page() in
those functions with a version without BIO_WORKINGSET".

>> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
>> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
>> some.
>
> Ah, that's fantastic! Thanks for clarifying.

To keep it clear, do we go with what I have stashed (I'm planning to
reiterate this weekend)? or you're going to write it up yourself?
Just in case there is some cooler way you have in mind :)
On Fri, Dec 11, 2020 at 03:47:23PM +0000, Pavel Begunkov wrote:
> On 11/12/2020 15:38, Johannes Weiner wrote:
> > On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
> >> On 11/12/2020 14:06, Johannes Weiner wrote:
> >>> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> >>>>> +	/*
> >>>>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >>>>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >>>>> +	 * approximate it by looking at the first page and inducing it to the
> >>>>> +	 * whole bio
> >>>>> +	 */
> >>>>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >>>>> +		bio_set_flag(bio, BIO_WORKINGSET);
> >>>>
> >>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> >>>> at all for direct I/O.
> >>>
> >>> Yes, this hunk is incorrect. We must not use this flag for direct IO.
> >>> It's only for paging IO, when you bring in the data at page->mapping +
> >>> page->index. Otherwise you tell the pressure accounting code that you
> >>> are paging in a thrashing page, when really you're just reading new
> >>> data into a page frame that happens to be hot.
> >>>
> >>> (As per the other thread, bio_add_page() currently makes that same
> >>> mistake for direct IO. I'm fixing that.)
> >>
> >> I have that stuff fixed, it just didn't go into the RFC. That's basically
> >> removing replacing add_page() with its version without BIO_WORKINGSET
>
> I wrote something strange... Should have been "replacing add_page() in
> those functions with a version without BIO_WORKINGSET".

No worries, I understood.

> >> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
> >> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
> >> some.
> >
> > Ah, that's fantastic! Thanks for clarifying.
>
> To keep it clear, do we go with what I have stashed (I'm planning to
> reiterate this weekend)? or you're going to write it up yourself?
> Just in case there is some cooler way you have in mind :)

Honestly, I only wrote all my ideas down and asked for feedback because
I wasn't super excited about any of them ;-)

If your changes happen to separate the direct io path from the buffered
io path naturally, I'm okay with it.

I'd say let's go with what you already have and see whether Jens and
Christoph like it. We can always do follow-on cleanups.
diff --git a/fs/block_dev.c b/fs/block_dev.c
index d699f3af1a09..aee5d2e4f324 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -349,6 +349,28 @@ static void blkdev_bio_end_io(struct bio *bio)
 	}
 }
 
+static int bio_iov_fixed_bvec_get_pages(struct bio *bio, struct iov_iter *iter)
+{
+	bio->bi_vcnt = iter->nr_segs;
+	bio->bi_max_vecs = iter->nr_segs;
+	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
+	bio->bi_iter.bi_bvec_done = iter->iov_offset;
+	bio->bi_iter.bi_size = iter->count;
+
+	/*
+	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
+	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
+	 * approximate it by looking at the first page and inducing it to the
+	 * whole bio
+	 */
+	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
+		bio_set_flag(bio, BIO_WORKINGSET);
+	bio_set_flag(bio, BIO_NO_PAGE_REF);
+
+	iter->count = 0;
+	return 0;
+}
+
 static ssize_t
 __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 {
@@ -368,6 +390,8 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 	    (bdev_logical_block_size(bdev) - 1))
 		return -EINVAL;
 
+	if (iov_iter_bvec_fixed(iter))
+		nr_vecs = 0;
 	bio = bio_alloc_bioset(GFP_KERNEL, nr_vecs, &blkdev_dio_pool);
 
 	dio = container_of(bio, struct blkdev_dio, bio);
@@ -398,7 +422,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 	bio->bi_end_io = blkdev_bio_end_io;
 	bio->bi_ioprio = iocb->ki_ioprio;
 
-	ret = bio_iov_iter_get_pages(bio, iter);
+	if (iov_iter_is_bvec(iter) && iov_iter_bvec_fixed(iter))
+		ret = bio_iov_fixed_bvec_get_pages(bio, iter);
+	else
+		ret = bio_iov_iter_get_pages(bio, iter);
+
 	if (unlikely(ret)) {
 		bio->bi_status = BLK_STS_IOERR;
 		bio_endio(bio);
The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.

Suggested-by: Matthew Wilcox <willy@infradead.org>
[BIO_WORKINGSET]
Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/block_dev.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)