diff mbox series

[2/2] block: no-copy bvec for direct IO

Message ID 51905c4fcb222e14a1d5cb676364c1b4f177f582.1607477897.git.asml.silence@gmail.com (mailing list archive)
State New, archived
Headers show
Series nocopy bvec for direct IO | expand

Commit Message

Pavel Begunkov Dec. 9, 2020, 2:19 a.m. UTC
The block layer spends quite a while in blkdev_direct_IO() to copy and
initialise bio's bvec. However, if we've already got a bvec in the input
iterator it might be reused in some cases, i.e. when new
ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
performance boost, and it also reduces memory footprint.

Suggested-by: Matthew Wilcox <willy@infradead.org>
[BIO_WORKINGSET] Suggested-by: Johannes Weiner <hannes@cmpxchg.org>
Signed-off-by: Pavel Begunkov <asml.silence@gmail.com>
---
 fs/block_dev.c | 30 +++++++++++++++++++++++++++++-
 1 file changed, 29 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig Dec. 9, 2020, 8:40 a.m. UTC | #1
> +	/*
> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> +	 * approximate it by looking at the first page and inducing it to the
> +	 * whole bio
> +	 */
> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> +		bio_set_flag(bio, BIO_WORKINGSET);

IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
at all for direct I/O.

> +	bio_set_flag(bio, BIO_NO_PAGE_REF);
> +
> +	iter->count = 0;
> +	return 0;
> +}

This helper should go into bio.c, next to bio_iov_iter_get_pages.
And please convert the other callers of bio_iov_iter_get_pages to this
scheme as well.
Pavel Begunkov Dec. 9, 2020, 12:01 p.m. UTC | #2
On 09/12/2020 08:40, Christoph Hellwig wrote:
>> +	/*
>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>> +	 * approximate it by looking at the first page and inducing it to the
>> +	 * whole bio
>> +	 */
>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>> +		bio_set_flag(bio, BIO_WORKINGSET);
> 
> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> at all for direct I/O.

I was waiting for the conversation to unfold, i.e. for Johannes to
answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
iomap?

> 
>> +	bio_set_flag(bio, BIO_NO_PAGE_REF);
>> +
>> +	iter->count = 0;
>> +	return 0;
>> +}
> 
> This helper should go into bio.c, next to bio_iov_iter_get_pages.
> And please convert the other callers of bio_iov_iter_get_pages to this
> scheme as well.

Agree. In the end I want to merge that into bio_iov_iter_get_pages().
Pavel Begunkov Dec. 9, 2020, 12:03 p.m. UTC | #3
On 09/12/2020 12:05, Christoph Hellwig wrote:
> On Wed, Dec 09, 2020 at 12:01:27PM +0000, Pavel Begunkov wrote:
>> On 09/12/2020 08:40, Christoph Hellwig wrote:
>>>> +	/*
>>>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>>> +	 * approximate it by looking at the first page and inducing it to the
>>>> +	 * whole bio
>>>> +	 */
>>>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>>> +		bio_set_flag(bio, BIO_WORKINGSET);
>>>
>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>>> at all for direct I/O.
>>
>> I was waiting for the conversation to unfold, i.e. for Johannes to
>> answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
>> iomap?
> 
> iomap direct I/O: yes.

That one, got it
Christoph Hellwig Dec. 9, 2020, 12:05 p.m. UTC | #4
On Wed, Dec 09, 2020 at 12:01:27PM +0000, Pavel Begunkov wrote:
> On 09/12/2020 08:40, Christoph Hellwig wrote:
> >> +	/*
> >> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >> +	 * approximate it by looking at the first page and inducing it to the
> >> +	 * whole bio
> >> +	 */
> >> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >> +		bio_set_flag(bio, BIO_WORKINGSET);
> > 
> > IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> > at all for direct I/O.
> 
> I was waiting for the conversation to unfold, i.e. for Johannes to
> answer. BTW, would the same (skipping BIO_WORKINGSET) stand true for
> iomap?

iomap direct I/O: yes.
David Laight Dec. 9, 2020, 9:13 p.m. UTC | #5
From: Pavel Begunkov
> Sent: 09 December 2020 02:20
> 
> The block layer spends quite a while in blkdev_direct_IO() to copy and
> initialise bio's bvec. However, if we've already got a bvec in the input
> iterator it might be reused in some cases, i.e. when new
> ITER_BVEC_FLAG_FIXED flag is set. Simple tests show considerable
> performance boost, and it also reduces memory footprint.
...
> @@ -398,7 +422,11 @@ __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
>  		bio->bi_end_io = blkdev_bio_end_io;
>  		bio->bi_ioprio = iocb->ki_ioprio;
> 
> -		ret = bio_iov_iter_get_pages(bio, iter);
> +		if (iov_iter_is_bvec(iter) && iov_iter_bvec_fixed(iter))
> +			ret = bio_iov_fixed_bvec_get_pages(bio, iter);
> +		else
> +			ret = bio_iov_iter_get_pages(bio, iter);
> +

Is it necessary to check iov_iter_is_bvec() as well as iov_iter_bvec_fixed() ?
If so it is probably worth using & not && so the compiler stands
a chance of generating a & (B | C) == B instead of 2 conditionals.
(I think I saw the bits in the same field being tested.

	David

-
Registered Address Lakeside, Bramley Road, Mount Farm, Milton Keynes, MK1 1PT, UK
Registration No: 1397386 (Wales)
Johannes Weiner Dec. 11, 2020, 2:06 p.m. UTC | #6
On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> > +	/*
> > +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> > +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> > +	 * approximate it by looking at the first page and inducing it to the
> > +	 * whole bio
> > +	 */
> > +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> > +		bio_set_flag(bio, BIO_WORKINGSET);
> 
> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> at all for direct I/O.

Yes, this hunk is incorrect. We must not use this flag for direct IO.
It's only for paging IO, when you bring in the data at page->mapping +
page->index. Otherwise you tell the pressure accounting code that you
are paging in a thrashing page, when really you're just reading new
data into a page frame that happens to be hot.

(As per the other thread, bio_add_page() currently makes that same
mistake for direct IO. I'm fixing that.)
Pavel Begunkov Dec. 11, 2020, 2:20 p.m. UTC | #7
On 11/12/2020 14:06, Johannes Weiner wrote:
> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
>>> +	/*
>>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>> +	 * approximate it by looking at the first page and inducing it to the
>>> +	 * whole bio
>>> +	 */
>>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>> +		bio_set_flag(bio, BIO_WORKINGSET);
>>
>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>> at all for direct I/O.
> 
> Yes, this hunk is incorrect. We must not use this flag for direct IO.
> It's only for paging IO, when you bring in the data at page->mapping +
> page->index. Otherwise you tell the pressure accounting code that you
> are paging in a thrashing page, when really you're just reading new
> data into a page frame that happens to be hot.
> 
> (As per the other thread, bio_add_page() currently makes that same
> mistake for direct IO. I'm fixing that.)

I have that stuff fixed, it just didn't go into the RFC. That's basically
removing replacing add_page() with its version without BIO_WORKINGSET
in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
some.
Johannes Weiner Dec. 11, 2020, 3:38 p.m. UTC | #8
On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
> On 11/12/2020 14:06, Johannes Weiner wrote:
> > On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> >>> +	/*
> >>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >>> +	 * approximate it by looking at the first page and inducing it to the
> >>> +	 * whole bio
> >>> +	 */
> >>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >>> +		bio_set_flag(bio, BIO_WORKINGSET);
> >>
> >> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> >> at all for direct I/O.
> > 
> > Yes, this hunk is incorrect. We must not use this flag for direct IO.
> > It's only for paging IO, when you bring in the data at page->mapping +
> > page->index. Otherwise you tell the pressure accounting code that you
> > are paging in a thrashing page, when really you're just reading new
> > data into a page frame that happens to be hot.
> > 
> > (As per the other thread, bio_add_page() currently makes that same
> > mistake for direct IO. I'm fixing that.)
> 
> I have that stuff fixed, it just didn't go into the RFC. That's basically
> removing replacing add_page() with its version without BIO_WORKINGSET
> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
> some.

Ah, that's fantastic! Thanks for clarifying.
Pavel Begunkov Dec. 11, 2020, 3:47 p.m. UTC | #9
On 11/12/2020 15:38, Johannes Weiner wrote:
> On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
>> On 11/12/2020 14:06, Johannes Weiner wrote:
>>> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
>>>>> +	/*
>>>>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
>>>>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
>>>>> +	 * approximate it by looking at the first page and inducing it to the
>>>>> +	 * whole bio
>>>>> +	 */
>>>>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
>>>>> +		bio_set_flag(bio, BIO_WORKINGSET);
>>>>
>>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
>>>> at all for direct I/O.
>>>
>>> Yes, this hunk is incorrect. We must not use this flag for direct IO.
>>> It's only for paging IO, when you bring in the data at page->mapping +
>>> page->index. Otherwise you tell the pressure accounting code that you
>>> are paging in a thrashing page, when really you're just reading new
>>> data into a page frame that happens to be hot.
>>>
>>> (As per the other thread, bio_add_page() currently makes that same
>>> mistake for direct IO. I'm fixing that.)
>>
>> I have that stuff fixed, it just didn't go into the RFC. That's basically
>> removing replacing add_page() with its version without BIO_WORKINGSET

I wrote something strange... Should have been "replacing add_page() in
those functions with a version without BIO_WORKINGSET".

>> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
>> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
>> some.
> 
> Ah, that's fantastic! Thanks for clarifying.

To keep it clear, do we go with what I have stashed (I'm planning to
reiterate this weekend)? or you're going to write it up yourself?
Just in case there is some cooler way you have in mind :)
Johannes Weiner Dec. 11, 2020, 4:13 p.m. UTC | #10
On Fri, Dec 11, 2020 at 03:47:23PM +0000, Pavel Begunkov wrote:
> On 11/12/2020 15:38, Johannes Weiner wrote:
> > On Fri, Dec 11, 2020 at 02:20:11PM +0000, Pavel Begunkov wrote:
> >> On 11/12/2020 14:06, Johannes Weiner wrote:
> >>> On Wed, Dec 09, 2020 at 08:40:05AM +0000, Christoph Hellwig wrote:
> >>>>> +	/*
> >>>>> +	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
> >>>>> +	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
> >>>>> +	 * approximate it by looking at the first page and inducing it to the
> >>>>> +	 * whole bio
> >>>>> +	 */
> >>>>> +	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
> >>>>> +		bio_set_flag(bio, BIO_WORKINGSET);
> >>>>
> >>>> IIRC the feedback was that we do not need to deal with BIO_WORKINGSET
> >>>> at all for direct I/O.
> >>>
> >>> Yes, this hunk is incorrect. We must not use this flag for direct IO.
> >>> It's only for paging IO, when you bring in the data at page->mapping +
> >>> page->index. Otherwise you tell the pressure accounting code that you
> >>> are paging in a thrashing page, when really you're just reading new
> >>> data into a page frame that happens to be hot.
> >>>
> >>> (As per the other thread, bio_add_page() currently makes that same
> >>> mistake for direct IO. I'm fixing that.)
> >>
> >> I have that stuff fixed, it just didn't go into the RFC. That's basically
> >> removing replacing add_page() with its version without BIO_WORKINGSET
> 
> I wrote something strange... Should have been "replacing add_page() in
> those functions with a version without BIO_WORKINGSET".

No worries, I understood.

> >> in bio_iov_iter_get_pages() and all __bio_iov_*_{add,get}_pages() +
> >> fix up ./fs/direct-io.c. Should cover all direct cases if I didn't miss
> >> some.
> > 
> > Ah, that's fantastic! Thanks for clarifying.
> 
> To keep it clear, do we go with what I have stashed (I'm planning to
> reiterate this weekend)? or you're going to write it up yourself?
> Just in case there is some cooler way you have in mind :)

Honestly, I only wrote all my ideas down and asked for feedback
because I wasn't super excited about any of them ;-)

If your changes happen to separate the direct io path from the
buffered io path naturally, I'm okay with it.

I'd say let's go with what you already have and see whether Jens and
Christoph like it. We can always do follow-on cleanups.
diff mbox series

Patch

diff --git a/fs/block_dev.c b/fs/block_dev.c
index d699f3af1a09..aee5d2e4f324 100644
--- a/fs/block_dev.c
+++ b/fs/block_dev.c
@@ -349,6 +349,28 @@  static void blkdev_bio_end_io(struct bio *bio)
 	}
 }
 
+static int bio_iov_fixed_bvec_get_pages(struct bio *bio, struct iov_iter *iter)
+{
+	bio->bi_vcnt = iter->nr_segs;
+	bio->bi_max_vecs = iter->nr_segs;
+	bio->bi_io_vec = (struct bio_vec *)iter->bvec;
+	bio->bi_iter.bi_bvec_done = iter->iov_offset;
+	bio->bi_iter.bi_size = iter->count;
+
+	/*
+	 * In practice groups of pages tend to be accessed/reclaimed/refaulted
+	 * together. To not go over bvec for those who didn't set BIO_WORKINGSET
+	 * approximate it by looking at the first page and inducing it to the
+	 * whole bio
+	 */
+	if (unlikely(PageWorkingset(iter->bvec->bv_page)))
+		bio_set_flag(bio, BIO_WORKINGSET);
+	bio_set_flag(bio, BIO_NO_PAGE_REF);
+
+	iter->count = 0;
+	return 0;
+}
+
 static ssize_t
 __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 {
@@ -368,6 +390,8 @@  __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 	    (bdev_logical_block_size(bdev) - 1))
 		return -EINVAL;
 
+	if (iov_iter_bvec_fixed(iter))
+		nr_vecs = 0;
 	bio = bio_alloc_bioset(GFP_KERNEL, nr_vecs, &blkdev_dio_pool);
 
 	dio = container_of(bio, struct blkdev_dio, bio);
@@ -398,7 +422,11 @@  __blkdev_direct_IO(struct kiocb *iocb, struct iov_iter *iter, int nr_vecs)
 		bio->bi_end_io = blkdev_bio_end_io;
 		bio->bi_ioprio = iocb->ki_ioprio;
 
-		ret = bio_iov_iter_get_pages(bio, iter);
+		if (iov_iter_is_bvec(iter) && iov_iter_bvec_fixed(iter))
+			ret = bio_iov_fixed_bvec_get_pages(bio, iter);
+		else
+			ret = bio_iov_iter_get_pages(bio, iter);
+
 		if (unlikely(ret)) {
 			bio->bi_status = BLK_STS_IOERR;
 			bio_endio(bio);