diff mbox

[v4,3/4] block: add bio_iov_iter_get_all_pages() helper

Message ID 20180720130552.21432-4-mwilck@suse.com (mailing list archive)
State New, archived
Headers show

Commit Message

Martin Wilck July 20, 2018, 1:05 p.m. UTC
bio_iov_iter_get_pages() only adds pages for the next non-zero
segment from the iov_iter to the bio. Some callers prefer to
obtain as many pages as would fit into the bio, with proper
rollback in case of failure. Add bio_iov_iter_get_all_pages()
for this purpose.

Signed-off-by: Martin Wilck <mwilck@suse.com>
---
 block/bio.c         | 43 ++++++++++++++++++++++++++++++++++++++++++-
 include/linux/bio.h |  1 +
 2 files changed, 43 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig July 20, 2018, 3:11 p.m. UTC | #1
On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> bio_iov_iter_get_pages() only adds pages for the next non-zero
> segment from the iov_iter to the bio. Some callers prefer to
> obtain as many pages as would fit into the bio, with proper
> rollback in case of failure. Add bio_iov_iter_get_all_pages()
> for this purpose.

I'd much rather have you fix bio_iov_iter_get_pages.  It only has
three callers, all beeing slight variations of the same direct I/O
pattern.  There is no point in diverging in implementation details
for them.

> +	do {
> +		int ret = bio_iov_iter_get_pages(bio, iter);
> +
> +		if (unlikely(ret)) {
> +			struct bio_vec *bvec;
> +			unsigned short i;
> +
> +			bio_for_each_segment_all(bvec, bio, i) {
> +				if (i >= orig_vcnt) {
> +					put_page(bvec->bv_page);
> +					bvec->bv_page = NULL;
> +					bvec->bv_len = 0;
> +					bvec->bv_offset = 0;
> +				}
> +			}

I don't think we need any of the zeroing here.   Also for code flow
purposes I'd rather see a goto for the error handling here.
Martin Wilck July 20, 2018, 3:29 p.m. UTC | #2
On Fri, 2018-07-20 at 17:11 +0200, Christoph Hellwig wrote:
> On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() only adds pages for the next non-zero
> > segment from the iov_iter to the bio. Some callers prefer to
> > obtain as many pages as would fit into the bio, with proper
> > rollback in case of failure. Add bio_iov_iter_get_all_pages()
> > for this purpose.
> 
> I'd much rather have you fix bio_iov_iter_get_pages.  It only has
> three callers, all beeing slight variations of the same direct I/O
> pattern.  There is no point in diverging in implementation details
> for them.

If that's consensus, I'll do it happily. So far I felt there was
opposition against it.


> > +	do {
> > +		int ret = bio_iov_iter_get_pages(bio, iter);
> > +
> > +		if (unlikely(ret)) {
> > +			struct bio_vec *bvec;
> > +			unsigned short i;
> > +
> > +			bio_for_each_segment_all(bvec, bio, i) {
> > +				if (i >= orig_vcnt) {
> > +					put_page(bvec->bv_page);
> > +					bvec->bv_page = NULL;
> > +					bvec->bv_len = 0;
> > +					bvec->bv_offset = 0;
> > +				}
> > +			}
> 
> I don't think we need any of the zeroing here.   Also for code flow
> purposes I'd rather see a goto for the error handling here.

OK, will do.

I'll wait for some more feedback on the "fix bio_iov_iter_get_pages"
part.

Martin
Ming Lei July 20, 2018, 4:16 p.m. UTC | #3
On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> bio_iov_iter_get_pages() only adds pages for the next non-zero
> segment from the iov_iter to the bio. Some callers prefer to
> obtain as many pages as would fit into the bio, with proper
> rollback in case of failure. Add bio_iov_iter_get_all_pages()
> for this purpose.
> 
> Signed-off-by: Martin Wilck <mwilck@suse.com>
> ---
>  block/bio.c         | 43 ++++++++++++++++++++++++++++++++++++++++++-
>  include/linux/bio.h |  1 +
>  2 files changed, 43 insertions(+), 1 deletion(-)
> 
> diff --git a/block/bio.c b/block/bio.c
> index 489a430..693eb3b 100644
> --- a/block/bio.c
> +++ b/block/bio.c
> @@ -907,8 +907,10 @@ EXPORT_SYMBOL(bio_add_page);
>   * @bio: bio to add pages to
>   * @iter: iov iterator describing the region to be mapped
>   *
> - * Pins as many pages from *iter and appends them to @bio's bvec array. The
> + * Pins pages from *iter and appends them to @bio's bvec array. The
>   * pages will have to be released using put_page() when done.
> + * For multi-segment *iter, this function only adds pages from the
> + * the next non-empty segment of the iov iterator.
>   */
>  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  {
> @@ -949,6 +951,45 @@ int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
>  }
>  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
>  
> +/**
> + * bio_iov_iter_get_all_pages - pin user or kernel pages and add them to a bio
> + * @bio: bio to add pages to
> + * @iter: iov iterator describing the region to be mapped
> + *
> + * Pins pages from *iter and appends them to @bio's bvec array. The
> + * pages will have to be released using put_page() when done.
> + * This function adds as many pages as possible to a bio.
> + * If this function encounters an error, it unpins the pages it has
> + * pinned before, leaving previously pinned pages untouched.
> + */
> +int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter *iter)
> +{
> +	unsigned short orig_vcnt = bio->bi_vcnt;
> +
> +	do {
> +		int ret = bio_iov_iter_get_pages(bio, iter);
> +
> +		if (unlikely(ret)) {
> +			struct bio_vec *bvec;
> +			unsigned short i;
> +
> +			bio_for_each_segment_all(bvec, bio, i) {
> +				if (i >= orig_vcnt) {
> +					put_page(bvec->bv_page);
> +					bvec->bv_page = NULL;
> +					bvec->bv_len = 0;
> +					bvec->bv_offset = 0;
> +				}
> +			}
> +			bio->bi_vcnt = orig_vcnt;
> +			return ret;
> +		}
> +	} while (iov_iter_count(iter) && !bio_full(bio));

The failure handling part(release pages) may be moved out of this
helper, so usage of this helper can be aligned with bio_iov_iter_get_pages().

BTW, I agree with Christoph, we may just fix/improve bio_iov_iter_get_pages()
for all users.

Thanks,
Ming
Martin Wilck July 20, 2018, 4:54 p.m. UTC | #4
On Sat, 2018-07-21 at 00:16 +0800, Ming Lei wrote:
> On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> > bio_iov_iter_get_pages() only adds pages for the next non-zero
> > segment from the iov_iter to the bio. Some callers prefer to
> > obtain as many pages as would fit into the bio, with proper
> > rollback in case of failure. Add bio_iov_iter_get_all_pages()
> > for this purpose.
> > 
> > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > ---
> >  block/bio.c         | 43
> > ++++++++++++++++++++++++++++++++++++++++++-
> >  include/linux/bio.h |  1 +
> >  2 files changed, 43 insertions(+), 1 deletion(-)
> > 
> > diff --git a/block/bio.c b/block/bio.c
> > index 489a430..693eb3b 100644
> > --- a/block/bio.c
> > +++ b/block/bio.c
> > @@ -907,8 +907,10 @@ EXPORT_SYMBOL(bio_add_page);
> >   * @bio: bio to add pages to
> >   * @iter: iov iterator describing the region to be mapped
> >   *
> > - * Pins as many pages from *iter and appends them to @bio's bvec
> > array. The
> > + * Pins pages from *iter and appends them to @bio's bvec array.
> > The
> >   * pages will have to be released using put_page() when done.
> > + * For multi-segment *iter, this function only adds pages from the
> > + * the next non-empty segment of the iov iterator.
> >   */
> >  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> >  {
> > @@ -949,6 +951,45 @@ int bio_iov_iter_get_pages(struct bio *bio,
> > struct iov_iter *iter)
> >  }
> >  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
> >  
> > +/**
> > + * bio_iov_iter_get_all_pages - pin user or kernel pages and add
> > them to a bio
> > + * @bio: bio to add pages to
> > + * @iter: iov iterator describing the region to be mapped
> > + *
> > + * Pins pages from *iter and appends them to @bio's bvec array.
> > The
> > + * pages will have to be released using put_page() when done.
> > + * This function adds as many pages as possible to a bio.
> > + * If this function encounters an error, it unpins the pages it
> > has
> > + * pinned before, leaving previously pinned pages untouched.
> > + */
> > +int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter
> > *iter)
> > +{
> > +	unsigned short orig_vcnt = bio->bi_vcnt;
> > +
> > +	do {
> > +		int ret = bio_iov_iter_get_pages(bio, iter);
> > +
> > +		if (unlikely(ret)) {
> > +			struct bio_vec *bvec;
> > +			unsigned short i;
> > +
> > +			bio_for_each_segment_all(bvec, bio, i) {
> > +				if (i >= orig_vcnt) {
> > +					put_page(bvec->bv_page);
> > +					bvec->bv_page = NULL;
> > +					bvec->bv_len = 0;
> > +					bvec->bv_offset = 0;
> > +				}
> > +			}
> > +			bio->bi_vcnt = orig_vcnt;
> > +			return ret;
> > +		}
> > +	} while (iov_iter_count(iter) && !bio_full(bio));
> 
> The failure handling part(release pages) may be moved out of this
> helper, so usage of this helper can be aligned with
> bio_iov_iter_get_pages().

I wrote the failure handling precisely for being compatible with
bio_iov_iter_get_pages(), which requires no rollback if it returns an
error. If we don't do this, we have to add extra error handling code to
every caller. It was the issue you raised with my v3 submission...
apparently I misunderstood you.

What should happen if the new handler encounters an error from
bio_iov_iter_get_pages() in the 2nd or later iteration?

 1 return success, so that the caller doesn't realize there was a
problem,
 2 return error and roll back bio changes, as I implemented it here,
 3 return error and keep the already allocated pages.

You seem to support option 3. But that leaves it to the caller to 
differentiate this from a failure with zero allocated pages, and clean
up appropriately. I not sure if that's wise, and it's for sure
different from bio_iov_iter_get_pages()' behavior. Or what am I
missing?

> BTW, I agree with Christoph, we may just fix/improve
> bio_iov_iter_get_pages()
> for all users.

Ok, thanks for confirming.

Martin
Ming Lei July 20, 2018, 11:48 p.m. UTC | #5
On Fri, Jul 20, 2018 at 06:54:48PM +0200, Martin Wilck wrote:
> On Sat, 2018-07-21 at 00:16 +0800, Ming Lei wrote:
> > On Fri, Jul 20, 2018 at 03:05:51PM +0200, Martin Wilck wrote:
> > > bio_iov_iter_get_pages() only adds pages for the next non-zero
> > > segment from the iov_iter to the bio. Some callers prefer to
> > > obtain as many pages as would fit into the bio, with proper
> > > rollback in case of failure. Add bio_iov_iter_get_all_pages()
> > > for this purpose.
> > > 
> > > Signed-off-by: Martin Wilck <mwilck@suse.com>
> > > ---
> > >  block/bio.c         | 43
> > > ++++++++++++++++++++++++++++++++++++++++++-
> > >  include/linux/bio.h |  1 +
> > >  2 files changed, 43 insertions(+), 1 deletion(-)
> > > 
> > > diff --git a/block/bio.c b/block/bio.c
> > > index 489a430..693eb3b 100644
> > > --- a/block/bio.c
> > > +++ b/block/bio.c
> > > @@ -907,8 +907,10 @@ EXPORT_SYMBOL(bio_add_page);
> > >   * @bio: bio to add pages to
> > >   * @iter: iov iterator describing the region to be mapped
> > >   *
> > > - * Pins as many pages from *iter and appends them to @bio's bvec
> > > array. The
> > > + * Pins pages from *iter and appends them to @bio's bvec array.
> > > The
> > >   * pages will have to be released using put_page() when done.
> > > + * For multi-segment *iter, this function only adds pages from the
> > > + * the next non-empty segment of the iov iterator.
> > >   */
> > >  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
> > >  {
> > > @@ -949,6 +951,45 @@ int bio_iov_iter_get_pages(struct bio *bio,
> > > struct iov_iter *iter)
> > >  }
> > >  EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
> > >  
> > > +/**
> > > + * bio_iov_iter_get_all_pages - pin user or kernel pages and add
> > > them to a bio
> > > + * @bio: bio to add pages to
> > > + * @iter: iov iterator describing the region to be mapped
> > > + *
> > > + * Pins pages from *iter and appends them to @bio's bvec array.
> > > The
> > > + * pages will have to be released using put_page() when done.
> > > + * This function adds as many pages as possible to a bio.
> > > + * If this function encounters an error, it unpins the pages it
> > > has
> > > + * pinned before, leaving previously pinned pages untouched.
> > > + */
> > > +int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter
> > > *iter)
> > > +{
> > > +	unsigned short orig_vcnt = bio->bi_vcnt;
> > > +
> > > +	do {
> > > +		int ret = bio_iov_iter_get_pages(bio, iter);
> > > +
> > > +		if (unlikely(ret)) {
> > > +			struct bio_vec *bvec;
> > > +			unsigned short i;
> > > +
> > > +			bio_for_each_segment_all(bvec, bio, i) {
> > > +				if (i >= orig_vcnt) {
> > > +					put_page(bvec->bv_page);
> > > +					bvec->bv_page = NULL;
> > > +					bvec->bv_len = 0;
> > > +					bvec->bv_offset = 0;
> > > +				}
> > > +			}
> > > +			bio->bi_vcnt = orig_vcnt;
> > > +			return ret;
> > > +		}
> > > +	} while (iov_iter_count(iter) && !bio_full(bio));
> > 
> > The failure handling part(release pages) may be moved out of this
> > helper, so usage of this helper can be aligned with
> > bio_iov_iter_get_pages().
> 
> I wrote the failure handling precisely for being compatible with
> bio_iov_iter_get_pages(), which requires no rollback if it returns an
> error. If we don't do this, we have to add extra error handling code to
> every caller. It was the issue you raised with my v3 submission...
> apparently I misunderstood you.
> 
> What should happen if the new handler encounters an error from
> bio_iov_iter_get_pages() in the 2nd or later iteration?
> 
>  1 return success, so that the caller doesn't realize there was a
> problem,
>  2 return error and roll back bio changes, as I implemented it here,
>  3 return error and keep the already allocated pages.
> 
> You seem to support option 3. But that leaves it to the caller to 
> differentiate this from a failure with zero allocated pages, and clean
> up appropriately. I not sure if that's wise, and it's for sure
> different from bio_iov_iter_get_pages()' behavior. Or what am I
> missing?
> > BTW, I agree with Christoph, we may just fix/improve
> > bio_iov_iter_get_pages()
> > for all users.
> 
> Ok, thanks for confirming.

OK, if you follow this suggestion, the pinned pages may be released
in bio_iov_iter_get_pages() like what this patch does, but the bio_endio(bio)
in failure handler of __blkdev_direct_IO() has to be handled in the
following way for avoiding double release:

1) if this bio is allocated from &blkdev_dio_pool, the bio_endio() need
to be removed

2) otherwise, the bio_endio() need to be replaced with bio_put().

Frankly speaking, after your patch is in, seems it is fine to allocate
single bio for doing the dio in __blkdev_direct_IO(), given the passed
'max_pages' is <= BIO_MAX_PAGES. Then __blkdev_direct_IO() can be
simplified much.

But, both current __blkdev_direct_IO() and iomap_dio_actor() supports
short dio, do you think there is same issue in the two with yours? Or
do we need to support it in the new bio_iov_iter_get_pages()?


Thanks,
Ming
diff mbox

Patch

diff --git a/block/bio.c b/block/bio.c
index 489a430..693eb3b 100644
--- a/block/bio.c
+++ b/block/bio.c
@@ -907,8 +907,10 @@  EXPORT_SYMBOL(bio_add_page);
  * @bio: bio to add pages to
  * @iter: iov iterator describing the region to be mapped
  *
- * Pins as many pages from *iter and appends them to @bio's bvec array. The
+ * Pins pages from *iter and appends them to @bio's bvec array. The
  * pages will have to be released using put_page() when done.
+ * For multi-segment *iter, this function only adds pages from the
+ * the next non-empty segment of the iov iterator.
  */
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 {
@@ -949,6 +951,45 @@  int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter)
 }
 EXPORT_SYMBOL_GPL(bio_iov_iter_get_pages);
 
+/**
+ * bio_iov_iter_get_all_pages - pin user or kernel pages and add them to a bio
+ * @bio: bio to add pages to
+ * @iter: iov iterator describing the region to be mapped
+ *
+ * Pins pages from *iter and appends them to @bio's bvec array. The
+ * pages will have to be released using put_page() when done.
+ * This function adds as many pages as possible to a bio.
+ * If this function encounters an error, it unpins the pages it has
+ * pinned before, leaving previously pinned pages untouched.
+ */
+int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter *iter)
+{
+	unsigned short orig_vcnt = bio->bi_vcnt;
+
+	do {
+		int ret = bio_iov_iter_get_pages(bio, iter);
+
+		if (unlikely(ret)) {
+			struct bio_vec *bvec;
+			unsigned short i;
+
+			bio_for_each_segment_all(bvec, bio, i) {
+				if (i >= orig_vcnt) {
+					put_page(bvec->bv_page);
+					bvec->bv_page = NULL;
+					bvec->bv_len = 0;
+					bvec->bv_offset = 0;
+				}
+			}
+			bio->bi_vcnt = orig_vcnt;
+			return ret;
+		}
+	} while (iov_iter_count(iter) && !bio_full(bio));
+
+	return 0;
+}
+EXPORT_SYMBOL_GPL(bio_iov_iter_get_all_pages);
+
 static void submit_bio_wait_endio(struct bio *bio)
 {
 	complete(bio->bi_private);
diff --git a/include/linux/bio.h b/include/linux/bio.h
index f08f5fe..824cb74 100644
--- a/include/linux/bio.h
+++ b/include/linux/bio.h
@@ -485,6 +485,7 @@  bool __bio_try_merge_page(struct bio *bio, struct page *page,
 void __bio_add_page(struct bio *bio, struct page *page,
 		unsigned int len, unsigned int off);
 int bio_iov_iter_get_pages(struct bio *bio, struct iov_iter *iter);
+int bio_iov_iter_get_all_pages(struct bio *bio, struct iov_iter *iter);
 struct rq_map_data;
 extern struct bio *bio_map_user_iov(struct request_queue *,
 				    struct iov_iter *, gfp_t);