diff mbox

block: don't make BLK_DEF_MAX_SECTORS too big

Message ID 21cf85d32278bbe5acbc3def0a6db75db98a2670.1459269590.git.shli@fb.com (mailing list archive)
State New, archived
Headers show

Commit Message

Shaohua Li March 29, 2016, 4:42 p.m. UTC
bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
fail.

In the future, we can extend the size either bvec_slabs array is
expanded or the upcoming multipage bvec is added if pages are
contiguous. This one is suitable for stable.

Fixes: d2be537c3ba (block: bump BLK_DEF_MAX_SECTORS to 2560)
Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de>
Cc: stable@vger.kernel.org (4.2+)
Cc: Ming Lei <ming.lei@canonical.com>
Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
Signed-off-by: Shaohua Li <shli@fb.com>
---
 include/linux/blkdev.h | 6 +++++-
 1 file changed, 5 insertions(+), 1 deletion(-)

Comments

Christoph Hellwig March 29, 2016, 9:18 p.m. UTC | #1
On Tue, Mar 29, 2016 at 09:42:33AM -0700, Shaohua Li wrote:
> bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
> allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
> BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
> fail.

This might be true, but it's not a good enough reason.  Request based
driver couldn't care less about the limits of bio_alloc_bioset.

It seems the bug is that somone (would be great to know whoe exactly)
passes a too large value to bio_alloc_bioset. And given that we still
have bio_add_page around for actually adding pages to a bio it seems
like the proper fix would be to simply clamp down the actual allocation
and segment limit inside bio_alloc_bioset.  Which would also help to
eventually remove code doing just that in tons of callers.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shaohua Li March 29, 2016, 10:01 p.m. UTC | #2
On Tue, Mar 29, 2016 at 02:18:33PM -0700, Christoph Hellwig wrote:
> On Tue, Mar 29, 2016 at 09:42:33AM -0700, Shaohua Li wrote:
> > bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
> > allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
> > BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
> > fail.
> 
> This might be true, but it's not a good enough reason.  Request based
> driver couldn't care less about the limits of bio_alloc_bioset.
> 
> It seems the bug is that somone (would be great to know whoe exactly)
> passes a too large value to bio_alloc_bioset. And given that we still
> have bio_add_page around for actually adding pages to a bio it seems
> like the proper fix would be to simply clamp down the actual allocation
> and segment limit inside bio_alloc_bioset.  Which would also help to
> eventually remove code doing just that in tons of callers.

The problem is bcache allocates a big bio (with bio_alloc). The bio is
split with blk_queue_split, but it isn't split to small size because
queue limit. the bio is cloned later in md, which uses bio_alloc_bioset.
bio_alloc_bioset itself can't allocate big size bio.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei March 30, 2016, 1:39 a.m. UTC | #3
On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li <shli@fb.com> wrote:
> bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
> allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
> BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
> fail.
>
> In the future, we can extend the size either bvec_slabs array is
> expanded or the upcoming multipage bvec is added if pages are
> contiguous. This one is suitable for stable.
>
> Fixes: d2be537c3ba (block: bump BLK_DEF_MAX_SECTORS to 2560)
> Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de>
> Cc: stable@vger.kernel.org (4.2+)
> Cc: Ming Lei <ming.lei@canonical.com>
> Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
> Signed-off-by: Shaohua Li <shli@fb.com>
> ---
>  include/linux/blkdev.h | 6 +++++-
>  1 file changed, 5 insertions(+), 1 deletion(-)
>
> diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> index 7e5d7e0..da64325 100644
> --- a/include/linux/blkdev.h
> +++ b/include/linux/blkdev.h
> @@ -1153,7 +1153,11 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
>  enum blk_default_limits {
>         BLK_MAX_SEGMENTS        = 128,
>         BLK_SAFE_MAX_SECTORS    = 255,
> -       BLK_DEF_MAX_SECTORS     = 2560,
> +       /*
> +        * if you change this, please also change bvec_alloc and BIO_MAX_PAGES.
> +        * Otherwise bio_alloc_bioset will break.
> +        */
> +       BLK_DEF_MAX_SECTORS     = BIO_MAX_SECTORS,

Thinking about it further, it isn't good to change the default max
sectors because
the patch affects REQ_PC bios too, which don't have the 1Mbytes limit at all.

So suggest to just change bcache's queue max sector limit to 1M, that means
we shouldn't encourage bcache's usage of bypassing bio_add_page().

Thanks,
Ming

>         BLK_MAX_SEGMENT_SIZE    = 65536,
>         BLK_SEG_BOUNDARY_MASK   = 0xFFFFFFFFUL,
>  };
> --
> 2.8.0.rc2
>
> --
> To unsubscribe from this list: send the line "unsubscribe linux-block" in
> the body of a message to majordomo@vger.kernel.org
> More majordomo info at  http://vger.kernel.org/majordomo-info.html
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shaohua Li March 30, 2016, 2:27 a.m. UTC | #4
On Wed, Mar 30, 2016 at 09:39:35AM +0800, Ming Lei wrote:
> On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li <shli@fb.com> wrote:
> > bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
> > allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
> > BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
> > fail.
> >
> > In the future, we can extend the size either bvec_slabs array is
> > expanded or the upcoming multipage bvec is added if pages are
> > contiguous. This one is suitable for stable.
> >
> > Fixes: d2be537c3ba (block: bump BLK_DEF_MAX_SECTORS to 2560)
> > Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de>
> > Cc: stable@vger.kernel.org (4.2+)
> > Cc: Ming Lei <ming.lei@canonical.com>
> > Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
> > Signed-off-by: Shaohua Li <shli@fb.com>
> > ---
> >  include/linux/blkdev.h | 6 +++++-
> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >
> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> > index 7e5d7e0..da64325 100644
> > --- a/include/linux/blkdev.h
> > +++ b/include/linux/blkdev.h
> > @@ -1153,7 +1153,11 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> >  enum blk_default_limits {
> >         BLK_MAX_SEGMENTS        = 128,
> >         BLK_SAFE_MAX_SECTORS    = 255,
> > -       BLK_DEF_MAX_SECTORS     = 2560,
> > +       /*
> > +        * if you change this, please also change bvec_alloc and BIO_MAX_PAGES.
> > +        * Otherwise bio_alloc_bioset will break.
> > +        */
> > +       BLK_DEF_MAX_SECTORS     = BIO_MAX_SECTORS,
> 
> Thinking about it further, it isn't good to change the default max
> sectors because
> the patch affects REQ_PC bios too, which don't have the 1Mbytes limit at all.

what breaks setting REQ_PC to 1M limit? I can understand bigger limit might help
big raid array performance, but REQ_PC isn't the case.

> So suggest to just change bcache's queue max sector limit to 1M, that means
> we shouldn't encourage bcache's usage of bypassing bio_add_page().

Don't think this is a good idea. This is a limitation of block core,
block core should make sure the limitation doesn't break, not the
driver. On the other hand, are you going to fix all drivers? drivers can
set arbitrary max sector limit.

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig March 30, 2016, 6:51 a.m. UTC | #5
On Tue, Mar 29, 2016 at 03:01:10PM -0700, Shaohua Li wrote:
> The problem is bcache allocates a big bio (with bio_alloc). The bio is
> split with blk_queue_split, but it isn't split to small size because
> queue limit. the bio is cloned later in md, which uses bio_alloc_bioset.
> bio_alloc_bioset itself can't allocate big size bio.

bcache should be fixed to not allocate larger than allowed bios then.
And handling too large arguments to bio_alloc_bioset is still useful to
avoid the checks in the callers and make it robust.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei March 30, 2016, 12:13 p.m. UTC | #6
Hi Shaohua,

On Wed, Mar 30, 2016 at 10:27 AM, Shaohua Li <shli@fb.com> wrote:
> On Wed, Mar 30, 2016 at 09:39:35AM +0800, Ming Lei wrote:
>> On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li <shli@fb.com> wrote:
>> > bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
>> > allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
>> > BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
>> > fail.
>> >
>> > In the future, we can extend the size either bvec_slabs array is
>> > expanded or the upcoming multipage bvec is added if pages are
>> > contiguous. This one is suitable for stable.
>> >
>> > Fixes: d2be537c3ba (block: bump BLK_DEF_MAX_SECTORS to 2560)
>> > Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de>
>> > Cc: stable@vger.kernel.org (4.2+)
>> > Cc: Ming Lei <ming.lei@canonical.com>
>> > Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
>> > Signed-off-by: Shaohua Li <shli@fb.com>
>> > ---
>> >  include/linux/blkdev.h | 6 +++++-
>> >  1 file changed, 5 insertions(+), 1 deletion(-)
>> >
>> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> > index 7e5d7e0..da64325 100644
>> > --- a/include/linux/blkdev.h
>> > +++ b/include/linux/blkdev.h
>> > @@ -1153,7 +1153,11 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
>> >  enum blk_default_limits {
>> >         BLK_MAX_SEGMENTS        = 128,
>> >         BLK_SAFE_MAX_SECTORS    = 255,
>> > -       BLK_DEF_MAX_SECTORS     = 2560,
>> > +       /*
>> > +        * if you change this, please also change bvec_alloc and BIO_MAX_PAGES.
>> > +        * Otherwise bio_alloc_bioset will break.
>> > +        */
>> > +       BLK_DEF_MAX_SECTORS     = BIO_MAX_SECTORS,
>>
>> Thinking about it further, it isn't good to change the default max
>> sectors because
>> the patch affects REQ_PC bios too, which don't have the 1Mbytes limit at all.
>
> what breaks setting REQ_PC to 1M limit? I can understand bigger limit might help
> big raid array performance, but REQ_PC isn't the case.

I mean REQ_PC can include at most 1024 vectors intead of 256, so looks it isn't
fair to introduce the strict limit for all kinds of requests.

More importantly, the max sector limit is for limitting max sectors in
a request,
and is used for bios merging, not same with bio's 256 bvecs limit.

>
>> So suggest to just change bcache's queue max sector limit to 1M, that means
>> we shouldn't encourage bcache's usage of bypassing bio_add_page().
>
> Don't think this is a good idea. This is a limitation of block core,

This bio's 256 bvecs limitation is from block implementation, think about why
one bvec just includes one page, instead of one segment. In the future, it can
be improved absolutely, that is why I said it isn't good to use BIO_MAX_SECTORS.
Also you can find that there is only one user of BIO_MAX_SECTORS.

> block core should make sure the limitation doesn't break, not the
> driver. On the other hand, are you going to fix all drivers? drivers can
> set arbitrary max sector limit.

The issue only exists if drivers(fs, dm, md, bcache) do not use bio_add_page().
All this kind of usage shouldn't be encouraged.

So how about fixing the issue by introducing the limit into get_max_io_size()?
Such as, add something like below at the end of this function?

        sectors = min_t(unsigned, sectors, BIO_MAX_PAGES <<
                        (PAGE_CACHE_SHIFT - 9));


thanks,
Ming

>
> Thanks,
> Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shaohua Li March 30, 2016, 4:50 p.m. UTC | #7
On Tue, Mar 29, 2016 at 11:51:51PM -0700, Christoph Hellwig wrote:
> On Tue, Mar 29, 2016 at 03:01:10PM -0700, Shaohua Li wrote:
> > The problem is bcache allocates a big bio (with bio_alloc). The bio is
> > split with blk_queue_split, but it isn't split to small size because
> > queue limit. the bio is cloned later in md, which uses bio_alloc_bioset.
> > bio_alloc_bioset itself can't allocate big size bio.
> 
> bcache should be fixed to not allocate larger than allowed bios then.
> And handling too large arguments to bio_alloc_bioset is still useful to
> avoid the checks in the callers and make it robust.

Doesn't this conflict the goal of arbitrary bio size? I think nothing is
wrong in bcache side. The caller can allocate any size of bio, the block
layer will split the bio into proper size according to block layer
limitatio and driver limitation. As long as bio_split can do the right
job, caller of bio allo is good. Fixing bcache is in the opposite side.
I'm Cc Kent to check if he wants to fix bcache.

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Shaohua Li March 30, 2016, 5:07 p.m. UTC | #8
On Wed, Mar 30, 2016 at 08:13:07PM +0800, Ming Lei wrote:
> Hi Shaohua,
> 
> On Wed, Mar 30, 2016 at 10:27 AM, Shaohua Li <shli@fb.com> wrote:
> > On Wed, Mar 30, 2016 at 09:39:35AM +0800, Ming Lei wrote:
> >> On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li <shli@fb.com> wrote:
> >> > bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
> >> > allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
> >> > BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
> >> > fail.
> >> >
> >> > In the future, we can extend the size either bvec_slabs array is
> >> > expanded or the upcoming multipage bvec is added if pages are
> >> > contiguous. This one is suitable for stable.
> >> >
> >> > Fixes: d2be537c3ba (block: bump BLK_DEF_MAX_SECTORS to 2560)
> >> > Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de>
> >> > Cc: stable@vger.kernel.org (4.2+)
> >> > Cc: Ming Lei <ming.lei@canonical.com>
> >> > Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
> >> > Signed-off-by: Shaohua Li <shli@fb.com>
> >> > ---
> >> >  include/linux/blkdev.h | 6 +++++-
> >> >  1 file changed, 5 insertions(+), 1 deletion(-)
> >> >
> >> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
> >> > index 7e5d7e0..da64325 100644
> >> > --- a/include/linux/blkdev.h
> >> > +++ b/include/linux/blkdev.h
> >> > @@ -1153,7 +1153,11 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
> >> >  enum blk_default_limits {
> >> >         BLK_MAX_SEGMENTS        = 128,
> >> >         BLK_SAFE_MAX_SECTORS    = 255,
> >> > -       BLK_DEF_MAX_SECTORS     = 2560,
> >> > +       /*
> >> > +        * if you change this, please also change bvec_alloc and BIO_MAX_PAGES.
> >> > +        * Otherwise bio_alloc_bioset will break.
> >> > +        */
> >> > +       BLK_DEF_MAX_SECTORS     = BIO_MAX_SECTORS,
> >>
> >> Thinking about it further, it isn't good to change the default max
> >> sectors because
> >> the patch affects REQ_PC bios too, which don't have the 1Mbytes limit at all.
> >
> > what breaks setting REQ_PC to 1M limit? I can understand bigger limit might help
> > big raid array performance, but REQ_PC isn't the case.
> 
> I mean REQ_PC can include at most 1024 vectors intead of 256, so looks it isn't
> fair to introduce the strict limit for all kinds of requests.
> 
> More importantly, the max sector limit is for limitting max sectors in
> a request,
> and is used for bios merging, not same with bio's 256 bvecs limit.

My point is this doesn't matter because there is no performance issue. 2560
isn't fair too which uses 320 vectors. And note,
blk_queue_max_hw_sectors doesn't force max_hw_sectors has the
BLK_DEF_MAX_SECTORS limit.
> >
> >> So suggest to just change bcache's queue max sector limit to 1M, that means
> >> we shouldn't encourage bcache's usage of bypassing bio_add_page().
> >
> > Don't think this is a good idea. This is a limitation of block core,
> 
> This bio's 256 bvecs limitation is from block implementation, think about why
> one bvec just includes one page, instead of one segment. In the future, it can
> be improved absolutely, that is why I said it isn't good to use BIO_MAX_SECTORS.
> Also you can find that there is only one user of BIO_MAX_SECTORS.

Don't disagree. But when you switch to multpage bvec, you must fix this
anyway, let's fix current problem. Both 1M or 2560 sectors are wrong in
that case. The size limit could be 1M if pages are not contiguous or 256
* max_segment_size.

> > block core should make sure the limitation doesn't break, not the
> > driver. On the other hand, are you going to fix all drivers? drivers can
> > set arbitrary max sector limit.
> 
> The issue only exists if drivers(fs, dm, md, bcache) do not use bio_add_page().
> All this kind of usage shouldn't be encouraged.

bio_add_page can add pages to big bio too, there is no limitation. 
> So how about fixing the issue by introducing the limit into get_max_io_size()?
> Such as, add something like below at the end of this function?
> 
>         sectors = min_t(unsigned, sectors, BIO_MAX_PAGES <<
>                         (PAGE_CACHE_SHIFT - 9));

I can do this, just don't see the point why. max_sectors is a software
limitation.

Thanks,
Shaohua
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Christoph Hellwig March 30, 2016, 5:40 p.m. UTC | #9
On Wed, Mar 30, 2016 at 09:50:30AM -0700, Shaohua Li wrote:
> > bcache should be fixed to not allocate larger than allowed bios then.
> > And handling too large arguments to bio_alloc_bioset is still useful to
> > avoid the checks in the callers and make it robust.
> 
> Doesn't this conflict the goal of arbitrary bio size?

I don't think we ever had the goal of entirely arbitrary bio sizes,
we wanted to get rid of the driver imposed limits.  And I/O submitter
deciding that it's not bound by BIO_MAX_PAGES is something entirely
different.

> I think nothing is
> wrong in bcache side. The caller can allocate any size of bio, the block
> layer will split the bio into proper size according to block layer
> limitatio and driver limitation.

If we get actual arbitrary large bios we

 a) assume drivers can handle bios larger than BIO_MAX_PAGES, which
    we've just noticed md can't
 b) have to handle all sorts of mempools to handle this giant size

nothing that can't be be done, but it's pretty obvious that we're not
there yet.  And I'm not really sure it's necessarily worth it, but
I'm happy to be proven wrong.

> As long as bio_split can do the right
> job, caller of bio allo is good.

But it's pretty clear that it currently doesn't do the right job,
and reducing general queue limits for a single submitter that doesn't
follow the protocol isn't the way to go.  The obvious fix is to
make bcache behave like everyone else for now, and then look into
how useful and painful it would be to move to larger bios in general.
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Ming Lei March 31, 2016, 12:49 a.m. UTC | #10
Hi Shaohua,

On Thu, Mar 31, 2016 at 1:07 AM, Shaohua Li <shli@fb.com> wrote:
> On Wed, Mar 30, 2016 at 08:13:07PM +0800, Ming Lei wrote:
>> Hi Shaohua,
>>
>> On Wed, Mar 30, 2016 at 10:27 AM, Shaohua Li <shli@fb.com> wrote:
>> > On Wed, Mar 30, 2016 at 09:39:35AM +0800, Ming Lei wrote:
>> >> On Wed, Mar 30, 2016 at 12:42 AM, Shaohua Li <shli@fb.com> wrote:
>> >> > bio_alloc_bioset() allocates bvecs from bvec_slabs which can only
>> >> > allocate maximum 256 bvec (eg, 1M for 4k pages). We can't bump
>> >> > BLK_DEF_MAX_SECTORS to exceed this value otherwise bio_alloc_bioset will
>> >> > fail.
>> >> >
>> >> > In the future, we can extend the size either bvec_slabs array is
>> >> > expanded or the upcoming multipage bvec is added if pages are
>> >> > contiguous. This one is suitable for stable.
>> >> >
>> >> > Fixes: d2be537c3ba (block: bump BLK_DEF_MAX_SECTORS to 2560)
>> >> > Reported-by: Sebastian Roesner <sroesner-kernelorg@roesner-online.de>
>> >> > Cc: stable@vger.kernel.org (4.2+)
>> >> > Cc: Ming Lei <ming.lei@canonical.com>
>> >> > Reviewed-by: Jeff Moyer <jmoyer@redhat.com>
>> >> > Signed-off-by: Shaohua Li <shli@fb.com>
>> >> > ---
>> >> >  include/linux/blkdev.h | 6 +++++-
>> >> >  1 file changed, 5 insertions(+), 1 deletion(-)
>> >> >
>> >> > diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
>> >> > index 7e5d7e0..da64325 100644
>> >> > --- a/include/linux/blkdev.h
>> >> > +++ b/include/linux/blkdev.h
>> >> > @@ -1153,7 +1153,11 @@ extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
>> >> >  enum blk_default_limits {
>> >> >         BLK_MAX_SEGMENTS        = 128,
>> >> >         BLK_SAFE_MAX_SECTORS    = 255,
>> >> > -       BLK_DEF_MAX_SECTORS     = 2560,
>> >> > +       /*
>> >> > +        * if you change this, please also change bvec_alloc and BIO_MAX_PAGES.
>> >> > +        * Otherwise bio_alloc_bioset will break.
>> >> > +        */
>> >> > +       BLK_DEF_MAX_SECTORS     = BIO_MAX_SECTORS,
>> >>
>> >> Thinking about it further, it isn't good to change the default max
>> >> sectors because
>> >> the patch affects REQ_PC bios too, which don't have the 1Mbytes limit at all.
>> >
>> > what breaks setting REQ_PC to 1M limit? I can understand bigger limit might help
>> > big raid array performance, but REQ_PC isn't the case.
>>
>> I mean REQ_PC can include at most 1024 vectors intead of 256, so looks it isn't
>> fair to introduce the strict limit for all kinds of requests.
>>
>> More importantly, the max sector limit is for limitting max sectors in
>> a request,
>> and is used for bios merging, not same with bio's 256 bvecs limit.
>
> My point is this doesn't matter because there is no performance issue. 2560
> isn't fair too which uses 320 vectors. And note,
> blk_queue_max_hw_sectors doesn't force max_hw_sectors has the
> BLK_DEF_MAX_SECTORS limit.

OK

>> >
>> >> So suggest to just change bcache's queue max sector limit to 1M, that means
>> >> we shouldn't encourage bcache's usage of bypassing bio_add_page().
>> >
>> > Don't think this is a good idea. This is a limitation of block core,
>>
>> This bio's 256 bvecs limitation is from block implementation, think about why
>> one bvec just includes one page, instead of one segment. In the future, it can
>> be improved absolutely, that is why I said it isn't good to use BIO_MAX_SECTORS.
>> Also you can find that there is only one user of BIO_MAX_SECTORS.
>
> Don't disagree. But when you switch to multpage bvec, you must fix this
> anyway, let's fix current problem. Both 1M or 2560 sectors are wrong in
> that case. The size limit could be 1M if pages are not contiguous or 256
> * max_segment_size.
>
>> > block core should make sure the limitation doesn't break, not the
>> > driver. On the other hand, are you going to fix all drivers? drivers can
>> > set arbitrary max sector limit.
>>
>> The issue only exists if drivers(fs, dm, md, bcache) do not use bio_add_page().
>> All this kind of usage shouldn't be encouraged.
>
> bio_add_page can add pages to big bio too, there is no limitation.

Yeah, it is possible, but unusual.

>> So how about fixing the issue by introducing the limit into get_max_io_size()?
>> Such as, add something like below at the end of this function?
>>
>>         sectors = min_t(unsigned, sectors, BIO_MAX_PAGES <<
>>                         (PAGE_CACHE_SHIFT - 9));
>
> I can do this, just don't see the point why. max_sectors is a software
> limitation.

Let me make it clear.  blk_rq_get_max_sectors() is used in for merging
bios/reqs, and that means limits.max_sectors is for limitting max sectors
in one request or transfer. Now this patch decreases it just for single bio's
256 bvec's limitation. Is it correct? That is the reason why I suggest to
change get_max_io_size() for bio's 256 bvecs limit.

On the contrary, the default max sectors should have been increased
since hardware is becoming quicker, and we should send more to drive
in one request, IMO.

Thanks,
Ming
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
Kent Overstreet March 31, 2016, 12:52 a.m. UTC | #11
On Wed, Mar 30, 2016 at 09:50:30AM -0700, Shaohua Li wrote:
> On Tue, Mar 29, 2016 at 11:51:51PM -0700, Christoph Hellwig wrote:
> > On Tue, Mar 29, 2016 at 03:01:10PM -0700, Shaohua Li wrote:
> > > The problem is bcache allocates a big bio (with bio_alloc). The bio is
> > > split with blk_queue_split, but it isn't split to small size because
> > > queue limit. the bio is cloned later in md, which uses bio_alloc_bioset.
> > > bio_alloc_bioset itself can't allocate big size bio.
> > 
> > bcache should be fixed to not allocate larger than allowed bios then.
> > And handling too large arguments to bio_alloc_bioset is still useful to
> > avoid the checks in the callers and make it robust.
> 
> Doesn't this conflict the goal of arbitrary bio size? I think nothing is
> wrong in bcache side. The caller can allocate any size of bio, the block
> layer will split the bio into proper size according to block layer
> limitatio and driver limitation. As long as bio_split can do the right
> job, caller of bio allo is good. Fixing bcache is in the opposite side.
> I'm Cc Kent to check if he wants to fix bcache.

_Allocating_ large bios definitely shouldn't be an issue provided they're split
by the time they get to a driver they pose an isuse for; reason is when the
driver clones the bio & bvec, they're only going to clone the bvecs that are
live in the current split, not all the bvecs in the original bio (and if they
were they'd be broken, as they'd have to be looking at bi_vcnt and bi_vcnt can
be 0 in a split now).

And then since generic_make_request() always calls blk_queue_split() before
passing a bio onto a driver, I'm wondering what the actual bug was...
--
To unsubscribe from this list: send the line "unsubscribe linux-block" in
the body of a message to majordomo@vger.kernel.org
More majordomo info at  http://vger.kernel.org/majordomo-info.html
diff mbox

Patch

diff --git a/include/linux/blkdev.h b/include/linux/blkdev.h
index 7e5d7e0..da64325 100644
--- a/include/linux/blkdev.h
+++ b/include/linux/blkdev.h
@@ -1153,7 +1153,11 @@  extern int blk_verify_command(unsigned char *cmd, fmode_t has_write_perm);
 enum blk_default_limits {
 	BLK_MAX_SEGMENTS	= 128,
 	BLK_SAFE_MAX_SECTORS	= 255,
-	BLK_DEF_MAX_SECTORS	= 2560,
+	/*
+	 * if you change this, please also change bvec_alloc and BIO_MAX_PAGES.
+	 * Otherwise bio_alloc_bioset will break.
+	 */
+	BLK_DEF_MAX_SECTORS	= BIO_MAX_SECTORS,
 	BLK_MAX_SEGMENT_SIZE	= 65536,
 	BLK_SEG_BOUNDARY_MASK	= 0xFFFFFFFFUL,
 };