Message ID | 20240619033443.3017568-1-ming.lei@redhat.com (mailing list archive) |
---|---
State | New, archived |
Series | block: check bio alignment in blk_mq_submit_bio
On 6/19/24 12:34, Ming Lei wrote:
> IO logical block size is one fundamental queue limit, and every IO has
> to be aligned with logical block size because our bio split can't deal
> with unaligned bio.
>
> The check has to be done with queue usage counter grabbed because device
> reconfiguration may change logical block size, and we can prevent the
> reconfiguration from happening by holding queue usage counter.
>
> logical_block_size stays in the 1st cache line of queue_limits, and this
> cache line is always fetched in fast path via bio_may_exceed_limits(),
> so IO perf won't be affected by this check.
>
> Cc: Yi Zhang <yi.zhang@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ye Bin <yebin10@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3b4df8e5ac9e..7bb50b6b9567 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2914,6 +2914,21 @@ static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
>  	INIT_LIST_HEAD(&rq->queuelist);
>  }
>
> +static bool bio_unaligned(const struct bio *bio,
> +		const struct request_queue *q)
> +{
> +	unsigned int bs = queue_logical_block_size(q);
> +
> +	if (bio->bi_iter.bi_size & (bs - 1))
> +		return true;
> +
> +	if (bio->bi_iter.bi_size &&
> +	    ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1)))

Hmmm... Some BIO operations have a 0 size but do specify a sector (e.g. zone
management operations). So this seems incorrect to me...

> +		return true;
> +
> +	return false;
> +}
> +
>  /**
>   * blk_mq_submit_bio - Create and send a request to block device.
>   * @bio: Bio pointer.
> @@ -2966,6 +2981,15 @@ void blk_mq_submit_bio(struct bio *bio)
>  		return;
>  	}
>
> +	/*
> +	 * Device reconfiguration may change logical block size, so alignment
> +	 * check has to be done with queue usage counter held
> +	 */
> +	if (unlikely(bio_unaligned(bio, q))) {
> +		bio_io_error(bio);
> +		goto queue_exit;
> +	}
> +
>  	if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
>  		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
>  		if (!bio)
On 6/19/24 13:14, Damien Le Moal wrote:
> On 6/19/24 12:34, Ming Lei wrote:
>> IO logical block size is one fundamental queue limit, and every IO has
>> to be aligned with logical block size because our bio split can't deal
>> with unaligned bio.
>>
>> The check has to be done with queue usage counter grabbed because device
>> reconfiguration may change logical block size, and we can prevent the
>> reconfiguration from happening by holding queue usage counter.
>>
>> logical_block_size stays in the 1st cache line of queue_limits, and this
>> cache line is always fetched in fast path via bio_may_exceed_limits(),
>> so IO perf won't be affected by this check.
>>
>> Cc: Yi Zhang <yi.zhang@redhat.com>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Ye Bin <yebin10@huawei.com>
>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>> ---
>>  block/blk-mq.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 3b4df8e5ac9e..7bb50b6b9567 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2914,6 +2914,21 @@ static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
>>  	INIT_LIST_HEAD(&rq->queuelist);
>>  }
>>
>> +static bool bio_unaligned(const struct bio *bio,
>> +		const struct request_queue *q)
>> +{
>> +	unsigned int bs = queue_logical_block_size(q);
>> +
>> +	if (bio->bi_iter.bi_size & (bs - 1))
>> +		return true;
>> +
>> +	if (bio->bi_iter.bi_size &&
>> +	    ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1)))
>
> Hmmm... Some BIO operations have a 0 size but do specify a sector (e.g. zone
> management operations). So this seems incorrect to me...

I meant to say: why not check the sector alignment for these BIOs as well?
Something like:

static bool bio_unaligned(const struct bio *bio,
			  const struct request_queue *q)
{
	unsigned int bs_mask = queue_logical_block_size(q) - 1;

	return (bio->bi_iter.bi_size & bs_mask) ||
		((bio->bi_iter.bi_sector << SECTOR_SHIFT) & bs_mask);
}

>
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  /**
>>   * blk_mq_submit_bio - Create and send a request to block device.
>>   * @bio: Bio pointer.
>> @@ -2966,6 +2981,15 @@ void blk_mq_submit_bio(struct bio *bio)
>>  		return;
>>  	}
>>
>> +	/*
>> +	 * Device reconfiguration may change logical block size, so alignment
>> +	 * check has to be done with queue usage counter held
>> +	 */
>> +	if (unlikely(bio_unaligned(bio, q))) {
>> +		bio_io_error(bio);
>> +		goto queue_exit;
>> +	}
>> +
>>  	if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
>>  		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
>>  		if (!bio)
On Wed, Jun 19, 2024 at 01:22:27PM +0900, Damien Le Moal wrote:
> static bool bio_unaligned(const struct bio *bio,
> 			  const struct request_queue *q)
> {
> 	unsigned int bs_mask = queue_logical_block_size(q) - 1;

Please avoid use of the queue helpers. This should be:

	unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev);
On 6/19/24 05:34, Ming Lei wrote:
> IO logical block size is one fundamental queue limit, and every IO has
> to be aligned with logical block size because our bio split can't deal
> with unaligned bio.
>
> The check has to be done with queue usage counter grabbed because device
> reconfiguration may change logical block size, and we can prevent the
> reconfiguration from happening by holding queue usage counter.
>
> logical_block_size stays in the 1st cache line of queue_limits, and this
> cache line is always fetched in fast path via bio_may_exceed_limits(),
> so IO perf won't be affected by this check.
>
> Cc: Yi Zhang <yi.zhang@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ye Bin <yebin10@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
>

Is this still an issue after the atomic queue limits patchset from
Christoph? One of the changes there is that we now always freeze the
queue before changing any limits. So really this check should never
trigger. Hmm?

Cheers,

Hannes
On Wed, Jun 19, 2024 at 01:14:02PM +0900, Damien Le Moal wrote:
> On 6/19/24 12:34, Ming Lei wrote:
> > IO logical block size is one fundamental queue limit, and every IO has
> > to be aligned with logical block size because our bio split can't deal
> > with unaligned bio.
> >
> > The check has to be done with queue usage counter grabbed because device
> > reconfiguration may change logical block size, and we can prevent the
> > reconfiguration from happening by holding queue usage counter.
> >
> > logical_block_size stays in the 1st cache line of queue_limits, and this
> > cache line is always fetched in fast path via bio_may_exceed_limits(),
> > so IO perf won't be affected by this check.
> >
> > Cc: Yi Zhang <yi.zhang@redhat.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Ye Bin <yebin10@huawei.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 3b4df8e5ac9e..7bb50b6b9567 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2914,6 +2914,21 @@ static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
> >  	INIT_LIST_HEAD(&rq->queuelist);
> >  }
> >
> > +static bool bio_unaligned(const struct bio *bio,
> > +		const struct request_queue *q)
> > +{
> > +	unsigned int bs = queue_logical_block_size(q);
> > +
> > +	if (bio->bi_iter.bi_size & (bs - 1))
> > +		return true;
> > +
> > +	if (bio->bi_iter.bi_size &&
> > +	    ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1)))
>
> Hmmm... Some BIO operations have a 0 size but do specify a sector (e.g. zone
> management operations).

If we add the check for all types of IO, it requires ->bi_sector to be
meaningful for zero size bios. I am not sure that is always true, such
as for RESET_ALL.

> So this seems incorrect to me...

It is correct, but it only covers bios with real ->bi_sector & ->bi_size.

Thanks,
Ming
On Wed, Jun 19, 2024 at 12:33:49AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2024 at 01:22:27PM +0900, Damien Le Moal wrote:
> > static bool bio_unaligned(const struct bio *bio,
> > 			  const struct request_queue *q)
> > {
> > 	unsigned int bs_mask = queue_logical_block_size(q) - 1;
>
> Please avoid use of the queue helpers. This should be:
>
> 	unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev);

It is a blk-mq internal helper, and I think the queue helper is more
efficient since it is definitely in the fast path.

Thanks,
Ming
On Wed, Jun 19, 2024 at 09:50:38AM +0200, Hannes Reinecke wrote:
> On 6/19/24 05:34, Ming Lei wrote:
> > IO logical block size is one fundamental queue limit, and every IO has
> > to be aligned with logical block size because our bio split can't deal
> > with unaligned bio.
> >
> > The check has to be done with queue usage counter grabbed because device
> > reconfiguration may change logical block size, and we can prevent the
> > reconfiguration from happening by holding queue usage counter.
> >
> > logical_block_size stays in the 1st cache line of queue_limits, and this
> > cache line is always fetched in fast path via bio_may_exceed_limits(),
> > so IO perf won't be affected by this check.
> >
> > Cc: Yi Zhang <yi.zhang@redhat.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Ye Bin <yebin10@huawei.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> >
> Is this still an issue after the atomic queue limits patchset from
> Christoph?
> One of the changes there is that we now always freeze the queue before
> changing any limits.
> So really this check should never trigger.

submit_bio() just blocks on queue freezing; once the queue is unfrozen,
submit_bio() still moves on, and the unaligned bio is issued to the
driver/hardware. Please see:

https://lore.kernel.org/linux-block/ZnDmXsFIPmPlT6Si@fedora/T/#m48c098e6d2df142da97ee3992b47d2b7e942a161

Thanks,
Ming
On Wed, Jun 19, 2024 at 03:56:43PM +0800, Ming Lei wrote:
> If we add the check for all type of IO, it requires ->bi_sector to
> be meaningful for zero size bio. I am not sure if it is always true,
> such as RESET_ALL.

meaningful or initialized to zero. Given that bio_init initializes it
to zero we should generally be fine (and are for BIO_OP_ZONE_RESET_ALL
for all callers in tree).
On Wed, Jun 19, 2024 at 03:58:37PM +0800, Ming Lei wrote:
> > > unsigned int bs_mask = queue_logical_block_size(q) - 1;
> >
> > Please avoid use of the queue helpers. This should be:
> >
> > unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev);
>
> It is one blk-mq internal helper, I think queue helper is more
> efficient since it is definitely in fast path.

Does it actually generate different code for you with all the inlining
modern compilers do?
On Wed, Jun 19, 2024 at 01:06:32AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2024 at 03:58:37PM +0800, Ming Lei wrote:
> > > > unsigned int bs_mask = queue_logical_block_size(q) - 1;
> > >
> > > Please avoid use of the queue helpers. This should be:
> > >
> > > unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev);
> >
> > It is one blk-mq internal helper, I think queue helper is more
> > efficient since it is definitely in fast path.
>
> Does it actually generate different code for you with all the inlining
> modern compilers do?

It is hard to answer, since there are so many compilers (and versions).

I definitely agree bdev_logical_block_size() should be used by external
users, but it is fine to use the queue helper in block internal
functions.

Thanks,
Ming
On Wed, Jun 19, 2024 at 01:05:58AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2024 at 03:56:43PM +0800, Ming Lei wrote:
> > If we add the check for all type of IO, it requires ->bi_sector to
> > be meaningful for zero size bio. I am not sure if it is always true,
> > such as RESET_ALL.
>
> meaningful or initialized to zero. Given that bio_init initializes it
> to zero we should generally be fine (and are for BIO_OP_ZONE_RESET_ALL
> for all callers in tree).

Fine, let's fail this kind of potentially not-well-initialized bio,
which is brittle anyway.

Thanks,
Ming
diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3b4df8e5ac9e..7bb50b6b9567 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2914,6 +2914,21 @@ static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
 	INIT_LIST_HEAD(&rq->queuelist);
 }
 
+static bool bio_unaligned(const struct bio *bio,
+		const struct request_queue *q)
+{
+	unsigned int bs = queue_logical_block_size(q);
+
+	if (bio->bi_iter.bi_size & (bs - 1))
+		return true;
+
+	if (bio->bi_iter.bi_size &&
+	    ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1)))
+		return true;
+
+	return false;
+}
+
 /**
  * blk_mq_submit_bio - Create and send a request to block device.
  * @bio: Bio pointer.
@@ -2966,6 +2981,15 @@ void blk_mq_submit_bio(struct bio *bio)
 		return;
 	}
 
+	/*
+	 * Device reconfiguration may change logical block size, so alignment
+	 * check has to be done with queue usage counter held
+	 */
+	if (unlikely(bio_unaligned(bio, q))) {
+		bio_io_error(bio);
+		goto queue_exit;
+	}
+
 	if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
 		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
 		if (!bio)
IO logical block size is one fundamental queue limit, and every IO has
to be aligned with logical block size because our bio split can't deal
with unaligned bio.

The check has to be done with queue usage counter grabbed because device
reconfiguration may change logical block size, and we can prevent the
reconfiguration from happening by holding queue usage counter.

logical_block_size stays in the 1st cache line of queue_limits, and this
cache line is always fetched in fast path via bio_may_exceed_limits(),
so IO perf won't be affected by this check.

Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)