block: check bio alignment in blk_mq_submit_bio

Message ID 20240619033443.3017568-1-ming.lei@redhat.com (mailing list archive)
State New, archived
Series block: check bio alignment in blk_mq_submit_bio

Commit Message

Ming Lei June 19, 2024, 3:34 a.m. UTC
The logical block size is a fundamental queue limit, and every IO has to
be aligned to it because our bio split code can't deal with unaligned
bios.

The check has to be done with the queue usage counter grabbed, because
device reconfiguration may change the logical block size, and holding
the queue usage counter prevents that reconfiguration from happening.

logical_block_size lives in the first cache line of queue_limits, and
this cache line is always fetched in the fast path via
bio_may_exceed_limits(), so IO performance won't be affected by this
check.

Cc: Yi Zhang <yi.zhang@redhat.com>
Cc: Christoph Hellwig <hch@infradead.org>
Cc: Ye Bin <yebin10@huawei.com>
Signed-off-by: Ming Lei <ming.lei@redhat.com>
---
 block/blk-mq.c | 24 ++++++++++++++++++++++++
 1 file changed, 24 insertions(+)
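
To make the check concrete, here is a small standalone sketch (for
illustration only, not part of the patch) of the same mask arithmetic the
new bio_unaligned() helper performs, with example numbers for a 4096-byte
logical block size:

	#include <stdbool.h>
	#include <stdio.h>

	#define SECTOR_SHIFT	9	/* 512-byte sectors, as in the kernel */

	/*
	 * Both the byte offset and the byte length must be multiples of the
	 * logical block size, which is always a power of two.
	 */
	static bool is_unaligned(unsigned long long sector, unsigned int size,
				 unsigned int lbs)
	{
		unsigned int mask = lbs - 1;

		return (size & mask) || ((sector << SECTOR_SHIFT) & mask);
	}

	int main(void)
	{
		printf("%d\n", is_unaligned(8, 4096, 4096)); /* 0: offset 4096, len 4096 */
		printf("%d\n", is_unaligned(1, 4096, 4096)); /* 1: byte offset 512 */
		printf("%d\n", is_unaligned(8,  512, 4096)); /* 1: length 512 */
		return 0;
	}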

Comments

Damien Le Moal June 19, 2024, 4:14 a.m. UTC | #1
On 6/19/24 12:34, Ming Lei wrote:
> IO logical block size is one fundamental queue limit, and every IO has
> to be aligned with logical block size because our bio split can't deal
> with unaligned bio.
> 
> The check has to be done with queue usage counter grabbed because device
> reconfiguration may change logical block size, and we can prevent the
> reconfiguration from happening by holding queue usage counter.
> 
> logical_block_size stays in the 1st cache line of queue_limits, and this
> cache line is always fetched in fast path via bio_may_exceed_limits(),
> so IO perf won't be affected by this check.
> 
> Cc: Yi Zhang <yi.zhang@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ye Bin <yebin10@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>  block/blk-mq.c | 24 ++++++++++++++++++++++++
>  1 file changed, 24 insertions(+)
> 
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 3b4df8e5ac9e..7bb50b6b9567 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -2914,6 +2914,21 @@ static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
>  	INIT_LIST_HEAD(&rq->queuelist);
>  }
>  
> +static bool bio_unaligned(const struct bio *bio,
> +		const struct request_queue *q)
> +{
> +	unsigned int bs = queue_logical_block_size(q);
> +
> +	if (bio->bi_iter.bi_size & (bs - 1))
> +		return true;
> +
> +	if (bio->bi_iter.bi_size &&
> +	    ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1)))

Hmmm... Some BIO operations have a 0 size but do specify a sector (e.g. zone
management operations). So this seems incorrect to me...

> +		return true;
> +
> +	return false;
> +}
> +
>  /**
>   * blk_mq_submit_bio - Create and send a request to block device.
>   * @bio: Bio pointer.
> @@ -2966,6 +2981,15 @@ void blk_mq_submit_bio(struct bio *bio)
>  			return;
>  	}
>  
> +	/*
> +	 * Device reconfiguration may change logical block size, so alignment
> +	 * check has to be done with queue usage counter held
> +	 */
> +	if (unlikely(bio_unaligned(bio, q))) {
> +		bio_io_error(bio);
> +		goto queue_exit;
> +	}
> +
>  	if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
>  		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
>  		if (!bio)
Damien Le Moal June 19, 2024, 4:22 a.m. UTC | #2
On 6/19/24 13:14, Damien Le Moal wrote:
> On 6/19/24 12:34, Ming Lei wrote:
>> IO logical block size is one fundamental queue limit, and every IO has
>> to be aligned with logical block size because our bio split can't deal
>> with unaligned bio.
>>
>> The check has to be done with queue usage counter grabbed because device
>> reconfiguration may change logical block size, and we can prevent the
>> reconfiguration from happening by holding queue usage counter.
>>
>> logical_block_size stays in the 1st cache line of queue_limits, and this
>> cache line is always fetched in fast path via bio_may_exceed_limits(),
>> so IO perf won't be affected by this check.
>>
>> Cc: Yi Zhang <yi.zhang@redhat.com>
>> Cc: Christoph Hellwig <hch@infradead.org>
>> Cc: Ye Bin <yebin10@huawei.com>
>> Signed-off-by: Ming Lei <ming.lei@redhat.com>
>> ---
>>  block/blk-mq.c | 24 ++++++++++++++++++++++++
>>  1 file changed, 24 insertions(+)
>>
>> diff --git a/block/blk-mq.c b/block/blk-mq.c
>> index 3b4df8e5ac9e..7bb50b6b9567 100644
>> --- a/block/blk-mq.c
>> +++ b/block/blk-mq.c
>> @@ -2914,6 +2914,21 @@ static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
>>  	INIT_LIST_HEAD(&rq->queuelist);
>>  }
>>  
>> +static bool bio_unaligned(const struct bio *bio,
>> +		const struct request_queue *q)
>> +{
>> +	unsigned int bs = queue_logical_block_size(q);
>> +
>> +	if (bio->bi_iter.bi_size & (bs - 1))
>> +		return true;
>> +
>> +	if (bio->bi_iter.bi_size &&
>> +	    ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1)))
> 
> Hmmm... Some BIO operations have a 0 size but do specify a sector (e.g. zone
> management operations). So this seems incorrect to me...

I meant to say, why not check the sector alignment for these BIOs as well?
Something like:

static bool bio_unaligned(const struct bio *bio,
		          const struct request_queue *q)
{
	unsigned int bs_mask = queue_logical_block_size(q) - 1;

	return (bio->bi_iter.bi_size & bs_mask) ||
		((bio->bi_iter.bi_sector << SECTOR_SHIFT) & bs_mask);
}

> 
>> +		return true;
>> +
>> +	return false;
>> +}
>> +
>>  /**
>>   * blk_mq_submit_bio - Create and send a request to block device.
>>   * @bio: Bio pointer.
>> @@ -2966,6 +2981,15 @@ void blk_mq_submit_bio(struct bio *bio)
>>  			return;
>>  	}
>>  
>> +	/*
>> +	 * Device reconfiguration may change logical block size, so alignment
>> +	 * check has to be done with queue usage counter held
>> +	 */
>> +	if (unlikely(bio_unaligned(bio, q))) {
>> +		bio_io_error(bio);
>> +		goto queue_exit;
>> +	}
>> +
>>  	if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
>>  		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
>>  		if (!bio)
>
Christoph Hellwig June 19, 2024, 7:33 a.m. UTC | #3
On Wed, Jun 19, 2024 at 01:22:27PM +0900, Damien Le Moal wrote:
> static bool bio_unaligned(const struct bio *bio,
> 		          const struct request_queue *q)
> {
> 	unsigned int bs_mask = queue_logical_block_size(q) - 1;

Please avoid use of the queue helpers.  This should be:

	unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev);
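
Combining that with Damien's mask-based version would give something like
the sketch below (illustration only, not code from this thread; the line
above omits the "- 1", presumably as shorthand, but the mask still needs
it since the logical block size is a power of two). With the block size
taken from bio->bi_bdev, the request_queue argument also becomes
unnecessary:

	static bool bio_unaligned(const struct bio *bio)
	{
		unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev) - 1;

		return (bio->bi_iter.bi_size & bs_mask) ||
		       ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & bs_mask);
	}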
Hannes Reinecke June 19, 2024, 7:50 a.m. UTC | #4
On 6/19/24 05:34, Ming Lei wrote:
> IO logical block size is one fundamental queue limit, and every IO has
> to be aligned with logical block size because our bio split can't deal
> with unaligned bio.
> 
> The check has to be done with queue usage counter grabbed because device
> reconfiguration may change logical block size, and we can prevent the
> reconfiguration from happening by holding queue usage counter.
> 
> logical_block_size stays in the 1st cache line of queue_limits, and this
> cache line is always fetched in fast path via bio_may_exceed_limits(),
> so IO perf won't be affected by this check.
> 
> Cc: Yi Zhang <yi.zhang@redhat.com>
> Cc: Christoph Hellwig <hch@infradead.org>
> Cc: Ye Bin <yebin10@huawei.com>
> Signed-off-by: Ming Lei <ming.lei@redhat.com>
> ---
>   block/blk-mq.c | 24 ++++++++++++++++++++++++
>   1 file changed, 24 insertions(+)
> 
Is this still an issue after the atomic queue limits patchset from 
Christoph?
One of the changes there is that we now always freeze the queue before
changing any limits.
So really this check should never trigger.

Hmm?

Cheers,

Hannes
Ming Lei June 19, 2024, 7:56 a.m. UTC | #5
On Wed, Jun 19, 2024 at 01:14:02PM +0900, Damien Le Moal wrote:
> On 6/19/24 12:34, Ming Lei wrote:
> > IO logical block size is one fundamental queue limit, and every IO has
> > to be aligned with logical block size because our bio split can't deal
> > with unaligned bio.
> > 
> > The check has to be done with queue usage counter grabbed because device
> > reconfiguration may change logical block size, and we can prevent the
> > reconfiguration from happening by holding queue usage counter.
> > 
> > logical_block_size stays in the 1st cache line of queue_limits, and this
> > cache line is always fetched in fast path via bio_may_exceed_limits(),
> > so IO perf won't be affected by this check.
> > 
> > Cc: Yi Zhang <yi.zhang@redhat.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Ye Bin <yebin10@huawei.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >  block/blk-mq.c | 24 ++++++++++++++++++++++++
> >  1 file changed, 24 insertions(+)
> > 
> > diff --git a/block/blk-mq.c b/block/blk-mq.c
> > index 3b4df8e5ac9e..7bb50b6b9567 100644
> > --- a/block/blk-mq.c
> > +++ b/block/blk-mq.c
> > @@ -2914,6 +2914,21 @@ static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
> >  	INIT_LIST_HEAD(&rq->queuelist);
> >  }
> >  
> > +static bool bio_unaligned(const struct bio *bio,
> > +		const struct request_queue *q)
> > +{
> > +	unsigned int bs = queue_logical_block_size(q);
> > +
> > +	if (bio->bi_iter.bi_size & (bs - 1))
> > +		return true;
> > +
> > +	if (bio->bi_iter.bi_size &&
> > +	    ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1)))
> 
> Hmmm... Some BIO operations have a 0 size but do specify a sector (e.g. zone
> management operations).

If we add the check for all types of IO, it requires ->bi_sector to be
meaningful for zero-size bios. I am not sure that is always true, for
example with RESET_ALL.

> So this seems incorrect to me...

It is correct, but it only covers bios with a real ->bi_sector & ->bi_size.


Thanks,
Ming
Ming Lei June 19, 2024, 7:58 a.m. UTC | #6
On Wed, Jun 19, 2024 at 12:33:49AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2024 at 01:22:27PM +0900, Damien Le Moal wrote:
> > static bool bio_unaligned(const struct bio *bio,
> > 		          const struct request_queue *q)
> > {
> > 	unsigned int bs_mask = queue_logical_block_size(q) - 1;
> 
> Please avoid use of the queue helpers.  This should be:
> 
> 	unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev);
 
It is a blk-mq internal helper, and I think the queue helper is more
efficient since it is definitely in the fast path.


Thanks,
Ming
Ming Lei June 19, 2024, 8:02 a.m. UTC | #7
On Wed, Jun 19, 2024 at 09:50:38AM +0200, Hannes Reinecke wrote:
> On 6/19/24 05:34, Ming Lei wrote:
> > IO logical block size is one fundamental queue limit, and every IO has
> > to be aligned with logical block size because our bio split can't deal
> > with unaligned bio.
> > 
> > The check has to be done with queue usage counter grabbed because device
> > reconfiguration may change logical block size, and we can prevent the
> > reconfiguration from happening by holding queue usage counter.
> > 
> > logical_block_size stays in the 1st cache line of queue_limits, and this
> > cache line is always fetched in fast path via bio_may_exceed_limits(),
> > so IO perf won't be affected by this check.
> > 
> > Cc: Yi Zhang <yi.zhang@redhat.com>
> > Cc: Christoph Hellwig <hch@infradead.org>
> > Cc: Ye Bin <yebin10@huawei.com>
> > Signed-off-by: Ming Lei <ming.lei@redhat.com>
> > ---
> >   block/blk-mq.c | 24 ++++++++++++++++++++++++
> >   1 file changed, 24 insertions(+)
> > 
> Is this still an issue after the atomic queue limits patchset from
> Christoph?
> One of the changes there is that we now always freeze the queue before
> changing any limits.
> So really this check should never trigger.

submit_bio() just blocks on queue freezing, and once the queue is
unfrozen, submit_bio() still moves on, so the unaligned bio is issued to
the driver/hardware anyway; please see:

https://lore.kernel.org/linux-block/ZnDmXsFIPmPlT6Si@fedora/T/#m48c098e6d2df142da97ee3992b47d2b7e942a161


Thanks,
Ming
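
A rough timeline of the scenario described in that report (the exact
block sizes and the reconfiguration path are assumptions, for
illustration only):

	CPU0 (submitter)                        CPU1 (reconfiguration)
	----------------                        ----------------------
	build bio aligned to the old
	512-byte logical block size
	bio_queue_enter() sleeps                blk_mq_freeze_queue()
	                                        logical_block_size: 512 -> 4096
	                                        blk_mq_unfreeze_queue()
	bio_queue_enter() returns
	bio is unaligned to the new 4096-byte
	size, but without this check it is
	still issued to the driver/hardware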
Christoph Hellwig June 19, 2024, 8:05 a.m. UTC | #8
On Wed, Jun 19, 2024 at 03:56:43PM +0800, Ming Lei wrote:
> If we add the check for all type of IO, it requires ->bi_sector to
> be meaningful for zero size bio. I am not sure if it is always true,
> such as RESET_ALL.

meaningful or initialized to zero.  Given that bio_init() initializes it
to zero, we should generally be fine (and we are for REQ_OP_ZONE_RESET_ALL
for all in-tree callers).
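
For illustration (a hypothetical caller, not code from this thread), a
zero-size zone management bio allocated through the normal path starts
out with bi_iter zeroed, so it would still pass a mask check on both
fields:

	static int reset_all_zones(struct block_device *bdev)
	{
		/*
		 * bio_alloc() -> bio_init() leaves bi_iter.bi_sector and
		 * bi_iter.bi_size at 0 unless the caller sets them, and
		 * 0 & bs_mask == 0, so this bio is treated as aligned.
		 */
		struct bio *bio = bio_alloc(bdev, 0,
				REQ_OP_ZONE_RESET_ALL | REQ_SYNC, GFP_KERNEL);
		int ret = submit_bio_wait(bio);

		bio_put(bio);
		return ret;
	}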
Christoph Hellwig June 19, 2024, 8:06 a.m. UTC | #9
On Wed, Jun 19, 2024 at 03:58:37PM +0800, Ming Lei wrote:
> > > 	unsigned int bs_mask = queue_logical_block_size(q) - 1;
> > 
> > Please avoid use of the queue helpers.  This should be:
> > 
> > 	unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev);
>  
> It is one blk-mq internal helper, I think queue helper is more
> efficient since it is definitely in fast path.

Does it actually generate different code for you with all the inlining
modern compilers do?
Ming Lei June 19, 2024, 8:29 a.m. UTC | #10
On Wed, Jun 19, 2024 at 01:06:32AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2024 at 03:58:37PM +0800, Ming Lei wrote:
> > > > 	unsigned int bs_mask = queue_logical_block_size(q) - 1;
> > > 
> > > Please avoid use of the queue helpers.  This should be:
> > > 
> > > 	unsigned int bs_mask = bdev_logical_block_size(bio->bi_bdev);
> >  
> > It is one blk-mq internal helper, I think queue helper is more
> > efficient since it is definitely in fast path.
> 
> Does it actually generate different code for you with all the inlining
> modern compilers do?
 
It is hard to answer, because there are so many compilers (and versions).

I definitely agree bdev_logical_block_size() should be used by external
users, but it is fine to use the queue helper in block-internal functions.


thanks,
Ming
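
For reference, the two helpers boil down to the same field load once
inlined; roughly (paraphrased from include/linux/blkdev.h, details may
vary between kernel versions):

	static inline unsigned queue_logical_block_size(const struct request_queue *q)
	{
		return q->limits.logical_block_size;
	}

	static inline unsigned int bdev_logical_block_size(struct block_device *bdev)
	{
		return queue_logical_block_size(bdev_get_queue(bdev));
	}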
Ming Lei June 19, 2024, 8:37 a.m. UTC | #11
On Wed, Jun 19, 2024 at 01:05:58AM -0700, Christoph Hellwig wrote:
> On Wed, Jun 19, 2024 at 03:56:43PM +0800, Ming Lei wrote:
> > If we add the check for all type of IO, it requires ->bi_sector to
> > be meaningful for zero size bio. I am not sure if it is always true,
> > such as RESET_ALL.
> 
> meaningful or initialized to zero.  Given that bio_init initializes it
> to zero we should generally be fine (and are for BIO_OP_ZONE_RESET_ALL
> for all callers in tree).

Fine, let's fail this kind of potentially not-well-initialized bio, which
is brittle anyway.


Thanks,
Ming

Patch

diff --git a/block/blk-mq.c b/block/blk-mq.c
index 3b4df8e5ac9e..7bb50b6b9567 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -2914,6 +2914,21 @@  static void blk_mq_use_cached_rq(struct request *rq, struct blk_plug *plug,
 	INIT_LIST_HEAD(&rq->queuelist);
 }
 
+static bool bio_unaligned(const struct bio *bio,
+		const struct request_queue *q)
+{
+	unsigned int bs = queue_logical_block_size(q);
+
+	if (bio->bi_iter.bi_size & (bs - 1))
+		return true;
+
+	if (bio->bi_iter.bi_size &&
+	    ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1)))
+		return true;
+
+	return false;
+}
+
 /**
  * blk_mq_submit_bio - Create and send a request to block device.
  * @bio: Bio pointer.
@@ -2966,6 +2981,15 @@  void blk_mq_submit_bio(struct bio *bio)
 			return;
 	}
 
+	/*
+	 * Device reconfiguration may change logical block size, so alignment
+	 * check has to be done with queue usage counter held
+	 */
+	if (unlikely(bio_unaligned(bio, q))) {
+		bio_io_error(bio);
+		goto queue_exit;
+	}
+
 	if (unlikely(bio_may_exceed_limits(bio, &q->limits))) {
 		bio = __bio_split_to_limits(bio, &q->limits, &nr_segs);
 		if (!bio)