Message ID | 20240321131634.1009972-1-ming.lei@redhat.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | block: fail unaligned bio from submit_bio_noacct() | expand |
On 3/21/24 06:16, Ming Lei wrote: > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > +{ > + unsigned int bs = q->limits.logical_block_size; > + unsigned int size = bio->bi_iter.bi_size; > + > + if (size & (bs - 1)) > + return false; > + > + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) > + return false; Why "size &&"? It doesn't harm to reject unaligned bios if size == 0 and it will reduce the number of if-tests in the hot path. Why to shift bio->bi_iter.bi_sector left instead of shifting (bs - 1) right? Thanks, Bart.
On Thu, Mar 21, 2024 at 08:14:24AM -0700, Bart Van Assche wrote: > On 3/21/24 06:16, Ming Lei wrote: > > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > > +{ > > + unsigned int bs = q->limits.logical_block_size; > > + unsigned int size = bio->bi_iter.bi_size; > > + > > + if (size & (bs - 1)) > > + return false; > > + > > + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) > > + return false; > Why "size &&"? It doesn't harm to reject unaligned bios if size == 0 and > it will reduce the number of if-tests in the hot path. It doesn't make sense to check the alignment for bio without data. > > Why to shift bio->bi_iter.bi_sector left instead of shifting (bs - 1) > right? unit of bs is bytes, so .bi_sector needs to be converted to byte first. Thanks, Ming
On Thu, Mar 21 2024 at 9:16P -0400, Ming Lei <ming.lei@redhat.com> wrote: > For any bio with data, its start sector and size have to be aligned with > the queue's logical block size. > > This rule is obvious, but there is still user which may send unaligned > bio to block layer, and it is observed that dm-integrity can do that, > and cause double free of driver's dma meta buffer. > > So failfast unaligned bio from submit_bio_noacct() for avoiding more > troubles. > > Cc: Mikulas Patocka <mpatocka@redhat.com> > Cc: Mike Snitzer <snitzer@kernel.org> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-core.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/block/blk-core.c b/block/blk-core.c > index a16b5abdbbf5..b1a10187ef74 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -729,6 +729,20 @@ void submit_bio_noacct_nocheck(struct bio *bio) > __submit_bio_noacct(bio); > } > > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > +{ > + unsigned int bs = q->limits.logical_block_size; > + unsigned int size = bio->bi_iter.bi_size; > + > + if (size & (bs - 1)) > + return false; > + > + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) > + return false; > + > + return true; > +} > + > /** > * submit_bio_noacct - re-submit a bio to the block device layer for I/O > * @bio: The bio describing the location in memory and on the device. > @@ -780,6 +794,9 @@ void submit_bio_noacct(struct bio *bio) > } > } > > + if (WARN_ON_ONCE(!bio_check_alignment(bio, q))) > + goto end_io; > + > if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) > bio_clear_polled(bio); > > -- > 2.41.0 > > This check would really help more quickly find buggy code, but it would be unfortunate for these extra checks to be required in production. It feels like this is the type of check that should be wrapped by a debug CONFIG option (so only debug kernels have it). Do we already have an appropriate CONFIG option to use? Mike
On Thu, 21 Mar 2024, Mike Snitzer wrote: > On Thu, Mar 21 2024 at 9:16P -0400, > Ming Lei <ming.lei@redhat.com> wrote: > > > For any bio with data, its start sector and size have to be aligned with > > the queue's logical block size. > > > > This rule is obvious, but there is still user which may send unaligned > > bio to block layer, and it is observed that dm-integrity can do that, > > and cause double free of driver's dma meta buffer. > > > > So failfast unaligned bio from submit_bio_noacct() for avoiding more > > troubles. > > > > Cc: Mikulas Patocka <mpatocka@redhat.com> > > Cc: Mike Snitzer <snitzer@kernel.org> > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > block/blk-core.c | 17 +++++++++++++++++ > > 1 file changed, 17 insertions(+) > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > index a16b5abdbbf5..b1a10187ef74 100644 > > --- a/block/blk-core.c > > +++ b/block/blk-core.c > > @@ -729,6 +729,20 @@ void submit_bio_noacct_nocheck(struct bio *bio) > > __submit_bio_noacct(bio); > > } > > > > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > > +{ > > + unsigned int bs = q->limits.logical_block_size; > > + unsigned int size = bio->bi_iter.bi_size; > > + > > + if (size & (bs - 1)) > > + return false; > > + > > + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) > > + return false; > > + > > + return true; > > +} I would change it to if (unlikely(((bi_iter.bi_sector | bio_sectors(bio)) & ((queue_logical_block_size(q) >> 9) - 1)) != 0)) return false; > > /** > > * submit_bio_noacct - re-submit a bio to the block device layer for I/O > > * @bio: The bio describing the location in memory and on the device. > > @@ -780,6 +794,9 @@ void submit_bio_noacct(struct bio *bio) > > } > > } > > > > + if (WARN_ON_ONCE(!bio_check_alignment(bio, q))) > > + goto end_io; > > + > > if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) > > bio_clear_polled(bio); > > > > -- > > 2.41.0 > > > > > > This check would really help more quickly find buggy code, but it > would be unfortunate for these extra checks to be required in > production. It feels like this is the type of check that should be > wrapped by a debug CONFIG option (so only debug kernels have it). > > Do we already have an appropriate CONFIG option to use? > > Mike But then, the system would crash with the config option being 'n' and return an error with the config option being 'y' - which would be unfortunate. We could remove the check from the drivers and add it to the generic I/O path - this wouldn't add extra overhead. Mikulas
On 3/21/24 7:16 AM, Ming Lei wrote: > For any bio with data, its start sector and size have to be aligned with > the queue's logical block size. > > This rule is obvious, but there is still user which may send unaligned > bio to block layer, and it is observed that dm-integrity can do that, > and cause double free of driver's dma meta buffer. > > So failfast unaligned bio from submit_bio_noacct() for avoiding more > troubles. > > Cc: Mikulas Patocka <mpatocka@redhat.com> > Cc: Mike Snitzer <snitzer@kernel.org> > Signed-off-by: Ming Lei <ming.lei@redhat.com> > --- > block/blk-core.c | 17 +++++++++++++++++ > 1 file changed, 17 insertions(+) > > diff --git a/block/blk-core.c b/block/blk-core.c > index a16b5abdbbf5..b1a10187ef74 100644 > --- a/block/blk-core.c > +++ b/block/blk-core.c > @@ -729,6 +729,20 @@ void submit_bio_noacct_nocheck(struct bio *bio) > __submit_bio_noacct(bio); > } > > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > +{ > + unsigned int bs = q->limits.logical_block_size; > + unsigned int size = bio->bi_iter.bi_size; > + > + if (size & (bs - 1)) > + return false; > + > + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) > + return false; > + > + return true; > +} > + > /** > * submit_bio_noacct - re-submit a bio to the block device layer for I/O > * @bio: The bio describing the location in memory and on the device. > @@ -780,6 +794,9 @@ void submit_bio_noacct(struct bio *bio) > } > } > > + if (WARN_ON_ONCE(!bio_check_alignment(bio, q))) > + goto end_io; > + > if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) > bio_clear_polled(bio); Where is this IO coming from? The normal block level dio has checks. And in fact they are expensive... If we add this one, then we should be able to kill the block/fops.c checks, no?
On Thu, Mar 21, 2024 at 09:16:34PM +0800, Ming Lei wrote: > For any bio with data, its start sector and size have to be aligned with > the queue's logical block size. > > This rule is obvious, but there is still user which may send unaligned > bio to block layer, and it is observed that dm-integrity can do that, > and cause double free of driver's dma meta buffer. > > So failfast unaligned bio from submit_bio_noacct() for avoiding more > troubles. I've been wanting to do that for the next merge window, as the lack of this check is kinda stunning. Note that we have open coded versions of it in __blkdev_issue_dicard and blkdev_issue_zeroout that can go away now. > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > +{ > + unsigned int bs = q->limits.logical_block_size; > + unsigned int size = bio->bi_iter.bi_size; This should just use bdev_logical_block_size() on bio->bi_bdev.
On Thu, Mar 21, 2024 at 06:01:41PM +0100, Mikulas Patocka wrote: > But then, the system would crash with the config option being 'n' and > return an error with the config option being 'y' - which would be > unfortunate. > > We could remove the check from the drivers and add it to the generic I/O > path - this wouldn't add extra overhead. Agreed. This is a pretty cheap check, and I'd rather have it all the time instead of being duplicated in various drivers, just like we do a lot of other sanity checking in submit_bio.
On Thu, Mar 21, 2024 at 11:09:25AM -0600, Jens Axboe wrote: > Where is this IO coming from? The normal block level dio has checks. And > in fact they are expensive... How is this expensive when we need the bio cache lines all the time during I/O submission which follows instantly? > If we add this one, then we should be able > to kill the block/fops.c checks, no? That would mean we'd only get an async error back, and it would be past say page cache invalidation. Not sure that's a good way to handle failed alignment from userspace.
On 3/21/24 4:09 PM, Christoph Hellwig wrote: > On Thu, Mar 21, 2024 at 11:09:25AM -0600, Jens Axboe wrote: >> Where is this IO coming from? The normal block level dio has checks. And >> in fact they are expensive... > > How is this expensive when we need the bio cache lines all the time > during I/O submission which follows instantly? This part isn't expensive, it's the general dio checks that are which check memory alignment as well. >> If we add this one, then we should be able >> to kill the block/fops.c checks, no? > > That would mean we'd only get an async error back, and it would be past > say page cache invalidation. Not sure that's a good way to handle > failed alignment from userspace. It would fail during submission, at least for sane cases, so it would be readily apparent after submit. Unfortunately we still don't have sane passback of errors there, this is one example and EWOULDBLOCK as well.
On Thu, Mar 21, 2024 at 04:50:44PM -0600, Jens Axboe wrote: > On 3/21/24 4:09 PM, Christoph Hellwig wrote: > > On Thu, Mar 21, 2024 at 11:09:25AM -0600, Jens Axboe wrote: > >> Where is this IO coming from? The normal block level dio has checks. And > >> in fact they are expensive... > > > > How is this expensive when we need the bio cache lines all the time > > during I/O submission which follows instantly? > > This part isn't expensive, it's the general dio checks that are which > check memory alignment as well. Well, the memory alignment has to walk over the whole iterator, so it definitively is way more expensive.
On Thu, Mar 21, 2024 at 11:09:25AM -0600, Jens Axboe wrote: > On 3/21/24 7:16 AM, Ming Lei wrote: > > For any bio with data, its start sector and size have to be aligned with > > the queue's logical block size. > > > > This rule is obvious, but there is still user which may send unaligned > > bio to block layer, and it is observed that dm-integrity can do that, > > and cause double free of driver's dma meta buffer. > > > > So failfast unaligned bio from submit_bio_noacct() for avoiding more > > troubles. > > > > Cc: Mikulas Patocka <mpatocka@redhat.com> > > Cc: Mike Snitzer <snitzer@kernel.org> > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > --- > > block/blk-core.c | 17 +++++++++++++++++ > > 1 file changed, 17 insertions(+) > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > index a16b5abdbbf5..b1a10187ef74 100644 > > --- a/block/blk-core.c > > +++ b/block/blk-core.c > > @@ -729,6 +729,20 @@ void submit_bio_noacct_nocheck(struct bio *bio) > > __submit_bio_noacct(bio); > > } > > > > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > > +{ > > + unsigned int bs = q->limits.logical_block_size; > > + unsigned int size = bio->bi_iter.bi_size; > > + > > + if (size & (bs - 1)) > > + return false; > > + > > + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) > > + return false; > > + > > + return true; > > +} > > + > > /** > > * submit_bio_noacct - re-submit a bio to the block device layer for I/O > > * @bio: The bio describing the location in memory and on the device. > > @@ -780,6 +794,9 @@ void submit_bio_noacct(struct bio *bio) > > } > > } > > > > + if (WARN_ON_ONCE(!bio_check_alignment(bio, q))) > > + goto end_io; > > + > > if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) > > bio_clear_polled(bio); > > Where is this IO coming from? The normal block level dio has checks. And > in fact they are expensive... If we add this one, then we should be able > to kill the block/fops.c checks, no? I think Most of fs code should send good bio since I didn't trigger it in xfstests. But we still have md, dm, bcache and target code which build bio in their way. The reported issue is from device mapper integrity code. Yes, all (offset & size) alignment in fops.c shouldn't be needed any more. Thanks, Ming
On Thu, Mar 21, 2024 at 06:01:41PM +0100, Mikulas Patocka wrote: > > > On Thu, 21 Mar 2024, Mike Snitzer wrote: > > > On Thu, Mar 21 2024 at 9:16P -0400, > > Ming Lei <ming.lei@redhat.com> wrote: > > > > > For any bio with data, its start sector and size have to be aligned with > > > the queue's logical block size. > > > > > > This rule is obvious, but there is still user which may send unaligned > > > bio to block layer, and it is observed that dm-integrity can do that, > > > and cause double free of driver's dma meta buffer. > > > > > > So failfast unaligned bio from submit_bio_noacct() for avoiding more > > > troubles. > > > > > > Cc: Mikulas Patocka <mpatocka@redhat.com> > > > Cc: Mike Snitzer <snitzer@kernel.org> > > > Signed-off-by: Ming Lei <ming.lei@redhat.com> > > > --- > > > block/blk-core.c | 17 +++++++++++++++++ > > > 1 file changed, 17 insertions(+) > > > > > > diff --git a/block/blk-core.c b/block/blk-core.c > > > index a16b5abdbbf5..b1a10187ef74 100644 > > > --- a/block/blk-core.c > > > +++ b/block/blk-core.c > > > @@ -729,6 +729,20 @@ void submit_bio_noacct_nocheck(struct bio *bio) > > > __submit_bio_noacct(bio); > > > } > > > > > > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > > > +{ > > > + unsigned int bs = q->limits.logical_block_size; > > > + unsigned int size = bio->bi_iter.bi_size; > > > + > > > + if (size & (bs - 1)) > > > + return false; > > > + > > > + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) > > > + return false; > > > + > > > + return true; > > > +} > > I would change it to > > if (unlikely(((bi_iter.bi_sector | bio_sectors(bio)) & ((queue_logical_block_size(q) >> 9) - 1)) != 0)) > return false; What if bio->bi_iter.bi_size isn't aligned with 512? The above check can't find that at all. > > > > /** > > > * submit_bio_noacct - re-submit a bio to the block device layer for I/O > > > * @bio: The bio describing the location in memory and on the device. > > > @@ -780,6 +794,9 @@ void submit_bio_noacct(struct bio *bio) > > > } > > > } > > > > > > + if (WARN_ON_ONCE(!bio_check_alignment(bio, q))) > > > + goto end_io; > > > + > > > if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) > > > bio_clear_polled(bio); > > > > > > -- > > > 2.41.0 > > > > > > > > > > This check would really help more quickly find buggy code, but it > > would be unfortunate for these extra checks to be required in > > production. It feels like this is the type of check that should be > > wrapped by a debug CONFIG option (so only debug kernels have it). > > > > Do we already have an appropriate CONFIG option to use? > > > > Mike > > But then, the system would crash with the config option being 'n' and > return an error with the config option being 'y' - which would be > unfortunate. Yes, the check is basically zero-cost, not necessary to add config to make things more complicated. Thanks, Ming
On Fri, Mar 22, 2024 at 10:08:11AM +0800, Ming Lei wrote: > On Thu, Mar 21, 2024 at 06:01:41PM +0100, Mikulas Patocka wrote: > > I would change it to > > > > if (unlikely(((bi_iter.bi_sector | bio_sectors(bio)) & ((queue_logical_block_size(q) >> 9) - 1)) != 0)) > > return false; > > What if bio->bi_iter.bi_size isn't aligned with 512? The above check > can't find that at all. Shouldn't that mean this check doesn't apply to REQ_OP_DRV_IN/OUT? Those ops don't necessarily imply any alignment requirements. It may not matter here since it looks like all existing users go through blk_execute_rq() instead of submit_bio(), but there are other checks for DRV_IN/OUT in this path, so I guess it is supposed to be supported?
On Fri, 22 Mar 2024, Ming Lei wrote: > On Thu, Mar 21, 2024 at 06:01:41PM +0100, Mikulas Patocka wrote: > > > > > > > > +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) > > > > +{ > > > > + unsigned int bs = q->limits.logical_block_size; > > > > + unsigned int size = bio->bi_iter.bi_size; > > > > + > > > > + if (size & (bs - 1)) > > > > + return false; > > > > + > > > > + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) > > > > + return false; > > > > + > > > > + return true; > > > > +} > > > > I would change it to > > > > if (unlikely(((bi_iter.bi_sector | bio_sectors(bio)) & ((queue_logical_block_size(q) >> 9) - 1)) != 0)) > > return false; > > What if bio->bi_iter.bi_size isn't aligned with 512? The above check > can't find that at all. Could it happen that bi_size is not aligned to 512? I haven't seen such a bio yet. If you have seen it, say where was it created. Mikulas
On Thu, Mar 21, 2024 at 08:39:06PM -0600, Keith Busch wrote: > On Fri, Mar 22, 2024 at 10:08:11AM +0800, Ming Lei wrote: > > On Thu, Mar 21, 2024 at 06:01:41PM +0100, Mikulas Patocka wrote: > > > I would change it to > > > > > > if (unlikely(((bi_iter.bi_sector | bio_sectors(bio)) & ((queue_logical_block_size(q) >> 9) - 1)) != 0)) > > > return false; > > > > What if bio->bi_iter.bi_size isn't aligned with 512? The above check > > can't find that at all. > > Shouldn't that mean this check doesn't apply to REQ_OP_DRV_IN/OUT? For REQ_OP_DRV_IN/OUT, only the real user IO command may need the check, and others needn't this check, maybe they don't use .bi_sector or .bi_size at all. > Those ops don't necessarily imply any alignment requirements. It may not > matter here since it looks like all existing users go through > blk_execute_rq() instead of submit_bio(), but there are other checks for > DRV_IN/OUT in this path, so I guess it is supposed to be supported? This patch focuses on FS bio, and passthough request case is more complicated, and we even don't have central entry for real user pt IO request only. Thanks, Ming
diff --git a/block/blk-core.c b/block/blk-core.c index a16b5abdbbf5..b1a10187ef74 100644 --- a/block/blk-core.c +++ b/block/blk-core.c @@ -729,6 +729,20 @@ void submit_bio_noacct_nocheck(struct bio *bio) __submit_bio_noacct(bio); } +static bool bio_check_alignment(struct bio *bio, struct request_queue *q) +{ + unsigned int bs = q->limits.logical_block_size; + unsigned int size = bio->bi_iter.bi_size; + + if (size & (bs - 1)) + return false; + + if (size && ((bio->bi_iter.bi_sector << SECTOR_SHIFT) & (bs - 1))) + return false; + + return true; +} + /** * submit_bio_noacct - re-submit a bio to the block device layer for I/O * @bio: The bio describing the location in memory and on the device. @@ -780,6 +794,9 @@ void submit_bio_noacct(struct bio *bio) } } + if (WARN_ON_ONCE(!bio_check_alignment(bio, q))) + goto end_io; + if (!test_bit(QUEUE_FLAG_POLL, &q->queue_flags)) bio_clear_polled(bio);
For any bio with data, its start sector and size have to be aligned with the queue's logical block size. This rule is obvious, but there is still user which may send unaligned bio to block layer, and it is observed that dm-integrity can do that, and cause double free of driver's dma meta buffer. So failfast unaligned bio from submit_bio_noacct() for avoiding more troubles. Cc: Mikulas Patocka <mpatocka@redhat.com> Cc: Mike Snitzer <snitzer@kernel.org> Signed-off-by: Ming Lei <ming.lei@redhat.com> --- block/blk-core.c | 17 +++++++++++++++++ 1 file changed, 17 insertions(+)