Message ID | 20230502090018.169275-1-mcgrof@kernel.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] block: dio: immediately fail when count < logical block size | expand |
On 5/2/23 11:00, Luis Chamberlain wrote: > When using direct IO of say 4k on a 32k physical block size device > we crash. The amount of data requested must match the minimum IO supported > by the device but instead we take it for a ride right now and try to fail > later after checking alignments. > > Use the logical block size to ensure the data passed on matches our minimum > supported. > > Without this we end up in a crash below: > > kernel BUG at lib/iov_iter.c:999! > invalid opcode: 0000 [#1] PREEMPT SMP PTI > CPU: 4 PID: 949 Comm: fio Not tainted 6.3.0-large-block-20230426-dirty #28 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 [ 43.513057] RIP: 0010:iov_iter_revert.part.0+0x16e/0x170 > Code: f9 40 a2 63 af 74 07 03 56 08 89 d8 29 d0 89 45 08 44 89 6d 20 <etc> > RSP: 0018:ffffaa52006cfc60 EFLAGS: 00010246 > RAX: 0000000000000016 RBX: 0000000000000016 RCX: 0000000000000000 > RDX: 0000000000000004 RSI: 0000000000000006 RDI: ffffaa52006cfd08 > RBP: ffffaa52006cfd08 R08: 0000000000000000 R09: ffffaa52006cfb40 > R10: 0000000000000003 R11: ffffffffafcc21e8 R12: 0000000000004000 > R13: 0000000000003fea R14: ffff9de3d7565e00 R15: ffff9de3c1f68600 > FS: 00007f8bfe726c40(0000) GS:ffff9de43bd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f8bf5eadd68 CR3: 0000000102c76001 CR4: 0000000000770ee0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > PKRU: 55555554 > Call Trace: > <TASK> > blkdev_direct_write+0xf0/0x160 > blkdev_write_iter+0x11b/0x230 > io_write+0x10c/0x420 > ? kmem_cache_alloc_bulk+0x2a1/0x410 > ? fget+0x79/0xb0 > io_issue_sqe+0x60/0x3b0 > ? io_prep_rw+0x5a/0x190 > io_submit_sqes+0x1e6/0x640 > __do_sys_io_uring_enter+0x54c/0xb90 > ? handle_mm_fault+0x9a/0x340 > ? preempt_count_add+0x47/0xa0 > ? up_read+0x37/0x70 > ? do_user_addr_fault+0x27c/0x780 > do_syscall_64+0x37/0x90 > entry_SYSCALL_64_after_hw > > The issue is we end up calling iov_iter_revert() at the end of > blkdev_direct_write() due to the writes not being valid and > being unaligned. We can fail twice, for on __blkdev_direct_IO_simple() > and later on __blkdev_direct_IO_async(). Something is askew with that reasoning. If the above were true, we would also crash if we were attempting a 512 byte direct I/O write on a 4k drive. And I'm reasonably sure that we don't. So where's the difference? Cheers, Hannes
On Tue, May 2, 2023 at 2:11 AM Hannes Reinecke <hare@suse.de> wrote: > > On 5/2/23 11:00, Luis Chamberlain wrote: > > When using direct IO of say 4k on a 32k physical block size device > > we crash. The amount of data requested must match the minimum IO supported > > by the device but instead we take it for a ride right now and try to fail > > later after checking alignments. > > Something is askew with that reasoning. > If the above were true, we would also crash if we were attempting a 512 > byte direct I/O write on a 4k drive. We do fail, the question is if the math with iov_iter_revert() is right for the failed return value when unaligned. > And I'm reasonably sure that we don't. > > So where's the difference? I think a different return value used which busts iov_iter_revert(). Luis
Hi Luis, On Tue, May 02, 2023 at 02:00:18AM -0700, Luis Chamberlain wrote: > When using direct IO of say 4k on a 32k physical block size device > we crash. The amount of data requested must match the minimum IO supported I guess you meant 32K logical block size here, which isn't supported yet. Otherwise physical bs is supposed to not make difference here. > by the device but instead we take it for a ride right now and try to fail > later after checking alignments. > > Use the logical block size to ensure the data passed on matches our minimum > supported. > > Without this we end up in a crash below: > > kernel BUG at lib/iov_iter.c:999! > invalid opcode: 0000 [#1] PREEMPT SMP PTI > CPU: 4 PID: 949 Comm: fio Not tainted 6.3.0-large-block-20230426-dirty #28 > Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 [ 43.513057] RIP: 0010:iov_iter_revert.part.0+0x16e/0x170 > Code: f9 40 a2 63 af 74 07 03 56 08 89 d8 29 d0 89 45 08 44 89 6d 20 <etc> > RSP: 0018:ffffaa52006cfc60 EFLAGS: 00010246 > RAX: 0000000000000016 RBX: 0000000000000016 RCX: 0000000000000000 > RDX: 0000000000000004 RSI: 0000000000000006 RDI: ffffaa52006cfd08 > RBP: ffffaa52006cfd08 R08: 0000000000000000 R09: ffffaa52006cfb40 > R10: 0000000000000003 R11: ffffffffafcc21e8 R12: 0000000000004000 > R13: 0000000000003fea R14: ffff9de3d7565e00 R15: ffff9de3c1f68600 > FS: 00007f8bfe726c40(0000) GS:ffff9de43bd00000(0000) knlGS:0000000000000000 > CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 > CR2: 00007f8bf5eadd68 CR3: 0000000102c76001 CR4: 0000000000770ee0 > DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 > DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 > PKRU: 55555554 > Call Trace: > <TASK> > blkdev_direct_write+0xf0/0x160 > blkdev_write_iter+0x11b/0x230 > io_write+0x10c/0x420 > ? kmem_cache_alloc_bulk+0x2a1/0x410 > ? fget+0x79/0xb0 > io_issue_sqe+0x60/0x3b0 > ? io_prep_rw+0x5a/0x190 > io_submit_sqes+0x1e6/0x640 > __do_sys_io_uring_enter+0x54c/0xb90 > ? handle_mm_fault+0x9a/0x340 > ? preempt_count_add+0x47/0xa0 > ? up_read+0x37/0x70 > ? do_user_addr_fault+0x27c/0x780 > do_syscall_64+0x37/0x90 > entry_SYSCALL_64_after_hw > > The issue is we end up calling iov_iter_revert() at the end of > blkdev_direct_write() due to the writes not being valid and Not see such function in linus tree? Which tree is this patch against? > being unaligned. We can fail twice, for on __blkdev_direct_IO_simple() > and later on __blkdev_direct_IO_async(). > > Instead of allowing such writes through and letting them in for > an attempt to ride, kill them early and fail early. > > Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> > --- > > But this can't be right, this should mean other failed unaligned > writes were possibly doing the wrong accounting also with > iov_iter_revert(). So this patch can't be right, right? > > block/fops.c | 5 +++++ > 1 file changed, 5 insertions(+) > > diff --git a/block/fops.c b/block/fops.c > index 524b8a828aad..c03c1bafcb1b 100644 > --- a/block/fops.c > +++ b/block/fops.c > @@ -583,9 +583,14 @@ static int blkdev_close(struct inode *inode, struct file *filp) > static ssize_t > blkdev_direct_write(struct kiocb *iocb, struct iov_iter *from) > { > + struct file *file = iocb->ki_filp; > + struct block_device *bdev = file->private_data; > size_t count = iov_iter_count(from); > ssize_t written; > > + if (count < bdev_logical_block_size(bdev)) > + return -EINVAL; > + > written = kiocb_invalidate_pages(iocb, count); I guess your test must be against out-of-tree patches, and the change isn't needed given blkdev_dio_unaligned() is called in __blkdev_direct_IO(), wrt. upstream repo. Thanks, Ming
diff --git a/block/fops.c b/block/fops.c index 524b8a828aad..c03c1bafcb1b 100644 --- a/block/fops.c +++ b/block/fops.c @@ -583,9 +583,14 @@ static int blkdev_close(struct inode *inode, struct file *filp) static ssize_t blkdev_direct_write(struct kiocb *iocb, struct iov_iter *from) { + struct file *file = iocb->ki_filp; + struct block_device *bdev = file->private_data; size_t count = iov_iter_count(from); ssize_t written; + if (count < bdev_logical_block_size(bdev)) + return -EINVAL; + written = kiocb_invalidate_pages(iocb, count); if (written) { if (written == -EBUSY)
When using direct IO of say 4k on a 32k physical block size device we crash. The amount of data requested must match the minimum IO supported by the device but instead we take it for a ride right now and try to fail later after checking alignments. Use the logical block size to ensure the data passed on matches our minimum supported. Without this we end up in a crash below: kernel BUG at lib/iov_iter.c:999! invalid opcode: 0000 [#1] PREEMPT SMP PTI CPU: 4 PID: 949 Comm: fio Not tainted 6.3.0-large-block-20230426-dirty #28 Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014 [ 43.513057] RIP: 0010:iov_iter_revert.part.0+0x16e/0x170 Code: f9 40 a2 63 af 74 07 03 56 08 89 d8 29 d0 89 45 08 44 89 6d 20 <etc> RSP: 0018:ffffaa52006cfc60 EFLAGS: 00010246 RAX: 0000000000000016 RBX: 0000000000000016 RCX: 0000000000000000 RDX: 0000000000000004 RSI: 0000000000000006 RDI: ffffaa52006cfd08 RBP: ffffaa52006cfd08 R08: 0000000000000000 R09: ffffaa52006cfb40 R10: 0000000000000003 R11: ffffffffafcc21e8 R12: 0000000000004000 R13: 0000000000003fea R14: ffff9de3d7565e00 R15: ffff9de3c1f68600 FS: 00007f8bfe726c40(0000) GS:ffff9de43bd00000(0000) knlGS:0000000000000000 CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 CR2: 00007f8bf5eadd68 CR3: 0000000102c76001 CR4: 0000000000770ee0 DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 PKRU: 55555554 Call Trace: <TASK> blkdev_direct_write+0xf0/0x160 blkdev_write_iter+0x11b/0x230 io_write+0x10c/0x420 ? kmem_cache_alloc_bulk+0x2a1/0x410 ? fget+0x79/0xb0 io_issue_sqe+0x60/0x3b0 ? io_prep_rw+0x5a/0x190 io_submit_sqes+0x1e6/0x640 __do_sys_io_uring_enter+0x54c/0xb90 ? handle_mm_fault+0x9a/0x340 ? preempt_count_add+0x47/0xa0 ? up_read+0x37/0x70 ? do_user_addr_fault+0x27c/0x780 do_syscall_64+0x37/0x90 entry_SYSCALL_64_after_hw The issue is we end up calling iov_iter_revert() at the end of blkdev_direct_write() due to the writes not being valid and being unaligned. We can fail twice, for on __blkdev_direct_IO_simple() and later on __blkdev_direct_IO_async(). Instead of allowing such writes through and letting them in for an attempt to ride, kill them early and fail early. Signed-off-by: Luis Chamberlain <mcgrof@kernel.org> --- But this can't be right, this should mean other failed unaligned writes were possibly doing the wrong accounting also with iov_iter_revert(). So this patch can't be right, right? block/fops.c | 5 +++++ 1 file changed, 5 insertions(+)