Message ID | 20230406145050.49914-1-zhouchengming@bytedance.com (mailing list archive) |
---|---|
Series | blk-cgroup: some cleanup |
On Thu, 06 Apr 2023 22:50:47 +0800, Chengming Zhou wrote:
> These are some cleanup patches of blk-cgroup. Thanks for review.
>
> v2:
> - Add Acked tags from Tejun.
>
> Chengming Zhou (3):
>   block, bfq: remove BFQ_WEIGHT_LEGACY_DFL
>   blk-cgroup: delete cpd_bind_fn of blkcg_policy
>   blk-cgroup: delete cpd_init_fn of blkcg_policy
>
> [...]

Applied, thanks!

[1/3] block, bfq: remove BFQ_WEIGHT_LEGACY_DFL
      commit: e9f2f3f590289681c71d0137d4e5e88421f934c6
[2/3] blk-cgroup: delete cpd_bind_fn of blkcg_policy
      commit: d1023165eef83dace7cc6299af904f26272baaca
[3/3] blk-cgroup: delete cpd_init_fn of blkcg_policy
      commit: 650e2cb50f3fc45d0585ed8609db9519f6c9bcd8

Best regards,
On 4/6/23 07:50, Chengming Zhou wrote:
> These are some cleanup patches of blk-cgroup. Thanks for review.
With these patches applied, my kernel test VM crashes during boot. The
following crash disappears if I revert these patches:
BUG: KASAN: null-ptr-deref in bio_associate_blkg_from_css+0x83/0x240
Read of size 8 at addr 0000000000000518 by task blkid/5885
Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
1.16.0-debian-1.16.0-5 04/01/2014
Call Trace:
dump_stack_lvl+0x4a/0x80
print_report+0x21e/0x260
kasan_report+0xc2/0xf0
__asan_load8+0x69/0x90
bio_associate_blkg_from_css+0x83/0x240
bfq_bio_bfqg+0xce/0x120 [bfq]
bfq_bic_update_cgroup+0x2f/0x3c0 [bfq]
bfq_init_rq+0x1e8/0xb10 [bfq]
bfq_insert_request.isra.0+0xa3/0x420 [bfq]
bfq_insert_requests+0xca/0xf0 [bfq]
blk_mq_dispatch_rq_list+0x4c0/0xb00
__blk_mq_sched_dispatch_requests+0x15e/0x200
blk_mq_sched_dispatch_requests+0x8b/0xc0
__blk_mq_run_hw_queue+0x3ff/0x500
__blk_mq_delay_run_hw_queue+0x23a/0x300
blk_mq_run_hw_queue+0x14e/0x350
blk_mq_sched_insert_request+0x181/0x1f0
blk_execute_rq+0xf4/0x300
scsi_execute_cmd+0x23e/0x350
sr_do_ioctl+0x173/0x3d0 [sr_mod]
sr_packet+0x60/0x90 [sr_mod]
cdrom_get_track_info.constprop.0+0x125/0x170 [cdrom]
cdrom_get_last_written+0x1d4/0x2d0 [cdrom]
mmc_ioctl_cdrom_last_written+0x85/0x120 [cdrom]
mmc_ioctl+0x10b/0x1d0 [cdrom]
cdrom_ioctl+0xa66/0x1270 [cdrom]
sr_block_ioctl+0xee/0x130 [sr_mod]
blkdev_ioctl+0x1bb/0x3f0
__x64_sys_ioctl+0xc7/0xe0
do_syscall_64+0x34/0x80
entry_SYSCALL_64_after_hwframe+0x46/0xb0
Bart.
On 4/7/23 12:41 PM, Bart Van Assche wrote:
> On 4/6/23 07:50, Chengming Zhou wrote:
>> These are some cleanup patches of blk-cgroup. Thanks for review.
>
> With these patches applied, my kernel test VM crashes during boot. The following crash disappears if I revert these patches:
>
> BUG: KASAN: null-ptr-deref in bio_associate_blkg_from_css+0x83/0x240

Would be useful in the report to know where that is, as it doesn't include the code output.
On 4/7/23 11:44, Jens Axboe wrote:
> On 4/7/23 12:41 PM, Bart Van Assche wrote:
>> On 4/6/23 07:50, Chengming Zhou wrote:
>>> These are some cleanup patches of blk-cgroup. Thanks for review.
>>
>> With these patches applied, my kernel test VM crashes during boot. The following crash disappears if I revert these patches:
>>
>> BUG: KASAN: null-ptr-deref in bio_associate_blkg_from_css+0x83/0x240
>
> Would be useful in the report to know where that is, as it doesn't include
> the code output.

Hi Jens,

This is what gdb tells me about the crash address:

$ gdb vmlinux
(gdb) list *(bio_associate_blkg_from_css+0x83)
0xffffffff81856923 is in bio_associate_blkg_from_css (./include/linux/blkdev.h:865).
860     int iocb_bio_iopoll(struct kiocb *kiocb, struct io_comp_batch *iob,
861                             unsigned int flags);
862
863     static inline struct request_queue *bdev_get_queue(struct block_device *bdev)
864     {
865             return bdev->bd_queue;  /* this is never NULL */
866     }
867
868     /* Helper to convert BLK_ZONE_ZONE_XXX to its string format XXX */
869     const char *blk_zone_cond_str(enum blk_zone_cond zone_cond);

Thanks,

Bart.
On 2023/4/8 02:41, Bart Van Assche wrote:
> On 4/6/23 07:50, Chengming Zhou wrote:
>> These are some cleanup patches of blk-cgroup. Thanks for review.
>
> With these patches applied, my kernel test VM crashes during boot. The following crash disappears if I revert these patches:

Thanks for the report.
I will try to reproduce it first and look into this today.

>
> BUG: KASAN: null-ptr-deref in bio_associate_blkg_from_css+0x83/0x240
> Read of size 8 at addr 0000000000000518 by task blkid/5885
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS 1.16.0-debian-1.16.0-5 04/01/2014
> Call Trace:
> dump_stack_lvl+0x4a/0x80
> print_report+0x21e/0x260
> kasan_report+0xc2/0xf0
> __asan_load8+0x69/0x90
> bio_associate_blkg_from_css+0x83/0x240
> bfq_bio_bfqg+0xce/0x120 [bfq]
> bfq_bic_update_cgroup+0x2f/0x3c0 [bfq]
> bfq_init_rq+0x1e8/0xb10 [bfq]
> bfq_insert_request.isra.0+0xa3/0x420 [bfq]
> bfq_insert_requests+0xca/0xf0 [bfq]
> blk_mq_dispatch_rq_list+0x4c0/0xb00
> __blk_mq_sched_dispatch_requests+0x15e/0x200
> blk_mq_sched_dispatch_requests+0x8b/0xc0
> __blk_mq_run_hw_queue+0x3ff/0x500
> __blk_mq_delay_run_hw_queue+0x23a/0x300
> blk_mq_run_hw_queue+0x14e/0x350
> blk_mq_sched_insert_request+0x181/0x1f0
> blk_execute_rq+0xf4/0x300
> scsi_execute_cmd+0x23e/0x350
> sr_do_ioctl+0x173/0x3d0 [sr_mod]
> sr_packet+0x60/0x90 [sr_mod]
> cdrom_get_track_info.constprop.0+0x125/0x170 [cdrom]
> cdrom_get_last_written+0x1d4/0x2d0 [cdrom]
> mmc_ioctl_cdrom_last_written+0x85/0x120 [cdrom]
> mmc_ioctl+0x10b/0x1d0 [cdrom]
> cdrom_ioctl+0xa66/0x1270 [cdrom]
> sr_block_ioctl+0xee/0x130 [sr_mod]
> blkdev_ioctl+0x1bb/0x3f0
> __x64_sys_ioctl+0xc7/0xe0
> do_syscall_64+0x34/0x80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Bart.
On 2023/4/8 11:37, Chengming Zhou wrote:
> On 2023/4/8 02:41, Bart Van Assche wrote:
>> On 4/6/23 07:50, Chengming Zhou wrote:
>>> These are some cleanup patches of blk-cgroup. Thanks for review.
>>
>> With these patches applied, my kernel test VM crashes during boot. The following crash disappears if I revert these patches:
>
> Thanks for the report.
> I will try to reproduce it first and look into this today.

Hi Bart,

I tried a few times to reproduce it, but so far I can't. Would you mind sharing more details?

I don't know how to specify bfq as the default scheduler for the device, since "elevator=" no longer works. Do you use something like sysfsutils to set the sysfs config during boot?

So I booted the qemu VM, set bfq as the scheduler for the root device and ran "blkid", but the bug did not show up. Then I used sysfsutils to set bfq as the default scheduler at boot and rebooted; the bug still did not show up.

I will continue to look into this issue and review the related code.

BTW, my codebase is e134c93f788f ("Add linux-next specific files for 20230406") with these three patches applied.

Thanks.
Hi, Bart

On 2023/04/08 2:41, Bart Van Assche wrote:
> On 4/6/23 07:50, Chengming Zhou wrote:
>> These are some cleanup patches of blk-cgroup. Thanks for review.
>
> With these patches applied, my kernel test VM crashes during boot. The
> following crash disappears if I revert these patches:
>
> BUG: KASAN: null-ptr-deref in bio_associate_blkg_from_css+0x83/0x240
> Read of size 8 at addr 0000000000000518 by task blkid/5885
> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
> 1.16.0-debian-1.16.0-5 04/01/2014
> Call Trace:
> dump_stack_lvl+0x4a/0x80
> print_report+0x21e/0x260
> kasan_report+0xc2/0xf0
> __asan_load8+0x69/0x90
> bio_associate_blkg_from_css+0x83/0x240
> bfq_bio_bfqg+0xce/0x120 [bfq]
> bfq_bic_update_cgroup+0x2f/0x3c0 [bfq]
> bfq_init_rq+0x1e8/0xb10 [bfq]
> bfq_insert_request.isra.0+0xa3/0x420 [bfq]
> bfq_insert_requests+0xca/0xf0 [bfq]
> blk_mq_dispatch_rq_list+0x4c0/0xb00

I find this call trace quite odd: I can't figure out how
bfq_insert_requests can be called from blk_mq_dispatch_rq_list.
Can you show the addr2line result?

Thanks,
Kuai

> __blk_mq_sched_dispatch_requests+0x15e/0x200
> blk_mq_sched_dispatch_requests+0x8b/0xc0
> __blk_mq_run_hw_queue+0x3ff/0x500
> __blk_mq_delay_run_hw_queue+0x23a/0x300
> blk_mq_run_hw_queue+0x14e/0x350
> blk_mq_sched_insert_request+0x181/0x1f0
> blk_execute_rq+0xf4/0x300
> scsi_execute_cmd+0x23e/0x350
> sr_do_ioctl+0x173/0x3d0 [sr_mod]
> sr_packet+0x60/0x90 [sr_mod]
> cdrom_get_track_info.constprop.0+0x125/0x170 [cdrom]
> cdrom_get_last_written+0x1d4/0x2d0 [cdrom]
> mmc_ioctl_cdrom_last_written+0x85/0x120 [cdrom]
> mmc_ioctl+0x10b/0x1d0 [cdrom]
> cdrom_ioctl+0xa66/0x1270 [cdrom]
> sr_block_ioctl+0xee/0x130 [sr_mod]
> blkdev_ioctl+0x1bb/0x3f0
> __x64_sys_ioctl+0xc7/0xe0
> do_syscall_64+0x34/0x80
> entry_SYSCALL_64_after_hwframe+0x46/0xb0
>
> Bart.
On 4/9/23 18:57, Yu Kuai wrote:
> Hi, Bart
>
> On 2023/04/08 2:41, Bart Van Assche wrote:
>> On 4/6/23 07:50, Chengming Zhou wrote:
>>> These are some cleanup patches of blk-cgroup. Thanks for review.
>>
>> With these patches applied, my kernel test VM crashes during boot. The
>> following crash disappears if I revert these patches:
>>
>> BUG: KASAN: null-ptr-deref in bio_associate_blkg_from_css+0x83/0x240
>> Read of size 8 at addr 0000000000000518 by task blkid/5885
>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>> 1.16.0-debian-1.16.0-5 04/01/2014
>> Call Trace:
>> dump_stack_lvl+0x4a/0x80
>> print_report+0x21e/0x260
>> kasan_report+0xc2/0xf0
>> __asan_load8+0x69/0x90
>> bio_associate_blkg_from_css+0x83/0x240
>> bfq_bio_bfqg+0xce/0x120 [bfq]
>> bfq_bic_update_cgroup+0x2f/0x3c0 [bfq]
>> bfq_init_rq+0x1e8/0xb10 [bfq]
>> bfq_insert_request.isra.0+0xa3/0x420 [bfq]
>> bfq_insert_requests+0xca/0xf0 [bfq]
>> blk_mq_dispatch_rq_list+0x4c0/0xb00
>
> I found this call trace quite weird, I can't figure out how
> bfq_insert_requests can be called from blk_mq_dispatch_rq_list,
> can you show the addr2line result?

Hi Kuai,

Thanks for having taken a look. I ran my tests with this patch series on top of Jens' for-next branch: "[PATCH v2 00/12] Submit zoned writes in order"
(https://lore.kernel.org/linux-block/20230407235822.1672286-1-bvanassche@acm.org/T/#m4c8c7ca5a5627510dc1709847b11589e8791b6b6).
I will take a closer look and see which of these two patch series needs to be adjusted.

Bart.
On 4/10/23 11:47, Bart Van Assche wrote:
> On 4/9/23 18:57, Yu Kuai wrote:
>> Hi, Bart
>>
>> On 2023/04/08 2:41, Bart Van Assche wrote:
>>> On 4/6/23 07:50, Chengming Zhou wrote:
>>>> These are some cleanup patches of blk-cgroup. Thanks for review.
>>>
>>> With these patches applied, my kernel test VM crashes during boot.
>>> The following crash disappears if I revert these patches:
>>>
>>> BUG: KASAN: null-ptr-deref in bio_associate_blkg_from_css+0x83/0x240
>>> Read of size 8 at addr 0000000000000518 by task blkid/5885
>>> Hardware name: QEMU Standard PC (Q35 + ICH9, 2009), BIOS
>>> 1.16.0-debian-1.16.0-5 04/01/2014
>>> Call Trace:
>>> dump_stack_lvl+0x4a/0x80
>>> print_report+0x21e/0x260
>>> kasan_report+0xc2/0xf0
>>> __asan_load8+0x69/0x90
>>> bio_associate_blkg_from_css+0x83/0x240
>>> bfq_bio_bfqg+0xce/0x120 [bfq]
>>> bfq_bic_update_cgroup+0x2f/0x3c0 [bfq]
>>> bfq_init_rq+0x1e8/0xb10 [bfq]
>>> bfq_insert_request.isra.0+0xa3/0x420 [bfq]
>>> bfq_insert_requests+0xca/0xf0 [bfq]
>>> blk_mq_dispatch_rq_list+0x4c0/0xb00
>>
>> I found this call trace quite weird, I can't figure out how
>> bfq_insert_requests can be called from blk_mq_dispatch_rq_list,
>> can you show the addr2line result?
>
> Hi Kuai,
>
> Thanks for having taken a look. I ran my tests with this patch series on
> top of Jens' for-next branch: "[PATCH v2 00/12] Submit zoned writes in
> order"
> (https://lore.kernel.org/linux-block/20230407235822.1672286-1-bvanassche@acm.org/T/#m4c8c7ca5a5627510dc1709847b11589e8791b6b6).
> I will take a closer look and see which of these two patch series needs to be adjusted.

(replying to my own e-mail)

I think I found the root cause: bio->bi_bdev is NULL for pass-through requests and BFQ doesn't like it that bio->bi_bdev is NULL. I will make sure that pass-through requests are not submitted to any I/O scheduler.

Thanks,

Bart.