Message ID | 20230813152325.3017343-1-chengming.zhou@linux.dev (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] blk-mq: release scheduler resource when request complete | expand |
On 8/13/23 9:23 AM, chengming.zhou@linux.dev wrote: > From: Chengming Zhou <zhouchengming@bytedance.com> > > Chuck reported [1] a IO hang problem on NFS exports that reside on SATA > devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal > submission path for post-flush requests"). > > We analysed the IO hang problem, found there are two postflush requests > are waiting for each other. > > The first postflush request completed the REQ_FSEQ_DATA sequence, so go to > the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but > failed to blk_kick_flush() because of the second postflush request which > is inflight waiting in scheduler queue. > > The second postflush waiting in scheduler queue can't be dispatched because > the first postflush hasn't released scheduler resource even though it has > completed by itself. > > Fix it by releasing scheduler resource when the first postflush request > completed, so the second postflush can be dispatched and completed, then > make blk_kick_flush() succeed. Thanks, applied with a bit of commit message massaging and adding a comment for the newly added WARN_ON_ONCE().
On 2023/8/13 23:34, Jens Axboe wrote: > On 8/13/23 9:23 AM, chengming.zhou@linux.dev wrote: >> From: Chengming Zhou <zhouchengming@bytedance.com> >> >> Chuck reported [1] a IO hang problem on NFS exports that reside on SATA >> devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal >> submission path for post-flush requests"). >> >> We analysed the IO hang problem, found there are two postflush requests >> are waiting for each other. >> >> The first postflush request completed the REQ_FSEQ_DATA sequence, so go to >> the REQ_FSEQ_POSTFLUSH sequence and added in the flush pending list, but >> failed to blk_kick_flush() because of the second postflush request which >> is inflight waiting in scheduler queue. >> >> The second postflush waiting in scheduler queue can't be dispatched because >> the first postflush hasn't released scheduler resource even though it has >> completed by itself. >> >> Fix it by releasing scheduler resource when the first postflush request >> completed, so the second postflush can be dispatched and completed, then >> make blk_kick_flush() succeed. > > Thanks, applied with a bit of commit message massaging and adding a > comment for the newly added WARN_ON_ONCE(). > Ok, thanks!
On Sun, 13 Aug 2023 23:23:25 +0800, chengming.zhou@linux.dev wrote: > Chuck reported [1] a IO hang problem on NFS exports that reside on SATA > devices and bisected to commit 615939a2ae73 ("blk-mq: defer to the normal > submission path for post-flush requests"). > > We analysed the IO hang problem, found there are two postflush requests > are waiting for each other. > > [...] Applied, thanks! [1/1] blk-mq: release scheduler resource when request complete (no commit info) Best regards,
Hello, kernel test robot noticed "WARNING:at_block/mq-deadline.c:#dd_exit_sched" on: commit: 567b5a4f2f8b9e9fe97d5f8dca266d95c734ae91 ("[PATCH v2] blk-mq: release scheduler resource when request complete") url: https://github.com/intel-lab-lkp/linux/commits/chengming-zhou-linux-dev/blk-mq-release-scheduler-resource-when-request-complete/20230813-232513 base: https://git.kernel.org/cgit/linux/kernel/git/axboe/linux-block.git for-next patch link: https://lore.kernel.org/all/20230813152325.3017343-1-chengming.zhou@linux.dev/ patch subject: [PATCH v2] blk-mq: release scheduler resource when request complete this also caused xfstests.generic.704.fail in testcase: xfstests version: xfstests-x86_64-bb8af9c-1_20230807 with following parameters: disk: 4HDD fs: ext4 test: generic-704 compiler: gcc-12 test machine: 4 threads Intel(R) Core(TM) i5-6500 CPU @ 3.20GHz (Skylake) with 32G memory (please refer to attached dmesg/kmsg for entire log/backtrace) If you fix the issue in a separate patch/commit (i.e. not just a new version of the same patch/commit), kindly add following tags | Reported-by: kernel test robot <oliver.sang@intel.com> | Closes: https://lore.kernel.org/oe-lkp/202308172100.8ce4b853-oliver.sang@intel.com [ 222.617396][ T2216] ------------[ cut here ]------------ [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278 [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) [ 222.638218][ T2216] Modules linked in: scsi_debug(-) dm_mod btrfs blake2b_generic xor intel_rapl_msr raid6_pq intel_rapl_common zstd_compress libcrc32c x86_pkg_temp_thermal intel_powerclamp coretemp kvm_intel sd_mod t10_pi kvm crc64_rocksoft_generic crc64_rocksoft crc64 irqbypass sg crct10dif_pclmul crc32_pclmul crc32c_intel ghash_clmulni_intel sha512_ssse3 ipmi_devintf ipmi_msghandler mei_wdt i915 rapl drm_buddy intel_cstate intel_gtt wmi_bmof drm_display_helper intel_uncore drm_kms_helper ahci libahci mei_me libata intel_pch_thermal ttm mei video wmi intel_pmc_core acpi_pad drm fuse ip_tables [ 222.690679][ T2216] CPU: 0 PID: 2216 Comm: modprobe Tainted: G I 6.5.0-rc5-00190-g567b5a4f2f8b #1 [ 222.700878][ T2216] Hardware name: Dell Inc. OptiPlex 7040/0Y7WYT, BIOS 1.1.1 10/07/2015 [ 222.708919][ T2216] RIP: 0010:dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) [ 222.714130][ T2216] Code: 89 da 8b 4b 04 48 c1 ea 03 0f b6 04 02 84 c0 74 08 3c 03 0f 8e c4 00 00 00 8b 13 44 89 e6 48 c7 c7 20 d3 e2 83 e8 e2 d1 08 ff <0f> 0b e9 f4 fe ff ff 0f 0b e9 78 fe ff ff 48 89 ee 48 c7 c7 20 cb All code ======== 0: 89 da mov %ebx,%edx 2: 8b 4b 04 mov 0x4(%rbx),%ecx 5: 48 c1 ea 03 shr $0x3,%rdx 9: 0f b6 04 02 movzbl (%rdx,%rax,1),%eax d: 84 c0 test %al,%al f: 74 08 je 0x19 11: 3c 03 cmp $0x3,%al 13: 0f 8e c4 00 00 00 jle 0xdd 19: 8b 13 mov (%rbx),%edx 1b: 44 89 e6 mov %r12d,%esi 1e: 48 c7 c7 20 d3 e2 83 mov $0xffffffff83e2d320,%rdi 25: e8 e2 d1 08 ff callq 0xffffffffff08d20c 2a:* 0f 0b ud2 <-- trapping instruction 2c: e9 f4 fe ff ff jmpq 0xffffffffffffff25 31: 0f 0b ud2 33: e9 78 fe ff ff jmpq 0xfffffffffffffeb0 38: 48 89 ee mov %rbp,%rsi 3b: 48 rex.W 3c: c7 .byte 0xc7 3d: c7 (bad) 3e: 20 cb and %cl,%bl Code starting with the faulting instruction =========================================== 0: 0f 0b ud2 2: e9 f4 fe ff ff jmpq 0xfffffffffffffefb 7: 0f 0b ud2 9: e9 78 fe ff ff jmpq 0xfffffffffffffe86 e: 48 89 ee mov %rbp,%rsi 11: 48 rex.W 12: c7 .byte 0xc7 13: c7 (bad) 14: 20 cb and %cl,%bl [ 222.733468][ T2216] RSP: 0018:ffffc9000092f858 EFLAGS: 00010282 [ 222.739367][ T2216] RAX: 0000000000000000 RBX: ffff88819e33e0b0 RCX: ffffffff821fb87e [ 222.747166][ T2216] RDX: 1ffff110fe1c57b8 RSI: 0000000000000008 RDI: ffffc9000092f610 [ 222.754975][ T2216] RBP: 0000000000000116 R08: 0000000000000001 R09: fffff52000125ec2 [ 222.762782][ T2216] R10: ffffc9000092f617 R11: 0000000000000001 R12: 0000000000000001 [ 222.770591][ T2216] R13: dffffc0000000000 R14: ffff88819e33e000 R15: ffff88819e33e0bc [ 222.778386][ T2216] FS: 00007fd9aea5e540(0000) GS:ffff8887f0e00000(0000) knlGS:0000000000000000 [ 222.787144][ T2216] CS: 0010 DS: 0000 ES: 0000 CR0: 0000000080050033 [ 222.793561][ T2216] CR2: 00007fbe00432bb0 CR3: 0000000126856005 CR4: 00000000003706f0 [ 222.801359][ T2216] DR0: 0000000000000000 DR1: 0000000000000000 DR2: 0000000000000000 [ 222.809156][ T2216] DR3: 0000000000000000 DR6: 00000000fffe0ff0 DR7: 0000000000000400 [ 222.816970][ T2216] Call Trace: [ 222.820093][ T2216] <TASK> [ 222.822872][ T2216] ? __warn (kernel/panic.c:673) [ 222.826794][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) [ 222.831403][ T2216] ? report_bug (lib/bug.c:180 lib/bug.c:219) [ 222.835752][ T2216] ? handle_bug (arch/x86/kernel/traps.c:324) [ 222.839912][ T2216] ? exc_invalid_op (arch/x86/kernel/traps.c:345 (discriminator 1)) [ 222.844433][ T2216] ? asm_exc_invalid_op (arch/x86/include/asm/idtentry.h:568) [ 222.849286][ T2216] ? llist_add_batch (lib/llist.c:33 (discriminator 14)) [ 222.853980][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) [ 222.858586][ T2216] ? dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) [ 222.863193][ T2216] blk_mq_exit_sched (block/blk-mq-sched.c:550) [ 222.867973][ T2216] ? blk_mq_sched_free_rqs (block/blk-mq-sched.c:527) [ 222.873257][ T2216] ? __mutex_lock_slowpath (kernel/locking/mutex.c:282) [ 222.878384][ T2216] ? ioc_clear_queue (include/linux/list.h:292 block/blk-ioc.c:174) [ 222.883076][ T2216] ? blk_mq_poll (block/blk-mq.c:4832) [ 222.887324][ T2216] elevator_exit (block/elevator.c:168) [ 222.891572][ T2216] del_gendisk (block/genhd.c:701) [ 222.895834][ T2216] ? diskstats_show (block/genhd.c:631) [ 222.900526][ T2216] ? __pm_runtime_resume (drivers/base/power/runtime.c:1174) [ 222.905568][ T2216] sd_remove (drivers/scsi/sd.c:3736) sd_mod [ 222.910436][ T2216] device_release_driver_internal (drivers/base/dd.c:1272 drivers/base/dd.c:1293) [ 222.916335][ T2216] bus_remove_device (include/linux/kobject.h:191 drivers/base/base.h:73 drivers/base/bus.c:581) [ 222.921225][ T2216] device_del (drivers/base/core.c:3815) [ 222.925388][ T2216] ? attribute_container_device_trigger (drivers/base/attribute_container.c:357) [ 222.931807][ T2216] ? __device_link_del (drivers/base/core.c:3769) [ 222.936760][ T2216] ? attribute_container_device_trigger_safe (drivers/base/attribute_container.c:357) [ 222.943612][ T2216] __scsi_remove_device (drivers/scsi/scsi_sysfs.c:1489) [ 222.948651][ T2216] scsi_forget_host (drivers/scsi/scsi_scan.c:1988) [ 222.953243][ T2216] scsi_remove_host (drivers/scsi/hosts.c:182) [ 222.957838][ T2216] sdebug_driver_remove (drivers/scsi/scsi_debug.c:7841) scsi_debug [ 222.963933][ T2216] ? kernfs_remove_by_name_ns (fs/kernfs/dir.c:1679) [ 222.969492][ T2216] device_release_driver_internal (drivers/base/dd.c:1272 drivers/base/dd.c:1293) [ 222.975392][ T2216] bus_remove_device (include/linux/kobject.h:191 drivers/base/base.h:73 drivers/base/bus.c:581) [ 222.980173][ T2216] device_del (drivers/base/core.c:3815) [ 222.984349][ T2216] ? __device_link_del (drivers/base/core.c:3769) [ 222.989299][ T2216] ? mutex_unlock (arch/x86/include/asm/atomic64_64.h:109 include/linux/atomic/atomic-arch-fallback.h:4303 include/linux/atomic/atomic-long.h:1499 include/linux/atomic/atomic-instrumented.h:4446 kernel/locking/mutex.c:181 kernel/locking/mutex.c:540) [ 222.993645][ T2216] ? __mutex_unlock_slowpath+0x2b0/0x2b0 [ 223.000160][ T2216] device_unregister (drivers/base/core.c:3732 drivers/base/core.c:3845) [ 223.004772][ T2216] sdebug_do_remove_host (drivers/scsi/scsi_debug.c:7299) scsi_debug [ 223.011039][ T2216] scsi_debug_exit (drivers/scsi/scsi_debug.c:7670) scsi_debug [ 223.016625][ T2216] __do_sys_delete_module+0x316/0x540 [ 223.022880][ T2216] ? module_flags (kernel/module/main.c:698) [ 223.027401][ T2216] ? task_work_cancel (kernel/task_work.c:147) [ 223.032081][ T2216] ? _raw_spin_lock (arch/x86/include/asm/atomic.h:115 include/linux/atomic/atomic-arch-fallback.h:2155 include/linux/atomic/atomic-instrumented.h:1296 include/asm-generic/qspinlock.h:111 include/linux/spinlock.h:187 include/linux/spinlock_api_smp.h:134 kernel/locking/spinlock.c:154) [ 223.036590][ T2216] ? exit_to_user_mode_loop (include/linux/sched.h:2337 include/linux/resume_user_mode.h:61 kernel/entry/common.c:171) [ 223.041890][ T2216] do_syscall_64 (arch/x86/entry/common.c:50 arch/x86/entry/common.c:80) [ 223.046154][ T2216] entry_SYSCALL_64_after_hwframe (arch/x86/entry/entry_64.S:120) [ 223.051867][ T2216] RIP: 0033:0x7fd9aeb83417 [ 223.056129][ T2216] Code: 73 01 c3 48 8b 0d 79 1a 0d 00 f7 d8 64 89 01 48 83 c8 ff c3 66 2e 0f 1f 84 00 00 00 00 00 0f 1f 44 00 00 b8 b0 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d 49 1a 0d 00 f7 d8 64 89 01 48 All code ======== 0: 73 01 jae 0x3 2: c3 retq 3: 48 8b 0d 79 1a 0d 00 mov 0xd1a79(%rip),%rcx # 0xd1a83 a: f7 d8 neg %eax c: 64 89 01 mov %eax,%fs:(%rcx) f: 48 83 c8 ff or $0xffffffffffffffff,%rax 13: c3 retq 14: 66 2e 0f 1f 84 00 00 nopw %cs:0x0(%rax,%rax,1) 1b: 00 00 00 1e: 0f 1f 44 00 00 nopl 0x0(%rax,%rax,1) 23: b8 b0 00 00 00 mov $0xb0,%eax 28: 0f 05 syscall 2a:* 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax <-- trapping instruction 30: 73 01 jae 0x33 32: c3 retq 33: 48 8b 0d 49 1a 0d 00 mov 0xd1a49(%rip),%rcx # 0xd1a83 3a: f7 d8 neg %eax 3c: 64 89 01 mov %eax,%fs:(%rcx) 3f: 48 rex.W Code starting with the faulting instruction =========================================== 0: 48 3d 01 f0 ff ff cmp $0xfffffffffffff001,%rax 6: 73 01 jae 0x9 8: c3 retq 9: 48 8b 0d 49 1a 0d 00 mov 0xd1a49(%rip),%rcx # 0xd1a59 10: f7 d8 neg %eax 12: 64 89 01 mov %eax,%fs:(%rcx) 15: 48 rex.W The kernel config and materials to reproduce are available at: https://download.01.org/0day-ci/archive/20230817/202308172100.8ce4b853-oliver.sang@intel.com
On 8/17/23 07:41, kernel test robot wrote: > [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278 > [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) The above information shows that dd_inserted_request() has been called 276 times and also that dd_finish_request() has been called 278 times. Calling dd_finish_request() more than once per request breaks the code for priority handling since that code checks how many requests are pending per priority level by subtracting the number of completion calls from the number of insertion calls (see also dd_queued()). I think the above output indicates that this patch introduced a regression. Bart.
On 2023/8/17 22:50, Bart Van Assche wrote: > On 8/17/23 07:41, kernel test robot wrote: >> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278 >> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) > > The above information shows that dd_inserted_request() has been called > 276 times and also that dd_finish_request() has been called 278 times. Thanks much for your help. This patch indeed introduced a regression, postflush requests will be completed twice, so here dd_finish_request() is more than dd_inserted_request(). diff --git a/block/blk-mq.c b/block/blk-mq.c index a8c63bef8ff1..7cd47ffc04ce 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq) { struct request_queue *q = rq->q; - if (rq->rq_flags & RQF_USE_SCHED) + if (rq->rq_flags & RQF_USE_SCHED) { q->elevator->type->ops.finish_request(rq); + rq->rq_flags &= ~RQF_USE_SCHED; + } } Clear RQF_USE_SCHED flag here should fix this problem, which should be ok since finish_request() is the last callback, this flag isn't needed anymore. Jens, should I send this diff as another patch or resend updated v3? Thanks. > Calling dd_finish_request() more than once per request breaks the code > for priority handling since that code checks how many requests are > pending per priority level by subtracting the number of completion calls > from the number of insertion calls (see also dd_queued()). I think the > above output indicates that this patch introduced a regression. > > Bart. >
On 2023/8/17 23:29, Chengming Zhou wrote: > On 2023/8/17 22:50, Bart Van Assche wrote: >> On 8/17/23 07:41, kernel test robot wrote: >>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278 >>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) >> >> The above information shows that dd_inserted_request() has been called >> 276 times and also that dd_finish_request() has been called 278 times. > > Thanks much for your help. > > This patch indeed introduced a regression, postflush requests will be completed > twice, so here dd_finish_request() is more than dd_inserted_request(). > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index a8c63bef8ff1..7cd47ffc04ce 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq) > { > struct request_queue *q = rq->q; > > - if (rq->rq_flags & RQF_USE_SCHED) > + if (rq->rq_flags & RQF_USE_SCHED) { > q->elevator->type->ops.finish_request(rq); > + rq->rq_flags &= ~RQF_USE_SCHED; > + } > } > I just tried to run LKP and xfstests, firstly failed to run LKP on my server which seems to miss some dependencies. Then I ran xfstests successfully. But xfstests generic/704 always pass and no WARN in dmesg. (I don't know why, maybe my server settings are some different from the test robot.) So I try to reproduce it manually. Steps: ``` echo mq-deadline > /sys/block/sdb/queue/scheduler mkfs.ext4 /dev/sdb mount /dev/sdb /fs/sdb cd /fs/sdb stress-ng --symlink 4 --timeout 60 echo none > /sys/block/sdb/queue/scheduler ``` This way the WARNING in mq-deadline can be reproduced easily. Then retest with the diff, mq-deadline WARNING still happened... So there are still other requests which have RQF_USE_SCHED flag completed without being inserted into elevator. Will use some tracing and look again. Thanks. > > Clear RQF_USE_SCHED flag here should fix this problem, which should be ok > since finish_request() is the last callback, this flag isn't needed anymore. > > Jens, should I send this diff as another patch or resend updated v3? > > Thanks. > >> Calling dd_finish_request() more than once per request breaks the code >> for priority handling since that code checks how many requests are >> pending per priority level by subtracting the number of completion calls >> from the number of insertion calls (see also dd_queued()). I think the >> above output indicates that this patch introduced a regression. >> >> Bart. >>
On 8/17/23 9:29 AM, Chengming Zhou wrote: > On 2023/8/17 22:50, Bart Van Assche wrote: >> On 8/17/23 07:41, kernel test robot wrote: >>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278 >>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) >> >> The above information shows that dd_inserted_request() has been called >> 276 times and also that dd_finish_request() has been called 278 times. > > Thanks much for your help. > > This patch indeed introduced a regression, postflush requests will be completed > twice, so here dd_finish_request() is more than dd_inserted_request(). > > diff --git a/block/blk-mq.c b/block/blk-mq.c > index a8c63bef8ff1..7cd47ffc04ce 100644 > --- a/block/blk-mq.c > +++ b/block/blk-mq.c > @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq) > { > struct request_queue *q = rq->q; > > - if (rq->rq_flags & RQF_USE_SCHED) > + if (rq->rq_flags & RQF_USE_SCHED) { > q->elevator->type->ops.finish_request(rq); > + rq->rq_flags &= ~RQF_USE_SCHED; > + } > } > > > Clear RQF_USE_SCHED flag here should fix this problem, which should be ok > since finish_request() is the last callback, this flag isn't needed anymore. > > Jens, should I send this diff as another patch or resend updated v3? I don't think this is the right solution, it makes all kinds of assumptions on what that flag is and when it's safe to clear it. It's a very fragile fix, I think we need to do better than that.
On 8/17/23 11:20 AM, Jens Axboe wrote: > On 8/17/23 9:29 AM, Chengming Zhou wrote: >> On 2023/8/17 22:50, Bart Van Assche wrote: >>> On 8/17/23 07:41, kernel test robot wrote: >>>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278 >>>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) >>> >>> The above information shows that dd_inserted_request() has been called >>> 276 times and also that dd_finish_request() has been called 278 times. >> >> Thanks much for your help. >> >> This patch indeed introduced a regression, postflush requests will be completed >> twice, so here dd_finish_request() is more than dd_inserted_request(). >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index a8c63bef8ff1..7cd47ffc04ce 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq) >> { >> struct request_queue *q = rq->q; >> >> - if (rq->rq_flags & RQF_USE_SCHED) >> + if (rq->rq_flags & RQF_USE_SCHED) { >> q->elevator->type->ops.finish_request(rq); >> + rq->rq_flags &= ~RQF_USE_SCHED; >> + } >> } >> >> >> Clear RQF_USE_SCHED flag here should fix this problem, which should be ok >> since finish_request() is the last callback, this flag isn't needed anymore. >> >> Jens, should I send this diff as another patch or resend updated v3? > > I don't think this is the right solution, it makes all kinds of > assumptions on what that flag is and when it's safe to clear it. It's a > very fragile fix, I think we need to do better than that. Well maybe this is actually fine, since we're freeing the request now anyway. I can fold it in the fix, I'll add a comment as well. If this is subtle enough that it caused this issue, we definitely should have a comment on why we're clearing this flag.
On 8/17/23 11:17 AM, Chengming Zhou wrote: > On 2023/8/17 23:29, Chengming Zhou wrote: >> On 2023/8/17 22:50, Bart Van Assche wrote: >>> On 8/17/23 07:41, kernel test robot wrote: >>>> [ 222.622837][ T2216] statistics for priority 1: i 276 m 0 d 276 c 278 >>>> [ 222.629307][ T2216] WARNING: CPU: 0 PID: 2216 at block/mq-deadline.c:680 dd_exit_sched (block/mq-deadline.c:680 (discriminator 3)) >>> >>> The above information shows that dd_inserted_request() has been called >>> 276 times and also that dd_finish_request() has been called 278 times. >> >> Thanks much for your help. >> >> This patch indeed introduced a regression, postflush requests will be completed >> twice, so here dd_finish_request() is more than dd_inserted_request(). >> >> diff --git a/block/blk-mq.c b/block/blk-mq.c >> index a8c63bef8ff1..7cd47ffc04ce 100644 >> --- a/block/blk-mq.c >> +++ b/block/blk-mq.c >> @@ -686,8 +686,10 @@ static void blk_mq_finish_request(struct request *rq) >> { >> struct request_queue *q = rq->q; >> >> - if (rq->rq_flags & RQF_USE_SCHED) >> + if (rq->rq_flags & RQF_USE_SCHED) { >> q->elevator->type->ops.finish_request(rq); >> + rq->rq_flags &= ~RQF_USE_SCHED; >> + } >> } >> > > I just tried to run LKP and xfstests, firstly failed to run LKP on my server > which seems to miss some dependencies. Then I ran xfstests successfully. > > But xfstests generic/704 always pass and no WARN in dmesg. (I don't know why, > maybe my server settings are some different from the test robot.) > > So I try to reproduce it manually. Steps: > > ``` > echo mq-deadline > /sys/block/sdb/queue/scheduler > > mkfs.ext4 /dev/sdb > mount /dev/sdb /fs/sdb > cd /fs/sdb > stress-ng --symlink 4 --timeout 60 > > echo none > /sys/block/sdb/queue/scheduler > ``` > > This way the WARNING in mq-deadline can be reproduced easily. > > Then retest with the diff, mq-deadline WARNING still happened... So there > are still other requests which have RQF_USE_SCHED flag completed without > being inserted into elevator. > > Will use some tracing and look again. Ah missed this, thanks for doing this testing. I'll wait for an update version. We can just fold in whatever change we need, and most likely just push the patch to next week rather than send off a pull request for this week. It's the only important thing pending on the block side.
diff --git a/block/blk-mq.c b/block/blk-mq.c index f14b8669ac69..a8c63bef8ff1 100644 --- a/block/blk-mq.c +++ b/block/blk-mq.c @@ -682,6 +682,14 @@ struct request *blk_mq_alloc_request_hctx(struct request_queue *q, } EXPORT_SYMBOL_GPL(blk_mq_alloc_request_hctx); +static void blk_mq_finish_request(struct request *rq) +{ + struct request_queue *q = rq->q; + + if (rq->rq_flags & RQF_USE_SCHED) + q->elevator->type->ops.finish_request(rq); +} + static void __blk_mq_free_request(struct request *rq) { struct request_queue *q = rq->q; @@ -708,10 +716,6 @@ void blk_mq_free_request(struct request *rq) { struct request_queue *q = rq->q; - if ((rq->rq_flags & RQF_USE_SCHED) && - q->elevator->type->ops.finish_request) - q->elevator->type->ops.finish_request(rq); - if (unlikely(laptop_mode && !blk_rq_is_passthrough(rq))) laptop_io_completion(q->disk->bdi); @@ -1021,6 +1025,8 @@ inline void __blk_mq_end_request(struct request *rq, blk_status_t error) if (blk_mq_need_time_stamp(rq)) __blk_mq_end_request_acct(rq, ktime_get_ns()); + blk_mq_finish_request(rq); + if (rq->end_io) { rq_qos_done(rq->q, rq); if (rq->end_io(rq, error) == RQ_END_IO_FREE) @@ -1075,6 +1081,8 @@ void blk_mq_end_request_batch(struct io_comp_batch *iob) if (iob->need_ts) __blk_mq_end_request_acct(rq, now); + blk_mq_finish_request(rq); + rq_qos_done(rq->q, rq); /* diff --git a/block/elevator.c b/block/elevator.c index 8400e303fbcb..ac2cb3814eac 100644 --- a/block/elevator.c +++ b/block/elevator.c @@ -499,6 +499,9 @@ void elv_unregister_queue(struct request_queue *q) int elv_register(struct elevator_type *e) { + if (WARN_ON_ONCE(!e->ops.finish_request)) + return -EINVAL; + /* insert_requests and dispatch_request are mandatory */ if (WARN_ON_ONCE(!e->ops.insert_requests || !e->ops.dispatch_request)) return -EINVAL;