Message ID: 20241016211154.2425403-6-bvanassche@acm.org (mailing list archive)
State: Superseded
Series: UFS driver fixes and cleanups
> Use blk_mq_quiesce_tagset() instead of ufshcd_scsi_block_requests() and
> blk_mq_wait_quiesce_done().

Maybe also add a sentence indicating that this was the last caller of
scsi_block_requests hence it can be removed.

> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Reviewed-by: Avri Altman <avri.altman@wdc.com>

> ---
>  drivers/ufs/core/ufshcd.c | 19 +++----------------
>  include/ufs/ufshcd.h      |  2 --
>  2 files changed, 3 insertions(+), 18 deletions(-)
>
> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
> index ff1b0af74041..75e00e5b3f79 100644
> --- a/drivers/ufs/core/ufshcd.c
> +++ b/drivers/ufs/core/ufshcd.c
> @@ -349,18 +349,6 @@ static void ufshcd_configure_wb(struct ufs_hba *hba)
>  		ufshcd_wb_toggle_buf_flush(hba, true);
>  }
>
> -static void ufshcd_scsi_unblock_requests(struct ufs_hba *hba)
> -{
> -	if (atomic_dec_and_test(&hba->scsi_block_reqs_cnt))
> -		scsi_unblock_requests(hba->host);
> -}
> -
> -static void ufshcd_scsi_block_requests(struct ufs_hba *hba)
> -{
> -	if (atomic_inc_return(&hba->scsi_block_reqs_cnt) == 1)
> -		scsi_block_requests(hba->host);
> -}
> -
>  static void ufshcd_add_cmd_upiu_trace(struct ufs_hba *hba, unsigned int tag,
>  				      enum ufs_trace_str_t str_t)
>  {
> @@ -6379,15 +6367,14 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
>  		ufshcd_suspend_clkscaling(hba);
>  		ufshcd_clk_scaling_allow(hba, false);
>  	}
> -	ufshcd_scsi_block_requests(hba);
>  	/* Wait for ongoing ufshcd_queuecommand() calls to finish. */
> -	blk_mq_wait_quiesce_done(&hba->host->tag_set);
> +	blk_mq_quiesce_tagset(&hba->host->tag_set);
>  	cancel_work_sync(&hba->eeh_work);
>  }
>
>  static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
>  {
> -	ufshcd_scsi_unblock_requests(hba);
> +	blk_mq_unquiesce_tagset(&hba->host->tag_set);
>  	ufshcd_release(hba);
>  	if (ufshcd_is_clkscaling_supported(hba))
>  		ufshcd_clk_scaling_suspend(hba, false);
> @@ -10560,7 +10547,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
>
>  	/* Hold auto suspend until async scan completes */
>  	pm_runtime_get_sync(dev);
> -	atomic_set(&hba->scsi_block_reqs_cnt, 0);
> +
>  	/*
>  	 * We are assuming that device wasn't put in sleep/power-down
>  	 * state exclusively during the boot stage before kernel.
> diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
> index a0b325a32aca..36bd91ff3593 100644
> --- a/include/ufs/ufshcd.h
> +++ b/include/ufs/ufshcd.h
> @@ -928,7 +928,6 @@ enum ufshcd_mcq_opr {
>   * @wb_mutex: used to serialize devfreq and sysfs write booster toggling
>   * @clk_scaling_lock: used to serialize device commands and clock scaling
>   * @desc_size: descriptor sizes reported by device
>  - * @scsi_block_reqs_cnt: reference counting for scsi block requests
>   * @bsg_dev: struct device associated with the BSG queue
>   * @bsg_queue: BSG queue associated with the UFS controller
>   * @rpm_dev_flush_recheck_work: used to suspend from RPM (runtime power
> @@ -1089,7 +1088,6 @@ struct ufs_hba {
>
>  	struct mutex wb_mutex;
>  	struct rw_semaphore clk_scaling_lock;
> -	atomic_t scsi_block_reqs_cnt;
>
>  	struct device bsg_dev;
>  	struct request_queue *bsg_queue;
On 10/17/24 10:53 PM, Avri Altman wrote:
> Maybe also add a sentence indicating that this was the last caller of
> scsi_block_requests hence it can be removed.

I will do that.

Bart.
On Wed, 2024-10-16 at 14:11 -0700, Bart Van Assche wrote:
> Use blk_mq_quiesce_tagset() instead of ufshcd_scsi_block_requests()
> and blk_mq_wait_quiesce_done().
>
> Signed-off-by: Bart Van Assche <bvanassche@acm.org>

Hi Bart,

Using blk_mq_quiesce_tagset instead of ufshcd_scsi_block_requests could
cause issues. After the patch below was merged, Mediatek received three
reports of IO hangs:

77691af484e2 ("scsi: ufs: core: Quiesce request queues before checking
pending cmds")

I think that patch might need to be reverted first.

Here is the backtrace of the IO hang:

ppid=3952 pid=3952 D cpu=6 prio=120 wait=188s kworker/u16:0
vmlinux __synchronize_srcu() + 216 </proc/self/cwd/common/kernel/rcu/srcutree.c:1386>
vmlinux synchronize_srcu() + 276 </proc/self/cwd/common/kernel/rcu/srcutree.c:0>
vmlinux blk_mq_wait_quiesce_done() + 20 </proc/self/cwd/common/block/blk-mq.c:226>
vmlinux blk_mq_quiesce_tagset() + 156 </proc/self/cwd/common/block/blk-mq.c:286>
vmlinux ufshcd_clock_scaling_prepare(timeout_us=1000000) + 16 </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1276>
vmlinux ufshcd_devfreq_scale() + 52 </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1322>
vmlinux ufshcd_devfreq_target() + 384 </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1440>
vmlinux devfreq_set_target(flags=0) + 184 </proc/self/cwd/common/drivers/devfreq/devfreq.c:363>
vmlinux devfreq_update_target(freq=0) + 296 </proc/self/cwd/common/drivers/devfreq/devfreq.c:429>
vmlinux update_devfreq() + 8 </proc/self/cwd/common/drivers/devfreq/devfreq.c:444>
vmlinux devfreq_monitor() + 48 </proc/self/cwd/common/drivers/devfreq/devfreq.c:460>
vmlinux process_one_work() + 476 </proc/self/cwd/common/kernel/workqueue.c:2643>
vmlinux process_scheduled_works() + 580 </proc/self/cwd/common/kernel/workqueue.c:2717>
vmlinux worker_thread() + 576 </proc/self/cwd/common/kernel/workqueue.c:2798>
vmlinux kthread() + 272 </proc/self/cwd/common/kernel/kthread.c:388>
vmlinux 0xFFFFFFE239A164EC() </proc/self/cwd/common/arch/arm64/kernel/entry.S:846>

Thanks
Peter
On 10/21/24 2:43 AM, Peter Wang (王信友) wrote:
> Using blk_mq_quiesce_tagset instead of ufshcd_scsi_block_requests
> could cause issues. After the patch below was merged, Mediatek
> received three reports of IO hangs:
> 77691af484e2 ("scsi: ufs: core: Quiesce request queues before checking
> pending cmds")
> I think that patch might need to be reverted first.
>
> Here is the backtrace of the IO hang:
> ppid=3952 pid=3952 D cpu=6 prio=120 wait=188s kworker/u16:0
> vmlinux __synchronize_srcu() + 216 </proc/self/cwd/common/kernel/rcu/srcutree.c:1386>
> vmlinux synchronize_srcu() + 276 </proc/self/cwd/common/kernel/rcu/srcutree.c:0>
> vmlinux blk_mq_wait_quiesce_done() + 20 </proc/self/cwd/common/block/blk-mq.c:226>
> vmlinux blk_mq_quiesce_tagset() + 156 </proc/self/cwd/common/block/blk-mq.c:286>
> vmlinux ufshcd_clock_scaling_prepare(timeout_us=1000000) + 16 </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1276>
> vmlinux ufshcd_devfreq_scale() + 52 </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1322>
> vmlinux ufshcd_devfreq_target() + 384 </proc/self/cwd/common/drivers/ufs/core/ufshcd.c:1440>
> vmlinux devfreq_set_target(flags=0) + 184 </proc/self/cwd/common/drivers/devfreq/devfreq.c:363>
> vmlinux devfreq_update_target(freq=0) + 296 </proc/self/cwd/common/drivers/devfreq/devfreq.c:429>
> vmlinux update_devfreq() + 8 </proc/self/cwd/common/drivers/devfreq/devfreq.c:444>
> vmlinux devfreq_monitor() + 48 </proc/self/cwd/common/drivers/devfreq/devfreq.c:460>
> vmlinux process_one_work() + 476 </proc/self/cwd/common/kernel/workqueue.c:2643>
> vmlinux process_scheduled_works() + 580 </proc/self/cwd/common/kernel/workqueue.c:2717>
> vmlinux worker_thread() + 576 </proc/self/cwd/common/kernel/workqueue.c:2798>
> vmlinux kthread() + 272 </proc/self/cwd/common/kernel/kthread.c:388>
> vmlinux 0xFFFFFFE239A164EC() </proc/self/cwd/common/arch/arm64/kernel/entry.S:846>

Hi Peter,

Thank you very much for having reported this hang early. Would it be
possible for you to test the patch below on top of this patch series?
I think the root cause of the hang that you reported is in the block
layer.

Thanks,

Bart.


diff --git a/block/blk-mq.c b/block/blk-mq.c
index 7b02188feed5..7482e682deca 100644
--- a/block/blk-mq.c
+++ b/block/blk-mq.c
@@ -283,8 +283,9 @@ void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
 		if (!blk_queue_skip_tagset_quiesce(q))
 			blk_mq_quiesce_queue_nowait(q);
 	}
-	blk_mq_wait_quiesce_done(set);
 	mutex_unlock(&set->tag_list_lock);
+
+	blk_mq_wait_quiesce_done(set);
 }
 EXPORT_SYMBOL_GPL(blk_mq_quiesce_tagset);
On Mon, 2024-10-21 at 13:41 -0700, Bart Van Assche wrote:
> On 10/21/24 2:43 AM, Peter Wang (王信友) wrote:
> > [...]
>
> Hi Peter,
>
> Thank you very much for having reported this hang early. Would it be
> possible for you to test the patch below on top of this patch series?
> I think the root cause of the hang that you reported is in the block
> layer.
>
> Thanks,
>
> Bart.
>
>
> diff --git a/block/blk-mq.c b/block/blk-mq.c
> index 7b02188feed5..7482e682deca 100644
> --- a/block/blk-mq.c
> +++ b/block/blk-mq.c
> @@ -283,8 +283,9 @@ void blk_mq_quiesce_tagset(struct blk_mq_tag_set *set)
>  		if (!blk_queue_skip_tagset_quiesce(q))
>  			blk_mq_quiesce_queue_nowait(q);
>  	}
> -	blk_mq_wait_quiesce_done(set);
>  	mutex_unlock(&set->tag_list_lock);
> +
> +	blk_mq_wait_quiesce_done(set);
>  }
>  EXPORT_SYMBOL_GPL(blk_mq_quiesce_tagset);

Hi Bart,

We can test this patch, but because of the low reproduction rate of the
issue, I'm concerned that not encountering it during testing doesn't
necessarily mean the problem has been truly resolved. However, from my
understanding, this patch should be able to resolve the deadlock.

Thanks
Peter
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c
index ff1b0af74041..75e00e5b3f79 100644
--- a/drivers/ufs/core/ufshcd.c
+++ b/drivers/ufs/core/ufshcd.c
@@ -349,18 +349,6 @@ static void ufshcd_configure_wb(struct ufs_hba *hba)
 		ufshcd_wb_toggle_buf_flush(hba, true);
 }
 
-static void ufshcd_scsi_unblock_requests(struct ufs_hba *hba)
-{
-	if (atomic_dec_and_test(&hba->scsi_block_reqs_cnt))
-		scsi_unblock_requests(hba->host);
-}
-
-static void ufshcd_scsi_block_requests(struct ufs_hba *hba)
-{
-	if (atomic_inc_return(&hba->scsi_block_reqs_cnt) == 1)
-		scsi_block_requests(hba->host);
-}
-
 static void ufshcd_add_cmd_upiu_trace(struct ufs_hba *hba, unsigned int tag,
 				      enum ufs_trace_str_t str_t)
 {
@@ -6379,15 +6367,14 @@ static void ufshcd_err_handling_prepare(struct ufs_hba *hba)
 		ufshcd_suspend_clkscaling(hba);
 		ufshcd_clk_scaling_allow(hba, false);
 	}
-	ufshcd_scsi_block_requests(hba);
 	/* Wait for ongoing ufshcd_queuecommand() calls to finish. */
-	blk_mq_wait_quiesce_done(&hba->host->tag_set);
+	blk_mq_quiesce_tagset(&hba->host->tag_set);
 	cancel_work_sync(&hba->eeh_work);
 }
 
 static void ufshcd_err_handling_unprepare(struct ufs_hba *hba)
 {
-	ufshcd_scsi_unblock_requests(hba);
+	blk_mq_unquiesce_tagset(&hba->host->tag_set);
 	ufshcd_release(hba);
 	if (ufshcd_is_clkscaling_supported(hba))
 		ufshcd_clk_scaling_suspend(hba, false);
@@ -10560,7 +10547,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq)
 
 	/* Hold auto suspend until async scan completes */
 	pm_runtime_get_sync(dev);
-	atomic_set(&hba->scsi_block_reqs_cnt, 0);
+
 	/*
 	 * We are assuming that device wasn't put in sleep/power-down
 	 * state exclusively during the boot stage before kernel.
diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h
index a0b325a32aca..36bd91ff3593 100644
--- a/include/ufs/ufshcd.h
+++ b/include/ufs/ufshcd.h
@@ -928,7 +928,6 @@ enum ufshcd_mcq_opr {
  * @wb_mutex: used to serialize devfreq and sysfs write booster toggling
  * @clk_scaling_lock: used to serialize device commands and clock scaling
  * @desc_size: descriptor sizes reported by device
- * @scsi_block_reqs_cnt: reference counting for scsi block requests
  * @bsg_dev: struct device associated with the BSG queue
  * @bsg_queue: BSG queue associated with the UFS controller
  * @rpm_dev_flush_recheck_work: used to suspend from RPM (runtime power
@@ -1089,7 +1088,6 @@ struct ufs_hba {
 
 	struct mutex wb_mutex;
 	struct rw_semaphore clk_scaling_lock;
-	atomic_t scsi_block_reqs_cnt;
 
 	struct device bsg_dev;
 	struct request_queue *bsg_queue;
Use blk_mq_quiesce_tagset() instead of ufshcd_scsi_block_requests() and
blk_mq_wait_quiesce_done().

Signed-off-by: Bart Van Assche <bvanassche@acm.org>
---
 drivers/ufs/core/ufshcd.c | 19 +++----------------
 include/ufs/ufshcd.h      |  2 --
 2 files changed, 3 insertions(+), 18 deletions(-)