Message ID | 1717754818-39863-1-git-send-email-quic_ziqichen@quicinc.com (mailing list archive) |
---|---|
State | Accepted |
Headers | show |
Series | scsi: ufs: core: quiesce request queues before check pending cmds | expand |
On 6/7/24 04:06, Ziqi Chen wrote: > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index 21429ee..1afa862 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -1392,7 +1392,7 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba, u64 timeout_us) > * make sure that there are no outstanding requests when > * clock scaling is in progress > */ > - ufshcd_scsi_block_requests(hba); > + blk_mq_quiesce_tagset(&hba->host->tag_set); > mutex_lock(&hba->wb_mutex); > down_write(&hba->clk_scaling_lock); > > @@ -1401,7 +1401,7 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba, u64 timeout_us) > ret = -EBUSY; > up_write(&hba->clk_scaling_lock); > mutex_unlock(&hba->wb_mutex); > - ufshcd_scsi_unblock_requests(hba); > + blk_mq_unquiesce_tagset(&hba->host->tag_set); > goto out; > } > > @@ -1422,7 +1422,7 @@ static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba, int err, bool sc > > mutex_unlock(&hba->wb_mutex); > > - ufshcd_scsi_unblock_requests(hba); > + blk_mq_unquiesce_tagset(&hba->host->tag_set); > ufshcd_release(hba); > } Why to replace only those ufshcd_scsi_block_requests() / ufshcd_scsi_unblock_requests() calls? I don't think that it is ever safe to use these functions instead of blk_mq_quiesce_tagset() / blk_mq_unquiesce_tagset(). Please replace all ufshcd_scsi_block_requests() / ufshcd_scsi_unblock_requests() calls and remove the ufshcd_scsi_*block_requests() functions. Thanks, Bart.
On 6/7/2024 8:33 PM, Bart Van Assche wrote: > On 6/7/24 04:06, Ziqi Chen wrote: >> diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c >> index 21429ee..1afa862 100644 >> --- a/drivers/ufs/core/ufshcd.c >> +++ b/drivers/ufs/core/ufshcd.c >> @@ -1392,7 +1392,7 @@ static int ufshcd_clock_scaling_prepare(struct >> ufs_hba *hba, u64 timeout_us) >> * make sure that there are no outstanding requests when >> * clock scaling is in progress >> */ >> - ufshcd_scsi_block_requests(hba); >> + blk_mq_quiesce_tagset(&hba->host->tag_set); >> mutex_lock(&hba->wb_mutex); >> down_write(&hba->clk_scaling_lock); >> @@ -1401,7 +1401,7 @@ static int ufshcd_clock_scaling_prepare(struct >> ufs_hba *hba, u64 timeout_us) >> ret = -EBUSY; >> up_write(&hba->clk_scaling_lock); >> mutex_unlock(&hba->wb_mutex); >> - ufshcd_scsi_unblock_requests(hba); >> + blk_mq_unquiesce_tagset(&hba->host->tag_set); >> goto out; >> } >> @@ -1422,7 +1422,7 @@ static void >> ufshcd_clock_scaling_unprepare(struct ufs_hba *hba, int err, bool sc >> mutex_unlock(&hba->wb_mutex); >> - ufshcd_scsi_unblock_requests(hba); >> + blk_mq_unquiesce_tagset(&hba->host->tag_set); >> ufshcd_release(hba); >> } > > Why to replace only those ufshcd_scsi_block_requests() / > ufshcd_scsi_unblock_requests() calls? I don't think that it is ever safe to > use these functions instead of blk_mq_quiesce_tagset() / > blk_mq_unquiesce_tagset(). Please replace all > ufshcd_scsi_block_requests() / > ufshcd_scsi_unblock_requests() calls and remove the > ufshcd_scsi_*block_requests() functions. Hi Bart, Thank you for the review. This issue was not easy to debug, it took us more than 3 months to narrow down to the root cause. Our mutual customers and internal test teams had to pull in quite amount of resources to help narrow down the issue and test the fix. It is a key change to unblock customers from commercializing Android15, so we are pushed to upstream this fix ASAP. As for removing the rest calls to ufshcd_scsi_block_requests() and ufshcd_scsi_unblock_requests(), I think you are right, but I am not quite sure because we haven't seen issue reported w.r.t those spots. If possible, we can co-work on this sometime later. Hope you can understand. Thanks, Ziqi > > Thanks, > > Bart. >
On 6/11/24 04:02, Ziqi Chen wrote: > As for removing the rest calls to ufshcd_scsi_block_requests() and > ufshcd_scsi_unblock_requests(), I think you are right, but I am not > quite sure because we haven't seen issue reported w.r.t those spots. If > possible, we can co-work on this sometime later. If you want I can work on the removal of the ufshcd_scsi_block_requests() and ufshcd_scsi_unblock_requests() functions. Thanks, Bart.
On 6/7/24 03:06, Ziqi Chen wrote: > Fix this race condition by quiescing the request queues before calling > ufshcd_pending_cmds() so that block layer won't touch the budget map > when ufshcd_pending_cmds() is working on it. In addition, remove the > scsi layer blocking/unblocking to reduce redundancies and latencies. Reviewed-by: Bart Van Assche <bvanassche@acm.org>
On 6/11/2024 9:51 PM, Bart Van Assche wrote: > On 6/11/24 04:02, Ziqi Chen wrote: >> As for removing the rest calls to ufshcd_scsi_block_requests() and >> ufshcd_scsi_unblock_requests(), I think you are right, but I am not >> quite sure because we haven't seen issue reported w.r.t those spots. >> If possible, we can co-work on this sometime later. > > If you want I can work on the removal of the > ufshcd_scsi_block_requests() and ufshcd_scsi_unblock_requests() > functions. Thank you, Bart. I will help to review and give you support as possible as I can. BRs, Ziqi > > Thanks, > > Bart. >
On 6/7/24 3:06 AM, Ziqi Chen wrote: > Fix this race condition by quiescing the request queues before calling > ufshcd_pending_cmds() so that block layer won't touch the budget map > when ufshcd_pending_cmds() is working on it. In addition, remove the > scsi layer blocking/unblocking to reduce redundancies and latencies. Can you please help with testing whether the patch below would be a good alternative to your patch (compile-tested only)? Thanks, Bart. diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index aa00978c6c0e..1d981283b03c 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -332,14 +332,12 @@ static void ufshcd_configure_wb(struct ufs_hba *hba) static void ufshcd_scsi_unblock_requests(struct ufs_hba *hba) { - if (atomic_dec_and_test(&hba->scsi_block_reqs_cnt)) - scsi_unblock_requests(hba->host); + blk_mq_quiesce_tagset(&hba->host->tag_set); } static void ufshcd_scsi_block_requests(struct ufs_hba *hba) { - if (atomic_inc_return(&hba->scsi_block_reqs_cnt) == 1) - scsi_block_requests(hba->host); + blk_mq_unquiesce_tagset(&hba->host->tag_set); } static void ufshcd_add_cmd_upiu_trace(struct ufs_hba *hba, unsigned int tag, @@ -10590,7 +10588,7 @@ int ufshcd_init(struct ufs_hba *hba, void __iomem *mmio_base, unsigned int irq) /* Hold auto suspend until async scan completes */ pm_runtime_get_sync(dev); - atomic_set(&hba->scsi_block_reqs_cnt, 0); + /* * We are assuming that device wasn't put in sleep/power-down * state exclusively during the boot stage before kernel. diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h index 443afb97a637..58705994fc46 100644 --- a/include/ufs/ufshcd.h +++ b/include/ufs/ufshcd.h @@ -889,7 +889,6 @@ enum ufshcd_mcq_opr { * @wb_mutex: used to serialize devfreq and sysfs write booster toggling * @clk_scaling_lock: used to serialize device commands and clock scaling * @desc_size: descriptor sizes reported by device - * @scsi_block_reqs_cnt: reference counting for scsi block requests * @bsg_dev: struct device associated with the BSG queue * @bsg_queue: BSG queue associated with the UFS controller * @rpm_dev_flush_recheck_work: used to suspend from RPM (runtime power @@ -1050,7 +1049,6 @@ struct ufs_hba { struct mutex wb_mutex; struct rw_semaphore clk_scaling_lock; - atomic_t scsi_block_reqs_cnt; struct device bsg_dev; struct request_queue *bsg_queue;
On 6/21/2024 4:57 AM, Bart Van Assche wrote: > On 6/7/24 3:06 AM, Ziqi Chen wrote: >> Fix this race condition by quiescing the request queues before calling >> ufshcd_pending_cmds() so that block layer won't touch the budget map >> when ufshcd_pending_cmds() is working on it. In addition, remove the >> scsi layer blocking/unblocking to reduce redundancies and latencies. > > Can you please help with testing whether the patch below would be a good > alternative to your patch (compile-tested only)? > > Thanks, > > Bart. Hi Bart, Compile-tested is OK, but I don't think it is a better alternative way. 1. Why do we need to call blk_mq_quiesce_tagset() into ufshcd_scsi_block_requests() instead directly replace all ufshcd_scsi_block_requests() with blk_mq_quiesce_tagset()? 2. This patch need to to do long-term stress test, I think many OEMs can't wait as it is a blocker issue for them. BRs Ziqi > > diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c > index aa00978c6c0e..1d981283b03c 100644 > --- a/drivers/ufs/core/ufshcd.c > +++ b/drivers/ufs/core/ufshcd.c > @@ -332,14 +332,12 @@ static void ufshcd_configure_wb(struct ufs_hba *hba) > > static void ufshcd_scsi_unblock_requests(struct ufs_hba *hba) > { > - if (atomic_dec_and_test(&hba->scsi_block_reqs_cnt)) > - scsi_unblock_requests(hba->host); > + blk_mq_quiesce_tagset(&hba->host->tag_set); > } > > static void ufshcd_scsi_block_requests(struct ufs_hba *hba) > { > - if (atomic_inc_return(&hba->scsi_block_reqs_cnt) == 1) > - scsi_block_requests(hba->host); > + blk_mq_unquiesce_tagset(&hba->host->tag_set); > } > > static void ufshcd_add_cmd_upiu_trace(struct ufs_hba *hba, unsigned > int tag, > @@ -10590,7 +10588,7 @@ int ufshcd_init(struct ufs_hba *hba, void > __iomem *mmio_base, unsigned int irq) > > /* Hold auto suspend until async scan completes */ > pm_runtime_get_sync(dev); > - atomic_set(&hba->scsi_block_reqs_cnt, 0); > + > /* > * We are assuming that device wasn't put in sleep/power-down > * state exclusively during the boot stage before kernel. > diff --git a/include/ufs/ufshcd.h b/include/ufs/ufshcd.h > index 443afb97a637..58705994fc46 100644 > --- a/include/ufs/ufshcd.h > +++ b/include/ufs/ufshcd.h > @@ -889,7 +889,6 @@ enum ufshcd_mcq_opr { > * @wb_mutex: used to serialize devfreq and sysfs write booster toggling > * @clk_scaling_lock: used to serialize device commands and clock scaling > * @desc_size: descriptor sizes reported by device > - * @scsi_block_reqs_cnt: reference counting for scsi block requests > * @bsg_dev: struct device associated with the BSG queue > * @bsg_queue: BSG queue associated with the UFS controller > * @rpm_dev_flush_recheck_work: used to suspend from RPM (runtime power > @@ -1050,7 +1049,6 @@ struct ufs_hba { > > struct mutex wb_mutex; > struct rw_semaphore clk_scaling_lock; > - atomic_t scsi_block_reqs_cnt; > > struct device bsg_dev; > struct request_queue *bsg_queue; >
On 6/24/24 2:56 AM, Ziqi Chen wrote: > 1. Why do we need to call blk_mq_quiesce_tagset() into > ufshcd_scsi_block_requests() instead directly replace all > ufshcd_scsi_block_requests() with blk_mq_quiesce_tagset()? Because ufshcd_scsi_block_requests() has more callers than the clock scaling code and because all callers of ufshcd_scsi_block_requests() should be fixed. > 2. This patch need to to do long-term stress test, I think many OEMs > can't wait as it is a blocker issue for them. Patch "scsi: ufs: core: Quiesce request queues before checking pending cmds" is already in Linus' master branch. I will rebase my patch on top of linux-next. Best regards, Bart.
On Mon, 2024-06-24 at 09:29 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 6/24/24 2:56 AM, Ziqi Chen wrote: > > 1. Why do we need to call blk_mq_quiesce_tagset() into > > ufshcd_scsi_block_requests() instead directly replace all > > ufshcd_scsi_block_requests() with blk_mq_quiesce_tagset()? > > Because ufshcd_scsi_block_requests() has more callers than the clock > scaling code and because all callers of ufshcd_scsi_block_requests() > should be fixed. > > > 2. This patch need to to do long-term stress test, I think many > OEMs > > can't wait as it is a blocker issue for them. > Patch "scsi: ufs: core: Quiesce request queues before checking > pending > cmds" is already in Linus' master branch. I will rebase my patch on > top > of linux-next. > > Best regards, > > Bart. Hi Bart, But ufshcd_scsi_block_requests usage is correct in SDR mode. So, I don't think it is ufshcd_scsi_block_requests bug. Actually. this bug is triggered by this patch. 8d077ede48c1 ("scsi: ufs: core: scsi: ufs: Optimize the command queueing code") Which after ufshcd_scsi_block_requests, ufs cannot make sure ongoing request is all completed by ufshcd_wait_for_doorbell_clr. That is means, it is ufshcd_wait_for_doorbell_clr bug. So, I think ufshcd_wait_for_doorbell_clr should be revise. Check tr_doorbell in SDR mode. (before 8d077ede48c1 do) Check each HWQ's are all empty in MCQ mode. (need think how to do) Make sure all requests is complete, and finish this function' job correctly. Or there still have a gap in ufshcd_wait_for_doorbell_clr. And someday somebody's patch may stepping into it in the future. Thanks. Peter
On 6/24/24 8:38 PM, Peter Wang (王信友) wrote: > But ufshcd_scsi_block_requests usage is correct in SDR mode. ufshcd_scsi_block_requests() uses scsi_block_requests(). It is almost never correct to use scsi_block_requests() in a blk-mq driver because scsi_block_requests() does not wait for ongoing request submission calls to complete. scsi_block_requests() is a legacy from the time when all request dispatching and queueing was protected by the SCSI host lock, a change that was made in 2010 or about 14 years ago. See also https://lore.kernel.org/linux-scsi/20101105002409.GA21714@havoc.gtf.org/ > So, I think ufshcd_wait_for_doorbell_clr should be revise. > Check tr_doorbell in SDR mode. (before 8d077ede48c1 do) > Check each HWQ's are all empty in MCQ mode. (need think how to do) > Make sure all requests is complete, and finish this function' job > correctly. > Or there still have a gap in ufshcd_wait_for_doorbell_clr. ufshcd_wait_for_doorbell_clr() should be removed and ufshcd_clock_scaling_prepare() should use blk_mq_freeze_*(). See also my patch "ufs: Simplify the clock scaling mechanism implementation" from 5 years ago (https://lore.kernel.org/linux-scsi/20191112173743.141503-5-bvanassche@acm.org/). Best regards, Bart.
On Tue, 2024-06-25 at 09:13 -0700, Bart Van Assche wrote: > > External email : Please do not click links or open attachments until > you have verified the sender or the content. > On 6/24/24 8:38 PM, Peter Wang (王信友) wrote: > > But ufshcd_scsi_block_requests usage is correct in SDR mode. > > ufshcd_scsi_block_requests() uses scsi_block_requests(). It is almost > never correct to use scsi_block_requests() in a blk-mq driver because > scsi_block_requests() does not wait for ongoing request submission > calls to complete. scsi_block_requests() is a legacy from the time > when > all request dispatching and queueing was protected by the SCSI host > lock, a change that was made in 2010 or about 14 years ago. See also > https://lore.kernel.org/linux-scsi/20101105002409.GA21714@havoc.gtf.org/ > > > So, I think ufshcd_wait_for_doorbell_clr should be revise. > > Check tr_doorbell in SDR mode. (before 8d077ede48c1 do) > > Check each HWQ's are all empty in MCQ mode. (need think how to do) > > Make sure all requests is complete, and finish this function' job > > correctly. > > Or there still have a gap in ufshcd_wait_for_doorbell_clr. > > ufshcd_wait_for_doorbell_clr() should be removed and > ufshcd_clock_scaling_prepare() should use blk_mq_freeze_*(). > See also my patch "ufs: Simplify the clock scaling mechanism > implementation" from 5 years ago > ( > https://lore.kernel.org/linux-scsi/20191112173743.141503-5-bvanassche@acm.org/ > ). > > Best regards, > > Bart. Hi Bart, Yes, remove ufshcd_wait_for_doorbell_clr() is more reasonable if this function cannot make sure all on-going request is done. Thanks. Peter
On 6/26/2024 12:13 AM, Bart Van Assche wrote: > On 6/24/24 8:38 PM, Peter Wang (王信友) wrote: >> But ufshcd_scsi_block_requests usage is correct in SDR mode. > > ufshcd_scsi_block_requests() uses scsi_block_requests(). It is almost > never correct to use scsi_block_requests() in a blk-mq driver because > scsi_block_requests() does not wait for ongoing request submission > calls to complete. scsi_block_requests() is a legacy from the time when > all request dispatching and queueing was protected by the SCSI host > lock, a change that was made in 2010 or about 14 years ago. See also > https://lore.kernel.org/linux-scsi/20101105002409.GA21714@havoc.gtf.org/ > >> So, I think ufshcd_wait_for_doorbell_clr should be revise. >> Check tr_doorbell in SDR mode. (before 8d077ede48c1 do) >> Check each HWQ's are all empty in MCQ mode. (need think how to do) >> Make sure all requests is complete, and finish this function' job >> correctly. >> Or there still have a gap in ufshcd_wait_for_doorbell_clr. > > ufshcd_wait_for_doorbell_clr() should be removed and > ufshcd_clock_scaling_prepare() should use blk_mq_freeze_*(). > See also my patch "ufs: Simplify the clock scaling mechanism > implementation" from 5 years ago > (https://lore.kernel.org/linux-scsi/20191112173743.141503-5-bvanassche@acm.org/). > The defect of blk_mq_freeze_*() is that it would bring in significant latency and performance regression. I don't think it is what many people want to see. BRs, Ziqi > Best regards, > > Bart.
On 6/26/24 6:18 AM, Ziqi Chen wrote: > On 6/26/2024 12:13 AM, Bart Van Assche wrote: >> ufshcd_wait_for_doorbell_clr() should be removed and >> ufshcd_clock_scaling_prepare() should use blk_mq_freeze_*(). >> See also my patch "ufs: Simplify the clock scaling mechanism >> implementation" from 5 years ago >> (https://lore.kernel.org/linux-scsi/20191112173743.141503-5-bvanassche@acm.org/). > > The defect of blk_mq_freeze_*() is that it would bring in significant > latency and performance regression. I don't think it is what many people > want to see. This can be solved by inserting a synchronize_srcu_expedited() call between the blk_freeze_queue_start() and the blk_mq_freeze_queue_wait_timeout() calls. With that call added, the latency of the new code should be lower than that of the io_schedule_timeout(msecs_to_jiffies(20)) call in the current code. blk_mq_freeze_*() is only slow if the RCU grace period is not expedited. Bart.
diff --git a/drivers/ufs/core/ufshcd.c b/drivers/ufs/core/ufshcd.c index 21429ee..1afa862 100644 --- a/drivers/ufs/core/ufshcd.c +++ b/drivers/ufs/core/ufshcd.c @@ -1392,7 +1392,7 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba, u64 timeout_us) * make sure that there are no outstanding requests when * clock scaling is in progress */ - ufshcd_scsi_block_requests(hba); + blk_mq_quiesce_tagset(&hba->host->tag_set); mutex_lock(&hba->wb_mutex); down_write(&hba->clk_scaling_lock); @@ -1401,7 +1401,7 @@ static int ufshcd_clock_scaling_prepare(struct ufs_hba *hba, u64 timeout_us) ret = -EBUSY; up_write(&hba->clk_scaling_lock); mutex_unlock(&hba->wb_mutex); - ufshcd_scsi_unblock_requests(hba); + blk_mq_unquiesce_tagset(&hba->host->tag_set); goto out; } @@ -1422,7 +1422,7 @@ static void ufshcd_clock_scaling_unprepare(struct ufs_hba *hba, int err, bool sc mutex_unlock(&hba->wb_mutex); - ufshcd_scsi_unblock_requests(hba); + blk_mq_unquiesce_tagset(&hba->host->tag_set); ufshcd_release(hba); }