Message ID | d0c6c22455811e9f0eda01f9bc70d1398b51b2bd.1585160616.git.asutoshd@codeaurora.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | UFS Clock Scaling fixes and enhancements | expand |
On Wed, Mar 25, 2020 at 12:29 PM Asutosh Das <asutoshd@codeaurora.org> wrote: > > Currently, the frequency that devfreq provides the > driver to set always leads the clocks to be scaled up. > Hence, round the clock-rate to the nearest frequency > before deciding to scale. > > Also update the devfreq statistics of current frequency. > > Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> This change appears to cause issues for the Lenovo Miix 630, as identified by git bisect. On 5.6-final, My boot log looks normal. On 5.7-rc7, the Lenovo Miix 630 rarely boots, usually stuck in some kind of infinite printk loop. If I disable some of the UFS logging, I can capture this from the logs, as soon as UFS inits - [ 4.353860] ufshcd-qcom 1da4000.ufshc: ufshcd_intr: Unhandled interrupt 0x00000000 [ 4.359605] ufshcd-qcom 1da4000.ufshc: ufshcd_intr: Unhandled interrupt 0x00000000 [ 4.365412] ufshcd-qcom 1da4000.ufshc: ufshcd_check_errors: saved_err 0x4 saved_uic_err 0x2 [ 4.371121] ufshcd-qcom 1da4000.ufshc: hba->ufs_version = 0x210, hba->capabilities = 0x1587001f [ 4.376846] ufshcd-qcom 1da4000.ufshc: hba->outstanding_reqs = 0x100000, hba->outstanding_tasks = 0x0 [ 4.382636] ufshcd-qcom 1da4000.ufshc: last_hibern8_exit_tstamp at 0 us, hibern8_exit_cnt = 0 [ 4.388368] ufshcd-qcom 1da4000.ufshc: No record of pa_err [ 4.394001] ufshcd-qcom 1da4000.ufshc: dl_err[0] = 0x80000001 at 3873626 us [ 4.399577] ufshcd-qcom 1da4000.ufshc: No record of nl_err [ 4.405053] ufshcd-qcom 1da4000.ufshc: No record of tl_err [ 4.410464] ufshcd-qcom 1da4000.ufshc: No record of dme_err [ 4.415747] ufshcd-qcom 1da4000.ufshc: No record of auto_hibern8_err [ 4.420950] ufshcd-qcom 1da4000.ufshc: No record of fatal_err [ 4.426013] ufshcd-qcom 1da4000.ufshc: No record of link_startup_fail [ 4.430950] ufshcd-qcom 1da4000.ufshc: No record of resume_fail [ 4.435786] ufshcd-qcom 1da4000.ufshc: No record of suspend_fail [ 4.440538] ufshcd-qcom 1da4000.ufshc: dev_reset[0] = 0x0 at 3031009 us [ 4.445199] ufshcd-qcom 1da4000.ufshc: No record of host_reset [ 4.449750] ufshcd-qcom 1da4000.ufshc: No record of task_abort [ 4.454214] ufshcd-qcom 1da4000.ufshc: clk: core_clk, rate: 50000000 [ 4.458590] ufshcd-qcom 1da4000.ufshc: clk: core_clk_unipro, rate: 37500000 I don't understand how this change is breaking things, but it clearly is for me. What kind of additional data would be useful to get to the bottom of this?
Hi Jeffrey On 5/25/2020 3:19 PM, Jeffrey Hugo wrote: > On Wed, Mar 25, 2020 at 12:29 PM Asutosh Das <asutoshd@codeaurora.org> wrote: >> >> Currently, the frequency that devfreq provides the >> driver to set always leads the clocks to be scaled up. >> Hence, round the clock-rate to the nearest frequency >> before deciding to scale. >> >> Also update the devfreq statistics of current frequency. >> >> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> > > This change appears to cause issues for the Lenovo Miix 630, as > identified by git bisect. > Thanks for reporting this. > On 5.6-final, My boot log looks normal. On 5.7-rc7, the Lenovo Miix > 630 rarely boots, usually stuck in some kind of infinite printk loop. > > If I disable some of the UFS logging, I can capture this from the > logs, as soon as UFS inits - > > [ 4.353860] ufshcd-qcom 1da4000.ufshc: ufshcd_intr: Unhandled > interrupt 0x00000000 > [ 4.359605] ufshcd-qcom 1da4000.ufshc: ufshcd_intr: Unhandled > interrupt 0x00000000 > [ 4.365412] ufshcd-qcom 1da4000.ufshc: ufshcd_check_errors: > saved_err 0x4 saved_uic_err 0x2 > [ 4.371121] ufshcd-qcom 1da4000.ufshc: hba->ufs_version = 0x210, > hba->capabilities = 0x1587001f > [ 4.376846] ufshcd-qcom 1da4000.ufshc: hba->outstanding_reqs = > 0x100000, hba->outstanding_tasks = 0x0 > [ 4.382636] ufshcd-qcom 1da4000.ufshc: last_hibern8_exit_tstamp at > 0 us, hibern8_exit_cnt = 0 > [ 4.388368] ufshcd-qcom 1da4000.ufshc: No record of pa_err > [ 4.394001] ufshcd-qcom 1da4000.ufshc: dl_err[0] = 0x80000001 at 3873626 us > [ 4.399577] ufshcd-qcom 1da4000.ufshc: No record of nl_err > [ 4.405053] ufshcd-qcom 1da4000.ufshc: No record of tl_err > [ 4.410464] ufshcd-qcom 1da4000.ufshc: No record of dme_err > [ 4.415747] ufshcd-qcom 1da4000.ufshc: No record of auto_hibern8_err > [ 4.420950] ufshcd-qcom 1da4000.ufshc: No record of fatal_err > [ 4.426013] ufshcd-qcom 1da4000.ufshc: No record of link_startup_fail > [ 4.430950] ufshcd-qcom 1da4000.ufshc: No record of resume_fail > [ 4.435786] ufshcd-qcom 1da4000.ufshc: No record of suspend_fail > [ 4.440538] ufshcd-qcom 1da4000.ufshc: dev_reset[0] = 0x0 at 3031009 us > [ 4.445199] ufshcd-qcom 1da4000.ufshc: No record of host_reset > [ 4.449750] ufshcd-qcom 1da4000.ufshc: No record of task_abort > [ 4.454214] ufshcd-qcom 1da4000.ufshc: clk: core_clk, rate: 50000000 > [ 4.458590] ufshcd-qcom 1da4000.ufshc: clk: core_clk_unipro, rate: 37500000 > > I don't understand how this change is breaking things, but it clearly is for me. > > What kind of additional data would be useful to get to the bottom of this? > ++ Let me take a look and get back on this.
On Tue, May 26, 2020 at 11:17 AM Asutosh Das (asd) <asutoshd@codeaurora.org> wrote: > > Hi Jeffrey > On 5/25/2020 3:19 PM, Jeffrey Hugo wrote: > > On Wed, Mar 25, 2020 at 12:29 PM Asutosh Das <asutoshd@codeaurora.org> wrote: > >> > >> Currently, the frequency that devfreq provides the > >> driver to set always leads the clocks to be scaled up. > >> Hence, round the clock-rate to the nearest frequency > >> before deciding to scale. > >> > >> Also update the devfreq statistics of current frequency. > >> > >> Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> > > > > This change appears to cause issues for the Lenovo Miix 630, as > > identified by git bisect. > > > > Thanks for reporting this. > > > On 5.6-final, My boot log looks normal. On 5.7-rc7, the Lenovo Miix > > 630 rarely boots, usually stuck in some kind of infinite printk loop. > > > > If I disable some of the UFS logging, I can capture this from the > > logs, as soon as UFS inits - > > > > [ 4.353860] ufshcd-qcom 1da4000.ufshc: ufshcd_intr: Unhandled > > interrupt 0x00000000 > > [ 4.359605] ufshcd-qcom 1da4000.ufshc: ufshcd_intr: Unhandled > > interrupt 0x00000000 > > [ 4.365412] ufshcd-qcom 1da4000.ufshc: ufshcd_check_errors: > > saved_err 0x4 saved_uic_err 0x2 > > [ 4.371121] ufshcd-qcom 1da4000.ufshc: hba->ufs_version = 0x210, > > hba->capabilities = 0x1587001f > > [ 4.376846] ufshcd-qcom 1da4000.ufshc: hba->outstanding_reqs = > > 0x100000, hba->outstanding_tasks = 0x0 > > [ 4.382636] ufshcd-qcom 1da4000.ufshc: last_hibern8_exit_tstamp at > > 0 us, hibern8_exit_cnt = 0 > > [ 4.388368] ufshcd-qcom 1da4000.ufshc: No record of pa_err > > [ 4.394001] ufshcd-qcom 1da4000.ufshc: dl_err[0] = 0x80000001 at 3873626 us > > [ 4.399577] ufshcd-qcom 1da4000.ufshc: No record of nl_err > > [ 4.405053] ufshcd-qcom 1da4000.ufshc: No record of tl_err > > [ 4.410464] ufshcd-qcom 1da4000.ufshc: No record of dme_err > > [ 4.415747] ufshcd-qcom 1da4000.ufshc: No record of auto_hibern8_err > > [ 4.420950] ufshcd-qcom 1da4000.ufshc: No record of fatal_err > > [ 4.426013] ufshcd-qcom 1da4000.ufshc: No record of link_startup_fail > > [ 4.430950] ufshcd-qcom 1da4000.ufshc: No record of resume_fail > > [ 4.435786] ufshcd-qcom 1da4000.ufshc: No record of suspend_fail > > [ 4.440538] ufshcd-qcom 1da4000.ufshc: dev_reset[0] = 0x0 at 3031009 us > > [ 4.445199] ufshcd-qcom 1da4000.ufshc: No record of host_reset > > [ 4.449750] ufshcd-qcom 1da4000.ufshc: No record of task_abort > > [ 4.454214] ufshcd-qcom 1da4000.ufshc: clk: core_clk, rate: 50000000 > > [ 4.458590] ufshcd-qcom 1da4000.ufshc: clk: core_clk_unipro, rate: 37500000 > > > > I don't understand how this change is breaking things, but it clearly is for me. > > > > What kind of additional data would be useful to get to the bottom of this? > > It turns out that the unipro_core clock had no parent, and thus no ability to scale. Fixing that in GCC seems to have resolved this. I suspect the UFS clock scaling code attempted to scale the core clock, didn't check that the clock could change rates, and went along assuming the new rate was effective, thus putting the hardware into a bad state.
diff --git a/drivers/scsi/ufs/ufshcd.c b/drivers/scsi/ufs/ufshcd.c index 2a2a63b..7916c8b 100644 --- a/drivers/scsi/ufs/ufshcd.c +++ b/drivers/scsi/ufs/ufshcd.c @@ -1187,6 +1187,9 @@ static int ufshcd_devfreq_target(struct device *dev, if (!ufshcd_is_clkscaling_supported(hba)) return -EINVAL; + clki = list_first_entry(&hba->clk_list_head, struct ufs_clk_info, list); + /* Override with the closest supported frequency */ + *freq = (unsigned long) clk_round_rate(clki->clk, *freq); spin_lock_irqsave(hba->host->host_lock, irq_flags); if (ufshcd_eh_in_progress(hba)) { spin_unlock_irqrestore(hba->host->host_lock, irq_flags); @@ -1201,8 +1204,11 @@ static int ufshcd_devfreq_target(struct device *dev, goto out; } - clki = list_first_entry(&hba->clk_list_head, struct ufs_clk_info, list); + /* Decide based on the rounded-off frequency and update */ scale_up = (*freq == clki->max_freq) ? true : false; + if (!scale_up) + *freq = clki->min_freq; + /* Update the frequency */ if (!ufshcd_is_devfreq_scaling_required(hba, scale_up)) { spin_unlock_irqrestore(hba->host->host_lock, irq_flags); ret = 0; @@ -1250,6 +1256,8 @@ static int ufshcd_devfreq_get_dev_status(struct device *dev, struct ufs_hba *hba = dev_get_drvdata(dev); struct ufs_clk_scaling *scaling = &hba->clk_scaling; unsigned long flags; + struct list_head *clk_list = &hba->clk_list_head; + struct ufs_clk_info *clki; if (!ufshcd_is_clkscaling_supported(hba)) return -EINVAL; @@ -1260,6 +1268,13 @@ static int ufshcd_devfreq_get_dev_status(struct device *dev, if (!scaling->window_start_t) goto start_window; + clki = list_first_entry(clk_list, struct ufs_clk_info, list); + /* + * If current frequency is 0, then the ondemand governor considers + * there's no initial frequency set. And it always requests to set + * to max. frequency. + */ + stat->current_frequency = clki->curr_freq; if (scaling->is_busy_started) scaling->tot_busy_t += ktime_to_us(ktime_sub(ktime_get(), scaling->busy_start_t));
Currently, the frequency that devfreq provides the driver to set always leads the clocks to be scaled up. Hence, round the clock-rate to the nearest frequency before deciding to scale. Also update the devfreq statistics of current frequency. Signed-off-by: Asutosh Das <asutoshd@codeaurora.org> --- drivers/scsi/ufs/ufshcd.c | 17 ++++++++++++++++- 1 file changed, 16 insertions(+), 1 deletion(-)