Message ID | 20220307153050.3392700-3-dmitry.baryshkov@linaro.org (mailing list archive) |
---|---|
State | New, archived |
Delegated to: | viresh kumar |
Series | [1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ |
On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:

> The qcom_lmh_dcvs_notify() will get the dev_pm_opp instance for
> throttling, but will not put it, ending up with leaking a reference
> count and the following backtrace when putting the CPU offline.
>

Good catch, and nice to see this kind of testing of the driver!

> Correctly put the reference count of the returned opp instance.
>
> [ 84.418025] ------------[ cut here ]------------
> [ 84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
> [ 84.431966] Modules linked in:
> [ 84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S 5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
> [ 84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
> [ 84.458781] pc : _opp_table_kref_release+0x188/0x190
> [ 84.463878] lr : _opp_table_kref_release+0x78/0x190
> [ 84.468885] sp : ffff80000841bc70
> [ 84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
> [ 84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
> [ 84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
> [ 84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
> [ 84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
> [ 84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
> [ 84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
> [ 84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
> [ 84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
> [ 84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
> [ 84.545579] Call trace:
> [ 84.548101] _opp_table_kref_release+0x188/0x190
> [ 84.552842] dev_pm_opp_remove_all_dynamic+0x8c/0xc0
> [ 84.557949] qcom_cpufreq_hw_cpu_exit+0x30/0xdc
> [ 84.562608] cpufreq_offline.isra.0+0x1b4/0x1d8
> [ 84.567270] cpuhp_cpufreq_offline+0x10/0x6c
> [ 84.571663] cpuhp_invoke_callback+0x16c/0x2b0
> [ 84.576231] cpuhp_thread_fun+0x190/0x250
> [ 84.580353] smpboot_thread_fn+0x12c/0x230
> [ 84.584568] kthread+0xfc/0x100
> [ 84.587810] ret_from_fork+0x10/0x20
> [ 84.591490] irq event stamp: 3482
> [ 84.594901] hardirqs last enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
> [ 84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
> [ 84.611074] softirqs last enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
> [ 84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
> [ 84.627691] ---[ end trace 0000000000000000 ]---
>
> Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
>
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 920c80d91c21..580520215ee7 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -309,12 +309,16 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>  
>  	opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
>  	if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
> -		dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> +		opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
>  
> -	throttled_freq = freq_hz / HZ_PER_KHZ;

Maybe I'm missing something, but where did this division go after your
change?

> +	if (IS_ERR(opp)) {
> +		dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);

qcom_lmh_dcvs_notify() will be invoked repeatedly to poll the hardware
for changing circumstances during thermal pressure. If for some reason
dev_pm_opp_find_freq_ceil() is unable to find an opp it will probably
continue to fail every 10ms.

As such I think you should either omit the warning print, or possibly
use dev_warn_once().

Regards,
Bjorn

> +	} else {
> +		/* Update thermal pressure (the boost frequencies are accepted) */
> +		arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
>  
> -	/* Update thermal pressure (the boost frequencies are accepted) */
> -	arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> +		dev_pm_opp_put(opp);
> +	}
>  
>  	/*
>  	 * In the unlikely case policy is unregistered do not enable
> --
> 2.34.1
>
On Tue, 8 Mar 2022 at 01:14, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
>
> On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:
>
> > The qcom_lmh_dcvs_notify() will get the dev_pm_opp instance for
> > throttling, but will not put it, ending up with leaking a reference
> > count and the following backtrace when putting the CPU offline.
> >
>
> Good catch, and nice to see this kind of testing of the driver!
>
> > Correctly put the reference count of the returned opp instance.
> >
> > [ 84.418025] ------------[ cut here ]------------
> > [ 84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
> > [ 84.431966] Modules linked in:
> > [ 84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S 5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
> > [ 84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
> > [ 84.458781] pc : _opp_table_kref_release+0x188/0x190
> > [ 84.463878] lr : _opp_table_kref_release+0x78/0x190
> > [ 84.468885] sp : ffff80000841bc70
> > [ 84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
> > [ 84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
> > [ 84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
> > [ 84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
> > [ 84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
> > [ 84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
> > [ 84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
> > [ 84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
> > [ 84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
> > [ 84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
> > [ 84.545579] Call trace:
> > [ 84.548101] _opp_table_kref_release+0x188/0x190
> > [ 84.552842] dev_pm_opp_remove_all_dynamic+0x8c/0xc0
> > [ 84.557949] qcom_cpufreq_hw_cpu_exit+0x30/0xdc
> > [ 84.562608] cpufreq_offline.isra.0+0x1b4/0x1d8
> > [ 84.567270] cpuhp_cpufreq_offline+0x10/0x6c
> > [ 84.571663] cpuhp_invoke_callback+0x16c/0x2b0
> > [ 84.576231] cpuhp_thread_fun+0x190/0x250
> > [ 84.580353] smpboot_thread_fn+0x12c/0x230
> > [ 84.584568] kthread+0xfc/0x100
> > [ 84.587810] ret_from_fork+0x10/0x20
> > [ 84.591490] irq event stamp: 3482
> > [ 84.594901] hardirqs last enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
> > [ 84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
> > [ 84.611074] softirqs last enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
> > [ 84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
> > [ 84.627691] ---[ end trace 0000000000000000 ]---
> >
> > Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
> >  drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > index 920c80d91c21..580520215ee7 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > @@ -309,12 +309,16 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> >  
> >  	opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
> >  	if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
> > -		dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> > +		opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> >  
> > -	throttled_freq = freq_hz / HZ_PER_KHZ;
>
> Maybe I'm missing something, but where did this division go after your
> change?

Oops. It got dropped when fixing the conflict. Let's get it back.

>
> > +	if (IS_ERR(opp)) {
> > +		dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);
>
> qcom_lmh_dcvs_notify() will be invoked repeatedly to poll the hardware
> for changing circumstances during thermal pressure. If for some reason
> dev_pm_opp_find_freq_ceil() is unable to find an opp it will probably
> continue to fail every 10ms.

I'll change it to dev_warn_ratelimited() instead. dev_warn_once() can
easily get lost.

>
> As such I think you should either omit the warning print, or possibly
> use dev_warn_once().
>
> Regards,
> Bjorn
>
> > +	} else {
> > +		/* Update thermal pressure (the boost frequencies are accepted) */
> > +		arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> >  
> > -	/* Update thermal pressure (the boost frequencies are accepted) */
> > -	arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> > +		dev_pm_opp_put(opp);
> > +	}
> >  
> >  	/*
> >  	 * In the unlikely case policy is unregistered do not enable
> > --
> > 2.34.1
> >
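To make the logging trade-off in this exchange concrete, here is a minimal userspace sketch, not kernel code. It counts how many warnings three policies would emit when a failing lookup is polled every 10 ms: an unconditional print, a print-once flag, and a windowed rate limit. The names `mock_ratelimit`, `mock_ratelimit_ok`, `mock_counts` and `mock_poll` are hypothetical, and the 5 s window with a burst of 10 is an illustrative choice for the model, not a statement about what dev_warn_ratelimited() uses.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical userspace model of a windowed rate limit (not the kernel's). */
struct mock_ratelimit {
	unsigned long interval_ms;     /* length of one window */
	unsigned int burst;            /* messages allowed per window */
	unsigned long window_start_ms; /* start of the current window */
	unsigned int printed;          /* messages emitted in this window */
};

/* Returns true when a message may be emitted at time now_ms. */
static bool mock_ratelimit_ok(struct mock_ratelimit *rl, unsigned long now_ms)
{
	if (now_ms - rl->window_start_ms >= rl->interval_ms) {
		/* A new window begins: reset the budget. */
		rl->window_start_ms = now_ms;
		rl->printed = 0;
	}
	if (rl->printed >= rl->burst)
		return false;
	rl->printed++;
	return true;
}

/* Number of messages each logging policy would emit. */
struct mock_counts {
	unsigned int plain;       /* unconditional warning on every poll */
	unsigned int once;        /* warning printed only the first time */
	unsigned int ratelimited; /* warning gated by mock_ratelimit_ok() */
};

/* Simulate a lookup that fails on every poll for duration_ms. */
static struct mock_counts mock_poll(unsigned long duration_ms,
				    unsigned long period_ms)
{
	/* Illustrative parameters: 5 s window, burst of 10. */
	struct mock_ratelimit rl = { 5000, 10, 0, 0 };
	struct mock_counts c = { 0, 0, 0 };
	bool warned_once = false;

	for (unsigned long now = 0; now < duration_ms; now += period_ms) {
		c.plain++;
		if (!warned_once) {
			warned_once = true;
			c.once++;
		}
		if (mock_ratelimit_ok(&rl, now))
			c.ratelimited++;
	}
	return c;
}
```

Calling `mock_poll(10000, 10)` models ten seconds of failures at the 10 ms polling period mentioned above. The unconditional policy emits one message per poll, the once policy emits a single message that is easy to miss later, and the rate-limited policy stays bounded per window while still repeating.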
diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 920c80d91c21..580520215ee7 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -309,12 +309,16 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 
 	opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
 	if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
-		dev_pm_opp_find_freq_ceil(dev, &freq_hz);
+		opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
 
-	throttled_freq = freq_hz / HZ_PER_KHZ;
+	if (IS_ERR(opp)) {
+		dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);
+	} else {
+		/* Update thermal pressure (the boost frequencies are accepted) */
+		arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
 
-	/* Update thermal pressure (the boost frequencies are accepted) */
-	arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
+		dev_pm_opp_put(opp);
+	}
 
 	/*
 	 * In the unlikely case policy is unregistered do not enable
The qcom_lmh_dcvs_notify() will get the dev_pm_opp instance for
throttling, but will not put it, ending up with leaking a reference
count and the following backtrace when putting the CPU offline.

Correctly put the reference count of the returned opp instance.

[ 84.418025] ------------[ cut here ]------------
[ 84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
[ 84.431966] Modules linked in:
[ 84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S 5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
[ 84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[ 84.458781] pc : _opp_table_kref_release+0x188/0x190
[ 84.463878] lr : _opp_table_kref_release+0x78/0x190
[ 84.468885] sp : ffff80000841bc70
[ 84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
[ 84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
[ 84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
[ 84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
[ 84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
[ 84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
[ 84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
[ 84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
[ 84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
[ 84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
[ 84.545579] Call trace:
[ 84.548101] _opp_table_kref_release+0x188/0x190
[ 84.552842] dev_pm_opp_remove_all_dynamic+0x8c/0xc0
[ 84.557949] qcom_cpufreq_hw_cpu_exit+0x30/0xdc
[ 84.562608] cpufreq_offline.isra.0+0x1b4/0x1d8
[ 84.567270] cpuhp_cpufreq_offline+0x10/0x6c
[ 84.571663] cpuhp_invoke_callback+0x16c/0x2b0
[ 84.576231] cpuhp_thread_fun+0x190/0x250
[ 84.580353] smpboot_thread_fn+0x12c/0x230
[ 84.584568] kthread+0xfc/0x100
[ 84.587810] ret_from_fork+0x10/0x20
[ 84.591490] irq event stamp: 3482
[ 84.594901] hardirqs last enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
[ 84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
[ 84.611074] softirqs last enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
[ 84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
[ 84.627691] ---[ end trace 0000000000000000 ]---

Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)
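
The leak described in this commit message can be modelled outside the kernel. The sketch below is a userspace approximation under stated assumptions, not the driver code and not the follow-up revision of the patch. `mock_opp`, `mock_table`, `mock_find_freq_floor`, `mock_find_freq_ceil`, `mock_opp_put`, `mock_dcvs_notify` and `mock_total_refs` are hypothetical stand-ins for the OPP helpers, and NULL stands in for the ERR_PTR(-ERANGE) result. It also keeps the `freq_hz / HZ_PER_KHZ` conversion that the review found missing.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

#define HZ_PER_KHZ 1000UL

/* Hypothetical stand-in for struct dev_pm_opp: a rate plus a reference count. */
struct mock_opp {
	unsigned long rate_hz;
	int refcount;
};

/* Hypothetical OPP table, sorted by ascending rate. */
static struct mock_opp mock_table[] = {
	{  300000000UL, 0 },
	{  600000000UL, 0 },
	{ 1200000000UL, 0 },
};
#define MOCK_TABLE_LEN (sizeof(mock_table) / sizeof(mock_table[0]))

/* Models a floor lookup: highest rate <= *freq, with a reference taken. */
static struct mock_opp *mock_find_freq_floor(unsigned long *freq)
{
	for (size_t i = MOCK_TABLE_LEN; i-- > 0; ) {
		if (mock_table[i].rate_hz <= *freq) {
			mock_table[i].refcount++;
			*freq = mock_table[i].rate_hz;
			return &mock_table[i];
		}
	}
	return NULL; /* stands in for ERR_PTR(-ERANGE) */
}

/* Models a ceil lookup: lowest rate >= *freq, with a reference taken. */
static struct mock_opp *mock_find_freq_ceil(unsigned long *freq)
{
	for (size_t i = 0; i < MOCK_TABLE_LEN; i++) {
		if (mock_table[i].rate_hz >= *freq) {
			mock_table[i].refcount++;
			*freq = mock_table[i].rate_hz;
			return &mock_table[i];
		}
	}
	return NULL;
}

/* Models dropping the reference taken by a successful lookup. */
static void mock_opp_put(struct mock_opp *opp)
{
	opp->refcount--;
}

/* Sum of outstanding references; non-zero means something leaked. */
static int mock_total_refs(void)
{
	int total = 0;

	for (size_t i = 0; i < MOCK_TABLE_LEN; i++)
		total += mock_table[i].refcount;
	return total;
}

/*
 * Returns the throttled frequency in kHz, or 0 when no OPP matches.
 * put_ref = false reproduces the leak; put_ref = true models the fix.
 */
static unsigned long mock_dcvs_notify(unsigned long freq_hz, bool put_ref)
{
	struct mock_opp *opp;
	unsigned long throttled_freq;

	opp = mock_find_freq_floor(&freq_hz);
	if (!opp)
		opp = mock_find_freq_ceil(&freq_hz); /* keep the result */
	if (!opp)
		return 0;

	/* The lookup rounded freq_hz to the OPP rate; convert Hz to kHz. */
	throttled_freq = freq_hz / HZ_PER_KHZ;

	if (put_ref)
		mock_opp_put(opp);
	return throttled_freq;
}
```

With `put_ref` set, every successful lookup is balanced and `mock_total_refs()` returns to zero. Without it, each call leaves one reference outstanding, which is the imbalance that the warning in the trace above reports when the table is torn down.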