diff mbox series

[3/4] cpufreq: qcom-hw: fix the opp entries refcounting

Message ID 20220307153050.3392700-3-dmitry.baryshkov@linaro.org (mailing list archive)
State New, archived
Delegated to: viresh kumar
Headers show
Series [1/4] cpufreq: qcom-hw: drop affinity hint before freeing the IRQ | expand

Commit Message

Dmitry Baryshkov March 7, 2022, 3:30 p.m. UTC
The qcom_lmh_dcvs_notify() will get the dev_pm_opp instance for
throttling, but will not put it, ending up with leaking a reference
count and the following backtrace when putting the CPU offline.

Correctly put the reference count of the returned opp instance.

[   84.418025] ------------[ cut here ]------------
[   84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
[   84.431966] Modules linked in:
[   84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S                5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
[   84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
[   84.458781] pc : _opp_table_kref_release+0x188/0x190
[   84.463878] lr : _opp_table_kref_release+0x78/0x190
[   84.468885] sp : ffff80000841bc70
[   84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
[   84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
[   84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
[   84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
[   84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
[   84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
[   84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
[   84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
[   84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
[   84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
[   84.545579] Call trace:
[   84.548101]  _opp_table_kref_release+0x188/0x190
[   84.552842]  dev_pm_opp_remove_all_dynamic+0x8c/0xc0
[   84.557949]  qcom_cpufreq_hw_cpu_exit+0x30/0xdc
[   84.562608]  cpufreq_offline.isra.0+0x1b4/0x1d8
[   84.567270]  cpuhp_cpufreq_offline+0x10/0x6c
[   84.571663]  cpuhp_invoke_callback+0x16c/0x2b0
[   84.576231]  cpuhp_thread_fun+0x190/0x250
[   84.580353]  smpboot_thread_fn+0x12c/0x230
[   84.584568]  kthread+0xfc/0x100
[   84.587810]  ret_from_fork+0x10/0x20
[   84.591490] irq event stamp: 3482
[   84.594901] hardirqs last  enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
[   84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
[   84.611074] softirqs last  enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
[   84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
[   84.627691] ---[ end trace 0000000000000000 ]---

Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
---
 drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
 1 file changed, 8 insertions(+), 4 deletions(-)

Comments

Bjorn Andersson March 7, 2022, 10:16 p.m. UTC | #1
On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:

> The qcom_lmh_dcvs_notify() will get the dev_pm_opp instance for
> throttling, but will not put it, ending up with leaking a reference
> count and the following backtrace when putting the CPU offline.
> 

Good catch, and nice to see this kind of testing of the driver!

> Correctly put the reference count of the returned opp instance.
> 
> [   84.418025] ------------[ cut here ]------------
> [   84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
> [   84.431966] Modules linked in:
> [   84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S                5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
> [   84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
> [   84.458781] pc : _opp_table_kref_release+0x188/0x190
> [   84.463878] lr : _opp_table_kref_release+0x78/0x190
> [   84.468885] sp : ffff80000841bc70
> [   84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
> [   84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
> [   84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
> [   84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
> [   84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
> [   84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
> [   84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
> [   84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
> [   84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
> [   84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
> [   84.545579] Call trace:
> [   84.548101]  _opp_table_kref_release+0x188/0x190
> [   84.552842]  dev_pm_opp_remove_all_dynamic+0x8c/0xc0
> [   84.557949]  qcom_cpufreq_hw_cpu_exit+0x30/0xdc
> [   84.562608]  cpufreq_offline.isra.0+0x1b4/0x1d8
> [   84.567270]  cpuhp_cpufreq_offline+0x10/0x6c
> [   84.571663]  cpuhp_invoke_callback+0x16c/0x2b0
> [   84.576231]  cpuhp_thread_fun+0x190/0x250
> [   84.580353]  smpboot_thread_fn+0x12c/0x230
> [   84.584568]  kthread+0xfc/0x100
> [   84.587810]  ret_from_fork+0x10/0x20
> [   84.591490] irq event stamp: 3482
> [   84.594901] hardirqs last  enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
> [   84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
> [   84.611074] softirqs last  enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
> [   84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
> [   84.627691] ---[ end trace 0000000000000000 ]---
> 
> Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> ---
>  drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
>  1 file changed, 8 insertions(+), 4 deletions(-)
> 
> diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> index 920c80d91c21..580520215ee7 100644
> --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> @@ -309,12 +309,16 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
>  
>  	opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
>  	if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
> -		dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> +		opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
>  
> -	throttled_freq = freq_hz / HZ_PER_KHZ;

Maybe I'm missing something, but where did this division go after your
change?

> +	if (IS_ERR(opp)) {
> +		dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);

qcom_lmh_dcvs_notify() will be invoked repeatedly to poll the hardware
for changing circumstances during thermal pressure. If for some reason
dev_pm_opp_find_freq_ceil() is unable to find an opp it will probably
continue to fail every 10ms.

As such I think you should either omit the warning print, or possibly
use dev_warn_once().

Regards,
Bjorn

> +	} else {
> +		/* Update thermal pressure (the boost frequencies are accepted) */
> +		arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
>  
> -	/* Update thermal pressure (the boost frequencies are accepted) */
> -	arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> +		dev_pm_opp_put(opp);
> +	}
>  
>  	/*
>  	 * In the unlikely case policy is unregistered do not enable
> -- 
> 2.34.1
>
Dmitry Baryshkov March 9, 2022, 6:49 p.m. UTC | #2
On Tue, 8 Mar 2022 at 01:14, Bjorn Andersson <bjorn.andersson@linaro.org> wrote:
>
> On Mon 07 Mar 07:30 PST 2022, Dmitry Baryshkov wrote:
>
> > The qcom_lmh_dcvs_notify() will get the dev_pm_opp instance for
> > throttling, but will not put it, ending up with leaking a reference
> > count and the following backtrace when putting the CPU offline.
> >
>
> Good catch, and nice to see this kind of testing of the driver!
>
> > Correctly put the reference count of the returned opp instance.
> >
> > [   84.418025] ------------[ cut here ]------------
> > [   84.422770] WARNING: CPU: 7 PID: 43 at drivers/opp/core.c:1396 _opp_table_kref_release+0x188/0x190
> > [   84.431966] Modules linked in:
> > [   84.435106] CPU: 7 PID: 43 Comm: cpuhp/7 Tainted: G S                5.17.0-rc6-00388-g7cf3c0d89c44-dirty #721
> > [   84.451631] pstate: 82400005 (Nzcv daif +PAN -UAO +TCO -DIT -SSBS BTYPE=--)
> > [   84.458781] pc : _opp_table_kref_release+0x188/0x190
> > [   84.463878] lr : _opp_table_kref_release+0x78/0x190
> > [   84.468885] sp : ffff80000841bc70
> > [   84.472294] x29: ffff80000841bc70 x28: ffff6664afe3d000 x27: ffff1db6729e5908
> > [   84.479621] x26: 0000000000000000 x25: 0000000000000000 x24: ffff1db6729e58e0
> > [   84.486946] x23: ffff8000080a5000 x22: ffff1db40aad80e0 x21: ffff1db4002fec80
> > [   84.494277] x20: ffff1db40aad8000 x19: ffffb751c3186300 x18: ffffffffffffffff
> > [   84.501603] x17: 5300326563697665 x16: 645f676e696c6f6f x15: 00001186c1df5448
> > [   84.508928] x14: 00000000000002e9 x13: 0000000000000000 x12: 0000000000000000
> > [   84.516256] x11: ffffb751c3186368 x10: ffffb751c39a2a70 x9 : 0000000000000000
> > [   84.523585] x8 : ffff1db4008edf00 x7 : ffffb751c328c000 x6 : 0000000000000001
> > [   84.530916] x5 : 0000000000040000 x4 : 0000000000000001 x3 : ffff1db4008edf00
> > [   84.538247] x2 : 0000000000000000 x1 : ffff1db400aa6100 x0 : ffff1db40aad80d0
> > [   84.545579] Call trace:
> > [   84.548101]  _opp_table_kref_release+0x188/0x190
> > [   84.552842]  dev_pm_opp_remove_all_dynamic+0x8c/0xc0
> > [   84.557949]  qcom_cpufreq_hw_cpu_exit+0x30/0xdc
> > [   84.562608]  cpufreq_offline.isra.0+0x1b4/0x1d8
> > [   84.567270]  cpuhp_cpufreq_offline+0x10/0x6c
> > [   84.571663]  cpuhp_invoke_callback+0x16c/0x2b0
> > [   84.576231]  cpuhp_thread_fun+0x190/0x250
> > [   84.580353]  smpboot_thread_fn+0x12c/0x230
> > [   84.584568]  kthread+0xfc/0x100
> > [   84.587810]  ret_from_fork+0x10/0x20
> > [   84.591490] irq event stamp: 3482
> > [   84.594901] hardirqs last  enabled at (3481): [<ffffb751c13c3db0>] call_rcu+0x39c/0x50c
> > [   84.603119] hardirqs last disabled at (3482): [<ffffb751c236b518>] el1_dbg+0x24/0x8c
> > [   84.611074] softirqs last  enabled at (310): [<ffffb751c1290410>] _stext+0x410/0x588
> > [   84.619028] softirqs last disabled at (305): [<ffffb751c131bf68>] __irq_exit_rcu+0x158/0x174
> > [   84.627691] ---[ end trace 0000000000000000 ]---
> >
> > Fixes: 275157b367f4 ("cpufreq: qcom-cpufreq-hw: Add dcvs interrupt support")
> > Signed-off-by: Dmitry Baryshkov <dmitry.baryshkov@linaro.org>
> > ---
> >  drivers/cpufreq/qcom-cpufreq-hw.c | 12 ++++++++----
> >  1 file changed, 8 insertions(+), 4 deletions(-)
> >
> > diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
> > index 920c80d91c21..580520215ee7 100644
> > --- a/drivers/cpufreq/qcom-cpufreq-hw.c
> > +++ b/drivers/cpufreq/qcom-cpufreq-hw.c
> > @@ -309,12 +309,16 @@ static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
> >
> >       opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
> >       if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
> > -             dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> > +             opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
> >
> > -     throttled_freq = freq_hz / HZ_PER_KHZ;
>
> Maybe I'm missing something, but where did this division go after your
> change?

Oops. It got dropped when fixing the conflict. Let's get it back.

>
> > +     if (IS_ERR(opp)) {
> > +             dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);
>
> qcom_lmh_dcvs_notify() will be invoked repeatedly to poll the hardware
> for changing circumstances during thermal pressure. If for some reason
> dev_pm_opp_find_freq_ceil() is unable to find an opp it will probably
> continue to fail every 10ms.

I'll change it to dev_warn_ratelimited() instead. dev_warn_once() can
easily get lost.

>
> As such I think you should either omit the warning print, or possibly
> use dev_warn_once().
>
> Regards,
> Bjorn
>
> > +     } else {
> > +             /* Update thermal pressure (the boost frequencies are accepted) */
> > +             arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> >
> > -     /* Update thermal pressure (the boost frequencies are accepted) */
> > -     arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
> > +             dev_pm_opp_put(opp);
> > +     }
> >
> >       /*
> >        * In the unlikely case policy is unregistered do not enable
> > --
> > 2.34.1
> >
diff mbox series

Patch

diff --git a/drivers/cpufreq/qcom-cpufreq-hw.c b/drivers/cpufreq/qcom-cpufreq-hw.c
index 920c80d91c21..580520215ee7 100644
--- a/drivers/cpufreq/qcom-cpufreq-hw.c
+++ b/drivers/cpufreq/qcom-cpufreq-hw.c
@@ -309,12 +309,16 @@  static void qcom_lmh_dcvs_notify(struct qcom_cpufreq_data *data)
 
 	opp = dev_pm_opp_find_freq_floor(dev, &freq_hz);
 	if (IS_ERR(opp) && PTR_ERR(opp) == -ERANGE)
-		dev_pm_opp_find_freq_ceil(dev, &freq_hz);
+		opp = dev_pm_opp_find_freq_ceil(dev, &freq_hz);
 
-	throttled_freq = freq_hz / HZ_PER_KHZ;
+	if (IS_ERR(opp)) {
+		dev_warn(dev, "Can't find the OPP for throttling: %pe!\n", opp);
+	} else {
+		/* Update thermal pressure (the boost frequencies are accepted) */
+		arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
 
-	/* Update thermal pressure (the boost frequencies are accepted) */
-	arch_update_thermal_pressure(policy->related_cpus, throttled_freq);
+		dev_pm_opp_put(opp);
+	}
 
 	/*
 	 * In the unlikely case policy is unregistered do not enable