Message ID | 1385514296-26702-4-git-send-email-soren.brinkmann@xilinx.com (mailing list archive) |
---|---|
State | New, archived |
Hi Daniel,

On Tue, Nov 26, 2013 at 05:04:50PM -0800, Soren Brinkmann wrote:
> It is not allowed to call clk_get_rate() from interrupt context. To
> avoid such calls the timer input frequency is stored in the driver's
> data struct which makes it accessible to the driver in any context.
>
> Signed-off-by: Soren Brinkmann <soren.brinkmann@xilinx.com>
> Acked-by: Daniel Lezcano <daniel.lezcano@linaro.org>

I doubt that we'll resolve all issues with this series before the
holidays or even the next merge window. Could you take this patch into
your tree for 3.14? It is not directly related to the cpufreq work and
fixes an actual issue that triggers a kernel WARN under some condition
(I missed preserving the details and the trace). That would take the
easy stuff out of the way and we can focus on the more controversial
changes.

	Thanks,
	Sören

On 12/17/2013 08:21 PM, Sören Brinkmann wrote:
[...]
> I doubt that we'll resolve all issues with this series before the
> holidays or even the next merge window. Could you take this patch into
> your tree for 3.14? It is not directly related to the cpufreq work and
> fixes an actual issue that triggers a kernel WARN under some condition
> (I missed preserving the details and the trace). That would take the
> easy stuff out of the way and we can focus on the more controversial
> changes.

You are asking to take it for 3.14 but shouldn't it go as a 3.13 fix ?

  -- Daniel

Hi Daniel,

On Wed, Dec 18, 2013 at 03:53:51PM +0100, Daniel Lezcano wrote:
[...]
> You are asking to take it for 3.14 but shouldn't it go as a 3.13 fix ?

That's also an option. As I remember, the patch fixes a kernel WARN. The
system still seemed operational though. Up to you whether this is
considered severe enough for the 3.13 series. I'm happy either way.

	Sören

On 12/18/2013 05:47 PM, Sören Brinkmann wrote:
[...]
> That's also an option. As I remember, the patch fixes a kernel WARN. The
> system still seemed operational though. Up to you whether this is
> considered severe enough for the 3.13 series. I'm happy either way.

I was not able to reproduce the WARN with my board.

Please, could you give the WARN or give the procedure to reproduce it ?

Thanks

  -- Daniel

Hi Daniel,

On Wed, Dec 18, 2013 at 10:58:26PM +0100, Daniel Lezcano wrote:
[...]
> I was not able to reproduce the WARN with my board.
>
> Please, could you give the WARN or give the procedure to reproduce it ?

I can't either... I thought I saw the WARN on a vanilla kernel during
boot (IIRC, when cpuidle started). Is there any chance the timer core
calls the timer's set_mode() from interrupt context?
Anyway, let's drop it for now. I'll make sure to record more information
in case it reappears.

	Sören

On 12/19/2013 07:32 PM, Sören Brinkmann wrote:
[...]
> I can't either... I thought I saw the WARN on a vanilla kernel during
> boot (IIRC, when cpuidle started). Is there any chance the timer core
> calls the timer's set_mode() from interrupt context?
> Anyway, let's drop it for now. I'll make sure to record more information
> in case it reappears.

Finally I was able to reproduce it with the highres timers disabled,
the periodic tick system and the locks debug.

Indeed, we are in an interrupt context (IPI) and we are calling
clk_get_rate in the set_mode function which in turn ends up by
getting a mutex... Even if that does not hang, it is a potential
kernel crash so I will apply the patch with an updated changelog.

------------[ cut here ]------------
WARNING: CPU: 1 PID: 0 at /home/dlezcano/Work/src/cpuidle-next/kernel/mutex.c:856 mutex_trylock+0x70/0x1fc()
DEBUG_LOCKS_WARN_ON(in_interrupt())
Modules linked in:
CPU: 1 PID: 0 Comm: swapper/1 Not tainted 3.12.0-xilinx-dirty #93
[<c0014a78>] (unwind_backtrace+0x0/0x11c) from [<c0011b6c>] (show_stack+0x10/0x14)
[<c0011b6c>] (show_stack+0x10/0x14) from [<c039120c>] (dump_stack+0x7c/0xc0)
[<c039120c>] (dump_stack+0x7c/0xc0) from [<c001fda4>] (warn_slowpath_common+0x60/0x84)
[<c001fda4>] (warn_slowpath_common+0x60/0x84) from [<c001fe48>] (warn_slowpath_fmt+0x2c/0x3c)
[<c001fe48>] (warn_slowpath_fmt+0x2c/0x3c) from [<c0392658>] (mutex_trylock+0x70/0x1fc)
[<c0392658>] (mutex_trylock+0x70/0x1fc) from [<c02dfc08>] (clk_prepare_lock+0xc/0xe4)
[<c02dfc08>] (clk_prepare_lock+0xc/0xe4) from [<c02e099c>] (clk_get_rate+0xc/0x44)
[<c02e099c>] (clk_get_rate+0xc/0x44) from [<c02d0394>] (ttc_set_mode+0x34/0x78)
[<c02d0394>] (ttc_set_mode+0x34/0x78) from [<c005f794>] (clockevents_set_mode+0x28/0x5c)
[<c005f794>] (clockevents_set_mode+0x28/0x5c) from [<c00607fc>] (tick_broadcast_on_off+0x190/0x1c0)
[<c00607fc>] (tick_broadcast_on_off+0x190/0x1c0) from [<c005f168>] (clockevents_notify+0x58/0x1ac)
[<c005f168>] (clockevents_notify+0x58/0x1ac) from [<c02b99dc>] (cpuidle_setup_broadcast_timer+0x20/0x24)
[<c02b99dc>] (cpuidle_setup_broadcast_timer+0x20/0x24) from [<c006cd04>] (generic_smp_call_function_single_interrupt+0)
[<c006cd04>] (generic_smp_call_function_single_interrupt+0xe0/0x130) from [<c00138c8>] (handle_IPI+0x88/0x118)
[<c00138c8>] (handle_IPI+0x88/0x118) from [<c0008504>] (gic_handle_irq+0x58/0x60)
[<c0008504>] (gic_handle_irq+0x58/0x60) from [<c0012644>] (__irq_svc+0x44/0x78)
Exception stack(0xef099fa0 to 0xef099fe8)
9fa0: 00000001 ef092100 00000000 ef092100 ef098000 00000015 c0399f2c c0579d74
9fc0: 0000406a 413fc090 00000000 00000000 00000000 ef099fe8 c00666ec c000f46c
9fe0: 20000113 ffffffff
[<c0012644>] (__irq_svc+0x44/0x78) from [<c000f46c>] (arch_cpu_idle+0x34/0x3c)
[<c000f46c>] (arch_cpu_idle+0x34/0x3c) from [<c0053980>] (cpu_startup_entry+0xa8/0x10c)
[<c0053980>] (cpu_startup_entry+0xa8/0x10c) from [<000085a4>] (0x85a4)
---[ end trace 8185ad1c7a65f6e7 ]---

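To make the call chain in this trace easier to follow, here is a small
user-space model of it. This is only a sketch and not kernel code: every
name prefixed with model_ is hypothetical, the flag stands in for
in_interrupt(), the counter stands in for DEBUG_LOCKS_WARN_ON(), and the
rate value is arbitrary. It shows that a set_mode path which queries the
clock framework trips the lock check when run from an IPI, while a path
that only reads a cached field does not.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Stand-in for in_interrupt(): true while we pretend to run in an IPI. */
static bool model_in_interrupt;

/* Counts how often the modelled DEBUG_LOCKS_WARN_ON() would have fired. */
static int model_lock_warnings;

/* Models the debug check: taking a mutex from interrupt context is flagged. */
static void model_mutex_trylock(void)
{
	if (model_in_interrupt)
		model_lock_warnings++;
}

/* Models clk_get_rate(): it takes the prepare lock before reading the rate. */
static unsigned long model_clk_get_rate(void)
{
	model_mutex_trylock();
	return 100000000UL;	/* arbitrary illustrative rate in Hz */
}

/* Hypothetical counterpart of the driver's timer struct with a cached rate. */
struct model_timer {
	unsigned long freq;
};

/* Old behaviour: set_mode asks the clock framework every time. */
static unsigned long model_set_mode_uncached(void)
{
	return model_clk_get_rate();
}

/* New behaviour: set_mode only reads the value cached at setup time. */
static unsigned long model_set_mode_cached(const struct model_timer *t)
{
	assert(t != NULL);
	return t->freq;
}
```

Filling the cache once in process context and then calling
model_set_mode_cached() with model_in_interrupt set leaves the warning
counter at zero; calling model_set_mode_uncached() in the same state
increments it.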
Hi Daniel,

On Thu, Dec 19, 2013 at 09:53:14PM +0100, Daniel Lezcano wrote:
[...]
> Finally I was able to reproduce it with the highres timers disabled,
> the periodic tick system and the locks debug.
[...]
> Even if that does not hang, it is a potential
> kernel crash so I will apply the patch with an updated changelog.

Thanks! Kind of comforting to know that the issue I tried to fix actually exists.

	Sören

On 12/19/2013 10:23 PM, Sören Brinkmann wrote:
[...]
> Thanks! Kind of comforting to know that the issue I tried to fix actually exists.

Applied to my tree as a 3.13 fix.

Thanks !

  -- Daniel

diff --git a/drivers/clocksource/cadence_ttc_timer.c b/drivers/clocksource/cadence_ttc_timer.c
index b2bb3a4bc205..a92350b55d32 100644
--- a/drivers/clocksource/cadence_ttc_timer.c
+++ b/drivers/clocksource/cadence_ttc_timer.c
@@ -67,11 +67,13 @@
  * struct ttc_timer - This definition defines local timer structure
  *
  * @base_addr:	Base address of timer
+ * @freq:	Timer input clock frequency
  * @clk:	Associated clock source
  * @clk_rate_change_nb	Notifier block for clock rate changes
  */
 struct ttc_timer {
 	void __iomem *base_addr;
+	unsigned long freq;
 	struct clk *clk;
 	struct notifier_block clk_rate_change_nb;
 };
@@ -196,9 +198,8 @@ static void ttc_set_mode(enum clock_event_mode mode,
 
 	switch (mode) {
 	case CLOCK_EVT_MODE_PERIODIC:
-		ttc_set_interval(timer,
-				DIV_ROUND_CLOSEST(clk_get_rate(ttce->ttc.clk),
-					PRESCALE * HZ));
+		ttc_set_interval(timer, DIV_ROUND_CLOSEST(ttce->ttc.freq,
+						PRESCALE * HZ));
 		break;
 	case CLOCK_EVT_MODE_ONESHOT:
 	case CLOCK_EVT_MODE_UNUSED:
@@ -273,6 +274,8 @@ static void __init ttc_setup_clocksource(struct clk *clk, void __iomem *base)
 		return;
 	}
 
+	ttccs->ttc.freq = clk_get_rate(ttccs->ttc.clk);
+
 	ttccs->ttc.clk_rate_change_nb.notifier_call =
 		ttc_rate_change_clocksource_cb;
 	ttccs->ttc.clk_rate_change_nb.next = NULL;
@@ -298,16 +301,14 @@ static void __init ttc_setup_clocksource(struct clk *clk, void __iomem *base)
 	__raw_writel(CNT_CNTRL_RESET,
 		     ttccs->ttc.base_addr + TTC_CNT_CNTRL_OFFSET);
 
-	err = clocksource_register_hz(&ttccs->cs,
-			clk_get_rate(ttccs->ttc.clk) / PRESCALE);
+	err = clocksource_register_hz(&ttccs->cs, ttccs->ttc.freq / PRESCALE);
 	if (WARN_ON(err)) {
 		kfree(ttccs);
 		return;
 	}
 
 	ttc_sched_clock_val_reg = base + TTC_COUNT_VAL_OFFSET;
-	setup_sched_clock(ttc_sched_clock_read, 16,
-			clk_get_rate(ttccs->ttc.clk) / PRESCALE);
+	setup_sched_clock(ttc_sched_clock_read, 16, ttccs->ttc.freq / PRESCALE);
 }
 
 static int ttc_rate_change_clockevent_cb(struct notifier_block *nb,
@@ -334,6 +335,9 @@ static int ttc_rate_change_clockevent_cb(struct notifier_block *nb,
 				ndata->new_rate / PRESCALE);
 		local_irq_restore(flags);
 
+		/* update cached frequency */
+		ttc->freq = ndata->new_rate;
+
 		/* fall through */
 	}
 	case PRE_RATE_CHANGE:
@@ -367,6 +371,7 @@ static void __init ttc_setup_clockevent(struct clk *clk,
 	if (clk_notifier_register(ttcce->ttc.clk,
 				&ttcce->ttc.clk_rate_change_nb))
 		pr_warn("Unable to register clock notifier.\n");
+	ttcce->ttc.freq = clk_get_rate(ttcce->ttc.clk);
 
 	ttcce->ttc.base_addr = base;
 	ttcce->ce.name = "ttc_clockevent";
@@ -396,7 +401,7 @@ static void __init ttc_setup_clockevent(struct clk *clk,
 	}
 
 	clockevents_config_and_register(&ttcce->ce,
-			clk_get_rate(ttcce->ttc.clk) / PRESCALE, 1, 0xfffe);
+			ttcce->ttc.freq / PRESCALE, 1, 0xfffe);
 }
 
 /**
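The arithmetic behind the periodic interval in the patch can be checked
with a user-space sketch. Everything here is an assumption made for
illustration: MODEL_PRESCALE and MODEL_HZ are made-up values (the driver
and the kernel configuration define the real ones), the model_ functions
are hypothetical stand-ins for the setup and rate-change paths, and the
rounding macro mirrors the kernel's DIV_ROUND_CLOSEST() only for unsigned
operands.

```c
#include <assert.h>
#include <stddef.h>

/* Assumed values for illustration only; the driver defines its own. */
#define MODEL_PRESCALE	2048UL
#define MODEL_HZ	100UL

/* Rounds the quotient to the nearest integer (unsigned operands only). */
#define DIV_ROUND_CLOSEST_UL(x, d) (((x) + ((d) / 2)) / (d))

/* Hypothetical counterpart of struct ttc_timer, reduced to the cached rate. */
struct ttc_timer_model {
	unsigned long freq;	/* cached timer input clock frequency in Hz */
};

/* Setup time: the one place where the possibly sleeping rate query happens. */
static void model_setup(struct ttc_timer_model *t, unsigned long clk_rate)
{
	assert(t != NULL);
	t->freq = clk_rate;
}

/* Rate-change notification: refresh the cache so later readers see new_rate. */
static void model_post_rate_change(struct ttc_timer_model *t,
				   unsigned long new_rate)
{
	assert(t != NULL);
	t->freq = new_rate;
}

/* Periodic mode: counter ticks per jiffy, computed from the cache alone. */
static unsigned long model_periodic_interval(const struct ttc_timer_model *t)
{
	assert(t != NULL);
	return DIV_ROUND_CLOSEST_UL(t->freq, MODEL_PRESCALE * MODEL_HZ);
}
```

With these assumed constants, an input clock of 204800000 Hz gives an
interval of 1000 ticks, and halving the rate through
model_post_rate_change() gives 500 without any further clock framework
call.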