Message ID | 1342567672-29071-1-git-send-email-ccross@android.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On Wed, Jul 18, 2012 at 1:27 AM, Colin Cross <ccross@android.com> wrote: > This will have a side effect of causing SoCs that have clocks that > continue to count in suspend to appear to stop counting, reporting the > same sched_clock() value before and after suspend. So for our platform (ux500) that has a sched clock that *does* continue to run during suspend, drivers/clocksource/clksrc-dbx500-prcmu.c how do we opt out of this behaviour? Since sched_clock is used for the debug prints, if we have a crash immediately after resume() it will appear to be at resume time in the log which kinda sucks. :-( Isn't the proper way to do this either: - Assign suspend/resume hooks to the sched_clock code in the platform and let the code that reads the hardware clock deal with this Or - If it absolutely needs to be in the core code, also have a bool field indicating whether the clock is going to die during suspend and add new registration functions for setting that sched_clock type, e.g. setup_sched_clock_nonsuspendable() Yours, Linus Walleij
On Mon, Jul 23, 2012 at 11:55 AM, Linus Walleij <linus.walleij@linaro.org> wrote: > On Wed, Jul 18, 2012 at 1:27 AM, Colin Cross <ccross@android.com> wrote: > >> This will have a side effect of causing SoCs that have clocks that >> continue to count in suspend to appear to stop counting, reporting the >> same sched_clock() value before and after suspend. > > So for our platform (ux500) that has a sched clock that *does* > continue to run during suspend, > drivers/clocksource/clksrc-dbx500-prcmu.c > how do we opt out of this behaviour? Does the clock you use for sched_clock continue to run in all suspend modes? All the SoC's I've used only have a 32kHz clock in the deepest suspend mode, which is not ideal for sched_clock. For example, on Tegra2 the faster 1MHz clock used for sched_clock resets in the deepest suspend state (LP0) but not the shallowest suspend state (LP2), and which suspend state the chip hits depends on which hardware is active. Opting out of this patch would cause Tegra's clock to sometimes run in suspend, and sometimes not, which seems worse for debugging than consistently not running in suspend. I'd be surprised if a similar situation didn't apply to your platform. > Since sched_clock is used for the debug prints, if we have a > crash immediately after resume() it will appear to be at resume > time in the log which kinda sucks. :-( Most resume implementations I've seen print a resume message very early, making it fairly easy to distinguish between suspend and resume crashes, but I can see why an accurate timestamp would be helpful. > Isn't the proper way to do this either: > > - Assign suspend/resume hooks to the sched_clock code in the > platform and let the code that reads the hardware clock deal with > this That's effectively what I've done on 3 different platforms, which is why I thought it might be better to put it in the core code. > Or > > - If it absolutely needs to be in the core code, also have a bool > field indicating whether the clock is going to die during suspend > and add new registration functions for setting that sched_clock > type, e.g. setup_sched_clock_nonsuspendable() Sounds reasonable if some platforms need the extra complexity.
On Mon, Jul 23, 2012 at 9:27 PM, Colin Cross <ccross@android.com> wrote: > On Mon, Jul 23, 2012 at 11:55 AM, Linus Walleij > Does the clock you use for sched_clock continue to run in all suspend > modes? All the SoC's I've used only have a 32kHz clock in the deepest > suspend mode, Yes, and yes it is 32kHz. > which is not ideal for sched_clock. Not that I know why scheduling with 32kHz is so bad compared to the default system scheduling granularity which is HZ if you don't have sched_clock() implemented. Since this seems to be such an important point, what makes you want MHz:es for scheduling granularity? To me the biggest impact is actually the granularity of the timestamps in the printk:s. (It's not that I doubt your needs, more curiosity.) > For example, on > Tegra2 the faster 1MHz clock used for sched_clock resets in the > deepest suspend state (LP0) but not the shallowest suspend state > (LP2), and which suspend state the chip hits depends on which hardware > is active. Opting out of this patch would cause Tegra's clock to > sometimes run in suspend, and sometimes not, which seems worse for > debugging than consistently not running in suspend. I'd be surprised > if a similar situation didn't apply to your platform. Well being able to switch between different sched_clock() providers may be the ideal... >> - If it absolutely needs to be in the core code, also have a bool >> field indicating whether the clock is going to die during suspend >> and add new registration functions for setting that sched_clock >> type, e.g. setup_sched_clock_nonsuspendable() > > Sounds reasonable if some platforms need the extra complexity. OK agreed. A connecting theme is that of being avle to flag clock sources as sched_clock providers. If all clocksources were tagged with rating, and only clocksources were used for sched_clock(), the kernel could select the highest-rated clock under all circumstances. But that's quite intrusive, more of an idea. :-P Yours, Linus Walleij
On Mon, Jul 23, 2012 at 5:14 PM, Linus Walleij <linus.walleij@linaro.org> wrote: > On Mon, Jul 23, 2012 at 9:27 PM, Colin Cross <ccross@android.com> wrote: >> On Mon, Jul 23, 2012 at 11:55 AM, Linus Walleij > >> Does the clock you use for sched_clock continue to run in all suspend >> modes? All the SoC's I've used only have a 32kHz clock in the deepest >> suspend mode, > > Yes, and yes it is 32kHz. > >> which is not ideal for sched_clock. > > Not that I know why scheduling with 32kHz is so bad compared to the > default system scheduling granularity which is HZ if you don't have > sched_clock() implemented. > > Since this seems to be such an important point, what makes you > want MHz:es for scheduling granularity? To me the biggest impact > is actually the granularity of the timestamps in the printk:s. > > (It's not that I doubt your needs, more curiosity.) There's a comment somewhere about higher resolution sched_clock providing fairer scheduling. With 32 kHz sched_clock, every runtime measured by the scheduler will be wrong by up to 31.25 us. Most systems have a faster clock, and if it's available it seems silly not to use it. It's also used for tracing, where 31.25 us resolution is a little low for function tracing or function graph tracing. >> For example, on >> Tegra2 the faster 1MHz clock used for sched_clock resets in the >> deepest suspend state (LP0) but not the shallowest suspend state >> (LP2), and which suspend state the chip hits depends on which hardware >> is active. Opting out of this patch would cause Tegra's clock to >> sometimes run in suspend, and sometimes not, which seems worse for >> debugging than consistently not running in suspend. I'd be surprised >> if a similar situation didn't apply to your platform. > > Well being able to switch between different sched_clock() providers > may be the ideal... > >>> - If it absolutely needs to be in the core code, also have a bool >>> field indicating whether the clock is going to die during suspend >>> and add new registration functions for setting that sched_clock >>> type, e.g. setup_sched_clock_nonsuspendable() >> >> Sounds reasonable if some platforms need the extra complexity. > > OK agreed. > > A connecting theme is that of being avle to flag clock sources as > sched_clock providers. If all clocksources were tagged with > rating, and only clocksources were used for sched_clock(), the > kernel could select the highest-rated clock under all circumstances. > > But that's quite intrusive, more of an idea. :-P sched_clock is supposed to be very low overhead compared to ktime_get, and has some strict requirements if CONFIG_HAVE_UNSTABLE_SCHED_CLOCK is not set (see kernel/sched/clock.c), but it might be possible.
2012/7/18 Colin Cross <ccross@android.com>: > Many clocks that are used to provide sched_clock will reset during > suspend. If read_sched_clock returns 0 after suspend, sched_clock will > appear to jump forward. This patch resets cd.epoch_cyc to the current > value of read_sched_clock during resume, which causes sched_clock() just > after suspend to return the same value as sched_clock() just before > suspend. > > In addition, during the window where epoch_ns has been updated before > suspend, but epoch_cyc has not been updated after suspend, it is unknown > whether the clock has reset or not, and sched_clock() could return a > bogus value. Add a suspended flag, and return the pre-suspend epoch_ns > value during this period. Acked-by: Barry Song <21cnbao@gmail.com> this patch should also fix the issue that: 1. launch some rt threads, rt threads sleep before suspend 2. repeat to suspend/resume 3. after resuming, waking up rt threads repeat 1-3 again and again, sometimes all rt threads will hang after resuming due to wrong sched_clock will make sched_rt think rt_time is much more than rt_runtime (default 950ms in 1s). then rt threads will lost cpu timeslot to run since the 95% threshold is there. > > This will have a side effect of causing SoCs that have clocks that > continue to count in suspend to appear to stop counting, reporting the > same sched_clock() value before and after suspend. > > Signed-off-by: Colin Cross <ccross@android.com> > --- > arch/arm/kernel/sched_clock.c | 13 +++++++++++++ > 1 files changed, 13 insertions(+), 0 deletions(-) > > diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c > index 27d186a..46c7d32 100644 > --- a/arch/arm/kernel/sched_clock.c > +++ b/arch/arm/kernel/sched_clock.c > @@ -21,6 +21,7 @@ struct clock_data { > u32 epoch_cyc_copy; > u32 mult; > u32 shift; > + bool suspended; > }; > > static void sched_clock_poll(unsigned long wrap_ticks); > @@ -49,6 +50,9 @@ static unsigned long long cyc_to_sched_clock(u32 cyc, u32 mask) > u64 epoch_ns; > u32 epoch_cyc; > > + if (cd.suspended) > + return cd.epoch_ns; > + > /* > * Load the epoch_cyc and epoch_ns atomically. We do this by > * ensuring that we always write epoch_cyc, epoch_ns and > @@ -169,11 +173,20 @@ void __init sched_clock_postinit(void) > static int sched_clock_suspend(void) > { > sched_clock_poll(sched_clock_timer.data); > + cd.suspended = true; > return 0; > } > > +static void sched_clock_resume(void) > +{ > + cd.epoch_cyc = read_sched_clock(); > + cd.epoch_cyc_copy = cd.epoch_cyc; > + cd.suspended = false; > +} > + > static struct syscore_ops sched_clock_ops = { > .suspend = sched_clock_suspend, > + .resume = sched_clock_resume, > }; > > static int __init sched_clock_syscore_init(void) -barry
On Tue, Jul 24, 2012 at 05:58:32, Colin Cross wrote: > On Mon, Jul 23, 2012 at 5:14 PM, Linus Walleij <linus.walleij@linaro.org> wrote: > > On Mon, Jul 23, 2012 at 9:27 PM, Colin Cross <ccross@android.com> wrote: > >> On Mon, Jul 23, 2012 at 11:55 AM, Linus Walleij > > > >> Does the clock you use for sched_clock continue to run in all suspend > >> modes? All the SoC's I've used only have a 32kHz clock in the deepest > >> suspend mode, > > > > Yes, and yes it is 32kHz. > > On the platform that I am working on (AM335x) the sched_clock continues to run and yes even on this it runs @ 32KHz. > >> which is not ideal for sched_clock. > > > > Not that I know why scheduling with 32kHz is so bad compared to the > > default system scheduling granularity which is HZ if you don't have > > sched_clock() implemented. > > > > Since this seems to be such an important point, what makes you > > want MHz:es for scheduling granularity? To me the biggest impact > > is actually the granularity of the timestamps in the printk:s. > > I personally find the timestamps in the printks very late in the suspend or very early in the resume extremely helpful for debugging. The printk timestamps also help figure out which portions of the platform suspend code are taking long and can be optimized. > > (It's not that I doubt your needs, more curiosity.) > > There's a comment somewhere about higher resolution sched_clock > providing fairer scheduling. With 32 kHz sched_clock, every runtime > measured by the scheduler will be wrong by up to 31.25 us. Most > systems have a faster clock, and if it's available it seems silly not > to use it. > > It's also used for tracing, where 31.25 us resolution is a little low > for function tracing or function graph tracing. > > >> For example, on > >> Tegra2 the faster 1MHz clock used for sched_clock resets in the > >> deepest suspend state (LP0) but not the shallowest suspend state > >> (LP2), and which suspend state the chip hits depends on which hardware > >> is active. Opting out of this patch would cause Tegra's clock to > >> sometimes run in suspend, and sometimes not, which seems worse for > >> debugging than consistently not running in suspend. I'd be surprised > >> if a similar situation didn't apply to your platform. > > Is there any other slower on Tegra2 which doesn't reset and hence can be used for sched_clock in LP0 as well as LP2? There's a trade-off that the end-user might choose to make - either use a faster clock for sched_clock which stops in suspend or use a slower, in most cases 32KHz, clock which doesn't. IMO this aspect should be configurable. > > Well being able to switch between different sched_clock() providers > > may be the ideal... > > Hmm... not exactly. More below. > >>> - If it absolutely needs to be in the core code, also have a bool > >>> field indicating whether the clock is going to die during suspend > >>> and add new registration functions for setting that sched_clock > >>> type, e.g. setup_sched_clock_nonsuspendable() > >> > >> Sounds reasonable if some platforms need the extra complexity. > > I am not sure but I am guessing even on OMAPx the timer used as sched_clock keeps running. > > OK agreed. > > > > A connecting theme is that of being avle to flag clock sources as > > sched_clock providers. If all clocksources were tagged with > > rating, and only clocksources were used for sched_clock(), the > > kernel could select the highest-rated clock under all circumstances. > > > > But that's quite intrusive, more of an idea. :-P > Too intrusive I guess ;) There were some discussions on this in the context of AM335x [1] [2]. Right now only sched_clock can be registered and I guess this restriction is not going to go away any time soon. Regards, Vaibhav B. [1] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-January/080862.html [2] http://lists.infradead.org/pipermail/linux-arm-kernel/2012-April/092890.html > sched_clock is supposed to be very low overhead compared to ktime_get, > and has some strict requirements if CONFIG_HAVE_UNSTABLE_SCHED_CLOCK > is not set (see kernel/sched/clock.c), but it might be possible. > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel >
On Tue, Jul 24, 2012 at 11:16 AM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote: >> > A connecting theme is that of being avle to flag clock sources as >> > sched_clock providers. If all clocksources were tagged with >> > rating, and only clocksources were used for sched_clock(), the >> > kernel could select the highest-rated clock under all circumstances. >> > >> > But that's quite intrusive, more of an idea. :-P >> > > Too intrusive I guess ;) > > There were some discussions on this in the context of AM335x [1] [2]. > Right now only sched_clock can be registered and I guess this restriction > is not going to go away any time soon. Why do you think that? The restriction to only assign sched_clock() at compile-time was recently removed so its now runtime assigned. So yes, a clock source that can die and change frequency is no good as primary system time, but the abstraction could still be used for those that do, just add another flag NOT_CONTINUOUS or so, and make sure the system does not select this for primary system clock source. Then modelling sched_clock() on clock sources makes all more sense: just select the best one. For primary system clock source, do not select one which is non-continous. Yours, Linus Walleij
On Sat, Jul 28, 2012 at 12:23:05AM +0200, Linus Walleij wrote: > On Tue, Jul 24, 2012 at 11:16 AM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote: > > >> > A connecting theme is that of being avle to flag clock sources as > >> > sched_clock providers. If all clocksources were tagged with > >> > rating, and only clocksources were used for sched_clock(), the > >> > kernel could select the highest-rated clock under all circumstances. > >> > > >> > But that's quite intrusive, more of an idea. :-P > >> > > > > Too intrusive I guess ;) > > > > There were some discussions on this in the context of AM335x [1] [2]. > > Right now only sched_clock can be registered and I guess this restriction > > is not going to go away any time soon. > > Why do you think that? The restriction to only assign sched_clock() at > compile-time was recently removed so its now runtime assigned. > > So yes, a clock source that can die and change frequency is no good > as primary system time, but the abstraction could still be used for > those that do, just add another flag NOT_CONTINUOUS or so, and > make sure the system does not select this for primary system > clock source. > > Then modelling sched_clock() on clock sources makes all more > sense: just select the best one. For primary system clock source, > do not select one which is non-continous. Except for the overhead. sched_clock() is supposed to be as _fast_ as it can possibly be, because it has a direct impact on the performance of the system.
On Sat, Jul 28, 2012 at 03:53:05, Linus Walleij wrote: > On Tue, Jul 24, 2012 at 11:16 AM, Bedia, Vaibhav <vaibhav.bedia@ti.com> wrote: > > >> > A connecting theme is that of being avle to flag clock sources as > >> > sched_clock providers. If all clocksources were tagged with > >> > rating, and only clocksources were used for sched_clock(), the > >> > kernel could select the highest-rated clock under all circumstances. > >> > > >> > But that's quite intrusive, more of an idea. :-P > >> > > > > Too intrusive I guess ;) > > > > There were some discussions on this in the context of AM335x [1] [2]. > > Right now only sched_clock can be registered and I guess this restriction > > is not going to go away any time soon. > > Why do you think that? The restriction to only assign sched_clock() at > compile-time was recently removed so its now runtime assigned. > Yes it's runtime assigned but multiple calls to setup_sched_clock() will trigger a WARN_ON() from arch/arm/kernel/sched_clock.c unless I missing something basic here. > So yes, a clock source that can die and change frequency is no good > as primary system time, but the abstraction could still be used for > those that do, just add another flag NOT_CONTINUOUS or so, and > make sure the system does not select this for primary system > clock source. > > Then modelling sched_clock() on clock sources makes all more > sense: just select the best one. For primary system clock source, > do not select one which is non-continous. > I think what you are suggesting above is similar to what AM335x was trying to do but Russell pointed out issues with switching of timing sources and that's what I was trying to highlight. Regards, Vaibhav B.
Barry Song <21cnbao@gmail.com> writes: > 2012/7/18 Colin Cross <ccross@android.com>: >> Many clocks that are used to provide sched_clock will reset during >> suspend. If read_sched_clock returns 0 after suspend, sched_clock will >> appear to jump forward. This patch resets cd.epoch_cyc to the current >> value of read_sched_clock during resume, which causes sched_clock() just >> after suspend to return the same value as sched_clock() just before >> suspend. >> >> In addition, during the window where epoch_ns has been updated before >> suspend, but epoch_cyc has not been updated after suspend, it is unknown >> whether the clock has reset or not, and sched_clock() could return a >> bogus value. Add a suspended flag, and return the pre-suspend epoch_ns >> value during this period. > > Acked-by: Barry Song <21cnbao@gmail.com> > > this patch should also fix the issue that: > 1. launch some rt threads, rt threads sleep before suspend > 2. repeat to suspend/resume > 3. after resuming, waking up rt threads > > repeat 1-3 again and again, sometimes all rt threads will hang after > resuming due to wrong sched_clock will make sched_rt think rt_time is > much more than rt_runtime (default 950ms in 1s). then rt threads will > lost cpu timeslot to run since the 95% threshold is there. Re-visiting this in light of a related problem. I've run into a similar issue where IRQ threads are prevented from running during resume becase the RT throttling kicks because RT runtime is accumulated during suspend. Using the 'needs_suspend' version fixes this problem too. However, because of the RT throttling issue, it seems like *all* platforms should be using the 'needs_suspend' version always. But, as already pointed out, that makes the timed printk output during suspend/resume rather unhelpful. Having to choose between useful printk times during suspend/resume and functioning IRQ threads during suspend/resume isn't a choice I want to make. I'd rather have both. Any ideas? Kevin
diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c index 27d186a..46c7d32 100644 --- a/arch/arm/kernel/sched_clock.c +++ b/arch/arm/kernel/sched_clock.c @@ -21,6 +21,7 @@ struct clock_data { u32 epoch_cyc_copy; u32 mult; u32 shift; + bool suspended; }; static void sched_clock_poll(unsigned long wrap_ticks); @@ -49,6 +50,9 @@ static unsigned long long cyc_to_sched_clock(u32 cyc, u32 mask) u64 epoch_ns; u32 epoch_cyc; + if (cd.suspended) + return cd.epoch_ns; + /* * Load the epoch_cyc and epoch_ns atomically. We do this by * ensuring that we always write epoch_cyc, epoch_ns and @@ -169,11 +173,20 @@ void __init sched_clock_postinit(void) static int sched_clock_suspend(void) { sched_clock_poll(sched_clock_timer.data); + cd.suspended = true; return 0; } +static void sched_clock_resume(void) +{ + cd.epoch_cyc = read_sched_clock(); + cd.epoch_cyc_copy = cd.epoch_cyc; + cd.suspended = false; +} + static struct syscore_ops sched_clock_ops = { .suspend = sched_clock_suspend, + .resume = sched_clock_resume, }; static int __init sched_clock_syscore_init(void)
Many clocks that are used to provide sched_clock will reset during suspend. If read_sched_clock returns 0 after suspend, sched_clock will appear to jump forward. This patch resets cd.epoch_cyc to the current value of read_sched_clock during resume, which causes sched_clock() just after suspend to return the same value as sched_clock() just before suspend. In addition, during the window where epoch_ns has been updated before suspend, but epoch_cyc has not been updated after suspend, it is unknown whether the clock has reset or not, and sched_clock() could return a bogus value. Add a suspended flag, and return the pre-suspend epoch_ns value during this period. This will have a side effect of causing SoCs that have clocks that continue to count in suspend to appear to stop counting, reporting the same sched_clock() value before and after suspend. Signed-off-by: Colin Cross <ccross@android.com> --- arch/arm/kernel/sched_clock.c | 13 +++++++++++++ 1 files changed, 13 insertions(+), 0 deletions(-)