Message ID | 20110602094622.GS3660@n2100.arm.linux.org.uk (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 06/02/2011 11:46 AM, Russell King - ARM Linux wrote: > Why don't we just find a way of fixing sched_clock so that the value > doesn't reset over a suspend/resume cycle? Even if the value isn't reset during suspend/resume we want the clocksource to keep counting. Or is it ok to have the clocksource stop or freeze during suspend? Then we have cpuidle. Is it ok to stop/freeze the timer during cpuidle sleep states? My view is that sched_clock and the clocksource read should return the time since boot and keep counting during sleep and suspend. (In this case we use the same timer for both sched_clk and clocksource.) > IOW, lets fix the problem > for_everyone_ rather than only fixing it for one platform at a time. I agree that fixing it for everyone would be better and I can give it a try if I get pointed in the right direction.
On Thu, Jun 02, 2011 at 12:18:35PM +0200, Mattias Wallin wrote: > On 06/02/2011 11:46 AM, Russell King - ARM Linux wrote: >> Why don't we just find a way of fixing sched_clock so that the value >> doesn't reset over a suspend/resume cycle? > Even if the value isn't reset during suspend/resume we want the > clocksource to keep counting. Or is it ok to have the clocksource stop > or freeze during suspend? kernel/time/timekeeping.c:timekeeping_suspend(): timekeeping_forward_now(); which does: cycle_now = clock->read(clock); cycle_delta = (cycle_now - clock->cycle_last) & clock->mask; clock->cycle_last = cycle_now; So that updates the time with the current offset between cycle_last and the current value. kernel/time/timekeeping.c:timekeeping_resume(): /* re-base the last cycle value */ timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock); So this re-sets cycle_last to the current value of the clocksource. This means that on resume, the clocksource can start counting from any value it likes. So, without any additional external inputs, time resumes incrementing at the point where the suspend occurred without any jump backwards or forwards. The code accounts for the sleep time by using read_persistent_clock() read a timer which continues running during sleep to calculate the delta between suspend and resume, and injects the delta between them to wind the time forward. > Then we have cpuidle. Is it ok to stop/freeze the timer during cpuidle > sleep states? During _idle_ (iow, no task running) sched_clock and the clocksource should both continue to run - the scheduler needs to know how long the system has been idle for, and the clocksource can't stop because we'll lose track of time. Remember that the clockevent stuff is used as a trigger to the timekeeping code to read the clocksource, and update the current time. Time is moved forward by the delta between a previous clocksource read and the current clocksource read. So stopping or resetting the clocksource in unexpected ways (other than over suspend/resume as mentioned above) will result in time going weird.
On 06/02/2011 01:01 PM, Russell King - ARM Linux wrote: > On Thu, Jun 02, 2011 at 12:18:35PM +0200, Mattias Wallin wrote: >> On 06/02/2011 11:46 AM, Russell King - ARM Linux wrote: >>> Why don't we just find a way of fixing sched_clock so that the value >>> doesn't reset over a suspend/resume cycle? >> Even if the value isn't reset during suspend/resume we want the >> clocksource to keep counting. Or is it ok to have the clocksource stop >> or freeze during suspend? > > kernel/time/timekeeping.c:timekeeping_suspend(): > > timekeeping_forward_now(); > > which does: > cycle_now = clock->read(clock); > cycle_delta = (cycle_now - clock->cycle_last)& clock->mask; > clock->cycle_last = cycle_now; > > So that updates the time with the current offset between cycle_last and > the current value. > > kernel/time/timekeeping.c:timekeeping_resume(): > /* re-base the last cycle value */ > timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock); > > So this re-sets cycle_last to the current value of the clocksource. This > means that on resume, the clocksource can start counting from any value it > likes. > > So, without any additional external inputs, time resumes incrementing at > the point where the suspend occurred without any jump backwards or forwards. > > The code accounts for the sleep time by using read_persistent_clock() read > a timer which continues running during sleep to calculate the delta between > suspend and resume, and injects the delta between them to wind the time > forward. > >> Then we have cpuidle. Is it ok to stop/freeze the timer during cpuidle >> sleep states? > > During _idle_ (iow, no task running) sched_clock and the clocksource > should both continue to run - the scheduler needs to know how long the > system has been idle for, and the clocksource can't stop because we'll > lose track of time. > > Remember that the clockevent stuff is used as a trigger to the timekeeping > code to read the clocksource, and update the current time. Time is moved > forward by the delta between a previous clocksource read and the current > clocksource read. So stopping or resetting the clocksource in unexpected > ways (other than over suspend/resume as mentioned above) will result in > time going weird. Hmm, I have missed the existence of the read_persistent_clock(). It sounds like I should keep the MTU as my clocksource / sched_clock and have the PRCMU Timer as a persistent_clock instead. Then one problem remains. The MTU will be powered during cstates: running, wfi, ApIdle (arm retenetion). The MTU will loose power during cstates ApSleep and ApDeepSleep. So I need to do a similar sync as suspend does against the persistent_clock but when leaving and enter the deeper cstates. Should I solve it in the clocksource framework with a flag telling which cstates the timer will stop/freeze and then inject time from the persistent_clock for those cstates? (I am thinking something like the CLOCK_EVT_FEAT_C3STOP flag) Am I on the wrong track here or how should I solve it?
+ John, On 6/2/2011 5:40 PM, Mattias Wallin wrote: > On 06/02/2011 01:01 PM, Russell King - ARM Linux wrote: >> On Thu, Jun 02, 2011 at 12:18:35PM +0200, Mattias Wallin wrote: >>> On 06/02/2011 11:46 AM, Russell King - ARM Linux wrote: >>>> Why don't we just find a way of fixing sched_clock so that the value >>>> doesn't reset over a suspend/resume cycle? >>> Even if the value isn't reset during suspend/resume we want the >>> clocksource to keep counting. Or is it ok to have the clocksource stop >>> or freeze during suspend? >> >> kernel/time/timekeeping.c:timekeeping_suspend(): >> >> timekeeping_forward_now(); >> >> which does: >> cycle_now = clock->read(clock); >> cycle_delta = (cycle_now - clock->cycle_last)& clock->mask; >> clock->cycle_last = cycle_now; >> >> So that updates the time with the current offset between cycle_last and >> the current value. >> >> kernel/time/timekeeping.c:timekeeping_resume(): >> /* re-base the last cycle value */ >> timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock); >> >> So this re-sets cycle_last to the current value of the clocksource. This >> means that on resume, the clocksource can start counting from any >> value it >> likes. >> >> So, without any additional external inputs, time resumes incrementing at >> the point where the suspend occurred without any jump backwards or >> forwards. >> >> The code accounts for the sleep time by using read_persistent_clock() >> read >> a timer which continues running during sleep to calculate the delta >> between >> suspend and resume, and injects the delta between them to wind the time >> forward. >> >>> Then we have cpuidle. Is it ok to stop/freeze the timer during cpuidle >>> sleep states? >> >> During _idle_ (iow, no task running) sched_clock and the clocksource >> should both continue to run - the scheduler needs to know how long the >> system has been idle for, and the clocksource can't stop because we'll >> lose track of time. >> >> Remember that the clockevent stuff is used as a trigger to the >> timekeeping >> code to read the clocksource, and update the current time. Time is moved >> forward by the delta between a previous clocksource read and the current >> clocksource read. So stopping or resetting the clocksource in unexpected >> ways (other than over suspend/resume as mentioned above) will result in >> time going weird. > > Hmm, I have missed the existence of the read_persistent_clock(). It > sounds like I should keep the MTU as my clocksource / sched_clock and > have the PRCMU Timer as a persistent_clock instead. > > Then one problem remains. The MTU will be powered during cstates: > running, wfi, ApIdle (arm retenetion). The MTU will loose power during > cstates ApSleep and ApDeepSleep. So I need to do a similar sync as > suspend does against the persistent_clock but when leaving and enter the > deeper cstates. > > Should I solve it in the clocksource framework with a flag telling which > cstates the timer will stop/freeze and then inject time from the > persistent_clock for those cstates? (I am thinking something like the > CLOCK_EVT_FEAT_C3STOP flag) > > Am I on the wrong track here or how should I solve it? > IIUC, what you are trying here is to use high-precision clock-source but since it doesn't work in low power modes, you want it to supplement with always running low resolution timer. Now just making the persistent_clock() read from low-resolution timer is not going to help. Because there is no reference available for the kernel on whatever counting is done by the low-resolution timer. In other words, it has to be a registered clock-source first. Earlier this year at ELC SFO, I had a discussion with John and Thomas on how to have a high-resolution clock-source and a low-resolution clock-source working together to cover the low power scenario and still manage to get the highest timer resolution. The idea was to do dynamic switching of clock-source which initially looked simple. Here the idea was to have this working for suspend and as well as cupidle. John mentioned that because of frequent clock-source switching, will affect the NTP correction badly to an extent that NTP correction may not work. Here is what John suggested to do but I got busy with other stuff and this one got pushed out from my todo. -------------------- John wrote ... A different approach would be to create a meta-clocksource, that utilizes the two underlying clocks to create a what looks like a unified counter. Basically you use the slow-always running counter as your long-term freq adjusted clock, but supplement its low granularity with the highres halting clock. This works best if both counters are driven by the same crystal, so there won't be much drift between them. ---------------------- This approach should solve most of the issues and get the functionality what you are looking for. If you like, you can work on this scheme. Regards Santosh
On Thu, Jun 02, 2011 at 06:27:22PM +0530, Santosh Shilimkar wrote: > Earlier this year at ELC SFO, I had a discussion with > John and Thomas on how to have a high-resolution clock-source > and a low-resolution clock-source working together to cover > the low power scenario and still manage to get the highest > timer resolution. > The idea was to do dynamic switching of clock-source > which initially looked simple. Here the idea was to > have this working for suspend and as well as cupidle. I don't think you can do this, because you'll lose precision whenever you switch from the high resolution clocksource to the low resolution clocksource. While you can quickly update the current time of day from the highres one, the lowres may or may not be a long way from its next tick rollover. So you lose the precision whenever you switch. However, over a suspend/resume cycle, the precision loss is normally very small compared to the time which you have been suspended, so the %age error also becomes very small. With cpuidle, it becomes a completely different matter. Here, the %age error is much larger because of the smaller sleep periods, and chances are we can't wait for the low-res timer to change. So if you're using cpuidle, you really need a clocksource which runs continuously, even in whatever states cpuidle drops you into.
On 6/2/2011 6:34 PM, Russell King - ARM Linux wrote: > On Thu, Jun 02, 2011 at 06:27:22PM +0530, Santosh Shilimkar wrote: >> Earlier this year at ELC SFO, I had a discussion with >> John and Thomas on how to have a high-resolution clock-source >> and a low-resolution clock-source working together to cover >> the low power scenario and still manage to get the highest >> timer resolution. >> The idea was to do dynamic switching of clock-source >> which initially looked simple. Here the idea was to >> have this working for suspend and as well as cupidle. > > I don't think you can do this, because you'll lose precision whenever you > switch from the high resolution clocksource to the low resolution > clocksource. > > While you can quickly update the current time of day from the highres one, > the lowres may or may not be a long way from its next tick rollover. So > you lose the precision whenever you switch. > > However, over a suspend/resume cycle, the precision loss is normally very > small compared to the time which you have been suspended, so the %age > error also becomes very small. > > With cpuidle, it becomes a completely different matter. Here, the %age > error is much larger because of the smaller sleep periods, and chances > are we can't wait for the low-res timer to change. > > So if you're using cpuidle, you really need a clocksource which runs > continuously, even in whatever states cpuidle drops you into. According to John, that's what the meta-clock source will for. It will be continuous and will make use of underneath high res, low res clock-sources based on the availability. I let John comment on this on details but I think any other method would have shortcoming. Regards Santosh
On Thu, 2011-06-02 at 18:27 +0530, Santosh Shilimkar wrote: > On 6/2/2011 5:40 PM, Mattias Wallin wrote: > > On 06/02/2011 01:01 PM, Russell King - ARM Linux wrote: > >> On Thu, Jun 02, 2011 at 12:18:35PM +0200, Mattias Wallin wrote: > >>> On 06/02/2011 11:46 AM, Russell King - ARM Linux wrote: > >>>> Why don't we just find a way of fixing sched_clock so that the value > >>>> doesn't reset over a suspend/resume cycle? > >>> Even if the value isn't reset during suspend/resume we want the > >>> clocksource to keep counting. Or is it ok to have the clocksource stop > >>> or freeze during suspend? > >> > >> kernel/time/timekeeping.c:timekeeping_suspend(): > >> > >> timekeeping_forward_now(); > >> > >> which does: > >> cycle_now = clock->read(clock); > >> cycle_delta = (cycle_now - clock->cycle_last)& clock->mask; > >> clock->cycle_last = cycle_now; > >> > >> So that updates the time with the current offset between cycle_last and > >> the current value. > >> > >> kernel/time/timekeeping.c:timekeeping_resume(): > >> /* re-base the last cycle value */ > >> timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock); > >> > >> So this re-sets cycle_last to the current value of the clocksource. This > >> means that on resume, the clocksource can start counting from any > >> value it > >> likes. > >> > >> So, without any additional external inputs, time resumes incrementing at > >> the point where the suspend occurred without any jump backwards or > >> forwards. > >> > >> The code accounts for the sleep time by using read_persistent_clock() > >> read > >> a timer which continues running during sleep to calculate the delta > >> between > >> suspend and resume, and injects the delta between them to wind the time > >> forward. > >> > >>> Then we have cpuidle. Is it ok to stop/freeze the timer during cpuidle > >>> sleep states? > >> > >> During _idle_ (iow, no task running) sched_clock and the clocksource > >> should both continue to run - the scheduler needs to know how long the > >> system has been idle for, and the clocksource can't stop because we'll > >> lose track of time. > >> > >> Remember that the clockevent stuff is used as a trigger to the > >> timekeeping > >> code to read the clocksource, and update the current time. Time is moved > >> forward by the delta between a previous clocksource read and the current > >> clocksource read. So stopping or resetting the clocksource in unexpected > >> ways (other than over suspend/resume as mentioned above) will result in > >> time going weird. > > > > Hmm, I have missed the existence of the read_persistent_clock(). It > > sounds like I should keep the MTU as my clocksource / sched_clock and > > have the PRCMU Timer as a persistent_clock instead. > > > > Then one problem remains. The MTU will be powered during cstates: > > running, wfi, ApIdle (arm retenetion). The MTU will loose power during > > cstates ApSleep and ApDeepSleep. So I need to do a similar sync as > > suspend does against the persistent_clock but when leaving and enter the > > deeper cstates. > > > > Should I solve it in the clocksource framework with a flag telling which > > cstates the timer will stop/freeze and then inject time from the > > persistent_clock for those cstates? (I am thinking something like the > > CLOCK_EVT_FEAT_C3STOP flag) > > > > Am I on the wrong track here or how should I solve it? > > [snip] > John mentioned that because of frequent clock-source > switching, will affect the NTP correction badly to an > extent that NTP correction may not work. > > Here is what John suggested to do but I got busy with > other stuff and this one got pushed out from my todo. > > -------------------- > John wrote ... > A different approach would be to create a meta-clocksource, that > utilizes the two underlying clocks to create a what looks like a unified > counter. > > Basically you use the slow-always running counter as your long-term freq > adjusted clock, but supplement its low granularity with the highres > halting clock. > > This works best if both counters are driven by the same crystal, so > there won't be much drift between them. > ---------------------- Yea. The basic idea is to interpolate between two counters to produce a a continuous clocksource for the timekeeping core. As Russell pointed out, error is injected to the timekeeping code every time you switch between clocksources or the persistent clock. Doing this very frequently will provide very poor time. So by using an interpolated clocksource, we would be able to maintain clock adjustments via ntp and provide accurate long term time, while still having fine-grained resolution (along with some short term error). However, its all easier said then done. There are lots of gotchas and corner-cases with interpolation. That's the whole reason we moved from tick based timekeeping to continuous clocksource based timekeeping. I believe the KVM folks have used similar interpolation method for their kvm clocksource, and I know they had numerous issues. But it seems to work well enough, so I would take a peek at their code to start. thanks -john
On 06/02/2011 08:47 PM, john stultz wrote: > On Thu, 2011-06-02 at 18:27 +0530, Santosh Shilimkar wrote: >> On 6/2/2011 5:40 PM, Mattias Wallin wrote: >>> On 06/02/2011 01:01 PM, Russell King - ARM Linux wrote: >>>> On Thu, Jun 02, 2011 at 12:18:35PM +0200, Mattias Wallin wrote: >>>>> On 06/02/2011 11:46 AM, Russell King - ARM Linux wrote: >>>>>> Why don't we just find a way of fixing sched_clock so that the value >>>>>> doesn't reset over a suspend/resume cycle? >>>>> Even if the value isn't reset during suspend/resume we want the >>>>> clocksource to keep counting. Or is it ok to have the clocksource stop >>>>> or freeze during suspend? >>>> >>>> kernel/time/timekeeping.c:timekeeping_suspend(): >>>> >>>> timekeeping_forward_now(); >>>> >>>> which does: >>>> cycle_now = clock->read(clock); >>>> cycle_delta = (cycle_now - clock->cycle_last)& clock->mask; >>>> clock->cycle_last = cycle_now; >>>> >>>> So that updates the time with the current offset between cycle_last and >>>> the current value. >>>> >>>> kernel/time/timekeeping.c:timekeeping_resume(): >>>> /* re-base the last cycle value */ >>>> timekeeper.clock->cycle_last = timekeeper.clock->read(timekeeper.clock); >>>> >>>> So this re-sets cycle_last to the current value of the clocksource. This >>>> means that on resume, the clocksource can start counting from any >>>> value it >>>> likes. >>>> >>>> So, without any additional external inputs, time resumes incrementing at >>>> the point where the suspend occurred without any jump backwards or >>>> forwards. >>>> >>>> The code accounts for the sleep time by using read_persistent_clock() >>>> read >>>> a timer which continues running during sleep to calculate the delta >>>> between >>>> suspend and resume, and injects the delta between them to wind the time >>>> forward. >>>> >>>>> Then we have cpuidle. Is it ok to stop/freeze the timer during cpuidle >>>>> sleep states? >>>> >>>> During _idle_ (iow, no task running) sched_clock and the clocksource >>>> should both continue to run - the scheduler needs to know how long the >>>> system has been idle for, and the clocksource can't stop because we'll >>>> lose track of time. >>>> >>>> Remember that the clockevent stuff is used as a trigger to the >>>> timekeeping >>>> code to read the clocksource, and update the current time. Time is moved >>>> forward by the delta between a previous clocksource read and the current >>>> clocksource read. So stopping or resetting the clocksource in unexpected >>>> ways (other than over suspend/resume as mentioned above) will result in >>>> time going weird. >>> >>> Hmm, I have missed the existence of the read_persistent_clock(). It >>> sounds like I should keep the MTU as my clocksource / sched_clock and >>> have the PRCMU Timer as a persistent_clock instead. >>> >>> Then one problem remains. The MTU will be powered during cstates: >>> running, wfi, ApIdle (arm retenetion). The MTU will loose power during >>> cstates ApSleep and ApDeepSleep. So I need to do a similar sync as >>> suspend does against the persistent_clock but when leaving and enter the >>> deeper cstates. >>> >>> Should I solve it in the clocksource framework with a flag telling which >>> cstates the timer will stop/freeze and then inject time from the >>> persistent_clock for those cstates? (I am thinking something like the >>> CLOCK_EVT_FEAT_C3STOP flag) >>> >>> Am I on the wrong track here or how should I solve it? >>> > [snip] >> John mentioned that because of frequent clock-source >> switching, will affect the NTP correction badly to an >> extent that NTP correction may not work. >> >> Here is what John suggested to do but I got busy with >> other stuff and this one got pushed out from my todo. >> >> -------------------- >> John wrote ... >> A different approach would be to create a meta-clocksource, that >> utilizes the two underlying clocks to create a what looks like a unified >> counter. >> >> Basically you use the slow-always running counter as your long-term freq >> adjusted clock, but supplement its low granularity with the highres >> halting clock. >> >> This works best if both counters are driven by the same crystal, so >> there won't be much drift between them. >> ---------------------- > > Yea. The basic idea is to interpolate between two counters to produce a > a continuous clocksource for the timekeeping core. > > As Russell pointed out, error is injected to the timekeeping code every > time you switch between clocksources or the persistent clock. Doing this > very frequently will provide very poor time. > > So by using an interpolated clocksource, we would be able to maintain > clock adjustments via ntp and provide accurate long term time, while > still having fine-grained resolution (along with some short term error). > > However, its all easier said then done. There are lots of gotchas and > corner-cases with interpolation. That's the whole reason we moved from > tick based timekeeping to continuous clocksource based timekeeping. I > believe the KVM folks have used similar interpolation method for their > kvm clocksource, and I know they had numerous issues. But it seems to > work well enough, so I would take a peek at their code to start. Russell, What is your gut feeling about above interpolated clocksource strategy suggestion? While we (I) look at an alternative clocksource strategy and experiment are you ok with the three patches I sent? I thought about the persistant_clock that I did not know about but that will only help me in suspend so I conclude that I still need the a clocksource that counts in all low power modes. I would like to have something merged in case the alternate strategy fails. Something that work in the current kernel. The warnings above hints that it will be difficult and can take a long time to find a solution where both my timers can work together. It also easier to discuss framework changes with improvements of in-kernel drivers. Thanks, /Mattias Wallin
On Wed, Jun 08, 2011 at 03:44:01PM +0200, Mattias Wallin wrote: > On 06/02/2011 08:47 PM, john stultz wrote: >> Yea. The basic idea is to interpolate between two counters to produce a >> a continuous clocksource for the timekeeping core. >> >> As Russell pointed out, error is injected to the timekeeping code every >> time you switch between clocksources or the persistent clock. Doing this >> very frequently will provide very poor time. >> >> So by using an interpolated clocksource, we would be able to maintain >> clock adjustments via ntp and provide accurate long term time, while >> still having fine-grained resolution (along with some short term error). >> >> However, its all easier said then done. There are lots of gotchas and >> corner-cases with interpolation. That's the whole reason we moved from >> tick based timekeeping to continuous clocksource based timekeeping. I >> believe the KVM folks have used similar interpolation method for their >> kvm clocksource, and I know they had numerous issues. But it seems to >> work well enough, so I would take a peek at their code to start. > > Russell, > What is your gut feeling about above interpolated clocksource strategy > suggestion? I think you're not going to maintain precision even with the above idea. There's two problems: 1. When you switch from the high resolution to the low resolution, you need to know how many high-res count increments have elapsed since the last low-res count increment occurred. From this you know the offset between current time and the last low-res count increment and so can correct for this change. 2. When you switch from the low-res to high-res, because the high-res has restarted, you have lost track of the high-res counter. The only point when things are re-synchronisable is at the point when the low-res counter increments. So you either: a) have to wait for the low-res to increment, b) or you use the high-res counter to time until the low-res increments and then factor in some correction factor. Either way looks like it will be fraught with problems - particularly the resulting increase in latency caused by this when switching between states, or the period of loss of precision caused by (2b) which may be more disruptive to NTP. > While we (I) look at an alternative clocksource strategy and experiment > are you ok with the three patches I sent? No - I'd like you to check whether the patch I sent fixes your sched_clock problem across suspend/resume so that we can get that sorted for everyone, instead of coming up with platform specific fixes. Then we can work out how to hook that into cpuidle so that it works properly there. We're no longer going to go down the route of platforms working around problems with common infrastructure in their own individual private ways. That madness has to stop right now.
On 06/09/2011 11:59 PM, Russell King - ARM Linux wrote: > No - I'd like you to check whether the patch I sent fixes your > sched_clock problem across suspend/resume so that we can get that sorted > for everyone, instead of coming up with platform specific fixes. Then > we can work out how to hook that into cpuidle so that it works properly > there. Ahh, I see your plan. I was focused on the cpuidle problem and your patch only addressed suspend/resume. Anyway, now that I understand more what you are after I'll have a look at it and come back to you.
On 06/10/2011 10:54 AM, Mattias Wallin wrote: > On 06/09/2011 11:59 PM, Russell King - ARM Linux wrote: >> No - I'd like you to check whether the patch I sent fixes your >> sched_clock problem across suspend/resume so that we can get that sorted >> for everyone, instead of coming up with platform specific fixes. Then >> we can work out how to hook that into cpuidle so that it works properly >> there. > > Ahh, I see your plan. I was focused on the cpuidle problem and your > patch only addressed suspend/resume. > Anyway, now that I understand more what you are after I'll have a look > at it and come back to you. Unfortunately you will not get an answer if your patch works or not now. Our u8500 mainline code is missing suspend support and I have been trying to get suspend for u8500 to work without success. Since I will be away for two weeks I will try to get the suspend guys to work on it meanwhile. Kind Regards, Mattias Wallin
On Thu, Jun 02, 2011 at 10:46:22AM +0100, Russell King - ARM Linux wrote: > On Thu, Jun 02, 2011 at 11:34:31AM +0200, Mattias Wallin wrote: > > The Multi Timer Unit (MTU) is currently used as clocksource and sched_clk > > for the u8500 machine. The MTU block loose power during cpuidle sleep states > > so an alternate clocksource is needed and these patches adds the db8500 PRCMU > > timer. > > Why don't we just find a way of fixing sched_clock so that the value > doesn't reset over a suspend/resume cycle? IOW, lets fix the problem > for _everyone_ rather than only fixing it for one platform at a time. > > Could you try this patch to check whether sched_clock() behaves better > across a suspend/resume cycle please? So do I merge this patch?
diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c index 9a46370..4be4019 100644 --- a/arch/arm/kernel/sched_clock.c +++ b/arch/arm/kernel/sched_clock.c @@ -10,6 +10,7 @@ #include <linux/jiffies.h> #include <linux/kernel.h> #include <linux/sched.h> +#include <linux/syscore_ops.h> #include <linux/timer.h> #include <asm/sched_clock.h> @@ -72,3 +73,20 @@ void __init sched_clock_postinit(void) { sched_clock_poll(sched_clock_timer.data); } + +static int sched_clock_suspend(void) +{ + sched_clock_poll(sched_clock_timer.data); + return 0; +} + +static struct syscore_ops sched_clock_ops = { + .suspend = sched_clock_suspend, +}; + +static int __init sched_clock_syscore_init(void) +{ + register_syscore_ops(&sched_clock_ops); + return 0; +} +device_initcall(sched_clock_syscore_init);