diff mbox

arm: sched: stop sched_clock() during suspend

Message ID 1350906877-19410-1-git-send-email-balbi@ti.com (mailing list archive)
State New, archived
Headers show

Commit Message

Felipe Balbi Oct. 22, 2012, 11:54 a.m. UTC
The scheduler imposes a requirement to sched_clock()
which is to stop the clock during suspend, if we don't
do that IRQ threads will be rescheduled in the future
which might cause transfers to timeout depending on
how driver is written.

This became an issue on OMAP when we converted omap-i2c.c
to use threaded IRQs, it turned out that depending on how
much time we spent on suspend, the I2C IRQ thread would
end up being rescheduled so far in the future that I2C
transfers would timeout and, because omap_hsmmc depends
on an I2C-connected device to detect if an MMC card is
inserted in the slot, our rootfs would just vanish.

arch/arm/kernel/sched_clock.c already had an optional
implementation (sched_clock_needs_suspend()) which would
handle scheduler's requirement properly, what this patch
does is simply to make that implementation non-optional.

This has been tested with beagleboard XM (OMAP3630) and
pandaboard rev A3 (OMAP4430). Suspend to RAM is now working
after this patch.

Signed-off-by: Felipe Balbi <balbi@ti.com>
---
 arch/arm/include/asm/sched_clock.h |  2 --
 arch/arm/kernel/sched_clock.c      | 18 ++++--------------
 2 files changed, 4 insertions(+), 16 deletions(-)

Comments

Felipe Balbi Oct. 22, 2012, 12:07 p.m. UTC | #1
Hi,

On Mon, Oct 22, 2012 at 02:54:37PM +0300, Felipe Balbi wrote:
> The scheduler imposes a requirement to sched_clock()
> which is to stop the clock during suspend, if we don't
> do that IRQ threads will be rescheduled in the future
> which might cause transfers to timeout depending on
> how driver is written.
> 
> This became an issue on OMAP when we converted omap-i2c.c
> to use threaded IRQs, it turned out that depending on how
> much time we spent on suspend, the I2C IRQ thread would
> end up being rescheduled so far in the future that I2C
> transfers would timeout and, because omap_hsmmc depends
> on an I2C-connected device to detect if an MMC card is
> inserted in the slot, our rootfs would just vanish.
> 
> arch/arm/kernel/sched_clock.c already had an optional
> implementation (sched_clock_needs_suspend()) which would
> handle scheduler's requirement properly, what this patch
> does is simply to make that implementation non-optional.
> 
> This has been tested with beagleboard XM (OMAP3630) and
> pandaboard rev A3 (OMAP4430). Suspend to RAM is now working
> after this patch.
> 
> Signed-off-by: Felipe Balbi <balbi@ti.com>

just adding more guys to Cc. Please add more if I have missed someone.

> ---
>  arch/arm/include/asm/sched_clock.h |  2 --
>  arch/arm/kernel/sched_clock.c      | 18 ++++--------------
>  2 files changed, 4 insertions(+), 16 deletions(-)
> 
> diff --git a/arch/arm/include/asm/sched_clock.h b/arch/arm/include/asm/sched_clock.h
> index 05b8e82..e3f7572 100644
> --- a/arch/arm/include/asm/sched_clock.h
> +++ b/arch/arm/include/asm/sched_clock.h
> @@ -10,7 +10,5 @@
>  
>  extern void sched_clock_postinit(void);
>  extern void setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate);
> -extern void setup_sched_clock_needs_suspend(u32 (*read)(void), int bits,
> -		unsigned long rate);
>  
>  #endif
> diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
> index e21bac2..fc6692e 100644
> --- a/arch/arm/kernel/sched_clock.c
> +++ b/arch/arm/kernel/sched_clock.c
> @@ -107,13 +107,6 @@ static void sched_clock_poll(unsigned long wrap_ticks)
>  	update_sched_clock();
>  }
>  
> -void __init setup_sched_clock_needs_suspend(u32 (*read)(void), int bits,
> -		unsigned long rate)
> -{
> -	setup_sched_clock(read, bits, rate);
> -	cd.needs_suspend = true;
> -}
> -
>  void __init setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate)
>  {
>  	unsigned long r, w;
> @@ -189,18 +182,15 @@ void __init sched_clock_postinit(void)
>  static int sched_clock_suspend(void)
>  {
>  	sched_clock_poll(sched_clock_timer.data);
> -	if (cd.needs_suspend)
> -		cd.suspended = true;
> +	cd.suspended = true;
>  	return 0;
>  }
>  
>  static void sched_clock_resume(void)
>  {
> -	if (cd.needs_suspend) {
> -		cd.epoch_cyc = read_sched_clock();
> -		cd.epoch_cyc_copy = cd.epoch_cyc;
> -		cd.suspended = false;
> -	}
> +	cd.epoch_cyc = read_sched_clock();
> +	cd.epoch_cyc_copy = cd.epoch_cyc;
> +	cd.suspended = false;
>  }
>  
>  static struct syscore_ops sched_clock_ops = {
> -- 
> 1.8.0.rc0
>
Kevin Hilman Oct. 22, 2012, 5:05 p.m. UTC | #2
+Colin Cross, Barry Song also

Felipe Balbi <balbi@ti.com> writes:

> The scheduler imposes a requirement to sched_clock()
> which is to stop the clock during suspend, if we don't
> do that IRQ threads will be rescheduled in the future
> which might cause transfers to timeout depending on
> how driver is written.

It's not just about IRQ threads, it's about RT throttling.  IOW, not
just IRQ threads will be postponed, but all RT tasks will be throttled
temporarily as well.

The changelog should also mention that this has an inconvenient side
effect of stopping the printk times during suspend.  Based on the
original thread where this feature was discussed and introduced, some
platforms wanted to opt out of this behavior[1], so the optional API was
added.

However, in light of RT throttling, this a correctness issue for process
accounting, so I agree that this should be done for all platforms
instead of providing an optional 'needs suspend' version of the API,
even though it means printk times no longer reflect time spent
suspended.

After a discussion with peterz on this topic, it seems that x86
already ensures that sched_clock stops during suspend for similar
reasons[2].

The question then is whether this is a fix that belongs in v3.7.
Technically, it is not a regression, so I think this should probably be
v3.8 material.  If that's the decision, then the threaded IRQ support
for the OMAP I2C driver needs to be reverted for v3.7 until this fix is merged.

Kevin

[1] http://marc.info/?l=linux-arm-kernel&m=134307004508708&w=2
[2] http://marc.info/?l=linux-arm-kernel&m=135065529907297&w=2
Linus Walleij Oct. 22, 2012, 10:28 p.m. UTC | #3
On Mon, Oct 22, 2012 at 7:05 PM, Kevin Hilman
<khilman@deeprootsystems.com> wrote:

> However, in light of RT throttling, this a correctness issue for process
> accounting, so I agree that this should be done for all platforms
> instead of providing an optional 'needs suspend' version of the API,
> even though it means printk times no longer reflect time spent
> suspended.

Maybe we should get printk() to use the best clocksource
instead.

The reason AFAICT that printk() is using sched_clock() is that
it's supposed to be fast. But now it seems that it's not going
to return what printk() needs anymore.

(Dragging Stultz & Gleixner into this...)

Yours,
Linus Walleij
Russell King - ARM Linux Oct. 23, 2012, 9:22 a.m. UTC | #4
On Tue, Oct 23, 2012 at 12:28:32AM +0200, Linus Walleij wrote:
> On Mon, Oct 22, 2012 at 7:05 PM, Kevin Hilman
> <khilman@deeprootsystems.com> wrote:
> 
> > However, in light of RT throttling, this a correctness issue for process
> > accounting, so I agree that this should be done for all platforms
> > instead of providing an optional 'needs suspend' version of the API,
> > even though it means printk times no longer reflect time spent
> > suspended.
> 
> Maybe we should get printk() to use the best clocksource
> instead.
> 
> The reason AFAICT that printk() is using sched_clock() is that
> it's supposed to be fast. But now it seems that it's not going
> to return what printk() needs anymore.

No, printk() does not need this.  You think it does, but it doesn't.  What
we have is a difference between ARM and x86, and this difference is breaking
the scheduler.

The fact that the printk timestamp increments while suspended is a bug.  It
doesn't on x86.

There's three other functions in the kernel which do return updated time.
I'm sure one of those can be used to printk() the time passed in suspend
at resume time, which would give the information required.
Linus Walleij Oct. 23, 2012, 10:11 a.m. UTC | #5
On Mon, Oct 22, 2012 at 1:54 PM, Felipe Balbi <balbi@ti.com> wrote:

> The scheduler imposes a requirement to sched_clock()
> which is to stop the clock during suspend, if we don't
> do that IRQ threads will be rescheduled in the future
> which might cause transfers to timeout depending on
> how driver is written.
>
> This became an issue on OMAP when we converted omap-i2c.c
> to use threaded IRQs, it turned out that depending on how
> much time we spent on suspend, the I2C IRQ thread would
> end up being rescheduled so far in the future that I2C
> transfers would timeout and, because omap_hsmmc depends
> on an I2C-connected device to detect if an MMC card is
> inserted in the slot, our rootfs would just vanish.
>
> arch/arm/kernel/sched_clock.c already had an optional
> implementation (sched_clock_needs_suspend()) which would
> handle scheduler's requirement properly, what this patch
> does is simply to make that implementation non-optional.
>
> This has been tested with beagleboard XM (OMAP3630) and
> pandaboard rev A3 (OMAP4430). Suspend to RAM is now working
> after this patch.
>
> Signed-off-by: Felipe Balbi <balbi@ti.com>

After Russell explains so I get it:
Acked-by: Linus Walleij <linus.walleij@linaro.org>

Yours,
Linus Walleij
Kevin Hilman Oct. 23, 2012, 2:17 p.m. UTC | #6
Russell King - ARM Linux <linux@arm.linux.org.uk> writes:

> On Tue, Oct 23, 2012 at 12:28:32AM +0200, Linus Walleij wrote:
>> On Mon, Oct 22, 2012 at 7:05 PM, Kevin Hilman
>> <khilman@deeprootsystems.com> wrote:
>> 
>> > However, in light of RT throttling, this a correctness issue for process
>> > accounting, so I agree that this should be done for all platforms
>> > instead of providing an optional 'needs suspend' version of the API,
>> > even though it means printk times no longer reflect time spent
>> > suspended.
>> 
>> Maybe we should get printk() to use the best clocksource
>> instead.
>> 
>> The reason AFAICT that printk() is using sched_clock() is that
>> it's supposed to be fast. But now it seems that it's not going
>> to return what printk() needs anymore.
>
> No, printk() does not need this.  You think it does, but it doesn't.  What
> we have is a difference between ARM and x86, and this difference is breaking
> the scheduler.
>
> The fact that the printk timestamp increments while suspended is a bug.  It
> doesn't on x86.

Russell, I agree that it's a bug, but does it qualify as a something
you're willing to take for v3.7-rc?

For OMAP, we need to know if this will go for v3.7 or not because
there's a regression in the OMAP I2C driver, and if this doesn't go in,
we'll need to revert something in the I2C driver until it does.

Thanks,

Kevin
Russell King - ARM Linux Oct. 23, 2012, 4:03 p.m. UTC | #7
On Tue, Oct 23, 2012 at 07:17:33AM -0700, Kevin Hilman wrote:
> Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
> > No, printk() does not need this.  You think it does, but it doesn't.  What
> > we have is a difference between ARM and x86, and this difference is breaking
> > the scheduler.
> >
> > The fact that the printk timestamp increments while suspended is a bug.  It
> > doesn't on x86.
> 
> Russell, I agree that it's a bug, but does it qualify as a something
> you're willing to take for v3.7-rc?

Definitely.  Our current behaviour across suspend for the scheduler is
wrong.  This is one of the questions I had when I created the sched_clock
stuff - but no one at the time could answer.  So, now that we have our
answer, let's get it fixed to conform.
Felipe Balbi Oct. 23, 2012, 5:24 p.m. UTC | #8
Hi,

On Tue, Oct 23, 2012 at 05:03:31PM +0100, Russell King - ARM Linux wrote:
> On Tue, Oct 23, 2012 at 07:17:33AM -0700, Kevin Hilman wrote:
> > Russell King - ARM Linux <linux@arm.linux.org.uk> writes:
> > > No, printk() does not need this.  You think it does, but it doesn't.  What
> > > we have is a difference between ARM and x86, and this difference is breaking
> > > the scheduler.
> > >
> > > The fact that the printk timestamp increments while suspended is a bug.  It
> > > doesn't on x86.
> > 
> > Russell, I agree that it's a bug, but does it qualify as a something
> > you're willing to take for v3.7-rc?
> 
> Definitely.  Our current behaviour across suspend for the scheduler is
> wrong.  This is one of the questions I had when I created the sched_clock
> stuff - but no one at the time could answer.  So, now that we have our
> answer, let's get it fixed to conform.

I have just pushed the patch the your patch system, hope I did it all
right.
diff mbox

Patch

diff --git a/arch/arm/include/asm/sched_clock.h b/arch/arm/include/asm/sched_clock.h
index 05b8e82..e3f7572 100644
--- a/arch/arm/include/asm/sched_clock.h
+++ b/arch/arm/include/asm/sched_clock.h
@@ -10,7 +10,5 @@ 
 
 extern void sched_clock_postinit(void);
 extern void setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate);
-extern void setup_sched_clock_needs_suspend(u32 (*read)(void), int bits,
-		unsigned long rate);
 
 #endif
diff --git a/arch/arm/kernel/sched_clock.c b/arch/arm/kernel/sched_clock.c
index e21bac2..fc6692e 100644
--- a/arch/arm/kernel/sched_clock.c
+++ b/arch/arm/kernel/sched_clock.c
@@ -107,13 +107,6 @@  static void sched_clock_poll(unsigned long wrap_ticks)
 	update_sched_clock();
 }
 
-void __init setup_sched_clock_needs_suspend(u32 (*read)(void), int bits,
-		unsigned long rate)
-{
-	setup_sched_clock(read, bits, rate);
-	cd.needs_suspend = true;
-}
-
 void __init setup_sched_clock(u32 (*read)(void), int bits, unsigned long rate)
 {
 	unsigned long r, w;
@@ -189,18 +182,15 @@  void __init sched_clock_postinit(void)
 static int sched_clock_suspend(void)
 {
 	sched_clock_poll(sched_clock_timer.data);
-	if (cd.needs_suspend)
-		cd.suspended = true;
+	cd.suspended = true;
 	return 0;
 }
 
 static void sched_clock_resume(void)
 {
-	if (cd.needs_suspend) {
-		cd.epoch_cyc = read_sched_clock();
-		cd.epoch_cyc_copy = cd.epoch_cyc;
-		cd.suspended = false;
-	}
+	cd.epoch_cyc = read_sched_clock();
+	cd.epoch_cyc_copy = cd.epoch_cyc;
+	cd.suspended = false;
 }
 
 static struct syscore_ops sched_clock_ops = {