diff mbox

[2/3] clocksource: exynos_mct: cache mct upper count

Message ID 1401903034-20074-2-git-send-email-dianders@chromium.org (mailing list archive)
State New, archived
Headers show

Commit Message

Doug Anderson June 4, 2014, 5:30 p.m. UTC
From: Mandeep Singh Baines <msb@chromium.org>

Saves one register read.  Note that the upper count only changes every
~178 seconds with a 24MHz source clock, so it's likely it hasn't
changed from call to call.

Before: 1323852 us for 1000000 gettimeofday in userspace
After:  1173084 us for 1000000 gettimeofday in userspace

Note that even with this change the CPU is in exynos_frc_read() more
than 2% of the time in real world profiles of ChromeOS.  That
indicates that it's important to optimize.

Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
Signed-off-by: Doug Anderson <dianders@chromium.org>
---
 drivers/clocksource/exynos_mct.c | 4 ++--
 1 file changed, 2 insertions(+), 2 deletions(-)

Comments

Vincent Guittot June 5, 2014, 7:55 a.m. UTC | #1
On 4 June 2014 19:30, Doug Anderson <dianders@chromium.org> wrote:
> From: Mandeep Singh Baines <msb@chromium.org>
>
> Saves one register read.  Note that the upper count only changes every
> ~178 seconds with a 24MHz source clock, so it's likely it hasn't
> changed from call to call.

Hi Doug,

Have you checked that the time saved in using a 32bits counter instead
of a 64bits one is not lost in the handle of the wrap of sched_clock
which occurs every 178s instead of never ?

sched_clock_poll function will be called every 178s

Regards,
Vincent


>
> Before: 1323852 us for 1000000 gettimeofday in userspace
> After:  1173084 us for 1000000 gettimeofday in userspace
>
> Note that even with this change the CPU is in exynos_frc_read() more
> than 2% of the time in real world profiles of ChromeOS.  That
> indicates that it's important to optimize.
>
> Signed-off-by: Mandeep Singh Baines <msb@chromium.org>
> Signed-off-by: Andrew Bresticker <abrestic@chromium.org>
> Signed-off-by: Doug Anderson <dianders@chromium.org>
> ---
>  drivers/clocksource/exynos_mct.c | 4 ++--
>  1 file changed, 2 insertions(+), 2 deletions(-)
>
> diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
> index ba3a683..7cbe4aa 100644
> --- a/drivers/clocksource/exynos_mct.c
> +++ b/drivers/clocksource/exynos_mct.c
> @@ -167,8 +167,8 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo)
>
>  static inline cycle_t notrace _exynos4_frc_read(void)
>  {
> -       unsigned int lo, hi;
> -       u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
> +       u32 lo, hi;
> +       static u32 hi2;
>
>         do {
>                 hi = hi2;
> --
> 2.0.0.526.g5318336
>
Doug Anderson June 5, 2014, 5:14 p.m. UTC | #2
Vincent,

On Thu, Jun 5, 2014 at 12:55 AM, Vincent Guittot
<vincent.guittot@linaro.org> wrote:
> On 4 June 2014 19:30, Doug Anderson <dianders@chromium.org> wrote:
>> From: Mandeep Singh Baines <msb@chromium.org>
>>
>> Saves one register read.  Note that the upper count only changes every
>> ~178 seconds with a 24MHz source clock, so it's likely it hasn't
>> changed from call to call.
>
> Hi Doug,
>
> Have you checked that the time saved in using a 32bits counter instead
> of a 64bits one is not lost in the handle of the wrap of sched_clock
> which occurs every 178s instead of never ?
>
> sched_clock_poll function will be called every 178s

Probably should move this to the discussion in the other patch where
we're talking about moving to 32-bits?

...but in general I have no benchmarks at all that how much the
scheduler is affected by this change.  All of my benchmarks are based
on gettimeofday(), where the improvement is clear.  Do you have a
suggested way to check?

I think that it's pretty clear that we'll want to take either the
32-bit conversion or the 64-bit optimizations I sent out.  If you take
out gettimeofday() overhead, I believe that the actual read of the MCT
(exynos_frc_read) took about:
* ~800ns per call with the original 64-bit code
* ~500ns per call with the optimized 64-bit code (both patch 2 and 3)
* ~500ns per call with the 32-bit code

The 32-bit code is slightly faster but not significantly (like 530ns
vs 500ns).  ...but it is definitely simpler.

If we find that the schedule clock wrap overhead is real then that
would be a good reason to take the 64-bit version.
diff mbox

Patch

diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c
index ba3a683..7cbe4aa 100644
--- a/drivers/clocksource/exynos_mct.c
+++ b/drivers/clocksource/exynos_mct.c
@@ -167,8 +167,8 @@  static void exynos4_mct_frc_start(u32 hi, u32 lo)
 
 static inline cycle_t notrace _exynos4_frc_read(void)
 {
-	unsigned int lo, hi;
-	u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U);
+	u32 lo, hi;
+	static u32 hi2;
 
 	do {
 		hi = hi2;