Message ID | 1401903034-20074-2-git-send-email-dianders@chromium.org (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
On 4 June 2014 19:30, Doug Anderson <dianders@chromium.org> wrote: > From: Mandeep Singh Baines <msb@chromium.org> > > Saves one register read. Note that the upper count only changes every > ~178 seconds with a 24MHz source clock, so it's likely it hasn't > changed from call to call. Hi Doug, Have you checked that the time saved in using a 32bits counter instead of a 64bits one is not lost in the handle of the wrap of sched_clock which occurs every 178s instead of never ? sched_clock_poll function will be called every 178s Regards, Vincent > > Before: 1323852 us for 1000000 gettimeofday in userspace > After: 1173084 us for 1000000 gettimeofday in userspace > > Note that even with this change the CPU is in exynos_frc_read() more > than 2% of the time in real world profiles of ChromeOS. That > indicates that it's important to optimize. > > Signed-off-by: Mandeep Singh Baines <msb@chromium.org> > Signed-off-by: Andrew Bresticker <abrestic@chromium.org> > Signed-off-by: Doug Anderson <dianders@chromium.org> > --- > drivers/clocksource/exynos_mct.c | 4 ++-- > 1 file changed, 2 insertions(+), 2 deletions(-) > > diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c > index ba3a683..7cbe4aa 100644 > --- a/drivers/clocksource/exynos_mct.c > +++ b/drivers/clocksource/exynos_mct.c > @@ -167,8 +167,8 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo) > > static inline cycle_t notrace _exynos4_frc_read(void) > { > - unsigned int lo, hi; > - u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U); > + u32 lo, hi; > + static u32 hi2; > > do { > hi = hi2; > -- > 2.0.0.526.g5318336 >
Vincent, On Thu, Jun 5, 2014 at 12:55 AM, Vincent Guittot <vincent.guittot@linaro.org> wrote: > On 4 June 2014 19:30, Doug Anderson <dianders@chromium.org> wrote: >> From: Mandeep Singh Baines <msb@chromium.org> >> >> Saves one register read. Note that the upper count only changes every >> ~178 seconds with a 24MHz source clock, so it's likely it hasn't >> changed from call to call. > > Hi Doug, > > Have you checked that the time saved in using a 32bits counter instead > of a 64bits one is not lost in the handle of the wrap of sched_clock > which occurs every 178s instead of never ? > > sched_clock_poll function will be called every 178s Probably should move this to the discussion in the other patch where we're talking about moving to 32-bits? ...but in general I have no benchmarks at all that how much the scheduler is affected by this change. All of my benchmarks are based on gettimeofday(), where the improvement is clear. Do you have a suggested way to check? I think that it's pretty clear that we'll want to take either the 32-bit conversion or the 64-bit optimizations I sent out. If you take out gettimeofday() overhead, I believe that the actual read of the MCT (exynos_frc_read) took about: * ~800ns per call with the original 64-bit code * ~500ns per call with the optimized 64-bit code (both patch 2 and 3) * ~500ns per call with the 32-bit code The 32-bit code is slightly faster but not significantly (like 530ns vs 500ns). ...but it is definitely simpler. If we find that the schedule clock wrap overhead is real then that would be a good reason to take the 64-bit version.
diff --git a/drivers/clocksource/exynos_mct.c b/drivers/clocksource/exynos_mct.c index ba3a683..7cbe4aa 100644 --- a/drivers/clocksource/exynos_mct.c +++ b/drivers/clocksource/exynos_mct.c @@ -167,8 +167,8 @@ static void exynos4_mct_frc_start(u32 hi, u32 lo) static inline cycle_t notrace _exynos4_frc_read(void) { - unsigned int lo, hi; - u32 hi2 = __raw_readl(reg_base + EXYNOS4_MCT_G_CNT_U); + u32 lo, hi; + static u32 hi2; do { hi = hi2;