Message ID | 20200929111347.1967438-1-r.stratiienko@gmail.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | RFC: arm64: arch_timer: Fix timer inconsistency test for A64 | expand |
Hello Roman, On Tue, Sep 29, 2020 at 02:13:47PM +0300, Roman Stratiienko wrote: > Fixes linux_kselftest:timers_inconsistency-check_arm_64 > > Test logs without the fix: > ''' > binary returned non-zero. Exit code: 1, stderr: , stdout: > Consistent CLOCK_REALTIME > 1601335525:467086804 > 1601335525:467087554 > 1601335525:467088345 > 1601335525:467089095 > 1601335525:467089887 > 1601335525:467090637 > 1601335525:467091429 > 1601335525:467092179 > 1601335525:467092929 > 1601335525:467093720 > 1601335525:467094470 > 1601335525:467095262 > 1601335525:467096012 > 1601335525:467096804 > -------------------- > 1601335525:467097554 > 1601335525:467077012 > -------------------- > 1601335525:467099095 > 1601335525:467099845 > 1601335525:467100637 > 1601335525:467101387 > 1601335525:467102179 > 1601335525:467102929 > ''' Can you reproduce the issue with a fixed CPU frequency. I suspect the root cause is around CPU frequency scaling code on A64, and timer jumps happen when the kernel is changing CPU frequency. I fixed a similar issue on H3 SoC just by changing the CPU frequency scaling code, without having to touch the timer readout code. https://megous.com/git/linux/commit/?h=ths-5.9&id=51ff1a6d80126f678efca42555f93efa611f50c4 regards, o. > Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> > CC: linux-arm-kernel@lists.infradead.org > CC: linux-kernel@vger.kernel.org > CC: linux-sunxi@googlegroups.com > CC: megous@megous.com > --- > drivers/clocksource/arm_arch_timer.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c > index 6c3e841801461..d50aa43cb654b 100644 > --- a/drivers/clocksource/arm_arch_timer.c > +++ b/drivers/clocksource/arm_arch_timer.c > @@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void) > * number of CPU cycles in 3 consecutive 24 MHz counter periods. > */ > #define __sun50i_a64_read_reg(reg) ({ \ > - u64 _val; \ > + u64 _val1, _val2; \ > int _retries = 150; \ > \ > do { \ > - _val = read_sysreg(reg); \ > + _val1 = read_sysreg(reg); \ > + _val2 = read_sysreg(reg); \ > _retries--; \ > - } while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries); \ > + } while (((_val2 - _val1) > 0x10) && _retries); \ > \ > WARN_ON_ONCE(!_retries); \ > - _val; \ > + _val2; \ > }) > > static u64 notrace sun50i_a64_read_cntpct_el0(void) > -- > 2.25.1 >
Hi, Please Cc maintainers for drivers -- Marc and I maintain the arch timer driver. On Tue, Sep 29, 2020 at 02:13:47PM +0300, Roman Stratiienko wrote: > Fixes linux_kselftest:timers_inconsistency-check_arm_64 > > Test logs without the fix: > ''' > binary returned non-zero. Exit code: 1, stderr: , stdout: > Consistent CLOCK_REALTIME > 1601335525:467086804 > 1601335525:467087554 > 1601335525:467088345 > 1601335525:467089095 > 1601335525:467089887 > 1601335525:467090637 > 1601335525:467091429 > 1601335525:467092179 > 1601335525:467092929 > 1601335525:467093720 > 1601335525:467094470 > 1601335525:467095262 > 1601335525:467096012 > 1601335525:467096804 > -------------------- > 1601335525:467097554 > 1601335525:467077012 That's 0x1BD757D2 followed by 0x1BD70794. The rollback is somewhere in bits 15:12 to go from 0x1BD75xxx to 0x1BD70xxx, which suggests the analysis in the existing comment is incomplete. > -------------------- > 1601335525:467099095 > 1601335525:467099845 > 1601335525:467100637 > 1601335525:467101387 > 1601335525:467102179 > 1601335525:467102929 > ''' It would be very helpful if the commit message could explain the rough idea behind the change, because the rationale is not clear to me. > Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> > CC: linux-arm-kernel@lists.infradead.org > CC: linux-kernel@vger.kernel.org > CC: linux-sunxi@googlegroups.com > CC: megous@megous.com > --- > drivers/clocksource/arm_arch_timer.c | 9 +++++---- > 1 file changed, 5 insertions(+), 4 deletions(-) > > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c > index 6c3e841801461..d50aa43cb654b 100644 > --- a/drivers/clocksource/arm_arch_timer.c > +++ b/drivers/clocksource/arm_arch_timer.c > @@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void) > * number of CPU cycles in 3 consecutive 24 MHz counter periods. > */ > #define __sun50i_a64_read_reg(reg) ({ \ > - u64 _val; \ > + u64 _val1, _val2; \ > int _retries = 150; \ > \ > do { \ > - _val = read_sysreg(reg); \ > + _val1 = read_sysreg(reg); \ > + _val2 = read_sysreg(reg); \ > _retries--; \ > - } while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries); \ > + } while (((_val2 - _val1) > 0x10) && _retries); \ This is going to fail quite often at low CPU frequencies, and it's not clear to me that this solves the problem any more generally. DO we know what the underlying erratum is here? Thanks, Mark. > \ > WARN_ON_ONCE(!_retries); \ > - _val; \ > + _val2; \ > }) > > static u64 notrace sun50i_a64_read_cntpct_el0(void) > -- > 2.25.1 > > > _______________________________________________ > linux-arm-kernel mailing list > linux-arm-kernel@lists.infradead.org > http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
On 9/29/20 10:40 AM, Mark Rutland wrote: > Hi, > > Please Cc maintainers for drivers -- Marc and I maintain the arch timer > driver. > > On Tue, Sep 29, 2020 at 02:13:47PM +0300, Roman Stratiienko wrote: >> Fixes linux_kselftest:timers_inconsistency-check_arm_64 >> >> Test logs without the fix: >> ''' >> binary returned non-zero. Exit code: 1, stderr: , stdout: >> Consistent CLOCK_REALTIME >> 1601335525:467086804 >> 1601335525:467087554 >> 1601335525:467088345 >> 1601335525:467089095 >> 1601335525:467089887 >> 1601335525:467090637 >> 1601335525:467091429 >> 1601335525:467092179 >> 1601335525:467092929 >> 1601335525:467093720 >> 1601335525:467094470 >> 1601335525:467095262 >> 1601335525:467096012 >> 1601335525:467096804 >> -------------------- >> 1601335525:467097554 >> 1601335525:467077012 > > That's 0x1BD757D2 followed by 0x1BD70794. The rollback is somewhere in > bits 15:12 to go from 0x1BD75xxx to 0x1BD70xxx, which suggests the > analysis in the existing comment is incomplete. My analysis only looked at consecutive timer reads on a single core. Apparently, from the vendor comment that they needed CLOCKSOURCE_VALIDATE_LAST_CYCLE (linked below), there is another inconsistency with reads across cores. >> -------------------- >> 1601335525:467099095 >> 1601335525:467099845 >> 1601335525:467100637 >> 1601335525:467101387 >> 1601335525:467102179 >> 1601335525:467102929 >> ''' > > It would be very helpful if the commit message could explain the rough > idea behind the change, because the rationale is not clear to me. > >> Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> >> CC: linux-arm-kernel@lists.infradead.org >> CC: linux-kernel@vger.kernel.org >> CC: linux-sunxi@googlegroups.com >> CC: megous@megous.com >> --- >> drivers/clocksource/arm_arch_timer.c | 9 +++++---- >> 1 file changed, 5 insertions(+), 4 deletions(-) >> >> diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c >> index 6c3e841801461..d50aa43cb654b 100644 >> --- a/drivers/clocksource/arm_arch_timer.c >> +++ b/drivers/clocksource/arm_arch_timer.c >> @@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void) >> * number of CPU cycles in 3 consecutive 24 MHz counter periods. >> */ >> #define __sun50i_a64_read_reg(reg) ({ \ >> - u64 _val; \ >> + u64 _val1, _val2; \ >> int _retries = 150; \ >> \ >> do { \ >> - _val = read_sysreg(reg); \ >> + _val1 = read_sysreg(reg); \ >> + _val2 = read_sysreg(reg); \ >> _retries--; \ >> - } while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries); \ >> + } while (((_val2 - _val1) > 0x10) && _retries); \ > > This is going to fail quite often at low CPU frequencies, and it's not > clear to me that this solves the problem any more generally. DO we know > what the underlying erratum is here? This is what we have from the vendor: https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/arch/arm64/include/asm/arch_timer.h#L155 https://github.com/Allwinner-Homlet/H6-BSP4.9-linux/blob/master/drivers/clocksource/arm_arch_timer.c#L272 and they select CLOCKSOURCE_VALIDATE_LAST_CYCLE. Everything else we know (e.g. the tables and explanation in my original commit log) is from testing by users. I had not seen the vendor changes to arch_timer.h when I wrote the existing workaround. Their retry loop is similar to this patch, but as you mention, it won't work if the CPU frequency is too low. That may not be a problem for the vendor kernel, but it breaks badly on mainline. As Ondrej referenced, the mainline DVFS implementation bypasses the PLL during frequency changes. At that point, both the CPU and the timer are running at 24 MHz, so the chance of hitting the retry limit is high... and there's a udelay() call in the PLL bypass code (ccu_mux_notifier_cb()) that will need to read the timer. > Thanks, > Mark. Cheers, Samuel >> \ >> WARN_ON_ONCE(!_retries); \ >> - _val; \ >> + _val2; \ >> }) >> >> static u64 notrace sun50i_a64_read_cntpct_el0(void) >> -- >> 2.25.1 >> >> >> _______________________________________________ >> linux-arm-kernel mailing list >> linux-arm-kernel@lists.infradead.org >> http://lists.infradead.org/mailman/listinfo/linux-arm-kernel
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index 6c3e841801461..d50aa43cb654b 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -346,16 +346,17 @@ static u64 notrace arm64_858921_read_cntvct_el0(void) * number of CPU cycles in 3 consecutive 24 MHz counter periods. */ #define __sun50i_a64_read_reg(reg) ({ \ - u64 _val; \ + u64 _val1, _val2; \ int _retries = 150; \ \ do { \ - _val = read_sysreg(reg); \ + _val1 = read_sysreg(reg); \ + _val2 = read_sysreg(reg); \ _retries--; \ - } while (((_val + 1) & GENMASK(9, 0)) <= 1 && _retries); \ + } while (((_val2 - _val1) > 0x10) && _retries); \ \ WARN_ON_ONCE(!_retries); \ - _val; \ + _val2; \ }) static u64 notrace sun50i_a64_read_cntpct_el0(void)
Fixes linux_kselftest:timers_inconsistency-check_arm_64 Test logs without the fix: ''' binary returned non-zero. Exit code: 1, stderr: , stdout: Consistent CLOCK_REALTIME 1601335525:467086804 1601335525:467087554 1601335525:467088345 1601335525:467089095 1601335525:467089887 1601335525:467090637 1601335525:467091429 1601335525:467092179 1601335525:467092929 1601335525:467093720 1601335525:467094470 1601335525:467095262 1601335525:467096012 1601335525:467096804 -------------------- 1601335525:467097554 1601335525:467077012 -------------------- 1601335525:467099095 1601335525:467099845 1601335525:467100637 1601335525:467101387 1601335525:467102179 1601335525:467102929 ''' Signed-off-by: Roman Stratiienko <r.stratiienko@gmail.com> CC: linux-arm-kernel@lists.infradead.org CC: linux-kernel@vger.kernel.org CC: linux-sunxi@googlegroups.com CC: megous@megous.com --- drivers/clocksource/arm_arch_timer.c | 9 +++++---- 1 file changed, 5 insertions(+), 4 deletions(-)