Message ID | 20210807191428.3488948-1-oupton@google.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [v2] clocksource/arm_arch_timer: Fix masking for high freq counters | expand |
On Sat, Aug 7, 2021 at 9:14 PM Oliver Upton <oupton@google.com> wrote: > Unfortunately, the architecture provides no means to determine the bit > width of the system counter. However, we do know the following from the > specification: > > - the system counter is at least 56 bits wide > - Roll-over time of not less than 40 years > > To date, the arch timer driver has depended on the first property, > assuming any system counter to be 56 bits wide and masking off the rest. > However, combining a narrow clocksource mask with a high frequency > counter could result in prematurely wrapping the system counter by a > significant margin. For example, a 56 bit wide, 1GHz system counter > would wrap in a mere 2.28 years! > > This is a problem for two reasons: v8.6+ implementations are required to > provide a 64 bit, 1GHz system counter. Furthermore, before v8.6, > implementers may select a counter frequency of their choosing. > > Fix the issue by deriving a valid clock mask based on the second > property from above. Set the floor at 56 bits, since we know no system > counter is narrower than that. > > Suggested-by: Marc Zyngier <maz@kernel.org> > Signed-off-by: Oliver Upton <oupton@google.com> This patch looks good to me: Reviewed-by: Linus Walleij <linus.walleij@linaro.org> Just a thought that crossed my mind: as this is real hardware we are talking about mostly, how hard would it be for arch_counter_get_width() to detect how wide it actually is if nbits > 56? I would do something like this pseudocode: nbits = 56; while (nbits < 64) startval = GENMASK(nbits, 0); write_counter(startval); start_counter; nsleep(1); stop_counter; now = read_counter; if (now < startval) /* Ooops it wrapped */ break; nbits++ pr_info("counter has %d bits\n", nbits); Or did you folks already try this approach? Yours, Linus Walleij
Hi Linus, On Sat, Aug 7, 2021 at 3:30 PM Linus Walleij <linus.walleij@linaro.org> wrote: > > On Sat, Aug 7, 2021 at 9:14 PM Oliver Upton <oupton@google.com> wrote: > > > Unfortunately, the architecture provides no means to determine the bit > > width of the system counter. However, we do know the following from the > > specification: > > > > - the system counter is at least 56 bits wide > > - Roll-over time of not less than 40 years > > > > To date, the arch timer driver has depended on the first property, > > assuming any system counter to be 56 bits wide and masking off the rest. > > However, combining a narrow clocksource mask with a high frequency > > counter could result in prematurely wrapping the system counter by a > > significant margin. For example, a 56 bit wide, 1GHz system counter > > would wrap in a mere 2.28 years! > > > > This is a problem for two reasons: v8.6+ implementations are required to > > provide a 64 bit, 1GHz system counter. Furthermore, before v8.6, > > implementers may select a counter frequency of their choosing. > > > > Fix the issue by deriving a valid clock mask based on the second > > property from above. Set the floor at 56 bits, since we know no system > > counter is narrower than that. > > > > Suggested-by: Marc Zyngier <maz@kernel.org> > > Signed-off-by: Oliver Upton <oupton@google.com> > > This patch looks good to me: > Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > Thanks for the review! > Just a thought that crossed my mind: as this is real hardware we are > talking about mostly, how hard would it be for arch_counter_get_width() > to detect how wide it actually is if nbits > 56? > > I would do something like this pseudocode: > > nbits = 56; > while (nbits < 64) > startval = GENMASK(nbits, 0); > write_counter(startval); > start_counter; > nsleep(1); > stop_counter; > now = read_counter; > if (now < startval) > /* Ooops it wrapped */ > break; > nbits++ > > pr_info("counter has %d bits\n", nbits); > > Or did you folks already try this approach? This would be a good idea, although I believe our only means of offsetting the counter are available in EL2. I had thought we could use a CVAL register instead, but this quote from the ARM ARM doesn't imply the CVAL bit width matches that of the system counter: <quote> If the Generic counter is implemented at a size less than 64 bits, then this field is permitted to be implemented at the same width as the counter, and the upper bits are RES0. </quote> The only other sane idea that I could come up with is providing this information to the kernel through DT, although that would leave ACPI systems behind. -- Thanks, Oliver
On Sat, 07 Aug 2021 23:30:20 +0100, Linus Walleij <linus.walleij@linaro.org> wrote: > > On Sat, Aug 7, 2021 at 9:14 PM Oliver Upton <oupton@google.com> wrote: > > > Unfortunately, the architecture provides no means to determine the bit > > width of the system counter. However, we do know the following from the > > specification: > > > > - the system counter is at least 56 bits wide > > - Roll-over time of not less than 40 years > > > > To date, the arch timer driver has depended on the first property, > > assuming any system counter to be 56 bits wide and masking off the rest. > > However, combining a narrow clocksource mask with a high frequency > > counter could result in prematurely wrapping the system counter by a > > significant margin. For example, a 56 bit wide, 1GHz system counter > > would wrap in a mere 2.28 years! > > > > This is a problem for two reasons: v8.6+ implementations are required to > > provide a 64 bit, 1GHz system counter. Furthermore, before v8.6, > > implementers may select a counter frequency of their choosing. > > > > Fix the issue by deriving a valid clock mask based on the second > > property from above. Set the floor at 56 bits, since we know no system > > counter is narrower than that. > > > > Suggested-by: Marc Zyngier <maz@kernel.org> > > Signed-off-by: Oliver Upton <oupton@google.com> > > This patch looks good to me: > Reviewed-by: Linus Walleij <linus.walleij@linaro.org> > > Just a thought that crossed my mind: as this is real hardware we are > talking about mostly, how hard would it be for arch_counter_get_width() > to detect how wide it actually is if nbits > 56? > > I would do something like this pseudocode: > > nbits = 56; > while (nbits < 64) > startval = GENMASK(nbits, 0); > write_counter(startval); That's where things stop. The counter is not writable, and for good reasons (it is shared with all the CPUs in the system). > start_counter; > nsleep(1); > stop_counter; > now = read_counter; > if (now < startval) > /* Ooops it wrapped */ > break; > nbits++ > > pr_info("counter has %d bits\n", nbits); > > Or did you folks already try this approach? The only way to emulate this behaviour is to use CNTVOFF_EL2 at EL2 to offset a guest view of the counter, and to run minimal guest that will do the start/stop/compare work. Given that it involves running a guest at a point where we are unable to do so, and that it cannot work when booted at EL1, we're left with guesswork. Thanks, M.
On Sun, 08 Aug 2021 02:14:35 +0100, Oliver Upton <oupton@google.com> wrote: > The only other sane idea that I could come up with is providing this > information to the kernel through DT, although that would leave ACPI > systems behind. It also has the disadvantage that a large number of DT timer nodes are a mess of cargo-culted, copy-pasted idioms, and that adding another property would only make it worse. I'm more confident with something that can be either: - checked from EL2 using CNTVOFF, which is complicated, doesn't work at EL1, and leaves us in a weird state if we have different counter width views in the system (BL is such a wonderful concept) - or computed from first principle based on the requirements of the architecture. Thanks, M.
On Sun, Aug 8, 2021 at 3:40 AM Marc Zyngier <maz@kernel.org> wrote: > > On Sun, 08 Aug 2021 02:14:35 +0100, > Oliver Upton <oupton@google.com> wrote: > > > The only other sane idea that I could come up with is providing this > > information to the kernel through DT, although that would leave ACPI > > systems behind. > > It also has the disadvantage that a large number of DT timer nodes are > a mess of cargo-culted, copy-pasted idioms, and that adding another > property would only make it worse. Agreed, this does seem like the best solution, short of the architecture actually providing something to determine the counter width. On that note, I wonder how (if ever) we will be able to move away from unnecessarily masking a 64 bit counter, i.e. a v8.6 or above implementation. With this patch, one such counter would wrap after 36.56 years, short of the 40 year guarantee we have from the architecture for < v8.6 implementations. Getting it to 64 bits would squarely make it someone else's problem ~585 years from now :) -- Thanks, Oliver
On Sun, 08 Aug 2021 20:01:10 +0100, Oliver Upton <oupton@google.com> wrote: > > On Sun, Aug 8, 2021 at 3:40 AM Marc Zyngier <maz@kernel.org> wrote: > > > > On Sun, 08 Aug 2021 02:14:35 +0100, > > Oliver Upton <oupton@google.com> wrote: > > > > > The only other sane idea that I could come up with is providing this > > > information to the kernel through DT, although that would leave ACPI > > > systems behind. > > > > It also has the disadvantage that a large number of DT timer nodes are > > a mess of cargo-culted, copy-pasted idioms, and that adding another > > property would only make it worse. > > Agreed, this does seem like the best solution, short of the > architecture actually providing something to determine the counter > width. > > On that note, I wonder how (if ever) we will be able to move away from > unnecessarily masking a 64 bit counter, i.e. a v8.6 or above > implementation. With this patch, one such counter would wrap after > 36.56 years, short of the 40 year guarantee we have from the > architecture for < v8.6 implementations. Getting it to 64 bits would > squarely make it someone else's problem ~585 years from now :) Hmmm. If you end-up with something that falls short of 40 years, then I suspect something is wrong in the way you compute the required width. 40 years @1GHz (which we shall call FY1G from now on) fits comfortably in 61 bits, and I fear that your use of ilog2() gives you one less bit than what it should be: log2(FY1G) ~= 60.13 What you are after is probably (ilog2(FY1G - 1) + 1), similar to the way roundup_pow_of_two() works. Thanks, M.
On Sat, 07 Aug 2021 20:14:28 +0100, Oliver Upton <oupton@google.com> wrote: > > Unfortunately, the architecture provides no means to determine the bit > width of the system counter. However, we do know the following from the > specification: > > - the system counter is at least 56 bits wide > - Roll-over time of not less than 40 years > > To date, the arch timer driver has depended on the first property, > assuming any system counter to be 56 bits wide and masking off the rest. > However, combining a narrow clocksource mask with a high frequency > counter could result in prematurely wrapping the system counter by a > significant margin. For example, a 56 bit wide, 1GHz system counter > would wrap in a mere 2.28 years! > > This is a problem for two reasons: v8.6+ implementations are required to > provide a 64 bit, 1GHz system counter. Furthermore, before v8.6, > implementers may select a counter frequency of their choosing. > > Fix the issue by deriving a valid clock mask based on the second > property from above. Set the floor at 56 bits, since we know no system > counter is narrower than that. > > Suggested-by: Marc Zyngier <maz@kernel.org> > Signed-off-by: Oliver Upton <oupton@google.com> > --- > This patch was tested on QEMU, tweaked to provide a 1GHz system counter > frequency. The 'bp.refcounter.base_frequency' property does not seem to > have any affect on the 'ARMvA Base RevC AEM FVP', and instead provides a > 100MHz counter. > > Parent commit: 0c32706dac1b ("arm64: stacktrace: avoid tracing arch_stack_walk()") > > drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++---- > 1 file changed, 28 insertions(+), 4 deletions(-) > > diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c > index be6d741d404c..f4816b22213c 100644 > --- a/drivers/clocksource/arm_arch_timer.c > +++ b/drivers/clocksource/arm_arch_timer.c > @@ -52,6 +52,12 @@ > #define CNTV_TVAL 0x38 > #define CNTV_CTL 0x3c > > +/* > + * The minimum amount of time a generic counter is guaranteed to not roll over > + * (40 years) > + */ > +#define MIN_ROLLOVER_SECS (40ULL * 365 * 24 * 3600) > + > static unsigned arch_timers_present __initdata; > > static void __iomem *arch_counter_base __ro_after_init; > @@ -205,13 +211,11 @@ static struct clocksource clocksource_counter = { > .id = CSID_ARM_ARCH_COUNTER, > .rating = 400, > .read = arch_counter_read, > - .mask = CLOCKSOURCE_MASK(56), > .flags = CLOCK_SOURCE_IS_CONTINUOUS, > }; > > static struct cyclecounter cyclecounter __ro_after_init = { > .read = arch_counter_read_cc, > - .mask = CLOCKSOURCE_MASK(56), > }; > > struct ate_acpi_oem_info { > @@ -1004,9 +1008,26 @@ struct arch_timer_kvm_info *arch_timer_get_kvm_info(void) > return &arch_timer_kvm_info; > } > > +/* > + * Makes an educated guess at a valid counter width based on the Generic Timer > + * specification. Of note: > + * 1) the system counter is at least 56 bits wide > + * 2) a roll-over time of not less than 40 years > + * > + * See 'ARM DDI 0487G.a D11.1.2 ("The system counter")' for more details. > + */ > +static int __init arch_counter_get_width(void) > +{ > + u64 min_cycles = MIN_ROLLOVER_SECS * arch_timer_rate; > + > + /* guarantee the returned width is within the valid range */ > + return clamp_val(ilog2(min_cycles), 56, 64); See my comment somewhere else in the thread about the potential wasted bit. > +} > + > static void __init arch_counter_register(unsigned type) > { > u64 start_count; > + int width; > > /* Register the CP15 based counter if we have one */ > if (type & ARCH_TIMER_TYPE_CP15) { > @@ -1031,6 +1052,10 @@ static void __init arch_counter_register(unsigned type) > arch_timer_read_counter = arch_counter_get_cntvct_mem; > } > > + width = arch_counter_get_width(); > + clocksource_counter.mask = CLOCKSOURCE_MASK(width); > + cyclecounter.mask = CLOCKSOURCE_MASK(width); > + > if (!arch_counter_suspend_stop) > clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP; > start_count = arch_timer_read_counter(); > @@ -1040,8 +1065,7 @@ static void __init arch_counter_register(unsigned type) > timecounter_init(&arch_timer_kvm_info.timecounter, > &cyclecounter, start_count); > > - /* 56 bits minimum, so we assume worst case rollover */ > - sched_clock_register(arch_timer_read_counter, 56, arch_timer_rate); > + sched_clock_register(arch_timer_read_counter, width, arch_timer_rate); For the record, there is one spot where the clockevent gets registered and configured that also needs addressing (there is a mask harcoded to 31 bits there, which is pretty odd). It cannot be fixed directly in this patch though. I'll probably take this patch on top of my series and adjust the relevant bits. M.
On Mon, Aug 9, 2021 at 3:45 AM Marc Zyngier <maz@kernel.org> wrote: > > On that note, I wonder how (if ever) we will be able to move away from > > unnecessarily masking a 64 bit counter, i.e. a v8.6 or above > > implementation. With this patch, one such counter would wrap after > > 36.56 years, short of the 40 year guarantee we have from the > > architecture for < v8.6 implementations. Getting it to 64 bits would > > squarely make it someone else's problem ~585 years from now :) > > Hmmm. If you end-up with something that falls short of 40 years, then > I suspect something is wrong in the way you compute the required > width. > > 40 years @1GHz (which we shall call FY1G from now on) fits comfortably > in 61 bits, and I fear that your use of ilog2() gives you one less bit > than what it should be: > > log2(FY1G) ~= 60.13 Right, this is round-down behavior was deliberate. Reading the ARM ARM D11.1.2 'The system counter', I did not find any language that suggested the counter saturates the register width before rolling over. So, it may be paranoid, but I presumed it to be safer to wrap within the guaranteed interval rather than assume the sanity of the system counter implementation. That being said, fine with rounding up instead, so long as we don't believe there's any chance of hardware doing something crazy. -- Thanks, Oliver > Thanks, > > M. > > -- > Without deviation from the norm, progress is not possible.
diff --git a/drivers/clocksource/arm_arch_timer.c b/drivers/clocksource/arm_arch_timer.c index be6d741d404c..f4816b22213c 100644 --- a/drivers/clocksource/arm_arch_timer.c +++ b/drivers/clocksource/arm_arch_timer.c @@ -52,6 +52,12 @@ #define CNTV_TVAL 0x38 #define CNTV_CTL 0x3c +/* + * The minimum amount of time a generic counter is guaranteed to not roll over + * (40 years) + */ +#define MIN_ROLLOVER_SECS (40ULL * 365 * 24 * 3600) + static unsigned arch_timers_present __initdata; static void __iomem *arch_counter_base __ro_after_init; @@ -205,13 +211,11 @@ static struct clocksource clocksource_counter = { .id = CSID_ARM_ARCH_COUNTER, .rating = 400, .read = arch_counter_read, - .mask = CLOCKSOURCE_MASK(56), .flags = CLOCK_SOURCE_IS_CONTINUOUS, }; static struct cyclecounter cyclecounter __ro_after_init = { .read = arch_counter_read_cc, - .mask = CLOCKSOURCE_MASK(56), }; struct ate_acpi_oem_info { @@ -1004,9 +1008,26 @@ struct arch_timer_kvm_info *arch_timer_get_kvm_info(void) return &arch_timer_kvm_info; } +/* + * Makes an educated guess at a valid counter width based on the Generic Timer + * specification. Of note: + * 1) the system counter is at least 56 bits wide + * 2) a roll-over time of not less than 40 years + * + * See 'ARM DDI 0487G.a D11.1.2 ("The system counter")' for more details. + */ +static int __init arch_counter_get_width(void) +{ + u64 min_cycles = MIN_ROLLOVER_SECS * arch_timer_rate; + + /* guarantee the returned width is within the valid range */ + return clamp_val(ilog2(min_cycles), 56, 64); +} + static void __init arch_counter_register(unsigned type) { u64 start_count; + int width; /* Register the CP15 based counter if we have one */ if (type & ARCH_TIMER_TYPE_CP15) { @@ -1031,6 +1052,10 @@ static void __init arch_counter_register(unsigned type) arch_timer_read_counter = arch_counter_get_cntvct_mem; } + width = arch_counter_get_width(); + clocksource_counter.mask = CLOCKSOURCE_MASK(width); + cyclecounter.mask = CLOCKSOURCE_MASK(width); + if (!arch_counter_suspend_stop) clocksource_counter.flags |= CLOCK_SOURCE_SUSPEND_NONSTOP; start_count = arch_timer_read_counter(); @@ -1040,8 +1065,7 @@ static void __init arch_counter_register(unsigned type) timecounter_init(&arch_timer_kvm_info.timecounter, &cyclecounter, start_count); - /* 56 bits minimum, so we assume worst case rollover */ - sched_clock_register(arch_timer_read_counter, 56, arch_timer_rate); + sched_clock_register(arch_timer_read_counter, width, arch_timer_rate); } static void arch_timer_stop(struct clock_event_device *clk)
Unfortunately, the architecture provides no means to determine the bit width of the system counter. However, we do know the following from the specification: - the system counter is at least 56 bits wide - Roll-over time of not less than 40 years To date, the arch timer driver has depended on the first property, assuming any system counter to be 56 bits wide and masking off the rest. However, combining a narrow clocksource mask with a high frequency counter could result in prematurely wrapping the system counter by a significant margin. For example, a 56 bit wide, 1GHz system counter would wrap in a mere 2.28 years! This is a problem for two reasons: v8.6+ implementations are required to provide a 64 bit, 1GHz system counter. Furthermore, before v8.6, implementers may select a counter frequency of their choosing. Fix the issue by deriving a valid clock mask based on the second property from above. Set the floor at 56 bits, since we know no system counter is narrower than that. Suggested-by: Marc Zyngier <maz@kernel.org> Signed-off-by: Oliver Upton <oupton@google.com> --- This patch was tested on QEMU, tweaked to provide a 1GHz system counter frequency. The 'bp.refcounter.base_frequency' property does not seem to have any affect on the 'ARMvA Base RevC AEM FVP', and instead provides a 100MHz counter. Parent commit: 0c32706dac1b ("arm64: stacktrace: avoid tracing arch_stack_walk()") drivers/clocksource/arm_arch_timer.c | 32 ++++++++++++++++++++++++---- 1 file changed, 28 insertions(+), 4 deletions(-)