diff mbox series

x86/timer: Fix boot on Intel systems using ITSSPRC static PIT clock gating

Message ID 20210107010625.5993-1-andrew.cooper3@citrix.com (mailing list archive)
State Superseded
Headers show
Series x86/timer: Fix boot on Intel systems using ITSSPRC static PIT clock gating | expand

Commit Message

Andrew Cooper Jan. 7, 2021, 1:06 a.m. UTC
Recent Intel client devices have disabled the legacy PIT for powersaving
reasons, breaking compatibility with a traditional IBM PC.  Xen depends on a
legacy timer interrupt to check that the IO-APIC/PIC routing is configured
correctly, and fails to boot with:

  (XEN) *******************************
  (XEN) Panic on CPU 0:
  (XEN) IO-APIC + timer doesn't work!  Boot with apic_verbosity=debug and send report.  Then try booting with the `noapic` option
  (XEN) *******************************

While this setting can be undone by Xen, the details of how to differ by
chipset, and would be very short sighted for battery based devices.  See bit 2
"8254 Static Clock Gating Enable" in:

  https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/comet-lake-u/intel-400-series-chipset-on-package-platform-controller-hub-register-database/itss-power-reduction-control-itssprc-offset-3300/

All impacted systems have an HPET, but there is no indication of the absence
of PIT functionality, nor a suitable way to probe for its absence.  As a short
term fix, reconfigure the HPET into legacy replacement mode.  A better
longterm fix would be to avoid the reliance on the timer interrupt entirely.

Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>
---
CC: Jan Beulich <JBeulich@suse.com>
CC: Roger Pau Monné <roger.pau@citrix.com>
CC: Wei Liu <wl@xen.org>
CC: Aaron Janse <aaron@ajanse.me>
CC: Jason Andryuk <jandryuk@gmail.com>
CC: Ondrej Balaz <blami@blami.net>
CC: Tamas K Lengyel <tamas@tklengyel.com>
CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>

Slightly RFC.  On older platforms this does generate some spurious PIC
interrupts during boot, but they're benign.  I was hoping to have time to fix
those too, but I'm getting an increasing number of requests to post this
patch.

Other followup actions:
 * Overhaul our setup logic.  Large quantities of it is legacy for pre-64bit
   systems, and not applicable to Xen these days.
 * Have Xen turn the PIT off when it isn't being used as the timesource.  Its
   a waste of time servicing useless interrupts.
 * Make `clocksource=pit` not enter an infinite loop on these systems
 * Drop all references to `noapic`.  These days, the only thing it will ever
   do is make a bad situation worse.
---
 xen/arch/x86/hpet.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
 1 file changed, 66 insertions(+), 1 deletion(-)

Comments

Jan Beulich Jan. 7, 2021, 1:53 p.m. UTC | #1
On 07.01.2021 02:06, Andrew Cooper wrote:
> Slightly RFC.  On older platforms this does generate some spurious PIC
> interrupts during boot, but they're benign.  I was hoping to have time to fix
> those too, but I'm getting an increasing number of requests to post this
> patch.

We still will want to try to suppress those by the release of 4.15,
and ideally even before we backport this one.

> @@ -793,6 +793,71 @@ u64 __init hpet_setup(void)
>      if ( (rem * 2) > hpet_period )
>          hpet_rate++;
>  
> +    /*
> +     * Intel chipsets from Skylake/ApolloLake onwards can statically clock
> +     * gate the 8259 PIT.  This option is enabled by default in slightly later
> +     * systems, as turning the PIT off is a prerequisite to entering the C11
> +     * power saving state.
> +     *
> +     * Xen currently depends on the legacy timer interrupt being active while
> +     * IRQ routing is configured.
> +     *
> +     * Reconfigure the HPET into legacy mode to re-establish the timer
> +     * interrupt.
> +     */
> +    if ( hpet_id & HPET_ID_LEGSUP &&

Add parentheses here?

> +         !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) )
> +    {
> +        unsigned int c0_cfg, ticks, count;
> +
> +        /* Stop the main counter. */
> +        hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG);
> +
> +        /* Reconfigure channel 0 to be 32bit periodic. */
> +        c0_cfg = hpet_read32(HPET_Tn_CFG(0));
> +        c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
> +                   HPET_TN_32BIT);
> +        hpet_write32(c0_cfg, HPET_Tn_CFG(0));
> +
> +        /*
> +         * The exact period doesn't have to match a legacy PIT.  All we need
> +         * is an interrupt queued up via the IO-APIC to check routing.
> +         *
> +         * Use HZ as the frequency.
> +         */
> +        ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;
> +
> +        count = hpet_read32(HPET_COUNTER);
> +
> +        /*
> +         * HPET_TN_SETVAL above is atrociously documented in the spec.
> +         *
> +         * Periodic HPET channels have a main comparator register, and cache
> +         * the "last write to cmp", as a hidden register.
> +         *
> +         * The semantics on generating a periodic interrupt is:
> +         *   cmp += "last value written to the cmp register"
> +         * which will reload a new period.
> +         *
> +         * Normally, writes to cmp update the main comparator as well as being
> +         * cached for the reload value.  However, under these semantics, the
> +         * HPET main counter needs resetting to 0 to be able to configure the
> +         * period correctly.
> +         *
> +         * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can
> +         * use for periodic timers to mean that the second write to the
> +         * comparator updates only the "last written" cache, and not the
> +         * absolute comparator value.
> +         *
> +         * This lets us set a period when the main counter isn't at 0.
> +         */
> +        hpet_write32(count + ticks, HPET_Tn_CMP(0));
> +        hpet_write32(ticks,         HPET_Tn_CMP(0));

As you say, documentation is poor here. In fact the wording in the
HPET spec talks about updating of the "accumulator" instead;
perhaps just an unhelpful use of a different term. (Respective
Linux code has a comment indicating this is needed on a specific
AMD chipset only.)

It would seem more natural to me if things were explained a little
differently: Writes to the comparator with SETVAL clear always
update the "last written" value, i.e. the increment to be used
once the main counter equals the comparator. Writes with SETVAL set
update the comparator itself. (Assuming that's how it really is, of
course, but that's what I infer from the doc available.)

Linux additionally puts udelay(1) between the two writes. Do you
think we're fine without such, on all platforms?

> +        /* Restart the main counter, and legacy mode. */
> +        hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG);

This isn't necessarily "restart" - you may start the counter for
the first time. Hence in the comment maybe "(Re)start ..."?

As to the spurious IRQs, does it perhaps matter at which point
CFG_LEGACY gets set? We could try setting it when clearing
CFG_ENABLE further up, or we could also try two separate writes
each setting just one of the bits. (At least I can't deduce
from the spec that we ought to be prepared to observe spurious
IRQs here.)

Jan
Jason Andryuk Jan. 8, 2021, 2:30 p.m. UTC | #2
On Wed, Jan 6, 2021 at 8:06 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote:
>
> Recent Intel client devices have disabled the legacy PIT for powersaving
> reasons, breaking compatibility with a traditional IBM PC.  Xen depends on a
> legacy timer interrupt to check that the IO-APIC/PIC routing is configured
> correctly, and fails to boot with:
>
>   (XEN) *******************************
>   (XEN) Panic on CPU 0:
>   (XEN) IO-APIC + timer doesn't work!  Boot with apic_verbosity=debug and send report.  Then try booting with the `noapic` option
>   (XEN) *******************************
>
> While this setting can be undone by Xen, the details of how to differ by
> chipset, and would be very short sighted for battery based devices.  See bit 2
> "8254 Static Clock Gating Enable" in:
>
>   https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/comet-lake-u/intel-400-series-chipset-on-package-platform-controller-hub-register-database/itss-power-reduction-control-itssprc-offset-3300/
>
> All impacted systems have an HPET, but there is no indication of the absence
> of PIT functionality, nor a suitable way to probe for its absence.  As a short
> term fix, reconfigure the HPET into legacy replacement mode.  A better
> longterm fix would be to avoid the reliance on the timer interrupt entirely.
>
> Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com>

Tested-by: Jason Andryuk <jandryuk@gmail.com>

On the Dell 7200 that couldn't boot without Xen modification, but with
a build fix below.

> ---
> CC: Jan Beulich <JBeulich@suse.com>
> CC: Roger Pau Monné <roger.pau@citrix.com>
> CC: Wei Liu <wl@xen.org>
> CC: Aaron Janse <aaron@ajanse.me>
> CC: Jason Andryuk <jandryuk@gmail.com>
> CC: Ondrej Balaz <blami@blami.net>
> CC: Tamas K Lengyel <tamas@tklengyel.com>
> CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com>
>
> Slightly RFC.  On older platforms this does generate some spurious PIC
> interrupts during boot, but they're benign.  I was hoping to have time to fix
> those too, but I'm getting an increasing number of requests to post this
> patch.

Spurious being the timer interrupt is now running?  In my local
patches, I have a function clear HPET_CFG_LEGACY after check_timer():

diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c
index e98e08e9c8..b62dea190a 100644
--- a/xen/arch/x86/io_apic.c
+++ b/xen/arch/x86/io_apic.c
@@ -2047,6 +2048,7 @@ void __init setup_IO_APIC(void)
     setup_IO_APIC_irqs();
     init_IO_APIC_traps();
     check_timer();
+    hpet_disable_legacy();
     print_IO_APIC();
     ioapic_pm_state_alloc();

> Other followup actions:
>  * Overhaul our setup logic.  Large quantities of it is legacy for pre-64bit
>    systems, and not applicable to Xen these days.
>  * Have Xen turn the PIT off when it isn't being used as the timesource.  Its
>    a waste of time servicing useless interrupts.
>  * Make `clocksource=pit` not enter an infinite loop on these systems
>  * Drop all references to `noapic`.  These days, the only thing it will ever
>    do is make a bad situation worse.
> ---
>  xen/arch/x86/hpet.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++-
>  1 file changed, 66 insertions(+), 1 deletion(-)
>
> diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
> index e6fab8acd8..f9541af537 100644
> --- a/xen/arch/x86/hpet.c
> +++ b/xen/arch/x86/hpet.c

<snip>

> +        /*
> +         * The exact period doesn't have to match a legacy PIT.  All we need
> +         * is an interrupt queued up via the IO-APIC to check routing.
> +         *
> +         * Use HZ as the frequency.
> +         */
> +        ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;

hpet.c: In function ‘hpet_setup’:
hpet.c:828:70: error: expected ‘;’ before ‘)’ token
  828 |   ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;

Missing an additional leading '('
ticks = ((SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;

Thanks,
Jason
Andrew Cooper Jan. 26, 2021, 5:31 p.m. UTC | #3
On 07/01/2021 13:53, Jan Beulich wrote:
>> +         !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) )
>> +    {
>> +        unsigned int c0_cfg, ticks, count;
>> +
>> +        /* Stop the main counter. */
>> +        hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG);
>> +
>> +        /* Reconfigure channel 0 to be 32bit periodic. */
>> +        c0_cfg = hpet_read32(HPET_Tn_CFG(0));
>> +        c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
>> +                   HPET_TN_32BIT);
>> +        hpet_write32(c0_cfg, HPET_Tn_CFG(0));
>> +
>> +        /*
>> +         * The exact period doesn't have to match a legacy PIT.  All we need
>> +         * is an interrupt queued up via the IO-APIC to check routing.
>> +         *
>> +         * Use HZ as the frequency.
>> +         */
>> +        ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;
>> +
>> +        count = hpet_read32(HPET_COUNTER);
>> +
>> +        /*
>> +         * HPET_TN_SETVAL above is atrociously documented in the spec.
>> +         *
>> +         * Periodic HPET channels have a main comparator register, and cache
>> +         * the "last write to cmp", as a hidden register.
>> +         *
>> +         * The semantics on generating a periodic interrupt is:
>> +         *   cmp += "last value written to the cmp register"
>> +         * which will reload a new period.
>> +         *
>> +         * Normally, writes to cmp update the main comparator as well as being
>> +         * cached for the reload value.  However, under these semantics, the
>> +         * HPET main counter needs resetting to 0 to be able to configure the
>> +         * period correctly.
>> +         *
>> +         * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can
>> +         * use for periodic timers to mean that the second write to the
>> +         * comparator updates only the "last written" cache, and not the
>> +         * absolute comparator value.
>> +         *
>> +         * This lets us set a period when the main counter isn't at 0.
>> +         */
>> +        hpet_write32(count + ticks, HPET_Tn_CMP(0));
>> +        hpet_write32(ticks,         HPET_Tn_CMP(0));
> As you say, documentation is poor here. In fact the wording in the
> HPET spec talks about updating of the "accumulator" instead;
> perhaps just an unhelpful use of a different term. (Respective
> Linux code has a comment indicating this is needed on a specific
> AMD chipset only.)

I'm fairly certain that Linux's comment is wrong.  This behaviour is
described in the HPET spec, which is an Intel spec.

It smells very much like a bug discovered during software bringup on
early alpha platforms with an HPET, and fixed at v0.9 of the spec (or
thereabouts).  I can entirely believe that it is the kind of thing which
would have been fixed in beta silicon before the spec was formally updated.

> It would seem more natural to me if things were explained a little
> differently: Writes to the comparator with SETVAL clear always
> update the "last written" value, i.e. the increment to be used
> once the main counter equals the comparator. Writes with SETVAL set
> update the comparator itself. (Assuming that's how it really is, of
> course, but that's what I infer from the doc available.)

By default, all writes update both the accumulator and comparator registers.

Except for the 2nd write of CMP following a write of SETVAL, where only
the accumulator is updated, and the comparator retains its old value.


I can only guess at the horrors for the internals of the HPET to make
this work.

SETVAL is RAZ so can't be observed in the CFG register.  To get the
observed semantics, every write to CMG will have to copy the SEVAL bit
from CFG to a local flipflop, and on the falling edge of bit, forgo the
comparator update.

This also smells of being a "least invasive fix" at a late point in
development.

> Linux additionally puts udelay(1) between the two writes. Do you
> think we're fine without such, on all platforms?

HPETs substantially predate 64bit capable systems.

There is no spec requirement for a pause, and there is a good chance
that implementation bugs like that were shaken out back in the 32bit days.

>> +        /* Restart the main counter, and legacy mode. */
>> +        hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG);
> This isn't necessarily "restart" - you may start the counter for
> the first time. Hence in the comment maybe "(Re)start ..."?

Well - it is strictly a restart because of how the logic is laid out,
and even if that weren't the case, "restart" is fine to use in this context.

> As to the spurious IRQs, does it perhaps matter at which point
> CFG_LEGACY gets set? We could try setting it when clearing
> CFG_ENABLE further up, or we could also try two separate writes
> each setting just one of the bits. (At least I can't deduce
> from the spec that we ought to be prepared to observe spurious
> IRQs here.)

It will an (extra) electrical load stabilising on the line into the
PIC.  Repositioning control writes of the HPET won't make a difference.

~Andrew
Jan Beulich Jan. 27, 2021, 9:28 a.m. UTC | #4
On 26.01.2021 18:31, Andrew Cooper wrote:
> On 07/01/2021 13:53, Jan Beulich wrote:
>>> +         !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) )
>>> +    {
>>> +        unsigned int c0_cfg, ticks, count;
>>> +
>>> +        /* Stop the main counter. */
>>> +        hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG);
>>> +
>>> +        /* Reconfigure channel 0 to be 32bit periodic. */
>>> +        c0_cfg = hpet_read32(HPET_Tn_CFG(0));
>>> +        c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
>>> +                   HPET_TN_32BIT);
>>> +        hpet_write32(c0_cfg, HPET_Tn_CFG(0));
>>> +
>>> +        /*
>>> +         * The exact period doesn't have to match a legacy PIT.  All we need
>>> +         * is an interrupt queued up via the IO-APIC to check routing.
>>> +         *
>>> +         * Use HZ as the frequency.
>>> +         */
>>> +        ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;
>>> +
>>> +        count = hpet_read32(HPET_COUNTER);
>>> +
>>> +        /*
>>> +         * HPET_TN_SETVAL above is atrociously documented in the spec.
>>> +         *
>>> +         * Periodic HPET channels have a main comparator register, and cache
>>> +         * the "last write to cmp", as a hidden register.
>>> +         *
>>> +         * The semantics on generating a periodic interrupt is:
>>> +         *   cmp += "last value written to the cmp register"
>>> +         * which will reload a new period.
>>> +         *
>>> +         * Normally, writes to cmp update the main comparator as well as being
>>> +         * cached for the reload value.  However, under these semantics, the
>>> +         * HPET main counter needs resetting to 0 to be able to configure the
>>> +         * period correctly.
>>> +         *
>>> +         * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can
>>> +         * use for periodic timers to mean that the second write to the
>>> +         * comparator updates only the "last written" cache, and not the
>>> +         * absolute comparator value.
>>> +         *
>>> +         * This lets us set a period when the main counter isn't at 0.
>>> +         */
>>> +        hpet_write32(count + ticks, HPET_Tn_CMP(0));
>>> +        hpet_write32(ticks,         HPET_Tn_CMP(0));
>> As you say, documentation is poor here. In fact the wording in the
>> HPET spec talks about updating of the "accumulator" instead;
>> perhaps just an unhelpful use of a different term. (Respective
>> Linux code has a comment indicating this is needed on a specific
>> AMD chipset only.)
> 
> I'm fairly certain that Linux's comment is wrong.  This behaviour is
> described in the HPET spec, which is an Intel spec.

I agree. I nevertheless wanted to mention it because it has been
there all the time, for many years.

>> It would seem more natural to me if things were explained a little
>> differently: Writes to the comparator with SETVAL clear always
>> update the "last written" value, i.e. the increment to be used
>> once the main counter equals the comparator. Writes with SETVAL set
>> update the comparator itself. (Assuming that's how it really is, of
>> course, but that's what I infer from the doc available.)
> 
> By default, all writes update both the accumulator and comparator registers.
> 
> Except for the 2nd write of CMP following a write of SETVAL, where only
> the accumulator is updated, and the comparator retains its old value.

IOW you don't want to change how you describe it. I found it
somewhat confusing (or at least not clear enough) to read this
way, so had to compare and match the comment with the spec to
follow what exactly you mean.

>>> +        /* Restart the main counter, and legacy mode. */
>>> +        hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG);
>> This isn't necessarily "restart" - you may start the counter for
>> the first time. Hence in the comment maybe "(Re)start ..."?
> 
> Well - it is strictly a restart because of how the logic is laid out,
> and even if that weren't the case, "restart" is fine to use in this context.

I don't follow you here. If it was strictly a restart, where
is the place where we first started it? Higher up you have

        /* Stop the main counter. */
        hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG);

which of course stops the counter only if HPET_CFG_ENABLE
was set at that point. In find it less relevant to point out
this fact in the comment here, than I do in the "restart"
case; in fact I wouldn't have commented at all if the
comment simply was saying "Start ...".

Furthermore, wrt legacy mode I'm even less convinced there's
necessarily any _re_-starting.

>> As to the spurious IRQs, does it perhaps matter at which point
>> CFG_LEGACY gets set? We could try setting it when clearing
>> CFG_ENABLE further up, or we could also try two separate writes
>> each setting just one of the bits. (At least I can't deduce
>> from the spec that we ought to be prepared to observe spurious
>> IRQs here.)
> 
> It will an (extra) electrical load stabilising on the line into the
> PIC.  Repositioning control writes of the HPET won't make a difference.

I didn't think chipsets, where all of the various functionality
is bundled in a single chip, would necessarily use electrical
signals over an actual "line", but instead components would act
on a set of shared registers, using effectively software along
the lines of e.g. our vPIC emulation state machine. But I may
be entirely wrong with this ...

Assuming my implications above are correct that you don't really
want to change any of the commentary, I guess I don't want to
further hold this up, so preferably with the one pair of missing
parentheses added
Acked-by: Jan Beulich <jbeulich@suse.com>

Jan
Andrew Cooper Jan. 27, 2021, 11:36 a.m. UTC | #5
On 27/01/2021 09:28, Jan Beulich wrote:
> On 26.01.2021 18:31, Andrew Cooper wrote:
>> On 07/01/2021 13:53, Jan Beulich wrote:
>>>> +         !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) )
>>>> +    {
>>>> +        unsigned int c0_cfg, ticks, count;
>>>> +
>>>> +        /* Stop the main counter. */
>>>> +        hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG);
>>>> +
>>>> +        /* Reconfigure channel 0 to be 32bit periodic. */
>>>> +        c0_cfg = hpet_read32(HPET_Tn_CFG(0));
>>>> +        c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
>>>> +                   HPET_TN_32BIT);
>>>> +        hpet_write32(c0_cfg, HPET_Tn_CFG(0));
>>>> +
>>>> +        /*
>>>> +         * The exact period doesn't have to match a legacy PIT.  All we need
>>>> +         * is an interrupt queued up via the IO-APIC to check routing.
>>>> +         *
>>>> +         * Use HZ as the frequency.
>>>> +         */
>>>> +        ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;
>>>> +
>>>> +        count = hpet_read32(HPET_COUNTER);
>>>> +
>>>> +        /*
>>>> +         * HPET_TN_SETVAL above is atrociously documented in the spec.
>>>> +         *
>>>> +         * Periodic HPET channels have a main comparator register, and cache
>>>> +         * the "last write to cmp", as a hidden register.
>>>> +         *
>>>> +         * The semantics on generating a periodic interrupt is:
>>>> +         *   cmp += "last value written to the cmp register"
>>>> +         * which will reload a new period.
>>>> +         *
>>>> +         * Normally, writes to cmp update the main comparator as well as being
>>>> +         * cached for the reload value.  However, under these semantics, the
>>>> +         * HPET main counter needs resetting to 0 to be able to configure the
>>>> +         * period correctly.
>>>> +         *
>>>> +         * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can
>>>> +         * use for periodic timers to mean that the second write to the
>>>> +         * comparator updates only the "last written" cache, and not the
>>>> +         * absolute comparator value.
>>>> +         *
>>>> +         * This lets us set a period when the main counter isn't at 0.
>>>> +         */
>>>> +        hpet_write32(count + ticks, HPET_Tn_CMP(0));
>>>> +        hpet_write32(ticks,         HPET_Tn_CMP(0));
>>> As you say, documentation is poor here. In fact the wording in the
>>> HPET spec talks about updating of the "accumulator" instead;
>>> perhaps just an unhelpful use of a different term. (Respective
>>> Linux code has a comment indicating this is needed on a specific
>>> AMD chipset only.)
>> I'm fairly certain that Linux's comment is wrong.  This behaviour is
>> described in the HPET spec, which is an Intel spec.
> I agree. I nevertheless wanted to mention it because it has been
> there all the time, for many years.
>
>>> It would seem more natural to me if things were explained a little
>>> differently: Writes to the comparator with SETVAL clear always
>>> update the "last written" value, i.e. the increment to be used
>>> once the main counter equals the comparator. Writes with SETVAL set
>>> update the comparator itself. (Assuming that's how it really is, of
>>> course, but that's what I infer from the doc available.)
>> By default, all writes update both the accumulator and comparator registers.
>>
>> Except for the 2nd write of CMP following a write of SETVAL, where only
>> the accumulator is updated, and the comparator retains its old value.
> IOW you don't want to change how you describe it. I found it
> somewhat confusing (or at least not clear enough) to read this
> way, so had to compare and match the comment with the spec to
> follow what exactly you mean.

I can see about trying to make my description more clear, but your
attempt to summarise it is incorrect.  You've got the behaviour
backwards WRT when SETVAL is logically set.

It is the second write to CMP after SETVAL which is special, not the
first write.

>
>>>> +        /* Restart the main counter, and legacy mode. */
>>>> +        hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG);
>>> This isn't necessarily "restart" - you may start the counter for
>>> the first time. Hence in the comment maybe "(Re)start ..."?
>> Well - it is strictly a restart because of how the logic is laid out,
>> and even if that weren't the case, "restart" is fine to use in this context.
> I don't follow you here. If it was strictly a restart, where
> is the place where we first started it? Higher up you have
>
>         /* Stop the main counter. */
>         hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG);
>
> which of course stops the counter only if HPET_CFG_ENABLE
> was set at that point. In find it less relevant to point out
> this fact in the comment here, than I do in the "restart"
> case; in fact I wouldn't have commented at all if the
> comment simply was saying "Start ...".
>
> Furthermore, wrt legacy mode I'm even less convinced there's
> necessarily any _re_-starting.

"stop" and "restart" are a perfectly valid pair to use together, in this
and other contexts.

>
>>> As to the spurious IRQs, does it perhaps matter at which point
>>> CFG_LEGACY gets set? We could try setting it when clearing
>>> CFG_ENABLE further up, or we could also try two separate writes
>>> each setting just one of the bits. (At least I can't deduce
>>> from the spec that we ought to be prepared to observe spurious
>>> IRQs here.)
>> It will an (extra) electrical load stabilising on the line into the
>> PIC.  Repositioning control writes of the HPET won't make a difference.
> I didn't think chipsets, where all of the various functionality
> is bundled in a single chip, would necessarily use electrical
> signals over an actual "line", but instead components would act
> on a set of shared registers, using effectively software along
> the lines of e.g. our vPIC emulation state machine. But I may
> be entirely wrong with this ...

The legacy silicon block won't have changed in this regard.

Back when there were PCI lanes, they had to be real wires.  PCIe will
have had to mux INTx onto those real wires.  These days even with an
IO-APIC, some GPIO pins are still real wires.

Everything upstream of the PIC/IOAPIC is message based (both devices
actually send MSIs since the P4 era).  Nothing below it will have changed.

~Andrew
diff mbox series

Patch

diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c
index e6fab8acd8..f9541af537 100644
--- a/xen/arch/x86/hpet.c
+++ b/xen/arch/x86/hpet.c
@@ -758,7 +758,7 @@  static u32 *hpet_boot_cfg;
 u64 __init hpet_setup(void)
 {
     static u64 __initdata hpet_rate;
-    u32 hpet_id, hpet_period;
+    unsigned int hpet_id, hpet_period, hpet_cfg;
     unsigned int last, rem;
 
     if ( hpet_rate )
@@ -793,6 +793,71 @@  u64 __init hpet_setup(void)
     if ( (rem * 2) > hpet_period )
         hpet_rate++;
 
+    /*
+     * Intel chipsets from Skylake/ApolloLake onwards can statically clock
+     * gate the 8259 PIT.  This option is enabled by default in slightly later
+     * systems, as turning the PIT off is a prerequisite to entering the C11
+     * power saving state.
+     *
+     * Xen currently depends on the legacy timer interrupt being active while
+     * IRQ routing is configured.
+     *
+     * Reconfigure the HPET into legacy mode to re-establish the timer
+     * interrupt.
+     */
+    if ( hpet_id & HPET_ID_LEGSUP &&
+         !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) )
+    {
+        unsigned int c0_cfg, ticks, count;
+
+        /* Stop the main counter. */
+        hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG);
+
+        /* Reconfigure channel 0 to be 32bit periodic. */
+        c0_cfg = hpet_read32(HPET_Tn_CFG(0));
+        c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL |
+                   HPET_TN_32BIT);
+        hpet_write32(c0_cfg, HPET_Tn_CFG(0));
+
+        /*
+         * The exact period doesn't have to match a legacy PIT.  All we need
+         * is an interrupt queued up via the IO-APIC to check routing.
+         *
+         * Use HZ as the frequency.
+         */
+        ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32;
+
+        count = hpet_read32(HPET_COUNTER);
+
+        /*
+         * HPET_TN_SETVAL above is atrociously documented in the spec.
+         *
+         * Periodic HPET channels have a main comparator register, and cache
+         * the "last write to cmp", as a hidden register.
+         *
+         * The semantics on generating a periodic interrupt is:
+         *   cmp += "last value written to the cmp register"
+         * which will reload a new period.
+         *
+         * Normally, writes to cmp update the main comparator as well as being
+         * cached for the reload value.  However, under these semantics, the
+         * HPET main counter needs resetting to 0 to be able to configure the
+         * period correctly.
+         *
+         * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can
+         * use for periodic timers to mean that the second write to the
+         * comparator updates only the "last written" cache, and not the
+         * absolute comparator value.
+         *
+         * This lets us set a period when the main counter isn't at 0.
+         */
+        hpet_write32(count + ticks, HPET_Tn_CMP(0));
+        hpet_write32(ticks,         HPET_Tn_CMP(0));
+
+        /* Restart the main counter, and legacy mode. */
+        hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG);
+    }
+
     return hpet_rate;
 }