Message ID | 20210107010625.5993-1-andrew.cooper3@citrix.com (mailing list archive) |
---|---|
State | Superseded |
Headers | show |
Series | x86/timer: Fix boot on Intel systems using ITSSPRC static PIT clock gating | expand |
On 07.01.2021 02:06, Andrew Cooper wrote: > Slightly RFC. On older platforms this does generate some spurious PIC > interrupts during boot, but they're benign. I was hoping to have time to fix > those too, but I'm getting an increasing number of requests to post this > patch. We still will want to try to suppress those by the release of 4.15, and ideally even before we backport this one. > @@ -793,6 +793,71 @@ u64 __init hpet_setup(void) > if ( (rem * 2) > hpet_period ) > hpet_rate++; > > + /* > + * Intel chipsets from Skylake/ApolloLake onwards can statically clock > + * gate the 8259 PIT. This option is enabled by default in slightly later > + * systems, as turning the PIT off is a prerequisite to entering the C11 > + * power saving state. > + * > + * Xen currently depends on the legacy timer interrupt being active while > + * IRQ routing is configured. > + * > + * Reconfigure the HPET into legacy mode to re-establish the timer > + * interrupt. > + */ > + if ( hpet_id & HPET_ID_LEGSUP && Add parentheses here? > + !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) ) > + { > + unsigned int c0_cfg, ticks, count; > + > + /* Stop the main counter. */ > + hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG); > + > + /* Reconfigure channel 0 to be 32bit periodic. */ > + c0_cfg = hpet_read32(HPET_Tn_CFG(0)); > + c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL | > + HPET_TN_32BIT); > + hpet_write32(c0_cfg, HPET_Tn_CFG(0)); > + > + /* > + * The exact period doesn't have to match a legacy PIT. All we need > + * is an interrupt queued up via the IO-APIC to check routing. > + * > + * Use HZ as the frequency. > + */ > + ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32; > + > + count = hpet_read32(HPET_COUNTER); > + > + /* > + * HPET_TN_SETVAL above is atrociously documented in the spec. > + * > + * Periodic HPET channels have a main comparator register, and cache > + * the "last write to cmp", as a hidden register. > + * > + * The semantics on generating a periodic interrupt is: > + * cmp += "last value written to the cmp register" > + * which will reload a new period. > + * > + * Normally, writes to cmp update the main comparator as well as being > + * cached for the reload value. However, under these semantics, the > + * HPET main counter needs resetting to 0 to be able to configure the > + * period correctly. > + * > + * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can > + * use for periodic timers to mean that the second write to the > + * comparator updates only the "last written" cache, and not the > + * absolute comparator value. > + * > + * This lets us set a period when the main counter isn't at 0. > + */ > + hpet_write32(count + ticks, HPET_Tn_CMP(0)); > + hpet_write32(ticks, HPET_Tn_CMP(0)); As you say, documentation is poor here. In fact the wording in the HPET spec talks about updating of the "accumulator" instead; perhaps just an unhelpful use of a different term. (Respective Linux code has a comment indicating this is needed on a specific AMD chipset only.) It would seem more natural to me if things were explained a little differently: Writes to the comparator with SETVAL clear always update the "last written" value, i.e. the increment to be used once the main counter equals the comparator. Writes with SETVAL set update the comparator itself. (Assuming that's how it really is, of course, but that's what I infer from the doc available.) Linux additionally puts udelay(1) between the two writes. Do you think we're fine without such, on all platforms? > + /* Restart the main counter, and legacy mode. */ > + hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG); This isn't necessarily "restart" - you may start the counter for the first time. Hence in the comment maybe "(Re)start ..."? As to the spurious IRQs, does it perhaps matter at which point CFG_LEGACY gets set? We could try setting it when clearing CFG_ENABLE further up, or we could also try two separate writes each setting just one of the bits. (At least I can't deduce from the spec that we ought to be prepared to observe spurious IRQs here.) Jan
On Wed, Jan 6, 2021 at 8:06 PM Andrew Cooper <andrew.cooper3@citrix.com> wrote: > > Recent Intel client devices have disabled the legacy PIT for powersaving > reasons, breaking compatibility with a traditional IBM PC. Xen depends on a > legacy timer interrupt to check that the IO-APIC/PIC routing is configured > correctly, and fails to boot with: > > (XEN) ******************************* > (XEN) Panic on CPU 0: > (XEN) IO-APIC + timer doesn't work! Boot with apic_verbosity=debug and send report. Then try booting with the `noapic` option > (XEN) ******************************* > > While this setting can be undone by Xen, the details of how to differ by > chipset, and would be very short sighted for battery based devices. See bit 2 > "8254 Static Clock Gating Enable" in: > > https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/comet-lake-u/intel-400-series-chipset-on-package-platform-controller-hub-register-database/itss-power-reduction-control-itssprc-offset-3300/ > > All impacted systems have an HPET, but there is no indication of the absence > of PIT functionality, nor a suitable way to probe for its absence. As a short > term fix, reconfigure the HPET into legacy replacement mode. A better > longterm fix would be to avoid the reliance on the timer interrupt entirely. > > Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> Tested-by: Jason Andryuk <jandryuk@gmail.com> On the Dell 7200 that couldn't boot without Xen modification, but with a build fix below. > --- > CC: Jan Beulich <JBeulich@suse.com> > CC: Roger Pau Monné <roger.pau@citrix.com> > CC: Wei Liu <wl@xen.org> > CC: Aaron Janse <aaron@ajanse.me> > CC: Jason Andryuk <jandryuk@gmail.com> > CC: Ondrej Balaz <blami@blami.net> > CC: Tamas K Lengyel <tamas@tklengyel.com> > CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> > > Slightly RFC. On older platforms this does generate some spurious PIC > interrupts during boot, but they're benign. I was hoping to have time to fix > those too, but I'm getting an increasing number of requests to post this > patch. Spurious being the timer interrupt is now running? In my local patches, I have a function clear HPET_CFG_LEGACY after check_timer(): diff --git a/xen/arch/x86/io_apic.c b/xen/arch/x86/io_apic.c index e98e08e9c8..b62dea190a 100644 --- a/xen/arch/x86/io_apic.c +++ b/xen/arch/x86/io_apic.c @@ -2047,6 +2048,7 @@ void __init setup_IO_APIC(void) setup_IO_APIC_irqs(); init_IO_APIC_traps(); check_timer(); + hpet_disable_legacy(); print_IO_APIC(); ioapic_pm_state_alloc(); > Other followup actions: > * Overhaul our setup logic. Large quantities of it is legacy for pre-64bit > systems, and not applicable to Xen these days. > * Have Xen turn the PIT off when it isn't being used as the timesource. Its > a waste of time servicing useless interrupts. > * Make `clocksource=pit` not enter an infinite loop on these systems > * Drop all references to `noapic`. These days, the only thing it will ever > do is make a bad situation worse. > --- > xen/arch/x86/hpet.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++- > 1 file changed, 66 insertions(+), 1 deletion(-) > > diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c > index e6fab8acd8..f9541af537 100644 > --- a/xen/arch/x86/hpet.c > +++ b/xen/arch/x86/hpet.c <snip> > + /* > + * The exact period doesn't have to match a legacy PIT. All we need > + * is an interrupt queued up via the IO-APIC to check routing. > + * > + * Use HZ as the frequency. > + */ > + ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32; hpet.c: In function ‘hpet_setup’: hpet.c:828:70: error: expected ‘;’ before ‘)’ token 828 | ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32; Missing an additional leading '(' ticks = ((SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32; Thanks, Jason
On 07/01/2021 13:53, Jan Beulich wrote: >> + !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) ) >> + { >> + unsigned int c0_cfg, ticks, count; >> + >> + /* Stop the main counter. */ >> + hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG); >> + >> + /* Reconfigure channel 0 to be 32bit periodic. */ >> + c0_cfg = hpet_read32(HPET_Tn_CFG(0)); >> + c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL | >> + HPET_TN_32BIT); >> + hpet_write32(c0_cfg, HPET_Tn_CFG(0)); >> + >> + /* >> + * The exact period doesn't have to match a legacy PIT. All we need >> + * is an interrupt queued up via the IO-APIC to check routing. >> + * >> + * Use HZ as the frequency. >> + */ >> + ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32; >> + >> + count = hpet_read32(HPET_COUNTER); >> + >> + /* >> + * HPET_TN_SETVAL above is atrociously documented in the spec. >> + * >> + * Periodic HPET channels have a main comparator register, and cache >> + * the "last write to cmp", as a hidden register. >> + * >> + * The semantics on generating a periodic interrupt is: >> + * cmp += "last value written to the cmp register" >> + * which will reload a new period. >> + * >> + * Normally, writes to cmp update the main comparator as well as being >> + * cached for the reload value. However, under these semantics, the >> + * HPET main counter needs resetting to 0 to be able to configure the >> + * period correctly. >> + * >> + * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can >> + * use for periodic timers to mean that the second write to the >> + * comparator updates only the "last written" cache, and not the >> + * absolute comparator value. >> + * >> + * This lets us set a period when the main counter isn't at 0. >> + */ >> + hpet_write32(count + ticks, HPET_Tn_CMP(0)); >> + hpet_write32(ticks, HPET_Tn_CMP(0)); > As you say, documentation is poor here. In fact the wording in the > HPET spec talks about updating of the "accumulator" instead; > perhaps just an unhelpful use of a different term. (Respective > Linux code has a comment indicating this is needed on a specific > AMD chipset only.) I'm fairly certain that Linux's comment is wrong. This behaviour is described in the HPET spec, which is an Intel spec. It smells very much like a bug discovered during software bringup on early alpha platforms with an HPET, and fixed at v0.9 of the spec (or thereabouts). I can entirely believe that it is the kind of thing which would have been fixed in beta silicon before the spec was formally updated. > It would seem more natural to me if things were explained a little > differently: Writes to the comparator with SETVAL clear always > update the "last written" value, i.e. the increment to be used > once the main counter equals the comparator. Writes with SETVAL set > update the comparator itself. (Assuming that's how it really is, of > course, but that's what I infer from the doc available.) By default, all writes update both the accumulator and comparator registers. Except for the 2nd write of CMP following a write of SETVAL, where only the accumulator is updated, and the comparator retains its old value. I can only guess at the horrors for the internals of the HPET to make this work. SETVAL is RAZ so can't be observed in the CFG register. To get the observed semantics, every write to CMG will have to copy the SEVAL bit from CFG to a local flipflop, and on the falling edge of bit, forgo the comparator update. This also smells of being a "least invasive fix" at a late point in development. > Linux additionally puts udelay(1) between the two writes. Do you > think we're fine without such, on all platforms? HPETs substantially predate 64bit capable systems. There is no spec requirement for a pause, and there is a good chance that implementation bugs like that were shaken out back in the 32bit days. >> + /* Restart the main counter, and legacy mode. */ >> + hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG); > This isn't necessarily "restart" - you may start the counter for > the first time. Hence in the comment maybe "(Re)start ..."? Well - it is strictly a restart because of how the logic is laid out, and even if that weren't the case, "restart" is fine to use in this context. > As to the spurious IRQs, does it perhaps matter at which point > CFG_LEGACY gets set? We could try setting it when clearing > CFG_ENABLE further up, or we could also try two separate writes > each setting just one of the bits. (At least I can't deduce > from the spec that we ought to be prepared to observe spurious > IRQs here.) It will an (extra) electrical load stabilising on the line into the PIC. Repositioning control writes of the HPET won't make a difference. ~Andrew
On 26.01.2021 18:31, Andrew Cooper wrote: > On 07/01/2021 13:53, Jan Beulich wrote: >>> + !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) ) >>> + { >>> + unsigned int c0_cfg, ticks, count; >>> + >>> + /* Stop the main counter. */ >>> + hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG); >>> + >>> + /* Reconfigure channel 0 to be 32bit periodic. */ >>> + c0_cfg = hpet_read32(HPET_Tn_CFG(0)); >>> + c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL | >>> + HPET_TN_32BIT); >>> + hpet_write32(c0_cfg, HPET_Tn_CFG(0)); >>> + >>> + /* >>> + * The exact period doesn't have to match a legacy PIT. All we need >>> + * is an interrupt queued up via the IO-APIC to check routing. >>> + * >>> + * Use HZ as the frequency. >>> + */ >>> + ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32; >>> + >>> + count = hpet_read32(HPET_COUNTER); >>> + >>> + /* >>> + * HPET_TN_SETVAL above is atrociously documented in the spec. >>> + * >>> + * Periodic HPET channels have a main comparator register, and cache >>> + * the "last write to cmp", as a hidden register. >>> + * >>> + * The semantics on generating a periodic interrupt is: >>> + * cmp += "last value written to the cmp register" >>> + * which will reload a new period. >>> + * >>> + * Normally, writes to cmp update the main comparator as well as being >>> + * cached for the reload value. However, under these semantics, the >>> + * HPET main counter needs resetting to 0 to be able to configure the >>> + * period correctly. >>> + * >>> + * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can >>> + * use for periodic timers to mean that the second write to the >>> + * comparator updates only the "last written" cache, and not the >>> + * absolute comparator value. >>> + * >>> + * This lets us set a period when the main counter isn't at 0. >>> + */ >>> + hpet_write32(count + ticks, HPET_Tn_CMP(0)); >>> + hpet_write32(ticks, HPET_Tn_CMP(0)); >> As you say, documentation is poor here. In fact the wording in the >> HPET spec talks about updating of the "accumulator" instead; >> perhaps just an unhelpful use of a different term. (Respective >> Linux code has a comment indicating this is needed on a specific >> AMD chipset only.) > > I'm fairly certain that Linux's comment is wrong. This behaviour is > described in the HPET spec, which is an Intel spec. I agree. I nevertheless wanted to mention it because it has been there all the time, for many years. >> It would seem more natural to me if things were explained a little >> differently: Writes to the comparator with SETVAL clear always >> update the "last written" value, i.e. the increment to be used >> once the main counter equals the comparator. Writes with SETVAL set >> update the comparator itself. (Assuming that's how it really is, of >> course, but that's what I infer from the doc available.) > > By default, all writes update both the accumulator and comparator registers. > > Except for the 2nd write of CMP following a write of SETVAL, where only > the accumulator is updated, and the comparator retains its old value. IOW you don't want to change how you describe it. I found it somewhat confusing (or at least not clear enough) to read this way, so had to compare and match the comment with the spec to follow what exactly you mean. >>> + /* Restart the main counter, and legacy mode. */ >>> + hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG); >> This isn't necessarily "restart" - you may start the counter for >> the first time. Hence in the comment maybe "(Re)start ..."? > > Well - it is strictly a restart because of how the logic is laid out, > and even if that weren't the case, "restart" is fine to use in this context. I don't follow you here. If it was strictly a restart, where is the place where we first started it? Higher up you have /* Stop the main counter. */ hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG); which of course stops the counter only if HPET_CFG_ENABLE was set at that point. In find it less relevant to point out this fact in the comment here, than I do in the "restart" case; in fact I wouldn't have commented at all if the comment simply was saying "Start ...". Furthermore, wrt legacy mode I'm even less convinced there's necessarily any _re_-starting. >> As to the spurious IRQs, does it perhaps matter at which point >> CFG_LEGACY gets set? We could try setting it when clearing >> CFG_ENABLE further up, or we could also try two separate writes >> each setting just one of the bits. (At least I can't deduce >> from the spec that we ought to be prepared to observe spurious >> IRQs here.) > > It will an (extra) electrical load stabilising on the line into the > PIC. Repositioning control writes of the HPET won't make a difference. I didn't think chipsets, where all of the various functionality is bundled in a single chip, would necessarily use electrical signals over an actual "line", but instead components would act on a set of shared registers, using effectively software along the lines of e.g. our vPIC emulation state machine. But I may be entirely wrong with this ... Assuming my implications above are correct that you don't really want to change any of the commentary, I guess I don't want to further hold this up, so preferably with the one pair of missing parentheses added Acked-by: Jan Beulich <jbeulich@suse.com> Jan
On 27/01/2021 09:28, Jan Beulich wrote: > On 26.01.2021 18:31, Andrew Cooper wrote: >> On 07/01/2021 13:53, Jan Beulich wrote: >>>> + !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) ) >>>> + { >>>> + unsigned int c0_cfg, ticks, count; >>>> + >>>> + /* Stop the main counter. */ >>>> + hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG); >>>> + >>>> + /* Reconfigure channel 0 to be 32bit periodic. */ >>>> + c0_cfg = hpet_read32(HPET_Tn_CFG(0)); >>>> + c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL | >>>> + HPET_TN_32BIT); >>>> + hpet_write32(c0_cfg, HPET_Tn_CFG(0)); >>>> + >>>> + /* >>>> + * The exact period doesn't have to match a legacy PIT. All we need >>>> + * is an interrupt queued up via the IO-APIC to check routing. >>>> + * >>>> + * Use HZ as the frequency. >>>> + */ >>>> + ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32; >>>> + >>>> + count = hpet_read32(HPET_COUNTER); >>>> + >>>> + /* >>>> + * HPET_TN_SETVAL above is atrociously documented in the spec. >>>> + * >>>> + * Periodic HPET channels have a main comparator register, and cache >>>> + * the "last write to cmp", as a hidden register. >>>> + * >>>> + * The semantics on generating a periodic interrupt is: >>>> + * cmp += "last value written to the cmp register" >>>> + * which will reload a new period. >>>> + * >>>> + * Normally, writes to cmp update the main comparator as well as being >>>> + * cached for the reload value. However, under these semantics, the >>>> + * HPET main counter needs resetting to 0 to be able to configure the >>>> + * period correctly. >>>> + * >>>> + * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can >>>> + * use for periodic timers to mean that the second write to the >>>> + * comparator updates only the "last written" cache, and not the >>>> + * absolute comparator value. >>>> + * >>>> + * This lets us set a period when the main counter isn't at 0. >>>> + */ >>>> + hpet_write32(count + ticks, HPET_Tn_CMP(0)); >>>> + hpet_write32(ticks, HPET_Tn_CMP(0)); >>> As you say, documentation is poor here. In fact the wording in the >>> HPET spec talks about updating of the "accumulator" instead; >>> perhaps just an unhelpful use of a different term. (Respective >>> Linux code has a comment indicating this is needed on a specific >>> AMD chipset only.) >> I'm fairly certain that Linux's comment is wrong. This behaviour is >> described in the HPET spec, which is an Intel spec. > I agree. I nevertheless wanted to mention it because it has been > there all the time, for many years. > >>> It would seem more natural to me if things were explained a little >>> differently: Writes to the comparator with SETVAL clear always >>> update the "last written" value, i.e. the increment to be used >>> once the main counter equals the comparator. Writes with SETVAL set >>> update the comparator itself. (Assuming that's how it really is, of >>> course, but that's what I infer from the doc available.) >> By default, all writes update both the accumulator and comparator registers. >> >> Except for the 2nd write of CMP following a write of SETVAL, where only >> the accumulator is updated, and the comparator retains its old value. > IOW you don't want to change how you describe it. I found it > somewhat confusing (or at least not clear enough) to read this > way, so had to compare and match the comment with the spec to > follow what exactly you mean. I can see about trying to make my description more clear, but your attempt to summarise it is incorrect. You've got the behaviour backwards WRT when SETVAL is logically set. It is the second write to CMP after SETVAL which is special, not the first write. > >>>> + /* Restart the main counter, and legacy mode. */ >>>> + hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG); >>> This isn't necessarily "restart" - you may start the counter for >>> the first time. Hence in the comment maybe "(Re)start ..."? >> Well - it is strictly a restart because of how the logic is laid out, >> and even if that weren't the case, "restart" is fine to use in this context. > I don't follow you here. If it was strictly a restart, where > is the place where we first started it? Higher up you have > > /* Stop the main counter. */ > hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG); > > which of course stops the counter only if HPET_CFG_ENABLE > was set at that point. In find it less relevant to point out > this fact in the comment here, than I do in the "restart" > case; in fact I wouldn't have commented at all if the > comment simply was saying "Start ...". > > Furthermore, wrt legacy mode I'm even less convinced there's > necessarily any _re_-starting. "stop" and "restart" are a perfectly valid pair to use together, in this and other contexts. > >>> As to the spurious IRQs, does it perhaps matter at which point >>> CFG_LEGACY gets set? We could try setting it when clearing >>> CFG_ENABLE further up, or we could also try two separate writes >>> each setting just one of the bits. (At least I can't deduce >>> from the spec that we ought to be prepared to observe spurious >>> IRQs here.) >> It will an (extra) electrical load stabilising on the line into the >> PIC. Repositioning control writes of the HPET won't make a difference. > I didn't think chipsets, where all of the various functionality > is bundled in a single chip, would necessarily use electrical > signals over an actual "line", but instead components would act > on a set of shared registers, using effectively software along > the lines of e.g. our vPIC emulation state machine. But I may > be entirely wrong with this ... The legacy silicon block won't have changed in this regard. Back when there were PCI lanes, they had to be real wires. PCIe will have had to mux INTx onto those real wires. These days even with an IO-APIC, some GPIO pins are still real wires. Everything upstream of the PIC/IOAPIC is message based (both devices actually send MSIs since the P4 era). Nothing below it will have changed. ~Andrew
diff --git a/xen/arch/x86/hpet.c b/xen/arch/x86/hpet.c index e6fab8acd8..f9541af537 100644 --- a/xen/arch/x86/hpet.c +++ b/xen/arch/x86/hpet.c @@ -758,7 +758,7 @@ static u32 *hpet_boot_cfg; u64 __init hpet_setup(void) { static u64 __initdata hpet_rate; - u32 hpet_id, hpet_period; + unsigned int hpet_id, hpet_period, hpet_cfg; unsigned int last, rem; if ( hpet_rate ) @@ -793,6 +793,71 @@ u64 __init hpet_setup(void) if ( (rem * 2) > hpet_period ) hpet_rate++; + /* + * Intel chipsets from Skylake/ApolloLake onwards can statically clock + * gate the 8259 PIT. This option is enabled by default in slightly later + * systems, as turning the PIT off is a prerequisite to entering the C11 + * power saving state. + * + * Xen currently depends on the legacy timer interrupt being active while + * IRQ routing is configured. + * + * Reconfigure the HPET into legacy mode to re-establish the timer + * interrupt. + */ + if ( hpet_id & HPET_ID_LEGSUP && + !((hpet_cfg = hpet_read32(HPET_CFG)) & HPET_CFG_LEGACY) ) + { + unsigned int c0_cfg, ticks, count; + + /* Stop the main counter. */ + hpet_write32(hpet_cfg & ~HPET_CFG_ENABLE, HPET_CFG); + + /* Reconfigure channel 0 to be 32bit periodic. */ + c0_cfg = hpet_read32(HPET_Tn_CFG(0)); + c0_cfg |= (HPET_TN_ENABLE | HPET_TN_PERIODIC | HPET_TN_SETVAL | + HPET_TN_32BIT); + hpet_write32(c0_cfg, HPET_Tn_CFG(0)); + + /* + * The exact period doesn't have to match a legacy PIT. All we need + * is an interrupt queued up via the IO-APIC to check routing. + * + * Use HZ as the frequency. + */ + ticks = (SECONDS(1) / HZ) * div_sc(hpet_rate, SECONDS(1), 32)) >> 32; + + count = hpet_read32(HPET_COUNTER); + + /* + * HPET_TN_SETVAL above is atrociously documented in the spec. + * + * Periodic HPET channels have a main comparator register, and cache + * the "last write to cmp", as a hidden register. + * + * The semantics on generating a periodic interrupt is: + * cmp += "last value written to the cmp register" + * which will reload a new period. + * + * Normally, writes to cmp update the main comparator as well as being + * cached for the reload value. However, under these semantics, the + * HPET main counter needs resetting to 0 to be able to configure the + * period correctly. + * + * Instead, HPET_TN_SETVAL is a self-clearing control bit which we can + * use for periodic timers to mean that the second write to the + * comparator updates only the "last written" cache, and not the + * absolute comparator value. + * + * This lets us set a period when the main counter isn't at 0. + */ + hpet_write32(count + ticks, HPET_Tn_CMP(0)); + hpet_write32(ticks, HPET_Tn_CMP(0)); + + /* Restart the main counter, and legacy mode. */ + hpet_write32(hpet_cfg | HPET_CFG_ENABLE | HPET_CFG_LEGACY, HPET_CFG); + } + return hpet_rate; }
Recent Intel client devices have disabled the legacy PIT for powersaving reasons, breaking compatibility with a traditional IBM PC. Xen depends on a legacy timer interrupt to check that the IO-APIC/PIC routing is configured correctly, and fails to boot with: (XEN) ******************************* (XEN) Panic on CPU 0: (XEN) IO-APIC + timer doesn't work! Boot with apic_verbosity=debug and send report. Then try booting with the `noapic` option (XEN) ******************************* While this setting can be undone by Xen, the details of how to differ by chipset, and would be very short sighted for battery based devices. See bit 2 "8254 Static Clock Gating Enable" in: https://edc.intel.com/content/www/us/en/design/products-and-solutions/processors-and-chipsets/comet-lake-u/intel-400-series-chipset-on-package-platform-controller-hub-register-database/itss-power-reduction-control-itssprc-offset-3300/ All impacted systems have an HPET, but there is no indication of the absence of PIT functionality, nor a suitable way to probe for its absence. As a short term fix, reconfigure the HPET into legacy replacement mode. A better longterm fix would be to avoid the reliance on the timer interrupt entirely. Signed-off-by: Andrew Cooper <andrew.cooper3@citrix.com> --- CC: Jan Beulich <JBeulich@suse.com> CC: Roger Pau Monné <roger.pau@citrix.com> CC: Wei Liu <wl@xen.org> CC: Aaron Janse <aaron@ajanse.me> CC: Jason Andryuk <jandryuk@gmail.com> CC: Ondrej Balaz <blami@blami.net> CC: Tamas K Lengyel <tamas@tklengyel.com> CC: Marek Marczykowski-Górecki <marmarek@invisiblethingslab.com> Slightly RFC. On older platforms this does generate some spurious PIC interrupts during boot, but they're benign. I was hoping to have time to fix those too, but I'm getting an increasing number of requests to post this patch. Other followup actions: * Overhaul our setup logic. Large quantities of it is legacy for pre-64bit systems, and not applicable to Xen these days. * Have Xen turn the PIT off when it isn't being used as the timesource. Its a waste of time servicing useless interrupts. * Make `clocksource=pit` not enter an infinite loop on these systems * Drop all references to `noapic`. These days, the only thing it will ever do is make a bad situation worse. --- xen/arch/x86/hpet.c | 67 ++++++++++++++++++++++++++++++++++++++++++++++++++++- 1 file changed, 66 insertions(+), 1 deletion(-)