Message ID | 20240105213251.4141-1-ilkka@os.amperecomputing.com (mailing list archive) |
---|---|
State | New, archived |
Headers | show |
Series | [RFC] KVM: arm64: Workaround for Ampere AC03_CPU_36 (exception taken to an incorrect EL) | expand |
Hi Ilkka, On Fri, Jan 05, 2024 at 01:32:51PM -0800, Ilkka Koskinen wrote: > Due to erratum AC03_CPU_36 on AmpereOne, if an Asynchronous Exception > (interrupts or SErrors) occurs to EL2, while EL2 software is modifying > system register bits that control EL2 exception behavior, the processor > may take an exception to an incorrect Exception Level. > > The affected system registers are HCR_EL2 and SCTLR_EL2, which contain > control bits for routing and enabling of EL2 exceptions. > > The issue is triggered when HGE.TGE bit is cleared while having > AMO/IMO/FMO bits cleared too. To avoid the exception getting taken > at a wrong Exception Level, we set AMO/IMO/FMO. We toggle HCR_EL2 for other things besides TLB invalidations, and the changelog does not describe why they're apparently unaffected. > Suggested-by: D Scott Phillips <scott@os.amperecomputing.com> > Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> This isn't an acceptable way to go about errata mitigations. Besides extremely unusual circumstances, the pattern is to use a cpucap && alternatives to only enable the workaround on affected designs. We then document the errata in the expected places (Kconfig and kernel documentation) such that the folks saddled with maintaining this stuff know how to handle it years down the line. > --- > arch/arm64/kvm/hyp/vhe/tlb.c | 12 +++++++++--- > 1 file changed, 9 insertions(+), 3 deletions(-) > > diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c > index b32e2940df7d..c72fdd2e4549 100644 > --- a/arch/arm64/kvm/hyp/vhe/tlb.c > +++ b/arch/arm64/kvm/hyp/vhe/tlb.c > @@ -61,9 +61,15 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu, > * has an ISB in order to deal with this. > */ > __load_stage2(mmu, mmu->arch); > - val = read_sysreg(hcr_el2); > - val &= ~HCR_TGE; > - write_sysreg(val, hcr_el2); > + > + /* > + * With {E2H,TGE} == {1,0}, IMO == 1 is required so that IRQs are not > + * all masked. Huh? HCR_EL2.IMO affects the *routing* of IRQs at exception levels *lower than* EL2. > This also works around AmpereOne erratum AC03_CPU_36 > + * which can incorrectly route an IRQ to EL1 when HCR_EL2.{E2H,TGE} is > + * written from {1,1} to {1,0} with interrupts unmasked. > + */ > + sysreg_clear_set(hcr_el2, HCR_TGE, HCR_AMO | HCR_IMO | HCR_FMO); > + Rather than further modifying the exception controls, why not mask DAIF like we do on guest entry/exit? AFAICT that is the only reason KVM_RUN is unaffected by this erratum. This is what I have been using on a few AmpereOne systems: From 265cb193190c13c651d8e008d34d1d18505d4804 Mon Sep 17 00:00:00 2001 From: Oliver Upton <oliver.upton@linux.dev> Date: Fri, 5 Jan 2024 23:18:14 +0000 Subject: [PATCH] KVM: arm64: Mitigate AmpereOne erratum AC03_CPU_36 The AmpereOne design suffers from an erratum where if an asynchronous exception arrives while EL2 is modifying hypervisor exception controls (i.e. HCR_EL2, SCTLR_EL2) the PE may take an invalid exception to another EL. KVM's guest entry/exit path is unaffected because DAIF is already masked when modifying HCR_EL2. However, KVM only masks IRQs when switching to the guest MMU context for a TLB invalidation, meaning it is possible for an intervening exception to trigger the erratum. This becomes immediately obvious when profiling a KVM VM with pseudo-NMIs enabled, such as: perf record ./dirty_log_perf_test -m 2 -s anonymous_thp -v 4 -b 4G Mitigate the erratum by masking DAIF completely before switching to the guest MMU context. Signed-off-by: Oliver Upton <oliver.upton@linux.dev> --- Documentation/arch/arm64/silicon-errata.rst | 2 ++ arch/arm64/Kconfig | 16 ++++++++++++++++ arch/arm64/kernel/cpu_errata.c | 7 +++++++ arch/arm64/kvm/hyp/vhe/tlb.c | 20 ++++++++++++++++++-- arch/arm64/tools/cpucaps | 1 + 5 files changed, 44 insertions(+), 2 deletions(-) diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst index f47f63bcf67c..aea7a75ae434 100644 --- a/Documentation/arch/arm64/silicon-errata.rst +++ b/Documentation/arch/arm64/silicon-errata.rst @@ -52,6 +52,8 @@ stable kernels. | Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ +| Ampere | AmpereOne | AC03_CPU_36 | AMPERE_ERRATUM_AC03_CPU_36 | ++----------------+-----------------+-----------------+-----------------------------+ | Ampere | AmpereOne | AC03_CPU_38 | AMPERE_ERRATUM_AC03_CPU_38 | +----------------+-----------------+-----------------+-----------------------------+ +----------------+-----------------+-----------------+-----------------------------+ diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig index 7b071a00425d..9ba04b90e945 100644 --- a/arch/arm64/Kconfig +++ b/arch/arm64/Kconfig @@ -434,6 +434,22 @@ config AMPERE_ERRATUM_AC03_CPU_38 If unsure, say Y. +config AMPERE_ERRATUM_AC03_CPU_36 + bool "AmpereOne: AC03_CPU_36: CPU takes invalid asynchronous exception while changing exception controls" + default y + help + This option adds an alternative code sequence to work around Ampere + erratum AC03_CPU_36 on AmpereOne parts. + + The affected designs can take an invalid exception to an incorrect + exception level if an asynchronous exception arrives while software + is changing the EL2 exception controls (i.e. HCR_EL2, SCTLR_EL2). + + The workaround forces KVM to mask all asynchronous exception sources + when switching to the guest MMU context for TLB invalidations. + + If unsure, say Y. + config ARM64_WORKAROUND_CLEAN_CACHE bool diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c index e29e0fea63fb..7e2856360f38 100644 --- a/arch/arm64/kernel/cpu_errata.c +++ b/arch/arm64/kernel/cpu_errata.c @@ -727,6 +727,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = { .capability = ARM64_WORKAROUND_AMPERE_AC03_CPU_38, ERRATA_MIDR_ALL_VERSIONS(MIDR_AMPERE1), }, +#endif +#ifdef CONFIG_AMPERE_ERRATUM_AC03_CPU_36 + { + .desc = "AmpereOne erratum AC03_CPU_36", + .capability = ARM64_WORKAROUND_AMPERE_AC03_CPU_36, + ERRATA_MIDR_ALL_VERSIONS(MIDR_AMPERE1), + }, #endif { } diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c index b636b4111dbf..cedfb0a32fa0 100644 --- a/arch/arm64/kvm/hyp/vhe/tlb.c +++ b/arch/arm64/kvm/hyp/vhe/tlb.c @@ -17,13 +17,29 @@ struct tlb_inv_context { u64 sctlr; }; +#define __tlb_daif_save(flags) \ +({ \ + if (cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_36)) \ + flags = local_daif_save(); \ + else \ + local_irq_save(flags); \ +}) + +#define __tlb_daif_restore(flags) \ +({ \ + if (cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_36)) \ + local_daif_restore(flags); \ + else \ + local_irq_restore(flags); \ +}) + static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu, struct tlb_inv_context *cxt) { struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); u64 val; - local_irq_save(cxt->flags); + __tlb_daif_save(cxt->flags); if (vcpu && mmu != vcpu->arch.hw_mmu) cxt->mmu = vcpu->arch.hw_mmu; @@ -86,7 +102,7 @@ static void __tlb_switch_to_host(struct tlb_inv_context *cxt) write_sysreg_el1(cxt->sctlr, SYS_SCTLR); } - local_irq_restore(cxt->flags); + __tlb_daif_restore(cxt->flags); } void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps index b98c38288a9d..cfaae3b4d2d0 100644 --- a/arch/arm64/tools/cpucaps +++ b/arch/arm64/tools/cpucaps @@ -85,6 +85,7 @@ WORKAROUND_2457168 WORKAROUND_2645198 WORKAROUND_2658417 WORKAROUND_2966298 +WORKAROUND_AMPERE_AC03_CPU_36 WORKAROUND_AMPERE_AC03_CPU_38 WORKAROUND_TRBE_OVERWRITE_FILL_MODE WORKAROUND_TSB_FLUSH_FAILURE
[reviewing both patches in one go, as it is way easier] On Fri, 05 Jan 2024 23:53:10 +0000, Oliver Upton <oliver.upton@linux.dev> wrote: > > Hi Ilkka, > > On Fri, Jan 05, 2024 at 01:32:51PM -0800, Ilkka Koskinen wrote: > > Due to erratum AC03_CPU_36 on AmpereOne, if an Asynchronous Exception > > (interrupts or SErrors) occurs to EL2, while EL2 software is modifying > > system register bits that control EL2 exception behavior, the processor > > may take an exception to an incorrect Exception Level. What needs to be described (both in the commit message and as part of the code) is under what circumstances this mis-routing happens. Is it that just clearing TGE while being at EL2 always results in the asynchronous exception being routed to the wrong exception level? Or is it a more subtle issue related to synchronisation? Also worth describing is to which other exception level is the exception delivered? EL1? EL3? > > > > The affected system registers are HCR_EL2 and SCTLR_EL2, which contain > > control bits for routing and enabling of EL2 exceptions. How does SCTLR_EL2 affects interrupt delivery? Is this related to FEAT_NMI and SCTLR_EL2.{NMI,SPINTMASK}? Because this is the only part of this register that has anything to do with interrupts. > > > > The issue is triggered when HGE.TGE bit is cleared while having > > AMO/IMO/FMO bits cleared too. To avoid the exception getting taken > > at a wrong Exception Level, we set AMO/IMO/FMO. > > We toggle HCR_EL2 for other things besides TLB invalidations, and the > changelog does not describe why they're apparently unaffected. > > > Suggested-by: D Scott Phillips <scott@os.amperecomputing.com> > > Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> > > This isn't an acceptable way to go about errata mitigations. Besides > extremely unusual circumstances, the pattern is to use a cpucap && > alternatives to only enable the workaround on affected designs. We then > document the errata in the expected places (Kconfig and kernel > documentation) such that the folks saddled with maintaining this stuff > know how to handle it years down the line. +1. This hack will have to live forever, while the lack of documentation makes it totally unmaintainable. The KVM code *will* change in ways that cannot be anticipated today, and without exhaustive documentation, we will not be able to do a good job at maintaining this system alive by correctly mitigating the erratum. > > > --- > > arch/arm64/kvm/hyp/vhe/tlb.c | 12 +++++++++--- > > 1 file changed, 9 insertions(+), 3 deletions(-) > > > > diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c > > index b32e2940df7d..c72fdd2e4549 100644 > > --- a/arch/arm64/kvm/hyp/vhe/tlb.c > > +++ b/arch/arm64/kvm/hyp/vhe/tlb.c > > @@ -61,9 +61,15 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu, > > * has an ISB in order to deal with this. > > */ > > __load_stage2(mmu, mmu->arch); > > - val = read_sysreg(hcr_el2); > > - val &= ~HCR_TGE; > > - write_sysreg(val, hcr_el2); > > + > > + /* > > + * With {E2H,TGE} == {1,0}, IMO == 1 is required so that IRQs are not > > + * all masked. > > Huh? HCR_EL2.IMO affects the *routing* of IRQs at exception levels > *lower than* EL2. Yup, and there is *zero* requirement for IMO to have any particular value while running at EL2. As long as you're at EL2, physical interrupts that are not targeting EL3 are taken at EL2, full stop. I'm all for being creative, but reinventing the architecture as a justification for an erratum workaround is taking it *way* too far. > > > This also works around AmpereOne erratum AC03_CPU_36 > > + * which can incorrectly route an IRQ to EL1 when HCR_EL2.{E2H,TGE} is > > + * written from {1,1} to {1,0} with interrupts unmasked. > > + */ > > + sysreg_clear_set(hcr_el2, HCR_TGE, HCR_AMO | HCR_IMO | HCR_FMO); > > + > > Rather than further modifying the exception controls, why not mask DAIF > like we do on guest entry/exit? AFAICT that is the only reason KVM_RUN > is unaffected by this erratum. > > This is what I have been using on a few AmpereOne systems: > > From 265cb193190c13c651d8e008d34d1d18505d4804 Mon Sep 17 00:00:00 2001 > From: Oliver Upton <oliver.upton@linux.dev> > Date: Fri, 5 Jan 2024 23:18:14 +0000 > Subject: [PATCH] KVM: arm64: Mitigate AmpereOne erratum AC03_CPU_36 > > The AmpereOne design suffers from an erratum where if an asynchronous > exception arrives while EL2 is modifying hypervisor exception controls > (i.e. HCR_EL2, SCTLR_EL2) the PE may take an invalid exception to > another EL. Same questions about SCTLR_EL2 and the notion of "another EL". > > KVM's guest entry/exit path is unaffected because DAIF is already masked > when modifying HCR_EL2. However, KVM only masks IRQs when switching to > the guest MMU context for a TLB invalidation, meaning it is possible for > an intervening exception to trigger the erratum. This is a bit unclear, because masking interrupts is commonly done using DAIF. You may want to talk about pseudo-NMIs early on and explain that we rely on PMR for masking in this case, still allowing interrupts to be handled. > This becomes > immediately obvious when profiling a KVM VM with pseudo-NMIs enabled, > such as: > > perf record ./dirty_log_perf_test -m 2 -s anonymous_thp -v 4 -b 4G > > Mitigate the erratum by masking DAIF completely before switching to the > guest MMU context. > > Signed-off-by: Oliver Upton <oliver.upton@linux.dev> > --- > Documentation/arch/arm64/silicon-errata.rst | 2 ++ > arch/arm64/Kconfig | 16 ++++++++++++++++ > arch/arm64/kernel/cpu_errata.c | 7 +++++++ > arch/arm64/kvm/hyp/vhe/tlb.c | 20 ++++++++++++++++++-- > arch/arm64/tools/cpucaps | 1 + > 5 files changed, 44 insertions(+), 2 deletions(-) > > diff --git a/Documentation/arch/arm64/silicon-errata.rst b/Documentation/arch/arm64/silicon-errata.rst > index f47f63bcf67c..aea7a75ae434 100644 > --- a/Documentation/arch/arm64/silicon-errata.rst > +++ b/Documentation/arch/arm64/silicon-errata.rst > @@ -52,6 +52,8 @@ stable kernels. > | Allwinner | A64/R18 | UNKNOWN1 | SUN50I_ERRATUM_UNKNOWN1 | > +----------------+-----------------+-----------------+-----------------------------+ > +----------------+-----------------+-----------------+-----------------------------+ > +| Ampere | AmpereOne | AC03_CPU_36 | AMPERE_ERRATUM_AC03_CPU_36 | > ++----------------+-----------------+-----------------+-----------------------------+ > | Ampere | AmpereOne | AC03_CPU_38 | AMPERE_ERRATUM_AC03_CPU_38 | > +----------------+-----------------+-----------------+-----------------------------+ > +----------------+-----------------+-----------------+-----------------------------+ > diff --git a/arch/arm64/Kconfig b/arch/arm64/Kconfig > index 7b071a00425d..9ba04b90e945 100644 > --- a/arch/arm64/Kconfig > +++ b/arch/arm64/Kconfig > @@ -434,6 +434,22 @@ config AMPERE_ERRATUM_AC03_CPU_38 > > If unsure, say Y. > > +config AMPERE_ERRATUM_AC03_CPU_36 > + bool "AmpereOne: AC03_CPU_36: CPU takes invalid asynchronous exception while changing exception controls" > + default y > + help > + This option adds an alternative code sequence to work around Ampere > + erratum AC03_CPU_36 on AmpereOne parts. > + > + The affected designs can take an invalid exception to an incorrect > + exception level if an asynchronous exception arrives while software > + is changing the EL2 exception controls (i.e. HCR_EL2, SCTLR_EL2). > + > + The workaround forces KVM to mask all asynchronous exception sources > + when switching to the guest MMU context for TLB invalidations. > + > + If unsure, say Y. > + > config ARM64_WORKAROUND_CLEAN_CACHE > bool > > diff --git a/arch/arm64/kernel/cpu_errata.c b/arch/arm64/kernel/cpu_errata.c > index e29e0fea63fb..7e2856360f38 100644 > --- a/arch/arm64/kernel/cpu_errata.c > +++ b/arch/arm64/kernel/cpu_errata.c > @@ -727,6 +727,13 @@ const struct arm64_cpu_capabilities arm64_errata[] = { > .capability = ARM64_WORKAROUND_AMPERE_AC03_CPU_38, > ERRATA_MIDR_ALL_VERSIONS(MIDR_AMPERE1), > }, > +#endif > +#ifdef CONFIG_AMPERE_ERRATUM_AC03_CPU_36 > + { > + .desc = "AmpereOne erratum AC03_CPU_36", > + .capability = ARM64_WORKAROUND_AMPERE_AC03_CPU_36, > + ERRATA_MIDR_ALL_VERSIONS(MIDR_AMPERE1), > + }, > #endif > { > } > diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c > index b636b4111dbf..cedfb0a32fa0 100644 > --- a/arch/arm64/kvm/hyp/vhe/tlb.c > +++ b/arch/arm64/kvm/hyp/vhe/tlb.c > @@ -17,13 +17,29 @@ struct tlb_inv_context { > u64 sctlr; > }; > > +#define __tlb_daif_save(flags) \ > +({ \ > + if (cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_36)) \ > + flags = local_daif_save(); \ > + else \ > + local_irq_save(flags); \ > +}) > + > +#define __tlb_daif_restore(flags) \ > +({ \ > + if (cpus_have_final_cap(ARM64_WORKAROUND_AMPERE_AC03_CPU_36)) \ > + local_daif_restore(flags); \ > + else \ > + local_irq_restore(flags); \ > +}) > + I'd really like a comment about *why* we are doing that. > static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu, > struct tlb_inv_context *cxt) > { > struct kvm_vcpu *vcpu = kvm_get_running_vcpu(); > u64 val; > > - local_irq_save(cxt->flags); > + __tlb_daif_save(cxt->flags); > > if (vcpu && mmu != vcpu->arch.hw_mmu) > cxt->mmu = vcpu->arch.hw_mmu; > @@ -86,7 +102,7 @@ static void __tlb_switch_to_host(struct tlb_inv_context *cxt) > write_sysreg_el1(cxt->sctlr, SYS_SCTLR); > } > > - local_irq_restore(cxt->flags); > + __tlb_daif_restore(cxt->flags); > } > > void __kvm_tlb_flush_vmid_ipa(struct kvm_s2_mmu *mmu, > diff --git a/arch/arm64/tools/cpucaps b/arch/arm64/tools/cpucaps > index b98c38288a9d..cfaae3b4d2d0 100644 > --- a/arch/arm64/tools/cpucaps > +++ b/arch/arm64/tools/cpucaps > @@ -85,6 +85,7 @@ WORKAROUND_2457168 > WORKAROUND_2645198 > WORKAROUND_2658417 > WORKAROUND_2966298 > +WORKAROUND_AMPERE_AC03_CPU_36 > WORKAROUND_AMPERE_AC03_CPU_38 > WORKAROUND_TRBE_OVERWRITE_FILL_MODE > WORKAROUND_TSB_FLUSH_FAILURE Other than the passing comments, I'm OK with this patch. However, I am very worried that this is only the start of a very long game of whack-a-mole, because there is no actual documentation on what goes wrong. For example, we have plenty of writes to SCTLR_EL2 (using the SCTLR_EL1 alias if running VHE) for MTE. Are any of those affected? Short of having some solid handle on what is happening, I don't see how we can promise to support this system. M.
On Sat, Jan 06, 2024 at 12:13:09PM +0000, Marc Zyngier wrote: [...] > > From 265cb193190c13c651d8e008d34d1d18505d4804 Mon Sep 17 00:00:00 2001 > > From: Oliver Upton <oliver.upton@linux.dev> > > Date: Fri, 5 Jan 2024 23:18:14 +0000 > > Subject: [PATCH] KVM: arm64: Mitigate AmpereOne erratum AC03_CPU_36 > > > > The AmpereOne design suffers from an erratum where if an asynchronous > > exception arrives while EL2 is modifying hypervisor exception controls > > (i.e. HCR_EL2, SCTLR_EL2) the PE may take an invalid exception to > > another EL. > > Same questions about SCTLR_EL2 and the notion of "another EL". I've got the same questions :) This is just a rewording of Ampere's erratum description. https://amperecomputing.com/customer-connect/products/AmpereOne-device-documentation > Other than the passing comments, I'm OK with this patch. However, I am > very worried that this is only the start of a very long game of > whack-a-mole, because there is no actual documentation on what goes > wrong. > > For example, we have plenty of writes to SCTLR_EL2 (using the > SCTLR_EL1 alias if running VHE) for MTE. Are any of those affected? > > Short of having some solid handle on what is happening, I don't see > how we can promise to support this system. Completely agree. At least on the AmpereOne machines I have access to this seems to do the trick, but that observation is no replacement for full documentation.
On Sat, 06 Jan 2024 17:50:23 +0000, Oliver Upton <oliver.upton@linux.dev> wrote: > > On Sat, Jan 06, 2024 at 12:13:09PM +0000, Marc Zyngier wrote: > > [...] > > > > From 265cb193190c13c651d8e008d34d1d18505d4804 Mon Sep 17 00:00:00 2001 > > > From: Oliver Upton <oliver.upton@linux.dev> > > > Date: Fri, 5 Jan 2024 23:18:14 +0000 > > > Subject: [PATCH] KVM: arm64: Mitigate AmpereOne erratum AC03_CPU_36 > > > > > > The AmpereOne design suffers from an erratum where if an asynchronous > > > exception arrives while EL2 is modifying hypervisor exception controls > > > (i.e. HCR_EL2, SCTLR_EL2) the PE may take an invalid exception to > > > another EL. > > > > Same questions about SCTLR_EL2 and the notion of "another EL". > > I've got the same questions :) This is just a rewording of Ampere's > erratum description. > > https://amperecomputing.com/customer-connect/products/AmpereOne-device-documentation Huh. That's full of... not a lot. > > Other than the passing comments, I'm OK with this patch. However, I am > > very worried that this is only the start of a very long game of > > whack-a-mole, because there is no actual documentation on what goes > > wrong. > > > > For example, we have plenty of writes to SCTLR_EL2 (using the > > SCTLR_EL1 alias if running VHE) for MTE. Are any of those affected? > > > > Short of having some solid handle on what is happening, I don't see > > how we can promise to support this system. > > Completely agree. At least on the AmpereOne machines I have access to > this seems to do the trick, but that observation is no replacement for > full documentation. Indeed. The document you quoted acknowledges that there are issues (one step up from the previous situation), but this is not enough to independently develop a workaround that will survive the test of time. For example, AC03_CPU_39 doesn't even list the failing encodings, which may lead to real funnies if these encodings get used at some point. M.
Marc Zyngier <maz@kernel.org> writes: > [reviewing both patches in one go, as it is way easier] > > On Fri, 05 Jan 2024 23:53:10 +0000, > Oliver Upton <oliver.upton@linux.dev> wrote: >> >> Hi Ilkka, >> >> On Fri, Jan 05, 2024 at 01:32:51PM -0800, Ilkka Koskinen wrote: >> > Due to erratum AC03_CPU_36 on AmpereOne, if an Asynchronous Exception >> > (interrupts or SErrors) occurs to EL2, while EL2 software is modifying >> > system register bits that control EL2 exception behavior, the processor >> > may take an exception to an incorrect Exception Level. > > What needs to be described (both in the commit message and as part of > the code) is under what circumstances this mis-routing happens. > > Is it that just clearing TGE while being at EL2 always results in the > asynchronous exception being routed to the wrong exception level? Or > is it a more subtle issue related to synchronisation? > > Also worth describing is to which other exception level is the > exception delivered? EL1? EL3? > >> > >> > The affected system registers are HCR_EL2 and SCTLR_EL2, which contain >> > control bits for routing and enabling of EL2 exceptions. > > How does SCTLR_EL2 affects interrupt delivery? Is this related to > FEAT_NMI and SCTLR_EL2.{NMI,SPINTMASK}? Because this is the only part > of this register that has anything to do with interrupts. > >> > >> > The issue is triggered when HGE.TGE bit is cleared while having >> > AMO/IMO/FMO bits cleared too. To avoid the exception getting taken >> > at a wrong Exception Level, we set AMO/IMO/FMO. >> >> We toggle HCR_EL2 for other things besides TLB invalidations, and the >> changelog does not describe why they're apparently unaffected. >> >> > Suggested-by: D Scott Phillips <scott@os.amperecomputing.com> >> > Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> >> >> This isn't an acceptable way to go about errata mitigations. Besides >> extremely unusual circumstances, the pattern is to use a cpucap && >> alternatives to only enable the workaround on affected designs. We then >> document the errata in the expected places (Kconfig and kernel >> documentation) such that the folks saddled with maintaining this stuff >> know how to handle it years down the line. > > +1. This hack will have to live forever, while the lack of > documentation makes it totally unmaintainable. The KVM code *will* > change in ways that cannot be anticipated today, and without > exhaustive documentation, we will not be able to do a good job at > maintaining this system alive by correctly mitigating the erratum. > >> >> > --- >> > arch/arm64/kvm/hyp/vhe/tlb.c | 12 +++++++++--- >> > 1 file changed, 9 insertions(+), 3 deletions(-) >> > >> > diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c >> > index b32e2940df7d..c72fdd2e4549 100644 >> > --- a/arch/arm64/kvm/hyp/vhe/tlb.c >> > +++ b/arch/arm64/kvm/hyp/vhe/tlb.c >> > @@ -61,9 +61,15 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu, >> > * has an ISB in order to deal with this. >> > */ >> > __load_stage2(mmu, mmu->arch); >> > - val = read_sysreg(hcr_el2); >> > - val &= ~HCR_TGE; >> > - write_sysreg(val, hcr_el2); >> > + >> > + /* >> > + * With {E2H,TGE} == {1,0}, IMO == 1 is required so that IRQs are not >> > + * all masked. >> >> Huh? HCR_EL2.IMO affects the *routing* of IRQs at exception levels >> *lower than* EL2. > > Yup, and there is *zero* requirement for IMO to have any particular > value while running at EL2. As long as you're at EL2, physical > interrupts that are not targeting EL3 are taken at EL2, full stop. What's meant here is that when the configurations of HCR.{E2H,TGE,IMO} == {1,0,0}, that's a "C" for irq target in the big table of IRQ target ELs under "Establishing the target Exception level of an asynchronous exception" meaning that no IRQs are taken, regardless of PSTATE.I. It doesn't seem like the intent of the tlb flush code was to also mask IRQs via HCR, which is why Ilkka proposed the change to also set IMO. With that changed, there weren't any remaining places left that needed the irq masking workaround which is why Ilkka added a comment instead of a cpucap. Agreed that this leaves open the possibility for future code running afoul of the erratum. So I'd suggest that we make Ilkka's change to add IMO here to take out the incidental hard IRQ masking but remove any connection to the erratum, and then for the erratum make writes to hcr_el2 on AmpereOne become: mrs tmp, daif msr daifset, #0xf msr hcr_el2, value isb msr daif, tmp
Marc Zyngier <maz@kernel.org> writes: > [reviewing both patches in one go, as it is way easier] > > On Fri, 05 Jan 2024 23:53:10 +0000, > Oliver Upton <oliver.upton@linux.dev> wrote: >> >> Hi Ilkka, >> >> On Fri, Jan 05, 2024 at 01:32:51PM -0800, Ilkka Koskinen wrote: >> > Due to erratum AC03_CPU_36 on AmpereOne, if an Asynchronous Exception >> > (interrupts or SErrors) occurs to EL2, while EL2 software is modifying >> > system register bits that control EL2 exception behavior, the processor >> > may take an exception to an incorrect Exception Level. > > What needs to be described (both in the commit message and as part of > the code) is under what circumstances this mis-routing happens. > > Is it that just clearing TGE while being at EL2 always results in the > asynchronous exception being routed to the wrong exception level? Or > is it a more subtle issue related to synchronisation? > > Also worth describing is to which other exception level is the > exception delivered? EL1? EL3? The mis-routing happens when the old value of HCR_EL2 allows IRQs in EL2 and the new value does not (a "C" entry in the target EL table), and then an IRQ arrives. In that case the exception is routed to EL1. >> > >> > The affected system registers are HCR_EL2 and SCTLR_EL2, which contain >> > control bits for routing and enabling of EL2 exceptions. > > How does SCTLR_EL2 affects interrupt delivery? Is this related to > FEAT_NMI and SCTLR_EL2.{NMI,SPINTMASK}? Because this is the only part > of this register that has anything to do with interrupts. It looks like sctlr isn't involved in the irq routing problem of ac03_cpu_36. I'm working on pinning down whether there's some other somewhat related bug with the sctlr bits, or if it's just a mistake that it's listed in the write up at all. >> > >> > The issue is triggered when HGE.TGE bit is cleared while having >> > AMO/IMO/FMO bits cleared too. To avoid the exception getting taken >> > at a wrong Exception Level, we set AMO/IMO/FMO. >> >> We toggle HCR_EL2 for other things besides TLB invalidations, and the >> changelog does not describe why they're apparently unaffected. >> >> > Suggested-by: D Scott Phillips <scott@os.amperecomputing.com> >> > Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> >> >> This isn't an acceptable way to go about errata mitigations. Besides >> extremely unusual circumstances, the pattern is to use a cpucap && >> alternatives to only enable the workaround on affected designs. We then >> document the errata in the expected places (Kconfig and kernel >> documentation) such that the folks saddled with maintaining this stuff >> know how to handle it years down the line. > > +1. This hack will have to live forever, while the lack of > documentation makes it totally unmaintainable. The KVM code *will* > change in ways that cannot be anticipated today, and without > exhaustive documentation, we will not be able to do a good job at > maintaining this system alive by correctly mitigating the erratum. Point taken on the documentation. I'll work on getting it updated with the info above and get ac03_cpu_39 updated with the problem encodings. Scott
diff --git a/arch/arm64/kvm/hyp/vhe/tlb.c b/arch/arm64/kvm/hyp/vhe/tlb.c index b32e2940df7d..c72fdd2e4549 100644 --- a/arch/arm64/kvm/hyp/vhe/tlb.c +++ b/arch/arm64/kvm/hyp/vhe/tlb.c @@ -61,9 +61,15 @@ static void __tlb_switch_to_guest(struct kvm_s2_mmu *mmu, * has an ISB in order to deal with this. */ __load_stage2(mmu, mmu->arch); - val = read_sysreg(hcr_el2); - val &= ~HCR_TGE; - write_sysreg(val, hcr_el2); + + /* + * With {E2H,TGE} == {1,0}, IMO == 1 is required so that IRQs are not + * all masked. This also works around AmpereOne erratum AC03_CPU_36 + * which can incorrectly route an IRQ to EL1 when HCR_EL2.{E2H,TGE} is + * written from {1,1} to {1,0} with interrupts unmasked. + */ + sysreg_clear_set(hcr_el2, HCR_TGE, HCR_AMO | HCR_IMO | HCR_FMO); + isb(); }
Due to erratum AC03_CPU_36 on AmpereOne, if an Asynchronous Exception (interrupts or SErrors) occurs to EL2, while EL2 software is modifying system register bits that control EL2 exception behavior, the processor may take an exception to an incorrect Exception Level. The affected system registers are HCR_EL2 and SCTLR_EL2, which contain control bits for routing and enabling of EL2 exceptions. The issue is triggered when HGE.TGE bit is cleared while having AMO/IMO/FMO bits cleared too. To avoid the exception getting taken at a wrong Exception Level, we set AMO/IMO/FMO. Suggested-by: D Scott Phillips <scott@os.amperecomputing.com> Signed-off-by: Ilkka Koskinen <ilkka@os.amperecomputing.com> --- arch/arm64/kvm/hyp/vhe/tlb.c | 12 +++++++++--- 1 file changed, 9 insertions(+), 3 deletions(-)